Hello world. Hello world. Okay, that's apt. Hello, everyone — really nice to be here, and thanks for staying for the last session; I know there's a lot more coming after this. Before we start: I'm Karan, and — let me quickly test this — yes, the answer to your first question: this is me. I lead the international developer relations team at GitHub. A developer advocate by profession, but very much an engineer at heart, which is why I'm here. My boring, obvious alias on the internet is MV Karan; you can find me everywhere under that. Over the next few minutes, what I hope you'll take away is a sneak peek into how we at GitHub use Kubernetes to power our microservices — and hopefully something that can inform how you work within your own organization. So, to start off: what is GitHub? Just kidding — I know I don't have to cover that; most of you are already using it, so let's not spend time on it. But one thing I really do want to talk about is the scale of GitHub. You probably already know it's used by hundreds of millions of developers. What I want to give you — and this is what the rest of the talk is about — is a peek into what goes on behind that. We're very agile in how we develop and release: we do 20-plus deployments a day to the GitHub.com you're all using. And the build side is massive too — building GitHub.com consumes almost 125,000 build minutes per hour, which works out to roughly 15,000 CI jobs an hour.
That's massive scale, and huge compute as well — close to 150k cores. A lot goes into building this platform. And it's not just the scale of building GitHub.com; there's also the GitHub codebase itself. Surprisingly, the codebase is largely a Ruby on Rails monolith with millions of lines of code that thousands of engineers collaborate on daily. But there's a lot more than that one codebase: hundreds of services make up the GitHub.com platform, doing all sorts of things — from a small internal tool to a large API supporting production workloads that applications and other consumers depend on. So it's a mix of many microservices running alongside the monolith itself. Now, how do we run all of these services? The first and easy answer would be: just spin up a VM for everything. But I wouldn't be up here talking if it were that easy. A VM per service is inefficient, for several well-understood reasons. The first is that planning and capacity management themselves become a real challenge: with hundreds of services, how do you manage the capacity?
You'd have to keep scaling horizontally or vertically and figure out how to manage all of the VMs for all of those services — again, a big challenge. You could instead say: fine, skip the VMs, use Kubernetes, and let each service owner run their own cluster. That's a challenge too, because it means every service team needs expertise in operating a cluster. Should they spend their time operating the cluster, or working on the application? It's also a problem because you get very little central visibility — you might get some performance data, but other aspects of observability will be missing. And enforcing and then actually tracking security and compliance becomes much harder. This is where our paved path comes into play. You may have heard "paved path" — or whatever terminology other organizations use for the same idea. The paved path at GitHub is largely an ecosystem of tools: a comprehensive suite we've created to make it easy for engineers at GitHub to focus on their service and their application, rather than worrying about where it's running, what the infrastructure looks like, and so on. So: the paved path. Let's double-click into it. Saying "paved path" is easy, but what goes into it? What is GitHub's paved path? You might be thinking: K8s. Yes.
However, it also covers the other aspects of the lifecycle — creating, deploying, scaling, and debugging — with tools, practices, and conventions spanning all of it. And it's not just Kubernetes but an ecosystem of tools: Kubernetes and Docker, of course, but also infrastructure pieces like load balancers, plus a lot of custom operators, tools, and custom apps. We use this paved path for many things: simple web apps, computation pipelines, even batch processors. Keep in mind that we run a wide variety of workloads, from internal tools to production API workloads. And our Kubernetes footprint runs in a multi-cluster, multi-region topology — so when I talk about deployments in a couple of minutes, they usually go across clusters and across regions. How does this help us? I'm not going to recite the benefits of Kubernetes itself, but rather the benefits of basing the paved path on Kubernetes. First, it's easy to manage capacity centrally, because you no longer worry about capacity per virtual machine — you think about capacity from the node's perspective, and we just manage capacity for the Kubernetes nodes. We can also scale very rapidly thanks to centralized planning: look at what workloads are running and how much capacity is needed, then scale as easily as adding more nodes. And many of Kubernetes' native features give us insight into app and deployment performance — something that's much harder to get from a VM-per-service setup.
We also manage all of the configuration and deployments in a central control plane, across all services, without having to juggle a separate dashboard and capacity story per service. Very helpful, and it has worked well for us. Now, here's an example I want to use for the rest of the session: our custom Open Graph image service, which generates the Open Graph images and social cards you'll have seen whenever a repo is shared, or a pull request — like this one, 122K, a suitably fancy pull request number I pulled out — or an issue, or a comment. It's a custom Open Graph image service we wrote, and it's about as simple as you'd expect: a Node.js app that pulls data from the public GitHub GraphQL API, does some HTML/CSS templating, takes a quick screenshot of that rendered HTML/CSS, and serves the image. If you want to know more, there's a blog post about it on github.blog. So let's take this as the example, and I'll walk you through what happens behind the scenes when we run it on our Kubernetes footprint, on our paved path. Before that, though, I want to introduce a rock star. No, not what you're thinking — it would be very cool if a celebrity jumped up here saying "let's have a party," but unfortunately no. Just as good, though, is the open source project hubotio/hubot, a framework for building chatbots. It was initially authored by us at GitHub and then open-sourced. It's a chatbot we use heavily internally and depend on for a lot of operational tasks as well as general housekeeping. In fact, we call Hubot our sidekick, because it helps us with so much.
With that as background: how would this look for the Open Graph image service example I mentioned? Imagine the repo has little or no code in it yet. How do we bring it onto the paved path's infrastructure? We run a ChatOps command: hubot, then the specific service handling all of this — let's say it's called ghi-platform — then an argument saying we want to scaffold an app, and the name of the service, og-image. What this does is work with a custom GitHub App installed on the og-image repo and generate a pull request with some scaffolding. The scaffolding is the key, because it's what gets engineers started. What does it consist of? A few things. First, a very simple deployment.yaml defining the deployment environments — note that this is not the Kubernetes Deployment object. In this YAML file, we just declare the environments the service will operate in: staging, production, or some other environment in between. These deployment environments eventually become namespaces within Kubernetes: for a service named og-image with an environment named staging, there will be a namespace called og-image-staging, and so on. That's a key part of the scaffold. Second, a basic Deployment and Service object are scaffolded and created in the pull request — a Deployment with a couple of replicas, and a Service object of type LoadBalancer.
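The talk doesn't show the file itself, so the schema below is my assumption, but the idea of that deployment.yaml — declaring environments that become per-service namespaces — might look something like:

```yaml
# Hypothetical sketch of the scaffolded deployment.yaml; the real
# internal schema isn't public. It only declares environments, and
# each one becomes a <service>-<environment> namespace in Kubernetes.
service: og-image
environments:
  - name: staging      # -> namespace og-image-staging
  - name: production   # -> namespace og-image-production
```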
Third, to get the service owner started and customizing, there's also a very trivial web server packaged in a Debian-based Dockerfile. Calling it a web server is probably generous — it barely does more than echo — but it's something to deploy and test against. And whatever CI is needed is configured as GitHub Checks, so that whenever an engineer pushes (or performs similar activity), the Docker image is built and pushed. So now imagine og-image has been scaffolded: we have the environments — say, staging and production — the Kubernetes manifests with the Deployment and the Service object, a very trivial Dockerfile, and all of this configured. And believe me, once the scaffolding is done, an engineer is ready to go ahead and deploy, because it runs that trivial web server and you can always test that everything works. So how do we deploy? Very similarly: we use Hubot again. The service we use is called deploy, and the command is: deploy og-image/bug-fixes to staging. If you look at this closely, you may be wondering: why are they deploying a branch, and not main or a pull request? That's a whole topic of its own, but in a nutshell, at GitHub we do branch deployments, not main deployments. Whenever a pull request is created, approved, and everything has passed, we deploy that branch directly — be it to staging or even to production. The way this helps us is that it keeps main very stable: if something goes wrong, the easiest rollback is simply to redeploy main. Main is stable; if the branch deploy doesn't go well, you keep iterating, and once it does, you merge and move on.
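The scaffolded Deployment and Service described above could look roughly like the following — the real templates are internal, so the labels, ports, and registry name here are my assumptions; only "a couple of replicas" and "Service of type LoadBalancer" come from the talk:

```yaml
# Sketch of the kind of manifests the scaffold might generate.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: og-image
spec:
  replicas: 2                     # "a couple of replicas"
  selector:
    matchLabels:
      app: og-image
  template:
    metadata:
      labels:
        app: og-image
    spec:
      containers:
        - name: og-image
          image: registry.internal/og-image:latest   # placeholder image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: og-image
spec:
  type: LoadBalancer
  selector:
    app: og-image
  ports:
    - port: 80
      targetPort: 8080
```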
Again, that's a whole separate topic, but it's one of the conventions that has helped us a lot. So, say we deploy with: deploy og-image/bug-fixes — bug-fixes being the branch name — to the staging environment. That's how someone kicks off a deployment. But what exactly happens when the deploy kicks off? Let's look at that. Say a GitHub engineer merges a PR — it could also be a push or any other activity — on og-image's bug-fixes branch. Remember, this is all part of our paved path, so it's not something every service owner has to configure themselves; we've built it to make things much easier. As soon as that happens, a build and push kicks off immediately — even for a plain push. If an engineer on a pull request makes one more commit, there will be another build and push: the Docker image defined in that branch is built and pushed into our internal artifact registry, and all of the configured CI checks run as well. Once CI is complete, the engineer can decide on the pull request that everything is ready and go ahead with the deploy — they run the ChatOps command: hubot deploy og-image/bug-fixes to staging. That gets picked up by the Hubot ChatOps service, of which we run an instance internally, and that's what triggers the build and the deploy.
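The talk doesn't say what the build-and-push check looks like under the hood, but as a purely hypothetical illustration — using GitHub Actions workflow syntax, which the talk does not confirm the paved path uses, and a placeholder registry name — the shape of it might be:

```yaml
# Hypothetical illustration only: a check that builds and pushes the
# branch's Docker image on every push. Registry name is a placeholder.
name: build-and-push
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t registry.internal/og-image:${GITHUB_SHA} .
      - run: docker push registry.internal/og-image:${GITHUB_SHA}
```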
Fairly simple, fairly easy up to this point. However, a lot of the magic of working with Kubernetes and the paved path comes after this, inside the build systems. Our build systems — themselves part of the paved path — fetch all of the manifest files from that specific branch, fetch the latest Docker image from the artifact registry, fetch the application secrets from a central secret store, do whatever hydration and expansion of the manifest files is needed, run various custom tools and operators, and prepare the final deployable Kubernetes manifests. If the build goes well, they are then automatically applied to the relevant Kubernetes clusters, within the specific namespaces. So if there are staging and production environments, there are two separate namespaces, and a deploy to a given environment lands in that environment's namespace. Our engineers are given a lot of control here, too. Say there are many clusters across multiple regions, but I want my service to run on a single cluster with just one pod — maybe because it isn't built for a lot of parallel processing and there could be race conditions. It's just a matter of configuration to say: deploy this to a specific cluster, or set it up however you need.
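The hydration tooling itself is internal to GitHub, so the following is only a rough sketch of the idea — the placeholder syntax, function names, and registry name are all mine; the talk only establishes that the build system injects the latest image and secrets into the manifests, and that namespaces follow a `<service>-<environment>` convention:

```python
import string


def namespace(service: str, environment: str) -> str:
    """Per the convention described in the talk: <service>-<environment>."""
    return f"{service}-{environment}"


def hydrate_manifest(template: str, image: str, secrets: dict) -> str:
    """Fill image and secret placeholders in a manifest template.

    The $image / $secret_NAME placeholder syntax is invented for
    illustration; the real paved-path tooling works differently.
    """
    values = {"image": image}
    values.update({f"secret_{name}": value for name, value in secrets.items()})
    return string.Template(template).substitute(values)


# Example: hydrate a fragment of a manifest for a branch deploy.
template = (
    "image: $image\n"
    "env:\n"
    "  - name: GITHUB_TOKEN\n"
    "    value: $secret_GITHUB_TOKEN\n"
)
manifest = hydrate_manifest(
    template,
    image="registry.internal/og-image:bug-fixes-abc123",
    secrets={"GITHUB_TOKEN": "hunter2"},
)
print(namespace("og-image", "staging"))  # og-image-staging
```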
From a GitHub engineer's perspective, the way we've built the paved path means they only focus on the minimal Deployment object configuration, the Service object configuration, and their environments — as simple as that — plus putting whatever secrets they use into the central secret store. And they're good to go: at the end of this step, an engineer actually has a running application. This is made very easy by building a great deal on top of Kubernetes — extensive use of annotations, rehydrating much of the manifest during the build phase, and so on. So that's what running a deployment looks like for an engineer. Now, a lot of questions came in around several topics, which is why we're also going to talk about security: how does all of this pan out from a security perspective? There are many angles, and they matter because it's not just the Open Graph image service — or whatever else is running — that has to be secure. The service has to be secure, the paved path has to be secure, the clusters have to be secure, the underlying infrastructure has to be secure, and the code has to be secure too. We have a lot of practices that enforce security throughout, and I'll share some of them. One of the key things we do is provide pre-built Docker images that are to be used as base images.
It's very unlikely that a running service uses an upstream Ubuntu or Node.js image directly. Our pre-built base images contain only the absolutely essential packages, and any installed binaries are auditable. They're vetted, so if I have to spin up a new service or app, it's easy to just use the base image, because updates are applied to it centrally. Second, there's build-time and periodic scanning of packages and container images for vulnerabilities. We do this using our own GitHub security features, like Dependabot: it looks at the dependency manifests — package.json, Gemfiles, Composer lock files, whatever else — checks for known vulnerabilities, raises Dependabot alerts, and makes sure they get addressed by the service owner. And that covers the runtime artifacts — but it's not just the runtime artifacts; the code in the repository is scanned too, periodically and at build time. For example, one of the CI checks is configured to do static analysis, so if you're introducing a vulnerability, you simply can't merge or deploy it. There are also multiple authentication and authorization checks if someone has to access the Kubernetes resources directly. Why does that matter? Because, as you've seen over the past few slides, an engineer just runs a scaffold that generates a PR — they don't need the underlying Kubernetes resources.
They deploy using ChatOps, which also doesn't need access to the underlying Kubernetes resources. If you do need access for some reason — debugging, say, or a job has gone rogue and you want to look at the logs directly on the resource — there are authentication and authorization checks to make sure only the relevant individuals can access the relevant resources directly, be it a Deployment or any other object. Also, by default, all services — the hundreds of existing ones and anything new — are accessible only within internal networks. They are not exposed to the public internet by default, so it can't happen that someone creates a new service and it's suddenly out in public. Exposing it externally takes more configuration — changing the kind of load balancer, and so on. That said, it's the same paved path, the same workloads, and the same clusters supporting both the internal tools and the externally facing services, and most often the same clusters run similar workloads side by side. There are also branch protection policies enforced on all production repos at GitHub. In the example I just gave — a pull request failing a CI check, or failing a static analysis test because it introduces a vulnerable package — the branch protection policy simply won't let you proceed or merge at all. And you might be thinking: but you do branch deployments — what about that case? Even then, when you tell Hubot to deploy, it will say: sorry, please pass the CI checks first, and then we'll talk.
So a lot of those protection policies are in place. There's also comprehensive telemetry for threat detection across the services, to make sure no threats or intrusions are creeping in, either against the services or against the platform itself. And the last one — a really nice one — is the centralized secret store that manages all application and service secrets. The way it's configured, each service has a separate store per environment. It's one centralized secret store, but og-image-staging has a different store than og-image-production. That ensures secrets don't get leaked or mismatched between environments — you don't run some "delete *" and suddenly realize: oh no, that hit my production environment. They're always separate. So, a lot of thought goes into building the paved path while keeping it open to the engineers using it. And — you won't be surprised — a lot of this happens by using GitHub itself. We use pull requests, as I mentioned, branch protection policies, Dependabot, Actions, Codespaces, and everything else that's there to build the GitHub platform itself. That helps us understand firsthand how the products work, and to make them better not just for us, but for the rest of our users.
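To make the per-environment scoping concrete — the actual secret-store product and its layout are internal, so the structure below is purely a hypothetical illustration; only the `<service>-<environment>` store naming comes from the talk:

```yaml
# Hypothetical illustration of per-environment secret scoping.
# Each environment gets its own store, so a staging job can never
# read (or delete) production secrets.
secret-stores:
  og-image-staging:
    GITHUB_TOKEN: "<staging token>"
  og-image-production:
    GITHUB_TOKEN: "<production token>"
```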
That was a very small slice of how we leverage Kubernetes as the base layer of our paved path — which today runs everything from a small internal tool accessible to employees to large workloads you might not even realize are sitting on this paved path behind a Kubernetes cluster. There's a lot more on how we manage our pull requests, our deployment pipelines, our database clusters, and everything else behind GitHub — for that, head over to the GitHub blog; you can scan the code or just visit github.blog for more. So that's about it — thank you, everyone. Right on the timer. If there's time for questions, happy to take them; otherwise, feel free to reach me as MV Karan anywhere on the internet. Looks like there's one gentleman here.

The first question I had was: with the Hubot implementation, you've largely ditched the concept of CD, right? It's not continuous deployment — somebody manually says, go to this environment. There isn't even an environment promotion pattern; you're again telling Hubot which environment to go to. So what was the philosophy behind that? What made you choose this process instead of a traditional CD environment-promotion model?

Okay, good question. The quick answer is that what I showcased was just an example of how you can do it manually. As I mentioned, there's also another article on the GitHub blog that covers our deployment pipelines. For example, we use merge queues on GitHub — say the main GitHub repo itself is very busy, with a lot of engineers wanting to deploy.
So we have merge queues — again, a native feature on GitHub — where an engineer can add their pull request to the merge queue and run a deployment pipeline. How deployment pipelines work is covered in a blog post, but in a nutshell: we follow that practice as well, and we give the service owner or engineer the option of how to configure their service. You just add one more piece of configuration saying: this is what my pipeline looks like — first deploy to canary at some percentage, then to staging, then shift canary, then production, whatever it is. We make that available as config-as-code, so someone can say "I don't want to use that," or "I absolutely want to use that," and deploy accordingly. Both paradigms are available. We're not ditching the CD concept, pipelines, or environment promotion — but we're also not saying you have to do manual deployments. We're unopinionated within our ecosystem and provide both options for engineers to use however they want.

Thank you so much. Anybody else have a question?

Hi, this is related to deployment as well. You deploy — but do you also support reconciliation of the deployment once it's done, similar to the way Argo CD does?

Sorry, I think there's something wrong with the speakers — could you repeat that?

Yeah, my question is basically: do you support reconciliation of the deployment? Once the deployment is done, if any change happens in production, it pushes the same changes back again — similar to the way Argo CD works. Is that supported as part of your deployment mechanisms, and how do you manage it?
Sorry, I didn't get the last part — I think it's the speakers; I couldn't catch what you said after "Argo CD."

Similar to Argo CD — the reconciliation mechanism for the configuration you deploy to production.

Sorry, I'm still not able to get that.

Is your deployment mechanism similar to the way Argo CD works? That's my question.

Okay, got it. It's not exactly similar. The mechanism, as I mentioned, is something we leave up to the service owners. One thing I haven't covered here is a concept we call deployment pipelines, where we provide a lot more control. For example, if someone wants to do a canary deploy and then add a manual gate — "wait 30 minutes; I'm going to watch my logs and everything else to make sure it's fine" — they can do that. Or if they want to deploy everything all at once, they can do that too. The concept, largely, is to provide as much flexibility and power to the engineers and service owners as possible, not to emulate how Argo CD or any other CD tool works. The same is true even for how images are built. In a Dockerfile — this is a bit unrelated, but worth mentioning — we're not opinionated: you can choose to do a multi-stage build; on a build-and-push you can say "I don't want to deploy this image, just store it in my artifact registry" and do something else with it. It's the same principle all the way from how you create and scaffold, to how you deploy, to what you test and what's part of your CI ecosystem.
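The pipeline config-as-code mentioned above isn't shown in the talk; as a hypothetical sketch of the kind of thing described — stage names, keys, and the manual-gate syntax are all my assumptions — it might look like:

```yaml
# Hypothetical sketch of a deployment pipeline config; the real
# internal schema is not public.
pipeline:
  - environment: canary
    traffic_percent: 5
    gate: manual            # e.g. "wait 30 minutes, watch the logs"
  - environment: staging
    traffic_percent: 100
  - environment: production
    traffic_percent: 100
```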
The objective is to provide a paved path for the problem. "Hey, I want to do multi-stage Docker builds" — yes, you can. "I don't want to use pipelines at all; I just want to keep deploying ad hoc" — yes. "I actually want a review environment with a canary" — yes. Providing those options is part of the concept of the paved path. I don't know if that answered your question completely, but happy to catch up offline on it. Thank you.