So thanks to Abhinav, that was a great talk, and you're going to see a lot of what he talked about in this one too. A little bit about me: I'm a product engineer at Gojek. I work on the kernel team, which specifically cares about developer productivity and the know-how of how the systems at Gojek have evolved, and we care deeply about that. I've contributed to an open-source project called oVirt, which is under the Red Hat umbrella; if you need a centralized way of managing your VMs, networking, and storage inside a data center, oVirt is the way to go. I love backpacking. Who else here loves traveling? Quick raise of hands. You, you, all right. If you want to talk backpacking, catch me outside after this talk, but more on that later. I like cooking; luckily my friends don't complain too much when I cook for them, they just gorge it up, though I'm sure they complain behind my back. And I'm a huge Chelsea FC fan, for those of you who watch the EPL.

That's more or less about me. But what does Gojek do? A few stats: we opened our Bangalore office in 2015, and in three years we've seen 1100x growth in volumes. We do three million plus orders every day, with products ranging from GoFood to GoPay to GoRide. There's one super app you can install and use across most Southeast Asian countries (we'll eventually expand to others too), to order something, pay for your cab, whatever you feel like. That's 18-plus products built by roughly 200-plus engineers.

So what am I going to talk about? Definitely not butterflies, although I'd love to; they're beautiful, and we used to run around after them in our gardens when we were kids. I'm going to talk about how we evolved our infrastructure at Gojek: from the very nascent stages, to where we were in the middle of 2016-17, to where we are now and how we're moving forward.

Traveling back a little in time: as you can see, we had rapid growth in user requests and people getting onboarded, one product launching every month, crazy growth. With that kind of demand from product teams and product managers, and releases where you easily acquire a huge number of users in a single day or week, it obviously takes a toll on your infrastructure. How do you deal with it when you've never seen this before? One common pattern across companies is a central infrastructure team: the systems team or the DevOps team, a small group of folks who run the infrastructure of the company; quite a few of you will know what I'm talking about. We formed exactly that, a central infrastructure team that handled a lot of ad hoc requests for anything related to systems: managing VMs, creating databases, creating Redis boxes, everything under the sun. At the end of the day, we were handling everything related to infrastructure for Gojek.
The intent, originally and still today, was to abstract infrastructure away for product teams. We want them to be able to iterate and deliver their product, so they get to ship the features they want to their customers. So what was the outcome of that? You get a lot of ad hoc requests, all the time. People come onto Slack saying, hey, I want a Postgres box, can you provision it for me? I want it right now, because everything is important. And someone from the team would initially say, okay, give me five minutes and I'll do it for you. Simple. It is simple, as long as you're supporting only a few product teams or a few products. But what happens once you grow past a certain number? You find you're effectively doing nothing but ad hoc requests all day long. You come in at 10, you do requests, you eat lunch, you come back to your desk, you do more requests.

So we wanted to measure how much time we were actually spending servicing requests from developers. There's a quote from Galileo I really like: 'Measure what is measurable, and make measurable what is not so.' We were at a phase where we weren't recording how much time we spent on requests: how much time each request from a product group took, or how much time a particular developer on our infrastructure team spent servicing them. So we started, very nascently, by asking developers to create tickets and tag them with the relevant labels. Here's a sample request from our ticketing system: you can see the status of the task (done), who created it and when, who it's assigned to, the category (labels like Postgres, DNS, whatever), which team it came from, and what they're asking for.

This takes away a little of the pain of having to respond immediately on Slack, because Slack, as much as it tries to be asynchronous, means you're constantly getting pinged and tagged across multiple channels about service requests. So we centralized it. That's the first part of solving the problem. And here's another kind of request we used to get: my DB is full, attach a disk, what am I going to do? My DB is almost full and my app is going to crash, please help me.

Now, this still didn't solve how we throttled requests. Tickets were piling up day in and day out, and even when we solved 20-30 tickets a day, we were always playing catch-up. We thought, there must be a better way. People can't do this all day. You can't say everything is fine when a developer is literally doing nothing but tickets; you can see his eyes are red, and he's crying, why did I join this company?
I'm doing only tickets. Why should I only do this? I'm here to develop and build tools. This is not normal. Now, one possible way to attack this problem immediately: say you have five developers on the infrastructure team. You go and pitch people to join your team and make it ten. You train them, onboard them to your tools, and then it's back to the same grind, doing the same tickets day in and day out. It doesn't scale; it's not sustainable. As the company scales up, the number of tickets also increases, and the five extra developers you hired are also just doing tickets. What's different? You can't solve this problem like that.

Eventually, we noticed we were becoming the bottleneck. Product managers were coming down on our heads asking, what are you doing? My product release has been pending for the past two weeks. And then we would again do things in an ad hoc way to push something to production and unblock them. We had effectively become the bottleneck, which we were not proud of.

How do you take this further? You could give a senior member of the product team access to your cloud provider, or your DC, or wherever you host your applications. That solves a bit of the problem, provided you onboard them well and have good documentation about the processes around your infrastructure. Say a product group has a Google Cloud project, you give the tech lead of that group access, and he unblocks his team without our intervention. But what does that entail? Chances of security loopholes. Why do I say that? Because a developer mostly cares about pushing the product to production and seeing it used by customers. He or she doesn't necessarily care about which security best practices to follow, or what the ideal architecture is: if my load is 100K RPM today, what architecture lets me easily handle more than that, maybe a million in throughput? So the chances of security loopholes are very high.

This also brings me to the broken windows effect. The other developers see the tech lead doing certain things and follow suit: oh, he's doing this, it works this way, why shouldn't I? And since we were not actually enforcing the rules or monitoring what they were doing inside the GCP projects, it leads to broken windows. You see one broken window, and you'll soon see other windows broken too.

So what do we do then? Quick detour, let's forget about all that for a moment. Where did systems administration actually start? It all started back in dingy labs, in the basements of colleges in the States and in other countries, where people would sit around managing their compute instances, writing scripts or doing things manually. It all started from there.
And those were the humble beginnings of systems administration, and from there people and practices evolved into industry. Now, let's come to the main point: how did we evolve automation at Gojek? Everybody starts small. First it's manual, then you evolve to scripts. You move to Ansible playbooks, Chef cookbooks. You take it a step further: you have Rundeck, then you have deployment scripts to deploy things. Seems very normal, right? Up to this point everything is fine and dandy. Things are automated as scripts, so manual errors are reduced by some percentage, but not eliminated.

Now, what are the problems with these solutions? They didn't solve everything we wanted to solve, because there were multiple ways to build automation. Chef cookbooks: who's maintaining them? Ansible scripts: who's maintaining them? Managing dependencies for all this automation became a task in itself, because people were distributed across clouds; some were on AWS, some on Google Cloud. We didn't want to become a bottleneck in that sense either. And since the solutions we provided were so fragmented, there was a lack of convention, which led to very meager contributions from developers. The end goal is to enable developers to contribute to automation, because ten people writing automation doesn't scale past a point; each team comes with different requirements, and you'll never be able to fulfill every one of them.

As for handing tools like Knife and Terraform directly to developers: that again opens up a lot of security loopholes. Random stray incidents, production outages, simply because we didn't have tooling that put the right restrictions on access and on the way those tools were used. Which brings back the other problem: there was no central place for storing the automation. It was strewn around in multiple places.

And even with all this, the main problem was still not solved: we were still doing tickets. So what's the solution? We went down to fundamentals and started clearing our fundamental infrastructure debt: how should our VPCs look? How should our network look? How should a Postgres instance come up, if it's supposed to come up, with all its configuration? How should the replica come up and join? Everything related to scaling something from zero to 100. We cleared a lot of infrastructure debt; for a while we were doing nothing but clearing infrastructure debt alongside solving tickets.

The next phase, after all this, was moving from maintenance mode to innovation mode: maintenance mode being us solving tickets and clearing fundamental infrastructure debt, innovation being what comes next. The goal evolved a little. We wanted to make infrastructure as boring as possible for product teams. They should know that infrastructure is there for them and it should just work. Nothing shiny. Nothing should break.
They should be able to predict things at their fingertips, and they should be able to go to sleep without getting paged in the middle of the night. That was our aim. Now, how do we solve that? This brought the phase in which we started developing our central automation orchestrator, called Proctor. It's completely open source; check it out at gojek/proctor on GitHub.

A little about the architecture of Proctor itself. It's a simple service with an API exposed: you have a daemon running somewhere, plus a few other components, a Postgres, and two Redis instances, one as a metadata store and one as a secret store (I'll explain what those two are in a bit). Then there's a client for this daemon, talking to it over the internet. What is it doing? The client submits a job to the service, and there's an executor, which can be anything, Kubernetes in this case, which executes the job and reports back to the daemon. The daemon polls the job and returns the completed result, whether it failed or succeeded, back to the client.

This brings me to the point where you're actually giving developers something in the form of a tool, the tool's interface being the CLI. (Coming back to the metadata store: I'll get to it in a few slides.) This is what the interface looks like: a simple Golang CLI that talks to the daemon; you can just brew install it onto your laptop. These are the various commands you can give it. The one I want you to notice is the execute command; you can see its description: execute a proc with given arguments. Take a step back: if you've used a tool like Jenkins or maybe Rundeck, think of it in that sense. You give a job scheduler a task, it executes it for you, you watch it run, and whether it ends in success or failure is another matter, but you gave it a job. This is exactly that.

Installation is very simple; we've tried to keep it as simple as possible. There's a Helm installation: you do a helm install and the daemon is deployed inside your Kubernetes cluster with the bare defaults. Whether you run the Redis metadata store and the Postgres inside the cluster as StatefulSets or on normal VMs is completely up to you; the configuration file is where you set those values.

Now, how do you automate using Proctor? I've talked about why we thought this would solve our problem, but does it really? If you remember, some slides back I showed you a ticket where people were asking us to attach or increase disks for VMs: my DB is getting full, increase my disk right now.
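Before getting to that disk example, here is roughly what the day-to-day interaction with Proctor looks like once the daemon is up. This is a hedged sketch: the Homebrew tap name and the config keys below are assumptions for illustration, not necessarily the exact ones in the gojek/proctor README; the list/describe/execute commands follow the CLI help described above.

```sh
# Install the Proctor CLI (a Golang binary) on a laptop
brew install gojek/proctor/proctor        # hypothetical tap name

# Point the CLI at your daemon; key names are illustrative
proctor config PROCTOR_HOST=proctor.yourcompany.io EMAIL_ID=you@yourcompany.io

# Discover and run automation
proctor list                  # list all procs with their descriptions
proctor describe some-proc    # show a proc's mandatory and optional arguments
proctor execute some-proc ARG_ONE=value   # submit the job for execution
```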
Now, the usual flow would be: someone creates a ticket, it lands in a Slack channel through an integration with the ticketing system, someone sees it, gets assigned, and sits on it. Depending on the familiarity of the infrastructure person, all the disk-attachment work takes roughly 30 to 45 minutes, best case maybe 10 minutes, nothing better than that. But what if I remove the infrastructure person from the equation? What if there is no infrastructure person involved? What if you can do it yourself? You can see, by way of this tool, the arguments I'm going to pass to this particular job, which is increase instance disk size. The mandatory arguments you see here are the VM name, which zone it's in, the size you're going to attach, which environment it is, and the disk type. It's as simple as running that one command in your CLI to increase the disk on your VM.

One problem we wanted to solve along with this was having a central place for automation for every developer, because again, it was strewn around everywhere: Chef scripts, Ansible scripts, random Perl scripts, random Python scripts, people doing things manually. So the metadata store you see in this slide stores the metadata of each and every job: descriptions, bare-minimum defaults. The secret store holds each job's secrets, for instance the credentials an increase-instance-disk job needs. And the image registry holds the image used to run that particular job. A job is nothing but a container image which is passed certain arguments and runs a certain script, similar to how you run an application inside a container: you pass it arguments, it picks them up and executes. Makes sense so far? Awesome.

We wanted developers themselves to add scripts to the automation orchestrator in this central manner, and people started adding things all by themselves. We just wrote documentation on how to run something inside Proctor, how to add a job, and what the bare-minimum requirements are, and people added all of these: create DNS, create Elasticsearch cluster, create HAProxy, create Java app, create RabbitMQ cluster, every one of them. We were contributing barely 20% of the scripts; most came from developers. We opened it up and it blew up.

A quick demo of what it's like to run something like this. (Is this visible at the back?) While this runs: it's executing a create RabbitMQ cluster job, and you give it the bare-minimum mandatory arguments: the name of the cluster, which team it belongs to, which environment I'm deploying to, and the cluster size.
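Put together as command lines, the two examples above would look roughly like this. The proc names and argument names are illustrative, reconstructed from the descriptions above rather than taken from the actual job definitions:

```sh
# Increase a VM's disk: the mandatory args per the slide above
proctor execute increase-instance-disk-size \
    INSTANCE_NAME=payments-db-1 \
    ZONE=asia-southeast1-a \
    DISK_SIZE_GB=500 \
    ENVIRONMENT=production \
    DISK_TYPE=pd-ssd

# The RabbitMQ demo, same idea
proctor execute create-rabbitmq-cluster \
    CLUSTER_NAME=orders-rmq \
    TEAM=orders \
    ENVIRONMENT=integration \
    CLUSTER_SIZE=3
```

And behind each of those commands sits the metadata described above: a job is just a record pointing at a container image, with its secrets in the secret store. A minimal sketch of what registering one might look like (field names are assumptions based on the description, not the exact Proctor schema):

```json
{
  "name": "increase-instance-disk-size",
  "description": "Attach or resize a disk for a VM",
  "image_name": "registry.yourcompany.io/procs/increase-disk:latest",
  "args": ["INSTANCE_NAME", "ZONE", "DISK_SIZE_GB", "ENVIRONMENT", "DISK_TYPE"]
}
```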
Now, going back to what those arguments actually mean: we wanted to track which team was running what, because opening the tool up to developers meant people were provisioning everything through it, and infrastructure costs were increasing. We wanted to see how much each team was spending and tell them, hey, this is your cost, optimize it. The environment argument means which environment this job is going to run in and where it's going to create this particular cluster.

And what's actually happening in the back? You can see it's abstracted to the point where you're only giving this daemon a few words to create something. Think about doing this yourself, down to the bare minimum: you'd go to the cloud provider, create a VM, go to the RabbitMQ documentation, install things, put some configuration in place, and maybe it's running in one hour, maybe two, depending on your familiarity with RabbitMQ. What this job does instead is take a Chef cookbook stored in a specific place in Artifactory, pull it, and run it with the respective arguments, creating the RabbitMQ cluster with the best practices we've baked into the cookbook, and hand a cluster back to you in probably nine or ten minutes, which basically depends on how quickly the script runs. So as this executes, let me move forward a few slides.

So, profit? Did we solve our first problem, not spending our days on tickets? Yes, we did. Tickets were decreasing at an exponential rate. Where before we couldn't get through 30 or 40 tickets a day, now we saw barely one or two tickets left unsolved at the end of the day. A huge win for us. But what's the next step? We've enabled developers to create infrastructure through a central automation orchestrator, but what if they want to do something specialized? What if they want to use something like Terraform, defining every bit of their VPC?

If you look back at it, the first way to solve that is a central infrastructure team, and they again become the bottleneck once you have 40 or 50 product groups, each coming to you with tickets. Back to square one; we don't want to go there. How do you solve it, then? The suggested way, the pattern I've seen people use, is to run Terraform inside CI. We tried replicating exactly that; let's see how it went for us.

But before that, say a product group comes to you and is about to be onboarded to your platform (I'd like to see us as a platform team, so bear with me). How do you even create the GCloud project? Again, dogfooding ourselves: there was a Proctor script that could create a GCloud project from scratch. Imagine the number of things and the number of security best practices you have to take into account when creating a GCloud project.
IAM rules, connectivity with other GCP projects like VPC peering, subnetting, and whatnot. It used to take us three, five, maybe seven hours to create a GCP project, because it involved a lot of custom hand-holding and tweaking things for particular teams. We stopped doing that, because at that rate we would only be limiting ourselves; every team would say, hey, I want this, please do it for me. And that's a very natural outcome, because you are the one responsible for the infrastructure, not them. So now you could create a Google Cloud project with whatever you'd like to name it, the folder ID, and the bucket where your Terraform state would be stored.

For people who haven't worked with Terraform, quick raise of hands, I'll give a brief overview. Terraform is a tool that handles requests to your cloud provider for you: instead of doing things manually, you delegate them to Terraform. You have config files, you run a terraform plan and apply, and at the end you get to see a diff between your initial state and your final state. I hope that's enough for now; we can discuss later, you can catch me afterwards.

What this command creates is a Git repository inside our source-code hosting with a certain directory structure. You can see Cloud DNS here, and a few files like backend. I'll come back to what everything means; for now let me just run you through what the directory structure for this GCP project means. There's a directory called project. We have multiple VPCs, the core one being where you have connectivity to our main infrastructure GCP project, for access to things like Knife or Artifactory where you'd want escalated privileges or access. Then IAM, which we'll talk about in a bit. And a slightly more sophisticated structure where people go ahead and add what they want inside the GCP project: VPN gateways, VPN tunnels. (We give developers access using client VPNs, which we manage all by ourselves.)

Now, using Terraform inside CI is empowering. The reason I say that: anyone who has used Terraform from their terminal, as the infrastructure person or as someone given access to it inside their company, will know that Terraform didn't use to come with remote state locking. Say that for this Cloud DNS part, you've specified S3 or GCS as the backend to store the state; by state, I mean the state of your current Cloud DNS resources inside this GCP project. If you, me, and Anubhav all try a terraform plan or apply at the same time, we might get conflicts, and that would be destructive on a critical part of your infrastructure. A very early way to solve this problem is to shout at people: tag someone on the Slack channel, hey, I'm doing a terraform apply, don't do anything on the state.
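To make that shared-state problem concrete, here is a minimal sketch of the kind of backend block each directory carries (bucket name and prefix are hypothetical). This one file is what lets several people operate on the same resources, and, without locking, what lets them trample each other:

```hcl
# backend.tf: remote state for the cloud-dns directory of this project.
# Everyone running plan/apply here reads and writes this same state file,
# so simultaneous applies without locking can clobber one another.
terraform {
  backend "gcs" {
    bucket = "my-company-terraform-states"   # hypothetical bucket
    prefix = "my-product-project/cloud-dns"  # one state per directory
  }
}
```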
Now, you can obviously see that shouting on Slack doesn't scale past five or six people in a team. You can't keep tagging people saying, hey, don't run a terraform apply right now or I'll kill you. No, okay, of course we won't kill anybody, but you can see it limits your team and your agility. So how do we solve that? Having Terraform inside GitLab CI does exactly what we wanted.

A sample CI file for a GCP project looks something like this. You have an image, a Terraform image containing the bare necessities: the Terraform binary, with auto-approval for planning and applying pre-baked in. Then you have stages called plan and apply. What is the plan stage doing? It goes into this particular GCP project's directory, down to instances and then the integration environment directory, and executes a script called tf-plan. And where does it store the artifacts? By artifacts, I mean the plan's output: what am I going to see after a plan runs? The next stage, obviously, is apply, and that is a manual step, as you can see here. We don't want people destroying infrastructure just like that; it should not be auto-approved. What happens if your whole GCP project goes bust? Imagine everybody coming at you saying, transport is not working, GoRide is down. You don't want that.

Coming back to what this is actually doing: you can have plan and apply stages for each and every directory. You have a .gitlab-ci.yml at the root, and with each commit, a plan runs for every one of them. So you get something like this at the end of the day: a developer clones the GCP project repo, makes a change, maybe adding himself as an editor inside the project, and the plan step runs for IAM, stores its output as an artifact, you click through and see what's about to happen, and then you trigger the manual apply step.

Take a step back and look at what's happening here: you've completely decentralized what used to be your job. This is what a lot of us used to get pulled into before this existed: manually adding files to Terraform, doing a plan, doing an apply. Building this immediately saved us a lot of time, because we collectively feel we should not be the ones doing manual steps; that should not be the one thing an infrastructure team does. You can't solve yourself out of every problem, but you can keep solving problems at a faster pace if you're not doing such things all the time.

Now, how are we doing this underneath? If you look inside this particular folder structure, how are people actually creating IAM rules? If you open the main.tf, they're referring to Terraform registry modules.
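The sample pipeline described above, sketched out. The image name, script names, and paths here are illustrative stand-ins, not our exact setup; the shape, a plan stage whose output is saved as an artifact and an apply stage gated behind a manual click, is the point:

```yaml
# .gitlab-ci.yml at the repository root (illustrative sketch)
image: registry.yourcompany.io/infra/terraform-runner:latest  # pre-baked Terraform binary

stages:
  - plan
  - apply

plan-integration-instances:
  stage: plan
  script:
    - cd instances/integration
    - ./tf-plan            # wraps `terraform init` + `terraform plan -out=plan.out`
  artifacts:
    paths:
      - instances/integration/plan.out   # reviewers inspect the plan before applying

apply-integration-instances:
  stage: apply
  when: manual             # a human must click, so nothing destructive is auto-approved
  script:
    - cd instances/integration
    - ./tf-apply           # wraps `terraform apply plan.out`
```

And a main.tf that leans on a registry module might look roughly like this (the module source shown is a public Google IAM module, but the project ID and inputs are hypothetical; the point is that developers compose vetted modules instead of hand-writing raw resources):

```hcl
# iam/main.tf: grant a developer editor access via a shared registry module
module "project_iam" {
  source  = "terraform-google-modules/iam/google//modules/projects_iam"
  version = "~> 7.0"

  projects = ["my-product-project"]   # hypothetical project ID
  bindings = {
    "roles/editor" = ["user:dev@yourcompany.io"]
  }
}
```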
Now, we realized that developers wouldn't necessarily know how these resources should be created, so we went ahead and wrote a lot of Terraform modules that can be used for creating DNS entries, VPCs, gateways, and so on. And it kept growing from there: the same power we gave developers to use these modules enabled them to add their own, so things like the Cloud DNS module were added by them, not us.

Now, what was the outcome? The problem we saw, even a while after having Proctor in our infrastructure, was that teams were still asking us to create certain things: create me a Cloud DNS entry, create me a tunnel from this project to that project. With this, we effectively moved them into a mindset of: you are responsible for your infrastructure. We can consult with you and sit down with you for maybe a couple of hours, but this is something you should be handling. And how did we sell it? We told them the average time for you to get to production is X days instead of three or four weeks. That's how we used to sell it; imagine a product manager coming to me or our team and asking how much time it takes for my product to go live.

Now, what are the OSS alternatives to this? You can obviously see this was a lot of custom patching and monkey-patching on top of things, but we had a lot of fun. The one thing we passed on was Atlantis. Why not Atlantis? At this moment it's very tightly tied to the GitHub ecosystem, and we were completely on GitLab, for our CI/CD as well as our source-code hosting needs; it went as far as that.

What is the ideal state of an infrastructure team? I'd like to quote Carla Geisser from the Google SRE book: 'If a human operator needs to touch your system during normal operations, you have a bug.' The definition of normal changes as your system grows. Growing from 20 VMs to 30 VMs to 1000 VMs to maybe a couple of thousand VMs, you can't keep doing the same thing; you can't keep making manual changes. You have to look for alternative solutions.

Now, what are the known caveats of the solutions we gave our product teams? Deletion of infra: people kept creating things and forgot about what they were creating, and we had to step in manually and say, hey, your costs have risen by X this month, please control your infrastructure costs. Teams forget what they're using; they don't know which services belong to whom, with ownership of services constantly changing from one team to another.

What are the lessons learned here? We did solve quite a few problems: we gave developers a centralized orchestration platform for automation, and we gave them the means to manage their own infrastructure in the form of CI. The biggest thing we noticed was people prematurely trying to automate something or the other, which I feel is the biggest problem. The way I've seen good automation happen is: you do something manually two or three times, and only then do you think about making it a generic solution for your user group, the user group being product teams.
High service requests. This is something we personally feel is a smell, like a code smell, for an infrastructure team. How? Because a high rate of service requests from product teams tells you that your documentation is maybe not up to the mark, or that your tooling isn't scaling as your org grows. That's why people come to you as tickets, sit down right next to you at your desk, or bug you on Slack. So we watch the metrics: how many requests are we getting from people? And if it's more than some threshold, we sit down, set everything else aside, and look only at how we can unblock people without our intervention.

Next, a no-brainer: no big bang changes, please. Don't think you can switch in one day from, say, a managed Postgres solution to something running inside your own VMs. You can't do it in one day. If it's a high-throughput service, please refrain from big bang changes. We reiterate this as much as possible to our product teams, because they are now the ones managing the infrastructure, though most of the time they're prudent enough to come and consult with us: hey, we're trying to do this, can you help us out?

We feel documentation should go hand in hand with the tooling, period. The reason is that it affects productivity directly: as your tools grow, if you're lacking in documentation, most of what you put out as features is simply not visible to developers. How would they know you added something? How would they know you actually enabled them to manage their own infrastructure in GitLab? They wouldn't, if you don't document it. So we feel it's paramount for the infrastructure folks on our team to document, make product releases, and send out emails, whichever way your org is comfortable sharing this kind of information.

Reduce the steps for onboarding to your tooling. This is paramount to increasing developer productivity, and it reduces pain for you too: the less time people take to onboard to your tooling, the less time they spend in a confused state, in a limbo of what to do next. And obviously, if one flow takes ten clicks and another takes one, which is better? One.

A few quick final learnings. Invisible infrastructure: product teams should think of infrastructure as boring, not as something out of the blue. It should not be magic to them; or rather, magic only to a certain extent. They should be able to see what's going wrong and page the right people. Next, something not everyone has the liberty of having: product managers inside your infrastructure team are really important, because they think about how a product should evolve, and your infrastructure team is a product whose customers are the product teams. And of course, prioritizing innovation.

A few links and references for you to ponder on what I talked about. This is where you can find me on Twitter, and this is where I sometimes blog.
If you'd like to read a bit of what I write, travel stories and tech, please feel free to ping me on Twitter.