Hello everyone and welcome to the Hitchhiker's Guide to Evaluating Dependency Updates to Kubernetes. Wow, that was quite a mouthful. I hope you've all been having an amazing conference so far. This session is about the dependencies of Go projects: what are they, why is it important to keep track of them, and how we use a tool called depstat to do that in the upstream Kubernetes project. But let me introduce myself before we begin. I'm Arsh Sharma and I work as a Member of Technical Staff at VMware. I'm also part of the current Kubernetes 1.23 release team. Other than that, I help out the project in other areas like SIG Architecture, docs, testing, and onboarding new contributors. With that out of the way, let's get started. Before we begin, I want to give a brief overview of what I'll be covering in this talk, so you have an idea of what to expect. I want to keep this session very beginner friendly, so without assuming any prior knowledge, we'll start with a brief introduction to what dependencies even are, how Go handles them, and why you should care about your project's dependencies in the first place. Then I'll introduce you to depstat, the command line tool we use in the upstream project to analyze dependencies. I'll first show you the subcommands it offers, and at the end we'll look at how exactly we use it for the Kubernetes repository. Fair warning though: this talk is GIF-heavy and has a couple of Hitchhiker's Guide to the Galaxy references. So first things first, what exactly are dependencies? Put simply, dependencies are external packages which your code uses. These external packages are distributed as modules. As per Go's definition, a module is nothing but a directory containing a collection of nested and related Go packages, with a go.mod file at its root. If you aren't familiar with what a go.mod file is, don't worry, I'll be covering that in the next few slides.
For example, here you can see that to handle HTTP requests, we are using the very common module github.com/julienschmidt/httprouter, which then ends up being a dependency of our project. If you look closely, there are other packages we are importing too, but those are part of Go's standard library, so we don't consider them dependencies of our project. By default, if you create a main.go file and start writing code in it, you won't get the support of Go's dependency management tools. You need to put your code in its own module in order to track and manage the dependencies you add. This can be done by running the go mod init command, as shown here. A fun fact is that since your code is now present in its own module, others can import it if they have some use for it in their code, and then their project will have a dependency on your module. Once you put your code in its own module, you'll see that a go.mod file appears in your project directory. A go.mod file describes the module's properties, including its dependencies on other modules and on versions of Go. When you add dependencies, the Go tools also create a go.sum file that contains the checksums of all the modules you depend on. Go uses this to verify the integrity of the downloaded module files. Please note that the go.sum file is auto-generated based on your go.mod file, and you should never need to edit it manually. To keep your managed dependency set tidy, you can use the go mod tidy command. Using the set of packages imported in your code, this command edits your go.mod file to add modules that are necessary but missing. It also removes unused modules that don't provide any relevant packages. For example, let's say your code stopped using this module, example.com. Then if you run go mod tidy, it would remove that from the go.mod file. Lastly, go mod tidy will also regenerate the go.sum file based on the updated go.mod file.
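To make this concrete, here is what a minimal go.mod for a project like the one on the slide might look like. The module path example.com/hello and the Go version are illustrative, not taken from the slides; v1.3.0 is just a plausible pinned version of httprouter:

```
module example.com/hello

go 1.17

require github.com/julienschmidt/httprouter v1.3.0
```

If you later removed the last import of httprouter from your code, running go mod tidy would drop that require line and prune the matching checksum entries from go.sum.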
So now that you know what dependencies are and how Go handles them, let's talk a bit about why you should even care about managing and keeping track of your project's dependencies. Well, the thing is that sooner or later, you will have to update the dependencies of your project. This might be because you want the changes in the latest release of that dependency. But even if you're satisfied with the current features, you might have to update a dependency because a security vulnerability was found in the older release which got fixed in the newer one. And updating dependencies brings with it a whole new set of headaches. You'll have to make sure the update doesn't break your current code and that it is compatible with the existing versions of the other dependencies you're using in your project. So when it comes to dependencies, I think it is safe to say: the fewer, the better. Now, please don't mistake this as me suggesting that you should try implementing the functionality of each external package on your own. No, no. The reason I say that is because fewer dependencies mean you'll have to keep track of fewer releases for your project's dependencies and will have a much easier time updating them. All of this, to be honest, may seem a little trivial for a small project, and maybe you could get away with not caring about your dependencies at all. But when a project grows to the size of Kubernetes, all of this becomes very important. Updating dependencies could often mean breaking stuff. Skipping a crucial dependency update could mean exposing a lot of users to a security risk. So long story short, the simpler the dependency chains, the better. Being particular about your project's dependencies right from the start and tracking them is extremely helpful, and I cannot stress this enough, extremely helpful in the long run.
If you're not particular about them from the start, you might find yourself in a situation similar to that of Marvin the Paranoid Android. So before we went out looking for a solution to our problems, it was important for us to identify what exactly we even wanted, right? We knew we needed something to analyze dependencies, but what should this thing do? The biggest problem we wanted to solve was that the Kubernetes repository was receiving so many pull requests that it was getting tough to notice which of them were changing dependencies. Not only that, but more importantly, how were these PRs changing the dependencies, and what was the impact of those changes? We also wanted a way for PR authors to see the impact of the dependency changes they are making themselves, without one of the maintainers having to go and ping them. It was to solve all these problems that we came up with depstat. depstat is a command line tool for analyzing the dependencies of Go modules enabled projects. You can install it by running go install github.com/kubernetes-sigs/depstat@latest, or by grabbing the latest binaries from the repository. It provides four subcommands, each of which we'll now look at in detail in the following slides. The first and most important thing depstat provides us, from the point of view of upstream Kubernetes, is the stats subcommand. Running depstat stats in your project directory gives an output which looks something like this. It shows you the number of direct dependencies, transitive dependencies, total dependencies, and the max depth of dependencies. Let us go over what each of these means. Direct dependencies, as the name suggests, are dependencies which are used by our project directly. Now what does using directly mean? If we go back to the previous code, we see here that we import julienschmidt/httprouter and then, in the first line under the main function, we use it.
Now if this module internally uses some other module to do what it does, we don't care about that, even though it is technically essential for our code to work properly. So in this case, julienschmidt/httprouter would be a direct dependency of our project, and whatever external module (if any) julienschmidt/httprouter is using internally to do what it does would end up being a transitive dependency of our project. To sum it all up, transitive dependencies are nothing but the dependencies which are further needed by the direct dependencies of our module. Another way to put that would be that they are basically the direct dependencies of the direct dependencies of our module. The next thing in the output is the total number of dependencies of our project, which is pretty self-explanatory apart from one slight caveat. Even though in this example you see that the sum of the direct and transitive dependencies is equal to the total dependencies, it doesn't necessarily have to be so. Why is that? Simply because a dependency can be both a direct as well as a transitive dependency. This is best explained with the following example. Let us say our module depends on a module called woof, which internally uses another module called meow. But our module also directly uses meow. So in this case, meow is both a direct and a transitive dependency. In this example, the number of direct dependencies is two: woof and meow. The number of transitive dependencies is just one: meow. But the total number of dependencies is not two plus one; rather, it is two: woof and meow. This is the reason it is not always necessary for the total dependencies to equal the sum of the direct and transitive dependencies. The final thing in the output is the max depth of dependencies, which is nothing but the length of the longest dependency chain.
Going back to the previous slide, we see that there are two dependency chains here. One goes from our module to woof to meow, which has a length of three. The other goes from our module to meow, which has a length of two. So in this case, the max depth of dependencies would be three. You can also run the stats subcommand with verbose mode on to see the actual chain whose length is reported as the max depth of dependencies. The next subcommand that depstat offers is the graph subcommand. This generates a graph.dot file which can be used by Graphviz's dot command to visualize the dependencies of a project. Graphviz, for those of you who are not familiar, is another command line tool which provides a way of representing structural information as diagrams of graphs. So if you look here, you can run depstat graph, and it will inform you that it has created a graph.dot file. It is this file that Graphviz will use to generate the actual graph. Running the Graphviz command, you will see a DAG.svg file appear in your directory. If you open this file, you should see something which looks like this: highlighted in yellow is your main project module, from which you can see the direct dependencies coming out, and from those you will see the transitive dependencies originating further. Now, this graph was produced from a simple Go modules project I created for this talk. But I also want to show you the beauty which gets generated when we create a graph out of all the dependencies of the Kubernetes repository. Beautiful, isn't it? Well, this should give you an idea of the complex dependency chains in a project as large as Kubernetes, and should probably also tell you why we needed a way to analyze these dependencies. They were very clearly getting out of hand. The graph subcommand also comes with a useful flag which lets you specify a particular dependency whose chains you want to be highlighted.
Let's say you only wanted to see the chains which have github.com/kr/text in them. Then you can run the command shown, and you would see an output similar to what you see right now. The third subcommand depstat provides is the cycles subcommand. What this does is show all the cycles present in the dependencies of the project. So here, for the simple project I've been using till now, you'll see that the cycles in dependencies are due to golang.org/x/net depending on golang.org/x/crypto, which further depends on golang.org/x/net. For Kubernetes, there are many more cycles, which are also much more complicated and longer. And this isn't even the complete output of the command. The final subcommand depstat provides is a very simple one: depstat list. All this does is print a sorted list of all the project dependencies. So now that you know what depstat does, let me go over how we use it in the upstream Kubernetes project. depstat runs as part of two Prow jobs for the kubernetes/kubernetes repository. What is Prow, you might ask? Prow is a Kubernetes-based CI/CD system. Prow jobs can be triggered by various types of events and report their statuses to different services. To put it in very simple terms, and in the context of this talk, Prow is basically responsible for running certain tests on PRs that are made. It can also run these tests on the master branch of a repository. For depstat, we have two Prow jobs. One is a periodic job which runs once every six hours on the master branch of k/k; k/k is just short for kubernetes/kubernetes. This job produces an output which looks similar to this once every six hours. We also have a presubmit Prow job which runs automatically on pull requests which change the go.mod file, the go.sum file, or any other files in the vendor directory, which is where the stuff related to dependencies is present for the k/k repository.
This job can also be manually triggered on PRs by commenting /test check-dependency-stats, all thanks to Prow. What this job does is run depstat on the code present in the pull request and then print its difference with the output of running depstat on the master branch of the Kubernetes repository. This way we get to know what will change in terms of dependencies for the project if we merge that particular PR. So if your PR changes dependencies, Prow will catch that and run the check-dependency-stats job, which gives an output similar to the one you see on the right. Here, for this PR, the number of direct dependencies was being changed by one, which is what depstat reported. Another way the periodic job helps us is that it allows us to keep track of how the project dependencies are changing over time. Not only that, it is also helpful for seeing how long depstat took to produce the output, which gives us a way to measure how complicated the chains are getting: the longer it takes, the more complicated the chains have gotten. So here is another example of a PR, where a total of 16 dependencies were removed, which is a huge number and goes a long way in improving the dependency chains of the project. So this was it from my side, and thank you so, so much for attending. If you have any questions about the project, how you can use or try it, or if you want to contribute to it, please feel free to reach out to me on Twitter or drop me a mail. Once again, thank you so much for attending, and I hope you learned something new from this session.