Hello, thank you everyone for coming today. Today we're going to talk about keeping Helm reliable and usable. And when we were putting this together, we realized there's one more thing we wanted to throw in there, and that's stable. So welcome to Keeping Helm Reliable, Stable, and Usable. Hi, I'm Matt Farina. I'm one of the maintainers. And let me pass it off to these others to introduce themselves. And hi, I'm George. I work at Bloomberg. I'm one of the maintainers as well. Hi, I'm Karena Angell, and I work at Red Hat. And I'm Ian Zink, and I'm a software engineer, solutions architect, developer advocate. To start, I'm going to quickly define what we mean by these terms. I like definitions. They help us be on the same page, and they can highlight characteristics that we might not immediately think of. So here are two definitions of reliable that we think are useful. Consider these definitions as they relate to Helm, both the CLI, which runs for seconds, and the SDK, which is used in long-running processes like a controller. If you think about reliability, you need to think about the system and the software around Helm. For example, if Helm raises an error because it can't download a chart or communicate with a Kubernetes cluster, is the problem due to Helm or something else, like a network problem? For Helm to be reliable, the errors need to originate outside of Helm, even if Helm is the one reporting them. We want to focus on this reliability aspect: Helm should not cause errors within its own control. There are perhaps two themes regarding stability. First, there is the purpose of Helm as a project.
Helm is no longer a startup project. Knowing its purpose helps both you and us know how to use and develop Helm. And users want to know that Helm can be depended on for the long term. Second, there is the software itself. Of course, Helm needs to change over time. There are always features to add. But it is even more important for Helm to be stable. We'll talk more about why and how we do this in the following slides.

We also need to talk about quality. Quality goes along with reliability and stability. In fact, software reliability for Helm has more to do with quality than with traditional reliability engineering, for example. And so this is the project management triangle, the iron triangle: you can't control all three of scope, time, and cost at once. If you want features, you cannot also control for cost. It's the same in the case of Helm. There are a limited number of people invested in the project, so to add features or make changes takes time. If we were to try to move too fast with Helm, it would lead to reduced quality, making Helm less stable.

Usable. Jakob Nielsen holds a PhD in human-computer interaction. He has been called the king of usability. I think we all know what useful means: something which is useful has a benefit and is practical to use. Utility is in providing features you need or want. This is touching on Helm's purpose and the types of features delivered to fulfill that purpose. With this in mind, let's look at usability. And Jakob Nielsen has a more formal definition of it, if you want one. Hopefully, with me quickly running through these things, I've set the context for the remainder of the presentation.

All right, so we want to ask, why is Helm concerned with some of these things? To dig into this, I started to pull some of the metrics and numbers here. If we looked just at the last month of Helm client binary downloads, we find that for just the latest version of Helm, we had approximately 2.2 million downloads, right?
And total downloads across all versions were over 11 million, and that was over 151 terabytes of data transferred out just to download the Helm client. That's not small. But if we dig in a little bit deeper here, what we can see is Helm 3.9, which came out a couple of years ago, had over 440,000 downloads, right? And Helm 3.2.1, which came out years ago, had over 470,000 downloads. So you see these old versions are still being downloaded and used. So when we think about the interfaces and things around Helm and how you interact with it, it actually matters how people consume it, because they're building their CI pipelines around this stuff. They're building their experience around it, even with integration with things like our CDN and how we provide that out. And even Helm 2.17, which came out years ago, and it's been years since that was deprecated, we still have a lot of people using it. And that might be surprising. I would recommend people not use some of these old versions because of the CVEs in them, right? Things have been discovered in the dependency trees, things like that, and they're still used. So the CLI is downloaded and used by a lot of people in a lot of places.

And then there's things like the SDK. Helm 3 introduced the Go SDK, where you can build your own applications on top of it and interact with charts and registries and release management. And we've got a lot of big projects that use it, you know, OpenShift, Rancher, Flux, there's all these different things out there that are using the SDK, building integrations on top of it in order to do things. And so there's just so many people out there using it. If you break something, that means we get support requests, right? And we're volunteers working on this project. If we break an experience or something, that breaks their tooling's workflows, and that's not a good thing either. So we realize that we're kind of a foundational building piece, and so we've got to be careful in that.
And then there's things like the tens of thousands of scripts on GitHub that are doing things like installs and upgrades and running template. This isn't fetching Helm. This is actually working with it. So if you break things like command line switches, this impacts these scripts. And this is just what's publicly available on GitHub. There's, you know, thousands of use cases out in the world of companies' private scripts, and you start breaking companies' workflows and things like that. So people depend on Helm. And then there's, of course, the public charts, right? We can't know how many charts there are out in the world, but Artifact Hub, another CNCF project, lets you discover distributed charts all over the world that people wanna have listed. And we've got over 11,000 of them out there, right? These are all these people building charts, sharing them with people, people consuming them, and this doesn't even account for all the private ones people use inside of their organizations, often built on their own time. And so we have to be careful not to break their experience.

What that boils down to is, really, Helm is widely used and expected to be mature software. And so we have to treat it that way. If we go breaking things, that's a real burden on us to deal with the support influx. If we break things, that means we make people unhappy, because there's so many people out there using it. And we're gonna dive in a little bit more into who they are, so we have a better idea of what this means before we get into how we deal with it.

So let's talk a little bit about how Helm uses scope to keep Helm reliable and usable. When we talk about scope, one of the very first useful things we can do is ask, well, what is Helm, to start to define that scope? And luckily the Helm homepage helps us there, because Helm is the package manager for Kubernetes. And that leads to the next question, which is, well, what's a package manager?
And it's a system or collection of software tools for automating the process of installing, upgrading, configuring, and removing computer programs. And I think, you know, throughout computing history there have been many of these that we've interacted with, with Kubernetes and Helm together being one of the latest iterations of that. It's also useful to say what isn't a package manager. It is not a configuration manager. And if you try to use it in that way, you might have a bad time. So it's not like Ansible, where it's gonna help you iterate across versions. And it's not a GitOps tool like Flux that can manage your configuration and keep it up to date from point to point. This model really helps drive that point in. You can see right where Helm's at when comparing to Linux, right? Like you have apt, and Helm kind of fits in there. You can see how the binaries would map to images, and config to manifests. Bringing that back to scope, you have to think about, when you're contributing to the Helm project and writing issues and requesting things, what should a package manager accept? So whenever you're working with the Helm project, try to keep in mind that that's our scope and what we're attempting to excel at, and which of those features should instead live in a higher-level abstraction or something that's orchestrating those interactions.

The other really important thing when you're considering scope is, what is the priority order of each user profile? For Helm, we are prioritizing the application operator. So if you have a PR or an issue, and it's about an operator issue, or if it's going to negatively affect an application operator, we're going to look at that either positively or negatively depending on how it affects that particular persona. So if you can help with each one of those items, those are gonna be things that'll really help the Helm project.
Okay, let's talk about versioning. Changes in Helm follow semantic versioning. Breaking changes go in major versions, and new features go into minor versions. Bug fixes go into the patch releases, and if you wanna add a new feature, you need to wait for the next minor. Releases are monthly. Currently, we have three minor releases a year, and those follow the Kubernetes releases; they at least shadow them. Every month that is not a minor release, there's a patch release with bug fixes. So you can keep an eye on those if you're looking for one of your features to get in or one of your bugs to get fixed. Typically, releases are on the second Wednesday of the month, unless it's not. If we're talking about January, it's on the third Wednesday. Sometimes there are exceptions if something happened with the build.

Now you may wonder what's in an API, because we are calling out that we're not changing or breaking APIs. For Helm, this includes the Go SDK as well as the CLI input and output. The SDK is used by applications; the CLI is typically used by scripts. This does mean, again, no new features go in a patch release. You can't add to the Go API or add features, not even new flags. We have a policy on what changes with each version level, and that is recorded in HIP 4. A HIP is a Helm Improvement Proposal, and that is in the community repo. So Helm Improvement Proposal 4 provides guidance on how to handle everything from command line flags and output to handling changes to the Go API. Again, it is documented in the repo. And we'll talk more about HIPs later. Again, Helm is written in Go, and we're providing the Go SDK. It follows the direction from the Go project on module compatibility. So if you've ever worked with the SDK and wondered why we don't add to the existing Go interfaces but instead create new ones, it's because, again, of the Go community direction. We are asked that a lot. So we follow that.
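To make that last point concrete, here is a minimal sketch of the pattern the Go compatibility guidance leads to. The interface names (`Getter`, `HeaderGetter`) and the `fetch` helper are hypothetical stand-ins, not Helm's actual SDK types: adding a method to an exported interface would break every third-party implementation, so new capability goes in a separate interface that callers probe for with a type assertion.

```go
package main

import "fmt"

// Getter is a stand-in for an exported SDK interface.
// Adding a method here would break every external type that implements it.
type Getter interface {
	Get(url string) (string, error)
}

// HeaderGetter is the new, optional capability. Existing Getter
// implementations keep compiling; newer ones can opt in.
type HeaderGetter interface {
	GetWithHeaders(url string, headers map[string]string) (string, error)
}

// simpleGetter is an "old" implementation that only knows Getter.
type simpleGetter struct{}

func (simpleGetter) Get(url string) (string, error) { return "ok:" + url, nil }

// fetch probes for the newer interface and falls back to the old one,
// so old and new implementations both keep working.
func fetch(g Getter, url string) (string, error) {
	if hg, ok := g.(HeaderGetter); ok {
		return hg.GetWithHeaders(url, map[string]string{"Accept": "*/*"})
	}
	return g.Get(url)
}

func main() {
	out, _ := fetch(simpleGetter{}, "https://example.com/chart.tgz")
	fmt.Println(out)
}
```

The trade-off is more interfaces over time, but callers compiled against the old interface never break, which is exactly the stability promise being described here.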
Helm policies are mostly captured alongside other proposals in the community repo. If you can tell, I keep pointing you back to the repo. We get a lot of questions on these HIPs and where to find more information. Now, the governance. In order to graduate in the CNCF, you have to have formal governance in place. As a long-time graduated project, and it has been multiple years now, Helm has governance about how certain things are handled in the project. The governance says how we will handle certain things, and that will be covered a little more later.

All right, another question we are asked frequently: Helm 4. Who's interested in Helm 4? Maybe? Okay. So if we're gonna talk about versions, we may as well talk about major versions. So, Helm 4. It has been four years since Helm 3 was released, and in fact, the four-year anniversary is just a little over a week away. Since we have been working on the minor versions and adding features in those four years, we've accumulated a lot of debt. You can imagine, after all this time, we really can't undo anything without making a breaking change, and we're talking about being reliable and stable, right? And Kubernetes has seen a lot of new features that we would love to take advantage of. Again, these are gonna be breaking changes. So let's talk a little bit more about this. There are two areas that we've started to discuss. First, we're figuring out how we will work on Helm 4 while still maintaining Helm 3. We talked about the monthly releases, and we need to keep Helm current with Kubernetes. At the same time, we need to figure out how to make these changes. While we don't have all the details yet, hopefully next KubeCon we can dive into that more. What we are looking at doing is what we did during Helm 3's development, when we still had to maintain Helm 2.
And during that process, we maintained Helm 2 with full bug and maintenance support for six months, followed by six months of security fixes. So we would really love feedback. The time to talk with us is now, and this is what we're looking into. Second, we have to figure out what we're gonna change in Helm. We aren't gonna change everything, and we're not gonna have a years-long development window, and I know you're thinking, okay, what is gonna change? We don't know yet. So this is the time that we all should be talking about this. We're in the process right now of figuring this out. We have a whole backlog of PRs and issues, and many of them we've discussed with a lot of you, but we can't implement them yet because they would be breaking changes, right? But one thing we do wanna call out: don't worry, we're not breaking existing charts. Quick show of hands, is anybody using v2 charts still? If anybody wants to admit it, okay. So remember what Matt said, you have to consider the vulnerabilities, but we're still not gonna break you. That would break a large part of the ecosystem. For v2 and v3 charts, there will still be compatibility. We don't want to cause anything to break. And if you wanna come help with Helm 4, please come talk to us. This afternoon we'll be at the Project Pavilion, at the Helm booth. There are also weekly developer meetings, those are Thursdays at noon Eastern, and the Helm Slack. There's so many different ways to help, from contributing changes, to giving feedback on ideas, to becoming a maintainer to help drive this work. So please come talk to us.

All right, let's talk about testing. How do we pull off this testing thing in keeping Helm stable, reliable, and all of those things? Because testing is at the core of what we're gonna do. So let's talk about some of the different types of testing. First of all, we've got a lot of tests written in Go, because Go is the language that we use. They used to be run in CircleCI; today they're run in GitHub Actions.
We did the migration a little while ago, and so we moved over. And we've even got automation that kicks things off. In this case, this is a pull request to automatically update dependencies, because we wanna keep dependencies up to date to knock out CVEs in the dependency tree. We wanna shake those things out. But we do have lots of automated tests, so let's dig into some of those tests here.

The first thing we have is unit tests, right? Obviously, we do lots of testing, down to the unit of work. And in this case, the unit test is also testing the Go API. So if we go make some kind of change under the hood and these tests start failing, it'll tell us that, hey, we broke the API somewhere. It's important for us to have test coverage of that, for the interactions that people do, because the Helm CLI that everybody works with is actually built on the SDK. So we can even break the Helm CLI if we break the SDK. So there's lots of testing in there down at that unit level.

But we also do other kinds of testing. In this case, this is an example test we have where we are testing the CLI itself, and we run commands, you know, helm install this, helm template that. And we go see whether the output generated by Helm is the same as what we expect. We call it a golden output. And we run a series of tests. You can see the first test case there at the bottom, but there's a whole series of test cases for different commands, and this is just an install. For each of the commands, we try to have this. And we're not connecting to a real Kubernetes cluster to do this; standing that up in CI and having fresh environments is hard, and it's not very fast. So we mock things. We've got a mock Kubernetes API client that Helm uses. We have got a mock release storage. And so we've built this all up so these tests execute very, very quickly, which is useful for developer feedback, in CI and everywhere else. But we also do have some integration tests.
And this here is from an editor, and it shows, hey, we're actually standing up Distribution in CI and doing OCI tests against it, so we can push and pull charts from the OCI registry that we stand up. We pull in actual Docker Distribution, now a CNCF project, to do that. Now, it's not just CI testing that we do as far as testing goes. One of the things we do run across everything is CodeQL. We will scan the code with this on every pull request, every change that lands in a branch. It's constantly being scanned for those kinds of things. And if you're not familiar with CodeQL, it's provided by GitHub, and it does semantic analysis of all of the software to say, here's where we think we found vulnerabilities in your code, based on just static analysis. And we run it on everything. It's nicely provided by GitHub, and it's a free tool that can be used for your projects too, but we use it as well.

But this kind of analysis isn't the only thing we do, and all the tests don't just live with the Helm project. We also do fuzz testing over our stuff. And you'll see this lives in a different repository, the CNCF fuzzing repository. It's a project that was funded and sponsored by the CNCF to get started, and what it does is fuzz testing. Now, who in here knows what fuzz testing is? Anybody know? Okay, for those of you who don't: you take something like a string input to a function and throw whatever at it and see if the function handles it, right? So you've got a string; with the Helm command you can do --set something, right? What if you did 20,000 periods? How would Helm handle this, right? How does garbage get handled? Does memory explode? Does something happen? Fuzzing looks for cases like this by slowly throwing more and more things at your code to find where it doesn't handle the input.
And this is important for finding security vulnerabilities, because if you've got outside interactors and you're building something on the SDK, maybe they denial-of-service you. Maybe they cause a memory overflow, which is a panic you can't recover from in your code. Fuzzing looks for things like this and helps you find them. And so we have a whole bunch of fuzz testing against Helm that helps us look for these things, and that, at the end of the day, brings stability and improved security to the code base, because it can't be DoSed as easily. This work is funneled through the OSS-Fuzz project, which is sponsored by Google. There are a number of projects that go through it, and we are run through this system regularly. It'll tell us if this system, just running in the background, finds more things in Helm, and then we can go in and fix those situations. Some of the CVEs from last year were fixed because of that.

All right, we also do manual testing. A security audit really is manual testing, and so periodically we do things like that in order to manually test and look for these kinds of things as well. But it doesn't always catch everything, and we actually had an oops in Helm 3.13, which was fixed in 3.13.2, which went out today, when working with registries. We stand up a registry in CI and we test, but it turns out authentication against registries isn't done to a spec, and not everybody does things the same way, and a recent change broke some of that interaction. The only way to test that would be to go to all the major registries and test against them in CI or somewhere else, and we don't have that set up today. But it is fixed. We do catch these things, but sometimes we don't catch them right away, so there's always room to grow and improve the testing. But that is one of the things: we do invest heavily in testing.
So much so that if you're looking for a place to contribute and you don't have an idea of where to get started, a great way to learn the code base and get involved is actually to come contribute tests. We would love that. Our existing tests show the layout and style of what we're expecting, and by writing tests you learn the code and how to use it for your own SDK uses or whatever. So with that, I'm going to hand it off for the next section.

Here we will describe how the Helm project, inclusive of all the maintainers and the community, is managed: how we manage changes, how we govern the project, from selecting new maintainers to getting releases out, and generally how the project aims to meet the reliability and usability goals that we set out earlier. So, community and governance, and where it starts with Helm. Karena briefly mentioned governance; I'll go into more detail here on how governance interacts with the community. Helm is entirely a community-maintained project, and so community engagement is critical to Helm. It requires the community to contribute if additional features or functionality are desired, for example, more frequent releases or faster PR reviews. And so to ensure Helm can meet the goals of the project, just like all CNCF graduated projects, Helm must have a strong governance model, that is, having defined rules that oversee the stewardship of the project. The governance for the Helm project is comprehensive, from major things like the votes needed to elect new maintainers and ensuring maintainers uphold the goals of the project, to more routine things like running a weekly developer call or defining release cadence and criteria.

We've mentioned HIPs already, but just like many projects out there, Helm requires significant changes to be proposed with a Helm Improvement Proposal, or HIP.
I'm sure many of you here are familiar with Python's PEPs or Kubernetes' KEPs, those communities' equivalents. Helm has HIPs for very similar reasons. The rationale is that significant changes need more design and more review. Significant changes need a way for the author to adequately describe the problem, and for the community to adequately comment on the proposed solutions. And the results of these discussions need to be recorded more formally than in, for example, a GitHub issue. It's worth mentioning that HIPs are not limited to feature proposals only. They could be for improvements in process, or simply informational, for example a Helm v4 HIP, for when Helm wants to more formally document the intentions of the project. In essence, HIPs allow anyone from the community to formally propose a direction for the project. And while writing a HIP might seem like overhead compared to a quick GitHub issue and PR, they certainly result in proposals which are much more well thought out and with higher quality results. They ensure changes are aligned with Helm before the expense of implementing, and the HIP document is much more easily discoverable.

As a follow-up to the points that were made earlier around testing, stability, release process, and versioning, I want to re-emphasize these things, and the why is quality. For Helm to continue to be successful, as we said, it must continue to be reliable, and as we defined it, that means ensuring quality remains high over time. Practically, this means changes to Helm must fit within the scope of the Helm project, and contributions must have a high standard of implementation. Recent bugs and issues we have had have had their roots in well-intentioned changes that came from perhaps not-well-thought-out ideas or poor implementation. Fixing these bugs has literally broken other workflows. As such, it is critical for Helm to only accept well-thought-out and well-implemented changes.
And in particular, scope, as Ian talked about, is actually a difficult thing when your user base and usage is so varied and wide, and the Kubernetes ecosystem is growing so fast.

Okay, so let's talk about contributions a little bit, because these are so important to keeping a project like Helm usable and reliable. Over the last year there have been over 900 issues closed, and every issue, I guess not every issue, but most issues, they come down to something we could have done better with documentation, or even the issue itself produces documentation. So if you run into a problem with Helm, please do open an issue so we can look at it, document it, and get it resolved. Over the last 12 months we've had over 150 pull requests closed, with still more than 300 outstanding. So there's still a lot of work to be done to clear that backlog and make sure everyone's heard. If you like stats, you can go to the CNCF DevStats site for Helm, which I've linked, and you can look at all sorts of stats on who's doing what. It's kind of a fun browse anyway. This is one of those charts; I massaged it a little bit to smooth it out. When I was working the Helm booth at KubeCon, one of the themes that we heard was, hey, I don't feel like you're listening to our PRs. I just wanted to demonstrate that we are doing much, much better. We've gone from 28 weeks or so on average to receive feedback all the way down to two and a half weeks right now on your PRs. So if you open a PR, you're gonna get feedback much faster, and we're gonna help guide you through that process. So please do make PRs, join our calls, and talk with us. We also need more people to help review these things. It really helps when someone else chimes in and says, hey, you know, I reviewed this, this looks good, this would help me. As issues appear on the repo, feel free to interact and provide people guidance.
You don't even have to be a maintainer to help out with that stuff. So please do. And we are having a retro, a very special retro. This is something that I've wanted to do for a while in the Helm community. We're all gonna get together. All of you are welcome. We're gonna have an extra-long developer call, and we're gonna have a retro board. We're gonna go over what's going well, what you all are thinking about, and what you're upset about or what's not going that great. We're gonna discuss the most upvoted items, figure out what we can do better, and create action items on what we can do. So I'm really looking forward to that. There is a link to the developer meeting; it will also be the first link if you Google "Helm developer call." Add it to your calendar, and please do join and just discuss the project with us and what we can do. The more people that show up, the better it'll be. Finally, this is a link for feedback on this talk, and I'll open it up to Q&A.

All right, does anybody have any questions? We've got a few minutes left. If you do, walk up to the microphone, please. It can be anything Helm related. Version 4. Okay, well, we will be here. We can continue to answer questions. If you wanna step up to the mic, go ahead. We'll be up here for about three more minutes, then we'll step off, and you can catch us on the side if you'd like. Thank you.