Alrighty, in terms of TOC members, Alexis and Jung sent their regrets due to conflicts. I have Michelle, Liz, and Brendan on the call right now. Let me know if I missed anyone that may be dialed in. Well, might as well get started then. It's our community presentation meeting. So we have CloudEvents and TUF going up first with their incubation and graduation reviews, followed by the KUDO and Keptn projects for sandbox. Might as well get started, since it's a packed agenda today. So, Doug, with CloudEvents first.

All right, cool. Can you hear me okay?

I can, loud and clear.

Okay, cool. For those of you who may not know, or missed the TOC presentation a couple of months back where we did a review of CloudEvents: CloudEvents is a specification. It's not a typical open source project with code. It's a spec about how to modify your current events to add well-defined metadata, to help manage routing and filtering, or very common middleware types of functionality, without requiring that middleware to actually understand the business logic. So, as I said, it defines common metadata across events, and a common location for that metadata to appear, so the middleware can do this basic processing and not have to understand the business logic. We're delivering, obviously, the specification; serialization rules for common transports like HTTP, MQTT, stuff like that; serialization for JSON; as well as a primer, some SDKs, and extensions that didn't meet the criteria for actually being part of the spec itself. We have had some demos at previous KubeCons, with some links there. You guys can look at those if you want. Some of them are self-explanatory, some not. So if you're interested, or if you'd like more information about the demos, just ping me offline and I'll tell you what's really going on. From a status perspective, technically we're at 0.3, but don't let the number fool you; we're actually more at a 0.9.
Actually, we're hoping to approve 0.9 this week, which will technically be a release candidate for 1.0, and then approve that, hopefully before KubeCon, and have some wonderful PR around that. For those of you who have never seen it before, on the right-hand side you can see what a CloudEvent looks like. Basically, take the HTTP message with the stuff in bold, which is the CloudEvents stuff, and that turns any HTTP message into a CloudEvent. So all we did there is add four new attributes, and that turns it into a CloudEvent. That's that common bit of metadata to help people do routing on things like what type of event it is, who sent it, stuff like that. With that quickly behind us, let's jump into why we're here. As of right now, CloudEvents is a sandbox project and we're going for incubator status. So let's go ahead and jump to the next slide. Just a quick reminder for those of you who don't know: we have to meet three criteria. We have to document that it's being used by at least three independent end users, have a healthy number of committers, and demonstrate an ongoing good flow of commits and merged contributions. Okay, so let's go to the next slide. So, the first criteria. Actually, a little bit of a preface before we get to the first criteria: because this is a specification, it's a little bit difficult to come across end users. It's a different situation, basically, than a normal open source project. However, we did want to highlight all the different companies that are actually implementing CloudEvents. So you'll see a very distinguished list of companies here on this list. And you can basically assume that people who are using those particular products, and going through the right code path to use CloudEvents, are obviously using CloudEvents. So for example, let me pick on Knative, because that's the one I'm involved in. Anybody using Knative eventing is going to be using CloudEvents under the covers. It's just built into that system.
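To make the "four new attributes" idea concrete, here is a rough sketch using the `ce-` header names from the CloudEvents HTTP binding's binary content mode. The exact header names and required set vary between spec versions, and the sample values below are invented for illustration:

```python
# The four CloudEvents attributes, carried as HTTP headers in binary
# content mode (header names per the CloudEvents HTTP protocol binding;
# sample values are illustrative).
REQUIRED = ("ce-specversion", "ce-type", "ce-source", "ce-id")

def is_cloudevent(headers):
    """True if the HTTP headers carry the required CloudEvents metadata."""
    lowered = {k.lower() for k in headers}
    return all(attr in lowered for attr in REQUIRED)

message = {
    "Content-Type": "application/json",
    # The "stuff in bold" from the slide -- the CloudEvents metadata:
    "ce-specversion": "0.3",
    "ce-type": "com.example.someevent",
    "ce-source": "/mycontext",
    "ce-id": "A234-1234-1234",
}
print(is_cloudevent(message))                         # True
print(is_cloudevent({"Content-Type": "text/plain"}))  # False: plain HTTP message
```

The point of the sketch is that any HTTP message becomes a CloudEvent just by adding these headers; the body is untouched.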
I don't know if I can make that same statement about all these other ones. They may have a separate code path for CloudEvents, but we do know that it is being used by some people in these products. The challenge you run into is that a lot of people don't necessarily feel comfortable stating in public that, hey, yes, we're using this particular technology, to the degree that we need for the TOC review here. But I did want to mention that we know it is being used for sure, at least in these products. So users of those products are very likely actually using it under the covers. So let's go to the next slide. So here are the three that we did manage to get approval to mention. Roberto from Adobe has two different end users, Wipe Red and Pandora. And then I came across Accenture, who are publicly willing to state that their product, the Reactive Interaction Gateway, is actually using CloudEvents under the covers as well, and you can look at their documentation to see it. For the Adobe stuff, they're passing around GDPR events as CloudEvents under the covers. So they are actually using it through their product. All right, let's go to the next slide. Obviously, stop me if you guys have any questions; otherwise, I'm going to try to go fast to get the other guys in on the call. So, criteria two: number of committers. Again, I have to preface this a little. Because we are a spec project, not a code project, the rate of change in our spec is much, much lower than you would see in a code open source project. And our goal here, when you submit a pull request to change the spec, isn't to try to get one or two maintainers to approve it. It's not about how quickly you can get your code in there. The point here is to get community consensus. Because this isn't about changing code in one open source project that you're hoping will be useful to people, but is still just one code base.
This is about convincing the community that this is worthy enough to be implemented by a lot of people, as many as possible. So consensus and community building is incredibly important here. Meeting that minimum bar of one or two maintainers, like many open source projects have, isn't going to cut it. We really, really need that consensus building here. So take that into account along with the other factors, like the fact that many of the PRs we have aren't authored by a single person, right? Oftentimes these PRs are collaborative in nature, either on our weekly phone calls or offline, and then one person sort of takes the pen and makes the actual changes. Looking at the number of PRs submitted by any one particular person isn't necessarily a good reflection of their participation in the community, because there are a lot of people who contribute verbally on the phone calls and in comments on PRs, but don't actually become the main author of a PR. And so it's not really fair to them to not include them as part of the, quote, maintainers of the project. And at the same time, if we did look at PR count, we don't want to encourage people to start playing games, right? Submitting PRs just to get their PR count up. That's not what this is all about. Okay? Again, it's about consensus building. The other thing is, most of the people have other main jobs. And I know that's true of everybody on every project, but unlike many open source projects, where we have a group of people who basically seem like they live there coding 24/7, because it's a specification, the rate of change is lower. This is, as I say here, a side gig. Okay? So again, the number of PRs is going to be much, much lower than for open source code projects. I'm just giving you guys fair warning. And we also need to make sure that those PRs are very, very carefully reviewed. Okay? We just don't want something to slip in under the covers.
The piece about consensus is the most important thing to us. Okay? And as I said, PR count isn't an accurate representation of contribution; there are a lot of other things going on. However, having said all that, we do have SDKs, which operate like, quote, normal open source coding projects. And you can look at those in terms of participation. Those follow the normal rules of, you know, you do more PRs, you get nominated to be a maintainer, and you work your way up the chain, that kind of stuff. So those are, quote, normal. Okay? So let's go to the next slide. So with that in mind, we do not have committers in the normal sense, right? Technically, the only people that have write access to the repo are the admins, and I think that's only two of us, right? What ends up happening is issues are opened, and we discuss them on the weekly calls, or offline through the issues themselves in GitHub. When PRs are opened, we only approve PRs during the weekly phone calls. And technically, any significant changes to PRs have to be in at least two days in advance, to give people a chance to actually review them. That way no one feels like anything was slipped in at the last minute without a chance to review it properly. Okay? The PRs themselves can technically be blocked by anybody. Veto isn't quite the right word here, but I couldn't think of a better phrase: technically, anybody can sort of block a PR. And by anybody, I mean anybody at all, not just the people who regularly come to the phone calls. Anybody who just happens to show up on an issue can make a comment on there, and if it sounds like a valid concern, not something completely insane, we want to address it. That basically puts the PR in a blocked state, and we have to resolve all open comments on a PR before we accept it. Now, obviously that means things can technically go a little slower, and they do at times. I mean, it's a spec. You've got to get it right, and that's about consensus, right?
But in the end, what ends up happening is that this forces people to work offline and come back with a solution that has broader support. Okay? Now, obviously, not everything can actually be kumbaya, and not everybody agrees on everything. Ultimately, if something does happen and we can't get to a unanimous agreement, we eventually do take a vote. So then the question is, well, if you don't have maintainers, who gets to vote? What we ended up coming up with is a rule that says people who show up to the weekly phone call on a regular basis get voting rights. What that means is, if you were there for the last three out of four meetings, and by "you" I mean you or your alternate from your company, then you have voting rights. All that really means is that you care enough to actually participate in the weekly phone calls. Okay? Now, you may look at that and say, okay, that sounds fine for people who make the phone call, but what about the people who can't make the phone call? Well, that kind of goes back to the fact that anybody can block a PR through a comment on the issue. And you might still say, well, that doesn't seem quite right, because they don't get a vote. Okay, true. But let's go to the next slide. If you actually look at the votes that we've been forced to take, there haven't been many. Ignore the administrative votes, because those are mindless things about, you know, do you want to go to the next version, which are relatively minor. If you look at the technical votes, we've only really had five. And if you look at all of those, they've all been landslide votes. Right? And that tells me that we're not trying to squeeze an issue through with a one-vote margin kind of thing, right? These generally have consensus built into them.
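The three-out-of-four attendance rule Doug describes could be sketched like this; the company names and attendance data are invented for illustration:

```python
# Voting-rights rule as described: a company has a vote if it (or an
# alternate from the same company) attended at least 3 of the last 4
# weekly calls. Attendance data below is made up.
def has_voting_rights(company, attendance, required=3, window=4):
    """attendance: list of per-call sets of companies present, oldest first."""
    recent = attendance[-window:]
    return sum(company in call for call in recent) >= required

calls = [
    {"CompanyA", "CompanyB"},              # four calls ago
    {"CompanyA", "CompanyC", "CompanyD"},
    {"CompanyA", "CompanyB", "CompanyC"},
    {"CompanyB", "CompanyC"},              # most recent call
]
print(has_voting_rights("CompanyA", calls))  # True: 3 of the last 4
print(has_voting_rights("CompanyD", calls))  # False: only 1 of the last 4
```

Tracking attendance per company rather than per person matches the "you or your alternate" wording of the rule.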
And it's just one lone holdout that couldn't manage to convince the community, but everybody else basically said, no, this is the right way to go. I think the fact that these are overwhelming landslide votes tells us that the process we have in place to ensure community consensus is actually taking hold, and that the fact that we don't have traditional maintainers isn't really a problem. In fact, we even asked the community on the phone call: do we want to change our rules? And everybody's reaction, the last time we asked, was basically, let's not change what's not broken. So everybody is basically okay with it, and I feel pretty good about the fact that we're a little bit different from the normal process. Okay? So hopefully you'll see that we are community based. It's not the traditional PR-count type thing, but we do have something in there to make sure that everything is as equitable as we can make it. All right, next slide. Criteria three: demonstrate an ongoing flow of commits. As I said, we do have an ongoing flow of commits. You can look at the PR count in there. Honestly, I don't think the PR count matters at all, because we don't actually use it for anything, especially when it comes to voting rights. But I did show it, in case you're interested. The people that are crossed out are people who were active at one point but have kind of dropped off to go work on other things. And this is a good example of why PR count doesn't really work for us. Because if you actually look at this and say, okay, maybe if you have five PRs or more you get to be a maintainer, or ten or more, right? That's a good number. Well, then you'd only have maybe three maintainers in the group. And that's not really fair or representative of the level of contribution from everybody in the community.
However, if you look at the graph, you can see we do have a constant flow of PRs. Again, it's not a high count. This is a spec, not code. But you can see that most weeks we have at least one, and some weeks we have a whole bunch, right? So there is a fair amount of activity going on in the group, and we are making fairly good progress. On average, we have around 27 people attending the phone call every week. I think that's pretty good for a spec. Most people would rather get shot in the head than actually work on a spec instead of code. So 27 people on a weekly call is pretty darn good. And that's spanning 78 different organizations, with four people coming to us from non-companies, right? They're just self-affiliated. So I do think it shows that we have a fairly good rate of participation in the spec itself going forward. All right, let's go to the next slide. I think that actually might be most of it. Okay. The next two slides technically talk about the SDKs in terms of their activity. I just included this here to show that we do have a fair amount of activity going on there. I think the orange is probably the Go SDK. That's the most popular one, probably being driven mainly by the Knative guys, because, as I said, they're really using it. So that's good. On the next slide you can see, I think, more of a PR-count kind of thing; you can see what's going on there in terms of activity. But the SDK work isn't technically part of the review process for going to incubation, I don't think; it's more about the spec itself. I just wanted to show you that there is other, code-related activity going on that's part of the community. And I know I went kind of fast, but I think that hits the main points. Well, I've looked through the questions in the chat. Are there any verbal questions people have?

Hi, Doug. Yeah.
Since CloudEvents was first mooted, has anybody come up with any kind of competing specs or other initiatives that you're aware of?

Excellent question. So the short answer is no, but I will throw one thing out there that I have to make clear to people. This is not what I would call yet another common event format. Many times in the past, people have tried to create an eventing structure that all events are supposed to adhere to, one cloud event to rule them all kind of a thing. That's not what this is about. This is simply about taking your existing message, in most cases, and adding a few bits of metadata to it. That's really all it is. And I'm not aware of any other project that tries to have what I would call a very limited, non-sexy scope like this. We're not trying to be that exciting. It's just a common little piece of metadata to make life a little bit easier and solve some pain points. We're not trying to solve world hunger here. So as a result, I don't think anybody's really thought about doing a competing one yet.

So just to clarify that, how would someone consume one of these events if the spec doesn't really tell them what's in the event? I might have missed the first couple of minutes of the talk.

Yeah. So let's go back to, I guess, almost the very first slide, because that shows a sample CloudEvent. Oh, I did see that. Yeah. Can we go back? I'm not sure who has control over the slide deck. Yeah, go all the way back to the very first non-intro slide. Mark Peek, with the SMTP analogy. Yeah, I'll go back one more. Yeah, this one, right there. So if you look at the gray box, right, that's just an HTTP message coming across, right? Now, if you look at the four HTTP headers that are in bold, those actually tell you very key pieces of information. The specversion is the CloudEvents spec version itself; that one's not as key.
But the next one, the type, right? That tells you the type of event this is. So someone receiving this message, if they are doing some sort of generic filtering, and the person who specified the filter says, I want everything from bigcode.com, right? They can say to their CloudEvents middleware: give me all CloudEvents whose ce-type attribute is com.bigcode.*, right? So this piece of middleware actually doesn't have to understand the message. All it has to do is, in essence, understand regular-expression-style matching, right? To do this basic filtering or basic routing type of stuff. And that's kind of the point here, right? We're trying to let this middleware process these messages without understanding what's going on. And in fact, that's exactly what Knative is doing with CloudEvents. If you follow what's going on with Knative, they're actually building some basic building blocks for event routing through the infrastructure, with fan-out, fan-in, filtering, all that other stuff. And they're basing it on the CloudEvents structure. So you can do basic filtering on these fields, like source, meaning where this event came from. And the middleware doesn't have to understand whether this is an event from some AWS service, or IBM, or Google service whatsoever, right? As long as they add this little bit of headers to it, the middleware should be able to get its job done. Does that answer your question, Quinton?

Yes, it does. Thank you.

Yep, sure. All right. Any other questions?

No, we did not register it with IANA. I guess technically we could consider that at some point. Honestly, we just haven't had that conversation yet.

Awesome. Cool. Can I ask one quick question? Yeah. Have you guys surveyed other things besides Knative that might be adopting the standard? Like one I know of, Argo Events, is something that I think has adopted CloudEvents. I don't know if there's any.
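The type-based filtering Doug walks through above can be sketched without any knowledge of the event body. The event and patterns below are made up, and fnmatch-style wildcards stand in for whatever matching a real middleware would use:

```python
from fnmatch import fnmatch

def matches(headers, pattern):
    """Filter purely on the CloudEvents metadata, never parsing the body."""
    return fnmatch(headers.get("ce-type", ""), pattern)

event = {
    "ce-type": "com.bigcode.order.created",  # hypothetical event type
    "ce-source": "/orders",
}
print(matches(event, "com.bigcode.*"))    # True: the subscription matches
print(matches(event, "com.othercorp.*"))  # False: routed elsewhere
```

The middleware only reads the well-known headers, which is why it can route and filter events from any producer without understanding the business logic inside them.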
I mean, have you done like an open source kind of survey? I mean, companies might not go on the record saying they're using it, but have you looked for other open source projects that might be adopting the standard or the spec?

Yeah. So, okay. Obviously, on one of the first slides I showed some places where it is being used. I think Knative might have been the only open source one there; I think everything else was proprietary. From an open source perspective, I'm pretty sure there are a couple of places out there that are using it; I just don't know what they are offhand. I don't think we've actually done an official survey, to answer your question. But I do know it is being picked up, at least by those proprietary products that we mentioned on slide three or something like that.

Okay. I'll say Argo Events happens to be one of them. That's a project that Intuit's involved in, but there are probably other ones, so that might be a good thing to do.

Yeah. Thank you, Mark. I'll take that link and stick it into the chat for the next time someone asks about this, because that's good information. Thank you. Okay, any other questions? All right. Cool. Thank you guys very much. Yeah, you can skip all the background or the backup information. Sorry about that. I wasn't sure where we were going to go in the conversation. No worries. Thanks, Doug.

I think we have Justin next, and then we'll go over the sandbox proposals. Is Justin here? Yes. Awesome, go for it.

All right. So this is the graduation review for TUF. And TUF, as I think many of you know, because you've probably heard talks or discussions about TUF over time, its purpose is to let you do things like update or install software, and to make this process secure, even when an attacker goes and does things like breaks into your repository, steals a key, is a man in the middle on your network, and so on. It's fairly easy to design a system.
And we see a lot of people designing systems that work perfectly if you are perfect. But to actually build a system where it's not the case that an attacker gets in one place and you lose massive amounts of security is actually very difficult. So, in order to provide these sorts of properties, to make a system where even when attackers break into, say, a repository that stores your Docker images, or a place that stores your updates, your software packages for a distribution or something like that, to make all that work, TUF uses a combination of roles, threshold signatures, selective delegation, and so on. It's also one of these things that's surprisingly easy to deploy and adopt. It's something that you sort of drop into your system and it works, and people don't even know it's there. In fact, sometimes we don't know; I'll even say most of the time we don't know when people have adopted it. The way we find out is people putting up a blog post to talk about it, or, in one case, there was someone who hadn't given their keys long enough lifetimes, and they started to get error messages that mentioned TUF. So it was sort of bittersweet to learn about their adoption via error messages caused by them not having managed their keys correctly. TUF itself is a specification project. We have a very strong security focus, as you might imagine with a project like this, and our intention is to have a minimal design with low churn. It was created in 2010 to address issues that I found when I worked with some of the folks at Tor to try to do a new updater for the sort of nation-state-actor threat model that they deal with on a daily basis. And we were admitted to the CNCF in 2017, along with Notary. Notary is the most widely used implementation of TUF, at least in the cloud.
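The roles-and-thresholds idea Justin describes can be illustrated with a toy model: a role's metadata is only trusted if at least a threshold of its authorized keys produced valid signatures, so stealing a single key is not enough. This sketch uses HMACs as stand-in "signatures" purely for illustration; real TUF uses asymmetric signatures and a full metadata format:

```python
import hashlib
import hmac

def sign(key, payload):
    # Stand-in for a real signature scheme.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def role_trusted(payload, signatures, authorized_keys, threshold):
    """Count distinct authorized keys whose signature over payload verifies."""
    valid = {
        key_id
        for key_id, key in authorized_keys.items()
        if hmac.compare_digest(signatures.get(key_id, ""), sign(key, payload))
    }
    return len(valid) >= threshold

keys = {"key1": b"secret1", "key2": b"secret2", "key3": b"secret3"}
payload = b'{"targets": {"app.tar.gz": "sha256:abc123"}}'

# Two of three authorized keys signed: meets a threshold of 2.
good = {"key1": sign(b"secret1", payload), "key2": sign(b"secret2", payload)}
print(role_trusted(payload, good, keys, threshold=2))  # True

# An attacker with one stolen key cannot meet the threshold alone.
stolen = {"key3": sign(b"secret3", payload)}
print(role_trusted(payload, stolen, keys, threshold=2))  # False
```

This is the property Justin is pointing at: compromising one place (one key, one server) does not let an attacker forge trusted metadata on its own.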
Although there are other large companies, like Datadog, that use our reference implementation, the Python reference implementation of TUF, in production. And there's an automotive variant that's very, very popular called Uptane, where on the server side it's basically just vanilla TUF with a few very, very minor tweaks. And on the client side, it deals with the fact that cars are very difficult, challenging, weird environments: you have a bunch of devices that don't have their own connections out, don't have a notion of time, and don't have a lot of other things that we sort of take for granted in cloud environments. There are about a dozen different implementations of TUF, or of the Uptane variant, by different organizations. I'll talk a little bit about some of these in a moment. And TUF itself has a standards process. First, I just want to say thanks to Doug for really hammering home a lot of points about how specs are different from code projects. That was the absolute perfect opening act. Thank you so much; you did such a great job of covering that. We have a formal process for changing the TUF standard that we're very, very conservative about, and we really try to build complete consensus within our community. For all of the changes we've made to TUF, we've actually had 100% consensus, after a lot of lobbying and a lot of discussion with different adopters. Next slide, please.

All right. So, the production use of TUF: as I said, in the cloud native space we're used by a lot of large companies, as you can see there. On our adopters page, which is linked at the bottom, you can find links to blog posts and other things that talk about these. I estimate, and I might be a little off on this, that at or over 80% of the cloud users use Notary, and at or somewhere around 20% use the TUF reference implementation. We're also used a lot in automotive, through Uptane.
Anyone who's dealt with automotive knows it's a very secretive industry. It's very strange to me, because we have people that are selling products based on it, but they won't let us list their name on, say, a website, or say things about it. Like when I talk to the press about it: yeah, you can buy it from lots of places; I can't say which. Once again, I don't really understand why that industry is so secretive, but there are quite a few implementations we can talk about, and quite a few that are public. But there are also major, major tier-one vendors and major OEMs that are using it that we're just not allowed to name. We're included in Automotive Grade Linux through the integration of a product called Aktualizr, which is an Uptane implementation done by a company, ATS, that was bought for a lot of money by a larger company called HERE, which is one of the major automotive vendors for infotainment and navigation units. Based on projections that we've seen from different OEMs, we have at least one, actually multiple, OEMs in the U.S., in Asia, and in Europe that have adopted Uptane. And so in the next three-to-four-ish years, the projection we've seen is that over a third of new cars sold in the United States will include Uptane as the way they do updates. Uptane itself was adopted under the Joint Development Foundation and the Linux Foundation, so that's where that spec sort of lives now. And there is also an IEEE-ISTO standard for Uptane. And we have a lot of use outside of cloud and automotive too. Facebook has given a bunch of money for Python to go and integrate TUF into Warehouse. Google's using us in Fuchsia. We have LEAP, we have a bunch of other programming languages, we have Arch Linux adopting TUF, and so on.

Hey Justin, you muted yourself. Next slide, please.

Okay, thanks. All right, so in terms of committers, once again, it's a funny thing to look at this.
But if you look more broadly at the reference implementation, and at our committers, we have different folks from different groups. Notary, I'm just putting it up there; they're a separate CNCF project, and they're not part of this graduation review. There will be a separate discussion about Notary at a future point. But Notary and TUF both have similar numbers of committers and organizations. Uptane has over 100 people participating in the forum, and something like 60 people that are standards participants. And we regularly have a couple dozen people on the weekly standards calls, which, as Doug said so nicely, is hard, because it's very hard to get people to really care and dig in and look at this. And we've had well over 100 people from about 50 different organizations. We have vendors, regulators, folks from agencies like NIST or others like DHS, and so on, that come to our Uptane meetings, come to specific meetings just for Uptane, fly in, in order to talk about it and help move the industry along. And we've had a ton of support from OEMs. One number I do have approval to say publicly: in our very first meeting, the makers of 78% of cars on U.S. roads had a representative from, like, their security team in that meeting. And our attendance has only increased over time. So the security folks in the automotive industry are really very active on this. The spec itself is low churn, and this also makes a lot of our implementations quite low churn. Our goal here isn't to add every bell and whistle; it's to have a solid, secure common core that can be used. Next slide.

All right. So, looking at the flow of commits: any significant change to TUF, like any significant addition or modification, requires a process called the TAP process. TAPs are TUF Augmentation Proposals. And what this process does is basically get all the important stakeholders together.
A TAP is written in sort of an RFC-style format document; you can go on our site and look at these. These changes add, or tweak, or do things that are important to TUF as a whole, adding functionality such as key rotation or multi-repository support or other things like that. And there tend to be comments and discussions on these. We've had 10 or so different contributors that have written parts of TAPs, or worked on TAPs, to help improve the TUF spec. And we've also had a bunch of typo fixes and other things like that, which are very, very minor and wouldn't represent something like changing code; we have the stats for those there. Notary and TUF also both have a history of committers from different groups that are integrating or doing other things with them, so you can get some commit information there as well. Next slide.

All right. And I think this is my last slide here. So I just wanted to mention we have checked all the boxes. We've adopted the CNCF code of conduct. We have our governance and contributors process. You can find our adopters list for TUF there. The adopters list for Uptane is, once again, a little harder, because we can't make a lot of things public, but you can find a lot of information about that on the Uptane site. We have a CII best practices badge. We are at Silver; we are two items away from Gold. By the way, as far as I can tell, there are no projects that have a Gold badge that haven't cheated on some of the options, where they link to a site that isn't their site. So let me just say that I don't think anyone is legitimately getting a Gold best practices badge right now. And I think there are some little tweaks that could be made to make that process a little better. But I feel very proud of where we're at. We're by far the highest-scoring CNCF project in this regard, I think.
And if you want to look at the stats on this, you can see them there. And with that, I will answer a few questions that I saw fly by. I think... okay, so does someone want to jump in and ask, or should I answer the question about the separation between Uptane and TUF first? Okay. So Uptane is a specification, but the client side is very different from TUF. It does a lot of... basically, you can view it almost like a superset, but it's a superset with some tweaks that make more sense in automotive. And so if you take the server-side part of TUF, like TUF server implementations, you have about 90% of what you need for Uptane. What you're missing is that in Uptane, the vehicles report back information about the versions of the different ECUs, the different little computers in the car, and so on. So Uptane is sort of a superset of TUF. And of the individual components in the vehicle, beefy components do something that is basically TUF plus a little bit of extra functionality that makes sense for cars. The very weak components do something that is a weak subset of TUF, because the little microcontroller that decides, when you're pulling your seat belt, whether it should tighten or not is a really weak little microcontroller. It's a tiny, weak little computer in there, and you can't do all of the more expensive things you would need to do to decide how to update it, or your dome light in your car, or other very weak computers like that. So those do a stripped-down, streamlined version of TUF that has weaker security guarantees and acknowledges this; the spec repeatedly explains the differences and what you lose and so on. So Uptane, you can view it as mostly a superset of TUF. Does that answer your question, Quinton, at least?

Yeah, partially. I'm still not totally sure. So if the TUF specification changed, would Uptane also have to change, or do they approve the changes to TUF? That sort of relationship is not clear to me.
When we make changes to TUF, we work with the automotive community, because a lot of the time, effectively almost always, you want the flow to go between the two: if there's something good in Uptane that TUF would benefit from, you want that to come down, and you want the opposite to be true as well. So we've had some flow between them, but they're not strictly in lockstep. The processes you go through to approve changes to each are different, and they are different communities, but they have enough overlap. So once again, I think of TUF as mostly a subset of Uptane, but also as the part that's more focused and more applicable outside automotive. It's not that TUF is Uptane minus features; it's that Uptane is all the weird stuff that has to happen to make it work in a car. If we did a medical device version, there would be all the weird stuff you have to do for medical devices, but TUF is the core of both of those projects, and TUF would be the core of anything else in those regards. Okay. Thank you. If anyone has any other questions, I'm happy to answer them. Otherwise, we can move on to the next presentation. Cool. Thanks, Justin. Hey, folks. I'm Tobi Knaup, co-presenting together with Gerred here. Gerred and I are on the team that created KUDO. I essentially served as the product manager for getting it off the ground, and I'll let Gerred introduce himself. Sorry, figuring out my new setup here. Hey, everyone. I'm Gerred Dillen. I'm a member of technical staff here at D2iQ, and I work on KUDO day to day. All right. So let's start with what KUDO is. KUDO is a toolkit for building operators, and it specifically focuses on day-two operations, particularly for services that need fairly complex day-two operations, like distributed data services.
KUDO is a little bit different from other approaches to building operators in that it actually ships with a controller already. So folks who build with KUDO don't have to implement their own controller; they instead just write a YAML spec that defines the operations for their particular workload. In that way, one KUDO controller can manage multiple different types of workloads. The main abstractions in KUDO are actually inspired by DC/OS Commons, which is a similar sort of toolkit or SDK for building data service orchestration on top of Apache Mesos. It's been used for a couple of years to run these data services, things like Kafka and Cassandra and others, in production. Folks who have used those services came to us and said, hey, can you give us a similar experience on top of Kubernetes? And that's how KUDO was born. Next slide. So when we talk to folks who are building operators, here are some challenges we found people run into. Obviously, a controller or an operator is not a simple piece of software, and we found a lot of teams just don't have the staff to write a lot of distributed systems code in Go. Operators, at least the production-grade, more advanced ones, typically have more than 10,000 lines of code. And if you look at a lot of those data services, a lot of the big data stuff is really written in JVM languages, so those teams simply don't have the people on staff to write these things in Go, and they find it challenging to hire people for it too. We also found a lot of code duplication between operators, so for folks who have to build multiple ones, there's just a lot of code they have to write and, more importantly, maintain. When client APIs change and new versions come out, they need to make sure that stuff still works. So it's a pretty significant burden.
Another challenge we found is that it's not that easy to integrate with other CNCF ecosystem tools, and KUDO has some ideas for how to do that that we'll get to a little later. Next slide. Talking to users who want to deploy operators on their clusters, we also found a couple of challenges. Kelsey sent a tweet a few months ago that basically says people really struggle with this still. What makes it complicated for folks is that different operators have different workflows and different APIs. A lot of the time, when people deploy distributed data services specifically, they have to run multiple of them, right? They might use, for instance, Kafka to ingest events from, say, IoT sensors or other sources, then run a Spark streaming job behind that and put some data into Elastic or Cassandra. So they run multiple of these things together, and the DevOps people who manage these clusters have to know all these different operators, how to debug them when things go wrong, and all these different APIs. What they find themselves with is controller sprawl: they deploy multiple different controllers for these workloads and have to become experts in all of them. And that makes it complicated for folks. Next slide. So how does KUDO help? We'll talk about developers of operators first, then users. Some of the main abstractions KUDO has are around sequencing lifecycle operations: operations like installing one of those services, upgrading to a new version, rolling out new config, or doing a backup or restore of a data service. Those are the abstractions the DC/OS Commons SDK used for years to create these lifecycle operations. Plans are the highest-level concept; I'll go into a little detail about the other ones in a few slides.
But think about those things as runbooks. This all started when building Apache Mesos frameworks was also incredibly hard, and we found that the people building these things often have more of an operations background and weren't too familiar with distributed systems engineering. But we found abstractions that feel natural to them, that they're used to from writing runbooks. So it's a language of abstractions that lets them sequence those operations in a way that feels natural to DevOps people. KUDO reduces the amount of code duplication and boilerplate between different operators. That's a good thing for many reasons: obviously less work, less maintenance burden, and less chance for bugs and security issues. It also reduces the number of controllers in a cluster that people have to maintain, control access to, and upgrade. Another thing KUDO introduces is an extension mechanism. This came up when we talked to a lot of users who want to put operators into their environment but have some specific extensions or tweaks they have to make to an operator. It's very common at, say, a bank or a pharma company, which are regulated and have specific security requirements, or some other type of policy from their IT team that they have to follow to deploy an operator. Often the only way to do that is to take an operator and fork it, and that's not ideal. So what KUDO does (this is under development) is provide a process to create a flavor, where essentially an organization can take a base operator that's available in open source and customize it to meet their regulatory requirements or other policies they have. It's a very common issue that we ran into.
And then essentially it's a tool that gives ISVs, software vendors, a way to ship best practices for day-two operations alongside their software. Oftentimes they already have backup tools, restore tools, and other tools like that, so KUDO provides an easy way to wrap those and thereby make it easy for people to follow best practices. Next slide. So how does KUDO help end users, the folks who just want to run these operators? It ships with a kubectl plugin, kubectl kudo, which is your main interface to deploy, upgrade, and manage KUDO-based workloads, and it provides a standard interface. So if I want to run Kafka, Cassandra, and Elastic together, I use the same command-line tool, and it provides a similar interface for each. That really helps with the issue I mentioned earlier, where folks have to use different APIs or different debugging tools with operators that are not built on the same foundation. The example you see on the right of this slide is kudo plan status, which prints how much progress KUDO has made deploying a particular plan. I can easily follow along: which steps are completed, which phases are completed, where is it stuck? Then I can dig in further and investigate if something went wrong. So it really simplifies deploying multiple operators in a cluster, because I only have to run one controller and use kubectl kudo to deploy these packages, these YAML manifests, letting that one controller manage multiple different types of workloads. In a nutshell: a simplified API and CLI experience. Next slide. So here's an example of what defining an operator with KUDO actually looks like. It's only part of what you would build, but it's the main part. This is what a plan looks like. In this case, we're defining a deployment plan for this workload.
That's the top-level item here. A plan breaks down into phases, which are essentially a grouping mechanism for the different tasks that need to be executed. Phases get executed using a strategy, serial or parallel, because different workloads need one or the other. (There's also an option to plug in custom strategies; that's under development.) Within each phase you have steps, and steps again have a serial or parallel strategy. While these are pretty simple abstractions (plans, phases, steps, and a strategy to go with them), we've orchestrated some pretty complex workloads with them. HDFS, for instance, has a pretty complicated lifecycle: some things get deployed in parallel, some in serial. But these simple abstractions allowed us to do it, and we found them to be both simple and really powerful, even for advanced workloads. Within each step you have tasks, and tasks are, very simply, templated Kubernetes manifests. Alongside the KUDO definition you ship a number of these templates, and KUDO fills in variables that you define, either at install time or with defaults set by the package author. I'm going really fast here, but those are the high-level abstractions in KUDO. Next slide. I'll hand it over to Gerred. Perfect. So what we're trying to do with KUDO is thread the needle a little bit. To give some comparisons, first I'll compare it to Operator SDK and KubeBuilder. KUDO was initially built on top of KubeBuilder; we've since dropped down to controller-runtime. We see ourselves as a polymorphic controller: we want a single controller, or a small set of controllers, that can run multiple types of operators, configured with various CRDs and extensible down the road to support more use cases.
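The plan, phase, step, and task hierarchy described above can be sketched roughly like this. This is an illustrative Python model of the structure (KUDO itself is written in Go and driven by YAML; the field names here are assumptions, not KUDO's actual schema), including the behavior where install-time parameters override package-author defaults when filling template variables:

```python
from string import Template

def fill(template_str, params, defaults):
    # Install-time parameters override package-author defaults.
    return Template(template_str).substitute({**defaults, **params})

def run_plan(plan, params, defaults, apply):
    """Walk plan -> phases -> steps -> tasks, applying rendered manifests.

    Everything runs serially in this sketch; a real controller would launch
    the steps of a phase concurrently when its strategy is "parallel".
    """
    for phase in plan["phases"]:
        for step in phase["steps"]:
            for task in step["tasks"]:
                apply(fill(task["template"], params, defaults))

# A toy "deploy" plan mirroring the shape described on the slide.
deploy_plan = {
    "name": "deploy",
    "phases": [
        {"name": "main", "strategy": "serial", "steps": [
            {"name": "config", "strategy": "serial", "tasks": [
                {"template": "replicas: ${REPLICAS}"},
            ]},
        ]},
    ],
}

applied = []
run_plan(deploy_plan, params={}, defaults={"REPLICAS": "3"}, apply=applied.append)
assert applied == ["replicas: 3"]   # default used when no parameter given
```

Passing `params={"REPLICAS": "5"}` instead would render `replicas: 5`, which is the install-time override path.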
And this goes back to a really important thing that I think was brought up early on, which is the "why" of KUDO: a lot of the stateful services already have a large SDK around them, CLIs already built around them, and teams working on them. So we're orienting KUDO around existing clients and tooling rather than rebuilding all of the ops functionality in Go using Go APIs. We love KubeBuilder; KubeBuilder is great, and Operator SDK is great. We want to be compatible with those things, but we want to be a little more opinionated: take you 60% of the way, and, if you follow our opinions, 80 or 90%. We want to let people build operators using a set of Kubernetes primitives, with our built-in testing harness, rather than having to do a bunch of software development with these SDKs for a given use case. Again, we're not trying to replace that tooling; we're trying to be the right choice for the right situation. If you were looking for a high-level framework for operators, that's what KUDO is intended to be. Next slide. That naturally brings up a comparison to Metacontroller. For those who aren't aware, Metacontroller is very much in the same space: it's a polymorphic controller for multiple types of applications that can call out to webhooks to define various manifests and run them in a set order. Metacontroller ships with a custom set of... sorry, I just pulled the wrong cable. Metacontroller ships with a certain set of controllers; we avoid that and just try to use Kubernetes primitives directly. KUDO is also intended to be an operator for CRDs: it will do reference counting on those CRDs and make sure they're registered, whereas with the old Vitess operator built on Metacontroller, you had to run the etcd operator independently.
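The "polymorphic controller" idea mentioned above can be made concrete with a toy sketch: one controller process holds a registry of operator definitions and dispatches reconciliation based on the operator type named in each custom resource, instead of compiling one controller binary per operator. All names here are invented for illustration; KUDO's real controller is written in Go on top of controller-runtime.

```python
# Toy model of a polymorphic controller (illustrative only).

class PolymorphicController:
    def __init__(self):
        self.operators = {}          # operator name -> reconcile function

    def register(self, name, reconcile):
        """Registering an operator definition is configuration,
        not a new controller deployment."""
        self.operators[name] = reconcile

    def reconcile(self, resource):
        # Dispatch on the operator type declared in the resource,
        # so one controller can manage many workload types.
        return self.operators[resource["operator"]](resource)

ctrl = PolymorphicController()
ctrl.register("kafka", lambda r: f"reconciled kafka/{r['name']}")
ctrl.register("cassandra", lambda r: f"reconciled cassandra/{r['name']}")

assert ctrl.reconcile({"operator": "kafka", "name": "prod"}) == "reconciled kafka/prod"
```

The contrast with the per-operator approach is that adding Cassandra support here is a `register` call rather than a second controller running in the cluster.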
So KUDO supports this idea of depending on sets of CRDs that may come from elsewhere. And that gets into the next point: with KUDO, we want to support dependencies so that you can build a lot of modular operators. For example, one of our reference implementations is Kafka, which depends on ZooKeeper, and we're looking at more modularity between our various operators. KUDO is also all about sequencing complicated applications, something that Metacontroller and other frameworks don't necessarily address but that can be important for really complicated services. One of those we've dealt with at D2iQ, and on Mesos in the past, is HDFS, which requires a level of sequencing that can't be solved by throwing a bunch of manifests at the API server all at once. Unlike Metacontroller, we're also really looking at what happens after you've deployed. Having plans for backup, restore, adding a Kafka topic, and so on, what happens once I actually deploy, is very important to KUDO. We look at application awareness in upgrades and in scaling up and down. etcd, for example, requires API actions to add or remove a member, and we want to enable developers to build that kind of application awareness into their operators in a high-level way. Next slide, please. Looking at Helm: we love Helm, and we're actually about to support Helm charts as a manifest format. The difference is that we're a framework for operators, whereas Helm is a framework and templating system for applications. We're looking at what happens post-deploy, so we'll get drift detection, repair, and alerting and monitoring, which we're working on, and again, those sequencing steps.
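The dependency idea above (Kafka depending on ZooKeeper) boils down to installing operators in topological order of their dependency graph. Here is a minimal sketch, with the caveat that KUDO's actual dependency mechanism was described as still in progress; the names are illustrative and this toy does no cycle detection:

```python
# Toy install-order resolver for operator dependencies (illustrative only).

def install_order(deps):
    """deps maps an operator to the list of operators it depends on.

    Returns a valid installation order: every operator appears after
    all of its dependencies (depth-first topological sort, no cycle
    detection in this sketch).
    """
    order, seen = [], set()

    def visit(op):
        if op in seen:
            return
        seen.add(op)
        for dep in deps.get(op, []):
            visit(dep)          # install dependencies first
        order.append(op)

    for op in deps:
        visit(op)
    return order

order = install_order({"kafka": ["zookeeper"], "zookeeper": []})
assert order.index("zookeeper") < order.index("kafka")
```

With a larger web of operators, say Spark and Kafka both depending on ZooKeeper, the same walk still installs ZooKeeper once, first, which is what makes modular, shared base operators workable.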
We're also looking at higher-level supportability features, as well as doing some work around sandboxing, even solving that root Tiller problem, which is a hard problem. I know Helm 3 does not have Tiller; we're working on ways to solve that too. So coming in the next version of KUDO is this idea of relying on Helm charts as a base and then progressively enhancing a Helm chart into an operator, where you start to add plans around upgrading, backup and restore, and other things you might want to do, but use a well-tested Helm chart as your foundation. Next slide, please. So, looking at the KUDO ecosystem: like I said, we built on top of KubeBuilder and controller-runtime, and we're well involved with the API Machinery SIG inside Kubernetes and chat with them a lot. As I said before, we want to extend existing Helm charts, and we're also looking at CNAB bundles. Other than getting an initial idea of how to do the sequencing, we're not trying to solve the application definition problem. We want to provide the "I have an application, what do I do now?" solution and solve that problem. And I think that's all I have to say on that slide, so next slide. Roadmap: we're looking at getting better at registering and managing CRDs so that you get an operator-like experience of custom CRD plus controller. We were relying on some bug fixes that landed in Kubernetes 1.15 to do that, and that development arc has begun. We're looking at incorporating other things from the community, like the Application CRD. Dependencies are in progress, so that you can have a web of dependencies between operators and build up larger abstractions from individual operators. As Tobi mentioned, we're working on extensions right now. Package distribution is now in; that point has been addressed as of our last release.
And then we plan on testing these in a large, mixed-workload way, out in the open, so that if you install a Kafka operator, we have vetted it at a certain scale with a certain number of Kafka operators. So when we say an operator is stable, what we mean is Kafka running on ZooKeeper, with all those components stable and working together as a dependent set of tooling. Next slide. So why the CNCF? We've built this project from the beginning hoping to drive larger day-two awareness inside the CNCF, and we want to grow that sentiment and continue that work. We followed open contribution from the beginning and followed everything required; that was a large topic in the last meeting, the things you need to do, and we tried to do them from the beginning to be a neutral home and promote the mission rather than just this one project. What we really want is people building more operators, building more stateful services on top of Kubernetes, with a platform on which to do so. What we've set forth here is a vision for that, and hopefully we can bring in more and more people to help sharpen that knife and bring a really great day-two and stateful-service experience to Kubernetes. Again, it's really about growing the community around this project and around what we're trying to do for "I have deployed my application, I've deployed my stateful service, now what do I do?" Next slide, please. I guess that's it for us. So thank you very much. I'm Gerred Dillen, and presenting with me was Tobi Knaup. We're happy to take your questions. I don't know what time we have. We're over time, so I'll turn it back over. Sorry. I'll take questions in the CNCF Slack, I guess, and the Kubernetes Slack. Cool. Thanks. That pretty much puts us over time, so we'll reschedule the Keptn presentation to the App Delivery SIG, and we'll see each other next time. Thanks, everyone.