Okay, so the agenda items are also in chat; I'm posting them now. The first agenda item is the recurring update on the Operator Framework with Rob.

Hello, everyone. How's everyone doing? I've got a quick update. Things have been progressing well. There are three things I want to talk about: some new website work that we're doing, the CNCF donation status, and some updates to the capability model.

I'll start with the first one. We have purchased some domain names, and we're planning websites for each of the community projects: the SDK and OLM. OperatorHub already has its own website, but we're filling out some more information about that, as well as an overall wrapper project or homepage that has information about things like: What is an operator? What do we think the definition of an operator is? Why are these important? Who are the types of people that are using operators and making operators? Those are underway. We're hoping to get that done, hopefully before the CNCF donation. It doesn't really matter, but then we'll have a web presence when we unveil this to the world.

I wanted to show this off really quickly, if you can see my screen here. Awesome. This is a quick, in-progress dev site for this. You can see Operator Framework; this is the main landing page: what's in it, all the different steps that you do to build an operator, what an operator is, things like that. And then we're going to keep working on these what and why pages, which cover: What is an operator? How do I start building one?
Who's building them? And so on. The what and why pages will also carry the new capability model updates; they're not reflected here yet, and obviously this text is open for change. Once this is in a better state, where folks can start adding feedback and, if they want, submitting pull requests for content, we'll send that out to the mailing list. It's in Git today, so you can go look at it; it's not a secret. But it'll be easier for everybody to have focused attention on something that we know is ready for a first pass. So if you have any comments on that, ideally wait, but otherwise you can open them on the repo. I'll pause there for questions. Anyone have any immediate questions, concerns, feedback?

Could you put a link to the Git repo in chat?

Yes, I will do that once I stop sharing, unless somebody wants to find it before I get to it.

All right, the next update was around the CNCF donation. There's been progress. As always, I feel like I have the same update here. The main point of contention, if you haven't heard, was that there was a project proposed, not necessarily following the guidelines of the CNCF, for a CNCF-wide hub. This would host all kinds of content: Sysdig (I guess the project is Falco) configurations, Helm charts, operators, and other pieces of content that folks want to use throughout the CNCF ecosystem. The status of that project was delaying some things upstream. And so now that that is a little bit unblocked,
I think we have a pretty smooth path forward there. They're going to retool that unified hub effort to go through the proper sandboxing process and all that type of stuff. We're open to collaborating with that as long as it makes sense and we don't lose some of the things that are important to us. For example, we do human curation for OperatorHub as part of a holistic process, with some automation as well. We think that's important because, as operators flow through our capability model, it's really important that they adhere to those guidelines and that the grading they have is actually accurate. We would love to be fully automated, but there is a human in the loop right now, and we think that's important.

Yeah, they committed for the other project to be open, because it was really bad form that that other project was quote-unquote behind closed doors, or whatever you want to call it. I commented, and other people commented, and they just said: come on now, with the CNCF that's not how we do things.

Yeah, so they kind of acknowledged that. It's open now. One of the main things that was talked about was the fact that it was at hub.cncf.io, which looked like more of a vote of confidence, or a signal that this thing was more mature than it was. So they're going to move that off to another test domain. They're making all the right tweaks: the code is now public, the charter, or whatever you want to call it, is public, and they're going to spin off a working group to focus on that. So, not the best rollout of that, but it's moving in the right direction.

When's your next TOC meeting? Is that later today?

I think there was one that just happened, like 15 minutes ago or something, that I wasn't at. I think they were talking about that hub issue and a few other things. But we're trying to get on the agenda for a vote.
We got the backing of Dan Kohn, the executive director, in a meeting yesterday, so I think it's all smooth sailing there. It's just doing the actual vote and all those mechanics.

Really cool. It's exciting.

Yeah, definitely. All right. I've mentioned a few times this update to the capability model. I know Daniel was the one that worked through a lot of that, so I'm going to hand it over to him to walk you through it. There's a bunch of great updates. I don't think anything meaningfully changes with the capability model itself, so don't be worried; there's just a bunch of additional info to help folks think about it. Over to you, Daniel.

Thank you.

You're muted.

Sorry for that. We have heard multiple times that this particular topic of the capability model needs more meat on the bone in terms of what it actually means to be level three, four, and five. So we've taken a stab at that, based on some documents that were grown internally and gathered over time. It's here in the doc section of the Operator SDK, in the operator capabilities document. Essentially, we've described what we think the criteria are for each level. We are moving away from phases in the diagram; we always call these levels now. And we've provided more insight into: What do we mean? What should the operator be doing at this level of maturity? What are examples of operators and their features at this stage? And what are some guiding questions for you as an operator developer to ask yourself: Is my operator level one? What does it need to do, in addition to what it's doing today, in order to become level three, four, and five?

I won't go through all of this here, but we really wanted to point out that this is something that is open to collaboration, right?
That's why it's on GitHub. We're looking forward to your PRs and issues on this. There's also a mock-up of how this will look on the framework site, so it will also be consumable in a nice visual form on operatorframework.io. This is really not set in stone, or us making the rules. It's about creating some aspirations: what your operator should ideally be capable of. We see a lot of operators out there that are basically just deploying some resources, and then they're done. We specifically call this out as level one, and there's a reason why there are four other levels; you should be gradually growing to reach those, to give your operator more features. Can you make it feel like a hosted or managed service, something that provides your workload as a service to a customer?

Hopefully this serves as a point of inspiration, and it's certainly up for discussion. Rob, I think you had the idea of, for instance, adding the ability to emit serverless events as a level four or level five capability. That could be really interesting for making your operator more useful in an environment where Knative is available: your operator would basically trigger Knative functions as well, based on events that it emits.
So these are a couple of ideas, and we're looking for more of those. Feel free to raise PRs against this, discuss it in the issues section, challenge us on this. But keep in mind that we want to get people to write more mature and more feature-rich operators, and this should hopefully serve as a blueprint for how to do some of that.

Yeah, and I think one of the things that we've heard loud and clear is about the vagueness: the capabilities were intentionally supposed to be vague, because not everything applies to your application. Hopefully this is less vague but still as agnostic to applications as it was intended to be. You'll see a bunch of sample questions here; obviously, transform them in your mind into something that applies to you. Hopefully this fleshes it out a little bit more, and like Daniel said, we'd love your feedback.

I have a question, if that's okay. Or do you want to take a second?

Sure, go for it.

Yeah, I love this. I've been thinking along these lines for some time. But these levels seem to be (I can't say myopic) isolated, maybe, to a single operator. I realize this is about operators, but I almost feel like there's a level of maturity that goes outside of a single operator. So I wonder: is there room for that in this forum, or in this particular set of criteria, or should it be a separate thing? I'm thinking along the lines of operator-to-operator interactions, or dependency-type stuff, or things along those lines. And then there are certain things like level three. I guess I can't imagine even calling something an operator unless it has level three capabilities. The basic install just seems experimental, or a proof of concept; maybe it's not really an operator. I don't know.

Yeah, that's fair. I would even say to anchor it to a baseline of what folks are used to, at least with the existing Kube ecosystem. You know, just helm install a chart or something like that.
And then, yeah, there's no other touching it after that, that kind of thing.

Yeah. Maybe this is for a different forum. You suggested having some communication on pull requests, but it'd be pretty awesome to have a meeting around this, maybe.

We can talk about that a little bit later. Do you want to talk about the interplay between different operators, or just this document?

Both. I guess I would love to have more participation with those in the know around this particular topic. I don't always have that, other than my own internal team.

Okay. Yeah, we'd love to expand on that. Let's either put it on the agenda for next time or, if we've got some free time at the end of this meeting, let's talk about it.

I quickly wanted to provide an additional pointer to the best practices document that we also have in the same GitHub org, but in a different repository, the one for the community operators. We had a blog post up on it last week as well. That's where we gather this kind of general best-practice information, about development best practices as well as runtime behavioral best practices. So maybe some of that ends up being there. I just wanted to quickly highlight that this exists as well and could also be subject to contributions.

You'll see this moving over to those websites over time so that there aren't a bazillion places to go for these things. Note that each project will have its own docs (the SDK will have all of its own stuff as well), but we'll try to unify it as much as possible.

Hello, my name is David Zager. I've been seeing a lot of questions about writing idiomatic reconcile functions, and I was wondering if that was something you guys have considered: what reconcile functions are supposed to look like. I haven't seen anything to point people to, and it's a hard question to answer off the cuff.

Yeah, let's open it up to the group.
I was going to say something. That is a pretty wide conversation and probably worth having a document around as well. Maybe that's part of what the best practices doc has; I'm not sure. But it's so dependent upon the domain of your operator in some regards. And for some things there's some easy stuff, right? Like always deep copy before modifying something you got from a cache. I don't know to what level you're referring. I also think it is language specific in the approaches. The same concepts apply, but how you would approach it is somewhat language dependent.

Yeah, the specific questions I've seen have been specific to Go-based operators and Kubebuilder-based projects and those kinds of things. But it just opened up the question for me: what would the recommendation be for an idiomatic reconcile function? What would that even look like? It seems like a hard question to answer, because it's pretty dependent on what you're doing, but I figured I should ask.

So the question that I would have is: should we maybe start with the things that we know are non-idiomatic, things that you shouldn't do? I'm wondering if the set of things that you should do, or that you could do that would be fine, is too large a space, whereas stuff that we know can cause problems seems like a space that we have some knowledge of today. Maybe that helps narrow down what we tell people: these are things that will go poorly if you do them. And then I guess the question is: is anybody asking this in the context of a specific use case that they have, where they want to deal with idiomatic versus non-idiomatic? Or is this just an open-ended question of what an idiomatic operator looks like?

I think I've seen it a handful of times in the past week or so, and it's been pretty open-ended: I'm new to writing operators, and I want to write one idiomatically. What does that look like?
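A minimal Go sketch of the "deep copy before modifying something from a cache" advice mentioned above. Everything here is hypothetical and uses only the standard library: `Widget` stands in for a Kubernetes object and `newCache` for a shared cache. The analogous concern in client-go is that objects returned by listers come from a shared informer cache and must be treated as read-only.

```go
package main

import "fmt"

// Widget is a hypothetical stand-in for a Kubernetes object held in a shared cache.
type Widget struct {
	Name   string
	Labels map[string]string
}

// DeepCopy returns a copy that shares no memory with the receiver.
func (w *Widget) DeepCopy() *Widget {
	out := &Widget{Name: w.Name, Labels: make(map[string]string, len(w.Labels))}
	for k, v := range w.Labels {
		out.Labels[k] = v
	}
	return out
}

// newCache builds a hypothetical shared store keyed by object name.
func newCache() map[string]*Widget {
	return map[string]*Widget{
		"demo": {Name: "demo", Labels: map[string]string{"tier": "db"}},
	}
}

// LabelUnsafe mutates the cached object in place, so every other
// reader of the cache sees the change.
func LabelUnsafe(cache map[string]*Widget, name, key, value string) *Widget {
	w := cache[name]
	w.Labels[key] = value
	return w
}

// LabelSafe copies the cached object first, so the cache stays untouched.
func LabelSafe(cache map[string]*Widget, name, key, value string) *Widget {
	w := cache[name].DeepCopy()
	w.Labels[key] = value
	return w
}

func main() {
	cache := newCache()

	safe := LabelSafe(cache, "demo", "owner", "team-a")
	fmt.Println("copy labels:", len(safe.Labels), "cached labels:", len(cache["demo"].Labels))

	LabelUnsafe(cache, "demo", "owner", "team-b")
	fmt.Println("cached labels after in-place edit:", len(cache["demo"].Labels))
}
```

The only difference between the two helpers is the `DeepCopy` call; without it, the edit leaks into the shared cache.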
But I do think you are right that trying to answer that question might be more difficult than just saying: here's the pretty short list of things you shouldn't do.

Maybe a different tactic is to have a curated list of what we think are good operators, at least. Here's a real implementation that might not be the prettiest thing or the most perfect thing, but at least it's real, like the Prometheus operator. There's a lot of them.

Yeah, I think some of the struggle might be that the advice you would give is already based on certain assumptions. Like, you should never poll the API; rapidly polling would be a bad idea. That means you're going to move towards a caching mechanism and a watch, and then, based on that, there are approaches to how you would use those things that are somewhat idiomatic within that space. So I don't know where the assumptions start, I guess.

We could probably start with the assumption that people are using controller-runtime at least, right? And that the caching mechanism, and everything to do with the cache and how you watch and everything else, should be encapsulated in the library functions that we've written. Do you think that's not a safe assumption?

I think it's a good one. I don't think it captures all of those folks, potentially, but I think it captures the majority.

I think if you use that as the assumption, and you document the best practices based on that assumption, that at least gives people a place to go look to see: what is controller-runtime doing, and what are the best practices that you get based on that assumption? People that are writing a Python operator, for instance, can at least go and look at controller-runtime and say: okay, I need a cache; here's how they have structured their reconciler interface; all of that kind of stuff.
So at least we have a starting point to say: if you're using controller-runtime, here's how it should look from a Go perspective. If not, go and look at controller-runtime, which has all of those best practices built in. And then we could expand from there if we need to, from the controller-runtime documentation perspective.

The one thing I would say, from a very high level: if you're starting with controller-runtime and its existing interfaces, your reconciler is basically getting the current state of the cluster and the current state of your CR, and you want to do a diff and apply any changes so that your current cluster state looks like what your CR says it should look like. I realize that's extremely high level and doesn't help you solve some of the nitty-gritty problems, but I think it heads off a lot of the questions I see coming in. People will say: I'm getting a request into my reconciler, but I really want to get every single event. And we always have to go back and say: well, no, you don't really want to see every single event, and here's why. We link them over to the controller-runtime documentation FAQ about why not to do event-based diffs. People say: I want to get a diff between what my CR used to look like and what it looks like now, so that I can know what to change. And we say: well, no, you don't really want to do that, because you might not get every single event, or events might be coalesced, and that kind of stuff.
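The "get current state, get the CR's desired state, diff, apply" loop described above can be sketched as follows. This is a toy, standard-library-only model and not controller-runtime code: `Spec`, `Cluster`, and `Reconcile` are hypothetical names, and a real reconciler would read and write through a Kubernetes client instead of a map.

```go
package main

import (
	"fmt"
	"sort"
)

// Spec is a hypothetical desired state taken from a custom resource.
type Spec struct {
	Replicas int
	Image    string
}

// Cluster is a hypothetical view of what is actually running for this CR:
// pod name -> image.
type Cluster struct {
	Pods map[string]string
}

// Reconcile is level-based: it compares current state with desired state and
// never looks at what changed, so missed or coalesced events do not matter.
// It returns the actions it took, in a deterministic order.
func Reconcile(name string, desired Spec, c *Cluster) []string {
	var actions []string
	wanted := make(map[string]bool, desired.Replicas)

	// Create missing pods and fix pods that drifted from the desired image.
	for i := 0; i < desired.Replicas; i++ {
		pod := fmt.Sprintf("%s-%d", name, i)
		wanted[pod] = true
		img, exists := c.Pods[pod]
		switch {
		case !exists:
			c.Pods[pod] = desired.Image
			actions = append(actions, "create "+pod)
		case img != desired.Image:
			c.Pods[pod] = desired.Image
			actions = append(actions, "update "+pod)
		}
	}

	// Delete pods that the desired state no longer asks for.
	var extra []string
	for pod := range c.Pods {
		if !wanted[pod] {
			extra = append(extra, pod)
		}
	}
	sort.Strings(extra)
	for _, pod := range extra {
		delete(c.Pods, pod)
		actions = append(actions, "delete "+pod)
	}
	return actions
}

func main() {
	c := &Cluster{Pods: map[string]string{
		"memcached-0": "memcached:1.5",
		"memcached-7": "memcached:1.6",
	}}
	want := Spec{Replicas: 2, Image: "memcached:1.6"}

	fmt.Println(Reconcile("memcached", want, c)) // first pass converges the cluster
	fmt.Println(Reconcile("memcached", want, c)) // second pass finds nothing left to do
}
```

Because the function depends only on the two states, running it again after convergence is a no-op, which is why reconciling on every request is safe.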
So I think the one thing I would say is: you're just looking at current state and new CR state, and you always want to make your cluster state look like what your CR says it should look like, every single time you reconcile.

Yeah, I love it. But then as you get into more of the details, you start getting into: what's the difference between a revision and a generation? What are you paying attention to? Why is one worth watching versus another? But I love the assumption of controller-runtime as the guiding function.

Yeah, I think where I've seen some people struggle is, as Joe is describing, you're basically creating a state machine and implementing it. There are different ways you can structure that; some are a better fit for certain goals and certain workloads that you're managing than others, and some are more natural to certain programming patterns. There are a lot of different choices you can make there, so I think it would be interesting to see some examples. I like that idea of examples. And maybe, back to the meeting idea, if three different projects that felt good about their reconcile functions wanted to showcase them (hey, this is how we structured ours) and do some show-and-tell and compare, we might learn some interesting things about what works well for people, and when.

I would love to see that in the form of case-study-like articles that we could host.

I think, to that point, there were some examples for Kubebuilder that grew out of date fairly quickly. So there's historical stuff to look at, but it doesn't fully function now. There's probably room for a bunch of examples out there that are topical, with a micro-focus on a particular aspect of a controller.

That would be really great. In the SDK we have the samples, the operator-sdk-samples repo.
We could definitely prove them Like put to the test is we have a proposal for that to show how to test and all this stuff Which is common questions as well, and we have the getting started We made a blog post to try to clarify deeper as well but he Sometimes I have the feeling that the users. I don't know if they don't find the getting started Do you know because he the questions the common questions like that is one How I wash a resource, you know in the get started is there has an example But he probably the person didn't follow that So because this didn't get you know how the things works So I don't know how we can address is how we can Make it easier for them check the basic stuff information. I don't know if it just makes sense Yeah, totally, and I I hope that the dock site that Rob was shown to be getting helps with that in terms of a discoverability standpoint In my opinion, I think another aspect of it is that like the samples that you're talking about Camilla are extremely simple so like even if people do find that and Do kind of start to model their operator based on those samples They very quickly if you're writing any any sort of complex operator very quickly getting to the point where There's some complexities that you're dealing with in your operator that just don't exist in In the samples operators, so it's always it's always kind of difficult It's like okay well from the SDK developer standpoint like how much effort do we want to put into a Sample operator and how complex can we get it and then as soon as it gets complex from a sample standpoint? Like we're cool. We're going down maybe one complexity Rabbit hole when there's probably 15 or 20 different other complexities that we're not covering So how much bang for the buck would we even be getting by making a complex sample? So I think the the ideas of like having community operator developers Showcasing their reconcile Methods as Michael was kind of suggesting. 
I think that's a great idea. The more we can get the community involved in showcasing the way that they've solved the problem, the better. And maybe we can start adding links on the doc site to well-developed operators. I know we have an awesome-operators repo. I don't think we've kept that up to date really well, but maybe that would be a place we could revisit, to see if there's a way to make it a little bit more usable. I think we can show people our samples, but in the same breath we could also say: if you're looking for more complex use cases that other operator developers are actually working on, here's a set of operators in the community that we think are good examples of more complex reconciliation scenarios.

I feel that if we make the samples too complex, it'll be hard for those who are just getting started, which is the biggest and most common case, in my point of view. But what about moving the getting started guide into the SDK repo as well, for example? Because I think whoever is starting out starts with the quick start in the README, and they don't have the full information there, you know, and that may make it hard. And the docs will probably address this better as well.

I agree too.
I think the plan for the SDK documentation is to definitely have that getting started guide as part of the quick start section that we plan to have. The docs will be structured very similarly between SDK and OLM: you will have an introduction section with a quick start, then a section about general concepts and topics, and then a how-to section that describes very particular use cases of how to go from A to B in a very prescriptive manner. So I think the getting started guide would fit really well in the first section there.

Could we archive the getting started repo and move that document into the docs in the SDK?

Yeah, I think that's it for now, until we have the full docs. Does anyone have an objection to that?

I'd like us to think about that. I don't think that the problem that's being stated, or what's being asked, has anything to do with a quick start guide or moving that to a different location. I really think that it's about people coming and seeing things and then not knowing exactly what to do next. Once you've gone through the quick start and now your Memcached is deploying three pods, what's next? I know my problem, but I don't know how to go from this place, where we're deploying a thing, to where I need to manage my application. I think the problem gets into: how do I write that reconcile loop in such a way that I can start to walk down into: okay, where do I add the next thing that I need to add to this reconcile? So I think it's not about a quick start.

Yeah. No, go ahead, Joe.

I just want to say I think we're dialing in on a problem, but we're trying to find a solution to a different problem. I don't know if the solution that we're working on actually solves the larger one: what's my next step after I start writing an operator?
How do I make this good, so that people will not think this is bad? I think that's mostly what people are asking. And that's where I think the negative case is probably our best bet right now, because I don't know if writing down an exhaustive list of everything that you can do that would be good is going to be possible. I think it'd be better to say: make sure you don't do this, make sure you don't do that, make sure that you are outputting certain information in a certain format. I think conditions are probably something that we can be affirmative about: you should be using conditions, they should look like this, this is the format for the conditions. And we soon will hopefully have a library for doing conditions work; you should be using the library for conditions. Those sorts of things, I think, are what people are looking for. And I don't know if moving the quick start guide around is necessarily going to help folks.

Yeah, I agree with you. Looking at the operator capability level diagram that we were talking about before, the quick start guide covers level one, basically. And I think what we're talking about here is: all right, I figured out level one. Now how do I go from level one up to level two, three, four, five?
And that's the piece that we don't really have any documentation about, and it's hard to write that documentation. So maybe we can try to figure out how we can get more docs about how to go from level one to level two and beyond. Maybe it's simpler from that standpoint to say: okay, the getting started guide covers level one. If you want to go to level two, here are some operators that are level two that exist in the community. Maybe we do it that way, or maybe we have a doc that says: here are the extra things that you can do, and here's how you might do something in your reconcile function that would make your operator level two, and so on and so forth up through the different levels.

You cut out a bit.

I was just going to ask Zager: is that more or less what he's seeing people ask for? Okay, now that I've got my initial thing, what do I do next? Is that really what they're asking when they say: how do I write an idiomatic operator? Or does he think that there's something more to it than that?

I get the feeling that it would be nice to be able to say: answering that question is hard; here's a link I can give you with resources to start answering that question for yourself. Links to other reconcile functions from other operators that we deem good examples, even a list of things not to do; I think these all sound good. But having that link that you can share and say: oh, you want to know more about writing operators? Here you go. That would be really nice.

Potentially this might be the reason why I don't see it as an operator unless you're level three, because if you're level one, you don't even need to reconcile anything. You're just installing something, one can assume. So I don't even understand that; to me, it doesn't make any sense. But I love the conversation as a whole.
I wonder if there is a need for a list of "have you considered" items, in particular around reconciliation: have you considered these things? Just a list of links that go off to: have you considered X and Y and Z? Because you won't be able to capture them all; I completely agree with the point that was made. There are a lot of assumptions that go into them. But: do you have a need to listen to or watch other objects? You could go through a list of those things, right? How often do you send events? Do you send events?

Yeah. One thing that would be awesome: blogs are great in this regard. If you have a specific thing that you figured out is a great strategy for handling X, Y feature, whatever, put it up on a blog. That would be awesome to see too.

Yeah, but I think part of the problem is the discoverability of that blog. Being the community around operators here, if you write that blog, please let us know about it. Maybe we need to find a way to highlight those things from this new website that we're adding, in our docs, or wherever it is, to make it easier for people. Basically, as Zager was saying: here's the link that has all the links to different blogs, the FAQ, the "have you considered" page. We would just throw stuff there and have people be able to go through that and find things on their own. But yes, I agree, blogs are good.

All right, any other topics folks want to chat about? We've got roughly 20 minutes left.

I think Daniel was requesting feedback on a GitHub issue.
Yeah, it's actually linked in the agenda. If you look at the agenda, there is an issue in the SDK repository about the proposal to extend the SDK scaffolding coverage to writing kubectl plugins, or CLI interfaces for your operator in general, that interact with that operator using client-go. I was looking to bring this up with you and get some feedback. Maybe some of you have actually attempted to do this and found there are particular pain points that scaffolding like the SDK's could address, or the testing could address, or places where you generally struggled and would have liked to have the help of something like the SDK. So this is about shipping a control utility with your operator to use instead of kubectl, or as a plugin of kubectl: essentially, a front end for your users, your human users in particular, versus the bare API that ships with your operator. I'm just throwing this out here to see if any people have some experience in doing this, or some opinion on it. I've dropped a link in the chat.

Yeah, I'm trying to come up to speed really quickly. I would just add this comment: to me it would feel common to have new types defined (I think Kubebuilder does that well, I'll just put it that way), and it'd be common to create types for which you want to have some control at the CLI level. So having that as a support request in some ways makes sense, but I have to read through it in more detail. I mean, I get how having that as a separate project would be challenging, I would think, and it would require duplication, or I guess submodules or something. It's a burden on the operator developer.

Sorry, can I just clarify? What you're saying is that it makes sense to have the CLI in the same project as the operator, in the operator repo, to avoid that?

I think so. Yeah, I think so.
I think that there's a larger grouping of operators, and within that set of operators there's a subset that would want some kind of CLI plugin in some way. For that grouping, it is just a whole lot easier burden on the operator developer if it's in the same repository, for sure. Even if you broke it up, I don't know what the value would be. In all likelihood, all the things you would version, the lifecycle of said projects, would be identical. I can't imagine them not being. But I could be short-sighted; not sure. I think in the common case it would be the same lifecycle.

To me, it seems like the lifecycle would be tied to the APIs that you're dealing with, right? You might have your reconciler change a lot, and maybe your APIs would change less often, potentially. I don't know.

Agreed completely with that. Hopefully you are not changing your API that much.

So it could be that if you have your v1 API for your operator, that's not going to change really at all. In that case, maybe it is okay to have a separate CLI project that's outside of the operator project, and all you have to do is vendor the clientset for the v1 API, and off you go with client-go. But it's six of one, half a dozen of the other in my mind.

Yeah. Would you do that as soon as you came in with, maybe, a webhook? Would you have that in a separate project as well? I wouldn't think you would, I guess. How many of these components do you decide go into a separate repository versus the same repository?

Yeah, that's a good point. You're talking about something like a CRD conversion webhook?

Yeah.

Yeah, because that would be changing at the same rate as your API as well. My first thought would be that it would probably just go in the operator repo.

That's where my thought was. I don't mean to challenge it too much. I like what you're thinking, and I agree.
It's completely based on the API. Yep. Let's put it this way: I don't see any reason not to put it in your project directly, right? I guess the only case would be if you've got maybe four or five operators that all have some interactions with each other, and you want to have a single CLI that kind of ties them all together. Then of course you probably have a separate project at that point. But that seems like a pretty far-out use case, potentially. Yeah, what a great example. Yeah, yeah, I agree. Somebody's gonna want to do it, though. All right, any other topics for discussion folks want to bring up? Anybody got any operator they're working on that's cool, that they want to give a quick shout-out and 30 seconds about? Not working on an operator per se, but Weaveworks is going to be working on an operator here soon to donate or contribute to OperatorHub. We love it. The Weave Flux operator? Weave Flux, that's what I'm thinking we're going with first. Yeah. I'm hoping they do more than one. Over at Pantheon, we're doing an interesting thing. Can you guys hear me okay? Yep.
Yeah. So at Pantheon, we built an operator. We're building a system that uses Kubernetes and our internal orchestration platform together on the same nodes, using Container-Optimized OS from Google, and we built a machine operator that provisions GKE nodes outside of the node pool. And then it does a lot of things: it registers the node with our orchestrator and manages these nodes, kind of overall machine management in this mixed workload. So our internal orchestrator orchestrates systemd containers on the same nodes where Docker images are running for our services that service those customer containers. So we built a machine operator that has a Machine and a MachineClass CRD. And I don't think we're going to open source this, because it's so specific to what we're doing, but it's interesting from the perspective that we're managing gcloud objects kind of the way a service broker would, doing all kinds of extra injection of things. And COS is read-only, so there's a lot of extra work there that the operator does to do overlay mounts on the boxes through executing remote scripts and things. Kind of interesting. No, that's super interesting. I'm curious: have you run into issues tracking statuses and linking together different actions between things that are happening on the cloud APIs, things that are happening on your box, and then your source-of-truth CI system? How do you kind of weave all that together?
So, um, we have the operator integrated with the gcloud API and the kube API. And then we actually had to do some modifications to our code, because it needs to be able to access more than one Kubernetes cluster at a time, watching things on multiple kube clusters, which is interesting in itself. The synchronization happens when we create the box and it gets created in gcloud, and then things happen. We constantly check on the state in a polling fashion through the gcloud API, because it's not event-driven like the Kubernetes API is, right? So there's definitely some polling that goes on there. But also, because these are being registered to a GKE master, we're able to use the nodes API with the operator. So the operator is watching the nodes API, looking for things that are not ready, and if they're not ready, then we investigate further with the gcloud API. Yeah, cool. Did you guys base any of this on the Cluster API work, or perhaps look at it for inspiration at least? The Cluster API work? I'm not sure about that. Ah, um, let me find it. I'll drop a quick link in the chat. There's a whole Cluster API project that basically unifies a bunch of different cloud providers, and more recently bare-metal providers, around one API that has a Machine CRD, a Cluster CRD, and other related things. Interesting. Uh, it is interesting. As you can imagine, there are a lot of problems you have to solve when you're trying to make that sort of API pluggable, where you have different providers all sort of implementing the same APIs. But they've been wrestling with that for like a year and a half now, and they've come up with some pretty good stuff. So I'll drop a link in chat in just a moment. Wonderful.
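The two-step check described here (watch node readiness first, then fall back to polling the cloud API) can be sketched roughly as follows. This is a minimal Python illustration, not Pantheon's code: `not_ready_nodes` and `wait_for_instance` are hypothetical names, the node dictionaries are assumed to follow the shape of a Kubernetes Node object's `status.conditions`, and `get_status` stands in for whatever call returns the instance state from the cloud provider.

```python
import time
from typing import Callable, Iterable, List


def not_ready_nodes(nodes: Iterable[dict]) -> List[str]:
    """Return the names of nodes whose Ready condition is not "True".

    Each node is assumed to be a dict shaped like a Kubernetes Node object.
    A node with no Ready condition at all is treated as not ready.
    """
    names = []
    for node in nodes:
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            names.append(node["metadata"]["name"])
    return names


def wait_for_instance(
    get_status: Callable[[], str],
    want: str = "RUNNING",
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns `want` or the timeout elapses.

    get_status is a hypothetical stand-in for a cloud API lookup; polling is
    used because that API is assumed not to push events to the caller.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == want:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

In a real operator the first function would be fed by a watch on the nodes API, and only the nodes it returns would trigger the slower polling path.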
We did look into... I know that there are some operators out there that do cloud things, you know, kind of as a service, not a service broker because it's like VMs and stuff, but reconciling cloud state to CRDs. We found that a lot of them had way more than we needed. So that was kind of the thing: ours is a very specific use case, but we were unable to leverage what was already out there, due to those things being exponentially more complex, built for managing all your cloud resources, more like a service broker and less like a VM manager. So... Yeah, that makes sense. Yeah, but I'd love to see what you have. Um, you know, we already have code. It's interesting. Hopefully, if it happens at all, at All Systems Go in Berlin this year we'll be presenting on systemd and Kubernetes mixed workloads on single Container-Optimized OS VMs. Um, hoping that that conference continues. It's scheduled at this time. It's unknown. Hey, Daniel, one question for you. How does your operator get the credentials to talk to GCP? How do you give it the service account?
So, um, well, we use Terraform to manage a large portion of our infrastructure. So those are created by Terraform and then imported in through secrets. But the interesting piece here is, if we had our operator running in the clusters that we're registering the nodes to, those nodes would have access to delete themselves. Or if one of those nodes became compromised (we run customer code on these, in the systemd containers), we would end up with a situation where, if they break out, they'd be able to gain access to a service account that could delete the entire platform. So what we have is a GKE cluster called our provision cluster. It has one instance of the operator per production GKE cluster with these mixed nodes, and that lives in isolation. So our secrets are isolated in a different project in Google Cloud, in a different cluster than the one these nodes are being registered to, and we're reaching across to get to the GKE clusters that are control clusters for these nodes. So you run it multiple times and you give it a different configuration for every instance? So that was something we had to figure out, the different configurations, because we're talking to multiple control clusters from one provision cluster, right? So we need to be able to have those credentials. And in the end, we ended up having to do some work around just making a utility class that allowed us to instantiate a kube client outside of the Operator SDK and the runtime, which allowed us to interact with a cluster that wasn't the primary cluster
we were interacting with, right? Um, so there's a lot of complexity in the code there. I always felt like it would be interesting if we could have N kube clients for one operator to monitor things, like one operator that could manipulate resources on multiple other clusters, which has been somewhat difficult to do with the current Operator SDK, right? Yeah, it comes down to: do you make your life very easy and say an operator instance has the system-level configuration, and if you want to have multiple different cloud accounts managed by an operator, you install the operator multiple times? Or do you give it that credential every time you use one of its APIs? It's something we are actually debating internally, whether or not it's a good and valid pattern to have the same operator installed multiple times in the same cluster. And it sounds like your use case would actually advocate for that to be possible, because there's no other convenient way to give it some sort of pre-configuration that goes with every request. Very true.
So that's, in the end, the solution. We still need to be able to listen for our CRD on the provision cluster and have it provision the GKE node, and the DaemonSets that run on that GKE node, correctly in the foreign cluster. And for each control cluster that these nodes are registering to, we have an instance of the operator. So each instance of the operator manages one control cluster from the provision cluster, with N control clusters and N operators. And they still need the two sets of credentials, because the node CRD gets added to the provision cluster, so we have to monitor for those, but then we have to add the node and register it to the control cluster, where we have to wait for it to reach the ready state and monitor its state in that cluster. So we're actually watching resources in two different clusters for each instance of this operator, and we're running many instances of it, one for each control cluster. So it's become quite complex, unfortunately. We were shooting for simplicity, of course, in the beginning, and we hit some roadblocks after we had already committed. So, yeah. Yeah, I dropped a link in the chat to a controller-runtime issue where somebody's requesting kind of related behavior, being able to watch resources in another cluster generally. So it might be interesting to check that out and add your feedback or your use cases over there. Yes, I just pulled it up, and I also pulled up the cluster-api. Thanks for talking to me about it. I don't think we've talked with anyone about this at all, actually. So we just, yeah. And that's a very active group. They have regular meetings on Zoom and there's a Slack channel in the community Slack. Yeah, that's a really good group that you could engage with. Yeah, yeah, excellent.
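The layout described here (one operator instance per control cluster, all running in the provision cluster, each holding two sets of credentials) can be sketched as a small data model. This is a hedged Python illustration only: the class names, the kubeconfig paths, and the choice of `Machine` and `Node` as the watched kinds are assumptions made for the example, not details confirmed in the discussion.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ClusterCredentials:
    """Hypothetical handle for one cluster: a name plus where its credentials live."""
    name: str
    kubeconfig_path: str


@dataclass(frozen=True)
class OperatorInstance:
    """One operator instance: it runs in the provision cluster and manages one control cluster."""
    provision: ClusterCredentials
    control: ClusterCredentials

    def watch_targets(self) -> List[Tuple[str, str]]:
        # Each instance watches two clusters: the custom resource that requests
        # a machine lives in the provision cluster (assumed kind: Machine),
        # while readiness is observed on Node objects in the control cluster.
        return [
            (self.provision.name, "Machine"),
            (self.control.name, "Node"),
        ]


def plan_instances(
    provision: ClusterCredentials,
    controls: List[ClusterCredentials],
) -> List[OperatorInstance]:
    """N control clusters yield N operator instances sharing one provision cluster."""
    return [OperatorInstance(provision, control) for control in controls]
```

Keeping the control-cluster credentials out of the control clusters themselves is the point of the design: a compromised node never sits next to a service account that can delete it.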
Well, I appreciate it. And I'll throw out that if your talk does get cancelled or moved to virtual or something with the conference, we'd love to have you take a few minutes here to chat about that. If you want to reuse the same presentation you put together for that, that would be awesome. Yeah. Yeah, we have two, hopefully, for All Systems Go coming out of Pantheon: one about fun with bind mounts and read-only operating systems, which is crazy in itself, and then the mixed workloads with kube and systemd on a single GKE node. So hopefully that conference doesn't get cancelled. It's my favorite and I would be very sad. Yeah, yeah, well, fingers crossed on that one. And so we would love to have that if you want to get it added to the agenda. We can definitely make that happen, or if you want to commit now, I can put it on there. So when's the conference? Hopefully... that's why I'm hopeful it won't be cancelled; it's far enough away. Yeah, hopefully. And your bind mount craziness and read-only... it's bringing me back to my Container Linux days from CoreOS, where we did a bunch of really cool stuff symlinking between /usr and other locations that were not read-only, so you could blow away the symlinks. That stuff was very cool. Yeah, there's a lot of nuance to being able to do the systemd containers on a read-only operating system. So, like, being able to add containers, because, you know, /etc/systemd/system is read-only, so we have to do overlay bind mounts for that. And then we use runc with systemd for OCI compliance, for OCI specs, and there's nuance there with how we're bind mounting into our customer... We, like, move passwd so that they look like the only user inside their container, and we do a lot of custom work there. So I know that's off topic for this call, but... Yeah, that sounds very cool. Yeah. Cool.
Thank you guys for talking with me about the machine operator. Sure thing. Yeah, and let us know in September and we'll follow up then. Yeah, great. I think that brings us to the end of the call here, unless anybody has any closing comments. All right, thank you all for joining. Stay safe out there, and we'll see you next month.