Hello, folks. In the chat is the attendee list link, so if you could sign into that, that would be great, and we'll give everybody a few minutes to join. Let me just see if I can pop in here. If people have items for the agenda, just pop them into the chat and I will add them to this list. I thought we'd give everybody a couple more minutes to log in, then ask Vadim to give an update on the latest release and get any feedback people have, and then maybe talk a little bit about what we're doing in the docs if Jamie has joined us by then. Is there anything else that people have for the agenda today? FCOS incoming updates, I like that one. Thank you, Timothy. All right, about one minute after the hour, I'm gonna stop sharing and turn it over to Vadim. Are you there? Yeah. All right, share your screen, talk about the latest release, and we'll motor on from there.

Okay. So last weekend we released another version of 4.7. It is marked as rejected because our test suite has become much stricter and we didn't cherry-pick one of the machine config daemon fixes, but you can still upgrade to it; it's perfectly fine. We didn't want to take the risk and decided not to pull in the latest MCD changes. Those have gone into the latest nightlies, which means the vSphere hardware version bug should now be fixed, and some more hostname fixes have been merged in. That needs some additional testing before we can declare it stable and release it, maybe this weekend or hopefully the weekend after, if the tests pass. In other news, the Kubernetes 1.21 rebase has merged into 4.8, and so has the Fedora CoreOS 34 next branch. So in a couple of weeks maybe we'll start cutting release candidates and get some additional testing of those so we can declare them stable. On the problem front, the podman copy bug has been fixed, but we're still getting errors when using the latest stable Fedora CoreOS as a base image in installs, because of problems with extended attributes or something. I will be chasing Colin, who's familiar with the situation, to get some updates on this. If I remember correctly, the Fedora 34 rebase doesn't have this problem, so we'll get the fix for free once we move to F34. And some more good news: we have a new member on our team helping us with OKD specifically. Please welcome Mustafa, who has just joined Red Hat and is helping with a lot of things related to community and stability, which is exactly what we need on the OKD front. And I think that's pretty much all I've got from my side.

Oh, is that Mustafa? If you want to wave and turn your camera on, just say hi and introduce yourself, that would be great. Yeah, sorry, I just find it more comfortable to type while it's off. Hi, I'm Mustafa, I'm based in Berlin. I'm the first member of the new team. I'm helping here and also in the storage team. So yeah, I'm glad to be at Red Hat and hope I can help.

Did I miss anything there, Vadim? I was on an internal call last week where there are a number of folks coming on board, similar to Mustafa, to help, and they're going to be kind of interning with the OKD working group. So you'll see we're getting some additional resources, and one of the ways we're training them up is to bring them in via the OKD stuff. So that's actually really good news, and Mustafa, you're the first face to show up.
But I heard there are upwards of 20 new hires, so it's going to be interesting to incorporate them, have them understand our CI/CD workflows, and help us get some stability there. And then they're going to be using that on OCP itself, on the OCP nightlies, as well. So there will be a lot of new faces coming soon. That's making me very happy, and Vadim even happier, I am sure. Vadim might get a weekend off. Yeah, maybe. That would be wonderful. That means no OKD releases, so we'll see how it goes. We'll get there. But it is a really good sign that we're getting some resources there, and it'll be fun. So welcome, Mustafa, you're a very welcome face.

Timothy, I'm going to put you up to do the update next, unless people have feedback already on this latest release, or questions; maybe I should allow for that. Otherwise we'll move on to Timothy's update. All right, Timothy, take it away. I think you'd have to unmute yourself. Yes, all right.

Okay, so a quick update on the progress in Fedora CoreOS and what's coming next. Something we've already mentioned is that we are going to rebase to Fedora 34. Of course we'll wait until Fedora 34 is released for that to happen. The issue tracking that, I'm just pasting into the chat. Essentially I think this is going to happen rather soon-ish after the release; the release, I think, is at the end of the month or something, and the rebase should happen a little bit after that.

The second item I have is that we are adding OpenShift MachineConfig support to Butane. We renamed the tool that was called FCCT, the Fedora CoreOS Config Transpiler, to Butane, and we've also added support directly into Butane to generate MachineConfigs. I'm pasting the links right now. The idea behind the name is that it's fuel for Ignition: essentially you convert your Butane configs to Ignition configs, the fuel you put in your engine to ignite it. We are already updating our docs in Fedora CoreOS to mention Butane instead of FCCT; that's going to happen soon. What's important here, beyond the name change, is that we added to Butane a way to generate MachineConfigs directly, or snippets of MachineConfigs, which should help if you want to do some complex root device or LUKS setup or things like that in OKD (a rough sketch follows below). I think it's planned for 4.8, so that probably won't work until 4.8 is released. All right, that's a good item.
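As an illustration of that Butane item, here is a minimal sketch of what a Butane config using the new OpenShift variant might look like. The variant and version strings, the file path, and the MachineConfig name are assumptions for the example, so check the Butane docs linked above for the exact syntax:

```yaml
# Hypothetical example: a Butane config that Butane can turn into a
# MachineConfig snippet (variant/version and names are assumptions).
variant: openshift
version: 4.8.0
metadata:
  name: 99-worker-custom-file
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
    - path: /etc/example.conf
      mode: 0644
      contents:
        inline: |
          # example file written onto worker nodes
          option=value
```

Running something like "butane example.bu -o example.yaml" should then emit a MachineConfig you can apply with "oc apply -f", rather than a bare Ignition config; again, treat the details as a sketch until the linked docs confirm them.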
Then we have two upcoming changes that are further down the road; they will happen much later, but we want to raise attention right now so that people can plan and make the necessary changes if needed. The first one, planned for approximately two months from now, is the cgroups v2 change. We decided to split that from the Fedora 34 rebase to avoid changing too many things at once. So in approximately two months we'll switch Fedora CoreOS from cgroups v1 to cgroups v2 by default. Existing nodes won't be migrated, they will keep using cgroups v1, but newly provisioned nodes will start using cgroups v2. Here are the links. Do we have to expect problems with this switch? That I don't exactly know; it depends on OKD. I think OKD will probably stick to cgroups v1 until testing has been done, and that's probably up to Vadim to discuss.

Yeah, I can comment on that. We'll default to cgroups v1. However, the only switch needed is to remove a particular line in the generated machine config. We tested this in CI, and the only tests which failed were related to builds. I haven't looked into those for quite some time, but since Kubernetes 1.20 you should have cgroups v2 support in most of the pieces there. However, it's just not yet stable, so this is why we don't default to it. So this Fedora CoreOS change is important, but it doesn't affect us right now. Okay, thank you. Essentially everything in the stack is now supported; that's why we made the switch so late, because we were waiting for all the pieces to fully support cgroups v2. The unknown here, which I'm not really familiar with, is how the update is going to happen, because usually I start with cgroups v2 directly, but I don't know how to update from v1 to v2, so that's maybe a tricky part for OKD: migrating a cluster from v1 to v2, because maybe you will have to recreate all your containers. I'm not so sure about that.

You definitely have to recreate the containers. Someone ran into an issue, not in Fedora CoreOS but in openSUSE MicroOS, where I pushed a similar change to switch to cgroups v2, and somebody had a persistent container instance that was just being invoked, and on reboot it stopped working, because the cgroup v1 settings it was pulling in were simply no longer there. So if you're switching from cgroups v1 to cgroups v2, you need all container instances that are saved on disk to be destroyed and recreated, otherwise they will not work. That seems to be a hard requirement. Sounds like it breaks things, doesn't it? Well, normally a Kubernetes system is supposed to handle this transition itself. When you don't have an orchestrator doing that migration for you, if you're doing this by hand, you can technically edit the existing container instance and change all the settings, but that's such a pain that it's easier to tell people running podman or Docker directly on a machine to just blow away and re-instantiate their containers, because that should be fine. In a Kubernetes and OKD context, the machine config operator should tell the rest of the operators managing the OKD cluster that container instances, meaning pods, need to be reconfigured, but because Kubernetes instantiates them from pod definitions rather than leveraging the base container runtime's own ability to save container instance settings, we may not be hit by this issue. It is still something to be wary of if you're instantiating containers directly on the machine. So from a general FCOS perspective, don't expect your container instances to survive the transition, but from an OKD perspective, I don't know of a reason why we would be in trouble. Vadim, do you have any reason to believe we would have a problem with that transition?

No, we shouldn't, because the change from cgroups v1 to v2 is effectively a kernel arguments change, meaning it requires a reboot, and the old containers would just be destroyed; or rather, the kubelet would not be able to pick them up because they don't look the way it expects, so it would recreate them automatically. But if you're running some podman containers there, those would be dead. And if you update to a version of FCOS with this change, there will at least be a reboot; I think updating FCOS always requires a reboot anyway. Yeah, we've always required that. So my guess is that, because of the smartness or dumbness, depending on your point of view, of how this actually works with Kubernetes, it should actually be transparent to OKD users. Thanks for the context. So, if I find my notes again: yeah, that's cgroups v2, coming in approximately two months, after we rebase to Fedora 34. All right.
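For context on the "kernel arguments change" Vadim mentions: on OpenShift-family clusters, a cgroups v2 experiment is typically expressed as a MachineConfig that sets the systemd unified-hierarchy kernel argument. The snippet below is only a rough sketch of that idea; the object name is made up, and the exact argument OKD generates or removes for its default may differ, so this is not the official switch procedure:

```yaml
# Hypothetical sketch: opt a worker pool into cgroups v2 via kernel args.
# Applying this triggers a reboot of the affected nodes; the argument and
# the object name are assumptions, not the documented OKD procedure.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-cgroups-v2
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - systemd.unified_cgroup_hierarchy=1
```

As discussed above, podman containers created directly on the host would need to be recreated after such a switch, while kubelet-managed pods should be recreated automatically.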
And the last item I have is the Fedora CoreOS countme change. We're adding a way to count the number of Fedora CoreOS nodes and how long they live. It's something we're rolling out across all our rpm-ostree based variants of Fedora, so it's not just Fedora CoreOS, it's also Silverblue and Fedora IoT. It's not enabled yet in Fedora CoreOS, but it will be enabled in Fedora 34 for Silverblue and IoT. We wanted to give a longer heads-up for Fedora CoreOS users, so this one won't land in Fedora CoreOS stable for at least three months; you have three months to look at the change and decide whether you want to be counted or not. The issue and the notes are here. Essentially it's highly privacy-preserving. It doesn't store anything and it doesn't send anything identifying; it just basically says to the Fedora servers, hey, I'm a Fedora CoreOS node and I've lived approximately this amount of time, and it's very approximate. You basically cannot derive anything from that. So it's a really, really small counting method, and we really care about privacy here. But if you feel uncomfortable with it, you can easily disable it; it's just masking a simple systemd unit and that's it. You can do that ahead of time, too: if you do it right now on your nodes in a machine config, you will never be counted (a rough sketch of what that could look like follows below). So yeah, that's mostly it. All the details on how to disable it are in the documentation linked from the issue I just pasted, and we'll make public posts on the CoreOS mailing list. We have a blog post on Fedora Magazine coming up too, so this is going to be very public. And that's mostly it from me for incoming Fedora CoreOS changes.

So just to be clear, the telemetry doesn't tell you that someone is running Fedora CoreOS in OKD; there isn't going to be any visibility into what a Fedora CoreOS node is being used for? Yeah, there's nothing that would tell us you are running in OKD. I don't think so. Is there? Yeah, sorry, it's just a count; it doesn't even identify the source company. It's just a count of how many Fedora CoreOS instances there are out there in the universe. Yes, we absolutely do not send anything about the node as part of the request. We essentially make only one HTTP GET request, and the only information sent with it is that we are a Fedora CoreOS system using rpm-ostree, that we've been around for approximately a week, a month, a few months, or more than that, and the Fedora version underneath, so Fedora 33 or Fedora 34, but that's about it. We don't send hostnames, we don't send the IP address; of course, when you make an HTTP request the server sees your IP address, but we don't send the actual node IP address or any other information in the request itself. Looks good. That'll be helpful, definitely, for everybody in the community to have some visibility of that. That would be great.
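Here is a rough sketch of the kind of MachineConfig that could mask the counting unit ahead of time, as Timothy describes. The unit name (rpm-ostree-countme.timer) and the object naming are assumptions, so check the documentation linked from the tracking issue for the real instructions:

```yaml
# Hypothetical sketch: opt out of the countme ping by masking its timer
# on worker nodes. The unit name and MachineConfig name are assumptions.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-disable-countme
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: rpm-ostree-countme.timer
          enabled: false
          mask: true
```

A similar config with the master role label would cover control-plane nodes; applying it before the feature is enabled means the node never sends the ping.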
Any other questions for Timothy? Other than, I think, maybe a blog post on the cgroups v2 stuff, just so that people don't freak out when they hear about it, because it sounds scarier than it is. When people read that they may freak out a little, but if we can test it early and get some feedback, that would be great, and we can then inform the OCP engineers as well if something goes awry. Yeah, the cgroups v2 change is live right now in Fedora CoreOS next, so if you want to try that out on your nodes, you can do that. Okay.

So I'm going to switch topics to docs for a little bit, if people don't have more questions for Timothy, just because I wanted to sync up with Bruce around the taxonomy work he was doing; it was just a gist file the last time I looked. Out of the docs working group last week, Bruce developed a very nice taxonomy. I don't know if you want to share that again. Did you get any further with it in terms of the blog? I committed to creating a blog post out of it. Or are you in the throes of exams? Well, I am in the throes, but I did add a few more words at the top. You wanted about 200 words and I don't think I quite got there, but that's okay. Do you want to just share that with folks and maybe walk through it quickly, and then I'll explain what we're going to try to do with it in terms of creating a primer, or checklist, for people who are doing installs and deploys. Let me just find it. I did think I saw Jamie on the call, so I'm going to put Jamie on the spot next, so get ready. Okay, this should be correct, yeah; that should get you to it. Do you want me to share the screen and then you can just explain what it is that we're doing? I'll just share the screen. It's just a text document, so it's meant to be read with word wrap on, otherwise, as you see, things go off into the sunset on the right-hand side. There we go. I don't know if you can turn on word wrap from there. Yeah, okay, so you should be able to see the screen.

So basically the idea was that a lot of people have produced some excellent guides. I'm probably a little bit partial to Craig's, because that was the one I was first most successful in getting running, but often you need to make modifications to them based on your local situation. So I was basically trying to pull out, in a visible way, all of the choices that you need to think about if you are setting up a UPI deployment, because sometimes the choices are deeply hidden in the setup material. For instance, going with Charles's guide, you pretty much have to get in and actually look at some of the scripts to figure out what he's doing. In many cases I think the choices you make aren't critical to getting the end result running, but you do have to know what you're doing once you get off script, and it is helpful to know that. It takes a lot of work to put one of these blogs together, and they get outdated almost immediately; that's just the way technology goes, and I don't think there's anything you can do about that part. But I think that if you're aware of the underlying things you have to do, that's helpful, and it also might help you pick out the particular guide that you're going to try to follow.
So I was just laying out the choices, and I haven't really weighed in on any of them, because I think they're obvious to most people who have been through it, so I don't take much credit for just writing down the obvious. I first pulled out, just to get it out of the way, the various cloud providers where you can use the provider-supplied infrastructure, because in theory that should be relatively automatic, and then went on to the UPI choices. So that's about it. I think I've got the choices that are in any of the guides I've been familiar with, so if there are any missing, feel free to add them in. The one thing that I didn't see much discussion on, and that I like to do personally, is dealing with certificates, because browsers are getting much more hostile these days to self-signed certificates, and since you can get an actual domain for like 20 bucks for a .ca, maybe 25 for a .com, which is fantastically cheap, it's not that hard to get real certificates. And because you have to have a wildcard certificate, like *.apps.whatever, Let's Encrypt and so on are a bit more particular about who you are before they give you that, so that sort of interacts with having a real domain. But again, if you're going to have a hobby setup that's not going out onto the internet, then you don't really care about that so much, until your browser doesn't let you connect to it at all, which I can see coming. So anyway, that's about it, in more words than I probably should have used.

Okay, so what we're going to do is take that and turn it first into a blog post that other people can add on to, because you can make a pull request against the blog post and merge in other suggestions, just using markup. I was going to take that next step, hopefully before next week, and turn it into something, but if people have other things they think should be on there, once it's posted or before that, please feel free to pile on. What we're thinking is that this will end up being more than a blog post: as we develop it and add links to little blurbs for each of those items, not full-on links to the entire resources but to the appropriate snippet, it will become one of the pages in the guidance that Jamie and Mike McEwen have been working on, just another page in there. But I thought we would try to develop it in a collaborative way by getting people to give their tips on where the best write-up is for each of these little things. I think the approach Bruce has here is nice, because often you start down an install path and don't think about everything else that's coming; you just hit the first couple of install steps and forget that you have all of these things in front of you. So, yeah, I like that: the pilot's pre-flight check. If you don't know what all of these terms are, don't start. So maybe that's it. Thank you for doing that. We walked through it last week, and I just thought if I turn it into a blog post now, post it to the working group, and ask people to annotate it with links, that's kind of where we're going. Cool. And are you a pilot, Bruce, is that true? Yeah, I am, but only a private license, and it's been a while; it's been a while since I've flown to Albuquerque over the Rockies. Nice. So that's where we're at with the checklist primer thing that Bruce has so kindly created.
Jamie, do you want to take over and talk a little bit about where you're at and what the update is on the docs coming out of the deployment and testing workshop effort? So, the last little bit I started on the weekend and will probably finish in the next day or two: basically taking all of the dependencies that are necessary in Chara's article and putting them into the stub page. He's got quite a lengthy article, so it's taking me some time to get all the dependencies out. Once that's done we are good to go; all of the other stub pages have been filled in, and then we can just merge it in, do a pull request or however we want to do it, to get it merged into the OKD repo. Awesome, that will be a really big push, I think, and a great outcome from that working group meeting.

The other thing is that next week we have a docs meeting scheduled. I am not available to run it, and I'm wondering if we want to take a week off from the docs stuff or if someone else could host it; that would have to be a Red Hatter, I think. Take a week off? I don't really actually get the time off, I'm in another planning session; but if we want to have it, I'm happy to host it. Okay, Mike, maybe we will, because then you all can work through anything with the migration that didn't work. We'll still have it, and I will give you the power to create the BlueJeans, or I'll just turn the BlueJeans on and hit the record button for the other meeting; you can hit me up for the moderator code or something, Diane, we can coordinate. Apparently Joseph wants to take a week off. I was just going to say, Joseph isn't getting the week off, because I'm going to thank him out loud in front of everybody for something he's about to do: he is going to be our OKD person for the OpenShift Commons recording at KubeCon on May 4th. He's going to be giving a talk that he's recording this coming Monday, so we'll get that up there, and he's going to be our voice and face for OKD, and for where OKD is going. I've got one little thing, so we're going to do that. And if you haven't registered for the KubeCon gathering, it's free, so just join up.

The only other good news I have is that they are finally printing some OKD t-shirts for me, and I will have them available at KubeCon itself. So if you visit me in the KubeCon OKD booth, whatever it is they've set up for me, I will get you all links to order a t-shirt yourself and pick the size and everything. It took a bit of doing, and unfortunately, Joseph, the version of the OKD panda that you so beautifully recreated and flattened and made hipster and cool didn't meet brand standards, so they made another panda; but it's still a panda, so that's the good news. I really like Joseph's old panda too, and I expect to get a shirt from you again some day. The new one is a real-life panda, but I'm like, yeah, some day. So that's the good news, and I'm actually really looking forward to KubeCon and Joseph's talk, and there's a whole bunch of other great content too. I just recorded an awesome talk on Podman and all the related tools by some folks at IBM and WorkPay; there's just some really good content there. But KubeCon itself is going to be very interesting, so if you're there, come visit me, because I'll be the one person hanging out in the OKD booth at Community Central for a while.
If anyone else who's a Red Hatter wants to hang out there, I can get you in. It might be bitter at times. It's not; we put a lot of sugar into the lemonade. There you go. So, cool, that's what I thought I had for the agenda. The other thing we've been working on, or thinking about, on the docs side is an overall docs strategy, and to that end I think Amy is going to try to join and help us with that, because she's done a lot of amazing work with the OpenStack community. So I'm going to try to use some of her expertise, maybe working with Jamie and Bruce and Mike and other folks, on the every-other-week meetings. Amy might have jumped off because she had another call at the half hour, but she'll be there next week too.

So that leaves us with almost 30 minutes left. Are there other things, Vadim or anyone else, that you want to bring up, or any burning issues from the Kubernetes Slack channel? No, not really. My only worry is that we need a lot of testing on the vSphere bug and the fix we have in the nightlies, and since it's pretty elusive, it's rather hard to prove that it has gone away. Other than that, I think we're in the clear. We also revived our relationship with the Operator Framework folks, and next week we'll start working on the OKD-specific catalog. I'm not yet aware of the level of my involvement there; hopefully things will just start working, but it might take quite a lot of time, which means we would need some community testing. Other than that, I think we're in very good shape. This vSphere bug, you mean the hardware version one? Yes, and the other OpenShift SDN stuff is unrelated. The fix is to effectively disable the offloading, I think; I'm not sure what the implications of that would be, so hopefully we won't have any significant regression. On the operator stuff, I just put in a link so folks are caught up: there was a conversation about the KubeVirt and Tekton Pipelines operators that were added to the wish list a couple of days ago, and Christian made an update, so take a look at that.

And because of the vSphere thing, maybe we could invite the people that reported the bug; if it works with the nightly, it would be great to get a positive result from them. Actually, this bug was very poorly managed, and that's honestly my fault. It originated from the GCP UPI issue, which has been fixed by opening ports and does not reproduce with IPI, and then, since it had similar symptoms, folks from vSphere reported on the same issue as well. It took me quite a while to figure out that it's in fact a different issue. So on the vSphere testing, is there an ask for this, a link to the bug and what to test for? I have a question, because the FCOS people released new FCOS versions with hardware version 13, and around when this bug was reported they had released FCOS with version 15, which in fact is a problem. But since FCOS is currently released with hardware version 13, it shouldn't be a problem at all with new OKD installations, so I don't know how to reproduce it with new installations. Users can change the hardware version on their own, and unfortunately they are very inclined to do so, because the vSphere CSI driver requires something like 15 or maybe even 16. That leads to people installing things successfully, then changing the hardware version, trying to apply the CSI drivers, and the network suddenly goes down, which complicates the debugging a lot.
But apparently it's very easy to reproduce by just changing the version and installing things with OpenShift SDN. Are you going to write up a little vSphere testing protocol for this real quick? It's basically just a paragraph and some links. Yes. Unfortunately it's not one that I can test, because I'm stuck at hardware level 13 until we actually upgrade physical hardware, so this isn't something I can even test, and you were the one I was going to ask if you could. I would love to; it's a very complicated bug. But if this fix solves things for vSphere, does that mean people will switch to hardware version 15 again, or is that somehow related to this bug? No, I think it's fair for the initial Fedora CoreOS images to stick to the lowest possible hardware version so that folks on vSphere 6.5 can use them, and users can upgrade the version on their own if they have to. It's definitely not related to the current issue; it's actually just a happy coincidence that we're starting with the lowest version. So is there anyone on this call that can test this? I have never seen the problem; we are just stuck on OVN-Kubernetes, and we were not brave enough to change the network plugin in our production cluster to OpenShift SDN on 4.6. But if there is a 4.7 version with OpenShift SDN working on vSphere, we will try to migrate a 4.5 cluster to 4.6 with OpenShift SDN and this fix, if it's available.

Is this something, Vadim, that we could maybe get you to put in a short email to the group mailing list, just to make people aware of this and the other vSphere stuff, if you really want feedback on this bug? I'm just trying to think a little out of the box here, rather than only asking in the working group, because there are about 300 people lurking on that OKD working group mailing list. Right, that's one of the options; I'm just thinking it won't be very easy to gather feedback that way. I think I'll start by poking people in the #openshift-dev Slack channel, because the feedback loop would be shorter, and if we don't get anything, we'll move to the mailing list. At least maybe you could post a link to the issue and ask people to document what they see; I think it's worth trying, and then just ask them for feedback on #openshift-dev if that's where you want it. So yeah, it sounds like you have two maybe-volunteers, Bruce and Joseph, but it probably takes more than two folks to do some real, actual testing, not that you're not worthy. Is there anything else that we should be raising here, anything else anyone is working on that's exciting?

I have a question; I raised it on the dev channel and didn't get a response. I'm playing with Istio 1.9.2, and one of the things I seem to have found is that it doesn't like DeploymentConfigs, DCs. Has anybody run into that? I had to convert a bunch of stuff to plain Deployments; just wondering if anybody's seen anything like that. To you in particular, I can tell you, John, that we were switching from DeploymentConfigs to Deployments with our customers, just to be closer to plain Kubernetes. I don't know if you want an experience report on that, but converting a DeploymentConfig to a Deployment is a matter of a few seconds. The difference is that on every push of your images you have to restart your containers yourself, but you can mitigate that with GitOps tools like Argo CD. We use Argo CD, and it can help you a lot in restarting deployments when new images are available (there's a rough sketch of a converted Deployment below).
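For illustration only, a minimal sketch of what one of those converted objects might look like as a plain Kubernetes Deployment. The names, image reference, and replica count are made up, and unlike a DeploymentConfig it won't automatically roll out when an ImageStream tag changes, which is the gap a GitOps tool or an image trigger has to fill:

```yaml
# Hypothetical example of a DeploymentConfig converted to a Deployment.
# All names and the image reference are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          # Deployments do not watch ImageStreams, so a new image tag here
          # has to be pushed by CI, a GitOps tool, or an image trigger.
          image: image-registry.openshift-image-registry.svc:5000/my-project/my-app:latest
```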
We use Jenkins for a lot of our deployment stuff, and I probably have about 150 DeploymentConfigs, so going through and changing those is just going to be a pain; I was just wondering if somebody had experience with it. You need something that restarts your containers, and I can heavily recommend Argo CD or a GitOps tool like that, which changes your image tag; Argo CD will then roll out your new Deployment and restart your containers. All righty, like I said, I'll give it a whirl, because the Istio group didn't have anything to say about it either, so I don't know if DeploymentConfigs are just not supported in Istio, which I would find odd, but maybe not, since they're not Kubernetes-specific. Anyway, that's all I've got. I think the default for the last few OpenShift or OKD versions is to use Deployments, not DeploymentConfigs; I read something about that in the change logs, that the new default in OpenShift is Deployments rather than DeploymentConfigs. Yeah, we use a bunch of templates, but it's just one of those things about moving to a new system and trying to use new technology: you have to redo some of your work, but that's probably a good thing in general. Thank you, man. Cool.

Anybody else's problems we can solve in the next 10 minutes? Otherwise I might give you all 10 minutes back, and we'll get Mike to use that up running the next meeting. I would propose to block all 4.7 releases until the next release, if it works and has solved all the problems that we've been suffering from for a few releases, like this hardware version thing, the problem with the routers, and so on. I would really like to propose blocking all intermediate releases, if that is possible, so users that want to upgrade from 4.6 get a solid 4.7 release that works out of the box without any surprises.

Actually, if we've got a few more minutes, I could comment on that as well, because I just saw this morning that somebody was asking for a long-term support version, which Kristen properly shot down, because this is self-supported anyway. But what the person may actually have been trying to get at is that a stable stream should be really stable, as opposed to a testing stream or a next stream or something like that. From Vadim's comment I can understand that it would be complicated to set up these other streams, but I could interpret what Joseph was just asking for as a stable stream, and anybody that wants to use this for real, as opposed to just a testing lab, doesn't want things to have the potential of breaking whenever they move to a new version, because you can lose like two weeks and have to reinstall and redeploy everything from source, which is a bit inconvenient. From my perspective, what we have currently is a fast channel and not a stable channel. At some point in every version of OKD we have a stable release, but since 4.6 we have had lots of not-so-stable versions, which is what I would expect more in a fast channel. I think a stable channel, a really community-approved stable tag, would be a great idea. When you say stable, do you mean zero bugs? Do you mean upgradable?
Not zero bugs, but something like: the hardest problems that block you completely are solved. We have tests to ensure that we know vSphere and AWS are installable and upgradable; that doesn't mean your particular setup will install, and there is no viable way we can test that. I can understand a custom version plus OpenShift SDN, but that would explode our test matrix. OCP has nightlies and we only block on two jobs; the others are mostly informing, because we wouldn't be able to keep up with the level of testing we'd need. Maybe an idea: if you get feedback from enough people that a version is absolutely fine, we mark it as stable. That means we might not have a stable version for months, maybe years. I mean, I'm fine with that; the problem is we would then have to support the existing version we previously declared stable, and somebody has to do that work. I am not signing up for that, ever. If we have a rolling release, that's absolutely okay, but maybe it makes sense that within this rolling release we have one channel, we mark some versions as more stable than the others, and they go over to a different channel which doesn't get updates every two weeks.

We're getting to the point of talking about checkpoints and stable release tags, and, yeah, no. If we do that, first of all the Fedora CoreOS people will be very angry with us, because they are trying very hard to kill that idea flat on the ground, and the OCP people will be angry at us, because now we're creating a situation in which we are holding back things that they are themselves releasing to production, which creates all kinds of weird questions about what is upstream, sidestream, downstream, what's the release cadence, what's going on here. It is not practical for us to move like that, and it is also not practical for us to suggest that we can move slower than OCP itself, because at the end of the day they are the folks writing the majority of the code that makes up the OKD thing. If there is a problem with your use case, or whatever, and you don't have a way to make sure it stays fixed, then that is a defect in the testing apparatus and not a defect in the code base, because fixing it once and then having a test to make sure it doesn't break again is the whole effing point.

I don't know if my wish is something feasible, but maybe all the people that use OKD in production understand me: it would be great to have some kind of tag that assures you that lots of people say this version is absolutely usable, because I had a few situations with my team, maybe you can follow my chat conversations, where things were great with OKD, but we also had a few upgrades that were an absolute mess. I can perfectly understand the goal of having a stable release, and we do have stable releases; the question is how we achieve this in a feasible way, so that we don't have to wait for somebody on PTO to give us the final approval to actually release. Absolutely, that's the question. There are things that would mitigate it that wouldn't be that terrible. I know that sometimes something totally silly can take your entire cluster down in a non-recoverable way; I lost my test cluster because of a non-base64-encoded password in the authentication part, which had worked up until then, through many upgrades. It would have been nice if the release notes had said, oh hey, by the way, this is going to kill your system if you don't fix it first, because once the system was upgrading, the MCO would not pick up the new, correct pull secret; it just kept perseverating.
And on Slack, when I put that up, Vadim said try this; I tried that, it didn't work. John said, oh yeah, I had to rebuild my system. Okay, so fine, a world of hurt. This is a tricky part: we effectively cannot test it automatically, because all our CI systems have to report back so that we can collect meaningful stats. And on rolling back, I haven't found any approved way of reverting an upgrade to the previous version so that you can try again. You can restore from a backup; that's the only way to roll back an upgrade, the etcd backup. Yeah, okay. The etcd backup does not help if a host OS upgrade is the problem; the host OS is reverted by the rpm-ostree rollback commands. To be precise: if your system had been working perfectly, then when you restore an etcd backup, MCD would notice that you should actually be on the previous deployment and do that for you, but that assumes MCD can actually do deployments, and at some point you probably have to help it and do things manually. Well, I believe the backups are manual, right? Yes. That's a feature which would fix it, right; many other systems do an automatic backup before they upgrade. That's the tricky part: all we have is the cluster, so even when we can make a snapshot, there is no place to store it, because all you have is the cluster. It has to be stored somewhere else. You can save it on local disk, but if your node goes down or the filesystem gets corrupted, then that backup is unrecoverable. If the backup is stored outside the cluster, then if your cluster burns down in a fire at some point you're still not going to be able to recover, but for anything less catastrophic it would help. Otherwise it just gives a false sense of security: yay, we made the backup, don't worry, but actually it's stored right there on the node. So it has to be scripted, and we should probably stress more in the docs that making backups is a critical part if you want to roll back.

Before we get too busy and keep going down this rabbit hole, I just have to ask something: do we actually have a way to show people how many folks are on a particular version of OKD? I vaguely recall that we have telemetry; do we have some graph or something that we could put on a web page somewhere, showing what kinds of deployments are on which versions? It's all private, yeah, it's all private background stuff. Also, the problem is that only folks who are using Red Hat's pull secret are signed in, so this number, which is around 500 now, doesn't tell us anything; it doesn't tell us how many are outside of that. The countme data from Fedora might give us some ballpark number, but again, it's very hard to say. Should we ask about a way to leverage countme to say "this is an OKD FCOS node" and count those? If the actual ask is what it sounds like, that somebody wants to know how many people are on a particular snapshot or version, on a particular type of infrastructure, that should be a question we can answer; we can answer that with telemetry, but not with countme.

Time to break, time out. I would like to move this into either the Kubernetes chat channel or onto the mailing list. I think it's good content, but I do have to kill you all and stop this meeting now so I can jump into another one, because I can see the next round of folks joining.
But, you know, the more testing we can do, the better. I'm just reading the messages in chat: we are earning money with OKD currently, and we are switching to OpenShift because we have lots of business-critical applications running on OKD. That's perfectly fine for us, but OKD is absolutely great for getting your feet wet with Kubernetes and cloud technologies, and that's why it is worth bringing OKD into your company. Yes, we all want you to make money any way you can, and we all want you to test. So thank you all. We'll have something on the 20th for the docs. Joseph, I'll talk to you again; we can talk about this in the talk you're recording on Monday, so we'll get there. Thank you all for coming today, always a pleasure. Thank you. Bye, bye, bye.