Okay, welcome everyone. My name is Maciej, and for the next thirty-something minutes I will be talking about what SIG Apps is, what we do, and what the future looks like for all sorts of workload controllers. My co-presenters, Janet and Ken, who unfortunately were not able to be with us today, are the other co-chairs of SIG Apps. If you've ever attended our sessions, which happen every other Monday, you've probably seen either Ken or Janet helping to push your PRs through, approve your enhancements, or just lead the discussions in our bi-weekly meetings. Like I said, they happen every other Monday. I haven't looked at our agenda for Monday the 13th, which is when the next instance of our call should be; depending on whether we have any particular topics we will hold the call, and if there are no topics we will cancel prior to the meeting.

You can also reach us on the Kubernetes Slack; there is a SIG Apps channel if you have any issues or PRs. I'm fully aware that even when you do submit PRs, with the load of PRs across the entire Kubernetes project it's very hard to get through all of them. I announced GitHub notification bankruptcy years ago, so if you have any particular PRs, it's best to just ping us on that Slack channel, or ping me directly, I don't mind. I always tell people: ping me, give me a week, and if I don't respond, ping me again. The load varies throughout the Kubernetes release cycle depending on where we are, I'll talk about that a little bit more, and it also depends on everything else that is happening; I wear a couple of different hats in the Kubernetes community. So if you have to ping a couple of times, please don't be offended. It doesn't mean that I don't like your PRs or anything like that. It's just that the load is sometimes very high, especially towards every freeze, whether that's the enhancements freeze, the code freeze, or the test freeze coming up next week, and getting through PRs requires time and attention. An additional option, if you're not into Slack, is our email group, so you can reach us on that as well.

Before we get into the topics that we've been finishing and are currently working on, a few words about the mission behind the special interest group. In very simple terms, it's about running any kind of workload. That includes batch: there was a session prior to lunch where we were discussing batch workloads, and we have a separate working group devoted specifically to all sorts of AI, ML, and HPC workloads. If you're interested in that topic you can either bring it to SIG Apps or visit the batch working group.
We have a dedicated channel just for batch workloads. If you're talking about stateful or stateless workloads, that is covered as part of the broader ecosystem as well. And if you have an interesting application that you created, whether it's already open source or you are open-sourcing it, and it runs on Kubernetes and solves an interesting problem, SIG Apps is the place where you can share that knowledge with the broader community. I also linked in the slides our annual report, which covers some of the topics I will be talking about, plus some additional community health information around the special interest group.

Let's go through the features, because that's literally the most interesting part of what the special interest group is working on. Over the past three releases we've been tackling and promoting several features. The two I currently have on the slide are stable. The first of them is about tracking job status. If you've been running jobs, you might be aware that the job controller was written in the early days of Kubernetes, somewhere back in 2015 more or less, and it was written in such a way that it kept the pods around after they completed in order to calculate the status of the job. That was okay while your job was 10 or 20 pods, but if you think about batch workloads where you're running a hundred, a thousand, or even more pods, keeping those thousand finished pods just to be able to count how many failed and how many succeeded is very irresponsible, and it obviously consumes a lot of unnecessary resources, because all of those pods are finished. We recognized the problem very early on; there is an issue which was opened literally a couple of weeks or months after we completed the job controller implementation, which said we have to do this differently, and we were just looking for volunteers. One of the initial tasks the batch working group picked up was exactly that. What they did is add a finalizer which is placed on every single pod created within the job, and that finalizer is removed in a way that allows us to properly count how many pods finished, whether with a success or a failure, and to keep an accurate status without unnecessarily keeping those resources on the cluster. That went GA in 1.26. So if you're using something newer than 1.26 you're already on the safe side; if you're on something prior to 1.26 I would encourage you to jump to 1.26 or newer, especially since we are almost at the point of releasing 1.29 in less than a month.
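To make that concrete, this is roughly what a job-owned pod looks like while the controller still needs to count it. This is an illustrative sketch from memory, not output from a real cluster: the pod name and UID are made up, and you should double-check the exact finalizer key against your version's documentation.

```yaml
# Sketch of a Job-owned pod carrying the tracking finalizer.
# The job controller removes the finalizer only after it has recorded the
# pod's terminal state (succeeded/failed) in the Job status, so the pod
# object can then be deleted without losing the count.
apiVersion: v1
kind: Pod
metadata:
  name: pi-4k7x9                       # hypothetical pod name
  labels:
    job-name: pi
  finalizers:
  - batch.kubernetes.io/job-tracking   # added by the job controller
  ownerReferences:
  - apiVersion: batch/v1
    kind: Job
    name: pi
    controller: true
    uid: 0a1b2c3d-0000-0000-0000-000000000000   # placeholder UID
```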
The other stable feature was time zone support in cron jobs. When we originally wrote cron jobs we were very opinionated: no, we don't want to support time zones, and you probably don't want to either. If you wanted some kind of time zone handling, we figured it was best for the CronJob resource to always stick with the time zone set on the kube-controller-manager, and any translation should be done on the client side. So whether it's your web application or something else, it should be responsible for translating from whatever you're comfortable with locally into whatever the cluster uses. Of course, a lot of people felt differently; I think this was one of the most requested features. Thankfully all the underlying libraries allowed us to enable it easily, and we were able to quickly promote it through all the stages, so that one went GA in 1.27.

Moving on to beta features, there's a bunch of them that we've pushed past the initial stages. Beta basically means that the feature is available in the cluster by default, so if you're interested in any of these and you're on the version I mention in parentheses, you should be able to use them. The first one is the healthy-pod eviction policy for PDBs, pod disruption budgets. A pod disruption budget is a resource that allows you to ensure a certain number of pods stays available for your application; it protects your application from going below that limit. So if someone is trying to evict your pods for various reasons, whether there is resource exhaustion or you're trying to drain a node or something like that, pod disruption budgets ensure that a minimal amount of your resources, and by resources I mean pods specifically, will always be available. But that raised some issues, for example during upgrades: we noticed that a pod might not be healthy yet while it is already counted as part of the PDB. In some cases that blocked upgrades, because the pod was theoretically not usable since it wasn't healthy, but it was already counted against the PDB, and that did not allow it to be evicted. Because of backwards compatibility, which we take very seriously within the entire Kubernetes project, we could not just change the default, which was one of the first options some of us considered. We said no, we cannot just break users, because some people already rely on this particular behavior and are writing additional code around it. So what we did was introduce an additional field in the pod disruption budget specification. That field lets you say whether you want the default, which is the previous behavior, or the newer behavior, which says a pod has to actually be healthy to be counted as a valid pod for the PDB, and when it's not healthy you can quickly and easily just evict it if you need to. That one is currently in beta. We were considering promoting it in 1.29, but we want to do a lot more testing, so it is being kept in beta for one more release to ensure that once we promote it to stable, it is actually rock solid.
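For the cron job time zone feature, here is a minimal sketch of what the spec looks like. The name, schedule, and image are made up for illustration; the timeZone field is the piece this feature adds.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report            # hypothetical name
spec:
  schedule: "30 2 * * *"          # interpreted in the time zone below,
  timeZone: "Europe/Warsaw"       # not in the controller manager's local zone
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: busybox        # placeholder image
            command: ["sh", "-c", "date; echo generating report"]
```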
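And for the PDB change just described, this is roughly how the opt-in looks. If I recall the field name correctly it is unhealthyPodEvictionPolicy; omitting it keeps the old behavior, and the selector, label, and numbers here are invented.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                   # hypothetical name
spec:
  minAvailable: 2                 # keep at least two pods of the app available
  selector:
    matchLabels:
      app: web                    # assumed label on the protected pods
  # AlwaysAllow = pods that are running but not yet healthy may be evicted
  # freely; leaving the field out keeps the previous (default) behavior.
  unhealthyPodEvictionPolicy: AlwaysAllow
```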
Another thing: stateful sets by default will not touch your PVCs. When you're creating your stateful set, you specify what kind of volume claim templates you want created, but because we wanted to ensure that the data behind the stateful set is always secure and safe, we said no, we're not going to touch the PVCs backing your stateful set. If you care about cleaning them up, you have to be explicit and do it manually. It turns out that this works for a lot of cases, but it appeared that there are cases where people are okay with getting those PVCs deleted. So we decided to again extend the stateful set specification with the ability to opt in: yes, I'm fully aware that this is a destructive thing, and whenever I'm, I don't know, migrating pods or scaling up or down, the data behind the stateful set can safely be removed. We're currently in beta with that one. There is still some back and forth between SIG Apps and SIG Storage, which is responsible for the storage area specifically, about how far we want to go and whether all the edge cases have been considered and addressed before we promote it as a stable feature.

Another interesting addition coming from the batch working group is the elastic indexed job. If you've ever worked with a job, whether an indexed job or a regular one, it has a certain limitation that was baked in from the early days, which literally said: if you put a particular number of completions in the specification of a job, you cannot modify it. When we were considering support for various kinds of batch workloads, it turned out that we would like to be able to change the completions number in some cases. That's how we decided to introduce something called the elastic indexed job. In the elastic indexed job case you can modify the completions, but only if you're modifying completions together with parallelism. This allows some specific batch use cases to work even better. If you're interested in more details, there's a KEP link on the slide so you can check it out. I'm pretty sure this one will be rather straightforward and we'll be able to push it forward promptly over the next release or two.
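Here is a minimal sketch of the stateful set opt-in; if I remember correctly the new stanza is persistentVolumeClaimRetentionPolicy, and everything else (names, image, storage size) is made up for illustration.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db                           # hypothetical name
spec:
  serviceName: db
  replicas: 3
  selector:
    matchLabels:
      app: db
  # Opt-in: without this stanza the PVCs are retained (the old behavior).
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete              # remove PVCs when the StatefulSet is deleted
    whenScaled: Delete               # remove PVCs of pods removed by a scale-down
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: postgres:15           # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```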
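For the elastic indexed job, the idea is simply that completions and parallelism on an Indexed job can now be updated together after creation. A sketch under that assumption, with the name and image invented:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: workers                 # hypothetical name
spec:
  completionMode: Indexed       # elastic resizing applies to Indexed jobs
  completions: 5                # can later be raised or lowered, but only
  parallelism: 5                # together with parallelism, keeping them equal
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox          # placeholder image
        command: ["sh", "-c", "echo index $JOB_COMPLETION_INDEX"]
# To grow the job later, patch both fields in a single update,
# e.g. to completions: 10 and parallelism: 10.
```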
Those will be counted as failures So rather than counting those as failures we can express currently in part in the job Specification what kind of conditions whether that will be pot pot conditions or exit codes What conditions we will not consider as a failing and as a failure and we will allow the controller to retry again It's an interesting and an interesting area of development under has been multiple developments recent addition recent addition were literally merged Over a week ago, so shortly before the actual 129 freeze Moving on with the beta few beta features that we that we have been working on So stateful set introduce something that is called stateful set slices. So normally up until now We had a specific numbers of Replicas that you had in your in your stateful set But if you are thinking about migrating those stateful set from one cluster to a different cluster Obviously, you need some kind of a third-party Orchestrator to allow that kind of migration scenarios You cannot do it manually because the numbers will not match whatever you would expect Adding a slice over there Allows us to say that oh This particular cluster is running the instances of my stateful set from one Until five and the other one is running from five until ten and then slowly Allowing yourself to migrate from one cluster over to the other We've noticed there's there has been a couple of glitches in the stateful set controller due to introducing some of those features So we're still iterating before moving forward with that one if you're interested in that area I totally welcome you to to join us help us with either testing or or pushing this feature forward Another two additions coming from the batch work group one which is rather Cosmetic I would say is adding the labors for Pot indexes in either the index job or a stateful set very often Our users complain that they have no Um supported way to get the num the the index number of your pod They were using Downward API to parse that out of the name of the pod in either cases So we've decided that it'll be probably best to expose this as a label on a On a pot in both cases that you could easily through downward API get the number There are some cases where you want to know What is your ordering in case of a stateful set or a job? A rather recent addition I Think it was recently promoted to be yeah 27 Was a replacement of pots in job when fully terminating? so we basically We differently count the statuses that's again for the batch work group We figure out that the way we are currently exposing that information is rather Missing and some people want to have that additional information in the status of the job. 
A rather recent addition, I think it was recently promoted to beta around 1.27, was around the replacement of pods in a job while they are fully terminating; we basically count the statuses differently. That's again for the batch working group: we figured out that the way we currently expose that information is lacking, and some people want that additional information in the status of the job. So we're slowly expanding those things.

Lastly, there was also a rather cosmetic change. When working with the cron job time zone support, we noticed that a lot of people are interested in knowing the date when the job was created, or when the actual schedule for creating the job was. We got an issue when we were rewriting the cron job controller from the previous iteration to the current one, which is a rather old topic by now; I've given a bunch of previous talks and discussions around it, and I'm happy to follow up after this session if you're interested. Basically, we stopped putting the timestamp in the name of the created job, and a lot of people actually relied on that fact. We only put it there to make the job name unique, but people started relying on that information in the name, which is something we never expected anyone would do. So we decided we want to give users something they can actually rely on, and we added a new annotation that exposes that information.

On to alpha features. I think I have two topics here; the alpha feature that I'm thinking about, the slide is probably wrong, is a rather recent addition for 1.28, where we added the ability to express when you actually replace pods in a job. When you're working with a batch workload on very limited resources, whether that's your node capacity, a specific quota, or specialized hardware, you cannot quickly replace pods. That is what happens in the normal controllers: if you're used to working with regular controllers, whether that's a Deployment, a ReplicaSet, or even a Job, the moment the controller sees a terminating pod it will replace it immediately. But in a case where you're at your limit, replacing the pod is not feasible, especially if you're using specialized hardware and all of the units of your hardware are in use; the hardware has to be freed first, and only then can it be reused. So for those cases we decided to extend the Job API with the ability to say: replace the pods, but only after they have fully terminated, rather than as soon as they start going down. We wait until the pod has either failed or succeeded completely, which ensures, for example, that if you're using specialized hardware, the hardware is released and can be reused. So yeah, I'll fix the slides after this presentation.

As for the topics we've been working on for 1.29, for which the code freeze was last week: a couple of things we were working on are exposing information about ready pods in your job; the retriable and non-retriable failures for jobs, which, like I mentioned, is an ongoing topic and probably will be for a couple more releases; the replacement of pods; and a recent addition, again coming from the batch working group. As you can see, a lot of the topics originate in the batch working group; I would say that more than 60% of the work SIG Apps currently does is around enabling all sorts of batch workloads. We've added the ability to specify a back-off limit per index: previously it was per entire job, and as we saw in the indexed job cases, it's actually preferable to be able to say that certain indexes can have a slightly different back-off limit rather than one global limit.
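Coming back to the pod index labels mentioned a moment ago, here is a sketch of reading the index through the downward API instead of parsing the pod name. I'm quoting the label keys from memory (apps.kubernetes.io/pod-index for stateful sets, batch.kubernetes.io/job-completion-index for indexed jobs), so verify them before relying on this.

```yaml
# Container snippet from a StatefulSet pod template; the same pattern works
# in an indexed Job with the job-completion-index label key instead.
containers:
- name: app
  image: busybox                      # placeholder image
  command: ["sh", "-c", "echo I am replica $POD_INDEX; sleep 3600"]
  env:
  - name: POD_INDEX
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['apps.kubernetes.io/pod-index']
```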
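Since the pod replacement policy and the per-index back-off limit are both Job spec knobs, here is one combined sketch. The field names (podReplacementPolicy, backoffLimitPerIndex, maxFailedIndexes) are from my memory, the name and image are invented, and whether the fields are usable depends on your Kubernetes version and feature gates, so treat this as illustrative only.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-workers                # hypothetical name
spec:
  completionMode: Indexed          # per-index back-off applies to Indexed jobs
  completions: 8
  parallelism: 8
  backoffLimitPerIndex: 2          # each index may fail twice before it is marked failed
  maxFailedIndexes: 3              # give up on the whole job once 3 indexes have failed
  # Only create a replacement once the old pod has fully terminated,
  # e.g. so scarce accelerators are actually released before reuse.
  podReplacementPolicy: Failed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: example.com/trainer:latest   # placeholder image
```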
I did mention the batch working group a couple of times over this entire presentation. If you are interested in the work they are doing and their mission: unfortunately the session for the batch working group was prior to lunch, so I cannot point you to it now, but I encourage you to have a look at the presentation as soon as it is available online on the CNCF channel on YouTube. In the meantime, you can join the bi-weekly Thursday calls for the batch working group. There's also a Slack channel, and an important thing is that it's on the Kubernetes Slack, because there is a separate batch working group within the CNCF Slack; they operate at a somewhat higher level than us. The engineers and folks who are writing the batch controllers and similar machinery, like me, hang out in the Kubernetes Slack. There's also an email group if you want to reach out to us.

And with that, I'm open to questions. I think there is a microphone over there, if you have any particular question. If not, I will be here for a little while, or later on at the Red Hat booth, and I'm happy to take any discussions or topics that you might want to have. Thank you. It looks like there are no questions, so I'll hang around here for a little while.