Okay. Yeah, my name is Abdullah Gharaibeh, and I'll be moderating this panel. So we've been discussing for a while now, within the batch working group, the idea of putting this panel together. Because we always thought that Kubernetes was mostly optimized for service-type workloads. There have been a lot of optimizations related to services, like topology-based routing, rolling updates, and all these kinds of things, focused on microservices. Now, as we've seen recently, now that Kubernetes has become pretty much the de facto orchestrator for containers and has proved to be quite nice for running service-type workloads, the community is trying to converge other types of workloads onto that same platform. But because we haven't focused as a community on running batch workloads on Kubernetes, there are a lot of gaps that we know of. And so we thought, okay, maybe we should discuss this and have a look. We have these discussions a lot, and during the day we had multiple talks giving us basically a laundry list of gaps. But people are still running batch workloads on Kubernetes even though we have all these gaps. And so we thought, what about bringing together four different platforms that were built on top of Kubernetes? We have a platform from G-Research, Armada. We've got Apache YuniKorn. We have MCAD from IBM. And we have KubeFlux, or Flux, I guess; it's Fluence, from Lawrence Livermore National Laboratory and IBM. Yeah. And so in this panel, I guess, what we could discuss is: what are the gaps that we have in Kubernetes for running batch workloads, and how can we bridge these gaps? And I guess we can start with Daniel.

Sure, yeah. So I just want to start off by saying my background is in HPC, bare-metal HPC. I've been introduced to the Kubernetes ecosystem relatively recently and have been really, really excited by its capabilities and its ability to enhance HPC itself. One thing that I like to think about in terms of HPC versus Kubernetes is declarative versus imperative models. From my point of view, HPC comes from a much more imperative model, in that the user is accustomed to specifying exactly how things will run, and exactly where, because they assume that they know best how to get the most performance out of their particular application. The Kubernetes model is much more declaratively oriented, where basically you want to maintain a particular state. So how can we merge these two models, and is that even possible to begin with? I think that's a real big challenge. I don't know if there's a gap per se in batch for Kubernetes; I think it's more a clash of models that hopefully will converge on one another.
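[Editor's note: to make the contrast concrete, here is a minimal sketch of the declarative model Daniel describes, using the Kubernetes Python client. Rather than telling the system how and where to run, you record a desired state (a Job with four completions) and let the controllers reconcile toward it. The image name and resource numbers are illustrative, not from the panel.]

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

# Declarative: we describe the state we want (4 completed pods) and the
# Job controller drives the cluster toward it; we never say where or when.
job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="sim-run-123"),
    spec=client.V1JobSpec(
        completions=4,
        parallelism=4,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="worker",
                        image="registry.example.com/sim:latest",  # hypothetical image
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "4", "memory": "8Gi"},
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```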
Thank you. Albin, do you want to introduce yourself and your background? Sure, my name is Albin, and I work with Jamie on the Armada project, which he presented earlier. I'll just go into three things that we have figured out are challenging about running our batch workloads on top of Kubernetes, and indeed these are some of the things we try to alleviate with Armada.

One of these things is that we're really scared about overloading etcd. Armada was originally designed as a buffer to be placed in front of etcd, such that when one of the users submits 100,000 jobs within, say, 10 seconds, we don't cause etcd to completely fall over. Because etcd does total order broadcast, if any one user completely overloads etcd, there's no service for anyone on the cluster anymore.

The second thing is that, as discussed previously, HPC workloads are often designed in very much a "when this happens, do something else" fashion. And there are events in Kubernetes, right, but these are informational and not really designed to build automation on top of. In particular, something that we are struggling with now is determining from outside a cluster whether a pod was preempted. It turns out that the kube-scheduler will preempt a pod and then, after that, create an event saying that the pod was preempted. But there are no guarantees that this event will actually be produced, or indeed on any ordering between that event and other events. So, looking into the cluster, it can be difficult to determine what happened to your pods and to act accordingly.

And finally, because Kubernetes sort of grew up in the microservices world, there's still a lot of tooling missing. For example, gang scheduling. There's a lot of fragmentation around gang scheduling, and this is something that we really want to support in Armada, and it's not really clear how we will do this. There are many options.
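[Editor's note: as a hedged sketch of the preemption-detection problem Albin describes, this is roughly what watching for preemption events looks like with the Kubernetes Python client. The event reason string and the reconciliation logic are assumptions for illustration; his point is precisely that you cannot rely on this stream alone.]

```python
from kubernetes import client, config, watch

config.load_kube_config()
core = client.CoreV1Api()

# Events are best-effort: neither delivery nor ordering is guaranteed,
# so treat them only as hints and reconcile against the actual pod state.
for item in watch.Watch().stream(core.list_namespaced_event, namespace="jobs"):
    ev = item["object"]
    if ev.involved_object.kind != "Pod" or ev.reason != "Preempted":
        continue
    try:
        pod = core.read_namespaced_pod(ev.involved_object.name, "jobs")
        print(f"{pod.metadata.name} still exists, phase={pod.status.phase}")
    except client.exceptions.ApiException as e:
        if e.status == 404:
            # Pod already gone; was it preempted, evicted, or completed?
            # The event alone cannot tell us reliably.
            print(f"{ev.involved_object.name} disappeared; reason unclear")
```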
Thank you. Wilfred? Yeah. So, Wilfred Spaargaren from Apache YuniKorn. We started working on Apache YuniKorn from the YARN perspective. So, like Weiwei, I come from the YARN side, and YARN has a lot of these things: job queuing, hierarchical queues, user quotas. That's all there. So when we see people that want to move to Kubernetes, the first question they ask is: are you going to give me all the functionality that I've got on YARN? Do you do queuing? Do you do quotas? Do you do all that stuff on the fly? And do you implement strict security? On Hadoop we run with Kerberos enabled; every single thing is Kerberos, all the way through to the pod or the JVM that you run on the system. So that principle is there. And when you look at Kubernetes from our perspective, we're also missing that user information going through the scheduling cycle, so accounting for resource usage. And I've heard a number of people say that that's difficult. So how do you get that through there? For preemption of batch jobs, it often matters exactly which pod you preempt. If you preempt the driver pod from Spark, your whole Spark application is gone. So it could be far more costly to preempt that one driver pod than to preempt ten executor pods overall. And you can't make that distinction: the scheduler is not aware of that information. So we're trying to get that kind of information into the system, but that's missing from our perspective, on top of a number of the other things that were already mentioned. It's two different worlds, almost.

We've got a lot to talk about here. So we've got multi-cluster, we've got quota and queuing and bin packing. Before going to the audience questions, I have one high-level question, which is: since you've been working with Kubernetes for a while now, do we believe there's something fundamentally wrong with Kubernetes that makes it limiting, that makes it hard to build batch frameworks on top of it? But we have to do it because, well, it's the cool thing out there and we have to basically unify our infrastructure around something common.

I can give a quick hook into etcd. When you're looking at batch jobs, there are users that will submit hundreds of jobs in one go, which could be thousands of pods. So overloading etcd is a real problem, and you need to be really careful. And that's also why, from what I hear, Apple did this, and Pinterest too (we had a meetup last week): they all stopped scaling at 800 to 2,000 nodes. They say, we don't go any further; we do not want to get into problems with etcd. So that's a real thing that we've all got.

Yeah, sorry, I completely agree. Okay, let's try again. Yeah, let's say etcd is something that we're really concerned about. And indeed, it looks like, at least for the system that we tried to build, relying only on etcd is essentially an impossibility, because we have users that submit, say, 100,000 jobs simultaneously. Instead, we're relying on Pulsar, which does a sort of partitioned total order broadcast, and which allows us to scale, say, two orders of magnitude higher than etcd. The other thing that is a fundamental problem with Kubernetes, and that will be very difficult to change, is the events. Kubernetes will do something and then afterwards publish an event that it did it, and there are many components that concurrently publish these events. Because of this architecture, you can never guarantee event ordering or delivery, so they will never be reliable. This is also something that we try to alleviate with Armada, whereby Armada is itself event-driven: Armada does guarantee event delivery and ordering. But at the point at which we interact with Kubernetes, we don't have these guarantees; we have to do the best we can.
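[Editor's note: the Pulsar-as-buffer idea translates into code along these lines. This is a minimal sketch under stated assumptions, not Armada's actual implementation: the topic name, broker URL, and the create_job_via_api_server helper are all hypothetical, and the rate limit is deliberately crude.]

```python
import json
import time

import pulsar  # Apache Pulsar Python client

client = pulsar.Client("pulsar://pulsar.example.com:6650")  # hypothetical broker

# Submissions land on a (partitioned) topic instead of hitting the API
# server directly; keying by user keeps one user's jobs ordered per partition.
producer = client.create_producer("persistent://public/default/job-submissions")

def submit(job_spec: dict, user: str) -> None:
    producer.send(json.dumps(job_spec).encode("utf-8"), partition_key=user)

# A separate drainer turns a burst of 100k submissions into a steady,
# bounded stream of writes against the API server (and thus etcd).
consumer = client.subscribe("persistent://public/default/job-submissions",
                            subscription_name="drainer")
while True:
    msg = consumer.receive()
    create_job_via_api_server(json.loads(msg.data()))  # hypothetical helper
    consumer.acknowledge(msg)
    time.sleep(0.01)  # crude cap of ~100 object creations per second
```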
Yes, I'd like to piggyback off what several of the other panelists have said so far. One of the things I mentioned toward the end of my presentation on Fluence was a project that we're working on called the Flux Operator. The idea behind this is to instantiate a mini cluster inside of Kubernetes that provides full-featured batch scheduling: the scheduling, the queuing, basically everything inside of it. And there are some difficulties that we've run into so far. I said before that there aren't really limitations; that's not exactly true. One of the things we've run into is the difficulty of figuring out when some of the worker pods are in a ready state. The way that Flux wires up is that the workers start and then communicate over a tree-based overlay network to the rank-zero broker, and they basically signal that they are ready over that particular network. So if there were some kind of intermediate state, or a state beyond Running that was more specific, then maybe that broker could query that state from within this group of pods and then transition to running, to execute the application, MPI or whatever it is, afterwards.

I have some follow-ups on each item here, but I want to open it first to the audience, if anybody has any questions.

This is more of an observation. I don't have an HPC batch background; I have been working with Kubernetes since 2017, running in production in a big bank. And as an architect, I have this thought: should you do basically a lift and shift from the traditional world to the Kubernetes world? Or should you put some of the changes on the application side as well, and not try to fix all the problems in the platform?

Yes, you can shift some of the work towards the submitter, towards the application side of things. But if you look at, let's say, a Spark job: when you process data, you do not know exactly how many executors you will have to use until you know what your data is. So you don't want to take that dynamic capability away, because if you take it away, then you either extend the processing time or you waste resources because you're only using half of the executors. So as for pulling that information out and saying, oh, if you put that in a gateway kind of thing and you decide it all beforehand, then we can make it easier for Kubernetes to work with: your mileage may vary. You will get somewhere; I know from KubeCon that there are a couple of companies that are working on pulling all that information into an application outside. But they also say it's a continual process, and they analyze every single Spark job that they've run. So the overhead on your side also becomes pretty big.
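[Editor's note: Wilfred's point about keeping executor counts dynamic corresponds to Spark's dynamic allocation settings. A minimal PySpark sketch; the app name and bounds are made up, and the shuffle-tracking flag is what typically stands in for an external shuffle service on Kubernetes.]

```python
from pyspark.sql import SparkSession

# Let Spark grow and shrink executors with the data, instead of forcing
# the user (or a gateway in front of the cluster) to guess a fixed count.
spark = (
    SparkSession.builder
    .appName("etl-job")  # made-up name
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "200")
    # On Kubernetes there is usually no external shuffle service, so
    # shuffle tracking is what makes executor scale-down safe.
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .getOrCreate()
)
```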
So I'll just comment. I think one of the successful design principles of modern computers has been to make it appear, to the applications running on them, as if they have the whole computer to themselves, with no contention and so on. And one can make the same argument for the modern Kubernetes and distributed world, right? Let's try to make the platform such that, from the point of view of applications, it's like running on your local PC. But maybe we've reached a limit where that's no longer feasible, and we have to just fundamentally write new applications built for this new world. And I think there's some truth to that.

Now it's working. I think one of you mentioned this challenge, which we frequently hear mentioned in the context of HPC, that users want to control everything and they want to drive everything. But then, interestingly, two talks ago we saw those funny slides from the Czech university presentation showing how bad users are at the efficiency of their workloads. In my experience at least, that's a good representation of how bad most users are at controlling these things. So I'm wondering, with our drive towards all of these very sophisticated features for controlling the processes, controlling scheduling: aren't we actually over-indexing on a very small fraction of power users, versus delivering something that is going to be more broadly powerful for a broader community?

So at least at G-Research, our users are absolutely terrible at knowing what they want. On the other hand, they're using state-of-the-art machine learning frameworks, and the authors of those frameworks know exactly what they're supposed to need. And I think that's where you want to put the smarts of figuring out exactly what, say, network topology awareness is necessary for this particular application, and so on. Our users rarely write general-purpose code for submitting into our clusters; they just express their problem in terms of some high-level framework. And that high-level framework can do all of this for them, I believe. Great. All right. Okay.

I hear myself now. So let me restart. I think that's a really interesting question, and I think it's true that a lot of users don't understand how much they need to be able to get out of a given set of hardware. But on the other hand, there are power users as well that run at extremely large scale, like exascale. And that is something users need to be able to do in order to squeeze as much capability as possible out of really cutting-edge machines, to do science at massive, massive scales. So I think Kubernetes needs to be able to do both, in the sense that these large-scale, complex scientific workflows should run anywhere: not just on bare metal and HPC, but also in Kubernetes in the cloud.

Yeah. I wanted to go back to something earlier: it's a common pattern that people use multi-cluster because one control plane can't take it. That really speaks to the fact that either we accept that, or we invest in trying to make one control plane larger, and that's never going to work. But as a batch community, what can we do about federated clusters? Because it's such a common pattern that literally all of us seem to have a solution for it. Can we start to foment something that is reusable between solutions?

Okay. So yeah, I think that's a great observation and something we've been thinking about for some time. I do think that at some level we need to start to think about how we do this federation for batch jobs across the different clusters. And from an upstream point of view, whether we enable the ability to provision clusters just for scheduling. In other words, you have a scheduling sort of mini-cluster. There's some interesting work with kcp, which provides a Kubernetes API (I don't know if folks are familiar with that), which is something we may be able to look at: we could put a scheduler behind it that would federate workloads across the different clusters. But I do think it should be part of the batch working group agenda, at least to evaluate how we might approach it. And I think there's a lot more work that needs to be done for that.

I just have one quick comment about this as well. All the solutions that we've seen were mostly related to specific use cases, right? We had one implemented for Spark; we have them implemented for specific workload types, embarrassingly parallel, for example. The challenge is actually coming up with something extremely generic and acceptable to the community. In my opinion, that's like 10x more complex. So you can definitely solve it for one or two problems, but then if you want to make it generic, that's where the complexity starts to shoot up.

Let me add on to what Abdullah said. It's not just the cluster that needs to scale, then. If you use, for instance, the Spark operator or something else, all these things need to scale up too. And what we've seen with our users that have been using, for instance, the Spark operator, is that it does not scale up that high either. So you've already got multiple instances of the operator in a cluster of a thousand nodes. You can fix one little bit, but if you don't fix every single little bit in your whole chain, it doesn't scale up anyway.

So, a quick question on the math involved. Different chips have different... well, some of them don't follow the IEEE standard. If you look at the Cell architecture, for instance, in the PlayStation, it doesn't follow it. And it's not the only one that has a different architecture for floating point.
How do you think we should move forward to make sure that the math is correct, if we're looking for actual accuracy?

So Kueue has this really... the Kueue project has this really nice feature called resource flavors. Maybe you can comment on this; I guess you know it better than me. Whereby you have some set of labels, I believe, right? And a particular set of labels that have the same values denotes a particular flavor. So, for example, scheduling could guarantee that all of your pods are scheduled across nodes with identical flavors, which I think would be an elegant solution to a problem such as this. At least then you are consistently incorrect, if you are incorrect.

Hi guys, let's see if this works. So one thing that seems to be kind of an assumption is that something else has to get bolted on to do a lot of coordination: either federation between different clusters, different scheduling frameworks, or loading things into the scheduling framework and the operator stack. My question goes to the start, before you get to that layer. What is the expected user interface for just coming in and throwing a batch workload at a system? And then how does that feed back into those designs, and how do you tackle user identities versus RBAC? Do you have any sort of approach that starts to tackle that, and where does that then spill over into the rest of this architecture?

So from a YuniKorn perspective, we have taken the approach that we want to keep things as simple as possible. No CRDs, nothing. We schedule based on annotations that you put on the pod, and that's it. For the rest we use the standard Kubernetes objects, so we've tried to keep it as simple as possible, as close to Kubernetes as we can. That was the whole setup that we started with. And from a user perspective, we're working on user quotas at the moment. We're working on the design doc, so that's coming up, and I think that's going upstream probably in the next two weeks or so. So the design docs should be around on the Apache YuniKorn website, including how we think about tackling that. It's a whole different story.
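[Editor's note: as a sketch of how minimal that interface is, submitting to YuniKorn is essentially a normal pod with a schedulerName and a couple of labels. The label keys and queue path below follow the YuniKorn docs as I understand them, but treat the exact names as assumptions.]

```python
from kubernetes import client, config

config.load_kube_config()

# A plain pod, no CRD involved: YuniKorn picks it up via schedulerName,
# and the labels route it to an application and a queue.
pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(
        name="batch-task-0",
        labels={
            "applicationId": "spark-job-42",  # groups pods into one app
            "queue": "root.analytics",        # hierarchical queue path
        },
    ),
    spec=client.V1PodSpec(
        scheduler_name="yunikorn",
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="task",
                image="busybox",
                command=["sh", "-c", "echo done"],
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="analytics", body=pod)
```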
Yeah, so in the Armada world, the notion of a job that we use is essentially a bag of Kubernetes objects. You give us your bag of Kubernetes objects and you say one of them is my main object, typically a pod that runs to completion, and whenever that object dies, for whatever reason, we just kill all of the other resources. And user identity is based entirely around queues: when you give us your bag of objects, you tell us to put all of these bags into this queue, and that then maps to a single namespace inside of Kubernetes. So in this way we try to keep it fairly Kubernetes-native, while still making it appear to be a sort of batch-style job with a lifecycle that is easy to reason about.

In the MCAD world, we are about to release a quota management solution that is hierarchical, with borrowing and sharing, and essentially you can express it as an RBAC identity. You define how you want to label all of the quotas related to it. There's no specific integration that actually reads the RBAC objects, but that's something we could look at; initially we allow the quota admin to define what those definitions are and then set the limits. And with that, obviously, we can manage the users in that way.

I guess one major difference between services and batch workloads is that with services, there's a service, which is, say, a web page; with batch, the user is actually trying to do something with the system directly, they're trying to submit something. And that's what I observe, for example, in machine learning: they have SDKs. They start from an SDK API for submitting jobs, and then everything after that is handled by the SDK; it creates the Kubernetes objects, et cetera. With Slurm it's the same thing: you have these command-line interfaces that people are used to, like sbatch and all these things. I guess it depends on the type of workload, but I feel that with batch we need an SDK, we need these command-line tools, and then those translate into something else, depending on how you set up the system.

To extend his question: do you think batch needs to support multi-tenancy?

Yes, there's a huge push for multi-tenancy support, and that's why we're also looking at the side of the user quotas and all that kind of stuff. We see that push from the users that we've got. And multi-tenancy doesn't mean, for them, multiple companies; even within companies there are different groups that get a quota assigned, or different users that have a quota assigned. So yeah, we do see that, and there's huge pressure.

Yeah, indeed. Also in Armada we have many users internally, and it's very important, obviously, to the users that each of them gets their fair share, because they will come and complain to us if we don't give them their fair share. And so for this reason we do have all of these things: we have per-user quotas; we have per-user quotas per scheduling round, so that no single user can grab too much capacity over a short span of time; and also a total per-user capacity for each particular cluster. And we schedule jobs in an order determined by which user currently has the lowest fraction of their quota in use. We also have per-priority-class quotas, so that if your jobs are preemptible we can give you more resources than if your jobs are non-preemptible.
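[Editor's note: a toy sketch of the ordering rule Albin describes, picking the user with the lowest fraction of quota in use each time. The data structures are made up; real schedulers layer per-round caps and priority-class quotas on top of this.]

```python
from dataclasses import dataclass

@dataclass
class Tenant:
    name: str
    quota: float   # total resources this user may hold
    used: float    # resources currently allocated

def fraction_used(t: Tenant) -> float:
    return t.used / t.quota if t.quota > 0 else float("inf")

def schedule_round(tenants: list[Tenant], pending: dict[str, list], cost: float):
    """Hand out jobs one at a time, always serving the least-served tenant."""
    while True:
        candidates = [t for t in tenants
                      if pending.get(t.name) and t.used + cost <= t.quota]
        if not candidates:
            return  # everyone is at quota or out of pending jobs
        t = min(candidates, key=fraction_used)
        job = pending[t.name].pop(0)
        t.used += cost
        yield t.name, job

tenants = [Tenant("alice", quota=100, used=10), Tenant("bob", quota=50, used=40)]
pending = {"alice": ["a1", "a2"], "bob": ["b1"]}
for user, job in schedule_round(tenants, pending, cost=5):
    print(user, job)  # alice is served first: 10% of quota used vs. bob's 80%
```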
Yeah, just an observation on all these things we're designing: is it the case that really we're all just architecting and optimizing around a few things we're scared of? First, breaking etcd: if that scaled, just imagine it scaled infinitely, then that wouldn't be a problem. And then secondly, the multi-cluster thing: I think we all want that, because that's a sensible thing to have for operational and resilience reasons, if there were just an automatic way that Kubernetes clusters could somehow be aware of each other. If we imagine both of those problems were solved, then we would all be kind of unshackled to just build whatever solutions we want on top, and it would be more obvious what the right one was for everyone. That's just my observation; I don't know what you guys think about that. Should we focus on scaling etcd first, or on looking at a different backend for Kubernetes, maybe, and having it used in a more controlled way that wouldn't just allow a rogue user to destroy it with a single button click?

Sorry. We haven't really looked, at least our group hasn't really looked, at replacing the backend. But definitely, you know, that's the big scare; that's the motivation some of our end users have: multi-cluster is essentially just there to handle those limitations. So it's a good point, and I don't know if there's any work around that, but it's very valid, and I think you're right: that's the biggest driver for multi-cluster on our end.

I should say that part of the motivation for creating the Flux Operator was to avoid a perceived limitation of etcd as well, in the sense that we would create this batch system inside of Kubernetes that can then subdivide the resources and handle the same number of tasks that Flux natively can handle, which is tens or hundreds of thousands, perhaps millions, of tasks.

I mean, this goes back to the first question that I asked: is etcd actually a fundamental part of Kubernetes? It kind of is not, right? It's just the storage solution that they chose at the beginning, but it can be replaced with something that is infinitely scalable, globally scalable. We have such solutions in the cloud, from multiple cloud providers. If we just adapt the etcd API to those storage solutions that are globally scalable, it will solve most of these problems. So we could have Kubernetes on Pulsar.

Hi, I'm just wondering if there's an opportunity for other projects like Slurm to actually work with the Kubernetes community to develop queues and different APIs, so that Slurm could do a lot of the scheduling and resource control that people are asking from the batch community, but treat a Kubernetes cluster as a resource; then you could filter out a lot of the problems. I agree about etcd, but I think that's a solvable problem. If you had something like Slurm that was extended, people could still submit jobs as they're used to today in the HPC community, or in some internal communities. I know a lot of people are also LSF users; they'll dump hundreds of jobs over thousands of nodes. That could be an opportunity to merge those two communities, to work with those two communities to solve some of these problems, because they've already solved them for bare-metal clusters or other types of compute.

But I guess most of these schedulers were not built for Kubernetes. YuniKorn is a component outside Kubernetes; same thing with your scheduler: you just use the scheduler plugins to basically offload the scheduling, or call out into an external service. The same thing can be done with Slurm. What I'm trying to say is that that model is actually being adopted in multiple other projects, other than Slurm.
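[Editor's note: one concrete shape of that "call out into an external service" model is the kube-scheduler extender, an HTTP endpoint the scheduler consults during filtering, which could delegate placement decisions to an external resource manager such as Slurm. A minimal sketch, assuming the extender is registered in the scheduler configuration; the allowed_by_batch_system hook is hypothetical.]

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def allowed_by_batch_system(pod_name: str, node_name: str) -> bool:
    # Hypothetical hook: ask an external resource manager (Slurm, LSF, ...)
    # whether this node is free from its point of view.
    return True

class FilterExtender(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        args = json.loads(self.rfile.read(length))  # ExtenderArgs payload
        pod_name = args["pod"]["metadata"]["name"]
        node_names = args.get("nodenames") or []
        keep = [n for n in node_names if allowed_by_batch_system(pod_name, n)]
        result = {  # ExtenderFilterResult payload
            "nodenames": keep,
            "failedNodes": {n: "vetoed by external batch system"
                            for n in node_names if n not in keep},
            "error": "",
        }
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8888), FilterExtender).serve_forever()
```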
First, I wanted to comment on etcd. I remember one of the topics at the last KubeCon, in Valencia, was that etcd was in a crisis trying to find maintainers. So I guess my question is: are we not doing enough, as a batch community, to go back to etcd in this case and improve it for our needs? And the same about applications like Spark or YARN and MPI. Those are all very old frameworks, and I kind of assume that they have certain characteristics: they don't know about pods, they just assume they are running on bare metal, and they assume everything only runs once everything is already set up. Should we be reaching back to, let's say, the Open MPI or Intel MPI or Spark maintainers to actually help them be more cloud native, to be more resilient to failure? I was recently in a discussion with Tim Hockin where there was a KEP discussing the startup order of containers, and Hockin was saying the containers should be resilient to not having the setup already there. But that's not the reality: the MPI Operator has that problem, and I don't know about Spark too much. But back to my question: are we doing enough reaching back to those communities? Are we helping them move forward with these things, as communities, developers, and so on?

So I just wanted to comment on the MPI part at least. I think that's a really, really interesting point. I think there are efforts in MPI to make the communicator more flexible, in the sense of dynamically adding and removing ranks, but I don't know where that is. I don't think it's production-ready; I think that's mainly research at this point. But I think that could be hugely informative.

For one of our use cases we're using the Ray framework, and we're very involved with them in helping deliver some of our queuing and dispatching policies with them. So as far as Ray is concerned, we are.

I don't know enough about Spark. Spark, from its side, is pretty flexible: you can tell it to start up with a minimum number of executors and grow to a maximum number of executors, and it does all that kind of stuff for you in the background. But the end user needs to ask for it; if they don't, you don't get it. So it's partially also about education of the end users, to say: this is how you submit, and this is what you set up to make it easier for you. We still see people that run Spark jobs where all the work is done in the driver, but they're asking for ten or more executors and waste an enormous amount of resources. And then you tell them, hey, this is what you're doing, and they say, oh, I wasn't aware that I was doing that. They don't have the view; they don't look at it. Like Weiwei said, hiding a lot of the things: they just want to submit the job, they want it to run quickly, and they want to get the result back. That's a lot of the end users that you get when you run Spark. And I think it's the same with the guys from Armada: people just submit the job and want the result back. They don't look at how they submitted it or how many resources they wasted, up until you hand them the bill at the end, and then they start complaining. But they don't get that feedback, because there's no user accounting. If we provide that information back to them, we can educate them, and they will run better.

Thanks. So I came into this very much from a services background, and in my work at a finance company we're doing a lot of batch and thinking about Kubernetes for that. The reason I mention that is that you don't really care about users in services, and we've talked a lot about how in batch you really do. We talk a lot about queuing and scheduling and so on, and it feels like, across a lot of the solutions, there's nothing native in Kubernetes to correlate the idea of a submitting user with a workload. You kind of schedule it, and then it's never associated with you again, and so we've all been building tools around it to help with that. So I'm curious: do we think there's a way to better correlate the submitting user, or information about the user, into Kubernetes itself, or do we feel like we have to keep relying on building external things around it? And to parallel onto that: what do we think the solutions are for interacting with storage that isn't API-driven object storage, so POSIX file systems, shared clustered file systems, in a way that might actually work? Because the people I've spoken to here about it don't seem very happy with the solutions they've built, even if they've managed to make them work.
So, we are very much in this situation where our users use a lot of file systems for various operations, to load data and so on, and indeed this is one of the main issues with stability and performance on our platform: say all of your users simultaneously try to get at the same file on a file share, and then everything falls over. So I think what we would really like, in the long term, is some way for Kubernetes to be aware of the health and capacity of the underlying file storage systems, and to be able to share those resources effectively. Because really, this is something that we don't have at all, and it's causing us a lot of pain.

Do you have a question? In HPC, those sorts of problems are sometimes solved via tiered storage or burst buffers and that sort of thing. So I wonder if there would be some interest in the Kubernetes community in integrating that, if it hasn't already been done.

Using the fact that nobody wants to ask a question, I just wanted to throw in a comment defending etcd. I'm working on Google Kubernetes Engine, and our internal testing pipeline tests clusters of up to 15,000 nodes per cluster. The largest production clusters that our customers have had are 12,000 nodes, all spot instances, so very heavy on recycling and preemption. So this is a very powerful tool. We need to fix a couple of problems, like Aldo mentioned, around maintainability and reliability with new releases, that kind of stuff. But I wouldn't go as far as saying it's a really bad storage system; it's a really good one. The community may just need a little bit of help.

As there are no questions, I was going to make a similar comment, which is: a couple of months ago, people realized that etcd was left with one maintainer. That was a problem for the project, but it was also a showing of the community that has been built around these tools, because as soon as there was a call for help, there were a bunch of companies that jumped in and actually volunteered resources to fix it. Maybe etcd is the current problem, but these problems keep repeating themselves with a bunch of tools that, after the hype, when they become kind of stable, lose interest from a lot of people. So we are seeing it with etcd, but we'll see it with any number of tools that will keep coming. So jumping to another solution is sometimes not the best option; maybe there is room to fix it.

And then I had another comment, regarding the opportunity of running batch on Kubernetes; there are many options, as we've heard today. But I had a question, because I come from an environment where we were doing massive-scale computing before the cloud appeared, and we built a bunch of custom tools over two decades that we still run, and moving those to Kubernetes has been a massive simplification. It also democratizes a bit the access to a large amount of resources, with the public cloud. And I wonder how much effort should go into these common APIs, given that we already have the Kubernetes API as a base to access all these resources: you have options on premises, you have options on all the major hyperscalers. How do you see the benefit of actually putting a lot of effort into having these single extensions to the Job API, and some sort of single scheduler, instead of just relying on the base Kubernetes API and letting people choose whatever they choose on top? How much benefit do you think we get from this potential effort?
So from our point of view, there's a lot of benefit. You can see from the talks today, and from the panelists, that there are a lot of common challenges that we have, and being able to solve as many of those as we can with one code base, with a community that supports that same code base, I think makes us more effective, more efficient, and able to evolve it a lot quicker. So I think there's a lot of benefit. I don't know if we'll be able to provide all of the different features and fancy things that we've maybe individually provided in our projects, but if we can gather as much of the common functionality and features as possible, I think it'll benefit the community very much.

One last question? You have a comment? A little bit more on the etcd thing: it hasn't always even been etcd; the API server can go down because you have a thundering herd, because you have so many nodes. There are a lot of these lessons we've learned internally, learned from each other just inside of our company; unfortunately, we mostly kept them proprietary. And I kind of have a question. This has been an incredible forum, but it's been one day, maybe two days, out of the year. I would really love to be able to keep communicating and have these wonderful showcases. What else can we do to encourage more forums and more opportunities for folks to present and talk like this?

So, Alex Cameron is raising his hand in the back. I think we have run out of time. Any final remarks?

I'll just mention that there are containers-in-HPC and orchestrated-containers-in-HPC workshops at the Supercomputing conference as well, and that's starting to gain traction; you see a lot more users in that. So coming from that direction would be really interesting too. I think those users that are used to bare-metal, traditional HPC would benefit a lot from viewing some of these presentations too.

Diana, final remarks? Okay, I wanted to apologize for laughing, but Alex was back there waving his hands. So yeah, I want to thank everybody for doing the presentations and coming in and sharing their thoughts, ideas, and requests. And again, I just want to encourage you to come to the batch working group and help us at least try to solve some of your issues, or be able to identify them, so we can make recommendations or hopefully upstream some changes.

Yeah, one last remark from me as well. From the Apache YuniKorn side, we're seeing way more traction as the old big-data kind of workload moves over onto the Kubernetes side. So there is demand now from that side, and we'll see more of that. And I think, when I look at all the presentations today, we all have the same kind of limitations, on the Job API and so on. It's not that we want to do different things; we just need to get together in the group and then move on.

Yeah, so similar for me. We built Armada because we needed to address what we perceived to be limitations in Kubernetes, but over time, as Kubernetes becomes able to natively address these, we would love to come back closer to just straight-up regular Kubernetes again, as close as we possibly can, because we want to use standard interfaces. Thank you so much.