Yeah. But were there any talks about the Batch + HPC topics in the rest of the days too, or was it only in the co-located event? Because we might have gone to those. There were quite a few that we attended, given our interest, but I wasn't sure if you listed those. I was trying to look into my schedule and see what exactly I had pinned, but I can't find my schedule. Right. So yeah, during the event there were a few related to GPUs, like the guy from Google talking about optimizing GPU utilization. I think there was another one from or about Volcano, I can't remember the vendor now. There were a few others, but I, for example, focused more on Kubernetes itself, not necessarily only on batch. One of my main observations and themes at KubeCon for the remaining days was eBPF, as a way of improving performance and getting more observability and more control over your network layer in Kubernetes. We spoke to a few vendors at the booths, like Calico and Cilium, both very interesting. That's something we're probably going to try soon or soonish in GR as well, to play a bit more with eBPF. Also quite a bit around automation and practices: you can see clearly that ArgoCD is getting more mature. I went to a talk where the, let's say vendors, the team responsible for owning ArgoCD, talked about how to extend it, what they plan, the roadmap and so on. That was quite cool. Yeah, I actually mentioned the Batch + HPC Day, but there was also GitOpsCon as a co-located event before the conference, and then a ton of talks about GitOps during the conference as well, both for Argo and Flux. Yeah, if you find the links in Sched, maybe drop them in the chat. Yeah. I'll keep recording, so we should be able to find them in the chat and collect them again afterwards.
It was my first KubeCon as well, and I was surprised by the number of people attending. There were 7,000 or so people on site, and sometimes it was quite challenging to get into the room if you weren't early enough, especially if a particular talk was super interesting and getting a lot of traction. If you weren't in the room 15 minutes early, you weren't getting in. Luckily, there was a way of watching them virtually, so that was quite cool. There's one talk I haven't even watched yet, where I think it was Mercedes-Benz explaining how they migrated 700 or 7,000 clusters from Terraform and other infrastructure to using Cluster API. Quite a few lessons there. And I think that was another observation: Kubernetes is becoming more and more a framework for doing stuff like that. There were quite a few talks around Crossplane, as a way of managing infrastructure beyond pure Kubernetes resources, using Kubernetes and its reconciliation loop to enforce the state. That was quite cool as well. A few post-mortem talks too, about running Kubernetes at scale. It was interesting seeing how, I think it was Datadog, ran into an issue and spent a few months investigating it on their Amazon infrastructure. Yeah, is that the DNS one? Yes, that was a good talk. I'm pasting the links here in the chat as you speak. Cool. Yeah, we have a small screen, three of us at the same time. Cool. There were also quite a lot of casual talks happening during lunch, with various people and again with vendors. One of the vendors we're kind of using is the one behind OPA. I don't know whether any of you is using OPA; it's a way of defining policy. Before going to KubeCon, we faced a few issues where we tried to restrict some stuff.
So we started discussing with them and so on. That was again a nice side thing at KubeCon. Nice. All right. Anything else? Okay, so I tried to collect a few talks and put them in the chat, but feel free to add any I missed. Yeah, I think I've actually been to the one on improving GPU utilization. Yes, I was there, I think. Was it from Google? Yes, that was quite interesting. It was a bit of a letdown that they didn't actually talk about the implementation details, but the theoretical side of things was quite interesting. Let me grab something else, because I need to find my Sched from KubeCon. One that I attended that was quite interesting was on network-aware scheduling, although it did look like something that would require quite a lot of time before it can be used on a production cluster. Let me see if I can find it. So this one was quite interesting. But you're just not listening to anything. Yeah, we're talking to them. Another interesting one was ephemeral containers. I'm not sure whether you're aware of ephemeral containers. That's a new thing in Kubernetes 1.23, and I think it's going to get even more mature in the future. The idea is that when you need to troubleshoot a problem with your pod or with your application inside the container, rather than using kubectl exec, which many people do, you'd run the command kubectl debug. The difference is that exec drops you, almost like ssh, into the container you're troubleshooting. With debug, it spins up an additional side container, and you can define a completely different set of tools in that side container. And by being a side container, it shares the same namespaces, Linux namespaces, not Kubernetes namespaces. So you have access to the same network namespace and so on.
And you can use nsenter, namespace enter, to get access to the PID namespace and a few other Linux namespaces. That was quite cool because it allows you to have a very minimalistic image, probably distroless, and put all the extra tooling into the side container instead. And the side container is gone once you finish executing your kubectl debug command, so you don't need to worry about having it constantly running. We actually do use that. It became beta in 1.23, it was alpha before, so you could enable it in the clusters, and we were enabling it. What we do is keep one image that has all the debugging tools we need for networking, file systems, whatever, and we use that image to attach ephemeral containers when we debug stuff. I put the links there as well, from my Sched. Oh, another interesting one, depending on what you do, is about KubeVirt. With KubeVirt you use Kubernetes to manage your VMs, simply put. But they added quite a few additional features like live migration and so on. It's another product which is becoming more and more mature, and in the future it could probably be a way to, rather than directly using, I don't know, OpenStack to spin up your VMs, use KubeVirt to control them and do VM migration and other stuff. So KubeVirt: Kubernetes as a framework, again. Yeah, there were a few talks as well about the Horizontal Pod Autoscaler, and using KEDA for that. The talk wasn't bad. I think there was a project called Pixie. Pixie taps the traffic, right? To be able to actually reproduce it. Yeah, that's a good point. There was an interesting talk, I can't remember the title, I'll find it, about a different approach to doing a load test, or testing in general. Let's say you run a web application on your Kubernetes cluster, a web service. To deploy a new version of that web service, you can take a few approaches.
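A rough sketch of the ephemeral-container workflow described above; the pod name, container name and debug image here are hypothetical placeholders, not anything from the talk:

```shell
# Hypothetical toolbox image -- substitute your own.
DEBUG_IMAGE=registry.example.com/infra/debug-tools:latest

# kubectl exec drops you into the application container itself, which
# may be distroless and have no tools at all:
kubectl exec -it myapp-0 -- sh

# kubectl debug instead attaches an ephemeral side container with its
# own image, sharing the pod's Linux namespaces (network, IPC, ...):
kubectl debug -it myapp-0 --image="$DEBUG_IMAGE" --target=app

# With --target, the debug container also shares the target container's
# process namespace, so tools like nsenter can reach its PID namespace.
```

The ephemeral container disappears once the debug session ends, which is what lets the application image stay minimal.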
One: you have, let's say, a staging cluster. You deploy there and maybe test somehow, like writing some integration tests. Another one is a canary approach, or blue-green, where you redirect a bit of traffic to your new version. That's what people quite often do with a service mesh, like Linkerd and such. But the point of the talk was that the problem with that approach is you don't necessarily have consistent inputs going to that web service. If you're doing the change in the middle of the night, you don't have the same traffic or the same number of users, so you don't really know whether the way you're promoting your new application is working at all. The idea, I think it's eBPF-related as well, is to tap and record the traffic. Once you have the traffic recorded, you can replay it later at any time, and it's going to have the same volume and so on. So you can deploy your new web service without impacting users, run the recorded traffic against it, and if the behavior is no different than before your change, then you're probably good and can try a blue-green or canary release at that point. That was quite a good talk. I'll try to find the name of the talk. I just pasted it there, it should be the last link in the chat. Yeah, the one about reproducing issues in your CI pipeline. That's the one. Yeah, I still have the schedule in my head somehow; I didn't watch them all, but I remember the titles. It was very good, to be honest. Yeah, eBPF was a recurring theme this year. It was quite interesting. I remember hearing about eBPF quite a few times in the past, but this time you could really see that the tooling is getting more mature, or at least better known around the community. In fact, it was hard to avoid in the talks.
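The talk's exact tooling isn't named in this discussion; GoReplay is one real open-source tool that implements the same capture-and-replay pattern (it taps via raw sockets rather than eBPF), so treat this as an analogous sketch, not the talk's implementation:

```shell
# Illustrative capture-and-replay with GoReplay (github.com/buger/goreplay);
# the endpoint names are placeholders.

# 1. Tap live traffic on port 8080 and record it to a file:
gor --input-raw :8080 --output-file requests.gor

# 2. Replay the recorded stream against the candidate version, so both
#    versions see identical inputs regardless of time of day:
gor --input-file requests.gor --output-http "http://myapp-canary:8080"
```

Because the replayed stream is identical each time, a behavioral diff between old and new versions is attributable to the change itself rather than to traffic variation.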
I think this is also quite relevant to the scheduling part: it's about bandwidth management, using eBPF again. Let me just paste a link here. Yeah, it was definitely the conference of eBPF. I think it was something we could see coming for a while now, but this one was definitely the confirmation that it's going to get bigger from here on. In fact, the one on bandwidth management was quite interesting because they were adding the possibility of declaring, basically in your deployment, resources for how much bandwidth you want to allocate to a specific pod. And that's quite interesting in that it's only possible through the way eBPF works and the kind of information you can get out of the kernel. Very cool, I'll paste the link there. Oh yeah, sorry, I was looking at the information here: there was something where they were also mentioning how, with this new approach, you could get higher communication speeds. Let me just look at it. The scalability limits of the token bucket filter used by the bandwidth plugin, replaced with earliest departure time combined with eBPF. Yeah, quite cool, both for bandwidth management and for getting more speed out of what's available. So what I'm looking at with eBPF is Cilium, to do a sort of cluster mesh, not only a service mesh, but really multiple clusters meshed together, even at the pod level. And you can easily do load balancing across clusters without having to rely on services, which for the batch case is actually quite interesting because we don't really care about the service abstraction, we just care about the workloads. This is something we started prototyping: meshing multiple clusters and being able to schedule across them from a single plane, basically.
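The per-pod bandwidth declaration mentioned above already exists today as annotations read by the CNI bandwidth plugin (the pod name and image below are placeholders); the talk was about swapping the plugin's token bucket filter internals for eBPF plus earliest departure time:

```shell
# Declare ingress/egress limits for one pod via the CNI bandwidth plugin.
cat > limited-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: limited-pod
  annotations:
    kubernetes.io/ingress-bandwidth: "10M"
    kubernetes.io/egress-bandwidth: "10M"
spec:
  containers:
  - name: app
    image: nginx:1.21
EOF
# kubectl apply -f limited-pod.yaml   # needs a cluster with the bandwidth plugin enabled
```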
Well, that's interesting, because when I chatted with them about scheduling across multiple clusters, their response was: no, this is really only meshing the networks together so that pods can speak to pods in other networks in other clusters, but the scheduling you'll still need to do somewhere else, right? Okay. Yeah, but it allows you, if you want to distribute workloads across clusters, to rely on some services running internally in one cluster without having to replicate them everywhere, for example. You could have workload clusters that are really disposable, while you have the service clusters, the component clusters, in the same mesh. We've been playing with this too, and actually with some tricks you can schedule across clusters as well. That's something we've been playing with, but maybe for another time. Yeah, I'd love to hear about that, and also whether you're seeing reasonable performance, since often you want jobs to be co-located so the network is fast enough for them to talk to each other. Anyway, that would be interesting to talk about next time. The big thing we've seen is that you still need node-to-node connectivity, layer 3 connectivity, between all nodes across all clusters. It kind of makes sense, but it's not like a gateway or anything: you need a full mesh between nodes as well. I think something like that was mentioned in the Datadog talk about DNS, where they're using Cilium exactly to make pod-to-pod routing possible across multiple clusters. I'm not sure whether across multiple different cloud providers, maybe they achieved even that. But there you need some sort of VPN connectivity, I guess, because you need to expose all nodes to all nodes.
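For reference, the Cilium Cluster Mesh setup discussed above is driven by the cilium CLI roughly as below; the kubectl context names are placeholders, and the prerequisites (unique cluster names/IDs, node-to-node routability between clusters) match the full-mesh caveat just mentioned:

```shell
# Sketch assuming Cilium is already installed in both clusters with
# distinct cluster.name/cluster.id values.
CTX1=cluster-1
CTX2=cluster-2

# Enable the clustermesh control plane in each cluster:
cilium clustermesh enable --context "$CTX1"
cilium clustermesh enable --context "$CTX2"

# Connect the two meshes (bidirectional):
cilium clustermesh connect --context "$CTX1" --destination-context "$CTX2"

# Wait for the mesh to become ready:
cilium clustermesh status --context "$CTX1" --wait
```

After this, pods in either cluster can reach pods in the other directly, which is the pod-level meshing (rather than gateway-based service connectivity) described above.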
This is our dream: to burst using a mesh like this, but it's actually trickier than it could be, I guess. If you look at other things for service connectivity, they use gateways. Here it's really a full mesh between all nodes, at least in my understanding up to now. But it is promising. Sounds amazing. Maybe we should bring them in to present Cilium and eBPF to the group. That would be cool. We're getting Liz to come to CERN in two weeks, so maybe she can also do the same talk for the group. Let's put it on the list here. They're definitely bringing a lot more interested parties into this research group. You mentioned gateways; the other talk was about the Gateway API graduating to beta. It was quite a good talk as well. I can link it, actually, because I had it in my Sched, I think. I'll put the Cilium and eBPF topic down for the next session, and then we can see. Yeah, I'm just pasting the Gateway API link as well. Nice. The other stuff I had here in the summary: I saw there were a lot of references to batch workloads, not only in the talks but also in the keynotes. In the TOC update, it was mentioned that a new group was formed as part of TAG Runtime. Also in the Kubernetes updates, the Batch Working Group in SIG Scheduling, and then the keynote from CERN mentioned the computing use cases, and there were other mentions. You mean the one that you gave? Yeah. You're so modest: oh, the one from CERN, I don't know who those people are. No, yeah. But definitely it wasn't a coincidence. This has been appearing a bit everywhere, and I think it was clear from the constant references in different keynotes that it's slowly building momentum, along with the other activities we see.
And then there was one session dedicated to the Kubernetes Working Group Batch; that video will also be uploaded. Aldo gave an overview of the work that has been going on and the plans. There weren't a lot of different people speaking, but I talked to a few, and it seemed like there were both developers and end users interested in using these tools. So that was quite nice. Really quickly: they summarized the motivation, which I think we all know here, and mentioned that their goal comes down to three main tasks. One is to update the Job API to allow new types of workloads beyond the typical batch job as defined by Kubernetes up to now; then things like queueing and advanced scheduling; and then, the interesting part that also had a nice talk in the co-located event, optimized scheduling on the node itself, to make sure the NUMA topologies are set properly so you get full performance and you're not losing 20 or 30% of your capacity because of it. So I think it was nice. Alex, you were there as well, right? Yep, I was there. Very jet-lagged, but yes, I was there. It was good. I would just reiterate the amount of batch-scheduling-related talks: the batch day, Aldo's talk, I was on a panel a day later, and then you spoke in the keynote. We weren't quite at eBPF status, but batch was rising in the ranks of conversation. It's good. And I'll pitch one more talk, from some other CERN colleagues, given later, I think Thursday.
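One concrete knob for the node-level topology alignment discussed above is the kubelet's Topology Manager; a KubeletConfiguration fragment (illustrative values, adjust per node) that forces pods onto a single NUMA node looks like this:

```shell
# Require NUMA-aligned placement of CPUs/devices for eligible pods.
cat > kubelet-config-numa.yaml <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static          # pin exclusive CPUs for Guaranteed pods
topologyManagerPolicy: single-numa-node
topologyManagerScope: pod         # align all containers of the pod together
EOF
# Note: only Guaranteed-QoS pods with integer CPU counts (requests == limits)
# get exclusive, NUMA-aligned cores under the static CPU manager policy.
```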
I don't think the video is uploaded yet, but basically, we have this large grid computing environment and they've been playing with making Kubernetes a grid site, and it doesn't matter if it's on premise, on a public cloud, whatever. In that presentation they showed they could scale a single Kubernetes cluster to 100,000 cores, in Google Cloud in this case, quite easily and fast, and then even scrap it when they don't need it. Yeah. And they justified that this is an out-of-the-box solution to integrate new resources into our grid infrastructure, and it also gives the ability to request resources we don't have, like GPUs. Their dream is to have a Helm chart where you just helm install a grid site and add it to the infrastructure. They gave some summaries of what they've been doing, integrating heterogeneous resources like ARM and GPUs, and then they actually built an analysis facility on top of this. They have the Kubernetes layer as kind of the base layer to add the resources, but on top they add JupyterHub and the ability to deploy Dask clusters dynamically for different users. So people can do their analysis using JupyterHub and then scale out using Dask to a very large number of resources. I don't think the video is uploaded yet, but for sure it will be, Nathan. I'll find the link in the agenda for you, and there should be a link to the video there. My computer is lagging a bit, so I'll post the link in a moment. I think it's an interesting talk because it's a real, and pretty large, use case of doing both batch and more interactive analysis. Is PanDA batch processing yet another batch scheduler that people have written? No, PanDA is a specific scheduler for ATLAS. They have their own workflow manager on top, and that's where all the work goes. So PanDA is their thing.
Yeah, I just wondered whether it's an open source thing. It is, yeah. Because I can't find anything on it; Googling PanDA batch processing doesn't give me a usable address. Say that again? Googling PanDA batch processing does not yield a usable address so far. Yeah, I'll give you one where the actual documentation is. It is a generic tool, and it's used by other experiments as well, but it was developed within ATLAS. I pasted the link there. Cool, thanks. I wanted to find the link to the talk. Let's see, ATLAS. Here's the link to this one, and the video should appear there. I think they're done with all the co-located events and have started uploading the main conference videos as well. There's some delay before videos are available: if you have virtual access, you can go to the virtual platform and watch them right now; otherwise, they'll get to YouTube at some point. And I think that's it, that's all I have. Yeah, I think the expectations for the conference were met. They were, and they mentioned 65 percent of attendees were new, hadn't been to a KubeCon before. That's pretty impressive. Pretty awesome. I was really disappointed to miss out on this one, actually, but I'm definitely going to try to be there in Detroit. Should be good. It really felt like three years' worth of budget all spent on one KubeCon, because of the pandemic, I mean. Quite a lot of things going on, I have to say. It was a good one to attend. Yeah, and you can definitely see the difference with in-person conferences, being able to just bump into people and discuss quickly. It's very different. Yeah, definitely. I attended the virtual one the previous year, and you could definitely feel it was just so much less, so to speak. This one, I really enjoyed the part.
The part that wasn't there in the virtual one last year, which was the sponsor booths, basically. You could just go around, find people, and talk to them, which is something you can't do virtually, not in the way you can in person. It was quite massive; there were two pavilions full of sponsor showcases. The walking in between was killing me, to be honest. Three days after, I just couldn't move anymore. I don't know how he had the strength to also cycle on the weekend; Mia was just barely hanging on. Yeah, so I think that's what I had. But one thing I wanted to ask as well: there's not a lot of time between now and October, basically. If we organize a new Batch + HPC co-located event, I think it would be nice because it would help keep the momentum, but we need to be really proactive in reaching out to people for submissions, to make sure we have enough content. There were a couple of talks that were quite good that we didn't select for this one. We need to advertise this as much as possible, both in the cloud-native world and beyond; there is some interest, Nathan is here, in involving more established components, like Slurm in the HPC environment, and trying to bridge between the two and see what's the way forward. I'm sure Nathan will have a lot of opinions about this, so it would be pretty awesome to have that discussion too. I think his microphone never works, though. I think Ricardo A reached out and suggested that we submit something around Armada. We'd be happy to do something, of course. I also wondered, about that batch day: do you know how Predibase ended up on batch day? It seemed like a weird one to include, especially if we had other good... Which one, sorry? There was a whole talk on Predibase during batch day, which seemed... I like Predibase.
It was Travis Addair and the people who did Horovod and Ludwig AI, and it was more ML... Yeah, I'll have to go back to the notes, but I think it was because they had this idea of a nodeless Kubernetes, which is quite interesting, and also because they had different use cases. That was the reasoning. Yeah. If you look at the schedule, there are quite a lot of component-based talks, not really vendor talks, but component-based ones, and not so many end-user talks. For the next one, it would be really interesting to have those, but we need to reach out to... I see. Yeah. I mean, if we could get end users of any of the schedulers that were presented last time, that might be very interesting. Yeah. And the other question will be, depending on how many submissions there are, whether we make it half a day or a full day. I'm going to submit something, definitely. I know the deadline's approaching, but let's start it at least. Yeah, that would be amazing. Your sandbox one? You submitted the sandbox one as well. Yeah, that as well. Another thing we probably need to do, or definitely need to do, is work out the next set of agendas for this group; I think we've run out now. It tends to work quite well doing it upfront, I think. Yeah, we could use five minutes for that. I don't know if we'll be able to do it now, but we can go and think about it. As you say, it's only really four months until the next KubeCon anyway. Not the next one right away, but doing them in between. Because the CFP is this Friday, right? I know, I know. I've started it. I'm actually traveling to some relatives tonight, and I plan to sit up late and type up my submission. For the Batch + HPC day, if we organize the co-located event again, that CFP will come later, but for the main event it's already now. I'm just going to get it in.
Yeah, I guess we can drop the topic backlog we have there, apart from the jam session. Maybe people can add there what they would like to hear about. So we just talked about Cilium and eBPF. We had other things mentioned there, right? We could get the ATLAS people to present too, since it's a use case. Would that be okay as well? Yeah, absolutely. We mentioned the Gateway API. Did we ever get a presentation on that? Probably not, right? I've missed a couple of sessions, but I don't remember one. Is that so germane to this group? I mean, it's interesting, it's good, I don't know. Do people need that in this world of research? I think the main thing would be how to use it for bursting and multi-cloud things, if at all possible. Interesting. Okay, I can see that a little bit. It just seems so much more directly useful if I have a product and need different HTTP endpoints to go to different places. Okay. Yeah, but they have this concept: there's the Gateway API, and then there's the Multi-Cluster Services API, which basically allows you to make a service external and then define on the other cluster that it should consume a remote service, and they kind of link to each other. Things like that. Okay. What else did we have here? Cluster API? Was that one as well? Yeah, we were very interested in that, I think. We never had a talk about it either. Crossplane we did get. Something about NUMA as well, that would be pretty cool, maybe, though probably not for a whole session. Yeah. I noticed the other day that that proposal finally got merged, the one that's about six years old, around user namespaces. Did you see that? I did not see that. In Kubernetes, yeah. And it's coming in the next Kubernetes release, right? Oh no, nothing's been implemented yet; I think this is just the proposal getting merged. I thought there was intent to get it in soon.
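The service export/consume linking mentioned above is the Multi-Cluster Services API; a minimal sketch of the exporting side (service and namespace names are placeholders) looks like this:

```shell
# Mark an existing Service for export to the rest of the cluster set.
cat > export.yaml <<'EOF'
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: myservice
  namespace: team-a
EOF
# kubectl apply -f export.yaml   # in the exporting cluster
# An MCS-aware controller then creates a matching ServiceImport in the
# consuming clusters, reachable at myservice.team-a.svc.clusterset.local.
```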
Maybe, I don't know, but at least it's a big step forward to have what looks like an agreement on doing it. Yeah, that's a super exciting one. Maybe the excitement in my voice is not quite relaying it, but yes, super exciting. But where did you see that, actually? Hold on, I've got it here; someone sent it to me. Yeah, alpha release targeting 1.25. Okay, I see it now. I think that's the one. That's it, yeah. Okay, targeting 1.25, that's pretty good. Also, the enhancement got merged basically after about six years, which is good. I guess there was quite a lot to figure out on the kernel front. Yeah, it's not a small one. I saw it, but I imagined it would never happen, given that it's been there so long. That sounds amazing, actually. I think Jonathan just put down the user namespaces and rootless stuff, that would be pretty nice. And I think we should get a talk from Nathan, with a microphone, and dedicate a session. That would be good. Or just sign language. We can add those. Nathan, would that be okay? You just mentioned that you have some reports from sites on what they want; it would be interesting to hear about that as well. Would that be fine? Sure, yes. It doesn't matter, it's a research group, we don't have to be that accurate. And user namespaces we also had, right? Yeah, we did get a talk about rootless quite a while ago, right? It wasn't specifically about user namespaces. I can't remember the guy's name. It was Akihiro. It's always Akihiro on these things. Yeah, for sure it was him. He gave a talk about rootless, but it was more about all the issues with networking, overlayfs, and doing things in user space. A more focused talk on user namespaces would be cool as well. Yeah, I'll just drop the link here, because all this rootless stuff is being tracked by Giuseppe and Akihiro as well.
That's a good link to have. Maybe we take these topics, put them in slots, and then hunt for speakers. Yeah, sounds sensible. Nathan, do you have a preference on when to do this, before summer or after summer? Okay, so we can schedule the other ones already and leave the space. Awesome. Okay, there are plenty of talks to go and watch now. Should we put all these links in the agenda? I won't be able to do it now, but maybe later. And then we link it from the Slack channel so other people can go and check them as well. Yeah, we can go back and get these, can't we? Yeah, from the recording, the text also appears afterwards, I hope. I think we can also export it. Yeah, in the recording we can find it. I think that's it. Do we have anything else, Jamie? No, I was just trying to see if I could export the text easily, but that's fine, we'll grab it later. Nothing else from me. I haven't got a huge amount to contribute this time, unfortunately, because I wasn't there. No, it's good to see people got a lot out of it anyway. One thing: I forgot, because we didn't do it this time, but remember we have the possibility of doing a talk in the maintainer track as well, about the group. Last time it was actually quite nice; we got a few people interested in the group through it. So we can consider having a slot for the group in Detroit too. Yeah, I'm always up for that. I think that's good, just a nice refresher. We need to submit it when the maintainer track CFP comes up. Yeah, just tap me up, I'll work on it with you. All right, good. I guess that's it. See you in two weeks. Yeah, thank you all. See you. Thank you. Bye. Thank you. Bye.