Good morning, everyone. Let's give it a couple of minutes for more people to join. I posted the notes in the chat, so feel free to add anything you'd like. We do have some agenda items today; I don't know if we'll get to everything, but if we don't, we can pick it up in the next meetings. Thank you, Eric, for agreeing to be our scribe. We're open to anybody else who wants to be a scribe, so feel free to reach out to any of the chairs if you're interested. We're also open to having somebody else facilitate this meeting; I've been facilitating for the last few months, but if anybody is interested, please reach out to us. We're also open on the format of the meeting, so if you have any suggestions, feel free to bring them up. With that, we can do a quick stand-up; I'll go in the order I see on my screen in the Zoom meeting.

Renaud? Hello, sorry I'm a bit late. No problem; the format is, if you have any updates, feel free to mention them, along with anything you want to talk about later in the meeting. Okay, and do you want me to present the topic I added? Yes, I saw that you added it. We have some presentations going on today, so if time allows, you can present. Do you want to say anything about your topic now? Sure, but you mentioned you had a few items on the agenda, so I can do that afterwards. Your topic is about device drivers, right? Yes. If I have five minutes, I have a slide deck too, if you don't mind; let me share my screen. Let's do it afterwards, then.

Alina? Hello everyone, I'm Alina. I don't have any updates; I'm just looking forward to today's presentations. The update is that Alina has become our second TOC liaison, so welcome. Thank you. Glad to be working with you. Likewise.

Derek? I don't have any updates, just joining in. Eric? No updates from my end. Fabricio? Hello, Fabricio here. I work at Google on gVisor, bringing gVisor to Docker and Kubernetes. This is my first time joining, so I'm just here to watch and see what happens. Awesome. Fabricio, did I get that right? Yes, sorry, it's a bit difficult to pronounce; it's Fabricio. It's my first time joining, so I don't have any updates, but my colleagues will be presenting today. Thank you.

Furcat? Hello everyone, I don't have updates; our colleagues will be presenting, so we're joining for that. Great. Maël? Hello, I'm from the Metal3 project; I joined to present Metal3 with Russell. Great. Ray? Hello, no updates from me. Good. And Russell? Hey, thanks for having me. As Maël said, I'm here to do a quick presentation on the Metal3 project, which we proposed as a sandbox project.
Awesome. The next item I have on the agenda is the roadmap. We've been working on this for about a month; last time we said we'd leave it open for a couple of weeks for feedback, and then go back and start working on some of the items. I think we've already started on some of them. I've been reaching out to some of the related communities: Kata Containers, gVisor, the Firecracker community, and the WebAssembly communities; WebAssembly has a lot of different projects, so there are multiple communities there, but I've reached out to a few of them. We've done some of that work, but we still want to expand to an even broader scope of projects and try to identify gaps that existing CNCF projects are not filling. Quinton isn't around yet, so maybe Alina and I can talk later and see which of these items we want to tackle first. If you have anything you want to add to the roadmap, please add it, and then we can prioritize what we want to do first. Another item I found very interesting is MLOps-type workloads and tools; some of those are not in the foundation yet, so we're looking out for projects or technologies that could fill that gap.

Does anybody have comments about the roadmap, or anything else they want to raise? I'm reading the chat now. Renaud, yes: the stand-up is just to check in and mention whatever you want to talk about later in the meeting. If you're an attendee, please add your name to the list. Also, a reminder that we have a repo; if you're a participant or want to contribute in any way, please add yourself there. Okay, I don't think we have anything else, and I'm the only one speaking, so I just want to leave it open for anybody who wants to say anything; feel free to participate. With that, let's go to the Metal3 presentation. Russell and Maël, take it away.
Sure, thanks again. We're going to do a quick presentation and then leave it open for questions and discussion.

One thing I wanted to mention first, sorry: this meeting is being recorded. You're going for sandbox, and typically what happens is that the meeting gets recorded, the SIG does a review, a document gets checked into the GitHub repo, and later the TOC members have a chance to see the presentation. Based on that, they decide whether or not to sponsor the project; for entering sandbox, the requirement is three sponsors from the TOC. So, go ahead.

Okay, cool. There's a GitHub pull request against the TOC repo with a document that covers a lot of details about the project and the proposal, so if anyone's interested, you can check that out afterwards. We're going to cover a lot of what's in there today. Give me a second to share the slides. Can you see them? Yes. Great.

The project is called Metal3, and the high-level overview is that it provides Kubernetes-native bare metal host management, that is, provisioning and deprovisioning bare metal hosts. This is not a new problem space; people have been provisioning bare metal hosts for a while. So why build another approach? A big reason is the API: we wanted to explore this problem space with a declarative API for managing bare metal hosts, and we did that by creating custom resources in Kubernetes. We also wanted something designed to run within a Kubernetes cluster, that is, self-hosted. One reason is so you can manage this software the way you manage other applications, but a major one is the footprint required for a bare metal Kubernetes cluster: we didn't want to require something off to the side to run the bare metal provisioning stack. Some bare metal Kubernetes use cases are large clusters in a data center; others are very small clusters, the edge computing use cases, where requiring another host is unacceptable. So we needed to address this problem space with something that could be self-hosted in the cluster. We also wanted a cluster to be able to manage its own infrastructure. What we've built is not just something that provisions hosts; we're also looking at provisioning Kubernetes clusters, building on tooling out of one of the Kubernetes SIGs, the Cluster API project, and integrating with it so a cluster can manage its own hosts and turn them into additional nodes. So that's why we did it and some of the problems we're trying to solve.

The first major component is the Bare Metal Operator. This slide shows some detail, but I'll talk through it.
The Bare Metal Operator runs in a cluster and manages a custom resource called BareMetalHost. The BareMetalHost is the declarative interface for a bare metal host: it has details about the hardware and the state you want the host to be in, so you update it to describe how you'd like the host to be provisioned. There are some secrets that contain key details. One of them is the config drive secret: if you've ever used a cloud compute API, there's always a user-data section where you pass data that is given to the host the first time it boots, so a tool like cloud-init or Ignition can initialize itself on first boot. We support that same interface. The way we do it with bare metal is that we write that information to a dedicated partition, so the first time the new operating system boots, cloud-init or Ignition or whatever tool you like reads the data from that partition and initializes itself.

So what does this do? You have this resource, and you update it to say you'd like a host to be provisioned. Under the hood we make use of Ironic. This is hidden as an implementation detail, but we're reusing existing bare metal provisioning technology. Ironic knows how to contact a management controller on a host and boot a special RAM disk; that RAM disk knows how to download the operating system image you've decided to provision, write it to the correct disk, and also write your user data to the config drive partition I mentioned. That's the high-level view.

A quick overview of the API. These samples are cut down to fit on a slide, but they give you an idea of what it looks like. This is a BareMetalHost custom resource, and in its spec there are a few things. One is bmc: information about the management controller on the server. The management controller is what we talk to; it's the out-of-band management for a server, which we can use for power management, turning the server on and off, or controlling boot settings, and that's key for automatically triggering provisioning of a host. We also need to know the MAC address: for PXE-based provisioning, we need to recognize the host when it shows up on the provisioning network. The consumerRef is there because this interface can be used for any reason you want to provision a host; you could do generic bare metal provisioning for whatever purpose, but it's also designed to be integrated with layers above it. For example, if you're using the Cluster API integration, which we'll talk about in a few minutes, there will be a reference to the resource that claimed the BareMetalHost, here a Machine from the Cluster API group. The image section is the operating system image you've said you want provisioned to the host, and userData is a reference to where the data that something like cloud-init will consume is stored.
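To make the shape of this API concrete, here is a minimal sketch of a BareMetalHost manifest reconstructed from the description above. It's an editor's illustration, not the presenters' slide: the exact API group/version and field names may differ between project releases, and all of the addresses and names are placeholders.

```yaml
# Minimal sketch of a BareMetalHost, reconstructed from the description.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0
spec:
  online: true                             # desired power state
  bootMACAddress: "00:1a:2b:3c:4d:5e"      # recognize the host during PXE boot
  bmc:
    address: ipmi://192.168.111.1:623      # out-of-band management controller
    credentialsName: worker-0-bmc-secret   # Secret holding BMC username/password
  image:
    url: http://images.example.com/centos8.qcow2          # whole-disk OS image
    checksum: http://images.example.com/centos8.qcow2.md5sum
  userData:                                # first-boot data for cloud-init/Ignition
    name: worker-0-user-data
    namespace: metal3
  # consumerRef is filled in by a higher layer (e.g. the Cluster API
  # integration) to record which resource has claimed this host.
```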
Then there's the status section of a BareMetalHost. Another thing we do when we bring one of these hosts under management is that the special RAM disk we boot, the one that downloads the image and writes it to disk, also knows how to inspect the hardware. It gathers as much detail about the hardware as it can and sends it back up so we can store it in the resource. The slide is heavily condensed; we collect quite a lot of hardware detail about CPUs, memory, network interfaces, storage, and anything else we can find, but this sample shows a bit of info about the CPU, a network interface, and how much RAM there is. The status section also holds the provisioning state: in this case the host has been provisioned, and it shows which image was provisioned to it. So that's our declarative API for managing bare metal hosts. On top of that we integrate with the Cluster API project, and I'll turn it over to Maël to discuss that in more detail.

Sure, thank you. Cluster API is a project from SIG Cluster Lifecycle, and the idea is that you can manage your Kubernetes clusters declaratively using the Kubernetes API. This slide is a breakdown of the Cluster API project and the main idea behind it, if you're not yet familiar. You have a bootstrap cluster or a management cluster, which is itself a Kubernetes cluster, and the user interacts with it by creating CRs that represent the target clusters the user wants to deploy. Under the hood, Cluster API interacts with different cloud providers: AWS, Azure, Google Cloud, any kind of cloud provider. We thought this was a really nice way to manage clusters, and we wanted to extend it so we could deploy clusters not on a cloud provider but on actual bare metal nodes, on physical hardware. Sorry, can you change the slide, please? Great, thanks.

For this, Cluster API defines a set of resources, for example Cluster and Machine, that represent the cluster, and it allows providers to bring their own equivalent objects, like an AWSCluster for a Cluster and an AWSMachine for a Machine. That's the idea behind it. What we did to integrate with Cluster API is create those infrastructure-specific CRDs, Metal3Machine and Metal3Cluster, along with the Cluster API provider for Metal3, which is a set of controllers that reconcile those objects. The core difference between what we've been doing and what you find in the AWS or Azure providers is that we of course don't have a cloud provider API we could use to start machines. That's where we integrate with the Bare Metal Operator, the core project of Metal3: we take care of provisioning the physical hardware by interacting with the Bare Metal Operator's API, the BareMetalHost. That's the logic underneath the Cluster API provider for Metal3.
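As a rough sketch of how that layering could look, the following is an editor's illustration with assumed names and API versions; the provider object names varied across releases (BareMetalMachine, Metal3Machine), and the KubeadmConfig bootstrap object is an assumption based on the kubeadm integration mentioned later.

```yaml
# Illustrative layering: a Cluster API Machine pointing at a
# provider-specific object reconciled by the Metal3 provider.
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Machine
metadata:
  name: controlplane-0
spec:
  bootstrap:
    configRef:                    # e.g. a KubeadmConfig that renders the user data
      kind: KubeadmConfig
      name: controlplane-0-config
  infrastructureRef:              # the infrastructure-specific object
    kind: Metal3Machine
    name: controlplane-0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: Metal3Machine
metadata:
  name: controlplane-0
spec:
  image:
    url: http://images.example.com/ubuntu.qcow2
    checksum: http://images.example.com/ubuntu.qcow2.md5sum
# The provider controllers reconcile this by claiming a free BareMetalHost
# and filling in its image, userData, and consumerRef fields.
```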
Next slide, please. I've put here a simple example of what we can achieve. Consider each of the squares as physical hardware. We start a first node with a bootstrap cluster, which can be achieved using a specific ISO, for example; then we deploy all the Cluster API and Metal3 components, and from there we can deploy a target cluster, here for example a master node and a worker node. Once that's done, we can use a Cluster API feature to move all the CRs to the target cluster and then remove the bootstrap cluster, so the target cluster becomes self-managed. This part is currently a work in progress in our project. Next slide, please.

Some of the future work: as Maël was just saying, the Cluster API pivoting is on this list. That's where you have a bootstrap process, you start with a bootstrap cluster, and then you move components into the resulting cluster, which is particularly relevant for the smaller-footprint use cases. Machine remediation is another: detecting problems on hosts and being able to automatically try to repair the cluster, which would commonly just mean rebooting hosts to get them to recover, or taking them out of service if necessary. There's also more detailed management of bare metal in different ways, such as automatically creating RAID volumes during the deployment process, or managing firmware aspects, like BIOS settings, during deployment. Those are just samples of future work, or rather work in progress.

There's a website, metal3.io, with lots of info. There's a development environment: if you have a single host, you can set up a development, test, and demo environment using virtual machines, where we set up the software so we manage those virtual machines just as if they were bare metal hosts. It's great for giving it a shot. There was also a KubeCon talk that included some demos; it's linked from the blog on the website and is another resource to find out more if you're interested.

I wanted to close with a few community and project highlights. We started the project at the beginning of last year, so a bit over a year ago now, and we do have production deployments happening this year. The code is Apache 2.0 licensed, and it's all on GitHub. There are several repos, but the two biggest are the primary ones where new code is developed, and you can see how many individual people have contributed to each. There's a list of the companies the contributors represent so far, and of course the two of us here are from two of the companies on that list. As a community, we communicate through a mailing list, a channel on the Kubernetes Slack, and bi-weekly Zoom meetings to catch up with each other. Elsewhere on the internet, we have our website, a Twitter account, and a YouTube channel; the YouTube channel has videos of our meetings and also things like demos. So those are some highlights. With that, thank you very much for your attention, and I want to open it up for any questions or discussion you may have about what we've done.

I have a few questions, but anybody else first? Any other questions? Okay, I have a couple of questions.
The first question is about maintenance and upgrades for the provisioned hosts. How is it different from the initial provisioning? What are the stages if, for example, I want to vertically scale, to scale out a cluster or scale up a host by adding CPU or memory to my bare metal node?

Sorry, do you want to scale a cluster, or are you asking about managing a specific host? A specific host. Well, you can reprovision a host at any time. Speaking just about the Bare Metal Operator part, not the Cluster API part: the way the interface works, you can have a host provisioned or deprovisioned at any time. If you deprovision it, it can also do cleaning, so it will go in and wipe the disks before it puts the host back into an inventory of available hosts. When you reprovision it, it's the same process. The provisioning approach uses whole-disk images, like a cloud image for a cloud: it wipes the disk completely, so there's no management inside the operating system; it's whole-disk management. Does that answer the question? Yes, it does.

My other question is about configuration management. One of the steps was providing the image for boot. What if, as a user, I just want to install some extra tools, containerd for example? Is there an interface for doing that? Yes, there is. I'll give a two-part answer and then see if Maël has additional comments. The BareMetalHost API has a user-data interface, something to be processed by cloud-init or Ignition, so you can include in there instructions like "run these commands the first time the host boots to install this additional software". For example, when I'm doing testing, I commonly use the generic CentOS cloud image, just the generic one distributed by CentOS, and pass it additional data to install my SSH key for the user, or any software I want installed right away. The data you pass gets written to a special partition, and when that generic cloud image boots the first time, it reads it and installs whatever I asked for. That's the interface provided at the base provisioning layer; what you send it and what you tell it to do is what you build on top of that. For our prime use case, provisioning Kubernetes clusters, the Cluster API project is layered on top, and our integration generates that config using kubeadm, for example. The configuration it generates uses that same interface, so when we boot an OS image, it runs all the commands that were specified, installs the right software, and turns the host into a node.
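For illustration, a user-data Secret of the kind being described might look like the sketch below, assuming a cloud-init payload. This is an editor's example: the Secret key name and all contents are illustrative, not a confirmed schema.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: worker-0-user-data     # referenced from the BareMetalHost's userData field
  namespace: metal3
type: Opaque
stringData:
  userData: |                  # key name assumed; payload is standard cloud-config
    #cloud-config
    ssh_authorized_keys:
      - ssh-rsa AAAA... user@example.com
    packages:
      - podman
    runcmd:
      - systemctl enable --now podman.socket
```

The payload could also be a small stub, for example a cloud-init `#include` directive pointing at a URL, so the host fetches its real configuration from a web service at first boot; that pattern comes up next.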
And all the user data provided during provisioning, is it stored in the CRD in the cluster, or can it be stored somewhere else and just referenced? Right now we support storing it in a secret, and the CRD references that secret, which holds the contents. I can't remember whether we have any support for storing it somewhere else, but you could have a secret that just tells the host, when it boots up, to go pull the real configuration from somewhere else. It can be a stub that says "go hit this web service to pull down what you're really going to do". That's actually what we do: I work for Red Hat, we have this integrated with OpenShift, and that's how a lot of our configs work. It's a stub that gets written to the host, and the big thing it does is contact a web service to pull down the real content it needs, which lets the contents change more dynamically. So that's what's implemented there. That answers the question, thank you. Cool, thanks for the questions, appreciate it.

I have some questions too. The management cluster, that's a requirement, right? If you want to manage the same cluster that is the management cluster, can you do that? If you want bare metal nodes for that management cluster, what's the recommended architecture? Is it recommended to always have that management cluster, or can you have bare metal providers for multiple clusters? I can maybe answer this one. You can have both approaches. You could have a management cluster that's a bit more chunky, which would then be used to deploy multiple target clusters if you wish. But our main goal is self-managed target clusters. In that case, we use what you'd call the management cluster, or bootstrap cluster, only temporarily to bootstrap the target cluster. Once the target cluster is up and running on a couple of nodes, we transfer all the management to that target cluster, and then we can get rid of the bootstrap cluster, or ephemeral node, and the target cluster becomes self-managed. If we want to add a node, to scale up or scale out or whatever, we do it directly by interacting with the cluster we're operating on.

Got it. And this bootstrap cluster can be something like minikube? It depends. If you have access to the proper networks, you could do it with minikube. At the moment, there's a requirement for layer 2 connectivity between the bootstrap cluster and the target cluster. We're working on lifting that by using a specific feature of the BMC, so we could work over a layer 3 network, but that's not the case yet, so you need that network requirement fulfilled. If your laptop is connected on the layer 2 network with your target cluster, you could use it as the ephemeral node, but usually that's not the case. That's why, in most of the projects using Metal3, for example Airship, or what we do at Ericsson, we actually create an ISO image that we boot on the bootstrap node with everything included. It's a self-contained image that starts a whole standard Kubernetes cluster on one of the machines, and then we use that to provision the target cluster.
Once we have pivoted the resources so the target cluster becomes self-managed, we scale up the cluster to take the node previously used for bootstrap into the target cluster. Got it. Of course, the whole flow is still a work in progress; we're still working on this.

And in terms of security for the firmware, do you have any protection there? Does the API handle that too? Does the Metal3 operator handle, say, a password or something that talks to the IPMI modules on the host? Yes. There are different management controller protocols, IPMI being the least common denominator; things are moving more toward Redfish, a newer standard, and there are also a lot of vendor-specific interfaces we can support. All of them have authentication, and we make use of it: you store the authentication details in a secret and reference that secret from the BareMetalHost object. So yes, you have to provide those credentials. When you enroll a host and have it managed by this, we have complete control over the host, so of course you have to lock down access to this API. Does that make sense? Yes. I was just thinking about some of these newer things, like Nitro from AWS, where they have enclaves on the machines so that only someone with a specific fingerprint can access the machine, preventing people from accessing the lowest level of your infrastructure, your firmware. If a hacker can get into the lowest part of your infrastructure, they can basically gain access to everything, so I was wondering how Metal3 would interact with those technologies. That's interesting. The extent of the authentication we can support right now is a username and password to the management controller; anything more sophisticated we can't do yet, and I don't know of anyone exploring more sophisticated access control. That's what we do right now, and then you would typically put that on an isolated network, ideally a network that's not reachable by any of the workloads on the cluster, for example, and lock that down as well.
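For illustration, the credentials secret being described might look like the sketch below; it's an editor's example, and the key names and values are assumptions rather than a confirmed schema.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: worker-0-bmc-secret    # referenced by the BareMetalHost's bmc.credentialsName
  namespace: metal3
type: Opaque
data:
  username: YWRtaW4=           # base64("admin")
  password: c3VwZXJzZWNyZXQ=   # base64("supersecret")
```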
On the components: you have the Cluster API component and you also have the operator, right? Are those two components both part of the project, or are they separate? Because the Cluster API component seems more related to the Kubernetes Cluster API. How is that bundled together? Yes, those are the two. If you look at our GitHub, you'll see a repository for each of those components, another one that sets up our development and test environment, and others for things like container images. The two components are standalone Kubernetes controllers that you run in your cluster. The first one, the Bare Metal Operator, is very focused on provisioning bare metal hosts, but it's a more generic bare metal provisioning interface: you can provision one host, two hosts, or however many, with the operating system of your choice, for any purpose, using that interface. The Cluster API integration is a layer on top of that. Cluster API has generic controllers, and you have to integrate them with each type of infrastructure platform; that's what we provide here. Since that's so tightly related to, and integrated with, our Bare Metal Operator controller, we keep them under the same project umbrella, but those are the two key pieces, and you don't have to use both; you could use just the Bare Metal Operator, for example, though the Cluster API integration is the popular thing you'd use on top of it. Right, because some people just want to provision bare metal hosts; they don't want to provision Kubernetes clusters. Exactly. That's why they're architecturally separate: generic host provisioning is a problem to solve on its own, without specifying what you're provisioning or for what purpose, and then if you happen to be provisioning Kubernetes clusters, which is of interest to us, there's an additional component you can use for that.

One more question: the component you mentioned, OpenStack Ironic, is that it? Yes, Ironic. We built the API and the behavior we want, the declarative interface, and to do the lower-level provisioning aspects we integrated Ironic. Within the code there's sort of a plugin layer where you can plug in different provisioning systems that can fulfill a set of needed operations; Ironic is where we started. We had a lot of experience with it, it has a ton of features, it knows how to talk to a bunch of different interfaces, including a number of vendor-specific ones, and it has a pretty good community with participation from a lot of hardware vendors, so it got us going pretty quickly. Architecturally, though, that plugin layer leaves it open to exploring different options, or performing certain operations with something else, that sort of thing. We typically deploy Ironic in the cluster as well, even within the same pod. I could talk more about how we use it; we use it in a pretty unique way, but it's a key component we depend on. Thank you.

Anybody else have any other questions? We'd like this to become interactive. This is Eric; I had a couple of questions, and apologies if this was already answered, I just wanted to clarify. You provision full bare metal hosts, right? Do you integrate with other configuration management tools like Chef and Puppet, or is it strictly speaking bare metal? Our use case is full bare metal hosts: things you put in a rack and have physical access to. That's our primary use case. We have some development environment stuff based on virtual machines, but the whole point of the project is bare metal hosts for on-premise use cases.
Okay. And you mentioned ISO images, so I was wondering whether the OS is baked in, or if there's a desire to make that pluggable. With Kubernetes especially, you have specific OSes that are very much trimmed down, so I'm wondering if there's one preferable bare metal OS. I imagine with vendors, especially on the hardware side, you run into a lot of challenges, so I'm curious whether drivers or vendors become problematic. At the bare metal host provisioning layer, it's completely operating system agnostic: you just point it at a disk image, and that could have any operating system in it. It does need to be compatible with the hardware you're deploying to, but the provisioning layer doesn't really care; if you gave it something bogus, the host just won't boot off the image, and you'll have to fix that. In any case, it's OS agnostic. As you get higher up the stack and start trying to automate doing something with the host, say, when we get to provisioning Kubernetes clusters, then it starts to depend on which operating system you're running. If I may add: in the development environment we have, we're able to provision clusters with both CentOS and Ubuntu images, and internally at Ericsson we've also done it with SLES. It really just depends on how you install the different components that are needed, like kubeadm, kubelet, and Docker or whatever CRI runtime you're using, so it can run on pretty much any OS. Of course, some will be better adapted to some hardware than others, but that's the user's choice; it's just about providing a disk image of that specific OS and changing the installation parts in Cluster API a bit, that is, which executables you install and how you deploy them. We've also used RHEL and CoreOS heavily in our use; that probably covers all the ones we've tried. Thank you. Sure, thanks for the questions.

In terms of users trying this: do you have some deploying right now? That's one of the questions that usually comes from the TOC; they want to see adopters, or at least early adopters initially. Sure. I'll speak first and then let Maël cover his part. At Red Hat, we've adopted this as part of our bare metal solution for OpenShift, to optionally automate provisioning bare metal Kubernetes clusters with OpenShift, and we're working with customers right now on production deployments. I can't call out specific customer names; I can just say we're working with a number of customers. I can also add that at Ericsson we are, of course, using this internally, and on top of that there's a project called Airship, led by AT&T, that makes full use of Metal3 to deploy on bare metal.
The goal of the Airship project, which is under the OpenStack umbrella, is to deploy OpenStack clusters for 5G networks, and they're using Metal3 under the hood.

Got it. Any more questions, ideas, or comments? Sounds like no more questions. Well, thank you for your presentation, that was great. I think the next step is for the SIG to review, and I need to go back and figure out what the process is; it's constantly changing as far as sandbox goes. Typically, there will be some documentation that the project submits, with a set of items, and then the SIG writes the official recommendation to the TOC, and from there it's up to the TOC whether the project gets enough sponsors to become a sandbox project. All right, thanks again for hearing us out, we appreciate it.

All right, I think we have one more item, from Renaud. Do you want to talk about your presentation? Give me a quick second to share my screen. I've shared the slides in the issue; quick note, they're pretty rough, and I'll improve them as time goes on. My name is Renaud, I work at NVIDIA as a software engineer. I wanted to present the Container Device Interface. I'll give you some background on device support, some use cases, and then I'll talk briefly about what the Container Device Interface is and where it comes from; it's not just NVIDIA. I'll explain how we thought about it, but we're definitely open to feedback. Very quickly: I'm part of the group that originally built the device plugin interface in Kubernetes, I maintain a device plugin for Kubernetes, and I maintain a stack to support devices in different runtimes, whether it's Docker, Podman, or Singularity. I've interacted with many runtimes, and more recently I've been working on OCI hooks to help with device support.

The first point most people want to address is: why is `docker run --device /dev/mydevice`, or the Podman equivalent, not sufficient? In the past, and even today, enabling a container to be device-aware is, or used to be, as simple as exposing the device node in the container with `--device`, and that works pretty well for a wide range of simple devices. As you move to more complex devices, it turns out more things are needed. A simple case is that you actually need to expose more device nodes. You might need to expose IPC mechanisms, for example Xorg or vendor-specific IPC. You might need to expose files from the runtime namespace, or even change some procfs entries. More generally, you might want to perform compatibility checks: is this container going to run with this device? You might want to perform runtime-specific operations; what you do in a Linux container might be very different from what you do in a VM-based container runtime. And you might want to perform device-specific operations.
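To make that list of operations concrete, here is a purely hypothetical sketch of the kind of declarative device description such a specification could provide. This is an editor's illustration only: it is not the actual Container Device Interface schema, and every field name here is invented.

```yaml
# Hypothetical device description -- illustrative only, not the real
# CDI schema. It captures the operations listed above that a bare
# "--device" flag cannot express.
vendor: vendor.example.com
device: gpu0
spec:
  deviceNodes:                  # additional device nodes beyond the primary one
    - /dev/vendorctl
    - /dev/vendor0
  mounts:                       # host libraries matched to the installed driver
    - hostPath: /usr/lib/vendor/libvendor.so.440.33
      containerPath: /usr/lib/libvendor.so.1
  hooks:                        # runtime-specific setup and compatibility checks
    - stage: prestart
      path: /usr/bin/vendor-device-hook
  env:
    - VENDOR_VISIBLE_DEVICES=0  # e.g. controlling which devices are visible
```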
If you look at some of the third-party device support out there, it's a very fragmented space. Kubernetes supports device plugins; Docker has its own plugin mechanism; Podman has a concept of hooks that lets you run OCI hooks on a container; Nomad has a concept of device plugins, but it's different from the Kubernetes device plugin; Singularity, which is an HPC runtime if you're not familiar with it, has a concept of plugins; and you could go on with many other runtimes, LXC also has a concept. So why is there a need for a specification? Generally, the user experience is not very consistent: you won't even get the same user experience between Docker and Kubernetes-using-Docker, so you can't get the same experience between the runtime and your orchestrator even though they're using the same runtime. Plugins can't be moved from one runtime to another, even when intuitively it should be very straightforward; for example, taking a plugin from Docker to Podman is not something that can easily be done. And as vendors, we end up either in maintainability hell, spending time surfacing a feature to different runtimes, or resorting to hacks. I'll present some use cases from different vendors in the next slides, but for example at NVIDIA we ended up writing a runc shim: when Docker passes the OCI spec to runc, we inject our hooks into that spec. "Hijack" is not quite the right word, but that's the idea. For other vendors, it means they simply exclude runtimes from their platform.

Going into some of the use cases, I'll start with a simple one: Intel FPGAs. I've gathered some of this, at least for Intel, from conversations we had at KubeCon US. Generally, the operations they need are pretty static: they need to mount additional device nodes, and they need to reconfigure the FPGA with the correct function. One of their requirements is that they don't want the container itself reconfiguring the FPGA, because that would be a security risk for them. From what I've gathered, they currently mostly use CRI-O and Kubernetes to inject OCI hooks, and I don't think they have a runtime-specific mechanism for Docker other than passing the right arguments on the command line.

Mellanox is another use case. It's a specific one, somewhat of an edge case, but there is a device component in there. Mellanox provides Ethernet and InfiniBand NICs that are used in many data centers and high-performance clusters; we use them, for example, in deep learning. They have a specific, interesting use case: they definitely need to mount device nodes, but they also need to mount user-space libraries. When you install the Mellanox driver, it also installs user-space library components, and because they don't provide a backwards compatibility guarantee, with driver 1.0 you can only use the 1.0 libraries to talk to the 1.0 driver; they don't follow semantic versioning. Say the next version they ship is 2.0: that means you can't bake the 1.0 libraries into a container, because if you moved that container to another machine that had the 2.0 driver, the container wouldn't run; the calls the libraries make would fail against the driver.
Another use case, which I'm definitely more familiar with, is NVIDIA's. We provide a stack to help with GPU integration, and there are a couple of things we need to do. We mount device nodes. We mount user-space libraries; we have the same problem as Mellanox, where we don't provide compatibility guarantees, so for a container to be GPU-aware, we need to mount libraries that were installed with the driver on the host. We need to mount some Unix sockets for specific components: nvidia-persistenced, the daemon that keeps the driver loaded at all times, MPS, Xorg. We need to update procfs entries; for example, if a user wants to expose only one GPU out of all the GPUs on the machine, we need to hide the other, available GPUs in procfs. And we might want to perform compatibility checks between the container and the GPU: we have many generations of GPUs, and if you compile code or a container for a specific GPU architecture, you might not want that code running on another GPU architecture, because you'd take a performance penalty.

Sorry to interrupt, just a time check: we're almost at the top of the hour, and I want to be respectful of everybody's time. Do you want to continue now, for the folks who want to stay around a little longer, or take the discussion into the next meeting? I'm happy to take the discussion to the next meeting, or to continue if people want to stay around. I think I've given the gist of the use cases; the next few slides are mainly about presenting the solution we came up with, so depending on what people want to do, I'm happy either way. It's your call, but maybe I'd recommend wrapping up briefly now and talking about it in the next meeting, and then we can jump into a discussion for a few minutes; you'll get more eyeballs during the meeting time. Definitely; okay, I'm happy to talk about it in the next meeting in that case.

Awesome. Any last-minute questions from anybody about this topic before we end the meeting? Okay, so thank you, everyone; we'll see you next time. Renaud, if you want to follow up with me on anything, feel free to ping me on Slack. Sure. Generally, I'm trying to get some eyeballs on this and figure out if it fits in the SIG Runtime scope; from what you've told me, it seems like it fits. Yeah, I think it's relevant. If it's not Kubernetes-specific, and it's more around workloads and how you can facilitate workloads, then it's more of a fit for SIG Runtime than for Kubernetes; if it's something that has to be defined and implemented at the Kubernetes level, it would be more of a fit for Kubernetes. We talked about it before, and it's definitely not Kubernetes-specific; there are Kubernetes implications, but this is more of a runtime thing. Got it. All right, thank you everyone, have a great rest of your day, and stay safe; stay at home and don't go out too much. Thanks, Ricardo, see y'all. Thank you, everyone. Bye.