Here's just a little bit about us. I'm a senior engineer at Microsoft working on service mesh things, but I like to do anything related to distributed systems or Kubernetes, which is how I got involved here. And I'm an MLOps engineer at Meringue Code, where I'm the co-founder; we do a lot of machine learning deployments on cloud native platforms, so we do a lot of work on Kubernetes. We're going to run through four pretty quick points today, as this is a lightning talk, and hopefully by the end of it you'll want to come help us make Kubeflow Metal better.

So, to start by addressing the elephant in the room: what is Kubeflow Metal? Kubeflow Metal is a new way, I guess new, of deploying Kubeflow onto a Kubernetes cluster running on bare metal servers. Concretely, it's a Terraform module, composed with a couple of other Terraform modules, that will spin up a Kubernetes cluster on Equinix Metal and deploy Kubeflow onto it based on the Kubeflow manifests. Because this runs on bare metal clusters, you don't get the bells and whistles of a cloud provider, but you do get a bit more control. We use kube-vip as the bare metal load balancer for your cluster, so when you create your Istio ingress gateway, it's powered by kube-vip in the background. In the future we hope to support a bring-your-own-cluster mode, where we just assume you have load balancing provisioned already, and we're also looking into independent versioning of those components.

At a high level, this is what the architecture looks like. What we've done is basically replace the underlying managed infrastructure that you'd normally get from AWS, Azure, or Google. With the help of Terraform, we provision the underlying infrastructure, and you end up with a Kubernetes environment similar to what you'd have in a cloud environment.
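As a rough sketch of what that module composition could look like, a root module might wire the cluster and Kubeflow pieces together along these lines. Everything here — module sources, variable names, and sizing values — is illustrative, not the project's exact interface:

```hcl
# Hypothetical root module sketch; names and sources are illustrative.

# Spin up a kubeadm-based Kubernetes cluster on Equinix Metal bare
# metal servers, reusing Equinix's existing Terraform work.
module "k8s_cluster" {
  source = "./modules/equinix-metal-kubernetes" # illustrative path

  metal_project_id = var.metal_project_id
  metro            = "da"            # Equinix Metal metro to deploy into
  plan             = "c3.small.x86"  # bare metal server size
  worker_count     = 3
}

# Layer Kubeflow (from the Kubeflow manifests) plus kube-vip, which acts
# as the bare metal LoadBalancer behind the Istio ingress gateway.
module "kubeflow" {
  source = "./modules/kubeflow" # illustrative path

  kubeconfig      = module.k8s_cluster.kubeconfig
  enable_kube_vip = true
}
```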
So everything is managed by Terraform, which means if you can write Terraform, you can easily create a provider configuration to target whatever bare metal boxes you want. Load balancing is implemented for you with kube-vip, it's easy to configure and provision, and the cool thing about this is that it's a cheaper alternative to cloud infrastructure: you basically get a fixed cost rather than a bill that grows as people use the underlying infrastructure.

This is a high level view of the components of Kubeflow Metal, what you get when you run a Terraform apply. Like we've mentioned, the underlying infrastructure at this time is Equinix Metal servers. We're looking into maybe adding other kinds of providers, like Linode or DigitalOcean, in the future, and of course we use Terraform to manage those boxes. kubeadm is the process for actually installing Kubernetes on top of that cluster, and the good folks at Equinix have already put a Terraform module together that does a lot of that, so shout out to them for helping us do this. For the PVCs that you often use with Kubeflow, we use the local-path provisioner: instead of having to set up a separate storage component in your Equinix Metal installation, it just uses the disks on the nodes of the Kubernetes cluster to provide PVC storage. There's a plugin out there that does that, and it makes for a really simple experience, especially for CI/CD or just testing; it's quick and much less hassle to deal with. On top of all that, of course, you've got Kubeflow. These are the different pieces of Kubeflow that we install by default, but we hope to add more knobs for folks to modify and add other things.

So this is the installation step. It's pretty easy. We have a demo repo in the main repository, Kubeflow on-prem.
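Sketched as a shell session, the end-to-end flow from the demo repo boils down to roughly the following. The repo URL placeholder, environment variable names, and service names here are assumptions; check the project README for the real ones:

```shell
# Clone the demo repo (see the Kubeflow on-prem repository for the URL).
git clone <kubeflow-on-prem-repo-url>
cd kubeflow-on-prem

# Configure credentials and sizing via environment variables
# (variable names illustrative).
export TF_VAR_metal_auth_token="..."   # Equinix Metal API token
export TF_VAR_metal_project_id="..."
export TF_VAR_worker_count=3

# Provision the bare metal cluster and install Kubeflow in one go.
terraform init
terraform apply -auto-approve

# Once the apply finishes, look up the kube-vip load balancer IP of the
# Istio ingress gateway, and put a DNS A record in front of it.
kubectl -n istio-system get svc istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

# Tearing it all back down again is a single command, which is what
# makes the ephemeral CI/CD use case attractive.
terraform destroy -auto-approve
```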
So you can go in, git clone the repo, configure all the environment variables, and then run the Terraform apply. It runs all the scripts and validates that everything is installed. We've got a video here because I was nowhere near brave enough to try to demo this live, and we've also sped it up a bit. So this is what the process looks like; if you're familiar with Terraform, this will look familiar. From the beginning, a complete install is going to be 602 different Terraform resources, including your cluster and all the different Kubernetes manifests that get applied to it. We've sped this up because the process, running from my box, takes about 18 minutes or so; throw more compute behind it and you may see that time come down. It takes around the same time to tear it all down, actually a little faster, maybe about half the time. So in your head, start imagining some of the CI/CD use cases: the elasticity of being able to stand a cluster up with Kubeflow and tear it back down to validate an experiment, or even to validate a version of Kubeflow. There are lots of applications for this. This finished in about 52 seconds of real time; it's about 18 minutes in reality. And this is what it looks like: the familiar Kubeflow dashboard that you can click through. Like I said, you're going to have an Istio ingress gateway with an IP address. We configured it as a load balancer service, so it's going to be a load balancer IP address for your Kubeflow installation. You can put that behind a DNS A record or whatever else and access the dashboard that we all know and love.

Why should you consider Kubeflow Metal? The number one benefit is fixed cost. If you're still experimenting with AI and ML, you may not want to spend a lot of money in the cloud.
You can just rack up a few spare servers in your office and try installing Kubeflow Metal on them. The other advantage that we see a lot from running this is that you can quickly bootstrap an ML infrastructure environment for your team. If you're new to this, you can quickly bring up the environment and use it. The other cool thing is that the deployment is very elastic: you can easily add 10 nodes, scale it down, destroy it, make it 20 nodes, and things like that while you're testing. The part that is going to be really interesting is plugging it into your CI/CD process. You can have a CI/CD process that brings up all the clusters you need, or provision from a GUI where you specify the sizing parameters and everything you need to provision the Terraform infrastructure. That's something we're really working on right now. There are also some special use cases where you don't want your data to touch the cloud, things like financial data or insurance data; you can easily run Kubeflow Metal in-house. And IoT devices: if you're doing federated learning, with a lot of IoT devices in the field, and you want to quickly build models to do inference and things like that, you might want to consider Kubeflow Metal.

So we are looking for more people to make this better. I think the current version on GitHub is 0.1.0 Release Candidate 2, so we're very early, but the cool thing about open source is you can just share it with people and talk about something cool that you've done. On our roadmap, we want to expose more knobs in the configuration, things like changing the password, all the things that take it to the next level and make it a lot more feasible for folks to put into production. And currently we've demonstrated this proof of value with Equinix, so we want to extend it beyond just that.
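As one concrete sketch of the CI/CD integration mentioned a moment ago, a pipeline could spin an ephemeral Kubeflow cluster up for a test run and always destroy it afterwards. Everything in this example — job names, secret names, and the test entrypoint — is hypothetical:

```yaml
# Illustrative GitHub Actions job: ephemeral Kubeflow cluster per test run.
name: kubeflow-e2e
on: [pull_request]

jobs:
  e2e:
    runs-on: ubuntu-latest
    env:
      TF_VAR_metal_auth_token: ${{ secrets.METAL_AUTH_TOKEN }}  # illustrative secret name
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Bring up cluster and Kubeflow (roughly 18 minutes)
        run: terraform init && terraform apply -auto-approve
      - name: Validate the experiment or Kubeflow version
        run: ./run-e2e-tests.sh   # illustrative test entrypoint
      - name: Tear it all back down
        if: always()
        run: terraform destroy -auto-approve
```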
And beyond that, Terraform has providers for all kinds of bare metal hosts, things like Linode and DigitalOcean, and if you have infrastructure on-prem, we might be able to build a provider for that as well. So we're basically looking for people to help us with these efforts. If you're working on DigitalOcean and you want to work on the Terraform provider integration for that, we could use your help there. And wrapping up here, we want to contribute some of our changes upstream. We had to change some of the Kubeflow manifests to make them technically more correct for Kubernetes and to allow, for example, changing the load balancer service. We want to upstream those changes to the Kubeflow manifests working group slash SIG so that it's democratized and available for everybody, along with other things like wiki updates. And we're working on integrating with CI/CD right now so that we can do GitOps provisioning and things like that. Thank you for coming. Thank you all.