Hi, welcome to my KubeCon talk. My name is Dan Garfield, and I'm going to be talking about the quest for the ultimate Kubernetes home lab, with one slight caveat: we're going to try to do it as cheaply as possible. So we'll see how that goes. We're going to talk about the goal of what I was trying to build, we'll go through the build itself, and of course we'll talk about what I'd like to do next. Hopefully at the end we'll have some time for questions.

So, my name is Dan Garfield. I'm the chief technology evangelist for Codefresh, but this is more of a hobby talk about something I built at my house. You can follow me on Twitter at @todaywasawesome. Just remember how you felt about the day, and if it was awesome, then "today was awesome." Easy. I'm a member of the Forbes Technology Council as well as a Google Developer Expert, and I like to talk about Kubernetes, DevOps, and cloud. Today I'm going to be talking about a home lab, which is basically bare-metal Kubernetes at your house.

I love doing these at-home projects. I love Raspberry Pis, I love home labs. I built a Raspberry Pi-powered chicken coop a few years ago, which of course had a linear actuator that opened and closed automatically based on the time of day, which was a lot of fun. And Kodi game boxes, all the standard stuff you see people do with Pis. I love having something at my house that actually provides a lot of value, and I love flexing those tech muscles at home. That's what I tried to do here.

So my first goal with my Kubernetes home lab is that I basically want to run the services for my house, and that means I need to be able to run x86. Now, when I say "run real workloads," this isn't meant as a dig at ARM at all.
I love ARM; it's a great architecture. Of course it's in my phone, and of course I've got a lot of Raspberry Pi-powered stuff. But the truth is that a lot of the workloads I work with every day, and most cloud workloads, happen to run on x86. That may change; Apple is coming out with ARM-based chips for their laptops, and that might force a change in how people develop software. But in the meantime, I need to be able to run some x86 stuff, which is going to be a pretty big limiting factor on what I can choose to build my cluster out of.

My second goal is that I need to reduce toil. The truth is that I'm the only person who can work on this, so I can't have something that's really complex and difficult to manage, and I can't have something that's breaking all the time, because I frankly don't have time. I've got three kids; I don't have time to chase around broken-down servers. I've run nice servers at home in the past, and I ended up recompiling kernels and fixing storage pools, and that's just not something I want to spend my time on. I need something that is fairly automated. I can invest a lot of up-front time to get rid of any long-term toil, so we're going to be using those principles as well.

I also wanted to have something that was multi-node. Now, why do I want multi-node? Well, one of the things I noticed about working with Docker Desktop, or even a kind cluster, is that I don't really get a sense of how workloads perform over time.
Nor how they react when a node failure happens, when they migrate, or when I'm trying to scale up. You just can't really simulate those things very well. Having a cluster with a couple of nodes gives me the chance to see how different pods perform when they move, when they migrate, when they get evicted, that kind of thing. It really changes the way you think about Kubernetes a little bit when you're using it on your home cluster. In production you can invest a lot of time to make sure everything runs perfectly and smoothly, but this is a hobby project. So I do want multi-node, I do want scale, and I also want something that's modular. If I'm running a lot of services and I don't have enough room for something else, I don't want to replace my entire stack. I want to be able to just grab a cheap node, throw it into the pool, and call it a day. I think that would be really cool.

Number four: I wanted to be able to support stateful services. This is a home lab, right? So I'm going to be running Plex, maybe Home Assistant, or a Minecraft server, something like that. All of those are stateful applications; they all require me to maintain persistent volumes. So that's going to be super important for me to solve in this build.

And finally, I want to be able to support hardware transcoding. That means I want GPU access on my nodes, so that if I throw up a pod that's going to be transcoding a video file or something like that,
I want to be able to take advantage of the GPU, so that it actually performs quickly. And of course this matters since I'm doing it on the cheap; it means I have to be really careful with my choices here.

That, of course, is the final requirement: this whole thing needs to be cheap. I can't break the bank on this. Anybody could go spend ten thousand dollars, buy a bunch of high-performance nodes, throw them in a rack, and expect it to run pretty well. Not an option for me. It's also a hobby project, so I want to be able to throw a node away if I mess it up; I don't want it to be too costly, both in terms of my time and in terms of the resources I expend. And of course, power usage. I, like many of you, mined some Bitcoin back in the day, and anybody who does that realizes very quickly that power consumption is costly. In this case, I don't want to be paying through the nose for power on this server, because at that point I may as well just throw a cloud service up, and then I'm not getting the on-prem, bare-metal experience I'm looking for.

So that means there are a couple of things that aren't options. The Raspberry Pi, as great as it is, as much as I love it, is not an option. There's this really kick-ass Turing Pi board that you can put seven compute modules onto; it's got 28 cores and super cool onboard networking, and they're sold out everywhere. Really cool board; I remember when it came out, I really wanted one. But it doesn't actually meet all the criteria I laid out. Likewise the Ryzen Threadripper: you could buy that thing and have maybe 64 cores to throw at Kubernetes. How cool would that be? But of course, this is pricey. We're talking,
I think, over $1,500, and for the 64-core version I think it's more like $3,000. So this is a pricey CPU, and of course you also have to build a whole machine around it. It's also not scalable, right? So unfortunately this one isn't going to be the fit.

Now, what is going to be the fit is the Atomic Pi. This thing is a little-known board; it's kind of a gem. They're only $35 for one of these boards, and they have an Intel Atom processor with four cores. They come with 2 GB of RAM and 16 GB of eMMC storage, and they only draw a max of about three amps of power, usually less. So we're talking about something that usually draws quite a bit less power than anything standard you'd be looking at. It's very similar to a Raspberry Pi, a little more power consumption than a Raspberry Pi I think, but basically another five-volt kind of setup. This thing is sweet, and we can even do hardware transcoding on it, which is really cool. They are chunky, though.
They're bigger than a Raspberry Pi, which you'll see in a minute, but people have built some really cool stuff with these. This build way over to my side here is actually, I think, 32 nodes, so they have 32 times four cores to handle all the different jobs they have. They actually put up benchmarks showing they were able to complete their workloads faster than a Ryzen 9, and they spent maybe $750 on all these nodes. So they have something that performs as well as a $750 to $800 processor, which would of course require a whole other system built around it. They were able to cut something like 30 to 40% of the cost off, plus it's modular, plus you can add components.

There's also some cool stuff on Thingiverse you can grab, and this one right behind me is actually a server-rack mount that someone has put, I think, 24 nodes into and stuffed into their racks. I don't know what they're doing with it, running builds or whatever, but there is actually a pretty cool community around these boards. They're not nearly as popular as the Raspberry Pi,
They're not they're not nearly as popular as the raspberry pi For a couple of reasons you do have to manage the power on board unlike a raspberry pi Where you just you know plug in a USB you basically just have pins and you got to use some some do pond You know wires to actually wire it up to a power supply you have to buy a separate power supply And I didn't know these you know these server power supplies You actually have to like tunes you have to get out a screwdriver and like a voltmeter to actually get them to the Right voltage sounds a little intense But once you do it, it's not that hard once you try it once it's fairly easy so you can do it Now they do like I mentioned they do support hardware transcoding There is a cool benchmark that somebody did where they actually for they showed four hardware transcodes on one of these boards They do support x264 transcoding not x265 transcoding, but they do have x265 decoding So you could still potentially get some performance benefit from using this board and it is supported by the Kubernetes Intel GPU plug-in And it's supported as in parentheses. We'll get into that in a minute So this met all my criteria and when I started off I drew this out of what I wanted to do I had a raspberry pi 3 as my master node, of course this and running my My NAS and of course this actually didn't work well because it's USB 2 Terrible idea very bad read write speeds. So I upgraded that to a pi 4 for my master node And then I had originally planned to throw a relay into here So I could switch nodes on and off using the raspberry pi's IO but I haven't needed to do that and Even though like auto-scaling that would be really cool to get to maybe that's something I'll do in the future And then it's all sitting on a gigabit switch and there's I think it's a 200 watt power supply supply and power to my My four atomic pi nodes. 
So that's the basic setup. This is what I started with, and I'll show you in a second what I built.

As far as setup goes, it's pretty straightforward. I basically installed Ubuntu 18.04 Server with a static IP on each node. I did need to install an NFS client package, which I forgot to mention, just so my pods could actually access NFS services. And then I installed k3s. If you don't know k3s (some people say "k-three-s," just like it's "k8s" or "kates" or whatever), k3s is brilliant. We're talking about Kubernetes in a 40 MB binary: super small, super slim, super efficient, and it supports both ARM and x86. And I am going to have a mixed cluster, because I have my ARM master node as well as my Atomic Pi worker nodes. And then there's basically no step three: the installation for k3s is super simple; you're looking at the whole thing right there. So getting these set up, even doing it manually, was actually really quick and easy. And the nodes are basically blank; there's nothing special about them. I could lose one tomorrow and I'm not going to worry about it, and I can add another one in and it's going to be super easy to provision.

As far as the networking goes, this is a home network. I don't have a managed switch; I don't have something fancy going on. I do have Google Fiber, which is fancy, but basically on my router all I did is reserve a hundred addresses out of my DHCP range so I could give those to the cluster to manage. The k3s cluster actually comes with onboard networking included. I don't know if it's Calico or what it is under the hood (I believe k3s ships Flannel by default), but it just works, and I haven't had to mess with it, so that's great. And then I did install MetalLB, a bare-metal load balancer, which is a Kubernetes project; I think it's in the CNCF.
It's pretty rad. I basically gave it around 40 addresses that it can allocate to ingresses, LoadBalancer services, whatever it needs. So this setup, from a networking perspective, was pretty easy. When I first did this, I was actually doing it with a wireless network bridge in between, which was a whole other load of malarkey, but now I've reworked all the networking in my house, so everything is just connected through a switch up into my router. It's very simple networking, nothing that requires any brilliance on my part, which is good because I don't have a whole lot.

So I want to show you really quickly how the server looks, and for that I'm going to switch my view over to the atomic cluster. This is the live view of the cluster; let me get up, look over here, and point out some stuff. You can see all my nodes here: node one, two, three, and four. And then, for scale, you do have the Raspberry Pi. This is my master node, or my primary node, that's actually running the main part of the cluster. Hopefully I didn't crash it just by touching it. You can see I've actually trimmed the networking cables so they all fit nicely, and they all go up into a switch sitting just above it. Back here we've got our power supply, which runs this whole length; the power supply is actually as big as probably two nodes. And then I do have a USB 3 drive sitting in here in the cluster. I'm actually not using it for storage anymore, for reasons we'll talk about in a minute, but it did fit in here nicely with the Legos. Eventually I'd like to put this into maybe a rack or something, but I still consider this kind of a work in progress.
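By the way, to make that k3s install step concrete, the whole thing really is about one command per node. This is a sketch of the standard k3s install; the server IP here is an example, and you'd substitute your own reserved address and token:

```shell
# On the Raspberry Pi master:
curl -sfL https://get.k3s.io | sh -

# The join token lives on the master:
sudo cat /var/lib/rancher/k3s/server/node-token

# On each Atomic Pi worker (example master IP; use your own):
curl -sfL https://get.k3s.io | K3S_URL=https://192.168.1.10:6443 K3S_TOKEN=<token-from-above> sh -
```

That's it; there really is no step three.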
So I'm not quite ready to get fancy with the storage yet. But that's the Atomic Pi, or rather the atomic cluster, and so far it's working pretty great.

We're going to talk a little more about the setup, and then I'll get into a little more demo action in a second. First of all, k3s actually comes with a system-upgrade-controller for Kubernetes, which is bloody brilliant, because otherwise I would have to figure out how to keep these things updated with the latest version of Kubernetes myself. In this case, I can create a plan, and you can see the plan up here. Basically, I specify for my master node how many concurrent changes I can allow (so I could actually do HA k3s here), and then for my worker nodes, how many concurrent updates I can run. I basically just took the boilerplate, stuck in the version I wanted right there, applied it, and it upgraded all my nodes for me. This is actually super friendly with CI/CD; I have it hooked up to both Argo and Codefresh for my toil-reducing DevOps project. So this is super slick: it keeps my Kubernetes nodes updated, and it works both for ARM and for x86, because again, I have a Raspberry Pi for my master node and all my worker nodes are x86.

Basically, the way this works is that Argo does the sync (Codefresh can do some testing, but Argo does the sync), which sends the plan to Kubernetes, where it's recorded as a CRD. The upgrade controller then takes on the work of making sure those plans are applied, and I believe it will migrate itself to another node if it needs to, so it doesn't have any issues when it's upgrading nodes. I've found this works really well. It doesn't take a node offline until it's ready to upgrade it. The first time I tried it, I had actually selected the wrong tag or something for my version, and it just said, oh, I can't upgrade this with this
current plan. So I didn't have any downtime; I didn't lose the node or anything. This thing is brilliant. I basically just make a change to the version, do a git commit and push, and it automatically rolls out through Argo, which I'll show you in a second.

There's also the Intel GPU device plugin for Kubernetes, which I installed. It's designed to support OpenCL, which is not supported by the Atomic Pi, but we only need VA-API, which is supported by this plugin. Installing it exposes the onboard GPU for hardware transcoding. The only thing is, I actually did make a pull request to this project, which they accepted, so that it doesn't try to roll out to ARM nodes; I think Intel was more than happy to take that. And then in Kubernetes (let me see if I can move out of the way a little bit), this is basically all you need to add to your pods once you've installed it: you just add a resource limit and request for the Intel GPU, and then those pods will only be scheduled onto nodes that are capable of hardware transcoding. This was actually pretty slick and works really well.

I also mentioned that I have GitOps running here, so I have the Codefresh Runner installed. The Codefresh Runner is basically an on-prem component you can throw onto a cluster; it can run builds and access your cluster behind the firewall. I haven't opened up any ports from my home network to Codefresh or anything like that. It basically sits on my cluster, and everything stays private to my cluster, but I can manage it from the Codefresh UI. So no matter where I am, I can run builds against the cluster, and they happen automatically. I can execute pipelines and access resources behind the firewall (well, not necessarily access resources behind the firewall, but that's a whole other ball of wax). But it works really well, and it's very
scalable. As I add more nodes, I can run more builds. I also have Argo CD used for sync. Once I'm ready to make a deployment, I do a git commit and push. In some cases that triggers a Codefresh flow that does builds and tests before triggering a deployment; in other cases it just goes directly to Argo CD to sync onto the cluster. You can follow my project at github.com/todaywasawesome/atomic-cluster, so you can actually see how I've got most of this stuff set up. There's a bit of a write-up, and I'll add a shopping list before this airs so you can see all the components in my build.

Now, storage. I mentioned we'd talk about this earlier. Storage in Kubernetes, as everybody knows, is a bit rough, and I have to say: it is a bit rough. It's tough. I ended up going with a separate NAS to provide my storage, serving it over NFS. The reason I didn't keep it on my Raspberry Pi is that the USB drive just goes dormant sometimes, so when I'd try to access a service, it would take a long time to warm up the USB, mount it, remount it, and get it served over the network. By the time that's happened, maybe you have a pod failing and things aren't working well. It was slow and clunky; it worked fine for the proof of concept, but it's not a good long-term solution. So I moved this over to a dedicated NAS, which is probably the background fan noise you can hear. It's a little noisy, so I'd like to do something else there, but in this case it's basically running a bunch of hard drives with a ton of storage on them, and the NFS is very, very fast. So this does work better. But a lot of these services were not designed to work in a Kubernetes environment, and I've noticed that they do sometimes get a little finicky.
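For reference, wiring a NAS like this into the cluster is just a static NFS PersistentVolume plus a matching claim. This is a sketch of the pattern, not my actual manifests; the server address, export path, and size are all example values:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-nfs
spec:
  capacity:
    storage: 500Gi            # example size
  accessModes:
    - ReadWriteMany           # NFS can be mounted by pods on multiple nodes
  nfs:
    server: 192.168.1.20      # example NAS address
    path: /volume1/media      # example export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-nfs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""        # bind to the static PV above, not a provisioner
  volumeName: media-nfs
  resources:
    requests:
      storage: 500Gi
```

Pods then just reference the claim by name in their volume spec.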
I'm looking into Rook as a storage provider, which supports Ceph, a distributed file system. I think that's really interesting, but I'd probably need to do some different things with my cluster before I can have distributed storage working on it, because right now my nodes are all throwaway, right? They're not special. Once I start to put storage on them, I have to start thinking about redundancy and how that works. So I think that's very promising, and it might work really well for just the config-data portions of what I'm doing, maybe the Plex configuration, that kind of thing that is more expectant of running on a local file system. A lot of these services use SQLite, which works great sitting on a desktop, but boy howdy, it does not like running off network file storage. I've had to repair a number of SQLite databases while using this; maybe a pod goes down before the lock is removed from the database, something like that. I actually did lose data several times, which is a good exercise, right? This is what home labs are for. So that's been a big learning experience for me.

Looking to the future, I'm planning on getting some 10-gigabit networking going to my NAS. I'd like to have a 10 GbE connection there that I then distribute to all my 1-gigabit nodes, so I could have very high read/write throughput. 10 GbE isn't necessarily super expensive; you can find cards for PCs for like 20 or 30 bucks. The switches are a little trickier. You can find switches that aren't too pricey with 10 GbE copper ports, and you can actually use a regular cable if you're going over a very short distance, but they get very pricey once you're into the 24-port range. But now I'm talking more home-networking stuff, so anyway, that's more future thinking. Storage does remain a bit of a problem. I can use my cluster.
It does work, but I notice the storage is a little finicky, so I have to watch it. That's something I'm going to have to invest a little more time into fixing and addressing.

Once you start doing this, you'll find there are a ton of community-maintained images from LinuxServer.io. They have basically all your home-lab stuff, and they're pretty nice images. They usually include things like user-permission mapping, which is important when you're using shared storage; you want to be able to put the right permissions mask on it. But they were really all designed for working with Docker Compose on a local machine. When I showed this setup to a friend of mine, he said, "Hey, this is really cool that you're running all these services, but I do all this on my desktop and I just do docker-compose up, and it seems like you've done a lot more work than that to get all this Kubernetes working." To which I would say: yes, yes I have. I've done a lot more work than that, but I have something modular and scalable, and actually cheaper than what he's designed and built. So I'm happy with the choice in that case. Check those images out; they have pretty good maintenance and upgrades.

I'm going to jump into another demo here. Let's browse around the cluster and I'll show you a few things of note. I'm actually sitting inside my Git repo here, and I have some uncommitted changes; if I commit those, it'll trigger a sync to the cluster. But before I do that, I want to show you some interesting things. When I'm looking at my services, you'll notice that I actually have two Plex services here. One is for UDP, and one is for, well, not UDP.
It's for TCP. I have the same thing going with Netboot.xyz, which is a netboot provider, so you can boot machines up over your LAN. The reason I have two is that Kubernetes LoadBalancer services do not support both UDP and TCP on the same service at the same time. Let me resize this window slightly so you can see a little more of what's going on here. There we go. You'll notice that those two load balancers for the same service actually share an IP: see, this netboot one has the same address as the one above it. This is a feature of MetalLB, where you can put annotations on a service to tell it that it's allowed to share an IP address. So this was a bit of a gotcha to figure out: how to serve both UDP and TCP. It's important for a couple of services. For netboot it's important because of the way netboot works: it has to have a TFTP server that machines can reach, and TFTP works over UDP. Plex was the one that surprised me. Everywhere I looked, people just had Plex setups serving TCP, but a lot of local devices don't work well if you don't have UDP. So if I've got a Roku TV, it couldn't see Plex most of the time; it couldn't see it reliably until I added the UDP load balancer and advertised those services. And of course I have to keep them on the same IP address; otherwise, I think the whole thing would get confused and funky.
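Concretely, the shared-IP trick is MetalLB's allow-shared-ip annotation: two Services with the same sharing key requesting the same address, one for TCP and one for UDP. This is a sketch of the pattern; the IP, ports, and names here are illustrative examples, not my exact manifests:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: plex-tcp
  annotations:
    metallb.universe.tf/allow-shared-ip: "plex"   # same key on both services
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.1.110   # example address from the MetalLB pool
  selector:
    app: plex
  ports:
  - name: pms
    protocol: TCP
    port: 32400
---
apiVersion: v1
kind: Service
metadata:
  name: plex-udp
  annotations:
    metallb.universe.tf/allow-shared-ip: "plex"   # matching key allows the shared IP
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.1.110   # same IP as the TCP service
  selector:
    app: plex
  ports:
  - name: gdm
    protocol: UDP
    port: 32410
```

MetalLB will only co-locate services that share the same annotation value and don't have conflicting ports.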
So that's something to watch out for: you do have to care about UDP and TCP. And again, this goes back to the home-lab theme; the truth is most stuff is not designed for the home lab.

The other thing I ran into and learned very quickly (and most people know this, I think, if you're seasoned) is that my volumes and volume claims are effectively immutable. When I'm working with these things, if I make a change, they won't automatically get reconfigured; it'll basically throw an error and say, hey, I can't deploy that. So today I was rolling out a new change, which went through Argo CD and Codefresh, and it couldn't complete the rollout because one of the PVCs was different, and I actually had to go delete the PVC. The only issue with that (again, let me resize my window here) is that, for some reason, and this is something else I probably need to look into, I typically run into issues when I'm trying to delete these PVCs, because they have some protection on them. So if I look at these PVCs, let's say I wanted to delete this Minecraft one: it probably won't delete when I try. Yeah, it says it's deleted, but it's hanging, and it's hanging because there are finalizers on it, which I think is just a feature built into Kubernetes. If I pull this open again, it will say it's Terminating, and that thing will sit there forever; it will never die. So you actually have to do something a little gross, and I'm not telling you to ever do this in production. Obviously, I'm the only user here, right?
So I'm violating some principles here, but basically the only way I've figured out to get these to actually delete is to go in and remove that protection finalizer. Once I do that, the thing deletes right away. You can see that it's been edited, and if I get the PVCs, it'll be gone. So I just killed my Minecraft instance; not a big deal, because I'll just reapply the config, or I can do it through Argo and it'll work smoothly. There'll be no problems there.

So, just looking around my cluster here for a second, you can see I've got a number of different namespaces. I've got the Argo CD namespace, and I have a system-upgrade namespace, which is where that system-upgrade-controller is working; if we look at that, you can see we just have a single pod running for the controller. Then, if I look in the Argo namespace, you'll see I have all my Argo stuff running, and I also have a CF Argo CD agent. Basically, what that does is sit between Codefresh and Argo, keeping all the information up to date in Codefresh, so that all my build information is annotated and associated with whatever deployments and rollouts are happening, which is a really useful view to have. I'm not going to get super deep into that demo, but if you want to check it out, go to codefresh.io; we'll have some videos up there. So that's pretty slick. And then, of course, I actually run a lot of stuff out of my default namespace. I know, it's a little silly; I've got my ingress in there and all that kind of stuff. It works pretty smoothly.
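For the record, the gross workaround from a minute ago is just clearing the PVC's finalizer list so the delete can complete. Again, home-lab only; the claim name here is an example:

```shell
# The PVC hangs in Terminating because of the kubernetes.io/pvc-protection
# finalizer; emptying the finalizer list lets the delete finish.
kubectl patch pvc minecraft-data \
  --type=merge -p '{"metadata":{"finalizers":null}}'
```

In production you'd instead figure out which pod still holds the claim, since that protection exists to stop you from deleting storage that's in use.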
I'm pretty happy with it, and overall I'd have to say this whole setup runs fairly smoothly and is pretty rad. So I'm going to go back over here for a second. I'm happy to take questions, and you can follow progress at github.com/todaywasawesome/atomic-cluster.

I wanted to show you, just for a second, the Argo dashboard. Here you can see the different services I have set up with Argo, and you can see they're all synced. If I go over to Codefresh, I'll actually get the same view. (My browser crashed because I've got OBS running and it tanks all of my processor power. Come on... there we go.) So this view will give me all the information.

Now, like I mentioned, in the future I think I need to solve the storage issues, and I also want to set up automated node provisioning using Ansible, so that when I plug in a node, it gets set up automatically. These nodes actually have PXE boot enabled by default, so they will automatically boot to the netboot server and install something. One thing I looked into: there is a way to set up these nodes so that they boot an image over the network and just run it, so you don't actually install anything on the node at all. I ruled that out because I lose power sometimes, and I want cold starts to be easy; I don't want to have some separate services for that. I actually want to run everything on my Kubernetes cluster. Anything that's sitting outside the Kubernetes cluster, besides the NFS (and even that is a little bit suspect), is another point of failure that I have to maintain, and I don't want to have to maintain it.
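One more snippet worth capturing from the transcoding discussion earlier: once the Intel GPU device plugin is installed, steering pods onto GPU-capable nodes is just a resource request and limit. This is a sketch; the pod name and image are examples, but the resource name is the one the Intel plugin exposes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: transcode-example
spec:
  containers:
  - name: plex
    image: linuxserver/plex        # example image
    resources:
      requests:
        gpu.intel.com/i915: 1      # resource advertised by the Intel GPU plugin
      limits:
        gpu.intel.com/i915: 1
```

With that in place, the scheduler will only put the pod on nodes where the plugin has registered an i915 GPU.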
So yeah, you can see here that in Codefresh I've got all these apps sitting here as well. Let's see which one would be interesting. This one actually has a couple of failures on it, because I had to delete my PVC when it wasn't able to sync; that's the issue I was talking about earlier. But it is in sync right now, so everything's fine. In this view (this one doesn't have a build associated with it, but if it did) I would actually see the build information here, and of course my pull requests and any issues in Jira that I would have created. Which, again, I'm not doing, because clearly this is overkill for my home cluster, but it's just kind of rad to be able to use the clean DevOps stuff.

With that, I want to thank you for coming to my KubeCon talk. I hope this was interesting; maybe it'll inspire you to build your own Kubernetes cluster, your own home lab, your own atomic cluster, and hopefully it works really well. I was really happy with mine. Like I said, check out the repo; I will throw a shopping list in there. For this whole build, what did I spend? I spent $140 on Atomic Pis, because I have four of them at $35 each. I spent $30 on the Raspberry Pi, and then maybe $20 on the other parts. So we're talking sub-$200 for this Kubernetes cluster. And let me tell you, for a PC I could buy for 200 bucks, there's no way it would come close to the amount of power and scalability I get out of this thing. Like I mentioned, I run Plex on it, I've got a Minecraft server, and I've got a couple of other things running on it, with more to come as I go along. I'm not worried about running out of space or storage, because I can always throw another node in. So I have this scalable Kubernetes cluster.
It's running great, and I'm pretty happy with it. So feel free to ask me questions on Twitter at @todaywasawesome, and of course, if you're watching this at KubeCon live, I will be hanging out in the chat and having conversations there. I look forward to hearing your questions, and thanks again. I hope you enjoyed this, and I hope it was interesting. I don't know if you feel like it was the ultimate cluster; let me know if you feel like this was the ultimate Kubernetes home lab, or if you feel like maybe it went a little too far on the cheap side and you're looking for something a bit more beefy. But at least it's accessible. All right, have a good one. Thank you.