Welcome to the Home Lab Show, Episode 80, the Server Automation Mindset. I know Jay has been in the server automation mindset for a very long time, and I know he's working on a few in-depth projects right now that really brought this up. Is that an accurate statement there, Jay? Yeah. As you're talking, I just realized that I think I've been doing automation for 10 years-ish at this point. I was like, how long has it been? I have to think about where I was in my career when I started something because I just lose track of the years. I'm like, I worked at such and such a place. That was 10 years ago. Oh my gosh, it's been 10 years. Wow. Yeah, I've learned a couple of things. Yeah. I know you've been talking everything about Kubernetes and all that. There's a private chat that me, Jay, and several friends have. It's almost public. I mean, some of this goes on on Twitter. This is something we've been thinking about a lot lately, but it seems like a good title here with us talking about Docker images and virtual machines in Episode 79. Let's talk a little further about how you think about these, how you think about automating all of your build processes. That's a really important thing to get to, especially because I see in a forum post someone trying to figure out how to migrate their VMs. They're having trouble because they have no way to rebuild any of those VMs. They got them perfectly set up and they backed them up, but they're not sure how to get them somewhere else, to a completely different hypervisor platform, because they had some trouble with migration. But if you have a good rebuild process, suddenly that doesn't become relevant anymore. So it's kind of the automation mindset: how you think about it, how you structure things, how you make them repeatable. Your data is important, but the virtual machines or the Docker images or the Kubernetes, none of that is. It's all about the build process. We're going to be diving into that today.
Before we do, a company that's kind of got automation figured out to scale up their system is our friends at Linode. It's a great place to host many of the things we talk about here on The Home Lab Show. And it's a great place to test your orchestration abilities because you can have it in different regions, have different servers in different places, test your ability to rebuild it, test your ability to set up some of these projects that we talk about, as I said. So we want to thank them for being a longtime sponsor of the show and continuing to sponsor the show. So thanks to Linode. If you want to get started with them, we have the offer for The Home Lab Show down in the links below. All right. All right. Where do we start with automation? So I'll give a quick background. I'll try to make this short. I don't really do a good job of making things short, but I figure that new listeners may not know what I automate or how I do it. I have videos about it, so I'll keep the details a little shorter, but just a little bit of background first and then we'll get to the main topic. And the main topic is my recent project for automating Kubernetes deployment, like a complete cluster. Let's just say you have X number of servers. You want one to be the controller. You want the others to be nodes. You have to build that, but how do you do that in a non-manual fashion? So, quick background: I use Ansible for pretty much everything. I have a background in Chef and Puppet. I started with Puppet, transitioned to Chef later, and now I've landed on Ansible. I have never used SaltStack, so I'll just get that right out there. If anyone asks, what about SaltStack? My answer is I don't know. It might be great for all I know; I have no knowledge. So when I discovered Ansible, it kind of just clicked with me. For people that aren't aware, if you're brand new, brand new, then Ansible is an agent-less method of configuration management.
So when you think of Puppet and Chef, they have agents. There's a server, and that server reaches out to the machines it maintains. It connects to the agent and gives the agent instructions. The agent carries out those instructions. So the goal is to have a server, usually a Linux server, become a certain thing after the configuration management solution runs. And you want to try to get it to a point where you can build it from nothing to the thing it's supposed to be without having to dive in and manually do it. So you want a web server, right? You want to have Apache or Nginx or whatever it is installed, virtual hosts set up; that's going to be something you might want to automate. So Ansible allows you to do that, but the agent-less nature of it is amazing. So once I discovered that, I just fell in love with it. I set up an Ansible server. You don't technically need an Ansible server, because it uses SSH. Some people will have a dedicated server that they'll log into via SSH; they'll update their Ansible config and it'll roll it out. Or they could use just a workstation: they'll pull down the Git repository, do whatever they're going to do, and then issue the commands from their workstation. So there's multiple ways you can do it, but you have a central thing, whether that's your workstation or a server; there's a thing that's reaching out to the different servers via SSH. That's the default, that's how Ansible works. And Ansible uses YAML, which is super easy. I'll say that I don't know YAML itself very well, but YAML and Ansible's YAML are technically the same thing, because YAML is YAML. I think I just have a lot of it memorized, and it's just so easy to, you know, install a package or do whatever you want to do. So in Chef or Puppet, it might be like a whole block of Ruby code to install a package.
Whereas with Ansible, it's like the lowest common denominator: dash, name, if you want to give the task a name, or you could just go right into the task: package, colon, package name, done, you know. It's a little bit more complicated than that. Yes, I'm making it sound a bit easier than it actually is, but it's pretty easy. It is the easiest one. So I've been using that for a while, fell in love with it, but there was one thing that didn't work out well, and this is one of the automation mindsets, but the bigger one's coming. I decided I wanted to automate everything, and I mean everything: workstations, so laptops and desktops, in addition to servers. And I wanted it to be smart enough to know what it's being run on, so I don't have to maintain the workstation Ansible and the server Ansible as separate entities. I want one Ansible, and I want it to maintain all the things. So that was a goal. And later on, I solved that goal and it does absolutely that. If it's a workstation, it can build everything from the command line up to GNOME, keyboard shortcuts, wallpaper and all, with me doing nothing but running one command. So I got that done, but there's one problem, and this is the mindset here. My computers aren't always on. My desktop will suspend, my laptop will suspend, so obviously when it suspends, it's not available. So if the Ansible server goes to connect to it, it can't; it's asleep. The laptop could be in my bag, so it can't be contacted, especially if I take the notebook somewhere else and it's off the network. So I'd get errors from the Ansible server: can't contact, can't contact. Yeah, I know, because the laptop's in my bag. And it got a little annoying. So at one point, I discovered Ansible pull and I swear I'll never use anything else. At first I thought Ansible pull was ridiculous because it kind of goes against the nature of Ansible. It's almost like creating an agent for a system that doesn't need one.
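Going back to the package example Jay described a moment ago, here's roughly what that minimal syntax looks like in practice. This is a sketch for illustration only; the host pattern and package name are hypothetical, not from the show:

```yaml
# A minimal Ansible play: the generic package module installs a
# package using whatever package manager the target distro has.
- hosts: all
  become: true
  tasks:
    - name: Install tmux   # the task name is optional but descriptive
      package:
        name: tmux
        state: present
```

The equivalent in Chef or Puppet typically involves more ceremony (resources, manifests, an agent run), which is the contrast being drawn here.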
So it kind of felt like the wrong way to do it, but when I actually looked into Ansible pull, I started using it. What Ansible pull allows you to do is give it a URL for a repository, a Git repository, and it'll run against localhost. And I found a way to differentiate one host from another with the same Git repository; I can actually have different values per host. There's absolutely a way to do that. And then one thing led to another, and we have this crazy automation system that I have today. But one thing isn't being automated currently, and that's the rollout of a Kubernetes cluster. And that's where we get into the first thing. You need to stay hydrated. Kubernetes is the big one. Well, it is, but there's one thing that I wanna touch on first, and that's the question: should I automate the thing? Now, most people are probably gonna say, well, yeah, I mean, duh, automate everything. But one thing, when we get into the automation mindset, that you always have to pay attention to is: is this task something you're likely to do again? Let's just say, for example, there's a 95% chance you are never gonna do this thing again. At work, it could be a task that is a one-off and you're just never gonna do that again, or something of your own in your home lab that you plan on never doing again. Then you could make an argument that there's a negative return on that. Why automate the thing if you're never gonna do it again anyway? Now, one rebuttal to that is, regardless of whether or not you are going to do it again, you're learning, right? So if you automate something, you're learning stuff. You're learning more about Ansible or Chef or whatever the system is. So that's valuable, because learning something is a great thing to do. But depending on your use case for automation, that might not be good enough. At work, it might make sense because someone might wanna know, how did you do that? Well, look at the playbook right here. That's what we call the configuration.
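To make the ansible-pull workflow concrete, here's a hedged sketch of how it's often set up; the repository URL and schedule are made-up examples, not Jay's actual configuration. The idea is to use Ansible once to schedule ansible-pull on each machine, after which every host fetches the repo on its own and applies it to itself, so a suspended laptop simply catches up next time it's awake:

```yaml
# Hypothetical sketch: schedule ansible-pull via cron so each host
# periodically clones/updates the Git repo and runs local.yml
# (ansible-pull's default playbook) against localhost.
- hosts: all
  become: true
  tasks:
    - name: Run ansible-pull every 15 minutes
      cron:
        name: ansible-pull
        minute: "*/15"
        job: >-
          ansible-pull -o
          -U https://git.example.com/homelab/ansible.git
          local.yml
```

The `-o` (only-if-changed) flag makes the run a no-op unless the repository has new commits, which keeps the periodic runs cheap.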
We call them playbooks, the YAML files in Ansible. But someone might look at that and glean something valuable from it. The question still is: are you gonna do the thing again? So here we get into what it's like to be a content producer and a home lab owner at the same time. Now, when I set up a Kubernetes cluster, I didn't plan on setting that up again. To be fair, I'm constantly building these, because I've done no fewer than three, probably four tutorials about setting up Kubernetes clusters at this point. So I'll often just set up a lab or something and go through commands and things. So regardless, I'm going to keep doing it, but I don't wanna automate that, because I wanna teach people how to build these clusters. So I need to do that manually. But meanwhile, my Kubernetes cluster, and it's funny, I was at SCALE and someone asked me, how's that Kubernetes cluster going that you built in that video? And I'm like, well, as luck would have it, that one is completely broken and has been broken for months, because unfortunately, it's an easy fix but I have a lot of work to do. I have a lot of content to create. So sometimes I'll work on building new clusters for you guys, but when it comes to fixing my own if it breaks, that's probably not something I'm able to get to. The main problem here, though, is I thought I was only gonna set up that cluster once. As it turns out, when I upgraded to Ubuntu 22.04, something broke. Now I have to rebuild it. I don't have time to rebuild it. So I plan to automate this, because I felt like my original opinion that I'll only do this once wasn't really valid; here I am about to do it again. So another reason to automate is if your servers fall over. I mean, yeah, we hope to never have to rebuild them, but if that day comes, we'll be thankful that we have the automation. Now, for me, the Kubernetes cluster is dead. It doesn't work. So if I had that automated, then I would have a Kubernetes cluster right now.
So I figured I'm gonna have to put some hours into this, and if anyone's wondering why I haven't had a video out this week, well, I'm working on my Kubernetes cluster right now, but I'm gonna try to have something out tomorrow. So that's the first thing anyway: determine if you should automate it, if it's a negative return, but also keep in mind that, yeah, you might have to rebuild it someday. So there might still be value. And I'll still bring up the fact that just because you're 95% sure you won't do it again, you may find yourself doing it again anyways, just like the Kubernetes cluster, so you don't always have that information ahead of time. It's a good learning experience doing the automation. And to roll back a little bit to something that Jay had said about Ansible, how it just clicked: one of the things, I was just at Ohio LinuxFest and there was a big discussion about this, is with Ansible, and Python as well, both of these were talked about a lot: you come for the product, you stay for the community. One nice thing is Ansible's got a great community around it, a lot of documentation around it, a lot of projects around it, lots of things you can find, so it's not build-it-yourself; you can start with someone else's framework that they put together. So being such a hugely popular project really lends itself to ease of use; there's just plenty of people using it. And this is the challenge of some of the other automation tools, such as Chef, Puppet, and SaltStack: the higher learning curves mean less community adoption and maybe less available tooling or existing frameworks, you know, templates that you can pull from to build things.
So you end up building a lot of things from scratch. I've never gone further than Ansible for any of the stuff I do, and I don't do a lot with Ansible, but any of the stuff I've done is just pretty easy. Like Jay says, define some fleets, put a few things in there to define it, and away you go. If you're not sure where to start, don't start with the more complicated stuff unless you at least understand Ansible. And if for some reason Ansible doesn't fit the needs, and by the way, you would have to have some pretty hefty needs for Ansible not to fit them given its level of automation, then move up to something else. Ansible is just a great place to start, so grab one of the books, and actually our fellow YouTube friend Jeff Geerling, I believe, has a few books on Ansible as well. I actually just bought his book. I think it's the most current one, because I think it's the one that's in development. So my understanding is I'll keep getting updates as he wraps up parts of it. It's really cool to read this book so far, because I was actually Googling how to do something as I was building the cluster through Ansible, and he came up in the search results and his book came up, and I'm like, I bet he probably has the answer to this issue that I'm dealing with in that book. Now, one last thing about should you automate that I think is important to bring up. Let's just say this is a task that cannot happen again and you automate it. Did you lose time? Well, one thing that I've learned is, in the future you could have another thing that you're automating, and it's not quite the same thing as the thing you automated that you'll never need to do again, but there might be enough overlap in the code to where you can pretty much take that and adjust it to become something else. And I find myself doing that a lot, where I'll have, let's say, a task that installs 10 packages.
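A "task that installs 10 packages" of the kind being described might look like the block below (the package list is purely illustrative); it's exactly the sort of thing you copy into the next playbook and tweak rather than retype:

```yaml
# Illustrative reusable play: one task installs a whole list of
# packages, so reusing it later is just swapping out the list.
- hosts: all
  become: true
  tasks:
    - name: Install baseline packages
      package:
        name:
          - curl
          - git
          - htop
          - tmux
          - vim
        state: present
```
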
Should I just write that again? No, I wanna find a playbook where I installed a bunch of packages. I'm gonna copy that thing, paste it in the current task, and change everything to make sure the naming matches, rather than type it by hand, because eventually you end up with a library of things. You get to a point where you've done pretty much everything that you can do in an automation platform. For example, you have a task, or a play if it's Ansible, that copies a file. You have another one that creates a file from a template. You have another play that creates a folder, one that installs packages, another that restarts services, and you keep going, and eventually, for all the high-level tasks that you might do on a Linux server, you've done something with Ansible in a play. Your own playbooks become a library for future playbooks. So that's also an important thing too. Now, going back to your point about using Ansible and not starting with something complicated, I'm going to make a claim that's gonna, one, make me a hypocrite, two, make a bunch of people happy, and three, make some people very annoyed with me. But the mindset that we run into often in HomeLab is, let's just say you had 10 HomeLabbers together, and you say, hey, I've done a thing with this particular piece of software. And someone else is like, you should never use that. You should only use this. Like for example, I mentioned one time I had MySQL, or a MariaDB, well, a MySQL database, for something that really wasn't all that important. You should only use Postgres! What are you doing? Like, oh my gosh, here we go. But we run into that, right? And people are passionate about the things that they like, and I don't like to be a person that tells people to only use one thing. But if you don't have a configuration management utility, unfortunately, I'm gonna leave Salt out of this, you shouldn't use anything else but Ansible.
And I hate that I'm saying that, because it's the thing that I hate when people say it to me. But when you compare Chef, Puppet and Ansible, and again, I don't know anything about SaltStack so I'm leaving that out of this, there's just nothing else that even comes close, because Ansible is lightweight by default, the syntax is way simpler, and it's more flexible. And there's other things that I ran into with other solutions. For example, Puppet, and this was probably, I don't know, I wanna say back when Debian Jessie came out, if I'm not mistaken, so that was quite a while ago. The company I was working for was using Puppet, and we upgraded to Debian Jessie. Everything was fine; all the servers were good, everything passed, all the tests were fine. But now it's time to start updating to the newest Puppet so I could have native support for Debian Jessie, but it didn't work. And this is six months after Debian Jessie hit stable, and it's Puppet, one of the biggest configuration management utilities out there, that didn't support Debian Jessie on release day, despite the fact that Debian Jessie was frozen for six months. That left a bad taste in my mouth. That being said, Ansible supported it a lot sooner; as I understand it, there's really not much to support. It's just apt, and it's simpler. But going further, you have to deal with all these certificates, the agent, watching the CPU spike every time it runs. I just really can't make any case whatsoever for Puppet or Chef, and no offense to the people that work on those projects, but Ansible just hits this nail so hard on the head it knocks it through the board. It's just, in my opinion, a really great solution. So does that sound biased? It absolutely does. As an educator, it's hard for me to say that, because I need to teach people all the things. So that does not mean I'll never do Puppet content or Chef content.
I probably will someday, but I'd probably leave my opinion out of those videos; "here's how you use it, but never use it" wouldn't be good. But Ansible, in my opinion, just checks all the boxes, and it's really hard to ignore that. Well, and the fact that it's agentless makes for a low barrier of entry, right at the top of the list. Oh cool, I can just pipe all this over SSH. It integrates well with Python, and those are just a winning combination. I'll actually answer a question someone had brought up and see if you have a difference of opinion on this, Jay. I went to a talk recently, granted, consider the source, it was put on by those folks at Red Hat, and they were talking about the future of Ansible, and one of the things they really seem to be doing is bringing it to the masses even more. They talked about making it easier to use by building better tools that will build the playbook, so you don't even really have to write the YAML. You can do it in, as they kind of said, a plain syntax, and it will build it on the back end. Basically, it looked like they were building more tooling to make it more accessible for people to use. I think this is the right direction to go. I've seen someone comment that they think Red Hat's moving away from the core features of it, but with some of those automation features, it starts with the techie people; obviously there's someone out there coding things at the machine level, and this is going for people who are deeply technical. We want the less technical people to have access to these things, and that's how you get there, by having tooling. So it's us technical people figuring out where that level is, and then making this more accessible for someone less technical, or less interested in getting into the technical details. That's the impression I got from talking to Red Hat as to where they're taking things. Did you kind of get the same, Jay?
Yeah, I also feel like people might have this opinion because when something's really great, they're afraid of it becoming not great, or becoming something else entirely. I mean, do you remember BitTorrent Sync way back? And you remember what happened? Everyone loved it, it was great, and then all of a sudden Evil Corporate Company made it something else. People are always afraid of that, and then again, it sometimes happens. I'm not saying Red Hat's an evil company, but I haven't seen anything that causes me alarm, because the way I look at it, building tooling on top of something is not uncommon. It happens all the time. And you could even argue that AWX, the open source version of Ansible Tower, which is a web-based application you can run to manage your tasks, you could argue that might even take away from some of the core functionality, because it automates the automation tool in some ways. So that's not a big deal to me, because I feel like if you don't like that, you could just go and write the playbooks yourself and pretend it doesn't exist. Unless Red Hat is forcing you to use it, then I have a problem, but I have seen no evidence to support that. And that'll just make it more accessible for other people. Think of it this way: you could work at a company where you are the Ansible master, and you have maybe an IT generalist that you work with that isn't an expert at everything, maybe they're just starting out, and that might just be a good way to get them updating your playbooks for you without having to have hour-long training sessions right at the very beginning. That might just give them an opportunity to log in and do something, so that person will love it and you'll ignore it. Now, if they force you, that's another story. But then again, even if they did force Ansible to become that method only and they went completely away from it, that's still not a problem, because open source is self-policing, and that's the coolest thing about it.
So if the company takes it in a direction that the majority of the hardcore people don't like, somebody's gonna fork it and bring it right back to what they think it should be, and then, yeah, it's divergent at that point. But look at LibreOffice. I mean, how many of you have OpenOffice installed? I bet a few people in our audience do, but the majority don't. Why? Because OpenOffice went in a direction people didn't like, then here comes LibreOffice and it completely takes over. And I think if Red Hat doesn't know that, they're stupid, and they're not stupid. They know the industry. They know if they push back too hard, somebody's gonna fork it, and then they risk losing all their investment if the fork becomes more popular. So they know that, or at least they should know that. So all things considered, I really don't feel like there's anything to worry about, because regardless of the direction it goes, I think people are gonna be okay. Yeah, and Red Hat seemed very aware of the community. They talked about community efforts and things like that, so they've been playing in the Linux world long enough. Even though IBM bought them, IBM doesn't appear to have changed their culture, so I'm pretty good with the direction things are going. Overall, I think I have positive feelings about it. Yep, same here. And someone asked about Packer for automating Proxmox VMs and containers. So Packer, by HashiCorp, is a tool that we've talked about before that allows you to automate the creation of images. I have mixed opinions, because there's two schools of thought here. Automation mindset time, okay?
So you could have, and there's nothing wrong with this, an image that you create, and you could script the creation of the image such that when you create a new VM, whether it's Proxmox or AWS, Google Cloud, Linode, whatever, that can then take that image and make it into a virtual machine. That way you don't have to worry about sitting down for an hour just to build a base image that'll then be used to create the base OS for all of your things; you could have that automated. And again, that's valid, that's totally fine, and there's two ways of doing it. The other way is a school of thought that says you should never have a custom image. Now, this sounds weird at first, because you're like, why wouldn't you want that? Well, if you think about it, most cloud providers allow you to have some kind of script or something that runs when an instance comes up. AWS calls it user data. So technically, what you have in the user data text box could be commands that do the few things that you need to bootstrap your automation solution, which will then automate the rest of the system. So you can make an argument that the only thing you should have in a base image is just a bootstrap to the automation solution, but you can make the argument to not even have that, and put that in the user data, or whatever your platform calls the script that runs when it comes up. And all of that is valid. I'm not saying one way is better than the other; it depends on what you think is better. The Packer approach is absolutely valid, and the image-less approach is valid. Obviously you're gonna have an image, but the second idea is you have just the base distribution image that the cloud provider provides, and you do nothing to customize it, because your automation system does everything. But I feel like it's a little harder to get to that until you have a certain level of experience. One benefit is you don't have to worry about, let's just say, malware snuck into the image. I mean, what are the odds?
It takes you an hour to build a base image and malware just happened to get into it. It could happen, but it's a lot less likely to happen the other way. Again, every way is valid and has pros and cons. Now let's get into the main topic at hand, the mindset of the Kubernetes automation, because as I'm going through this, there are basically some things, or thought processes, that come up, and I feel like a lot of these are the natural course of how these things play out. I thought this would be fun, because I'm not sure how this is gonna work in a podcast, but I know what won't work is me going over the syntax of a playbook where you have to picture it in your head, or write it really quick on a piece of paper, or type really fast. Some things are good for videos and some things are good for podcasts, so we're never gonna be highly visual on this podcast, if anyone out there is gonna ask, because the number of people that watch live is far fewer than the number of people that are listening in their car or podcast app. But what I thought would be fun is just to talk about the mindset that went into the process of building it, because that tells a story, and it's less about the syntax and more about the journey, and I thought that might be a lot of fun. I like that idea too, telling about the journey. So I kinda told everyone in the audience already how this started: the Kubernetes cluster broke a long time ago. It's funny, I show you guys how to build clusters and things, but I don't even run one currently, because it's hard to maintain one and do content. But I decided it's time to automate it. So the first thing that I did was set up three virtual machines in Proxmox, just vanilla Ubuntu 22.04 virtual machines from a template, and I went ahead and bootstrapped them, basically meaning I had the initial Ansible run happen on all of these. Now, at this point, I have not created any tooling, automation, or anything around Kubernetes. So I have several roles, and one of them is the base role, and
then there's also the server role. So these got the base role and the server role, so pretty generic. The mindset is I would bootstrap these with Ansible, shut them down, and create a snapshot. And the reason I did this is because I haven't created the Kubernetes automation yet, so when I take a snapshot, it's at a point in time before I even created it. So when Ansible runs again, when I power them on once I do have this done, it's just gonna pull down the automation and get them going. But if something goes wrong, I can revert them back to the snapshot, fix it, and then spin it back up. I wanna get it to a point where, from pretty much almost bare metal, just a base configuration, it's automatically gonna check in and pull the recent Ansible config down anyway. So I'll just spin it up, or boot it up from the snapshot, see what happens; if it fails, shut it off, restore the snapshot, fix the problem, spin it back up. So it's like having infinite retries, essentially, as I'm going through the process of building this. And meanwhile, these particular instances are checking out a branch in the Git repository, so nothing is being committed to production. None of my other servers know what I'm up to. These particular VMs are specifically checking out a branch I call staging, and only they are getting this. Well, there's other staging instances, but that's another story. So I have everything segregated, and this doesn't affect production, so I can just keep hacking on this. And I'm absolutely gonna have errors; things are gonna fail constantly, and there's head-scratching moments, like, how is this failing?
The syntax looks perfect, I can't find the error, and an hour later I find it and it's just staring me right in the face, which is another thing we deal with constantly. So what I did, and this is funny, but this is the truth: I consult my own videos, books, and blog posts for how to do things, because I'll forget after a time. And considering that I did several videos on setting up Kubernetes clusters, I have all the commands right there. So I just grab the commands from one of those blog posts and put them in a text file. I'm like, these are the things that I'm gonna need to automate. At first it starts off pretty easy. These instances, whether they're nodes or controllers, need the Kubernetes repository and the Kubernetes packages installed, so I'm gonna start there, and that's super easy, right? So we start off really easy, and then as we go through the process, we get a little bit more advanced, because anyone that knows about Kubernetes knows that there are cgroups that you're required to have set up, there are sysctl variables you have to have set up. So we wanna get that set up, but then we run into a weird chicken-and-egg problem, and this is where it really starts to get a little annoying, because some of my hosts are Raspberry Pis, and as far as I know, you can't set the cgroup options outside of the cmdline.txt file that it reads when it boots. So I'm gonna have Ansible inject these things into the boot file, but it only takes effect when it boots. So wait a minute, I can automate it, but then the Kubernetes bootstrap is gonna fail, because until it reboots, it doesn't have those cgroups. But how do I reboot and then have it continue where it left off?
And that's one of the things where I felt like it would probably be better just to have a Raspberry Pi image that already has those cgroups in the boot file. Yes, there are ways of tackling this kind of thing, but then I figure it's probably more complicated than it needs to be. I might be okay with that, but then again, I might also automate the reboot as well. Now, the other thing, of course, is the sysctl values, which is just putting some values in the sysctl.conf file. It's going to be your bridging, for example, enabling that, and that's fine, because that's what we need. But that file is also read at boot, though you can actually apply those values without rebooting. Anyone that's added sysctl values knows that you put it in the file to take effect on every boot, but you can absolutely inject a value of one with an echo statement into something in /proc to turn it on. In which case, I'm adding it to the file, but I'm also setting up a handler that's going to inject it into the current kernel. That was an interesting situation to work through. After that, things got a little bit easier. At that point, it's time to bootstrap the cluster, but the issue is: how do I tell the nodes which one's the controller and which ones are nodes?
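Before moving on, the sysctl step just described can also be collapsed into a single task, since the ansible.posix.sysctl module writes the persistent file and applies the value to the running kernel in one go. A minimal sketch; the drop-in file path is an assumption:

```yaml
# Writes the setting to a persistent file AND pushes it into the running
# kernel, so no reboot (and no separate echo-into-/proc handler) is needed.
- name: "[kubernetes : sysctl] Let bridged traffic hit iptables"
  ansible.posix.sysctl:
    name: net.bridge.bridge-nf-call-iptables
    value: "1"
    sysctl_file: /etc/sysctl.d/99-kubernetes.conf
    state: present
    sysctl_set: true   # set the live value with sysctl -w if it differs
    reload: true       # reload the file so boot and runtime stay in sync
```

The file-plus-handler approach described above works just as well; this is simply the same idea expressed in one module.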
It's time to create a variable: kubernetes_node, true or false; kubernetes_controller, true or false. Then you can easily have tasks that trigger if a value is true or if a value is false, because not everything is going to be a controller; that's only going to be one of them. We're not going to bootstrap a cluster on everything just in case it might so happen to become a controller; that doesn't make sense. Going through that process was a lot of fun. As a little spoiler, at this point the last thing I have to do is get the nodes to join, but the bootstrapping of a cluster works beautifully, and there are just so many of these different things that go into the process. Another thing with Ansible is that you have the ability to register a result, and then right after that, you can have a task that only executes if the variable created from the output of the previous task has a certain value. You could absolutely do that, but you might actually prefer something to happen after the run is completely over, in which case you do a notify to a handler; a handler is the same thing, but it runs after the run. That's an interesting mindset there. If you want, let's just say, a service to start, you probably want that to happen right after you install the package. But if you don't, or if there's a race condition, maybe you'll hold that service start until the very end in the handler. That's up to you; there's no right or wrong way to handle that, but you'll determine which is the right way as you go through the process, and this journey is just so much fun. Now here's a big challenge: ansible-pull runs everything on localhost, so there isn't a central server going out to configure these things. That creates a little bit of a challenge, and this is the only thing I think is potentially a downside of ansible-pull. You can make it inventory-based, but you don't want a bunch of Ansible servers, and you don't want everything to be an Ansible server, so that doesn't quite work. So how do you have something reach out to the nodes and tell them to join the cluster when everything is happening independently? That's a big challenge, so I spent a couple of days thinking about it. I'm like, do I want to, I don't know, create an SSH key and then have the controller, as a handler, reach out to a node to have it join itself? Which is kind of weird; it was just very complicated. Then it dawned on me: well, duh, I know how to solve this problem. I'm going to have the Kubernetes controller set up an Apache web server, and then after it runs, I'm going to have it print the join command but redirect standard output to replace the default HTML file that Apache comes with. Then the nodes can do a curl from the server to pull in the join command and run it on themselves. Well, of course that's how you solve it; how else would you do that, right? But it took me a while to think of that, because I was overcomplicating it. I'm like, what do I do?
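The controller/node split and the web-server trick can be sketched as Ansible tasks. The variable name, pod CIDR, and handler wiring below are illustrative assumptions, not the actual playbook:

```yaml
# Only the host flagged as controller bootstraps the cluster.
- name: "[kubernetes : bootstrap] Initialise the cluster on the controller only"
  ansible.builtin.command: kubeadm init --pod-network-cidr=10.244.0.0/16
  args:
    creates: /etc/kubernetes/admin.conf   # makes re-runs a no-op
  when: kubernetes_controller | default(false)
  notify: publish join command

# handlers: runs once, after the bootstrap succeeds, dropping the join
# command where Apache serves its default page.
- name: publish join command
  ansible.builtin.shell: >-
    kubeadm token create --print-join-command
    > /var/www/html/index.html
```

kubeadm's --print-join-command flag regenerates a full "kubeadm join …" line with a fresh token, which is what makes the redirect-to-index.html approach workable.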
Do I just capture the command from the controller, have it scp that over to the nodes to store in /tmp, and then set a cron job with an if statement saying, if that file is there, it needs to join a cluster? That would absolutely work, but it's also a lot of moving pieces. Then again, setting up a temporary web server, I mean, that's actually a security risk, to be honest. But then again, the join commands only last for a short period of time until they're invalidated. So I figure what I'll do is, when the controller comes up and Ansible sets up the web server, it's going to disable the Apache service so that it never starts; I don't want a web server on my controller. But what I could do is have it start right when the controller is bootstrapped, to create that initial web server, and then it can print the join command redirected into the default HTML file, and then maybe set a task five minutes later to shut down Apache so I don't have it running anymore, only when I actually need it. At that point, I could have the nodes just curl the server: do you have a join command for me?
Oh, you do? I'm going to just go ahead and run that on myself and join your cluster. That's pretty cool. One of the reasons why I can get away with this is that everything is on the LAN and nothing is exposed outside. If this were a cloud provider, I might have a little bit of a worry about spinning up a web server for any reason; if it's not meant to be a web server, that might not be a good thing. But on the LAN, you can get away with a lot more. Similar to that is how I actually bootstrap a brand new node: I curl deploy/bootstrap and pipe it to sudo bash, done. That's a web server internally, and it's just a bash file. Any computer on my LAN, if you were to join my LAN and run that command, I own your machine instantly, right then and there. My user account is there, my SSH key is there, I'm installing my GNOME extensions at that point, changing your desktop environment to GNOME, and setting up GDM. Don't run that command on my LAN; you will absolutely need to reformat your machine. But for me, it's just that one command. Some people may not realize the value of a web server for a simple thing like this: I want a script available on my LAN that allows me to join nodes to my configuration management solution. It's absolutely a great use case for that. And the value is, if it's not accessible from the outside at all, you don't have to worry about people joining your Ansible. Then again, if they did, I'd own their machine, so there wouldn't be much incentive for a threat actor to say, hey, I want you to own my machine; I'm going to hack into a system and run that command and let them own all my computers. Probably not something a threat actor wants to do. Now, of course, there's another mindset: we're using configuration management, but we also probably shouldn't have anything in there that is personally identifiable, and I'm never going to have API keys in a Git repository, especially since this Git repository is private and no one has access to it. The one that's on my GitHub is just an old snapshot from like two years ago; my current working copy is nowhere anyone has access to. But then again, you should probably still encrypt things. Even if there's a very small chance it could be used to hack your system, still encrypt it; there's no reason not to. Let me get into Ansible Vault, which allows you to encrypt files and put them in your version control. You give Ansible the password for the encryption, and it's able to decrypt. If someone actually breaks in and steals my Git repository, well, anything that could be sensitive is encrypted. It would suck, because I'd have to change my repository, but then again, it's not that big of a deal, because everything's encrypted and it doesn't really help anyone to have that information. That being said, I don't even think I have anything in there that would be a problem, but the mindset is: if in doubt, just encrypt it, because why not? It's easy to do and you may as well. I always encrypt everything. Someone pointed out, too, and I believe this goes back to some of the setting up of Kubernetes, that you can create pre-join tokens for them as well, so that's another way to solve that. Generally, in Linux, there's going to be more than one way to solve some of these problems, and there may be a way that works better for your instance or the thing you're doing. Jay having things on the LAN means he's not worried about public exposure. If you're running this across the open internet, you may want to think about it differently and use one of the other solutions, because there's a risk that, even if the timeframe is narrow for these join tokens, someone could grab them off a public-facing web server, and that could cause problems.
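Ansible Vault in practice is just a matter of encrypting a vars file and letting the play decrypt it at run time. A minimal sketch; the file name, variable, and template are all hypothetical:

```yaml
# group_vars/all/vault.yml — encrypted on disk with:
#   ansible-vault encrypt group_vars/all/vault.yml
# and decrypted at run time via --ask-vault-pass or --vault-password-file.
api_token: "not-a-real-secret"
---
# Elsewhere in the play, the vaulted variable is referenced like any other:
- name: "[backup : config] Template the config file that needs the secret"
  ansible.builtin.template:
    src: backup.conf.j2
    dest: /etc/backup.conf
    mode: "0600"
```

Anyone who steals the repository sees only the ciphertext; the playbook itself never has to know the variable was vaulted.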
Yep. So, to that individual's point, you could actually create a self-signed certificate, use Ansible to copy it to the proper directory on the nodes, and then there's that certificate match. I thought about doing that too; I ran into some other complications, so this part isn't done yet, but the Apache style is the method I'm leaning towards. That's absolutely valid, though; there's always more than one way to do something, and I constantly find myself simplifying my own things later, because you'll learn a way to do it that's better than the way you did it the first time. Then you go back and look at your own code from years ago and you're like, wow, I could simplify that, I could simplify that, and you just watch the number of lines drop, and the amount of time that Ansible takes drops as well. So yeah, that's absolutely valid; there are going to be multiple ways of doing it. Now, if it were an Ansible control host, where you have a central Ansible server or workstation, it really doesn't matter at that point, because you have an inventory file that says this IP address is the controller and these ones are the nodes; there's really nothing to do. You could capture the join command right there on the server and run it against the nodes, no problem. Now, with ansible-pull, in my opinion, ansible-pull just works so much better in every use case. I've switched; I switched companies to ansible-pull, and they thanked me for it.
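For contrast, with a central control host and an inventory, the join problem largely evaporates. A hedged sketch, assuming inventory groups named controller and nodes (the group names are an assumption for illustration):

```yaml
# Run once on the controller, regardless of which host the play targets.
- name: Capture the join command once, on the controller
  ansible.builtin.command: kubeadm token create --print-join-command
  register: join_command
  delegate_to: "{{ groups['controller'][0] }}"
  run_once: true

# Every node then executes the captured command against itself.
- name: Join each node with the captured command
  ansible.builtin.command: "{{ join_command.stdout }}"
  args:
    creates: /etc/kubernetes/kubelet.conf   # skip hosts that already joined
  when: inventory_hostname in groups['nodes']
```

This is exactly the "capture it on the server, run it against the nodes" flow described above; it only works because a central control host can see all the machines at once, which ansible-pull, by design, cannot.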
But this is the one downside; it makes that a little bit more challenging. Everything else is fine, but that was a bit of a challenge, so it's the last thing I'm working out right now. Considering that I have the cluster created, I also went ahead and created the framework for a Kubernetes staging environment and a Kubernetes production environment, because I figure, I just created three staging VMs for evaluating and developing this, so I may as well leave them around. Any time I want to roll out a container, or maybe try a change, I can try it against the staging Kubernetes cluster and see if it breaks. If it works, it can graduate to the main branch; if not, I fix it, keep fixing it, and once it works, just merge it into the main branch and we're done. So now I have what I call a poor man's CI/CD, because there's no Jenkins in front of it. Continuous integration, continuous delivery is the mindset where you commit something to a Git repository and there's something on the other end that catches it and runs something: maybe it builds your app if you're a developer, or spins up a system if you're a home labber, or does both if you have, like, no free time outside of that at all. That's usually how it works. For me, I mean, I'm probably going to run Jenkins anyway, because I think it's a great piece of software, but I figure, well, I have instances in Proxmox that are specifically looking for the staging branch; they're my staging instances. As long as I push to that branch first, only my staging instances will get it. I have literally caught some big problems that I almost committed to the production repository just by having them go against staging, and that saved me a lot of work. It's such an easy solution to come up with, because you simply have a variable called branch, and your production instances get the main branch while your staging instances get the staging branch. You set that one thing, and that controls how bleeding edge your nodes are. Like this laptop in front of me right now: it's the only workstation whose main branch is the staging branch, because this one's bleeding edge. So yeah, this one breaks every now and then because of that, but I think it's pretty cool just to have a bleeding edge computer, and I have another one in front of me anyway, so it doesn't really matter if this one goes down. I think it's the least you can do, to have something like that set up to catch these kinds of problems. For some people, that could mean using Terraform and VirtualBox, which is also valid. You could have a Terraform script that spins up a couple of VirtualBox VMs and then runs Ansible against them; that's the way a lot of people do it, and it's totally valid. Absolutely do it that way, because Terraform definitely can bootstrap right into Ansible. As an aside, someone brought up Packer earlier, so you could have the entire chain: Packer creates the image, then Terraform takes that image and creates servers or containers with it, and then another solution like Ansible comes in and maintains it for you. That's the beauty of this: you can literally have one tool chain into another and have different tools for each stage of the process, depending on how far you want to go. When you are big into automation, I feel like automating things becomes more fun than the work it saves you. We really appreciate the hours of work it saves us, but in the back of our minds we appreciate the fun of the project more, because we're probably spending more time on it than we're saving, but it's just so much fun. It's like, do you need to max out all your characters to beat the game? No, but it's so much more exciting when you crush the final boss. You level them up all the way. You level them up all the way. I want to ask you a question, Jay: how do you feel about cloud-init?
Because that was one of the things you tackled that's related to this, you know, being able to inject things right into the VM. Was that worth learning? It was. I feel like the problem is that the documentation is good, but it's missing something, because when you're starting to learn it, it's very confusing. The reason why is that cloud-init is made for the cloud providers specifically, for deep engineers that are super into the Linux system and really don't need any hand-holding at all. That way they can have an image; I think Linode might use it too, and I think most of the cloud providers do. It allows you to have customizations in a config file, and when the machine boots up, it's going to do things like reset your SSH keys and so on. But the issue comes when you try to use it in the home lab. It's very simple when you boil it down to the lowest common denominator, but trying to find documentation for that is hard. So, for example, I heavily use cloud-init on Raspberry Pi, on Ubuntu, heavily. I have this template for cloud-init where I basically have it not create the ubuntu user but create a user for myself instead. There was actually a bug in Ubuntu where if you so much as tried to create your own user account instead of the ubuntu user, it broke the entire install and you would never log into that thing at all, period; you could not access it. I found that out the hard way. There's nothing wrong with cloud-init itself, and maybe newer Ubuntu doesn't have this problem, because I think it was 20.04 where I ran into it. I just found another way, further down in the cloud-init file, to create my user account, and it works just fine. What I like about this is that you can have a Raspberry Pi image and a cloud-init file, and the Raspberry Pi image is very generic, so you just flash that to your SD card, and before you pop it into your Raspberry Pi, you grab your cloud-init file and put it in the appropriate place. So, for example, you can tell it what hostname you want the server to have, and sure, you could go into the image and hack that yourself; I just find it's a lot easier to have it in a template and drop it onto the filesystem for the Raspberry Pi. Next thing you know, and this is really cool, the cloud-init file actually calls Ansible; it does an ansible-pull from the repository. So I drop that file on the Raspberry Pi; at first it starts off as a generic Ubuntu image no different from anyone else's, but that config file causes it to set its hostname and then grab Ansible and run it against itself. Next thing you know, I have an alert on my phone, provisioning finished, for the Raspberry Pi I just set up, and all I did was drop that file in there. So cloud-init is awesome. I think one of the confusing parts is when you want to create a user account; I don't think the documentation covers this very well. What it really wants is a web server. The web server, if I'm not mistaken, serves a JSON file where you can set the name that you want your user to be. It's not explained very well; I go a different direction, but I think the idea is you have a web server that serves a JSON file with these values in it, and then you tell cloud-init where to pull that from, and it pulls in all the values. Proxmox gets around this by creating a cloud-init drive that has those values: when you set them in the interface, it creates a cloud-init drive that attaches to the instance, and what Proxmox is doing is putting those values there for cloud-init itself to grab, pull in, and run, which saves you from having to spin up a web server for those values. Proxmox has that figured out. But yeah, I did a video on cloud-init; I'm trying to remember how advanced it got. I should probably look at it and see how up to date it is, because it's a pretty awesome technology. Yeah, it came up the other day when I was in the forums for XCP-ng, because it also has that extension, and some of the maintainers of that section of XCP-ng had commented on the challenges of keeping up with the changes because of the lack of really good documentation. But you can actually create those as templates, and maybe sometime I'll do a video on that: when you build something new, you say, hey, attach this cloud-init config to it, and it can set certain things as an option. So yeah, pretty neat. I think cloud-init is a good fit for basically anyone that wants to create a template for their VMs, or Raspberry Pis in my case, and wants a certain number of things set from the get-go, and that can be as simple as running your automation solution. The one reason why I delete the ubuntu user is that, by default, in Raspberry Pi as well as Ubuntu cloud images, the ubuntu user is UID 1000, but on all my systems, I'm UID 1000. We can't have two, so one of them's got to go, and it's not going to be me. Then again, Ubuntu's images heavily want to have that ubuntu user by default unless you know the secret incantation. I should probably, if I can remember, put that cloud-init file I have somewhere within reach for people to get. It's not that complicated; there are only a couple of changes that I make, probably four or five, and the rest is pretty vanilla. But I think those four or five changes are probably not the easiest to find, because again, cloud-init is mainly marketed towards the engineers that work behind the scenes at the famous cloud providers. Not that they're saying you can't use it in the home lab; that hasn't stopped us before. I mean, come on, people are running ESXi and Proxmox and XCP-ng and all this enterprise stuff in their basement. How awesome is that? So obviously it's within our wheelhouse, and I think there probably does need to be more content on cloud-init. Yeah, we should work on that.
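A hedged sketch of the kind of user-data file being described here. Defining a users: list without the default entry is what suppresses the stock ubuntu user; the username, SSH key, hostname, and repository URL below are placeholders, not the actual file:

```yaml
#cloud-config
hostname: pi-staging-01

# Listing users without "- default" prevents the stock ubuntu user (UID 1000)
# from being created, so this account takes UID 1000 instead.
users:
  - name: jay                      # placeholder username
    groups: [sudo]
    shell: /bin/bash
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3Nza...placeholder

package_update: true
packages: [ansible, git]

runcmd:
  # First boot: pull this host's configuration and apply it to itself.
  - ansible-pull -U https://git.example.com/homelab/ansible.git local.yml
```

Dropped into the boot partition of a freshly flashed image, a file like this is what turns a generic card into a host that configures itself on first power-on.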
All right, did we cover the whole automation mindset? The documentation mindset is another thing, but you do that to help yourself too. Yeah. No, I mean, there's also the argument that your configuration management is your documentation, which I don't agree with, because there are some things you can't capture in automation. But leaving good comments in your scripting, definitely, please do that. Oh, for sure. Yeah, and even in Ansible, just use "- name:" and then put something descriptive. I have a system where it's role, task, purpose, and the individual task is bracketed out right in the output, so it says exactly what it's trying to do. But if not that, then yeah, at least have a comment. Yeah, I do that. Even when I did the video just the other day, it was just a really simple few lines in the bash script, but I still put what those lines do in a very descriptive way. It doesn't just help the audience when I'm teaching people; it helps Tom when he revisits it a year later going, why did I put that there? Yeah, and as someone mentions, they like cloud-init, but if you have one error, it does nothing, and that's true. Yeah.
I think at that point, one thing you can consider doing, if you can put the cloud-init file in object storage or on NFS somewhere where something can grab it after networking is set up (networking runs first, then cloud-init runs, so you can get in between and pull it in), is to take a snapshot of the VM right before cloud-init attempts to run. If it doesn't work, roll back the snapshot, replace the cloud-init file wherever it's grabbing it from, and keep repeating until it works. Then you can practice it and put it in production as soon as you know it works. Generally speaking, that's just a good way to do it anyway. You want to get to a point where, obviously, you don't rebuild your VM from scratch every time you make an attempt with cloud-init, because the amount of time that takes is insane. But if you can snapshot right before cloud-init runs and replace that file before it runs, then you have pretty much infinite retries. The reason I bring this up, and this is another automation mindset I'll leave you with, is a Windows tip, which is unusual coming from me, right? But I understand some of you out there run Windows, and I used to maintain Windows servers for a very long time.
It's funny, Microsoft made a three-time limit for Sysprep. Sysprep is how you generalize your Windows install: it takes out the drivers, hardware IDs, and software IDs, and just makes it generic. Your tweaks and your software are still there, but you want a Windows image that you can install on multiple machines, so you Sysprep it. You only get three tries, though. So if you have a Windows image and you want to change it, you restore it, change something, and Sysprep it again; you can only do that one more time. But if you have a VirtualBox VM with your base Windows image and you take a snapshot before you Sysprep, then you can Sysprep until the end of days, thousands of times, and it'll never run out, because every time you Sysprep, you roll it right back to before that, make a change, and Sysprep again. That mindset works on Linux too, obviously, because we have config files. Anyway, I could go on for a whole other hour, so unless you stop me... I think we're going to wrap it up here; we'll save some for the next episode. But we love hearing from you, so feedback2022 at thehomelab.show is a way to email us directly; we forgot to mention that at the beginning of the show, so maybe next week we'll mention it at the start and make a note to ourselves. You can also fill out the contact form, and those feed ideas to us. We are also available on the socials; you can find both me and Jay on, well, currently Twitter, and we also both have Mastodon instances, so there are different ways to get a hold of us depending on when you're listening to this. Thank you all for listening, thanks to Linode for sponsoring, and we love hearing back from you. Take care, everyone, see you next week. See ya.