Hi everyone, thank you for your patience, and welcome to FOSSASIA 2018. This track will be on Cloud, Containers, and DevOps. So, without further ado, let me introduce Matthew Treinish. He is an open source developer at IBM, and his talk will be on Build Clouds with Adam Durich. Thank you.

Thanks. I'm sure everyone's here because they think I'm going to sing like AC/DC because of the title of the presentation, but unfortunately that's not what I'm talking about. I'm talking about building clouds, specifically small clouds.

At my house I have a lot of home servers: a mail server, an IRC server, that kind of stuff. I thought it would be a great idea to have a cloud at home to virtualize all of that infrastructure. Everyone talks about moving to the cloud; I didn't want to run my servers on an external cloud, I wanted them on a local cloud. When I was in college I was also a system administrator, so I thought that, as a developer on the OpenStack project, it would be a very interesting exercise to see how it would work as a small, independent, single user trying to use the software I help develop every day.

The scope of the project was to pretend to be a system administrator without any OpenStack knowledge going into it. I would rely solely on installation documentation and Google searches, like everyone does when they're trying to figure something out on a computer, to install OpenStack on something at my house. I set a hardware budget of $1,500, which has sentimental value for me because that's how much I spent on my first desktop computer with my bar mitzvah money many, many years ago. And I was only going to concentrate on building a basic compute cloud: I just wanted infrastructure where I could boot up a VM and shut it down, with networking on my home network, through an API.

At the time of the project I was going to install the Ocata release of OpenStack, from April 2017, which was the latest version when I was doing this. The current release is Queens, I think; OpenStack does a release every six months, so the version is a bit out of date. I would not rely on any pre-existing installation scripts or automation. There are lots of projects out there to help automate an OpenStack installation, but part of the exercise was to do it by hand. One thing that goes against my thought exercise of pretending to be a system administrator is that I decided to install from the release tarballs, just the source code that the OpenStack project pushes out and says, this is our release. I did this primarily because, as an OpenStack developer, I wanted to see how hard it would be to take just the source code we release and turn it into something working, which was actually very interesting for me.

The first thing I needed when building a cloud was hardware, and I needed criteria for buying it, because $1,500 is not a lot of money for servers. My first priority was core count per US dollar: for a cloud to be useful you want a lot of capacity, and the most expensive part of capacity is the number of CPU cores. My second priority was RAM per core. The more RAM I have, the more virtual servers I can have.
I can over-commit the servers with virtualization if I need to, as long as they're not loaded down, but RAM becomes the limiting resource. The machines don't need to be fast, though, because fast costs money and that's something I couldn't afford.

So I started looking at what was out there. A very popular choice for home clouds is the Intel NUC, the "next unit of computing" or something, I think: little boxes with laptop processors in them that cost a few hundred dollars. The problem is that they're laptop processors; they have two or four cores, which would not give me the core count I was looking for. Raspberry Pis have the same problem: they're four cores, and they're also really RAM limited. I also looked at new servers from my employer, or from some other company that sells servers, but that would have blown my budget on a single server, if I could even find one that cheap.

My solution was to go with eBay. For those who don't know, eBay is the web auction site. It's very famous, but what a lot of people don't realize is that when data centers upgrade their hardware, there are companies that buy the used hardware, refurbish it, and put it up for sale on eBay or on their own websites, and you can get things really, really cheap, especially hardware from almost a decade ago, which is what I bought. After searching on eBay and graphing all of my options, I found these Dell PowerEdge R610s, from eight or nine years ago, at $215 each. They have two quad-core Xeons in them, so eight physical cores and 16 virtual cores, which is a lot of capacity, plus 32 GB of RAM. They also said they came with two 10K SAS hard drives and a lot of NICs. At $215 that was great, so I bought five.

This was the day they were delivered to my apartment. They came in these giant boxes. The FedEx guy just left them outside my apartment door in the rain, and all of my neighbors came and stared at these giant boxes blocking my doorway as I ran in and out of the rain, trying to get them inside before they got damaged.

But then I had a problem: I've got five rack-mount servers. Where am I going to put them? They take up a lot of space. I started looking at commercial racking solutions, even used ones on eBay, and those would have been $100 or so. After buying five servers at a bit over $200 each, that's over $1,000, and shipping was pretty expensive; I think it ended up being $1,100 or $1,200 in total. I didn't want to spend $100 to $200 on a used rack, because that would not have left me enough wiggle room in my budget in case something else came up. So I ended up going with something called the LackRack. I'm not sure who's familiar with this, but it's an IKEA side table with a 19-inch width between the legs, which is the exact width of a rack-mount server, so it's a perfect fit. I didn't come up with this idea; someone in the Netherlands did, and it was a great idea. The table is $9 or $10, and it comes in a ton of different colors. How many off-the-shelf metal racks come in different colors? So I racked my servers, and I bought a yellow one, because yellow's cool. I also put some casters on the bottom so I could push the servers around; it turns out those casters cost two or three times as much as the rack did, which was interesting. Then I found a place for them in my bedroom closet, my data closet, wired everything up, and was able to power things up and start inspecting the servers.
I don't know how many people have bought things on eBay, especially used hardware, but you never get exactly what they describe. These servers, once I was able to power them on and start looking at them, were super stripped down. There was no management interface card for IPMI or remote control of the servers. This was a standard feature from Dell on all of these servers, but the reseller takes them out and charges $30 for the card, to maximize their profit margin, so it didn't come with one. They also took out the redundant power supply: these came from the Dell factory with two power supplies for redundancy, and they just took one out, because you don't need the redundancy and they can charge an extra $30 for the redundant supply.

They also said I was going to get eight 4 GB sticks of RAM. I ended up getting four 8 GB sticks, which was an advantage for me, but the memory was installed in the wrong slots on half of the servers, so the dual-channel configuration wasn't right and things weren't booting properly, and I had to fix that. One of the servers came with a dead RAID controller battery, which just means the little light on the front is orange every time I boot it, and since I'm not using RAID it doesn't matter. Another difference in my favor is that they came with 15K SAS drives instead of 10K SAS drives, which gives me a small performance boost at the cost of noise, but that was just something different from the listed specs. My favorite quirk was that they came pre-installed with Windows Server 2012 with the password Apple123, so someone at this company has a sense of humor. I don't know why they didn't put Linux or something on to test the hardware, but whatever.

After I got the hardware set up and did a base Ubuntu install on the servers, it was time to install OpenStack: to figure out how to build the cloud and how to deploy it. Coming at it with no experience, that can be a bit daunting. You go to the official Git repos and there are about 1,800 Git repos in the OpenStack project. You go to the governance documentation and there are over 50 project teams. It's not straightforward to figure out, just by looking at the project at the source level, the developer level, or even the governance level, how everything comes together. And all I wanted to do was build a small cloud that could boot VMs that I could log into and do things on.

But it turns out there is something called the Compute Starter Kit, documented on the foundation's marketing website for the project, and you only need four projects out of those 1,800 Git repos to do what I wanted. Those four projects are Keystone, Glance, Nova, and Neutron. Keystone is used for identity management: user management, authentication, and maintaining a catalog of all the services running in your cloud. Glance is used for image management; it keeps track of the VM images used to boot servers. Nova is the service that actually boots the VMs and manages the VM lifecycle, and Neutron handles network provisioning. With these four projects, I would have the base set of functionality I was looking for in my cloud.
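As a concrete picture of what that base functionality amounts to, here is a hedged sketch of the end-to-end workflow with the unified openstack command-line client; the image, network, flavor, and key names are illustrative, not from the talk.

```bash
# Hypothetical smoke test of the four "starter kit" services.
# Names (cirros, home-net, m1.small, mykey) are illustrative.

# Keystone: authenticate by sourcing admin credentials (path is an example)
source ~/adminrc

# Glance: register a bootable VM image
openstack image create --disk-format qcow2 --container-format bare \
  --file cirros.qcow2 cirros

# Neutron: create a network and subnet for the guests
openstack network create home-net
openstack subnet create --network home-net --subnet-range 192.168.1.0/24 home-subnet

# Nova: boot a VM from that image on that network
openstack server create --image cirros --flavor m1.small \
  --network home-net --key-name mykey test-vm

openstack server list   # should eventually show test-vm as ACTIVE
```

Keystone authenticates the requests, Glance supplies the image, Neutron wires up the port, and Nova does the actual boot.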
It also turns out that if I hadn't thought about this problem at all and had just read the install guide blindly, these are the main projects the install guide is concerned with anyway. So it wasn't that big a deal, but it is confusing for a lot of people, especially with no prior knowledge of OpenStack.

With that, and after reading the install guide, I decided how I was going to lay the OpenStack services out on my hardware. I have five servers, and I decided to name them Altocumulus, because I have OCD and think cloud servers should be named after clouds, and altocumulus is a type of cloud. I decided on four dedicated compute nodes, Altocumulus 2 through 5, dedicated to just running VMs. Those servers only run the Nova compute service, which manages the VMs locally on the machine, and the Neutron Linux bridge agent, a daemon that manages the Linux bridge configuration controlling the networking for those VMs. Then I set up a controller node that is also a compute node, because I have limited hardware. It's an all-in-one node: it runs all of the API servers, as well as the database and the message queue, but it also runs a Nova compute service and a Neutron Linux bridge agent so I can use that node's hardware for real workloads and running VMs too. I thought this was a good balance given my limited hardware and the limited load this cloud would see. If I were doing more dynamic work with more than one user, I would not recommend this configuration, because the database and the message queue will get loaded down with a lot of extra traffic.

Then I started following the install guide through the services. It starts with Keystone, then goes to Glance, Nova, and Neutron to set up all the base services and all the daemons. But I hit a snag because I was installing from tarballs. The install guide says apt-get install keystone, and since I wasn't using the Ubuntu packages for OpenStack, I couldn't do that. So I looked everywhere. I did Google searches: how do you install OpenStack from a tarball, how do you install OpenStack from source? It turns out there's basically no documentation anywhere. So I failed in my thought exercise of doing it as a system administrator with no prior knowledge. I put that aside, because I have done this before, and I documented the steps you need to install from source.

You download the tarball, then you create a service user for the daemon to run as. You install the binary requirements: OpenStack is a Python project, and some of the Python libraries it uses link against C libraries or have other non-Python requirements, and you need to install those. The problem is that a lot of projects don't actually document this, so when you run pip install to install the Python code, you'll get an error saying it can't find some library file; then you have to look up which package provides that library, install it, and start all over again. If you can find the list of binary requirements beforehand, that's helpful, but it's often trial and error. The other thing you have to do is create all of the state directories and configuration directories on the machine.
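Sketched out for a single service, and including the configuration directories and the constraints file that come up next, that manual sequence looks roughly like this; the version number, URL, package names, and paths are illustrative assumptions rather than the exact commands used.

```bash
# Hand-installing one OpenStack service (Keystone) from a release tarball.
# All names, paths, and versions here are illustrative.

# 1. Grab and unpack the release tarball
wget https://tarballs.openstack.org/keystone/keystone-11.0.0.tar.gz
tar xzf keystone-11.0.0.tar.gz

# 2. Create a system user for the daemon to run as
sudo useradd --system --home-dir /var/lib/keystone --shell /bin/false keystone

# 3. Install the binary (non-Python) requirements, usually discovered by
#    trial and error when pip fails to build a C extension
sudo apt-get install -y python-dev libffi-dev libssl-dev libxml2-dev libxslt1-dev

# 4. Create the config and state directories pip will not create for you,
#    and copy in the sample configuration shipped in the tarball
#    (the exact location of the sample varies by project)
sudo mkdir -p /etc/keystone /var/lib/keystone /var/log/keystone
sudo cp keystone-11.0.0/etc/keystone.conf.sample /etc/keystone/keystone.conf
sudo chown -R keystone:keystone /etc/keystone /var/lib/keystone /var/log/keystone

# 5. Install the Python code, pinned to known-good dependency versions with a
#    constraints file (fetched from the openstack/requirements repo for the
#    matching stable branch) to work around pip's lack of a dependency solver
sudo pip install -c upper-constraints.txt ./keystone-11.0.0
```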
So for your nova.conf, your neutron.conf, all of the configuration files, you have to create directories under /etc, and also state directories under /var/lib. Inside the tarball there are sample and default configurations that you need to copy into those configuration directories. This is because the Python packaging ecosystem doesn't provide a way to install anything besides the Python code itself. If you say, install the nova package for me with pip, the Python installer, it will just install the Python code to where Python code lives and completely ignore the sample configuration and any other data files. So you have to do that manually from the tarball. After all of that, you can pip install the tarball and continue following the install guide.

If this doesn't make it clear: use a distro package. All of these steps are exactly the role that distro packages serve. You don't have to be a masochist and go through this exercise like I did, when a distro package works just as well and there's nothing to actually compile or hand-tweak; it's just Python code, so it's just text files.

After going through all of this, I started with the Keystone project. I ran pip install, started Keystone, and hit my first issue, which is another problem with the Python installer and managing Python projects: a requirements issue. I did pip install keystone, it ran, and then when I started the service I got a runtime version conflict saying that one of Keystone's dependent libraries was at a version incompatible with what pip had installed for me. I had not installed that package manually; I said pip install keystone and it installed its dependencies and so on. But it turns out pip does not have a dependency solver. It just goes in order through each project's requirements files, and if there's a shared requirement, whatever version happens to be installed last wins. By doing this for Keystone, pip installed things in an incompatible state and the service wouldn't run. When I saw this, I remembered there's something called pip constraints, which we developed in OpenStack and pushed upstream to the pip project, to maintain a single version to install. You run pip install with a constraints file that lists a specific, known-working version for each package, so you basically bypass the missing dependency solver by doing the resolution out of band, manually. After reinstalling using that, I was able to get past this and start the Keystone service.

Then we got to networking. I was going along through the install guide, hitting maybe a few little mistakes here or there that I made but that were simple enough to fix. Then it came time to set up Neutron, the networking service, and I was a bit overwhelmed. The install guide asks all of these questions: do you want to use self-service networks, do you want to use provider networking? I had no idea. So I started reading the networking guide, which is a separate document about how you do networking, and I read through all of it. Then I found this diagram, for Linux bridge provider networks, straight from the networking guide, and it matched the topology I had in my head for my cloud.
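A hedged sketch of what that Linux bridge flat provider network roughly amounts to in configuration and in the API, assuming a single NIC named eno1 and a 192.168.1.0/24 home LAN; the file paths, interface name, and addresses are illustrative.

```bash
# Map a flat "provider" network onto the physical NIC (illustrative names/paths).
sudo tee -a /etc/neutron/plugins/ml2/ml2_conf.ini <<'EOF'
[ml2]
type_drivers = flat
tenant_network_types =
mechanism_drivers = linuxbridge
[ml2_type_flat]
flat_networks = provider
EOF

sudo tee -a /etc/neutron/plugins/ml2/linuxbridge_agent.ini <<'EOF'
[linux_bridge]
physical_interface_mappings = provider:eno1
EOF

# Then, as admin, create the shared provider network and a subnet that matches
# the existing home LAN (addresses are examples)
openstack network create --share \
  --provider-network-type flat --provider-physical-network provider provider
openstack subnet create --network provider --subnet-range 192.168.1.0/24 \
  --allocation-pool start=192.168.1.200,end=192.168.1.250 \
  --gateway 192.168.1.1 provider-subnet
```

The important part is that guests get ports bridged straight onto the existing home subnet rather than onto a Neutron-managed overlay.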
I have everything on a flat L2 network with a single unmanaged switch on my home network, which I think is a pretty common home networking setup. All I wanted was for the guests on the compute nodes to come up with an interface on a bridge connected to my physical network. Looking at this diagram, that's exactly what it shows: there's one physical layer-2 network, everything connects to it, and it's all on the same network. From my desktop I could SSH in very easily; everything would work.

The problem is that it's all on a single L2, which means there's a shared broadcast domain. My home router has its own DHCP server, and if you go back a slide, all of the stuff on the right is DHCP: Neutron runs its own DHCP because it assumes it's on its own isolated network. So when I set this up, the DHCP server on my router was getting lease requests from the VMs when they booted, and Neutron would see them too. Luckily, the default firewall rules for the VMs in Neutron block the incoming reply from my router's DHCP server, so the VMs would only get the reply from Neutron. But if I ever changed the firewall rules, there would be a race condition between the two DHCP servers, which would be no good, and my DHCP server's logs were getting very messy. So I decided to just turn off Neutron's DHCP.

And with no DHCP, I also can't use the metadata service. For those familiar with AWS, there's a metadata server that a guest can query when it comes up to get back metadata about how it's set up. It runs on a static IP; I think it's on the previous slide. No, it's 169-something; it's a hard-coded IP for the metadata service. When the DHCP agent is running in Neutron, it pushes a static route so that VMs can reach that address, but if I'm not running DHCP, I can't use that. So I had to turn the metadata service off along with DHCP, and I ended up with basically this diagram, which is the same thing but without any of the DHCP. All of my computers are on the physical network, along with my router, and then the internet's exploding because it's the internet.

With that, I thought I had the network settled and was ready to move on. But then I hit this issue. I'm not sure if you can read this, but this is what happened when I started Neutron after figuring all of that out. For those who can't read it because the text is pretty small, it's a big Python stack trace, and the error message is: Unserializable message: ValueError: I/O operation on closed file. That's all it said. I had no clue what this meant; there's no real indication of what's going on. It turns out this is how Neutron tells you that the network namespace command is not installed on your machine; this is it saying the command was not found. It took me about two hours of tracing through the code to figure that out. The way Neutron does privileged operations is that it runs a separate daemon as root with a socket interface; it sends the command it wants over that socket, and the separate process runs the command as root and returns the result. Except when it was sent a network namespace command and the command wasn't found, it just closed the socket, and that's what this error was. But there was no indication of that.
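For what it's worth, a pre-flight check along these lines would have surfaced the problem immediately. The talk never names the exact missing binary, so the commands below are common suspects that the Neutron agents shell out to, and the package names are an Ubuntu-flavored assumption.

```bash
# Hypothetical pre-flight check for external binaries the Neutron agents call;
# the exact missing command isn't named in the talk, so these are guesses.
for cmd in ip brctl dnsmasq; do
    command -v "$cmd" >/dev/null || echo "missing: $cmd"
done
ip netns list >/dev/null && echo "network namespace support looks OK"

# On Ubuntu the relevant packages would be something like:
#   sudo apt-get install iproute2 bridge-utils dnsmasq-base
```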
So I had to trace through the code to figure out what was going on and discover that I was missing a command. Which goes back to the earlier slide about binary requirements not always being obvious at install time.

After tracing through that and fixing it, I was ready to boot my first guest, and I got this. I ran openstack server create with all of my parameters to boot it from an image, and nothing happened. Well, it said it was creating the server, but it just sat there not doing anything, and I was at a loss. So I started tracing through all of the logs, and Nova said, yes, it's booting: the logs go through fetching the image and starting it up with libvirt. The only indication I could find of anything being wrong was this, which was a debug message, not a warning, not an error. It just said it wrote zero bytes to that image file, with a checksum, which means the image was empty. That was definitely not right, because I had uploaded a test image to boot; it was about 20 megabytes, so there should have been something there. But this was the only indication I got. What I ended up having to do was trace through the Glance code, adding print and log statements to see where the data was going through the code, and I found the problem was outside the Glance code. So I had to figure out which library was at fault and pull in a newer version, and after correcting that requirements issue, it worked.

After that, the guest booted, but I still wasn't able to log in. I could see the console log and watch the server boot, but I couldn't SSH into the guest. This comes back to the metadata service I was talking about before. It turns out the version of cloud-init at the time didn't understand static network configuration, and I couldn't use DHCP or the metadata service. Because I had turned off the DHCP server, the route to the metadata IP was never set and the guest never got an IP address. It would try DHCP, get no response because that was blocked, and just sit there. What I ended up doing was creating all of my images using something called Glean, which is an alternative to cloud-init for managing a guest's initial settings when it boots in a cloud. So I had to manually create guest images for each server type I wanted to boot, which is not an ideal solution, but it worked for me. This has since been fixed. It was actually already fixed upstream in cloud-init when I originally did this project, but it had not rolled out to the Fedora or Ubuntu cloud images you would download; they had not pulled in the newer version. Newer cloud images ship cloud-init 0.7.9 or later, so you won't have this problem, but if you do, you can always use Glean or another project to build your images.

So, to come full circle, the biggest pain point in doing an install like this, for me, was Python packaging. Almost all of the issues I hit were because I was installing with pip from tarballs: it doesn't manage requirements well, it doesn't have a dependency solver, it doesn't manage data files or configuration, and it doesn't understand any requirements outside of Python code.
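To make the dependency-solver point concrete, here is a small hedged illustration of both the failure mode and the constraints workaround described earlier; the constraints file is assumed to come from the openstack/requirements repository for the matching stable branch.

```bash
# pip (at least the versions current at the time) has no dependency solver:
# whichever version a later install asks for silently wins, and conflicts
# only surface at runtime unless you check for them explicitly:
pip check    # e.g. "somepkg 1.2 has requirement lib>=2.0, but you have lib 1.4"

# OpenStack's workaround: pin every transitive dependency to a known-good
# version with a constraints file, then reinstall against it:
pip install -c upper-constraints.txt keystone
```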
That makes it very difficult to install any of this stuff from source, for any Python project, not just OpenStack. The other thing is that OpenStack is a complex system of software doing a lot of things under the covers, so debugging requires a certain level of understanding of what's under those covers: you need to know a little about libvirt, a little about networking, or in this case a lot about networking, to get everything working. But honestly it's not that bad. Ninety percent of the issues were caused by using tarballs instead of distro packages; the only real issue was networking and Neutron. So it's not that hard if you want to do this yourself. We could improve our logging and error reporting in the project, though.

So, to come full circle, because I'm almost out of time: I had a bit of a crisis. What am I going to do with this cloud? I just spent $1,500 and all this time; now what do I do with it? I came up with two uses, which were the only things I could think of. The first is OpenStack development. Having an OpenStack cloud locally is very useful for developing applications on top of OpenStack; having a low-latency API you can hit to write applications against is incredibly useful, and you don't have to pay too much for the usage, so it's very good for development. The other thing is what I'm calling cloud-native compute workloads: embarrassingly parallel tasks. I have 80 virtual CPUs, and if I can use them all in parallel on some task where they don't need to talk to each other, that's pretty good. In my case, I used an application I wrote to do some transcoding. I do a lot of transcoding at home; I'm not going to go into too much detail on why. But that's what I was using it for.

Then there are a lot of reasons you don't want to do this. The first is that five 1U servers in your bedroom closet is not a pleasant experience. They're loud, they're really hot; I can't sleep at night if they're running while I'm in my bed. Also, the power bill is kind of ridiculous: at peak draw they pull 1.35 to 1.5 kW. And at the end of the day, I just wasted $1,300 that I could have spent on a weekend vacation somewhere nice.

So with that, I have some links for extra information, including a blog post that goes into all of the details and a link to these slides if you're interested, and I think I'm out of time. So, thank you.

Thank you so much. If you have any questions, please find him; I think he'll be around the space for the entire day. So, shall we have Nancy come up?