Okay, good morning, everyone. It's 9 a.m., and last night was a lot of fun, good HP fun for those of you who were out there. So thank you, everyone, for coming to my presentation. I want to share some experiences I've had over the last couple of years dealing with users and their instances. So let's get started.

First, let me introduce myself. My name's Abel Lopez. I started working on OpenStack in early 2012, on Diablo. I've done deployments, automation, bug reports, and if you used the Chef cookbooks from Havana and prior, you've probably seen my code. You may remember me from AT&T; we had several large installations there, seven data centers with a few thousand cores and some pretty big tenants. Presently, I'm at Cisco, working on our next-level cloud offering. And this slender fella was present in Atlanta, but he would have been holding a demitasse of espresso in real life.

So you may be asking yourself, what the heck is an image life cycle? I'm going to talk a little bit about specific tools, but it's pretty much this: you really want a plan in place for how your users are going to utilize the resources you provide them. These three steps should not be just some guy's task. This should be something we can automate and replicate, something that can be done by anyone on your team. And you really want to provide some kind of uniformity, so that whether you provide Fedora or SUSE or Ubuntu or Red Hat, it's pretty much the same experience for your users.

Now, why? This is the important part. Not only is this a good thing for your users, it's a really good thing for your own cloud, whether you run a public, private, or hybrid cloud. Don't assume that your users know what you're talking about, or even know what you think they should know; that alone reduces confusion. And I could tell you anecdote after anecdote about times we ran into issues with tenant VMs that would already have been resolved had they been using the latest set of baked images. Your operations team ends up spending a lot of time resolving issues that have already been resolved; that's what an image life cycle prevents. If I had a nickel for every time we hit one of those old problems... and I want that nickel.

So, how do we get to an image life cycle? There are a lot of great tools already out there, so, like we saw in the keynote, we should stand on the shoulders of giants. We shouldn't be reinventing the wheel; if something great already exists, we should probably be using it. This is OpenStack, we like to automate all the things, and that gives us a repeatable process that can be managed by anyone. Most importantly, though, this is not a one-shot deal. This is something we need to get right and then do over and over, iterating as we go.

So let's describe the image life cycle. First, we need to invest the time into automating the image creation process. The goal is to get this down to one step, one executable script that takes everything from soup to nuts. The result is something that's ready to go into Glance right away, but first we need to test it. So, second, we need to automate our testing suite. What I do is keep a simple set of assertion tests: everything I do inside my VM images gets asserted and tested before the image goes into Glance, with a nice little pass/fail output, so we know we're putting quality stuff in.
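To make that concrete, here is a minimal sketch of the kind of assertion script I mean. The mount mechanics, the partition layout, and the expected NTP server name are all placeholders; adjust them for whatever your own bake process actually enforces.

```bash
#!/bin/bash
# Minimal assertion-test sketch: mount a freshly built image read-only and
# check that a few things actually made it in before it ever touches Glance.
# Assumes the nbd kernel module is loaded (modprobe nbd max_part=8) and that
# the first partition is the root filesystem; both are assumptions.
set -u
IMG=${1:?usage: $0 image.qcow2}
MNT=$(mktemp -d)
RC=0

pass() { echo -e "\e[32mPASS\e[0m $1"; }
fail() { echo -e "\e[31mFAIL\e[0m $1"; RC=1; }

qemu-nbd --read-only --connect=/dev/nbd0 "$IMG"
mount -o ro /dev/nbd0p1 "$MNT"

# Assert password SSH logins are disabled.
if grep -q '^PasswordAuthentication no' "$MNT/etc/ssh/sshd_config"; then
  pass "password SSH disabled"
else
  fail "password SSH still enabled"
fi

# Assert the NTP server we expect (hypothetical name) made it into the image.
if grep -q '^server ntp.example.com' "$MNT/etc/ntp.conf"; then
  pass "expected NTP server configured"
else
  fail "expected NTP server missing"
fi

umount "$MNT"
qemu-nbd --disconnect /dev/nbd0
exit $RC
```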
Like, I know a lot of shops like to enforce a certain set of NTP servers, or maybe a certain cloud-init user over the default. That's all stuff you would write a test for, to make sure it actually got into your image before you put it in.

The third step is to deprecate your older images. There is absolutely no value, in my opinion, in having your latest release image out in Glance alongside the previous version and the version before that. It just adds cruft to your list and adds to the confusion among your users. Names are very important: we need to come up with a really good naming scheme before we upload to Glance. I know everybody loves UUIDs, I love them too, but the end users are going to see the name you create, so we need to make sure it's a good image name. Once everything is done, put it in Glance, ready to go.

So let me go through some examples of what not to do. Here I have an example of a bad image listing. I'm not going to point any fingers, but this doesn't make any sense. If you're a user looking at this list, what are you supposed to pick? They're all active, they're all public, they've got people's names on them. I can't make heads or tails of it. You see seemingly different versions of the same thing: there's a "precise" and a "12.04", a "trusty" and a "14.04". I'm certain some ops guy somewhere can tell you exactly what each one of these is, when it was put up there, and what it's for, but that's the point: we're not the intended audience. This is for the consumption of our users, not ourselves.

So let's take this example right here. Why is this a bad name? Let's take it apart bit by bit. Now, this may be fine for the audience in this room; we can read it and kind of make heads or tails of it. But let's not assume that our users know code names and distro releases. Let's just not use them at all. Sometimes they're cute (Precise Pangolin, that's cute), but sometimes they're kind of questionable (Beefy Miracle). If you're hosting a public cloud with big corporate users, I don't think you want to be sharing your Beefy Miracle with them. And what does "trusty" mean? I know what trusty means, you know what trusty means, but to your end users, is that secure? Is that trusted computing? Who knows? So let's just stick to actual names.

Furthermore, this is just way too long. Look at all those characters; that's supercalifragilistic up there. Who's going to type that into the command line? Honestly, at that point you're better off using a UUID, it's just so long. And really, they put the platform in there, amd64. Okay, I know everybody gets the warm fuzzies knowing that they have a 64-bit operating system, but really, it's 2014. I haven't been able to buy 32-bit hardware in six or seven years. So unless you have a very specific use case where you're offering SPARC or ARM, I'm going to go out on a limb and say at least 75 to 80 percent of the people in this room are offering one single architecture in their cloud.
Now, granted, there are a couple of use cases where you may want both 64 and 32-bit. If you're doing micro instances with one processor and 128 megs of RAM, then a 32-bit architecture totally makes sense. But if you look at most of the flavor lists out there, they're usually two gigs of RAM, four gigs of RAM. So why are we even bothering to list the architecture? It's kind of like saying, hey, I have a car, with wheels. It's all 64-bit.

Then up here: disk1. Should there be a disk2? What am I looking for? Am I missing something? It's just an extra artifact we don't need to list in the name. And this date, is that the expiration date? Heck, I don't know. If you put a date in your image names like this, the very first thing your users will do is compare and go hunting for the latest date possible. Just keep it simple. If you want a date, you can totally put that in the image metadata, but it doesn't belong in the title.

Now, personally, I prefer my steak medium rare, although I did have some very good beef tartare this week in Paris. There really is no value in presenting the disk format in the name, because that's an attribute that already shows up in the listing, so it's just redundant. I don't know who uploaded this image to Glance, but we really need to get some standards in place before we start pushing this out to customers. I understand there is value in knowing the disk format, for back-end considerations, performance issues, and whatnot, but it's redundant to have it in the name when it's already in the listing.

One more example. What's wrong with this one? Well, anybody here familiar with Red Hat systems and point releases? Any time you do a yum update, that little number goes from a three to a four to a five. It's a point release; it really doesn't mean anything to your end users. They may think it's important ("my vendor says the software has to run on Red Hat 6.3"), but it'll work on 6.4 and 6.5. Point releases in Red Hat tend to be pretty benign patch updates. I love Red Hat, they put out some really good OSes, but I don't think we need that in the name. And let's not get confused with Ubuntu, because they use the year and the month as the release version, so saying Ubuntu 14.04 is not the same kind of thing as saying Red Hat 6.5. This is Red Hat 6; that's all it should say. The reason I say this is that we're making the assumption that our users understand semantic versioning, and that's a pretty bold assumption to make. They should just need to know: is it Red Hat 6? Red Hat 5 or 7? Ubuntu 12.04 or 14.04? We don't need to do Ubuntu 14.04.2; they shouldn't need to know that.

Again with the dates: the natural inclination is to just look for the latest one. Platform architecture, I got myself a little out of order there, we already talked about that. But dates, should we be looking for a newer one? And what the heck is this version string up here, V5 RC10? Who put that up there for my users to see? It probably means something to the guy who made this image, but my users are going to go looking for an RC11, or it's just meaningless to them.
I mean, I have OS major, then minor, then platform, and now this version string; it's like calling a football play, alpha, delta, hike, hike. And again, we have the disk format included in the image name.

Okay, so here's an oversimplified good image listing. Now, I know, I know, it's AWS, but look how clean and concise it is. Ignore the little Windows thing here. Oh, by the way, this talk is pretty Linux-centric; I don't do Windows, I don't wash or wipe Windows. But look, this is simple. There's a little bit more information than I would provide, but it's just CentOS 6, Debian 7. It's clean, it's concise, it's to the point. Now, I realize you can go into the AWS Marketplace and AMI your dreams away with Arch Linux, Fedora, whatever you want, but this is the front end: if you're in the free tier of AWS, the very first thing you see is "pick your OS, which one of these do you want?" It's clean, concise, and it tells you everything you need to know.

Oh, real quick, how many of you here are on the OpenStack operators list? Okay, a couple of you. A few months ago I did a survey on the ops list. I sent out a questionnaire and got about a hundred responses, so it was a pretty good turnout. These are the results, just on naming conventions, and they're pretty self-explanatory. The vast majority, 67%, like this difficult name: "CentOS 6.5-x86_64" was one of the choices for "which of these names do you prefer?" Combine that with the red slice and 84% of the members of the ops list like to present users with really complex names. 16% like simpler names, and there's an even split on whether you should provide just the major OS version or the major and minor version. Personally, my images are in the blue section: CentOS 6, done. It's an interesting graph, and I'd like to thank the operators community for participating.

So, we've talked a little bit about the life cycle; now let's talk about the tools and the automated creation process. There are a lot of great tools. Just choose one that works for you. Back in 2011, 2012, I was all about BoxGrinder. The only problem was that BoxGrinder only did Red Hat-based distros. I could write a plugin to do Oracle Linux with BoxGrinder, but I couldn't do Ubuntu with it, so that was kind of weird. These three are my favorites today: they do all of the distros, and they do them repeatedly, in an automated fashion. Out of the three, I personally prefer diskimage-builder, because it's part of the OpenStack ecosystem, and if you start using it today, you're kind of getting in line to be ready for TripleO when that finally goes mainstream. Anybody from HP? No? Oh, you just wanted the sweater? Yeah, the folks at HP did a really good job with diskimage-builder. It's all written in bash, so anybody can go hack on it; you don't have to be one of these kick-ass Python guys that are all over the room today.

Packer? Packer is from our friends at HashiCorp, Mitchell Hashimoto, the guy who does Vagrant. Packer is another project that works alongside Vagrant, and there's a lot of value in that, because you can use Vagrant for your development environment and then create the exact same image you used in development with Packer, as a QCOW2 image for libvirt, and put it into Glance. So that's kind of cool. My beef with it is that it's basically a scripted installer.
So you have to provide a kickstart or a preseed, and it just spins up: it creates the disk, brings up a VM, and runs through the installer. There is a problem with nested virtualization, though. A lot of the time you run into timeout issues if you're running this inside a VM, so Packer pretty much has to run on bare metal to be useful. Oz falls into the same category; it's another scripted installer, and it's pretty good. If you have the time to put into them, all three of these are active projects.

My favorite thing, though, is that with diskimage-builder I can have a bootable, usable guest image in about five minutes, because it just runs through a series of scripts, downloads a pre-made image, applies a series of what they call elements, and the resulting image is whatever you specified on the command line. It's a really cool tool. Five minutes in, you have a working image. Sorry, too much of last night catching up with me.

So, testing. Here are some of the tools; there's a boatload of ways to do automated testing. Serverspec is nice for everybody who's already familiar with the spec language. One of the drawbacks, and maybe it just hasn't gotten there yet, is that Serverspec assumes you're either on the instance you want to test or you can SSH to it. I like to test my images before I put them into Glance. I have a series of shell scripts that do assertion tests: mount the image, chroot into it, and then check. Is the cloud-init user what I expect it to be? Is password SSH authentication disabled? Are the NTP servers I expect set up? So I have a pretty cool series of shell scripts, inspired by diskimage-builder, and they even do green and red output, so any operator can see at a glance that all these tests passed and this one failed. It's really cool. And of course there are other tests I haven't thought of. The point is, we want to encourage testing, we want to encourage test-driven development. I personally chose shell scripts because the DevOps guys I was working with would say, "Well, I don't know Python, I don't know RSpec, but I know bash." So I wrote the tests in bash to encourage them to look at the tests I had and say, "Hey, I can write tests for that too." Any time there's a change, there should be a corresponding test for that change, and I think the more tests the better.

Image deprecation. We've already established what a good name is, and once you employ those good names, image management becomes pretty easy. Anybody watch the Glint talk the other day? Yeah? No? They were talking about using names instead of UUIDs to address images; that's exactly what I'm talking about. Come up with a good name, and then you can start doing operations on those names. For example, this Ubuntu 14.04 image gets renamed in Glance to "Ubuntu 14.04-old". That's very intuitive: your users immediately know not to launch instances off the old image. It's three letters, done. Then the new image gets uploaded to Glance using the same name. The UUIDs are still unique, so you can have identically named images in the database with different UUIDs, but we've already renamed the old one as old, so users won't have any confusion there. We upload the new image using the old name, simple as that, and this keeps your images fresh. Now, I believe this should be on a cycle; we can get this automated process down to something run out of cron. Every 90 days? Yeah, I think so.
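Just to illustrate, that rename-and-replace step looks roughly like this with the v1 glance CLI of that era; the image name and the file path here are only examples.

```bash
# Hedged sketch of the deprecation flow: rename the current image to -old,
# then upload the freshly baked image under the original name.
glance image-update "Ubuntu 14.04" --name "Ubuntu 14.04-old"

glance image-create --name "Ubuntu 14.04" \
    --disk-format qcow2 --container-format bare \
    --is-public True \
    --file ubuntu-14.04-$(date +%Y%m%d).qcow2
```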
I've been in IT for about 14 years, and I don't know how many sysadmins here were around back in 1999 or 2000, but back then everybody used to brag about their uptimes: "I've got a 1,500-day uptime." Then Y2K came around and everybody needed to reboot their systems, or the DST rules changed, you guys remember that? Big uptime means nothing these days. So if I get on a system and see more than a 90-day uptime, it just tells me that somebody is not doing their job. Updates aren't being applied: kernel updates, security updates, things that require a reboot. Doing this often gets us into a pattern. Anybody here running Unbreakable Linux? No? No Ksplice? Okay, then we need to reboot our systems on a periodic basis, and the same goes for our virtual servers.

So again, this is a cycle. Lather, rinse, repeat. Well, unless you're me; there's not much lathering, rinsing, or repeating going on up here. And OpenStack, for better or worse, releases twice a year. When you do the upgrades, one of the options a lot of people take is migration: stand up a parallel site, migrate tenants over, even migrate their VMs. Personally, I think that's bad, because I'm a firm believer in the whole pets-versus-cattle argument. I realize that some users have pet instances, but if you introduce this timely image life cycle, where every month or every two or three months your old images go away and new images come up, it gets your users accustomed to periodic change. You're going to have avant-garde users who say, "Oh, a new image came out, let's launch new instances off of that." They get into the habit of tearing down their instances and bringing up new ones. That's a good pattern to get into. So if we encourage our users by way of a guest image life cycle, we get them accustomed to having cattle.

So stick to a release cycle. If you sit down with your engineering and ops teams and say "we're going to release new images every six weeks, or every three months, or whatever it is," stick to it. And also define an EOL for your distributions. I had a tenant that was dead set on running Lucid. Now, Lucid is an LTS release, but it's from, what, 2010? Their application absolutely demanded it; it only ran on Ubuntu 10. The problem was that as we upgraded from Diablo through Folsom and Grizzly, we did away with the EC2-style metadata API and went to config drive, and the version of cloud-init that shipped with Ubuntu 10 didn't know how to get its metadata, because it didn't have support for config drive. Thankfully, they went to their vendor and found there was an updated version of their application; they had just never bothered to look for it. They were totally unaware there was a new version that ran on Precise. So it turned out well for everyone. Ultimately, the infrastructure should define the architecture, not the projects that run on it. Keep that in mind.

You also need an archival strategy. This isn't really a big deal if you're hosting your images on Ceph, S3, or Swift, but if you're using the file store, your /var/lib/glance/images directory is going to fill up pretty fast. We still need to define a way to put the old images out to pasture, which is what deprecation is for, but in the meantime you can just mount up a SAN or an NFS share on that directory and go at it.
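For what it's worth, that mount-an-NFS-share trick is about two lines. Here is a hedged sketch, with a made-up filer name and export path, for backing the default file store:

```bash
# Hedged sketch: back the Glance file store with an NFS export so that
# /var/lib/glance/images doesn't fill the local disk. The filer name and
# export path are placeholders.
mount -t nfs filer.example.com:/exports/glance /var/lib/glance/images
echo 'filer.example.com:/exports/glance /var/lib/glance/images nfs defaults,_netdev 0 0' >> /etc/fstab
```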
And deprecating the images is a good thing. You can keep them around, marked as deleted, and if a tenant really, really needs last year's CentOS or whatnot, you can member-create it for them.

Another question that was asked on the ops survey was: what is your image store? I was amazed to see an even split here between Ceph and the file store. Either everybody has 30-terabyte /var partitions, or they have a really good image-nuking strategy in place, but people really like the file store. Surprisingly, S3 is not very popular; at 3%, that's just a pretty small slice of the pie. What was kind of surprising, though, is that the green RBD store, that's Ceph, is more popular than Swift, and Swift is OpenStack. So more people prefer Inktank's (now Red Hat's) object store than OpenStack's own object store. That's kind of interesting.

So let's talk a little bit about the different stores. As you know, the file store is the default in Glance; if you don't do anything, the file store is what you get. It's easy, it's just a directory. Excuse me, there are a lot of smokers in Paris. Now, one of the cool things about the file store is that you can just attach storage to it. If you don't want to bother with setting up Swift or Ceph, you can put a big ol' NFS share on /var/lib/glance/images and go to town. I have seen some problems, though, if you scale out your API servers using the file store. Say you have three or four Glance controllers using the file store; the compute node can ask the wrong Glance host for an image, and that host doesn't have it. It's not a problem if you mount NFS and everybody uses the same share. The solution I came up with was pretty cool: I installed BitTorrent Sync on the Glance hosts, so an image uploaded to any arbitrary node gets synced, and now all the other nodes have the same image. Personally, I'm a big fan of BitTorrent, and I think the technology gets a bad name from the pirates, but we should really be looking into it. Especially for cross-site, are you kidding me? You upload an image in one place, BT Sync carries it somewhere else, and it gets populated to all the other nodes. That swarm protocol is great. Like I said, that's how I scaled out.

Now, in my opinion, the RBD store shows the most promise. Have you guys ever been to a Ceph talk or presentation where they rank everything in terms of awesomeness? So yeah, the RBD back end is totally awesome in Juno. It also works in Havana and Icehouse, but there are some patches you have to pull in from Inktank's private fork of Nova and Glance. Once you have that, though, it really is awesome. Local compute-node storage is irrelevant at that point; your ephemeral instances can run directly in Ceph, so that's just one less thing to worry about. We have this 16-to-1 CPU oversubscription ratio, you have your RAM oversubscription, and now you don't have to worry about your /var directory running out on your compute nodes either, so that's cool. And the spawning phase, you guys have seen it: you launch an instance, it's downloading the image from Swift or an HTTP location, spawning, spawning, spawning, spawning. Put your images in Glance, and if you put your Glance on Ceph, your instances are essentially clones of the image, so they instantiate really quickly. The only drawback with Ceph, though, is that you need a pretty robust operational infrastructure to maintain it.
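For reference, pointing Glance at the RBD store is only a handful of config options. This is a hedged sketch using the Icehouse-era option names in the [DEFAULT] section (Juno moves these under [glance_store]), and it assumes a Ceph pool called "images" and a "glance" cephx user already exist:

```bash
# Hedged sketch: wire glance-api to the RBD store. Option names roughly match
# the Icehouse docs; adjust the section name for Juno's [glance_store].
crudini --set /etc/glance/glance-api.conf DEFAULT default_store rbd
crudini --set /etc/glance/glance-api.conf DEFAULT rbd_store_ceph_conf /etc/ceph/ceph.conf
crudini --set /etc/glance/glance-api.conf DEFAULT rbd_store_user glance
crudini --set /etc/glance/glance-api.conf DEFAULT rbd_store_pool images
# Expose the RBD location so Nova can do copy-on-write clones instead of
# downloading the image for every boot.
crudini --set /etc/glance/glance-api.conf DEFAULT show_image_direct_url True
service glance-api restart
```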
That said, we've seen issues: a disk fills up, you lose a disk, an OSD fails, your CRUSH map gets messed up. So there are some issues there. Ceph does kind of become a linchpin, so you do need an invested Ceph management team in order for it to be awesome. But if you have that, it's pretty cool. So yeah, it might actually be a good idea to check out ICE, Inktank Ceph Enterprise, if you run into issues with that.

Swift. Why Swift? It's part of OpenStack. If you have generic boxes lying around with big disks, it's pretty cheap. In my observation it's more stable than Ceph, because you're lacking some of the features that make Ceph awesome, but the stability is pretty nice. And it does have container sync. One of the cool things you can do: throw in a little container sync, some cross-site database replication, some geographic DNS, and you have yourself a really awesome image distribution system. A little cross-site Percona replication, GeoIP DNS, container sync, it doesn't matter: you just put your images someplace, container sync pushes them to all your other sites, your Glance database is replicated, and the location field in the database points at the geographic DNS name. Golden, it's nice. So yeah, anycast plus container sync gives you a centralized image repository.

So, you too can be a happy puppy if you take the time to invest in automating the creation process, make some solid tests, define your policies and your schedule, stick to it, and then just kick back and relax in your spare time. I really hope you found my talk informative. If anyone has any questions, we can take them now. You can hit me up on IRC, Google Talk, email, Twitter, and most importantly, the OpenStack operators list, and now the IRC channel. Any questions? To the mic.

Actually, I have kind of a comment on the first part of it. I think you really addressed the problem of long names, where you come up with names that go beyond the edge of the screen when you look at them. That was exactly the reason we implemented, in the Juno release, the Glance metadata definitions. You should probably take a look at that. Right now you can have generic properties named anything, and they may not mean anything unless you click on them and look. But if you define them in a namespace, for your company or whatever it is, you can say there's an architecture property that can only take so many different architectures, and all sorts of other things, like whether it has AES support, and more properties. It's already implemented and it's part of the Juno release, so I'd suggest you take a look at that and use it. The second part you addressed, about the listing, is exactly why we are working on a blueprint called catalog indexing in Glance. We have a design session today in Dufy at 2:40, so if you're interested, you're more than welcome to join. Thank you. And it's going to look like exactly what you showed; it will probably have more properties you can even search on, like "get me an image that has an architecture of x86 and is running Ubuntu," or a certain particular flavor, anything like that, and it will give you the results, rather than you listing everything and picking through the names yourself. Yeah, so thank you. Yeah, we're currently deploying Icehouse at the moment, so yeah, Juno, Juno, let's go. Yeah, thanks. Anybody else? Oh, go ahead.
Yeah, a question about the release cycle: if you have a regular release cycle, from your experience, what do you put into the images on a regular basis? Obviously security updates. Yes. What other kinds of things? Usually it's primarily package updates; 99% of the time, you're just doing package updates. Now, anybody here who's run cloud-init to any extent knows you can tell cloud-init to just update the instance when it comes up, but as the time delta increases, if you're using an Ubuntu 12.04 image from two years ago today, that's a lot of packages that are going to get updated on boot. That's a big strain on your resources and on your customers' patience. So usually I do package updates, plus anything significant, like when we found that a particular NTP server was not responsive so we changed it. Really minor things, nothing major. I like to pre-bake images pretty much as they are, with just a couple of tweaks. Does that answer it? Anybody else?

As far as deprecating images, rather than changing them to be dash-old or dash-deleted or whatever, why not just make them private? Well, that's actually in my presentation; I kind of glossed over it and skipped it. But yes, part of the deprecation process is that you rename the image so you can upload the new one and the process knows which one is new, and then once the new one is uploaded, the old one gets marked as private. That way, if you have a valid use case, like a user who really, really needs the kernel version that came in the last image, you can make them an image member for that tenant, and then everybody's satisfied. So yeah, marking the image as private is totally in my presentation and I just forgot to mention it. Thanks.

Cool. Well, if nobody else has any questions, I hope you guys enjoyed my talk, and I'll see you on the floor later.
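For reference, that deprecate-to-private flow looks roughly like this with the v1 glance CLI of the time; the image name is just an example, and TENANT_ID is whichever tenant still needs the old image.

```bash
# Hedged sketch: flip a deprecated image to private, then grant one tenant
# access as an image member. The image name and TENANT_ID are placeholders.
glance image-update "CentOS 6-old" --is-public False
glance member-create "CentOS 6-old" "$TENANT_ID"
```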