OK, so I guess we'll start. Hello, and welcome to the "m1.Boaty-McBoatface: The Joys of Flavor Planning by Popular Vote" session. Just curious, a show of hands: how many people are here just because of Boaty McBoatface? Yeah, OK, most of you. That's what I figured. Rest assured, we'll be returning to Boaty later in the presentation.

To get things started, my name is Craig Anderson. I'm an OpenStack Solutions Architect at Mirantis. The target audience for this session is OpenStack providers that are challenged with a wide diversity of workloads and want to specialize their infrastructure-as-a-service offerings for those workloads. And the focus of the session will be the use of Nova flavors to facilitate workload segmentation, and some of the associated challenges in doing that.

To get us all on the same page, let's start with a brief overview of Nova flavors. When you ask Nova for a list of flavors, the first thing you get is something like this: a printout of flavor names and instance sizing info, vCPU count, RAM, and disk quantities. We've seen a lot of good talks in the past about how to optimize flavor sizes and do capacity management. What's more often overlooked, and indeed the focus of this session, is another dimension of Nova flavors: the extra specs. With extra specs, we can schedule VMs to hardware that supports specific features, and set other parameters that drive the performance profile of our VMs. It's the extra specs that enable us to achieve some level of workload segmentation and infrastructure specialization.

One important thing to note is that in OpenStack, a flavor is a flat representation of the geometry information plus the extra specs. But for the purposes of your planning, you really want to consider and think about these separately. So I propose an alternative way to think about flavors. First of all, I didn't invent this nomenclature; I basically stole it from Amazon and compiled the information from their website into the logical view you see here. Conceptually, we can think of a three-level hierarchy for flavors. At the top is what I'm calling the flavor type: the main areas of specialization, like compute or storage or whatever you're trying to optimize for, designated by the first letter in the flavor name, like M for general purpose or C for compute optimized. Second is the flavor series. For each flavor type, you can define flavor series that represent the iterations of your compute hardware; for example, M1 and M2 would be the first two iterations of your M flavor type. And third is the flavor geometry: for each flavor series, you generate a set of sizing parameters. Let's take a look at each of these in a little more detail. First, though, a quick sketch of what a flavor's geometry and extra specs look like in practice.
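This is a minimal sketch using the openstack CLI; the flavor name, sizes, and extra spec value here are illustrative, not a recommendation:

    # Geometry: 4 vCPUs, 8 GB RAM, 80 GB disk, in a general-purpose "m1" series
    openstack flavor create m1.large --vcpus 4 --ram 8192 --disk 80

    # Extra specs are key/value properties on the flavor. For example, a
    # watchdog action that forcefully resets the guest when its watchdog fires:
    openstack flavor set m1.large --property hw:watchdog_action=reset

    # The flavor is stored as a flat record of geometry plus extra specs:
    openstack flavor show m1.large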
So first, at the top, we have the flavor type. But before we look at exactly how to specialize our cloud, I want to address a more fundamental question: should you specialize your cloud at all? This is a private cloud, and we can build everything to our customers' exact specifications. If someone wants a very specific combination of features, or a highly specific vanilla ice cream flavor with fudge-covered waffle cone pieces and caramel swirl, OK, we have them covered, right? You could probably do that, but you'll also probably live to regret it. Private cloud makes it almost deceptively easy to specialize, and what I mean by that is it's not as simple as just creating a new flavor in Nova. There are other costs you have to be aware of.

To give you an idea about some of those costs, here's a breakdown of a few different scenarios. The first scenario is setting extra specs that enable a feature for individual VMs, where those features don't impact how we deploy our computes or how we manage our capacity. Something like the watchdog action for VMs is pretty straightforward. The second scenario is extra specs like the libvirt options and the CPU allocation ratio. These are still easy to configure, and they don't require any real changes to the way we build our compute hosts, but they do mean we're probably going to have to manage multiple resource pools now. In other words, we're going to have to create host aggregates, divide our compute nodes between those aggregates, and do capacity planning on these subdivisions within our cloud. The third item is where your costs really start to go up: things like CPU pinning and huge pages, because now not only do you have to manage multiple resource pools, but each of those pools is built differently. Your deployment automation now has to account for multiple versions of nova.conf and kernel boot parameters, and those configurations have to be incorporated into your CI/CD pipelines and tested and validated. I'll show a quick sketch of the aggregate mechanics for scenarios two and three in a moment.

Lastly, new hardware is another potentially very costly area. New hardware could mean new and unproven device drivers, moving to a new kernel or operating system, custom kernel builds for driver backports, different CPU architectures; there's really no end. It could also mean different out-of-band management tools if you're switching vendors, say from Dell to HP, or additional testing and certification, or even just the cost of acquiring the new hardware for all of the labs you need for development and testing. So you can create a lot of extra work for yourself if you're not careful here with your hardware.
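Here's that sketch: a hypothetical pinned-CPU pool carved out with a host aggregate, with a flavor keyed to it. This assumes the AggregateInstanceExtraSpecsFilter is enabled in your scheduler configuration, and the aggregate, host, and flavor names are made up for illustration:

    # Create an aggregate for hosts built with pinning and huge pages enabled
    openstack aggregate create --property pinned=true agg-cpu-pinned
    openstack aggregate add host agg-cpu-pinned compute-07

    # Key a compute-optimized flavor to that pool and request the features
    openstack flavor create c1.large --vcpus 8 --ram 16384 --disk 40
    openstack flavor set c1.large \
      --property aggregate_instance_extra_specs:pinned=true \
      --property hw:cpu_policy=dedicated \
      --property hw:mem_page_size=large

Note how this one combination already implies a separately built, separately capacity-managed slice of the cloud, which is exactly the cost being described here.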
After facing up to these costs, I think the understandable overreaction from some cloud providers is simply to offer no options. Hence the famous quote from Henry Ford about the Model T: you can get it in any color as long as it's black. The principle is as true today as it was then: it's always going to be cheaper to produce a large volume of something if the units are all the same. Options are costly, and that's just as true in cloud as it is anywhere else, so it's something we should keep in mind. That said, if you're dealing with a wide range of customers and applications, you're likely going to have a large diversity of cloud workloads, and invariably you'll have disappointed customers like these. This guy wants one that runs on electric, and maybe that's a deal breaker for him or maybe it's not. This one wants one that flies; OK, the Model T doesn't do that, so we lose their business. And this guy, he just wanted a pimped-out ride. So what do we do? The answer, of course, is something in the middle. We need a balance between the two extremes of having only one option and uncontrolled proliferation of options. You want to aim for as few options as possible: the cost of adding support for additional infrastructure variations should be evaluated and justified prior to implementation. And you need to be prepared to pay the cost of offering the options you do take on. Building at scale requires investment in robust assembly lines; in practice, this means making proper investments in your automation, testing, and CI/CD pipelines.

All right, so flavor specialization can be a good thing in moderation. Let's look at some flavor type examples. Here we have a compute-optimized flavor type, plus types optimized for price per gigabyte of RAM, storage IOPS, storage density, and graphics processing. Again, I basically copied these from Amazon as an example, but I think they're also a fairly reasonable generic set of flavor types for private cloud. That said, your tenant application requirements should be the driving force behind which flavor types you implement, and that's something you should be tracking on an ongoing basis. Don't fall into the trap of trying to create one flavor that optimizes for everything; to optimize for everything is to optimize for nothing. There will always be trade-offs, like performance versus power efficiency, or quantity versus quality.

This is the point at which things can go wrong, because the floodgates are open and we're creating new flavors. What invariably happens is tenants come and request specific combinations of things: I want feature X and Y but not Z, and I want a lower overcommit ratio for my workloads. It's these specific combinations that are the real killer, because there's really no limit to the number of combinations you can have, and that number grows very large very quickly. The reason this happens is that these tenant applications in many cases are not cloud native, so they can't easily take advantage of a generic flavor set you might offer them.

So what are we going to do about that? I think a great starting point is to invest in your tenant onboarding. That's the key phase where you can open a dialogue with tenants to get on the same page, and it's also where you can negotiate compromises. That's a point that bears repetition: negotiation and compromise. This is politics, as much as we might not like it; as a private cloud provider, it's a fact of life, and something you should factor into your costs. On the flip side, there are also opportunities here to grow these skill sets in your organization and market that expertise to your customers: analysis of their applications and workloads, recommendations, that type of thing. Even if their transition to cloud isn't to your cloud, there are still opportunities there. Probably the hardest thing is learning when to say no, and to turn down customers who are not a good fit for cloud. It's about the willingness and capability of those customers to compromise, to make changes, and to buy into a future that might require them to rewrite their applications. It's also about your own willingness to negotiate and compromise; for example, that could mean supporting a new OpenStack service in your cloud, like Ironic to provide bare-metal provisioning for apps that are not ready for virtualization, and other stepping stones like that.

Anyway, all this talk of politics and compromise reminds me of another story, one you may be familiar with: the story of the ship.
Last year, the British government was looking for a name for their new state-of-the-art polar research vessel, and some student of democracy thought it would be a great idea to put it up for a vote on the internet. I have here the top 20 entries for this prestigious research vessel, and I've highlighted a few favorites. Number 20, Boatasaurus Rex. Big Metal Floaty Thingy-Thing. What Iceberg? at number 12. I Like Big Boats and I Cannot Lie. My personal favorite, Usain Boat. And the number one winner, Boaty McBoatface, winning in a landslide. In the end, they turned their back on democracy and went with what was only the fifth most popular option, so the ship's official name will be the Royal Research Ship Sir David Attenborough. How many of you know who he is? Oh, most of you, very good. But in tribute to Boaty, they actually named one of the ship's submersibles after it, so the name will live on. Long live Boaty McBoatface.

I think there are a couple of relevant lessons in the Boaty saga. As a cloud provider, you need to have a strong vision of where your platform is headed. If you don't tell your tenants how to do things, they're going to try to take the helm to get what they want, and what they want is not always good for them, or for everybody else on the boat for that matter. You need to steer the ship, as it were, while accounting for the needs not just of tenants, but also of operations and the other platform stakeholders that are essential to the success of your platform. So again, more dialogue, negotiation, compromise, politics, and that kind of thing. A lot of that. At the end of the day, it's really up to you to provide your tenants what they need, above what they want. Otherwise you can become a rubber-stamp committee like the Electoral College, and we all know how that turned out.

OK, so moving on to the flavor series and life-cycle management. In this example, I've taken one of our flavor types, the C flavor type, which is compute optimized, and shown three example iterations of it over time: C1, C2, C3. One thing you'll notice is that at this level we get into the specifics about supported features and configuration settings. As we move from left to right, we can see the progression of changes in hardware configuration and supported features. I think this is a good way of tracking and managing changes to your infrastructure over time, as defined from the point of view of your tenants, your users. How you define a flavor iteration is really up to you: you might decide to iterate a flavor series when you get new hardware, when you decide to add some new set of features, or even when you change the way you build your servers. For example, in our second iteration we have nodes with no local disk, using remote Cinder storage for VMs, and maybe in a subsequent iteration, like the third one shown here, we have nodes with a converged compute approach that use local disk again for tenant workloads. A sketch of how two such iterations might be published follows.
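As a purely illustrative sketch (the names, sizes, and extra specs are hypothetical, not taken from the talk's matrix), two iterations of a compute-optimized series might sit side by side like this, with the older series kept available for existing workloads:

    # First iteration: plain compute-optimized flavor
    openstack flavor create c1.large --vcpus 8 --ram 16384 --disk 40

    # Second iteration: same geometry, but the newer hardware generation
    # supports CPU pinning and 1 GB huge pages, exposed as extra specs
    openstack flavor create c2.large --vcpus 8 --ram 16384 --disk 40
    openstack flavor set c2.large \
      --property hw:cpu_policy=dedicated \
      --property hw:mem_page_size=1GB

The series name in the flavor is what lets tenants see, and choose, the generational difference.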
One common problem when putting together your flavor series is a mismatch of hardware. Maybe you inherited that hardware from somebody else; maybe it came from eBay or the dumpster, wherever you happened to find it. Whatever the case, there's a very real cost to supporting all those hardware variations, even if we don't intend for our tenants to distinguish between them. So the first thing you want to do is take a full inventory of your hardware and determine which of it you can support. Device drivers are usually the number one factor here. It's not uncommon to run across network cards that are unsupported in Linux, or, more commonly, there will be drivers, but poorly tested ones with lots of bugs or reliability issues. I'm sure none of you have ever encountered that. Similarly, for any new hardware purchases, you should be asking yourself: do I want to be the first person running new hardware in a production environment that requires upgrading to a bleeding-edge kernel? Probably not. So, purge out the low-hanging fruit, the stuff that's not worth bothering with. You might still have a few server variations left that are pretty similar in specification; if they're close enough for your purposes, you might decide to merge them into the same flavor series, even though the hardware is slightly different.

One of the other interesting tie-ins with your flavors is life-cycle management of your hardware as it applies to your flavors. The interesting questions here are: what do you do when you need to grow the capacity of your cloud and you can no longer buy the hardware you're already using? In effect, the planned-obsolescence problem. You may end up supporting more hardware variations than you wanted to, and that raises the question of whether to create a new flavor series for the new hardware or add it to an existing series. A new series might make sense if you can monetize the difference in hardware performance profiles, or if you have new capabilities to exploit with the new hardware. Alternatively, incorporating the new capacity into a pre-existing flavor series makes capacity management easier: there are fewer resource silos to manage, and it's the more cloudy approach to things.

Also, when server hardware goes end of life, it's a similar situation: you're forced onto new hardware that's not the same as the old stuff, so you face the same set of questions as before. But there's also an operational impact to assess: how will your workloads get moved from the end-of-life hardware to the new hardware? Tenant self-migration becomes a possibility if we iterate the flavor series for the new hardware, whereas if we don't, and the new hardware just goes into the existing flavor pool, tenants have no way to distinguish between the hardware, and you'll be the one doing the migrations for them on the back end.
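To illustrate the self-migration point: if the new hardware is published as a new series, tenants can move themselves with a resize, something along these lines. The flavor and server names are hypothetical, and this assumes resize is configured and allowed in your cloud:

    # Tenant resizes from the end-of-life c1 series to the new c2 series
    openstack server resize --flavor c2.large my-server

    # After verifying the instance on the new hardware, confirm the resize
    openstack server resize --confirm my-server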
Lastly, as with any life-cycle management, it's important to document your process. I realize you probably can't see this flow chart; it's really just here to illustrate the importance of formalizing the areas we talked about previously into a concrete process. I view that as one of the key artifacts, because it can prevent the sort of uncontrolled flavor proliferation seen in many private clouds.

Now, getting down to our leaf nodes, the flavor geometries, and the related advice for them. The first piece of advice is actually not to get too hung up on them. They're comparatively easy to change later on, which you can do in response to tenant feedback and utilization data; adding a new flavor series is a good opportunity to redefine the sizing parameters, for example between an M1 and an M2 flavor series. The second thing to realize is that you'll face a similar challenge here as with your flavor features and legacy apps, which is that people will come in wanting something very specific: no, it can't be 16 gigs of RAM, it has to be 15 and a half. And again, you're in a position where you have to put your foot down to prevent rampant flavor proliferation.

For naming your geometries, there's really no perfect system. Some people use naming schemes that are perfectly consistent and extensible, encoding the geometry information directly into the flavor name, something like m1.c2m4d50 for 2 vCPUs, 4 GB of RAM, and 50 GB of disk. Others go with a more conventional but limited logical naming scheme, with some kind of qualitative sizing adjective like m1.small or m1.large. The problem then is what to do when you exceed large or extra-large: do you add more Xs, like T-shirt sizes, or do you add some numeric multiplier?

The other thing, as I mentioned on the previous slide, is that tenants will sometimes have their own oddball sizing requirements. You can use private flavors to help with that, and that's certainly preferable to polluting your public pool with all these one-off variations. Something to keep in mind, though, is that flavor names are globally scoped: even if tenant A and tenant B each have private flavors the other can't see, those flavors still have to have distinct names, so you can still get name collisions. A quick sketch of the private-flavor approach follows.
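This is a minimal sketch of that approach, using made-up project and flavor names; the 15872 MB is the 15.5 GB from the anecdote above:

    # Create the oddball geometry as a private flavor (hidden from public listing)
    openstack flavor create m1.custom-15.5 --private --vcpus 4 --ram 15872 --disk 80

    # Grant only tenant A's project access to it. Note the name still occupies
    # the global flavor namespace even though other tenants can't see it.
    openstack flavor set m1.custom-15.5 --project tenant-a-project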
So, some other considerations when planning your flavors. It's a good idea to have a place where you document the flavors for your tenants. An example would be something like the matrix we looked at previously, or something along the lines of the web pages you see from public cloud providers describing their flavors to customers. This would be the document that outlines the different flavor types, what they're designed for, the supported flavor series, the flavor geometries, the associated hardware info, and the regions where each flavor is available. Tenants aren't going to be able to get this level of insight from Horizon or from a Nova flavor list, so it's important to address it in its own right.

Another interesting situation to consider is the multi-cloud experience. If you have multiple cloud deployments or multiple regions, how do you manage the definition of flavors between them? It's not uncommon for different clouds to have different hardware. We might have an eight-vCPU flavor in cloud one that we called m1.boaty-mcboatface, and maybe we have a different eight-vCPU flavor in another cloud that we called the same thing, but it's backed by different hardware. Some people will say, well, this is fine: I have a unique set of users in each cloud and they don't know about each other. But that's a dangerous assumption to make, because invariably tenants will transform their apps to take advantage of multiple regions. It's just safer to assume that you'll have some user overlap between the clouds and to use a consistent naming scheme, even if the clouds aren't interoperable today. You could also have the situation where the same hardware from cloud two makes its way into cloud one, and then what do you call that, m1.boaty-mcboatface-junior?

Now, just to look at a couple of ways you can manage those flavors across clouds. One effective way is with Heat templates. The OS::Nova::Flavor resource type allows you to define your flavor parameters within a Heat template, and then it's just a matter of deploying the template to all of your clouds or regions and taking advantage of the Heat automation; I'll show a quick sketch of this in a moment. One other note is that it wasn't until the Newton release that you could actually specify the flavor name in the Heat template. Prior to that, you would get a randomly generated name, so that's something to be aware of if you're on an older release. One other tool is Kingbird, from the OPNFV community. The aim there is to support resource synchronization between clouds and regions, things like flavors, images, and SSH key pairs. So that may be another option in the future when it's ready; I'm not sure exactly when they plan to have that all working.

Host aggregates. Another important thing to consider with your flavors is the association with host aggregates, because host aggregates can be a pain to deal with. They're additional overhead to manage, and my opinion is that it's best to avoid using them where possible. I think the more ideal way is to rely on intrinsic hypervisor properties rather than to create your own abstractions in the form of host aggregates. The ComputeCapabilitiesFilter is a good example of this: you can specify parameters in your flavor metadata to target a CPU architecture that the VM should be scheduled to, or a specific CPU model or CPU topology. Some of these, like the hypervisor type, are also honored when used as image metadata. You can use various comparison operators; in the case of the hypervisor hostname, it's kind of nice, you can schedule to hypervisor hostnames matching a certain pattern. Also note that in the Pike release, resource provider traits should become available; they're related to the placement API that was introduced for Nova in Newton, and they should permit the management of other, custom capabilities.
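To make the Heat approach concrete, here is a minimal sketch of a flavor defined with OS::Nova::Flavor. The flavor itself is hypothetical, and the template version shown corresponds to Newton, which is what lets the name property be honored:

    heat_template_version: 2016-10-14   # Newton; earlier releases generate a random name

    resources:
      c2_large:
        type: OS::Nova::Flavor
        properties:
          name: c2.large                # specifying the name requires Newton or later
          vcpus: 8
          ram: 16384
          disk: 40
          extra_specs:
            "hw:cpu_policy": dedicated

Deploying the same template to each cloud or region keeps the definitions in lockstep. And for the capabilities-based scheduling just mentioned, a flavor extra spec for the ComputeCapabilitiesFilter might look like the following sketch; the hostname fragment is made up, and note that the "<in>" operator does substring matching rather than a full regular expression:

    openstack flavor set c2.large \
      --property capabilities:hypervisor_hostname='<in> rack42'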
So, that about wraps things up. I want to thank everyone for attending, and everyone who voted for the talk. Happy to take any questions at this point. Please use the mic if you have a question.

Audience: Do you have any hints on deprecating flavors? Should you just remove them and let the old VMs continue working, or should you migrate all the workload away first and then deprecate them?

Right, that's a good question. I think the real key is that you want to give yourself flexibility in those situations, because if you look at some of the public cloud providers, Amazon, for example, has had M3 and M4 flavor series around for years now. You might very well decide to run a series until, say, the hardware is end of life; that might be the reality in some cases. Or there might be other business decisions that drive it, like maybe the power efficiency of the old hardware isn't worth running anymore. So there are going to be these other events, other drivers, that may prompt you to deprecate, but if you've versioned or iterated your flavors in a way that lets you manage them easily, then when that happens it's easier to implement, and to migrate those users or those VMs off the old hardware onto the new hardware, whether you're doing it behind the scenes or facilitating your customers doing it themselves.

Audience: Hi. Have you seen any success or failure in aligning the flavor geometries with the physical hardware, in particular from a constant-ratio standpoint?

When you say constant ratio, what do you mean?

Audience: Consistent doubling of storage, CPU, and memory, for example, at the same ratio that the physical host provides.

Yes, so you're talking about the bin-packing fit, like how to divide a host's CPUs evenly. I think this is a challenging area, especially given the cloudy approach of saying my users shouldn't know about the hardware they're using; that's kind of the cloud principle. If I have all this hardware on the back end with different types of CPUs, different numbers of CPU cores, and different amounts of storage, then how can I have one set of geometries that works well for all of that hardware? And the answer is, you can't. So it's another one of those decision points, or trade-offs: do you want to expose more of those details to your tenants and say, all right, you have more options, with more geometries that fit better with the different hardware, but now there are more options the tenant has to weigh? So I think it is difficult; again, it's a balancing act, and it's a hard thing to manage to hit that Goldilocks middle.

Any other questions? All right, well, if not, thank you all.