Guess it's close enough to time to start. Hey everybody, my name is not Richard, unfortunately; if you came here to see Richard Maynard, he couldn't make it, so I won't be offended if you leave. My name is Kevin Bringard, and today we're going to talk about infrastructure as cattle, or herding the service cloud for fun and for profit. It's sort of a before-Ironic, custom implementation, if you will, of OpenStack-on-OpenStack, or TripleO, that we're doing at Cisco, where I work. I think it's a lot of fun; hopefully you will as well, and we shall see how it goes.

Everybody tells me I should start off a presentation with either a joke or a quote. The joke is: Neutron. ... Just kidding, just kidding. I work a lot on Neutron, I love Neutron, but oh my goodness. The quote, however, is this: "What the hell is a service cloud?" That's a tribute to everyone I've ever talked to about the service cloud. You may have also heard it called an undercloud or something like that. Basically, it's just clouds all the way down. We're creating a small-footprint, lowest-common-denominator, simple, basic virtualization orchestrator: sort of a hybrid cloud that runs all of your tenant services. You've got some things on bare metal (we'll get into this a little bit) just because they don't make sense to put into VMs, or it would just be a really bad idea, coupled with a lot of services that actually run on VMs. It allows you to scale out horizontally in a pretty great way, and we've leveraged it to a lot of success. Put another way, it's a cloud to run your cloud. And that's me; I said that.

I want to start off by saying I think virtualizing infrastructure makes a lot of sense. As purveyors of cloud here, we all love OpenStack, and we love to sell it and tell everybody how amazing it is: you don't have to worry about your VMs, they die, whatever, you can scale out, it's awesome. We tout the wonders of this API-driven, chaos-monkey, don't-care-if-it-dies sort of thing. And yet, at least in a lot of places where I've been, we continue to deploy all of our own stuff on bare metal and try to scale out on metal. I think if we take the concepts we're selling our tenants and apply them to ourselves, it creates a nice symbiotic relationship.

Obviously the concept of OpenStack-on-OpenStack and bare metal as a service is not new; people are working towards it, and there's an entire project dedicated to it. But at least in my experience, deploying to metal is time-consuming. Even if you get it to the point where you can do it really quickly, it still takes time to install the packages, bootstrap your hardware, and do all of those things: maybe half an hour to get a piece of metal provisioned, even if you've got it push-button, deploy-ready. Whereas with a VM, you spin it up and 30 seconds later the VM is up; you know exactly what the hardware is going to look like, you know what the profile is going to look like, and you're totally good to go. Then you start applying your Puppet and things like that, but at that point, that's a different part of deploying to metal anyway.
And keeping that much extra capacity lying around isn't cheap. If you want to be able to deploy quickly, you have to have hardware to deploy to; otherwise you have to purchase it, get it installed, wired up, racked, and all those different things. So either you keep hardware lying around, which isn't cheap, or you're under capacity; either way you're spending a lot of money, and it's just not a great way of doing things. Again, this is exactly what we tell our tenants: guys, you don't have to pre-provision capacity. Spin it up as you need it; run Heat, apply a scaling policy when you get close to your capacity, whatever your thresholds may be. So what we want to do is, again, virtualize as much tenant infrastructure as we can.

But isn't that what Ironic is for? Kind of. Ironic is partly for doing bare metal as a service, for when tenants actually need to deploy something on bare metal, and there's absolutely a place for that, so I'm not belittling it. I'm looking at this more from the perspective of people who run clouds: how can we virtualize our control-plane infrastructure?

To take a little side note on the Ironic thing, here's the way I've always explained OpenStack to people. My parents love to ask me questions about what I do, but my parents are very, very old and they don't get it at all. I try to explain what I do and they're like, "wait, so it has to do with computers, right?" But even people who are in technology, who aren't old and who kind of do get it, are often a little confused by cloud, partly because there are tons of different clouds, whether it's Dropbox cloud or Amazon cloud or Apple cloud or whatever. Also, it's just hard to get a tenant to think about "I've got my server and I've got my cage; how do I take that and put it into a cloud?"

So I always try to describe it as cage robots. Think about a co-located data center. You walk into the data center with your key, which could be your credentials or your API key, but in a physical data center it's a physical key, and it unlocks your cage. There are cages everywhere, but you only have access to the one you have access to. When you first set up a cage in a data center, it's empty; they give you basically a drop that gives you internet access, and beyond that it's your cage, you do whatever you want with it. So you go in there, you install some routers, and you configure the routers. Doing a neutron router-create is akin to racking a new router; doing a nova boot is akin to racking a new server. If we think about cage robots and virtualizing our infrastructure in this terminology, then instead of racking new servers, we do the exact same thing, but with our infrastructure. OpenStack services are like little robots that you send into your cage to do your bidding: I need a new server, I need a new router, I need a router configured this way, I need whatever, and the robots go do it.
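To make the cage-robot idea concrete, here's roughly what those two "racking" commands look like with the CLI of that era. This is just a sketch: the network, subnet, flavor, and image names are made up, and the net UUID is a placeholder for whatever your deployment uses.

    # "Rack" a virtual router and network instead of physical gear.
    neutron net-create service-net
    neutron subnet-create service-net 10.10.0.0/24 --name service-subnet
    neutron router-create service-router
    neutron router-interface-add service-router service-subnet

    # "Rack" a virtual server: boot a VM that will become, say, a DNS node.
    nova boot --flavor m1.medium --image ubuntu-14.04 \
      --nic net-id=<service-net-uuid> dns-01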
I think Google and Elon Musk may not be happy about that, but I like robots; I think they're cool. Deploying straight to bare metal is logically identical; it just uses a slightly different cage robot. That's what Ironic is for: it's a different cage robot. There's totally a market for that and a place for that, but I don't really think that's what it's for in our own infrastructure.

And here are some other reasons. Personally, I'm opinionated. I like to do things the way that I like to do things; I already know how to do them and I don't particularly want to change how I do them. I use Cobbler; I love Cobbler, it's awesome, and I want to keep using it. I use Puppet. Maybe you use Chef, or Ansible, or some Perl scripts you wrote in university; whatever it is, you've got your way of doing things and you don't necessarily want to change. And are we sure a new workflow is actually going to work? Why should we change what works? If it works for us, why introduce a whole new workflow into what we're doing? If we use our existing tooling, we know how to configure an API server, we know how to configure a Galera server, we know how to deploy new compute nodes, even the physical part. We've got our way of doing it, so we don't have to integrate a new thing and wonder whether it's going to work.

So basically, the service cloud is logically the same thing you're doing today. Going back to the cage robots: you get a new server, it's there, it's virtual, everything's good to go, and then you configure it using your existing tooling. You don't have to change anything.

What sorts of things do we put on VMs? This is Cisco, so these are some of the things we put on VMs; obviously your opinion may differ, and that's totally cool, you can do it how you want. We put API endpoints on VMs: Nova, Neutron, Glance, Heat, etc., all of the different services. We put DNS servers into VMs: tenant-facing, public-facing, and so on. Our MX and SMTP servers all go into VMs as well, and so does our message bus, so RabbitMQ goes in there. And we do monitoring in there too; I guess I didn't put that up on the slide.

Then there are things that don't go on VMs: obviously, compute nodes. If you want to try virtualization on virtualization, that's up to you; I personally don't want to do that, so we have physical compute nodes. We put our network nodes and agents directly on bare metal, because I don't want to deal with trying to go through 500 tap devices to get to another 500 tap devices; that sounds pretty nasty to me. Maybe we do put database servers into VMs. Logging servers? I don't know; it depends on what you need out of them, your server profile, things like that, and whether you can get it out of a VM. Your storage? If you want to try to run Ceph on VMs, that might be a fun project.
I don't know if anybody's doing that or not. Abe is saying no, so it's probably not a good idea; but maybe, I don't know, that's sort of up to you. So that's a bit of a profile of what we put on VMs. Obviously this is a hybrid cloud: some things on physical servers, some things on virtual servers.

Standardization. Could you imagine a world where all of your servers were the same? Many, many moons ago I worked at an ISP called Earthlink, and my friend Abe over here did as well, along with my buddy John Dewey. Dewey and I had this idea. At Earthlink we were running mostly SPARC gear, and we had all of this various Sun hardware everywhere: E4500s and V220s and V240s and V880s and all of this nonsense. Especially with the E4500s, you'd have all these different trays and different ways things could work out, and you'd start getting random errors. You'd get a core dump, so you'd analyze the core dump, and it would turn out that there was one memory module, in one tray, in one server somewhere, causing all kinds of problems for you. So you'd spend all this time troubleshooting it, then you'd open a case with Sun, they'd send you a new RAM chip, you'd do all this stuff, and it was just nasty.

What Dewey and I wanted to do, and this was in 2003, 2004, was to have a bank of just generic servers, we didn't really care what they were, and virtualize services on them. Obviously this was pre-OpenStack or anything like that, and KVM was still pretty young, if it was even around at all; I'm trying to remember what we were going to use. Either way, we wanted to virtualize the services, because at an ISP all we had were MX and SMTP and Apache servers. At Earthlink we were running things like Personal Start Page, which was a Java thing, but it was all just web services. So we totally wanted to virtualize all of this stuff so that everything looked the same. The idea was that you wouldn't have to troubleshoot which RAM module was bad or which CPU was bad; you'd just say "this server is giving me trouble," rip that server out, throw a new one in, and deal with replacing the old one later, with an RMA or whatever. Then you're done; services come back up, problem solved.

I like that idea, because I don't want to deal with "well, I've got some Dell gear over here and some HP gear over there and obviously some UCS gear everywhere." I don't want all this different stuff all over the place, so I like commoditizing as much as I can. And if I can commoditize my service cloud as well, not only running my tenant services in a scalable architecture, in a way where I don't care if I have to rip a server out and throw a new one in, but on hardware identical to what my tenant stuff runs on, because it's all just compute nodes, then every rack looks the same. You've already developed a growth strategy for what you want your compute racks to look like. You know what your failure domains are, you've created your availability zones, and you've worked out your affinity: how you place VMs in different places.
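Just as a sketch of what reusing that placement work can look like for service VMs, here's the server-group, anti-affinity approach that the Nova CLI of that era supported. All the names are hypothetical, and the group UUID is a placeholder.

    # An anti-affinity server group keeps the API VMs on different hypervisors.
    nova server-group-create svc-api-group anti-affinity

    # Boot the service VMs into different availability zones, same group.
    nova boot --flavor m1.medium --image service-base \
      --availability-zone az1 --hint group=<group-uuid> nova-api-01
    nova boot --flavor m1.medium --image service-base \
      --availability-zone az2 --hint group=<group-uuid> nova-api-02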
You've already done all this work for your tenants, so if you just apply that exact same logic to your service cloud, the work is already done. When do I need to add new compute nodes? When do I need to do this? Why do I need to do that? You know exactly when, because you've already figured it out.

It also allows you to more easily individualize services: one server, one service. As it stands today, and I've deployed some pretty big clouds in the past, you'd basically get two control racks (that's how we used to do it at a previous employer), and you'd have to figure out profiles for the different applications. This one can take this many Nova APIs and this many Glance APIs, I'll spread the databases out over these guys, and it's this crazy Tetris game trying to fit everything in as tight as you possibly can. With the service cloud, you don't particularly care, because aside from a few services that need to run on metal, you're just spinning up VMs, and the VMs will categorize themselves, Tetris themselves if you will, onto the hypervisors you have. Spin up a Nova API server, and that's a Nova API server. Spin up a Glance API server, and that's all that runs on it; just that one thing.

Another thing we started doing, and this was Abe's idea, is testing individual component upgrades. Take Glance, for instance: we're running Icehouse and we want to run Juno, so we test Juno's Glance in our lab, in a VM, and then we bring it up right next to our existing Icehouse stuff. You're actually running two different versions, but you don't have to worry about library mismatches and all those other things, because that Glance API instance is its own machine, basically its own VM. We're sort of creating Python virtual environments, but out of whole machines. So you can bring things up, and if you find out that, say, Kilo's Horizon has some fix that you really, really want or really, really need, you just deploy it in its own VM and bring it up. In theory it's API-compatible, so you just start using it, and now you've got some new feature, some new whatever, and it all just works, and you're not worrying about stomping on yourself. You can move quickly in your lab, and it looks just like production: you know that the machine you're working on in DevStack or in Vagrant or whatever you're using is literally identical to the machine you're going to be using in production, because it's the same image.
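To make that side-by-side component test concrete, here's a minimal sketch of bringing a Juno Glance up next to Icehouse. The image and network names are made up, and in our case the actual service configuration would come from the Puppet run afterwards.

    # Boot a Juno glance-api VM alongside the running Icehouse ones.
    nova boot --flavor m1.medium --image glance-juno-base \
      --nic net-id=<service-net-uuid> glance-api-juno-01

    # Point a test client straight at the new VM before touching the LB.
    glance --os-image-url http://<new-vm-ip>:9292 image-list

    # If it behaves, add it to the load balancer pool and drain the old one.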
And the service cloud itself doesn't need a lot of resources. That's sort of what I was talking about at the beginning when I said lowest common denominator. It's not taking any tenant traffic, so you don't need a ton of API servers; in theory it doesn't even need to be HA. As long as you can get done what you need done, that's fine. You probably don't need all the network-as-a-service stuff either. Maybe you do; that's up to you. We run it with Neutron and OVS so that we can plug it into, well, that's a whole other architecture talk, to do with service VMs and things like that. But you could do it with nova-network, and by and large I would actually advocate using nova-network for a service cloud, because you just need something simple, stable, and fast that works. It's really as close to a production DevStack cloud as you're going to get. You know all those warnings in DevStack that say don't run this in production? Well, this is kind of like a production DevStack cloud, right? Very simple, very basic; all it has to do is spin up a few VMs, and there's only one tenant, so you don't even have to worry about multi-tenancy.

So this is kind of what it looks like. Excuse me; it'll look very, very familiar. Logically, it's probably pretty identical to the clouds you're already building. You have a load balancer pair over here; maybe that's a physical load balancer, maybe it's just HAProxy. These are your service API endpoints. In this particular diagram I've shown them as HA, but they don't necessarily have to be. There's Glance, and obviously this is a very simple example; you'd have other APIs and things like that in there. That talks to your message bus, and then your service compute. Very simple, small scale, a known quantity. Easy-peasy; we've been doing this forever, right?

OK, so how is it useful? Well, here's why. This is what your tenants see. This load balancer pair, maybe it's the same pair if you have physical load balancers terminating SSL for your tenant cloud and your service cloud; however you want to shard that up is up to you. But you've got your tenant Nova APIs here and your tenant Glance APIs here, same deal with the message bus. Then all of a sudden you're like, oh my gosh, the API is running really slow. There's a herd coming through, and it just can't handle the load. So you run a couple of API calls, and boom, now you have four Nova API servers. Scaled out very quickly, very easily. Obviously you could add more than just two; you could add a hundred more, just that easily, as long as you have the compute capacity (that's a tongue twister for me). Same thing with the Glance APIs: you need more, and all of a sudden there are just more. They come up, they plug into the network, you add them to the load balancer pair, and now you've got scalability at your API layer.
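Those "couple of API calls" really can be this simple. A sketch, with hypothetical image and instance names; the Puppet run afterwards does the real service configuration.

    # Boot two more nova-api VMs from the same base image as the first pair.
    for i in 03 04; do
      nova boot --flavor m1.medium --image nova-api-base \
        --nic net-id=<service-net-uuid> nova-api-$i
    done

    # Then add the new addresses to the load balancer pool; with haproxy
    # that's two more "server" lines in the backend and a config reload.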
Same thing with the message bus. You lose a piece of hardware, one of your Rabbit nodes is gone, your cluster is going crazy: spin up a new Rabbit node, add it to the cluster, problem solved. It's very, very simple, and it works out pretty well, at least in our experience.

You get HA mostly for free. Again, going back to what I was saying before: you've already figured out your failure domains, your availability zones, your affinity, all that different stuff. If you just use those same strategies, VMs get placed automatically in places where, if you lose an entire availability zone, your tenant-facing services are still up. Again: reusing work, reusable code. And you can just treat it like any other web application, because at its core, I mean, on the back end of OpenStack there are a lot of crazy things, but on the front end, what the tenant sees, and for the most part the data plane, is really just a standard three-tier web app. You've got a load balancer that talks to an application server that talks to a database server, and then it does some stuff. It's obviously a little more complicated than that, but it's the same sort of thing.

It's super agile. You can spin up servers in minutes, instead of what we were talking about before, where even if you have the hardware there, push-button ready, it can take 30 or 40 minutes to image a new piece of metal. If all of a sudden you're like, you know what, we've got these two DNS things running on the same servers and we really want to split them out: run a couple of API commands, you've got two new servers, run your Puppet manifests on them, problem solved. Want to migrate to a new version of Nagios but need to test your configs? You test them in your dev environment, and again, the dev environment is identical to what you're going to be doing in production; you bring up the new production instances with the new configs, nova delete the old ones, problem solved.
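And going back to the message-bus example from a minute ago, replacing that dead Rabbit node is about four commands. A sketch, with made-up hostnames:

    # Boot a replacement VM from the standard rabbit image.
    nova boot --flavor m1.large --image rabbit-base \
      --nic net-id=<service-net-uuid> rabbit-03

    # On the new node, join it to a surviving cluster member...
    rabbitmqctl stop_app
    rabbitmqctl join_cluster rabbit@rabbit-01
    rabbitmqctl start_app

    # ...and tell the cluster to forget the node that died.
    rabbitmqctl forget_cluster_node rabbit@rabbit-02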
Or you just need another host to do stuff on. That happens to me all the time: I'm like, man, I really wish I had a machine I could just log into and do some stuff on. I don't even know what, just stuff, right? Make it look like I'm working: spin up a terminal and log into something. Such agility, much amazed, wow. Doge is very, very happy with this.

So, speaking of dogs: dogfooding. That's the other thing that's really, really great here. Troubleshooting your tenant-facing cloud is the same thing as troubleshooting your control-plane cloud, so the operators who can log into the tenant side are already familiar with the service cloud as well, because it's the exact same thing. Have a hard failure? Don't worry about it: shoot it in the head and spin up new ones. Whatever was on that machine, you don't care; get rid of it and image new ones.

And again, this is super key for me: the operators are now users of the system they're supporting. At least in large shops, in my experience, when you get your first-tier operations folks, they're super awesome at troubleshooting and debugging low-level Linux stuff. You've got some weird network problem? They can figure it out. But they don't necessarily know OpenStack, because why would they? They're system administrators. If you make them users of the system they're supporting, so that to get a new server they have to issue a nova boot, they have to configure Nova routers, they have to do whatever, then they're going to be more familiar with it, and they're going to support your customers better. When somebody calls in and says, "I'm trying to attach this volume," they can go, "oh, that flag is wrong; I know that flag is wrong because I made that same mistake." That's a huge one for me.

So at this point you're probably like, hey, let's virtualize all the things! That's how I was. But not necessarily; there are certainly places where this might not work for you. If you run a small-scale shop, this probably isn't worth it. If you've got two racks, I doubt you want to set up a whole cloud to run two racks' worth of cloud. If you just disagree that virtualized infrastructure makes sense, then you probably don't want to do this, and that's totally cool; I don't have all the answers, we're all just beggars looking for bread, right? If you absolutely need a high degree of vertical scale, this probably isn't for you either. If for some strange reason your application profile is "I really just need a thousand Nova API processes to run on one machine," then certainly this isn't for you, because here we're talking about horizontal scale, not vertical scale. There are lots of other things too. If you're setting up private clouds, just individual drop-a-pod-somewhere deployments, I'd consider that kind of small scale, so this may or may not be for you, depending on how big that scale is.

However: if you're looking for a way to scale your control infrastructure quickly, this might be for you. A way to commoditize your hardware: that story I told about Earthlink, this is what we were looking for when we were trying to get that set up, and it would have been amazing. Just rip out the hardware, who cares, throw a new piece in; treat hardware like VMs, in a sense. And if you think dogfooding is good for your platform and good for your product, then this could probably be for you. So: join the herd.

I see a question in the front. How do you mean?
Sure, sure. So the question is: is it a chicken-and-egg problem? To a degree, yes, though the answer is that, again, we're all already deploying OpenStack clouds. In theory, we know how to deploy OpenStack, we know how to deploy it on metal, and we're opinionated about how we do that. If we take the things we're doing already and do them one time, at a really small scale, that's how you do the first one. So absolutely, there always has to be a starting point: the first one is a quote-unquote manual install on metal, and you have to have the infrastructure and whatever in place to do that. The assumption, of course, is that you're already doing that, because you're deploying OpenStack. So do you deploy to metal ad infinitum, or do you deploy to metal once and then start virtualizing things?

Sure, I do. And that's where you'd need to figure out on your own what your failure domains are, and what you would put on VMs and what you wouldn't. In that case, again, your nova-scheduler is going to be HA on multiple physical hypervisors in your service cloud, so that you can lose one, or two, or even a whole availability zone. That's the "HA for free" I was talking about: you've already figured out your strategy for where you're going to put your hardware so that your cloud isn't going to fall over, so you apply those tenets to your service cloud, you set it up that one time, and in theory it takes care of itself. But again, it's not a bulletproof solution; there's always going to be a problem we can find, and we've run into plenty of problems as well. I tried to cover them here. Did I answer your question? Maybe? Yes? No? Not satisfied? Cool; well, thank you, and I'm happy to talk about it more later if you'd like.

Are there any other questions? Anybody want to know anything else? I see a hand up there. Yes, sir. So, do you mean, sorry. The question was about versioning: do you mean the release you're running in the service cloud versus the release you're running on top of it? I don't really have much of an opinion on that, personally, other than to say that the beauty of the service cloud being so simple is that you really could be running, probably, a Diablo cloud. Well, absolutely, and in theory there's no interdependency there, other than the further apart you get. If you're still running Diablo in your service cloud and you're running Juno or even Kilo up here, then the dogfooding aspect is going to be a little bit less, because operating Diablo was very different from operating Juno or Kilo. So there's that aspect of it, and of course upgrades are a whole other problem. Upgrading a service cloud and that sort of thing, frankly, we haven't gotten there yet; we'll cross that bridge when we get to it. But in theory, they're completely separate things. We don't really care what a tenant runs in their stuff; same thing here.
Yes, sir. It is physically separate hardware, but we keep it in the same failure domain. It's its own thing and its own failure domain, but we apply the same logical failure domains to it. The way we've done it is that it's its own separate hardware. We've had some people ask about running it on the same hardware: if you wanted to cut down on cost, you certainly could do that. Instead of having three or four or five racks of compute for the service cloud, you could multi-home it on your tenant compute. We opted not to do that, because we get a pretty good deal on UCS gear, as you can see. Yes, sir?

It's a good question. Jeff, do you know how many racks we're running? It's pretty big. We've got something like five or six (maybe six or seven) different geographic locations, 30 or 40 racks per; I'd have to do the math on how many cores that is. And we're deploying more as we speak; this week, next week, more are getting deployed. And we've found, since implementing this, that our rev has gotten faster, because we've been able to just put a small cloud in there and then spin up a bunch of stuff, and the dev work translates a lot more. The whole thing is software-lifecycle driven: because we test everything in our dev environment, we can rest assured it's identical to what's going to run in production, at least on top of, in the VMs. It's helped us rev a lot faster as we scale out our data centers.

It's a good question. I want to say it was 32? 42? Something like that; I can't remember if they're 1U or 2U machines we're running, but it's not small. I could find the information for you if you'd like. Sorry, he was asking how big our deployments are. I don't remember how many machines per rack we have. OK, so we have 32 UCS C-Series per rack, and then, like I say, however many racks per deploy: 10, 15, 20, 30 racks; it depends, some deploys are bigger than others. And then, as far as the service cloud goes, we've got three compute racks and two control racks, which is what we run the service cloud on. Those three compute racks give us however many cores of CPU we can allocate to API servers and database servers and Rabbit servers and all that. So that's our service cloud deploy, and then the tenant cloud is what's much bigger.

Yes, we do. Well, that's why I say we talked about doing that, and the main reason we didn't was, again, because we get a good deal on UCS gear; we just wanted to keep it completely separate. But if you're on a budget, you can absolutely do that, and I wouldn't see a problem with it. Yeah, more like the dogfooding.
Well, yeah, and that's exactly it. The other downside of that, sort of back to this other question over here, is that if you're mixing compute racks, it makes it more difficult to have a service cloud running a known version and then be able to upgrade on top of it, because if it's all mixed together, then whatever the service cloud is running is intertwined with what the tenant cloud is running. It creates more of an interdependency, which, again, is why we decided to make them completely separate, to help alleviate that.

Any other questions? How much time do we have? Am I out of time? Maybe; I don't know. Anybody? Anybody? Well, I really appreciate your time; thank you, I know it's valuable. Thank you all for coming; hopefully it was useful. I'm kbringard on IRC, and I'm in the OpenStack channels all the time, so feel free to hit me up if you have any questions. Thanks, you guys.