Just to frame the presentation: I'm with AMD SeaMicro, and we're going to talk about OpenStack on fabric architectures. It's a use case study for deploying OpenStack on a fabric compute architecture. I'm going to talk about it from the vendor perspective, and my co-presenter James is going to talk about it from a user perspective. Yeah, I'll click through manually.

So, real quick introductions. I'm Pete Yamasaki, director of product management with AMD SeaMicro. SeaMicro is a startup that was acquired by AMD a year and a half ago, and what we build is fabric compute systems. These are complete systems with servers, storage, and networking. We sell systems, not parts; we use AMD parts, we use Intel parts, and we bring the value through the fabric. James?

As Pete said, my name is James Penick, systems architect. I focus on building large, massively scalable infrastructure. I've worked with a number of large companies in the past, focusing on building large, redundant, reliable technology that's also as efficient as possible.

All right, so let's start out. Before we talk about how we're using it for OpenStack, let me just frame what fabric computing is. So what is fabric computing? Is it servers with laser beams? No, at least not yet, but eventually. Is it a new proprietary architecture? No: it runs the same off-the-shelf OSes and software that you run today, with no special drivers needed. It's a different way of building systems. Do they cost millions of dollars? No, similar economics to the regular rack mount servers you buy today. But what it saves you is a lot in TCO and in the way that you manage and operate these systems.

So let's define a little better what fabric computing is. We took a look at the traditional rack mount server: you've got CPU, memory, disks, and networking, and those are captive resources. When you put that server in a rack, those are the capacities you get; you have fixed storage and fixed networking. What we said is: let's disaggregate that. So we split up CPU and memory, networking, and storage into separate pools of resources, and we tied all those resources together with a high-performance fabric interconnect. All of that is encapsulated in our system, the SeaMicro fabric compute system that unifies those resources. In addition, we've included top-of-rack switch capability within the system, so it's almost like a data center in a box, or a rack of systems in a box.

So we really rethought the server. If you look at a traditional rack mount server, that's what you see: power supply, CPU, memory, a lot of components. We wanted to rethink that, so we broke it into individual units. What I've got here is one of our compute cards: it's got a CPU and it's got memory, and no disk on it. That allows us to make the systems very dense, whether we use AMD Opterons, Intel Xeons, or even low-power processors like Atoms and eventually ARMs.

So how is it that you actually provision a server? What you start out with is a card, CPU and memory; not much you can do with that. You need to connect it up to the network and you need to add some storage. So the first thing you do is specify how much storage and how much networking you want to tie to it. I've got a little snippet of code here: we've got a RESTful API you can call. Provision the network, provision the storage, tell the server to PXE boot, and you're off to the races.
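To make that concrete, the calls look something along these lines. This is a sketch only; the endpoint names, payload fields, and credentials are all illustrative, not the actual SeaMicro API:

```python
import requests

CHASSIS = "https://chassis-mgmt.example.com/v1"  # hypothetical chassis-management endpoint
AUTH = ("admin", "secret")                       # illustrative credentials
NODE = "nodes/42"                                # the compute card being provisioned

# 1. Attach a NIC on the VLAN this server should live on.
requests.post(f"{CHASSIS}/{NODE}/nics", auth=AUTH,
              json={"count": 1, "vlan": 100}, timeout=30).raise_for_status()

# 2. Carve a volume out of the shared storage pool and attach it as the boot disk.
requests.post(f"{CHASSIS}/{NODE}/volumes", auth=AUTH,
              json={"size_gb": 32, "pool": "ssd", "boot": True}, timeout=30).raise_for_status()

# 3. Power the card on and let it PXE boot its image.
requests.post(f"{CHASSIS}/{NODE}/power", auth=AUTH,
              json={"state": "on", "boot": "pxe"}, timeout=30).raise_for_status()
```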
And that becomes interesting when we start talking about OpenStack.

So what does the fabric compute system look like? Our system today is the SM15000. It's a 10RU system with compute cards on the side, up to 64 compute cards, which gives you 64 to 256 servers. We've got network uplink cards that give you up to 160 gigabits per second of uplink bandwidth. Shared storage controllers, up to 8 in the system, allow you to have 64 internal disks, and if you want to expand that storage you can add up to 5.4 petabytes worth of disks managed by those same controllers. All of that is tied together by the supercompute fabric interconnect, which is a 1.28 terabit-per-second fabric. I'm not going to spend a lot of time on this; you're welcome to talk to us after the presentation, or come visit us up in the developers' lounge and we'll tell you more details. What I'd like to do next is jump into the fabric computing use case, especially for cloud.

I'm sure we're all familiar with the analogy of pets and cattle. People previously were used to dedicated machines; those were the pets. We switched over to VMs and cloud-based services, and those are cattle: you don't worry about the machine it runs on. You're given a set of resources, and if one dies you don't care, you spin it up somewhere else. But sometimes you don't always want beef. How do we deal with that, and what do I mean by that?

If we look at today's workloads, workloads are heterogeneous. I want to provide the right server for the workload, yet maintain a minimal set of configurations. So look at the workloads that are out there. Look at Hadoop: what are my requirements for Hadoop? Depending on the analytics, compute is medium, memory is medium, network is medium; storage bandwidth doesn't have to be that high, but storage capacity needs to be high. And that would be very different from your web applications. Different applications have different needs, and if you're a large company that has to support a lot of these applications, or you're a hoster or cloud provider who wants to service all of them, you need to maintain a decent portfolio of infrastructure: a stack of equipment that's your Hadoop servers, a stack of equipment that's your web application servers.

What we strive to do with fabric computing is look at that differently. So let's draw an analogy. I'm going to move away from pets and cattle here and use a transportation analogy. If you look at bare metal hosting, it's like going to a rental car company and renting a sedan: you go there, and a regular sedan is what you get. If you're a hoster running VMs, that's like having a bus. But does one vehicle support all needs? As we saw with the applications, no. Sometimes you want something a little faster; sometimes you want something a little bigger. Even on your VM hosts, you have different types of hosts that you run different applications on. So there are different ways to skin that cat. Now let's tie this analogy to fabric computing, and say I want to provide these different services.
If you're standing up OpenStack, you have to think about controllers; you have Swift nodes, you have Cinder nodes, and then you have your Nova compute nodes. Do all of them have the same requirements? You typically want different amounts of storage and different amounts of networking tied to those servers. If we look at block storage, I might want 20 terabytes worth of SSD and pretty high performance on my data NICs to be able to get out to all the VMs. On the object storage side, I don't need as much speed, but I need a lot of capacity, so I'm going to want 40 terabytes worth of hard drives for each node and a different amount of network capacity. And then when I get to my Nova compute nodes, if they're high-performance, some of them are going to have lots of SSD and I'm going to want to provide a lot of network bandwidth. So really, with this architecture, I start out with a certain amount of CPU and memory, and on demand I provision the rest to the node. If I look at deploying OpenStack, I can now specify different types of nodes to deploy each of those services onto.
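As a sketch, those per-role profiles can be expressed as data and fed to provisioning calls like the ones shown earlier. The numbers mirror the slide; every name and field here is illustrative:

```python
# Hypothetical per-role resource profiles for a fabric chassis. Figures mirror
# the slide; field names match the illustrative REST API sketched earlier.
NODE_PROFILES = {
    "controller":   {"nics": 2, "vlan": 10, "volumes": [{"size_gb": 64,    "pool": "ssd"}]},
    "cinder":       {"nics": 2, "vlan": 20, "volumes": [{"size_gb": 20480, "pool": "ssd"}]},  # ~20 TB SSD
    "swift":        {"nics": 1, "vlan": 30, "volumes": [{"size_gb": 40960, "pool": "hdd"}]},  # ~40 TB HDD
    "nova-compute": {"nics": 2, "vlan": 40, "volumes": [{"size_gb": 512,   "pool": "ssd"}]},
}

def provision(node_id: str, role: str) -> None:
    """Apply a role profile to a bare compute card via the earlier REST calls."""
    profile = NODE_PROFILES[role]
    # POST the NIC, volume, and power requests exactly as in the first snippet,
    # substituting this profile's values.
    ...
```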
So I'm a vendor; obviously I'm going to want to tell you why fabric architectures are great for OpenStack. But this is a developer conference, and I think it's more important, and we actually really believe this, that OpenStack is great for fabric compute architectures. It helps us and our customers more easily deploy things on these types of systems, where you have a lot of nodes, and it makes them much easier to manage.

So look at bare metal provisioning and go back to the car analogy: the rental car company has to maintain all these different cars. What if I said, you know what, I don't want to keep an inventory of these different types of servers; I want to create them on demand? Well, you can do that, but it's going to be quite hard. With traditional hosting you manage it manually: you keep an inventory, and when you're out of a type of server, you order some more. With these architectures we get closer to being able to say: I can have fewer flavors, fewer types of hardware servers, and I can construct and provision those to meet my application workloads on demand. But that's hard to do by hand; I've got to automate it, and OpenStack is a great way to automate it. And when we're talking about bare metal provisioning, maybe Ironic is the way, so let's talk about that a little bit.

Let's delve a little deeper into bare metal provisioning and these pools of compute. How does OpenStack define itself? OpenStack is the cloud operating system that controls large pools of compute, storage, and networking resources. Pretty good fit there. So we looked at different ways of provisioning bare metal. The first project that came into being was Nova bare metal; this was the genesis of bare metal provisioning, made available in the Grizzly release. It has limited capability, but it provides the ability to enroll your machines into Nova and to deploy images on them through PXE and IPMI. It's usable for some amount of testing, and some companies have hardened it enough to use it. It overloads the libvirt driver model to manage these bare metal machines.

I'm sure some of you have heard that there's a new project: the Ironic bare metal project. This is being split out of Nova. It starts from Nova bare metal and is striving to provide a much more robust and complete way to manage bare metal machines. The first release is in the Havana release. What it changes is that bare metal servers are now first-class citizens. It's designed to support capabilities that are unique to hardware, not just VMs. The way they did it before, they basically reused the libvirt model and wrote a bare metal libvirt driver, but those machines really weren't VMs; they were bare metal machines. This project makes them first-class citizens and is designed to really treat them like bare metal machines.

So let's talk a little bit more about bare metal provisioning. With traditional rack mount servers, your storage and networking were fixed; with fabric servers, you can now specify them on demand. How does that fit the model? The way people think about bare metal provisioning today is: it's a rack mount server, and when I provision it, I get the storage that's there and the networking that's there. So how do we bring this flexibility into bare metal provisioning, and how do we support it with OpenStack? This is what we're exploring, and these are some of the things we want to contribute back and work with the community on. So rather than just talk about it, let's run a quick demo of what we're thinking about.

We were going to do a live demo, but we realized the networking was not reliable in here, so we did a quick recording beforehand. Let's get it started. So what did we do here? The first thing we did is write a script that attaches to the chassis management and automatically scans for all the servers that are eligible as Ironic nodes; we just kicked off that script. What I'm also showing here is that we made some changes to Horizon: we added a bare metal panel. This is really just for demonstration purposes, but we created a separate tab that shows the bare metal machines. The scan is in progress, and we should hopefully see eligible servers pop up in this window. And there's one now; that's one of our eligible servers.

The next thing we want to do is provision that server. We added some additional capability to get the flexibility of the fabric: we want to be able to supply the storage size (I think we did 32 gigs there), and we also looked at the NIC. In this case we did a single NIC and specified which VLAN we wanted it on. So we wanted to add this flexibility to the way the machines actually get provisioned today. We've just kicked it off, so we told it to boot, and we're going to refresh here. What we're looking for is whether the disk actually gets provisioned. It says powering on; powered on; it's in the process of powering on. But, as I said, this is for demonstration purposes. And it just ended, I apologize.
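The scan-and-enroll script in the demo did something along these lines. This is a sketch only: the chassis endpoints are the hypothetical ones from earlier, the vendor driver name is made up, and Keystone auth headers are elided; the `POST /v1/nodes` call is Ironic's standard node-enrollment endpoint.

```python
import requests

CHASSIS = "https://chassis-mgmt.example.com/v1"  # hypothetical, as before
IRONIC = "http://ironic.example.com:6385/v1"     # Ironic REST endpoint (illustrative host)
AUTH = ("admin", "secret")

# Ask the chassis manager for every compute card that is free to become a node.
cards = requests.get(f"{CHASSIS}/nodes?state=available",
                     auth=AUTH, timeout=30).json()["nodes"]

for card in cards:
    # Enroll each card as an Ironic node. A vendor driver (like the one we wrote
    # against our RESTful API) would go in "driver"; the name is hypothetical.
    requests.post(f"{IRONIC}/nodes", timeout=30, json={
        "driver": "fabric_rest",
        "driver_info": {"chassis_node_id": card["id"]},
        "properties": {"cpus": card["cpus"], "memory_mb": card["memory_mb"]},
    }).raise_for_status()
```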
So what we were able to do there is extend the capabilities of Horizon in the Havana release to show the possibilities of fabric computing. We can take a node, specify how much storage we want to tie to it, and specify the configuration of the networking. That way you don't have to plan it in advance; it doesn't have to be a pool that's set up a certain way. When you have a use case, you go provision those servers as needed. So really, what we did is we attached the storage and we attached the networking.

What's next? Well, we're engaging to work more on the Ironic project, and we really want to work with other hardware vendors. I think there are other companies with similar architectures where you can dynamically provision the networking and dynamically provision other attributes. It can be as simple as: when you provision a bare metal machine, how do you specify the RAID configuration? We want to work with other vendors to figure out how to do this right in the Ironic project. There are a bunch of meetings today in the developer sessions on the Ironic bare metal project; the PTL is Devananda van der Veen.

If we look at our storage provisioning, we demonstrated how to do it directly through vendor pass-through commands that we added to the Ironic power driver. But maybe that really should be part of Cinder, so that we use Cinder to provision those volumes; one of the things we'd need for that is boot-from-volume support for Cinder volumes. On the network provisioning side, Neutron is probably the right way to do it, so we need to work with the Neutron group to figure out how to specify the network configuration for bare metal machines at provision time.
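For the curious, the pass-through mechanism is roughly this shape. This is a loose sketch of the idea, not the actual Ironic vendor-interface contract of the Havana era, and the storage and network calls go to our hypothetical chassis API again:

```python
# Loose sketch of a vendor pass-through hook on an Ironic driver: tunnel
# vendor-specific actions (like carving a volume on the fabric) through the
# driver until Cinder and Neutron can own them natively.
import requests

CHASSIS = "https://chassis-mgmt.example.com/v1"  # hypothetical, as before


class FabricVendorPassthru:
    def vendor_passthru(self, task, node, method, **kwargs):
        card = node.driver_info["chassis_node_id"]
        if method == "attach_volume":
            # e.g. kwargs = {"size_gb": 32, "pool": "ssd", "boot": True}
            requests.post(f"{CHASSIS}/nodes/{card}/volumes",
                          json=kwargs, timeout=30).raise_for_status()
        elif method == "attach_nic":
            # e.g. kwargs = {"count": 1, "vlan": 100}
            requests.post(f"{CHASSIS}/nodes/{card}/nics",
                          json=kwargs, timeout=30).raise_for_status()
        else:
            raise ValueError(f"unsupported vendor method: {method}")
```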
So I'm going to switch gears here. The next thing we're going to talk about is how a fabric architecture can help you with deploying OpenStack. This is really the model of OpenStack: a lot of pieces, a lot of components. How many people here have actually installed DevStack? Show of hands. All right, quite easy. Of those people, how many have installed a real production release on multiple servers? A bit harder, right? And everybody is working to make it easier. If you're going at it for the first time, is it easy to set up OpenStack? Not at production scale. What are the things you need to consider? How many control nodes, Swift nodes, and Cinder instances do I need? What type of disks and NICs? You have to design the network; rack everything, plug it in, power it up; provision the systems; update firmware and drivers; and learn about OpenStack itself. There are multiple installers out there. We've done a lot of work with Mirantis Fuel, and there's Puppet, Chef, Crowbar; many tools to help you with that installation.

So what have we done today as a team? If we look at provisioning a system like this, a system that has lots of compute, lots of storage, and lots of networking, maybe it makes it a bit easier to install OpenStack. We rack and plug the chassis, which gives us 64 to 256 compute nodes. Then we added a capability called zero-touch provisioning: it brings in a config file that configures all the servers, all the NICs on the servers, and all the storage on the servers, and lays that out across the whole system (a sketch of what such a config might look like follows in a moment). It takes about 40 minutes to run, all automated behind the scenes. And what we've selected today is Mirantis Fuel to install the other components; we wanted to make this easier and repeatable. Using Mirantis Fuel is a bit manual because we're still using the GUI interface, so we've provided a set of instructions that our team and some of our customers use to configure it on a system. That's where we are today: we can go soup to nuts in about four hours, but we want to make that faster, and then we can scale it out to multiple systems.

So where do we want to be? If we look at installing the system, we want to be able to rack it and plug it in. Today it's a couple of hours of provisioning time, but we really want to get to the point where, as soon as you plug it in, it DHCPs, pulls in a configuration, and everything else is provisioned automatically. What are the ways to get there? You can always automate those installation scripts; we're looking at ways to do that, and I think that's going to be the best path over the next six months. But we're pretty excited about what TripleO, OpenStack on OpenStack, is talking about. What we like about it is that it gets the industry behind one way of installing OpenStack, rather than different companies coming up with workflows and recipes using different installation tools and scripts. Maybe we can use OpenStack itself to install OpenStack, and we'll get behind that, but I think it still has some ways to go. If we were to build that out, we'd rack and plug the chassis, TripleO would provision the undercloud, and once the undercloud is provisioned, we can provision the OpenStack clusters on top of it. It looks easy on paper, but there's still a lot of work in the community to get there.
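Back to zero-touch provisioning for a moment: here's the kind of config the chassis could pull in over DHCP. The format and every field are invented for illustration, expressed as Python to keep one notation throughout:

```python
# Illustrative zero-touch provisioning config: one declarative description of
# every card's NICs and storage, applied across the chassis in a single pass.
ZTP_CONFIG = {
    "chassis": {"uplink_ports": 4, "uplink_speed_gbps": 40},
    "servers": [
        {
            "slot": slot,
            "role": "controller" if slot < 3 else "nova-compute",
            "nics": [{"vlan": 100}],
            "volumes": [{"size_gb": 32, "pool": "ssd", "boot": True}],
        }
        for slot in range(64)
    ],
}

# Applying it is just a loop over the same per-node REST calls sketched earlier.
```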
So one last thing I'll talk about is power and space, and that's a big thing an architecture like this saves you, because I don't have fixed resources; I only have to provision what I need for my hypervisors or my applications. So how much power and space can it save you? As I said, the disaggregation of resources helps you save, the integration of the switching saves you space, and with this architecture we're able to deploy very densely, microservers as well; we can explore new architectures like ARM processors. But I think a lot of the power savings is also going to come from OpenStack itself automating the powering on and off of resources. Bare metal makes this possible, but we'd like to see the tools that automate it within OpenStack: when you don't need nodes, they power down; when you need the capacity, they come back. Even your undercloud could dynamically scale, adding more resources on demand and eventually scaling back. Maybe we need tools like DRS, which is commonly used with VMware, to let us re-aggregate VMs during periods of low demand and then use bare metal provisioning to shut down those servers until the capacity is needed. I think we have a ways to go, but OpenStack will really help us realize those capabilities in a fabric architecture.

Sure, we'd like to give some definitive numbers. We've worked with a number of customers, and it really depends on what architecture you're comparing against. When we go into a customer, oftentimes we're replacing legacy infrastructure, so it's easy for us to make big claims; we've saved 2x, 3x the power. But we really want to compare ourselves to similar architectures today. By applying these fabric capabilities, consolidating your resources, assigning just the right amount of storage, and getting the density and better aggregation in the system, we're seeing somewhere between 20 and 50% savings. A lot more will come into play once you get more of that automation into OpenStack for managing those resources and spinning them down when they're not needed.

If we look at the density, let's imagine a few things. What can we do for infrastructure as a service? If we took four of these chassis in a rack, what does it look like? That gives you 256 Opteron or Xeon servers, or a thousand Atom servers, with 16 terabytes worth of memory, in about 40 RU total. How much power? You'd burn about 13 kilowatts on average; pretty good density. Now for object storage: with one chassis and two racks of disks we could build 5.4 petabytes of raw data. It would burn about 20 kilowatts on average, though it really depends on whether it's cold storage or very active storage, obviously. So these are some of the things we can achieve with this architecture. We think it brings a lot of value to companies building clouds: private clouds, for the ease of deployment, and large scale-out clouds, for management, quick deployment, and savings on power and space. Because when you're at that scale, those items really matter.

I'd like to now change the perspective and bring up a user who's been looking at some of this technology to give his views on how he thinks this helps, what he'd like to see out of the technology, and what he'd like to see out of the community.

Hey folks. So why do I find this compelling, working in an environment with hundreds of thousands of servers? What value does this give me? Obviously I have great economies of scale; obviously we run massive data centers of our own design. Why would I find these valuable? One: power savings. With hundreds of thousands of servers, if they're not in use and you're doing dynamic provisioning, you can shut these things off. And when you have a bare metal resource, these compute cards are only burning about 50 watts, which means even a well-utilized compute card isn't using much power compared to a traditional rack mount host. And the effort of installing hosts in a data center is pretty significant.
Think about pallet after pallet of pizza boxes coming in, and some guy has to take them, go to the rack, screw in the rails, slide in each pizza box one after another after another, and then wire them all up. That's significantly reduced with a SeaMicro chassis: install one chassis, wire it up, move on. You've just racked 64 servers.

Maintenance: many of us come from an operational background. I've had to race into a data center at 2 a.m. to power a machine on or swap a hard drive out. You don't have to do that with SeaMicro. You don't have to call your smart hands and spend a bunch of money, or have the on-call drive out in a crisis to fix it for you. If a compute card goes bad on you, you go into the SeaMicro chassis, reassign your volume to another compute card, shut down the first one, boot the second one, and your host is back online on a different compute card. You're up and running.

Storage utilization for bare metal hosts is actually tough. With virtualization we're finally able to realize greater efficiency of disk use, but we still don't get that with bare metal: the smallest hard drive we can buy is 500 gigabytes, and there have been times when we've had to ship one-terabyte drives because those were all we could get. With a SeaMicro chassis, if you have a low disk utilization requirement, you only carve out as much as you need, and at the scale of a large chassis the economics actually work out.

Redundant power supplies become a feasible and reasonable thing when you have something like a SeaMicro; they have, I believe, is it six? Six minimum power supplies for 64 servers. I don't know if any of you have ever tried to manage dual power supplies in a massive environment, but it doesn't work; you just give up. You can actually take advantage of that with SeaMicro.

And flexibility is important to us. Working in a large organization, we have unexpected events: denial-of-service attacks, and we're attacked pretty frequently. Normally we can soak that up, but actually, I'd say the largest denial-of-service event we ever see is a media event. Michael Jackson's death is a good example; a celebrity passing like that can drive traffic to your site 10- to 20-fold. Having these resources available in a rack, shut off, sitting and waiting, you flip some switches and bring them all up. So there's a lot we can realize from the SeaMicro boxes, and the API on them makes it easy to integrate them with our environment. We're leveraging OpenStack heavily, as any large organization should be, and by combining it with OpenStack we can have one environment to manage virtual machines and bare metal resources.

But there are some things we still need. It gives us some good stuff, but now, being the customer, being kind of the whiny toddler, I want more. Here are things we still want to see from OpenStack for fabric computing, and indeed for VMs as well. This first one is specific to bare metal: when I boot a bare metal resource using Ironic, I want to be able to use the same disk image for bare metal that I'd use for a virtual machine. Let's say I have a VM and it's serving traffic, and after profiling my VMs I've determined that these would really be better off as bare metal on the fabric computing architecture. Snapshot that virtual machine, turn up a hundred bare metal nodes, and then shut down my hundred VMs. Bare metal comes up, the snapshot works, the host resumes what it was doing, and you start serving traffic immediately.
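A loose sketch of that flow through the standard clients, assuming, as the wishlist item asks, that the same Glance image boots on both VMs and Ironic-managed nodes. The flavor and image names, credentials, and endpoint are made up:

```python
# Sketch of the VM-to-bare-metal flow, with Nova fronting both the VMs and the
# bare metal machines. All names and credentials here are illustrative.
from novaclient import client

nova = client.Client("2", "user", "password", "project",
                     "http://keystone.example.com:5000/v2.0")

vm = nova.servers.find(name="web-01")
image_id = vm.create_image("web-01-gold")      # snapshot the running VM to Glance

# Turn up N bare metal nodes from that same image via a bare-metal flavor.
bm_flavor = nova.flavors.find(name="baremetal.general")
for i in range(100):
    nova.servers.create(name=f"web-bm-{i:03d}", image=image_id, flavor=bm_flavor)

vm.delete()                                    # then retire the VM
```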
Leveraging Heat would be a really big thing for this, where we'd have a templating and orchestration engine. Now we can profile and compose all these virtual machines, or these resources, into a single service and say: great, build me ten of them. Turn up all the database nodes, all the API servers, everything I need to run my service.

The infrastructure hosting these boxes needs looking at too. In my data center, I need to know what my footprint looks like, how much power I'm using, and how efficiently I'm using the hosts that are available. We also have to be able to tease apart the services we have. With this fabric computing architecture the compute cards can only get so big, so we have to examine: do we scale out, or do we go deep? We find that we're actually going to need more CPUs and more memory per compute card. With those, we'd have the ability to run a hypervisor on the compute card. You could do that anyway, but we think we'd get more efficient utilization if we had even more memory; RAM tends to be the long pole in virtualization overcommit.

Ceilometer: we have Ceilometer for virtual machines, but how do we track that same resource utilization for bare metal? With virtual machines we can do it all behind the scenes; Ceilometer reports up to a centralized place. How do we do that with bare metal? We need to figure that out for fabric computing as well.

Once we have the ability to run hypervisors in this environment, we can do TripleO, where we have an OpenStack-on-OpenStack architecture. The way TripleO is architected right now is an undercloud and an overcloud. What I would prefer to see is a simple base installation of your API nodes and then a large stack of compute cards, pizza boxes, whatever, ideally compute cards, where you dynamically provision hypervisors to meet whatever your VM need is and keep the rest available as compute nodes, without a complicated undercloud and overcloud: just one big, wide infrastructure.

Once we can do that, once we have those snapshots, we can meter it, and we have TripleO, now we can get into using autoscale, and this is something that will make OpenStack far more powerful than it is right now. There's work going into autoscale; there's a talk later today that I'm excited to attend. With autoscale, not only can we autoscale the services, our websites and our applications, but we can flip autoscale back and use it on OpenStack itself. You have a number of hypervisors and you're monitoring resource utilization; as your overcommit ratio hits a certain point and your buffer capacity runs low, autoscale kicks off, dynamically turns up another hypervisor, adds it to OpenStack, and starts provisioning VMs on it.
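As a sketch of "autoscale pointed back at OpenStack itself": watch the hypervisor pool, and when buffer capacity runs low, boot another compute card as a hypervisor. The thresholds, names, and image are all illustrative assumptions; `hypervisors.statistics()` is a real novaclient call.

```python
# Watch aggregate hypervisor RAM; when the free buffer drops below a threshold,
# diskless-boot a spare compute card with a hypervisor image via a bare-metal
# flavor. Once it registers with Nova, the scheduler can place VMs on it.
import time
from novaclient import client

nova = client.Client("2", "user", "password", "project",
                     "http://keystone.example.com:5000/v2.0")
BUFFER_RATIO = 0.15  # keep at least 15% of pool RAM free (illustrative)

def free_ram_ratio() -> float:
    stats = nova.hypervisors.statistics()
    return 1.0 - stats.memory_mb_used / stats.memory_mb

while True:
    if free_ram_ratio() < BUFFER_RATIO:
        nova.servers.create(
            name="hv-%d" % int(time.time()),
            image=nova.images.find(name="hypervisor-gold"),       # hypothetical image
            flavor=nova.flavors.find(name="baremetal.hypervisor"))  # hypothetical flavor
    time.sleep(300)
```

The reverse direction is the same idea: when the buffer exceeds a high-water mark, migrate VMs off a lightly loaded hypervisor and power its compute card down.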
And when we think in terms of compute cards, storage, and snapshots: why should my hypervisor be a pet? We use this pets-versus-cattle, or puppies-versus-cows, metaphor; why is my hypervisor so special? If I need a hypervisor, it should just be a diskless boot. If we're using some kind of volume backing, we use that to serve virtual machines, and we can easily migrate them around. When we no longer need it, when our buffer capacity exceeds a certain point, autoscale can trim that back and bring us down: shut down the compute cards we're not using, delete the hypervisors we don't need, and increase our overall utilization efficiency and PUE.

Thank you very much, everyone. Do you guys have any questions? It's time to take questions.

[Audience question] Okay, and we're really just scratching the surface there. Any other questions? Yeah. Oh, sorry, I'll read the question: the gentleman in front was asking if we were familiar with the Tuskar project, and saying that its goal is effectively to address pretty much everything on this slide. So that's pretty exciting; it's officially merged into TripleO. Right.

So yeah, we get a lot of people saying: hey, how big can I make the fabric, how widely can I expand it? Today it's within that 10RU box. When we expand out of it, we have 160 gigabits per second of uplink bandwidth, so people do an overlay network; they'll do a fat tree to connect multiple switches, multiple systems together. That's the current implementation today, but we have people who've scaled out to about 20 chassis in production per cluster.

[Question about how the driver talks to the chassis] We have centralized management within the system. If you look at the Ironic model, you can write a vendor-specific power driver; IPMI is the default. We do support IPMI, but we also have a RESTful API, so we've written our own version that calls our RESTful API to power on, power off, and PXE boot a server, but also to do storage provisioning and network provisioning as well. We actually haven't submitted it yet, but we're going to do that soon.

[Audience question] I'm sorry; the question is why we support so much storage. For Nova, it depends on the type of service. Some people want a higher-performance service on some VM nodes, so they might back that with SSDs, but you could also back it with HDDs.

[Audience question] Well, if you're running on Nova, there are two models: you can use ephemeral disks, which are local, or you can attach volumes and have those volumes back the storage for your VMs; people run both models. It's the backing storage, so your virtual machine disks will be backed by those SSDs. Some people keep multiple pools in the same service, where some are backed by SSDs, a higher-class, higher-performing tier, and some are backed by HDDs, and you put them into different flavors.

Any other questions for myself or James? All right. Well, thank you very much; appreciate it.