I'm here to talk about hyper-cool infrastructure, and being hyper-cool, there's a correlation between these glasses and my presentation. I got these about four years ago in Austin, at my first OpenStack Summit. We had a grand unification party for our software-defined storage systems; we had just welcomed Ceph into the Red Hat family, and they gave these out. I bet they're a collector's item now, so I'm going to hang on to them. But why did I call my presentation "hyper-cool infra"? I didn't like the word "converged," but I needed to preserve the starting C. I could have used another C-word like "complex," but then the talk probably wouldn't have been accepted and I wouldn't be in front of you today.

Agenda: we're going to talk about what hyperconverged is, what the drivers are, and what the use cases are. Then we'll quickly cover the three Red Hat solutions we have out there, talk about architectural considerations, get into implementation, touch on performance and scale considerations, and then talk about futures and where we think this might lead. I'll leave some time for Q&A at the end.

So what does Wikipedia say about hyperconverged infrastructure? A few words jump out at me, at least: off-the-shelf x86 servers with direct-attached storage and some intelligent software driving the whole thing. Hyperconverged.org talks about simplification and savings, and about delivering on the promise of the software-defined data center at the technological level. And finally, Scale Computing, one of the players in this space, says hypervisor plus convergence is hyperconvergence. The convergence piece reminds me of about eight or nine years ago, when the key words were Vblock or FlexPod: multi-million-dollar solutions that were very proprietary and very expensive. Now the hypervisor on x86 has become so prevalent, and that's what makes this approach really attractive both to customers and to the vendors creating these solutions.

What are the drivers? This is pretty self-explanatory: lower cost of entry and a smaller hardware footprint. We want it as small as possible so it's easier to standardize on packaged hardware. That reminds me of one of my first customers, who wanted to implement exactly that: they wanted to put Ceph and Nova compute on the same hardware, again about four years ago. And we were saying, whoa, whoa, whoa, stop, think about what you're doing. You're getting your KVM processes competing with the Ceph OSD processes, and they can easily step on each other. Maybe during normal operation you'll be fine, but as soon as Ceph needs to do some recovery, you might not be so fine. Yet four years later we're building solutions for exactly the problem that was described back then, and for exactly that reason: people want to get maximum utilization out of the hardware they purchased.

As far as use cases, notice the first two are NFV, which is really hot with our telcos: distributed NFV and vCPE solutions. Then we have remote office/branch office, mostly for the Gluster and RHV solution.
And then the one I really like: if you're interested in hyperconverged, you start with a lab or sandbox. You put it in with a minimum hardware footprint and go from there. If it works, awesome; test, test, test, and push it out to production.

So what are our Red Hat solutions in the hyperconverged space? Traditional virtualization: RHV plus Gluster. Private cloud, which I just talked about: OSP plus Red Hat Ceph Storage. And we also have a hyperconverged solution for containerized cloud apps: OpenShift Container Platform, again along with Gluster.

OK, what is the RHV plus Gluster solution as we have named it, code name Grafton, and what does it look like? Very, very simple: three physical nodes with direct-attached Gluster storage, so Gluster bricks, plus a hosted engine that is highly available and can run on any one of those three nodes. Some requirements and limitations: I would strongly encourage you, if you're interested in this, to go to our site; I attached the link and the slides will be available. Currently the beta page shows up, but we are very close to making this a fully supported, GA option. The key note here: exactly three physical nodes. There's no scalability at this point yet, but we are definitely working on scaling out the Grafton nodes in the next version.

OK, this is the other piece, the private cloud Red Hat solution. I like to think of it as three layers. You have your undercloud, which in Red Hat OSP is called Director. Then you have the control layer, where all the OpenStack services run along with supporting services such as Galera cluster, Pacemaker, and HAProxy; they all run on that very important layer. And then on the bottom there's the Nova compute plus Ceph OSD layer. Notice that the middle layer has no disk attached, or rather only a RAID 1 pair for local boot, but no Ceph disks.

Requirements and limitations for that: here's one thing I will strongly, strongly recommend. If you are at all interested in this particular solution, or even only in the private cloud HCI solution, make sure you go and download this reference architecture. It was written by John Fulton, a software engineer who works on these reference architectures, and he has maintained this one since the Liberty cycle. For Liberty we had a solution that was OSP 8 plus Red Hat Ceph Storage 1.3, which is a Hammer release. So download it and follow the steps; he goes through exactly how you would set it up, and it's very, very useful. I used it for my solution as well.

Up to 42 nodes in one data center, but I would say implement three to start. Three, double to six, go to twelve maybe, but don't start with 42. Our thinking at Red Hat is that if you really need a big cluster, you're probably much better off separating compute and storage the way we do today in the non-HCI design. So be careful about getting too many of those nodes.

And here's what it looks like when we map it out: physical nodes for Grafton, physical nodes for the control plane and compute. This is the fully supported layout. We install the undercloud, and we install a couple of management systems, or VMs, on our Grafton nodes. I wanted to show you this as a self-contained picture of what the RHV side constitutes.
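If you want to sanity-check a Grafton-style three-node setup like the one above, a few read-only commands from any of the nodes will do it. This is just a minimal sketch; the volume names (commonly engine, data, and vmstore in this kind of setup) and host names will differ in your environment.

    # Run on any of the three RHV/Gluster nodes (read-only checks).
    gluster peer status          # expect the other two nodes in "Peer in Cluster" state
    gluster volume info          # expect replica-3 volumes (e.g. engine, data, vmstore)
    hosted-engine --vm-status    # shows which node is currently running the HA hosted engine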
Now let's do the same and look logically at what our OSP HCI deployment looks like. One thing I noticed when I first started the project: I was like, OK, Randy, did you really go through all this effort to set up a completely separate software-defined storage system and cluster just to host the undercloud on it? Maybe it makes sense to some, and it does give you a footprint to put a lot of other things on, maybe Satellite or FreeIPA or what have you. But as far as the goal of setting up a really good cluster for Ceph and OSP, maybe it's not as efficient.

So let's take a look at what hyper-cool infra would look like if I combine them a little better. Check this out: six physical nodes. We have our Grafton nodes and our compute/OSD nodes, and every one of them has software-defined storage: Gluster in the case of the top row, Ceph in the case of the bottom. Now we are virtualizing the overcloud control nodes slash Ceph controllers, Ceph monitors I mean, while still adding our management pieces. Look at the overlay there: the undercloud, all the overcloud control nodes, and the Ceph monitors now live on top of our Grafton, or RHV, nodes, and it's a lot more efficient.

Implementation details on the RHV side, really quickly: you install the nodes and configure SSH keys so you can deploy Gluster on all three of them at the same time. Then you deploy Gluster via the Cockpit plugin and the gdeploy tool, the Gluster deployment tool that uses Ansible under the hood. After that, you deploy the hosted engine and continue that deployment process. Then we create some networks: you'll need not only the Gluster storage network, which you probably want on a 10 Gb network, but also a provisioning network where you're going to build your OSP overcloud nodes, plus the rest of the OSP isolated networks. Once you've done that, you can add the additional hypervisors, the rest of the hypervisors, upload the RHEL 7.3 guest image, create a template from that image, and off you go.

So now you can create virtual machines on RHV. As you notice, we have two NICs there: the oVirt management network on NIC one, and provisioning running on NIC two. Then we install and configure Director via the Ansible undercloud playbook. I wrote that almost a year ago, I've used it at customers, and it keeps getting better and better as it covers more areas. This slide shows just the undercloud role and what it consists of; it's not the entire playbook.

Then we prepare and upload the overcloud images. If any of you have deployed TripleO, you'll know that one of the things you really need to be able to troubleshoot is when you're deploying a node, something is wrong in your NIC configuration, and you can't even access that node. Getting to the console is pretty critical, and if you don't have a root password set, you won't be able to get in.

Customization of the TripleO heat templates comes next. Again, I strongly encourage you: this is in the reference architecture. Go to GitHub, do a git clone of the HCI work that John Fulton is maintaining, and then you just adjust a few things.
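To make "adjust a few things" a bit more concrete, here is roughly the shape of the customization directory you end up with. Treat this as an illustrative sketch only: the repository URL is a placeholder and the exact file names come from the reference architecture for your release, not from me.

    # <repo-url> is a placeholder -- clone the HCI template work referenced
    # in the reference architecture (maintained by John Fulton).
    git clone <repo-url> ~/custom-templates
    ls ~/custom-templates
    # Typical pieces you end up editing for your environment:
    #   nic-configs/controller.yaml   NIC layout for the controller / Ceph MON nodes
    #   nic-configs/compute.yaml      NIC layout for the compute / OSD nodes
    #   network-environment.yaml      isolated networks, VLANs, MTU
    # For SSL on the public endpoints I also pulled in the stock TripleO
    # environments, for example:
    #   /usr/share/openstack-tripleo-heat-templates/environments/enable-tls.yaml
    #   /usr/share/openstack-tripleo-heat-templates/environments/inject-trust-anchor.yaml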
You're definitely going to adjust the NIC configs at the bottom of that first column, for compute and for controller, but I also adjusted them to include SSL: I wanted to make sure both my undercloud and overcloud use SSL for the public endpoints. Then there is a bunch of scripts, coming up on the next slides, that do the isolation and tuning. If you take nothing else from those scripts, absolutely make sure you read chapter seven, because there he explains exactly how to apply the NUMA changes and CPU pinning so that the Ceph OSD and Nova compute processes on the same nodes behave nicely no matter what the situation is.

And here is where I ran into my first big issue, and it had to do with an oVirt Ironic driver. For some reason I had assumed we already had an oVirt Ironic driver available, but that is not the case. I found an RFE requesting exactly that use case: if you are going to put controllers on an RHV or oVirt cluster, you should have an Ironic driver that fulfills all the functions an Ironic driver is supposed to fulfill. It's not just power on, power off, and status; it also has to set the boot device, and there are a couple of others. So I ended up cheating a little bit: instead of running those VMs within RHV, I put them on a KVM host and used Virtual BMC as a proxy so I could provision them and power them on and off (there's a short sketch of this at the end of this part).

Here's a better view of that. On the left side I have an OSP controller slash Ceph MON, which lives on KVM. On the right side I have bare metal, one of my Dell nodes, with its configuration. The big difference you'll notice is the power management port: when you use Virtual BMC, you define some ports on your KVM host and run a Virtual BMC process per VM on that host, and then when you do introspection or deployment, Ironic targets that KVM host on that particular port. I'm also using capabilities for predictable node placement rather than profile matching.

This is the deploy script, slightly modified again for SSL and for RHEL registration. And then we deploy. Those of you who have done a few TripleO deployments with our OpenStack Platform, using the undercloud and our approach, will appreciate that when you see the stack create or stack update complete successfully, that's always good news. Now, on the public URL, notice how I'm using SSL and a fully qualified domain name; that's one of the changes I implemented. And here are a couple of screenshots of my fully functional control nodes and compute nodes, and the Ceph health status showing a healthy cluster with 21 OSDs.

Finally, we did some additional tasks: we installed CloudForms, our management platform, configured the infrastructure provider for RHV and the cloud provider for OSP, and then provisioned some RHV and OSP instances to validate the functionality. I'm missing a slide for Ansible Tower, but that's another really useful and very popular tool that customers love to put onto their infrastructure, and RHV would be the right place for it in this setup.

This is my POC hardware: nine-year-old servers, six of them. Each has seven 146 GB drives, which is where the 21 OSDs come from. And as you can see, there's an old, old switch at the bottom.
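Here's the minimal sketch of the Virtual BMC workaround described above. The VM name, port, and credentials are made up for illustration; the commands themselves are standard virtualbmc, instackenv.json, and ipmitool usage.

    # On the KVM host: expose each controller/Ceph MON VM as an IPMI endpoint.
    vbmc add osp-controller-0 --port 6230 --username admin --password secret
    vbmc start osp-controller-0
    vbmc list                      # one vbmc process and port per VM

    # From the undercloud, Ironic then treats the VM like any IPMI-managed node.
    # Corresponding instackenv.json entry (illustrative values):
    #   {
    #     "pm_type": "pxe_ipmitool",
    #     "pm_addr": "<kvm-host-ip>",
    #     "pm_port": "6230",
    #     "pm_user": "admin",
    #     "pm_password": "secret",
    #     "mac": ["52:54:00:xx:xx:xx"],
    #     "capabilities": "node:controller-0,boot_option:local"
    #   }

    # Quick check from the undercloud that the virtual BMC answers:
    ipmitool -I lanplus -H <kvm-host-ip> -p 6230 -U admin -P secret power status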
But that old hardware worked; everything worked. For production, in its current state, if you wanted to follow our standards and supported configurations, then as soon as this becomes not just fully supported but GA, you would probably want to start with something like this instead: nine nodes rather than six, which accounts for the controllers and Ceph monitors living on physical hardware, and that's also what our reference architecture uses. As for proposed hardware for hyper-cool infra, if we get that oVirt Ironic driver, and I've actually talked to a couple of my colleagues during this conference and we are absolutely determined to get that oVirt Ironic driver in place, we just need to find the right developer with the right skill set for both oVirt and Ironic. We'll get that done, and at that point you would be able to use hardware like this.

Performance characteristics: you can find these in the reference architecture and in our RHV documentation. The key is to use 10 Gb interfaces and jumbo frames for all your storage traffic, and probably tenant traffic as well. And just make sure you understand that the RHV side is exactly a three-node cluster, while on the Ceph OSD side, like I said, we can scale up quite high.

As far as futures go, short term we want to reduce that footprint if we can. It would be very cool if we were able to get there, and again, the only requirement I can see is getting that PXE-capable oVirt Ironic driver. We also want to further automate this HCI build-out using Ansible. I've done some work on that with customers already, and it shouldn't be too difficult; it goes beyond just the Ansible undercloud playbook and tries to automate pretty much the entire spectrum of the solution. And then, we at Red Hat love OVN, which is a very recent development, and we're going to make sure it goes through our entire product line, including our OpenShift offerings, as one of the options. It might not be as short-term as some would like, but it's definitely something you should watch. One of our top people, Russell Bryant, is a core developer and contributor to that project, and we are definitely after getting OVN into our products. It would be really cool to have an SDN layer we can plug and play with the Neutron pieces.

Longer term, we can containerize the OSP services. I don't know about you, but the big theme of this conference is Kubernetes; after a while you start getting tired of seeing it in every session, but it's definitely here, it's a great solution, and we are working on it as we speak. Our OpenStack engineers are very committed to making it work within our OpenStack Platform. So it's coming. I can't give you any timelines, but it will get there; we're taking baby steps, slowly but surely.

So with that, I'm at the Q&A part. If anybody has any questions, please go ahead; if there are no questions, you have about 10 minutes to spare, or come talk to me after this.