All right. Good afternoon, everybody. Let's get this started. My name is Arturo Suarez. I am the Cloud Product Manager at Canonical. With me, we have Victor Estivo, he's a cloud architect. And Ikuo Kumagai, a senior engineer at Bit-isle. If you want to get this presentation, have a look at it. I'm going to give you 10 seconds just to memorize the link here. It's actually going to be five seconds. What are we going to talk about? One of the first questions we get whenever we go to a customer, someone that wants to get into OpenStack, is: how performant is my cloud going to be? Am I going to be able to run this specific workload on it? How much is it going to cost? Or something like: can I get a cloud out of a bunch of servers I have in my garage? We're going to look at how we should build an OpenStack cloud and what we can get from a bunch of servers from the sizing perspective. Sizing the cloud for success. Sizing the cloud to attract workloads, to make it used. The more a cloud is used, the better it is. Before we get into that, and if this thing works, yes, let me tell you a little bit about my company, Canonical. It's the company behind Ubuntu. Ubuntu is the most popular OS on the Linux desktop, and it is also the most popular OS in the cloud: over 60% of the Linux images that are spun up in the three major public clouds are Ubuntu. And as per the latest OpenStack survey, we also lead in OpenStack: Ubuntu underpins more OpenStack production deployments than the rest of the operating systems combined. Let me get back here. All right, so how do we underpin those deployments? There are several ways in which we are involved in those deployments. You can use the packages, the way we package OpenStack directly. You can use our tooling, which is Canonical OpenStack. Or you can use our managed services. 
The two latter ones are the ones where, when we go deploy that OpenStack for you, we do it the way we're going to explain here. So what are the attributes of a cloud built for success? What does the cloud need to be for it to be successful? A cloud needs to be reliable, meaning it cannot fail. The workloads on top can fail, but there has to be high availability, there have to be failover capabilities. We need to take care of that, and that has an impact on the sizing of the cloud. It has to be resilient. Basically, what I'm talking about here is that you need to be able to upgrade your cloud with no pain: take the innovation that comes from OpenStack and turn upgrades into a process that is something you can actually do. It needs to be scalable. The design needs to scale; it's one of the premises of OpenStack. When you look at the mission of OpenStack, it is built for scale. I don't know if that is thousands or tens of thousands of nodes, but it definitely has to scale, and that has an impact on the design phase. It has to be flexible. I was talking about accommodating or attracting different workloads; it has to be flexible to make those workloads work efficiently in that very same cloud. And then money matters, so it also has to be economic. I'm going to pass it to my colleague here, Victor, who's going to struggle with this thing to talk about the reference architecture where we implement that at Canonical. So, good afternoon. Our reference architecture tends to be HA by default, because we believe that if we are not providing HA on the management services, then our cloud is not reliable. We also believe that software-defined is mandatory, so we rely on software-defined storage and software-defined networking as the major drivers to get a successful cloud. When we go to a customer, we always ask this question: what kind of workloads do you run? 
If you are a customer, when you think about your OpenStack environment, can you tell me what kind of workloads you are going to run? Mostly no. So we tend to design our architecture to be able to run any kind of workload, basically. So this is not working. Those are the design principles that we are following. Mainly, we want to be scalable; that's our major concern. We have already tried and evaluated this technology in different kinds of companies, like Deutsche Telekom, Time Warner, NEC, NTT. So it's a proven technology. It has been a while since I started working with OpenStack, and this architecture has changed from the very beginning. We are going to cover that. We support a number of architectures. I don't know if you have ever seen an architecture like this. This is a hyper-converged architecture. Our goal here is, instead of having the traditional approach in which we have management nodes, compute nodes, and storage nodes, to just have building blocks. I'm a big fan of Lego, and we are basing this on the concept of Lego: if I want to expand my cloud, I just need to add some more building blocks. As you can see here, we are providing a number of units of each service, three units of each service, to provide HA. So we have MySQL, Keystone, Horizon, Neutron, and a few others, and we always apply at least three units of each service. Every single node here will run storage in the form of Ceph and Swift, will run Nova to provide compute capabilities, and will run a bunch of OpenStack services. In this example, we are running five servers. If you want to add 100 or 200, this architecture can scale up to that. For some of the customers, security is a very big concern, so they want to have these management services on one side but combine storage and compute. That's fine; we also support that. Some others want to keep the traditional architecture, which is pretty much like this. 
In this case, we have some storage nodes, some management nodes, some compute nodes. We have deployments running on any of these architectures, but as I said, our recommended architecture is the first one. Have you ever seen this kind of architecture before? Yes? No? You can answer. No? I know it's weird. I know it's not very common, but I already have a bunch of customers running this architecture. At the very beginning, they were very concerned about it: how do I prevent a container from consuming that many resources? What about the performance of Ceph? What about the overall performance of the cloud? One of the advantages of this kind of architecture is that we don't have any bottleneck. If I go with the traditional approach in which I have a number of dedicated management nodes, say three of them, I can have a problem in any of them. Imagine that one of them fails. What happens? I lose at least one unit of each service, which can be a problem. With this architecture, if a node fails, I obviously still lose a few units, but in terms of reliability it improves things a lot. What else? What I also see is that some customers say, OK, I want to start with what we call FUSMUS, the first example I presented, but after a while I want to add some more servers. Do I need to spread all the OpenStack services across all these new nodes every time? No, you can have a mix of all of them. What about containers? I don't know if you realize it, but most of the OpenStack services we deploy, we deploy in containers, not on bare metal, not in VMs. What kind of containers do we use? We use Linux containers. Why? Because Linux containers provide VM-like behavior, which means that I can deploy any stack I want and tweak anything that I need, but the footprint I get with a container is much smaller than the one I get with a VM. 
If I think of a VM, where I have to provide a whole kernel for a full operating system, typically the minimum amount of memory that I give that VM is 1 gigabyte. The average footprint in terms of memory I get with a Linux container is between 33 and 110 megabytes, so the density I get is much better. Plus isolation. At the very beginning, when we did an OpenStack deployment, we tried to deploy all the OpenStack services on bare metal. Well, we found that Cinder is not perfect. Glance can also have bugs. Some of those bugs may affect your CPU, which can drive it to 100% CPU consumption. If you are using a container, you avoid these kinds of problems. That drives us to the million-dollar question for this kind of hyper-converged, containerized architecture: how do I prevent my containers from consuming all the resources? Because, as I said, I don't have management nodes, compute nodes, or storage nodes; I just have nodes. How do I prevent the OpenStack services from taking all the resources of my cloud? Well, by using cgroups. I don't know how familiar you are with cgroups, but let me tell you that cgroups are wonderful. They are the best way I have to limit how many resources a process is consuming. I can limit the amount of CPU, I can limit the amount of memory, and I can limit everything that happens in the container's user space, which is wonderful. About sizing the cloud. If we start talking about sizing the cloud, one of the things we need to know about is hyper-threading. That's fine. But also overcommit, and we have had a lot of problems with the overcommit ratio. By default in OpenStack, the overcommit ratio we have in terms of CPU is 16:1, which translates to 16 virtual CPUs for one physical core. If my VMs are not executing CPU-demanding workloads, that's fine. But if I hit any kind of problem in these terms, all my VMs will collapse. 
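To make the overcommit arithmetic concrete, here is a minimal sketch in Python. The 16:1 figure corresponds to Nova's default `cpu_allocation_ratio`; the 24-core host is a made-up example for illustration.

```python
def schedulable_vcpus(physical_cores: int, cpu_overcommit: float) -> int:
    """vCPUs the scheduler will hand out on one host at a given overcommit ratio."""
    return int(physical_cores * cpu_overcommit)

# OpenStack's default CPU overcommit, 16:1, on a hypothetical 24-core host:
print(schedulable_vcpus(24, 16.0))  # 384 vCPUs
# A conservative 1:1 ratio for CPU-bound workloads:
print(schedulable_vcpus(24, 1.0))   # 24 vCPUs
```

The gap between those two numbers is exactly why the talk keeps coming back to knowing your workloads: the same hardware offers 16 times more capacity on paper, but only if the VMs rarely contend for CPU.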
And memory. The OpenStack default memory overcommit is 1.5, which translates to 1.5 gigabytes of virtual RAM per physical gigabyte of RAM, which is not always very good. Ceph: how many of you are using Ceph for your OpenStack deployments? A bunch, right? What I can tell you is that all the deployments I've been involved in over the last two years are using Ceph. At the very beginning, the performance we had with Ceph was really, really bad, and that was the major concern of the customers. It didn't matter if they were using compute, storage, and management all separate, or using these building blocks that we saw; performance was really bad. So we started working very closely with Inktank in order to get the right approach to Ceph sizing. First, the processor. Well, if you go to the Ceph website, they will tell you that you need one CPU core per OSD. What is an OSD? It's a daemon that is tied to every single disk that I have on the server. We ran a bunch of tests, and we found that if the processor is more powerful than the E5-2650 v3, you can reserve 0.5 cores for each OSD, and then you are saving a bunch of resources. What about the memory? 1 gigabyte per terabyte. Can I have less? Yes, sure. But if you have a problem in any of the OSDs, you will need a lot of time to recover from that kind of failure. And disks: the more disks you have, the better the performance. We found that customers running 12 disks per server are getting better performance than those running only four or five disks per server. What about flash and SSDs? Well, we have a number of customers that said, I really need good performance from Ceph, so I will deploy all my Ceph nodes with SSDs. The performance was not that great; I will come back to that later. We recommend three replicas. Why? Well, we found that with only two replicas, the amount of time we need in order to recover from any failure in the cluster is very, very high. 
And if we go to four replicas, we need very high bandwidth, more than 8 gigabits, which is crazy. And the networking: if you try to run Ceph on top of 1-gigabit NICs, you don't get any decent performance, so please, 10 gigabit is the minimum. We are also working with a number of vendors, basically Intel and Mellanox, on 40-gigabit NICs. The performance with 40 gigabits is not as good as with a SAN, but it's getting closer and closer. So, journals: is it worth having a journal? Across all the tests we executed, we got up to a 12-times performance improvement when using journaling. So when you are sizing your cloud, please put at least one SSD per physical server, because you will really see a very good improvement in your performance. I also left a real example with Ceph here. We had three replicas. We were using E5-2630 v3 processors. The journals were placed on Intel P3700 disks. I'm calling out this specific disk because the performance that I got with it was amazing; we tried a bunch of them, and it was just amazing. And we picked the Ultrastar disks here because they were very cheap. So that's the explanation. I also put all the conclusions here. What about the management nodes? Many people ask: I don't want to go with the FUSMUS architecture; what I want is to have dedicated management nodes. So what kind of resources do I need to reserve for these management nodes? Well, remember that we have three units of each service for HA. Then you will need three servers with at least four cores, 32 gigabytes of RAM, 500 gigabytes of disk, and six NICs. In terms of CPU, what kind of CPU do I need? At least an E5-2620 v2 to get proper performance. That's fine if we want to build a medium cloud, I would say up to 100 nodes. If I want to go beyond those 100 nodes, I need to increase these resources. 
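The Ceph rules of thumb above (0.5 cores per OSD on a powerful enough processor, 1 GB of RAM per terabyte, three replicas) are easy to turn into a back-of-the-envelope calculator. This is just a sketch; the node count and disk sizes are made-up examples, not figures from the talk.

```python
def ceph_node_reservation(osds, tb_per_osd, cores_per_osd=0.5, gb_ram_per_tb=1.0):
    """CPU cores and RAM (GB) to reserve for Ceph on one node."""
    return osds * cores_per_osd, osds * tb_per_osd * gb_ram_per_tb

def usable_capacity_tb(nodes, osds_per_node, tb_per_osd, replicas=3):
    """Raw cluster capacity divided by the replica count."""
    return nodes * osds_per_node * tb_per_osd / replicas

# A hypothetical 5-node cluster with 12 x 4 TB disks (one OSD per disk) per node:
cores, ram_gb = ceph_node_reservation(osds=12, tb_per_osd=4.0)
print(cores, ram_gb)                                              # 6.0 cores, 48.0 GB
print(usable_capacity_tb(nodes=5, osds_per_node=12, tb_per_osd=4.0))  # 80.0 TB usable
```

Note how three-way replication cuts 240 TB of raw disk down to 80 TB usable; that divide-by-replicas step is the one people most often forget when sizing.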
By the way, when we try to build clouds with more than 200 physical nodes, we hit problems, mainly with RabbitMQ and the database. I don't know if you have had any problem with that, but messaging has been a very, very big pain. Right, so let's go on. What are the bottlenecks we are hitting? As I said, RabbitMQ and MySQL are the major problems we hit at around 200 physical nodes or 4,000 VMs. The question here is, how are people building these very big clouds, like PayPal, eBay, and so on? Well, mainly they are not using Neutron; that's one of the things that has allowed them to have these kinds of clouds. Plus, they have a bunch of regions within the same cloud. So they have several regions with a bunch of availability zones, but all of them share a Keystone domain and a Horizon dashboard. That's the trick. How do we solve the bottleneck? Well, in the next release, Mitaka, we are finally going to support cells in production. We've been working with cells for a long, long time. The first time I hit cells was in Grizzly; they were very painful. I tried to work with cells in Juno and Kilo; they were not working properly yet. So we are putting a very big effort into making this work in 16.04 and Mitaka. And what is new about cells? We are working with the community to rewrite most of the code. By default, we will release cells with every one of our deployments, and then we can expand with more cells. Every cell will run its own message queue and its own database service. Options. Well, this is just an example of the options that we usually present to customers. We have different approaches. If performance is what matters to you, then we go with 1:1 on the CPU overcommit and 0.9:1 on memory, which is very, very conservative, but we found it works really well. We don't use any kind of thin provisioning on Ceph, and we keep three replicas on Ceph. Hyper-threading. What's going on there? 
Well, if the workload you are putting on top of your cloud is a database, you are not taking much advantage of hyper-threading. If it's a web server or an application server, you get a very good improvement by using hyper-threading; you get double the number of vCPUs in the end. We also have the density option, which is the most aggressive. We use hyper-threading, 4:1 on CPU, 1.2:1 on memory, and thin provisioning, but you cannot expect great performance from your cloud. And if it's not density and it's not performance, you get the balanced one. You can choose hyper-threading yes or no, you get 2.2:1 on CPU and 1.1:1 on memory, which is fairly reasonable, and thin provisioning depends on the workloads you are planning to run. Our suggestion is always to start with thin provisioning on, and if you have to disable it, then you do that. We have a number of example configurations that have been very successful for our customers. We found that in terms of cost, the E5-2670 v3 is wonderful: you get 12 cores, which works very well with a nice amount of memory in a combination like this. We also found that if we go beyond 768 gigabytes of RAM on a single server, we hit a bottleneck in the communication between the memory and the processor. We are solving that on POWER, but on the latest ones we hit some problems with that as well. Yep: bottlenecks when running servers with 1.5 terabytes of RAM. Keep in mind that in this kind of deployment we are also using a bunch of containers, which means that on a single server like this we can run up to a million containers. So you have a bunch of containers all the time hitting the processor and the memory, and you are trying to squeeze the machine as much as you can. A million is a very high number for a single server. That is a brief calculation of how many CPUs and how much memory I can have with the balanced, the performance, and the density profiles. And now I'll leave it to Arturo to explain how you can get started. 
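The three profiles boil down to a small calculation per host. This sketch uses the ratios as I understood them from the talk (performance: 1:1 CPU, 0.9:1 memory, no hyper-threading; balanced: roughly 2.2:1 and 1.1:1; density: 4:1 and 1.2:1 with hyper-threading), and the 12-core, 256 GB host is a made-up example.

```python
def host_capacity(cores, ram_gb, cpu_ratio, mem_ratio, hyperthreading=False):
    """Schedulable vCPUs and virtual RAM for one host under a given profile."""
    threads = cores * 2 if hyperthreading else cores  # HT doubles the vCPU base
    return int(threads * cpu_ratio), ram_gb * mem_ratio

profiles = {
    "performance": dict(cpu_ratio=1.0, mem_ratio=0.9, hyperthreading=False),
    "balanced":    dict(cpu_ratio=2.2, mem_ratio=1.1, hyperthreading=True),
    "density":     dict(cpu_ratio=4.0, mem_ratio=1.2, hyperthreading=True),
}
for name, params in profiles.items():
    vcpus, vram = host_capacity(cores=12, ram_gb=256, **params)
    print(f"{name}: {vcpus} vCPUs, {vram:.0f} GB virtual RAM")
```

Running this makes the trade-off obvious: the same physical box advertises several times more capacity under the density profile than under the performance one, and the workload decides which of those numbers is honest.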
Thanks. So yes, this is how we do things. Yes? All right. This is how we do things. One of the ways in which we can provide that service, the building of the cloud, the sizing, the design of the cloud, is through BootStack, our managed service. With that service, we will build the cloud, we will operate it for you to an SLA, and we will hand it over whenever you are ready. The design phase is very interesting; it's basically what Victor has been explaining, applied to your specific case. And again, for the operation, we take care of it for a few months while you figure out what you want to do with OpenStack. This service is actually MSP certified, so if you're worried about your data protection, your security, your risk assessment, there is a third party certifying that our procedures are secure. One of the companies here in Japan that is selling this service on our behalf, or with us, is Bit-isle. Here, Ikuo Kumagai is going to show us some of their examples. He's going to focus specifically on the network part, which we haven't covered much, and give us some examples of how that converged or semi-converged infrastructure is working for them. He has some very interesting data to share with us. Ikuo. OK. So, my slide. Hi, I'm Ikuo Kumagai from Bit-isle. For those who don't know what we do, let me talk a little bit about our company. We provide data center services in Japan, we also provide cloud services, and we just started an OpenStack hosted private cloud service based on Canonical OpenStack. Today, I will be talking about our PoC of an open-source hyper-converged infrastructure using a 40-gig network. Just to let you know, this does not have anything to do with the private cloud service. So, these are our needs for a hyper-converged infrastructure: a structure as simple as possible, deployment as rapid as possible, integrated management, flexible scalability. 
These days, a lot of vendors provide hyper-converged infrastructure products. But what I want is not a special server or product, which most of the time takes too long and costs too much; we want to build this infrastructure with commodity servers. The basic structure is almost the same as the Canonical one. There is only one control node, but this is a PoC, just a PoC. The compute and storage servers look the same, except we use PCIe SSDs in the Ceph cluster. For the network devices, we adopted Mellanox products to use a 40-gig network. And for the deployment, we used Juju and MAAS. We installed the OS and set up the devices with a local charm, and for deploying the other OpenStack components, we picked charms from the Charm Store. Here are the results of our PoC. This is what we tested. Firstly, we checked network performance between VM and VM. Each physical node has 1 to 16 VMs. We ran this test in two modes: one with VXLAN offload on, and the other without. Obviously, you can see that the bandwidth is much bigger with VXLAN offload on. The biggest result we got was when 8 VMs interacted: that was 31.43 gigabits per second in total. As you see right here, when 16 VMs interacted, the bandwidth decreased to 24.63. It might have been too dense for this type of spec. As you can see, the average bandwidth gradually goes down. Secondly, we checked IOPS performance. We checked what happens when 1, 2, or 4 VMs on each of four physical nodes interact. In this test, we tried four patterns: sequential read, sequential write, random read, and random write. The graph on top shows the total, and the one on the bottom shows the average. The result looks almost the same as in the network performance test: total IOPS increases as the number of VMs increases, and the average decreases as the number of VMs increases. In total, for reads, the highest goes up to 60,000 IOPS, and for writes, it goes up to 30,000 IOPS. Depending on how you use this infrastructure, it might meet your needs. 
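The per-VM averages behind that observation follow directly from the totals quoted above. A quick sketch (the totals are the PoC's aggregate numbers with VXLAN offload on, in gigabits per second):

```python
def per_vm_gbps(total_gbps, vm_count):
    """Average bandwidth per VM, given the aggregate throughput."""
    return total_gbps / vm_count

# Aggregate peaked at 8 VMs, then each VM's share dropped when 16 VMs interacted:
print(round(per_vm_gbps(31.43, 8), 2))   # ~3.93 Gbps per VM
print(round(per_vm_gbps(24.63, 16), 2))  # ~1.54 Gbps per VM
```

So even though the total only fell from 31.43 to 24.63, each individual VM saw its bandwidth cut by more than half, which is what "too dense for this type of spec" means in practice.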
But still, we think we can use the 40-gigabit network more effectively to improve network performance. We are going to use DPDK so that we reduce the networking overhead of the Linux kernel, and we are going to use Ceph RDMA, which enables direct memory access over Ethernet for the storage cluster, to improve the performance of Ceph I/O. This is our best hyper-converged infrastructure for now, and I am looking forward to talking about the results of the next tests with you. Thank you. Thank you very much. So, just to wrap up before we go into questions: designing the cloud is the part of the whole process where you should be investing most of your time. There are already some clouds out there that were not well designed and are starting to fail, and we had some news about a big vendor pulling off their public cloud at some point because, I guess, it was not properly designed in the beginning. So please do design for the long term. You don't know what you're going to be running in the cloud next year. It might be containers; it might be something new. When you get your cloud, it is going to stay in your data center for 10 years, so take your time in doing that. We are more than willing to help you out if you want. Keep the balance; keeping the right resource ratios is also important. And then choose your architecture wisely: the FUSMUS, the hyper-converged, or the semi-converged, each has its specific use cases. Before going to questions, just a quick reminder: tomorrow is the Canonical track day, starting at 9 AM in Heian with Mark Shuttleworth's keynote. We have our booths, S3 and S4, and we have neighbors there, so just drop by if you have any questions, if you want to go through any of the details, or to contact us, Victor, or myself at any time. Any questions you have for us? I can take them now. You sure? All right. It's a wrap then. Thank you very much. Thank you. Thank you. Thank you.