Oh, there we are. Nice, OK. So apparently, I pulled the early lunch slot. Anyway, welcome. The idea to do this talk came about because a few days ago, I again got a bill of materials on my desk that was, let's say, a little bit extreme. So I was thinking, why don't we talk about how the changes in hardware over the last four or five years have changed the whole landscape of cloud computing?

I've built clouds for many years, and this is basically what we were up against six, seven years ago. We had, first of all, dedicated compute and storage nodes. The reason was that the storage nodes needed a lot of two-and-a-half-inch slots for hard disks, because the performance of the hard disks was so poor that we needed lots of spindles to create enough IOPS to drive the cloud. We had almost exclusively multi-CPU nodes, typically Xeon E5 nodes. We had a relatively low core count because the CPUs were just so extremely expensive; the top-of-the-line CPU in 2016 was about $5,000 apiece, so you would have $10,000 alone in CPUs in your node. And we had low memory because of the cost of memory. So essentially, the cloud was relatively large, and the performance was just adequate, to put it like that.

So what has happened in hardware in the last few years? First of all, just for comparison, the biggest CPU that Intel sold in 2016 was the Xeon E5-2699 v4. It had 22 cores per CPU. If you buy the hottest CPU that you can get on the market today, it has 96 cores per CPU. So that is four times the core count in a single CPU. Plus, the cores are two and a half to three times as fast as the cores were back then.

The second thing is that memory prices went on a steep decline since then. 256 gigabytes of memory was a very well-endowed compute node back then. Nowadays, you have compute nodes with one and a half terabytes and more. The only exception to this is the relatively new 128 gigabyte modules, which still cost about four times per gigabyte what the other modules cost.

Then SATA SSDs. We all thought they were the future; in 2018, 2019, the prices were coming down. And they have pretty much vanished. If you buy new compute nodes nowadays, most of them are equipped with NVMe, not SATA SSDs, because SATA is essentially a disk protocol for what is really a memory chip you are trying to address, and that is just not such a good idea. And finally, networking also became much faster.

So now imagine your old cloud from 2016 with a number of workloads on it, let's say 50 workloads per node. Now imagine a CPU that has four times as many cores, with each core twice as fast as it used to be, and you try to condense your cloud onto these nodes. What would happen? All of a sudden, you would have something like 500 workloads on a single node. And while this is kind of enticing, you could say, OK, I'm condensing my 100 node cloud into 10 nodes, the downside is that if one of those 10 nodes dies, you are going to be in deep trouble. So what we really need is an approach that reduces the core count per node. We don't want to just reduce the oversubscription, because that would waste the expensive CPUs that we stuff into those nodes. And I want to show you such an approach today.
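As a back-of-the-envelope illustration of that consolidation math, here is a small Python sketch. The inputs are just the round figures from the talk, 22 versus 96 cores and roughly twice the per-core speed, plus an assumed 50 workloads per old node; none of these numbers are measurements.

```python
# Rough consolidation math, using the figures mentioned in the talk.
# All inputs are illustrative assumptions, not benchmarks.

old_cores = 22          # Xeon E5-2699 v4 (2016), cores per CPU
new_cores = 96          # current top-end CPU, cores per CPU
speedup   = 2.0         # "each core twice as fast" (conservative end)

workloads_per_old_node = 50
old_node_count         = 100

factor = (new_cores / old_cores) * speedup
print(f"consolidation factor: ~{factor:.1f}x per CPU")
print(f"workloads per new node: ~{workloads_per_old_node * factor:.0f}")

new_nodes = old_node_count / factor
print(f"nodes needed for the same capacity: ~{new_nodes:.0f}")
print(f"share of the cloud lost per node failure: ~{1 / new_nodes:.0%}")
```

Depending on how aggressive you are with the per-core speedup, this lands somewhere between 400 and 500 workloads per node and a double-digit percentage of the cloud lost with every node failure, which is exactly the density problem the rest of the talk tries to avoid.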
So this is, again, what we were talking about here. The left column is the hottest CPU that you could buy in 2016. For comparison, take the third column over: that one is literally five times as fast. There are no benchmarks for everything, but this gives you a pretty good picture of what these CPUs can really do. And then this one here is pretty much an entry-level CPU that you can buy right now, and you can see it's still faster than the top-of-the-line CPU from back then. So what can you do with all of that? We already talked about that, so let's go right into it.

One thing that you can do, and that I think you should consider when you're building a cloud bill of materials these days, is to go from a dual CPU system to a single CPU system. This has a number of advantages. First of all, you reduce the core count per node and bring the node count back to a reasonable level. But the big advantage concerns NUMA: NUMA is a loose coupling between the two CPUs in a system. In a NUMA system, the hardware, the network cards and storage devices, is attached to one particular CPU, but you cannot guarantee that your code is also running on that CPU. So the data comes in, bounces around between your CPUs, memory, CPU, storage device, and goes back out, and you see a significant performance degradation. What we can do instead is simplify the system: one CPU, one set of memory, one PCI Express complex, and you have no NUMA lag. Single CPU chassis are also plentiful now. It used to be that most server chassis were dual CPU; nowadays, buying a single CPU chassis is pretty commonplace.

Looking at memory, again, prices have come down. But one thing that really caught my eye is that customers tell me they want to standardize on 128 gigabyte modules. So instead of putting in enough memory modules to populate all memory channels, they say, OK, we are just putting in enough 128 gigabyte modules to meet our memory target. Compare these two scenarios: we put four 128 gigabyte modules into the server for 512 gigabytes, or we put sixteen 32 gigabyte modules in there. The first option is going to be roughly four times the cost of the second, and from a performance standpoint it is also going to be much slower than having the memory channels properly populated.

Let's talk about memory channels for a second, because this is an important point. The CPUs from back then had four memory channels; the CPU I mentioned and all the smaller Intel CPUs had four memory channels. So if you had four memory modules on that CPU, you got the full throughput. If you look at today's CPUs, you will see that Intel has gone to eight memory channels, which is part of the reason why the CPUs have gotten so much faster, and AMD even went to twelve memory channels. But to get this speed, you actually have to populate all the memory channels; otherwise, you throw away memory bandwidth. That can also be a little bit problematic. For instance, if you build a very small server for, let's say, control plane purposes, twelve memory channels populated with the smallest 16 gigabyte modules already give you 192 gigabytes at the very minimum. So you cannot really build anything smaller than that anymore.
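To make the DIMM trade-off concrete, here is a minimal sketch under stated assumptions: a single socket with eight memory channels, $16 per gigabyte for 128 GB modules versus $4 per gigabyte for 32 GB modules (the roughly four-to-one ratio from the talk), and the crude approximation that usable bandwidth scales with the number of populated channels.

```python
# DIMM population trade-off: same 512 GB total, very different cost and
# usable memory bandwidth. Prices per GB are illustrative assumptions.

channels = 8  # single modern Intel socket (assumed); a current AMD socket has 12

configs = {
    "4 x 128 GB": {"modules": 4,  "size_gb": 128, "usd_per_gb": 16.0},
    "16 x 32 GB": {"modules": 16, "size_gb": 32,  "usd_per_gb": 4.0},
}

for name, c in configs.items():
    total_gb  = c["modules"] * c["size_gb"]
    cost      = total_gb * c["usd_per_gb"]
    populated = min(c["modules"], channels)          # channels actually driven
    bandwidth = populated / channels                 # crude fraction of peak bandwidth
    print(f"{name}: {total_gb} GB, ~${cost:,.0f}, "
          f"{populated}/{channels} channels populated, ~{bandwidth:.0%} of peak bandwidth")
```

Same 512 gigabytes both times, but in this sketch the four-module configuration costs about four times as much and only drives half the memory channels.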
Another thing that has radically changed is flash storage, or storage in general. We used to build storage servers with two-and-a-half-inch hard disks in a 2U chassis, plus SSDs for what was back then the journal, later the BlockDB and the write-ahead log. But the downside to that is that you are still at the speed of the hard disk. Even if you have flash devices to cache the metadata operations, to write something or to read something you are at the speed of the hard disk, and a hard disk can deliver maybe 100 IOPS, maybe 110, 115 under normal circumstances. If you go to complete flash, and I have done the math here so you can compare this, these are prices from today, by the way: if you were to build a traditional server these days, you would look at about $10,000 for the configuration with eight 2 terabyte hard disks. If you were to build the same thing with SATA SSDs, it would be only a few hundred dollars more. And then the big surprise is that NVMe is at the same price these days, and NVMe is drastically faster than both SATA and hard disks.

Networking: we have 25 gigabit ethernet. I have customers who go to 100 gigabit, but typically it's not really worth the cost. It makes more sense to build smaller servers. If you build these monstrosities with the hottest CPUs you can find and everything, a 100 gigabit network may potentially make sense because you might bottleneck on the network. If you build normal sized servers, you will find that 25 gigabit ethernet is plenty for everything.

So what would this look like? We could build from entry-level CPUs. We could basically use that Silver 4314, and that would give you pretty much the same cloud: about the same size, the same performance, and you would still have, let's say, a 100 node cloud. But this is what I would lately recommend to everyone: split the nodes in half. Use single CPU systems, with half the memory, half the storage devices, half everything. With this approach, you have twice the number of nodes compared to the big guns. You have more resilience, you have more performance, and less risk of hitting bottlenecks. But most of all, if you are doing converged, and converged is all the rage these days, for good reason, and you put NVMe devices in your system, you will find that this performs significantly better than a system with two CPUs and half the cores each, simply because you are eliminating the lag of coming in through the network, going to the storage device, and going back out through the network.

Then you could also reduce oversubscription. But this is something that I normally do not recommend; quite the contrary, the new CPUs are so much faster than the old CPUs that oversubscription would normally actually go up. But don't overdo it in either direction: if you go up too far, you have the problem I mentioned at the beginning, too high a workload density. If you go down too far, you are throwing away money on the CPU.

So here, this is a direct comparison between a system that has two CPUs and a system that is exactly half of it, with a single CPU. Instead of two CPUs, you have one CPU, half the memory, half the storage devices, half everything. And you can see that the cost for those nodes is roughly half of what the cost for the bigger nodes is. This, by the way, is cost per node, not for the whole system. But what you can see is that with the half-size system, you will have twice the number of nodes. So half the impact if anything happens: you lose a node, you lose half the storage, you lose half the workloads. And overall, the risk of encountering network bottlenecks or storage bottlenecks is significantly lower in the system with twice the number of nodes.
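Here is a minimal sketch of that node-splitting comparison, assuming placeholder core counts and prices (64 cores and roughly $30,000 for the dual-CPU node, half of each for the single-CPU node), since the talk only gives the relative relationship of roughly half the cost per node.

```python
# "Split the node in half": one dual-CPU node versus a single-CPU node with
# half the cores, memory, and drives. Core counts and prices are assumptions.
from dataclasses import dataclass

@dataclass
class NodeSpec:
    name: str
    cores: int       # usable cores per node
    usd: int         # assumed cost per node

full = NodeSpec("dual-CPU node",        cores=64, usd=30_000)
half = NodeSpec("single-CPU half node", cores=32, usd=15_000)

cloud_cores = 2048   # assumed capacity target for the whole cloud

for spec in (full, half):
    nodes = cloud_cores // spec.cores
    total = nodes * spec.usd
    print(f"{spec.name}: {nodes} nodes, ~${total:,} total hardware, "
          f"lose ~{1 / nodes:.1%} of the cloud per node failure")
```

Total hardware cost comes out the same either way in this sketch; what changes is the number of nodes and therefore how much of the cloud a single node failure takes with it.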
So this is the final stretch. I was asked by a company whether we could upgrade their 2013 vintage Xeon E5 v2 clouds. I said, theoretically, yes; the systems are supported. The downside is that these systems eat so much power compared to modern systems that, and I did the math for them and it came out exactly like that, the cost of the power alone over three years pays for your entire new greenfield solution. The other advantage of greenfielding, of course, is that you have time to move all the workloads from one side to the other. If you try to upgrade in place, there is always the risk that something does not work the way you want it to, that you get stuck, and that you spend a lot of time and effort on it. And then, finally, you have the two big points. The hard disks are obsolete: the systems they had obviously had hard disks, since in 2013 there were practically no SSDs around, and they were already bottlenecking on hard disk performance, so going to a new SSD-based system would drastically increase performance. And the other thing is the bathtub curve: at some point, the systems, and especially the hard disks, become so unreliable that you start to have cascading failures. And cascading failures, especially in distributed storage systems, are pretty problematic.

So, as final words: when you build a bill of materials, it's very important to consider all aspects, not only, OK, here is my number of CPU cores, here is my amount of memory, here is my capacity. You have to think about how every choice you make impacts your environment. And I'm hoping that when you build this, you come up with something that is somewhere in the middle of the road, not so small that you end up with too large a footprint and not so big that it becomes too expensive, something that has proper performance and reliability and still saves you some space in the data center and also some power. Thank you very much. I hope that I brought a little bit of information to the table and that you take something home with you. And thanks for being a part of the Open Infra community. Any questions? Yes?

One question I had was, when you were showing the comparison between two CPUs and one CPU, how does the power cost factor into the difference between the number of CPUs in a node and the higher density you get out of a dual CPU node?

That's actually a very good question. So yes, single CPU systems will take a little bit more power overall. The big power consumers in the system, the SSDs, the memory, and the CPU, you don't save on those, but each main board does consume power, so your total power consumption is going to be somewhat higher than it would be with dual CPU systems. This is, again, an example of having to balance your different parameters against your constraints. For instance, if I had a data center that is extremely cramped and I simply cannot put in more than a certain number of nodes, then I'll bite the bullet and build something that has a very high workload density, simply because I cannot build anything larger. But it's not ideal. So in the end, this is what we as engineers and architects have to decide: how are we going to balance the pros and cons of each solution?
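The "power pays for the greenfield" point is easy to sanity-check with a back-of-the-envelope calculation. Every number below is a hypothetical assumption for illustration, node counts, wattages, and energy price included; the talk only states that for that particular customer, three years of power roughly covered the new hardware.

```python
# Hypothetical power payback comparison: an old cluster kept running versus a
# consolidated greenfield replacement. All figures are illustrative assumptions.

USD_PER_KWH = 0.30   # assumed energy price including cooling overhead
YEARS = 3

def power_cost(nodes: int, watts_per_node: float, years: int = YEARS) -> float:
    """Energy cost of running `nodes` servers at a constant draw for `years`."""
    kwh = nodes * watts_per_node / 1000 * 24 * 365 * years
    return kwh * USD_PER_KWH

old = power_cost(nodes=100, watts_per_node=600)   # 2013 Xeon E5 v2 cluster, assumed draw
new = power_cost(nodes=20,  watts_per_node=450)   # consolidated modern cluster, assumed

print(f"old cluster power, {YEARS} years: ${old:,.0f}")
print(f"new cluster power, {YEARS} years: ${new:,.0f}")
print(f"difference available for greenfield hardware: ${old - new:,.0f}")
```

With these assumed numbers, the difference over three years comes out to a few hundred thousand dollars, which is the kind of budget a consolidated greenfield cluster can fit into.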
Do you have any insight into, or experience with, other architectures that have better power characteristics? I'm thinking of the Ampere Altra, which is advertised as an 80 core system with very low power consumption relative to comparable cores from Intel or AMD.

So yes, we are looking into ARM-based servers; I think everyone who is in the market at the moment does at some point. But the problem I see right now is that the software that is currently built is still built for Intel CPUs. So for this generation, if I were to build a data center right now, I would go with Intel or AMD x86-64 CPU architectures. But I would bet that in four or five years, if I hopefully stand on this stage again, I can say, OK, the biggest shift of the last five years has been the shift to extremely power-efficient CPUs that are perfect for the new type of cloud environments. So yes, ARM is coming. Definitely. Thank you, everyone, and thank you for having me here.