Does hardware still matter? The attractiveness of software-defined models and services running in the cloud really makes you wonder, doesn't it? But the reality is that software has to run on something, and that something is hardware. And history in the IT business shows that the hardware you purchase today is going to be up against the price/performance of new systems in short order, and those new systems will be far superior from a price/performance standpoint within a couple of years. So when it's time to purchase a new system, whether it's a laptop, a mainframe, or a server, configuring a leading-edge product is going to give you the longest useful life out of that new system.

Now, when I say a system, what makes up a system? Well, there are a lot of underlying technology components. Of course you have the processor, you've got memory, you've got storage devices, there's networking like network interface cards, there are interconnects, and there's the bus architecture, like PCIe Gen4 or whatever. These components are constantly in a leapfrog mode: higher clock speeds, more cores, faster memory, SSDs versus spinning disks, faster network cards, the whole gamut. So you see a constant advancement of the system components. It's a perpetual and sometimes chaotic improvement of the piece parts. Now, I say chaotic because balancing these different components, such that you're not wasting resources and you're ensuring consistent application performance, is a critical aspect of architecting systems. So it becomes a game of whack-a-mole, meaning you've got to find the bottlenecks and stamp them out. It's a constant chase: locating the constraints, designing systems that address those constraints without breaking the bank, and optimizing all these components in a harmonious way.

Hello everyone, this is Dave Vellante of theCUBE. And while these issues may not capture all the headlines, except for maybe the Tom's Hardware blog, they're part of an important topic that we want to explore more deeply. And to do so, we're going to go inside some new benchmarking tests with our good friend Kim Leyenaar, who's a principal performance architect at Broadcom. Kim, always great to see you. Thanks so much for coming back on theCUBE.

Hi there Dave, good to see you too. Thanks for having me on.

You bet. Hey, so last time we met, we talked about the importance of designing these balanced systems. I talked about that in my open, and how solid state threw everything out of whack because the system was designed around spinning disk, and we talked about NVMe. And we're here today with some new data: the independent performance lab Prowess Consulting conducted some initial tests, and I've seen their white papers on this stuff. They compared the current generation of Dell servers with previous models to quantify the performance impact of these new technologies. And so before we get into that, Kim, tell us a little bit about your background and your performance chops.

Sure, sure. So I started my career about 22 years ago, back when Ultra160 SCSI was out and we could only do about 20 megabytes a second. I built my experience really studying the relationship between the file systems and the applications, the OS and storage layers, as well as the hardware interaction. I was absolutely amazed by how touching one really affects the others, and you have to understand that in order to be a good performance architect.
So I've authored dozens of performance white papers, and I've worked with thousands of customers over the years, designing and optimizing and debugging storage, and trying to build mathematical models to project where that next-generation product really needs to land. But honestly, I've just been blessed to work with some of the most brilliant and talented minds in the industry.

Yeah, well, that's why I love having you on, you can go really deep. And so, like I said, we've got these new white papers, new test results on these Dell servers. People might be wondering, what's the role Broadcom plays inside these systems?

Well, we've been working alongside Dell for decades, trying to design some of the industry's best storage, and it's been a team effort. In fact, I've been working with some of these people for multiple decades. I know their birthdays and their travel plans and where they vacation. So it's been a really great relationship between Broadcom and Dell over the years. We've been with them through the SATA to SAS to SSD revolution, and now we're working from all the way back at that Series 5 up to their latest Series 11 products that support NVMe. So it's been really great, but it's not just about gluing together the latest host or the latest disk interface. We work with them to try to understand and characterize their customers' and our customers' applications and the way they're deployed. Security features, management, optimizing the IO path, and making sure that when a failure happens we can get those RAID volumes back to optimal. So it's been a really great partnership between Broadcom and Dell.

Got it. Okay, let's get into the test framework. Let's keep it at a high level and then we'll get into some of the data. But what did Prowess test? What was the workload? What can you tell us about what they were trying to measure?

Well, the first thing is you have to have an objective. So what we had done was have them benchmark one of the previous Dell PowerEdge R740xd servers, and then we had them compare that to the R750, and not just one R750. There were two different configurations of the R750. So we get to see what Gen3 to Gen4 looks like, and what upgrading the processors looks like, so we kind of went from a gold system to maybe a platinum system. We added more controllers, we added more drives. And then we said, let's go ahead and do some SQL transactional benchmarking on it. And I'd like to go into why we chose that. Microsoft SQL Server is one of the most popular database management platforms in the world, and there are two kinds of workloads. One is OLTP, which processes records and business transactions. And then there's OLAP, which does analytical processing and runs a lot of complex queries. Together, these two things drive the business operations and help improve productivity. It's a real critical part for the decision makers and for all of our companies.
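To make the OLTP/OLAP distinction concrete, here is a minimal sketch of the two kinds of SQL Server work being described. The table and column names are hypothetical, purely for illustration; they are not drawn from the Prowess tests.

```python
# Hypothetical queries contrasting the two SQL Server workload styles discussed above.
# Table and column names are illustrative only -- they are not from the Prowess paper.

# OLTP: a short, index-driven transaction touching a handful of rows,
# issued thousands of times per minute by many concurrent users.
OLTP_NEW_ORDER = """
BEGIN TRANSACTION;
SELECT quantity FROM stock WHERE warehouse_id = ? AND item_id = ?;   -- point read
UPDATE stock SET quantity = quantity - ? WHERE warehouse_id = ? AND item_id = ?;
INSERT INTO orders (warehouse_id, customer_id, order_date) VALUES (?, ?, GETDATE());
COMMIT TRANSACTION;
"""

# OLAP: a long-running analytical query that scans and aggregates large ranges
# of data to answer a business question.
OLAP_SALES_BY_REGION = """
SELECT w.region, YEAR(o.order_date) AS yr, SUM(ol.amount) AS revenue
FROM orders o
JOIN order_lines ol ON ol.order_id = o.order_id
JOIN warehouses w  ON w.warehouse_id = o.warehouse_id
GROUP BY w.region, YEAR(o.order_date)
ORDER BY revenue DESC;
"""

if __name__ == "__main__":
    print("OLTP example:", OLTP_NEW_ORDER)
    print("OLAP example:", OLAP_SALES_BY_REGION)
```

The benchmarking discussed in this conversation exercises the first style: many small, ordered reads and writes rather than a few large scans.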
So before we get in and share the actual test results, what specifically did Prowess measure? What were some of the metrics that we're going to see here?

Well, we focused on the transactional workloads. So we did something called a TPC-C-like benchmark, and let me be really clear, we did not execute a TPC-C benchmark, but it was TPC-C-like. TPC-C is one of the most mature, standardized industry database benchmarks in the world, and what it does is simulate the sales model of a wholesale supplier. So we can all agree that handling payments and orders and status and deliveries and things like that, those are really critical parts of running a business. And ultimately what this results in is something called a new order. So somebody might log on and say, hey, is this available? Let me pay you. And once that transaction is done, it's called a new order. So the benchmark reports something called tpmC, which is new-order transactions per minute. Now the neat thing is, it's not just a one-size-fits-all kind of benchmark, you get to scale it. You scale the size and capacity of the database by adding more warehouses; in our case, we decided to choose 1,400 warehouses, which is a pretty standard size. And then you can also test the concurrency. You can start from one thread, which simulates a single user, all the way up to however many threads you want; we decided to settle on 100 threads. Now, this is very different from generic benchmarking. We're actually doing real work. We're not just doing random reads and random writes. Those are great, they're critical, they tell us how well we're performing, but this is more like a paced workload. It really executes SQL IO transactions, and those in-order operations are very different: you do a read and then a write and then another read, and those have to be executed in order. It's very different from just setting a queue depth and a number of workers. And it also provides very realistic and objective measurements that exercise not just the storage, but the entire server.

All right, let's get into some of the results. The first graphic we're going to show you is what you were just talking about, new orders per minute. How should we interpret this graphic, Kim?

Well, I mean, it looks like we won the whack-a-mole game, didn't we? So we started out with the baseline here, the R740xd, and we measured the new-order transactions per minute on that. We then set up the R750, and for the very first R750, all the details are laid out in the paper you just referenced, but we started out with a single RAID controller with eight drives, we measured that, and we got a 7X increase. And then in the second test, we actually added another RAID controller and another eight drives, and we upgraded the processor a little bit, and we were able to double that again over the first configuration. So how do we get there? That's really the more important thing, and the critical part of this is understanding and characterizing the workload, so we know what kinds of components to balance. Where are your bottlenecks? Typically OLTP, online transaction processing, is a mix of transactions that are generally two reads to every one write, and they're very random. The way this benchmark works is that it randomly accesses different warehouses and executes these queries, and when it executes a read query it pulls that data into memory. Once the data is in memory, any transactions act on it in memory, so the actual database engine does in-memory transactions.
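As a rough illustration of how a TPC-C-like driver arrives at a tpmC number, here is a minimal, hypothetical Python sketch. It only models the shape of the workload Kim describes: many threads, each repeatedly picking a random warehouse and running an in-order new-order transaction. The transaction body here is just a stub; a real kit would issue the actual SQL reads and writes against SQL Server and commit them in order.

```python
import random
import threading
import time

WAREHOUSES = 1_400   # database scale used in the tests described above
THREADS = 100        # concurrent "users"
DURATION_S = 60      # measurement window in seconds

completed = 0
lock = threading.Lock()

def new_order_transaction(warehouse_id: int) -> None:
    """Stand-in for one new-order transaction.

    A real driver would execute the SQL in strict order against SQL Server:
    read the item/stock rows, then write the order and order-line rows,
    then commit. Here we only model the ordering and the pacing.
    """
    _ = warehouse_id          # the read phase would use this to pick rows
    time.sleep(0.001)         # placeholder for the read + write round trips

def worker(stop_at: float) -> None:
    global completed
    while time.time() < stop_at:
        wid = random.randint(1, WAREHOUSES)   # warehouses are chosen at random
        new_order_transaction(wid)
        with lock:
            completed += 1

def main() -> None:
    stop_at = time.time() + DURATION_S
    threads = [threading.Thread(target=worker, args=(stop_at,)) for _ in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    tpmc = completed / (DURATION_S / 60.0)    # new-order transactions per minute
    print(f"approximate tpmC-style result: {tpmc:,.0f} new orders/minute")

if __name__ == "__main__":
    main()
```

Real benchmark kits also handle keying time, think time, and the full transaction mix; the point of the sketch is only how completed new orders over a fixed window turn into a transactions-per-minute figure.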
Then you have something called the transaction log, which has to record all those modifications down to non-volatile media, to make sure that you have all the data in case somebody pulls the plug or something catastrophic happens. You want to make sure those are recorded. And then every once in a while, all those in-memory changes are written down to the disk in something called a checkpoint, and then we can go ahead and clear that transaction log. So there's a sequence of different kinds of IO that happen during the course of an OLTP transaction. Your bottlenecks can be found in the processor, the amount of memory, the latency of your disks, really the whole gamut; everything could be a bottleneck in there. So the trick is to figure out where your bottlenecks are and release them so you can get the best performance you possibly can.

Yeah, the sequence of events that has to take place to do a write, we often just take for granted. Okay, the next set of data we're going to look at is, like you said, you're doing reads, you're doing writes. We're going to bring up now the data around log writes and log reads. So explain what we're looking at here.

So as I mentioned earlier, even though the transactions happen in memory, those transactions eventually get committed down to disk. What we do first is something called a log write, a transaction log write, and that allows the transaction to go ahead and process. So the trick here is to have the lowest-latency, fastest disk for that log. It's very critical for your consistency, and also for rollbacks and ACID transaction operations. The log reads are also really important for recovery efforts. So we try to manage our log performance: we want low latency and very high IOPS for both reads and writes. But it's not just the logs, there are also the database disks. What we see initially during these benchmarks is a bunch of reads going into the database data, and then after some period of time we see something called a checkpoint, and this big flurry of writes comes down. So you have to be able to handle that flurry of writes as it comes down and is committed to the disk. So some of our important design considerations here are: can our processor handle this workload? Do we have enough memory? And then finally, we have three storage considerations: we have the database disks, we have the log disk, and then of course there's tempdb as well. Because we have industry-leading RAID 5 performance, we were able to use RAID 5 for the database. And that's something that just years ago was like, whoa, don't ever use RAID 5 under your database. That is no longer true. Our RAID 5 is fast enough and has low enough latency to handle the database, and it also helps save money. And then RAID 10, we use that for the log; that's pretty standard. So a faster processor, more cores, doubling the disks, and we get more performance. We just figured out where the bottlenecks were, we cleared them out, and we were able to double that.
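As a back-of-the-envelope illustration of the RAID 5 versus RAID 10 trade-off Kim describes, here is a short sketch. The drive count and per-drive capacity are hypothetical, and the write-penalty figures are the classic textbook values rather than measured PERC results.

```python
# Back-of-envelope comparison of RAID 5 vs RAID 10 for a given drive set.
# Drive count and capacity are hypothetical; write penalties are the classic
# textbook values (RAID 5: read data + read parity + write data + write parity).

DRIVES = 8
DRIVE_TB = 3.2            # capacity per drive, illustrative only

def raid5_usable(n: int, size_tb: float) -> float:
    return (n - 1) * size_tb          # one drive's worth of capacity goes to parity

def raid10_usable(n: int, size_tb: float) -> float:
    return (n / 2) * size_tb          # everything is mirrored

WRITE_PENALTY = {"RAID 5": 4, "RAID 10": 2}   # back-end IOs per random host write

if __name__ == "__main__":
    print(f"RAID 5 : {raid5_usable(DRIVES, DRIVE_TB):.1f} TB usable, "
          f"write penalty {WRITE_PENALTY['RAID 5']}")
    print(f"RAID 10: {raid10_usable(DRIVES, DRIVE_TB):.1f} TB usable, "
          f"write penalty {WRITE_PENALTY['RAID 10']}")
```

With eight 3.2 TB drives, RAID 5 leaves roughly 22.4 TB usable versus 12.8 TB for RAID 10, which is the capacity and cost argument for putting the database on RAID 5 once the controller's parity performance is no longer the limiter, while the latency-sensitive log stays on RAID 10.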
That's interesting. Going back in history a little bit, when RAID 5 was all the rage, EMC at the time, now of course Dell, when they announced Symmetrix, they announced it with RAID 1, which was mirroring. And they did that because they were heavily into mainframe and transaction processing, and while there was additional overhead, because you need two disk drives to do that, the performance really outweighed it. And so now we're seeing, with the advent of new technologies, that you're solving that problem. I guess the other thing, of course, is rebuild times, and we've kind of rethought that. So the next set of data we're going to look at is how long it takes to rebuild a RAID volume. We'll bring that up now and you can give us the insights here.

Yeah, so you can see that we've been able to reduce the rebuild times. And how do we do that? Well, I can tell you, my fellow architects and I have been spending probably the last two years focusing on trying to improve the rebuilds. It's not just rebuilding faster, it's also how to elegantly handle all the host operations. You can't just tell the host, sorry, I'm busy doing rebuilds. You've got to be able to handle that, because business continuity is a very critical component. We do that through mirroring and parity data layouts, and the rebuild times come down if you strike a really good balance: you keep servicing the host IO, and in the background, as soon as we have a moment, we push those rebuilds forward during the lull periods. Making sure we do aggressive rebuilds while allowing those business operations to continue has always been a real critical part, and we've been working on that a lot over the last couple of generations. That said, we always tell our customers: always have a backup. That's a critical part of business continuity plans.

Great. I wonder if we can come back to the components inside the system. How does what Broadcom is supplying to Dell in these servers contribute to these performance results specifically, Kim?

Okay, so specifically, we provide the PERC storage controller. The Dell R740xd has their Series 10 PERC H740P controller, whereas the R750 has the generation 11 PERC 11, the H755N. So we own those, in terms of making sure they are integrated properly into the system and provide the highest possible performance. But it's not just the storage controller. I want to make sure everybody knows that we also have our Broadcom NetXtreme E-Series. These are Gen4 PCIe, dual-ported 25Gb Ethernet controllers. In a real deployment, that's a really important part of the e-commerce business solution. So we do own the storage for these as well as the networking.

Excellent. Okay, so we went deep inside the system, but let's up-level it. Why does this matter to an organization? What's the business impact of all this tech coming to fruition?

Well, as everybody always references, there's massive growth of data, and data is required for success. It doesn't matter if you're a Fortune 500 company or just a small to medium business, that critical data needs to be protected, and protected without the complexity or the overhead or the cost of hyperconverged infrastructure or SAN deployments. So we're able to do this on bare metal, and it really helps with the TCO. And the other thing is, NVMe right now is the fastest-growing storage interface, and NVMe is so fast from a performance perspective as well. That Dell R750 with the two PERC 11 controllers in it had over 51 terabytes of storage in a single server.
And that's pretty impressive, but there are so many other performance advantages that the R750 provides for SQL Server as well. They've got the 3rd Gen Intel Xeon Scalable processors, we've got DDR4-3200 memory, and that faster memory is very critical for those in-memory transactions as well. We have Gen4 PCIe. It really does justify an upgrade. And I can tell you, Dave, that a little over a year ago I had one of these Dell R750 servers sitting in my own house, and I was testing it, and I was just amazed at the performance. I was doing different TPC-C and TPC-H and TPC-E tests on it, and I was telling Dell, wow, this is amazing, this server is doing so well. I was so excited, I could not wait to see it in print. So thank you to the Prowess team for actually showing the world what these servers can do combined with the Broadcom storage.

Now, speaking of the Prowess team, when you read the white papers, they're really focused on the small and medium-sized business market. So people might be wondering, well, wait a minute, why wouldn't folks just spin up this compute in the cloud? Why am I buying servers?

Well, that's a really good question. Studies have shown that the majority of workloads are still on-prem, and there's also a challenge here with skill sets: there's a lack of cloud developers and cloud architects. So keeping these on-prem, where you actually own them, really does help keep costs down. And the management of these R750s is fantastic, as is the support that Dell provides.

Great. Kim, I love having you on and would like to have you back. We're going to leave it there for now, but thanks so much. I really appreciate your time.

Thanks, Dave.

So look, this is really helpful in understanding that at the end of the day, you still need microprocessors and memory and storage devices, controllers and interconnects. We just saw Pat Gelsinger at the State of the Union address nudging the federal government to support semiconductor manufacturing, and Intel's going to potentially match TSMC's $100 billion CapEx commitment. That's going to be a tailwind for the surrounding components, including semiconductor and core infrastructure designers like Broadcom. Now, this is a topic that we care about, and like I said, Kim, we're going to have you back, and we plan to continue our coverage under the hood in the future. So thank you for watching this Cube Conversation. This is Dave Vellante, and we'll see you next time.