Hello everyone, and welcome to this CUBE Conversation, where we're going to go deep into system performance. We're here with an expert: Kim Leyenaar is the principal performance architect at Broadcom. Kim, great to see you. Thanks so much for coming on. Thanks so much, Dave. So you have a deep background in performance, performance assessment, benchmarking, modeling. Tell us a little bit about your background, your role. Well, thanks. So I've been a storage performance engineer and architect for about 22 years, and I've been with Broadcom for, I think next month is going to be my 14-year mark. Initially I built and managed their international performance team, but about six years ago I moved back into architecture. My role right now is to generate performance projections for all of our next-generation products. I also work on marketing material, interface with a lot of our customers, debug customer issues, and look at how our customers are actually using our storage. Great. Now we have a graphic that we want to share that talks to how storage has evolved over the past decade. So my question is, what changes have you seen in storage, and how has that impacted the way you approach benchmarking? In this graphic we've got the big four items that impact performance: memory, processor, IO pathways, and the storage media itself. Walk us through this data, if you would. Sure. So what I put together is a little bit of what we've seen over the past 15 to 20 years. I've been doing this for about 22 years, and focusing a little bit on the storage side, hard drives ruled for almost 50 years. The first hard drive, which came out back in the 1950s, was only capable of five megabytes in capacity and one and a half IOs per second, and it had almost a full second of seek time.
We've come a long way since then. When I first came on, we were looking at Ultra320 SCSI, and one of the biggest memories I have of that era is that my office was located close to our tech support, and I could hear that the first question was always, "What's your termination like?" So we had some challenges with SCSI, and then we moved on to the SAS and SATA protocols. Back in the early 2000s when I came on board, the best drives could really do maybe 400 IOs per second and maybe 250 megabytes per second, with millisecond response times. So when I was benchmarking way back then, it was always like, well, IOPS are IOPS. We were always faster than what the drives could do, and that was just how it was; the drives were always the bottleneck in the system. Things started changing, though. By the early-to-mid 2000s we started seeing different technologies come out. Virtualization and multi-tenant infrastructures were becoming really popular, and cloud computing was well on the horizon. At that point we're thinking, well, wait a minute, we really can't make processors that much faster. So everybody got excited when Woodcrest and Clovertown came out with two cores per processor and four cores per processor, and we saw a little time period where processing capability actually pulled ahead of everybody else. Memory was falling behind: we had DDR2-667, which was new at the time, but we only had maybe one or two memory channels per processor. Then in 2007 we saw disk capacity hit one terabyte, and we started seeing a bit of an imbalance, because these drives were getting massive but their performance per drive was not keeping up. Then we see a revolution around 2010. My coworker and I at the time had these little USB flash drives, if you recall, and we would plug them in, and they were so fast.
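The era Kim describes, drives doing a few hundred IOPS at millisecond response times, follows directly from Little's Law: achievable IOPS is outstanding IOs divided by per-IO latency. A minimal sketch, with illustrative queue depths and latencies assumed rather than taken from the interview:

```python
def iops(queue_depth: int, latency_s: float) -> float:
    """Little's Law for storage: achievable IOPS = outstanding IOs / per-IO latency."""
    return queue_depth / latency_s

# A hard drive with ~5 ms average service time and 2 outstanding IOs
# lands right at the "best drives" figure of the era:
hdd_iops = iops(2, 0.005)    # ~400 IOPS
# The same queue depth against an early SSD at ~0.1 ms latency:
ssd_iops = iops(2, 0.0001)   # ~20,000 IOPS
print(f"HDD: {hdd_iops:.0f} IOPS, SSD: {ssd_iops:.0f} IOPS")
```

The same relation explains why mechanical seek time, not interface speed, capped hard-drive IOPS for decades.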
We were joking at the time: hey, you know what, I wonder if we could make a RAID array out of these little USB drives, they're just so fast. The idea actually seemed kind of crazy until we started seeing it happen. So in 2010, SSDs started revolutionizing storage. The first SSDs we really worked with were the Pliant LS300s, and they were amazing, because they were so over-provisioned that they had almost the same read and write performance. To go from a drive that could do maybe 400 IOs per second to a drive that could do 40,000-plus IOs per second really changed our thought process about how our storage controller could try to keep up with the rest of the system. We started falling behind, right? That was a big challenge for us. Then in 2014, NVMe came around as well. So now we've got these drives that are 30 terabytes, that can do one and a half million IOs per second and over 6,000 megabytes per second, but they were expensive. So people started relegating SSDs to tiered storage or cache, and as the prices came down, they became a lot more mainstream. Then the memory channels started picking up, doubling every few years, and we're looking now at DDR5-4800. And we're looking at cores that went from two to four per processor up to 48 with some of the latest processors out there. So our ability to consume these computing and storage resources is astounding. It's like that saying, build it and they will come, because I'm always amazed: how are we going to possibly utilize all this memory bandwidth? How are we going to utilize all these cores? But we do. And the trick to this is that having a balanced infrastructure is really critical, because if you have a performance mismatch between your server and your storage, you lose a lot of productivity, and it does impact your revenue. That's such a key point, Kim.
Let's bring that slide up again with the four points, and that last point you made, Kim, about balance. So here you have these electronic speeds with memory and IO, and then you've got the spinning disk, this mechanical device. You mentioned that SSD kind of changed the game. It used to be, when I looked at benchmarks, that the destage bandwidth from the cache out to the spinning disk was always the bottleneck. You go back to the days of the Symmetrix, right? Huge back-end disk bandwidth was how they dealt with that. And then you had the oxymoron of the day, the "high-performance disk," these high-spin-speed drives that were still glacial compared to memory. So the next chart that we have shows some really amazing performance increases over the years. You see these bars: the left-hand side looks at historical performance for 4K random IOPS, and the right-hand side is the storage controller performance for sequential bandwidth, from 2008 to 2022; 2022 is that yellow line. It's astounding, the increases. I wonder if you could tell us what we're looking at here. When did SSD come in, and how did that affect your thinking? So I remember back in 2007, we were kind of on the precipice of SSDs. We saw it; the writing was on the wall. We had our first 3Gb SAS- and SATA-capable HBAs that had come out, and it was a shock, because we're like, wow, we're going to very quickly become the bottleneck once this becomes more mainstream. And you're so right about people building these massive hard-drive-based back ends to handle that tiered architecture we were seeing back in the early 2010s, when SSD pricing was just sky high. And I remember looking at our very first SAS controller, which had just launched when I came in in 2007. We were so proud of ourselves.
And I started asking, how many IOPS can this thing even handle? We couldn't even attach enough drives to figure it out. So what we would do was use these little tricks where we would do a 512-byte read on a 4K boundary, so that the disk was actually reading sequentially but we were handling discrete IOPS. And we were like, oh, we can do around 35,000. Well, that's just not going to cut it anymore. Bandwidth-wise we were doing great; our limitation, our bottleneck on bandwidth, was always either the host or the back end. So on our controllers there are basically three bottlenecks. The first one is from the host to the controller; that is typically a PCIe connection. Then there's another bottleneck from the controller to the disks, and that's really the number of ports that we have. And the third one is the disks themselves. So in typical storage, that's what we look at, and we ask, how do we improve this? Some of these improvements are evolutionary, such as new PCIe generations, and we're going to talk a little bit about that. But some of them are really revolutionary, and those are some of the things we've been doing over the last five or six years to make sure that we are no longer the bottleneck and we can enable these really, really fast drives. So can I ask you a question? I'm sorry to interrupt, but on these blue bars here, are these all spinning disks? I presume in the out years they're not. When did flash come into these blue bars? You said 2007 you started looking at it, but on these benchmarks, is it all spinning disk? Is it all flash? How should we interpret that? No, no, initially these were actually all hard drives, and the way that we would identify the max IOPS was by doing very small sequential reads to these hard drives. We just didn't have SSDs at that point.
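The three bottlenecks Kim lists, host link, controller ports, and the drives themselves, can be sketched as a simple min() model. The figures below are illustrative assumptions, not Broadcom specs, but they show why swapping HDDs for SSDs moved the bottleneck to the controller side:

```python
def system_bandwidth_mbps(host_link: float, port_speed: float, ports: int,
                          drive_speed: float, drives: int) -> float:
    """Deliverable bandwidth is capped by the slowest of the three stages:
    host-to-controller link, controller-to-disk ports, and the disks."""
    return min(host_link, port_speed * ports, drive_speed * drives)

# Hard-drive era (hypothetical numbers): 24 HDDs at ~150 MB/s can't fill
# a ~4000 MB/s host link, so the disks are the bottleneck.
hdd_era = system_bandwidth_mbps(4000, 600, 8, 150, 24)   # 3600 -> disks limit
# Swap in SSDs at ~500 MB/s each and the bottleneck moves to the host link.
ssd_era = system_bandwidth_mbps(4000, 600, 8, 500, 24)   # 4000 -> host limits
```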
Then somewhere around 2010, so very early in that chart, we were able to start incorporating SSD technology into our benchmarking. So what you're looking at here is really the max that our controller is capable of. We would throw as many drives at it as we could, and do whatever we needed to do to make sure our controller was the bottleneck, so we could see what it could really do. So once SSD came in, the drive was no longer the bottleneck, and you had to sort of reinvent and rethink your innovation and your technology, because these are astounding increases in performance. On the left-hand side you had a 170X increase for the 4K random IOPS, and you got a 20X increase for the sequential bandwidth. How were you able to achieve that level of performance over time? Well, in terms of the sequential bandwidth, those gains really come naturally with increases in the PCIe or SAS generation; we just make sure we stay out of the way and enable that bandwidth. But the IOPS, that's where it got really, really tricky. We had to start thinking about different things. First of all, we started optimizing all of our pathways, all of our IO management. We increased the processing capabilities on our IO controllers. We added more on-chip memory. We started putting in IO accelerators, these hardware accelerators, and SAS core enhancements. We even went and improved our driver to make it as thin as possible, so we could enable all the IOPS the systems are capable of. But a big thing that happened a couple of generations ago was that we introduced tri-mode controllers, which means that you could attach NVMe, SAS, or SATA. So you could have this really amazing deployment of a storage infrastructure based around your customized needs and your cost requirements, using one controller.
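For scale, the 170X IOPS and 20X bandwidth gains over the chart's 2008-to-2022 span imply very different compound annual growth rates; a quick back-of-the-envelope check:

```python
def cagr(total_gain: float, years: int) -> float:
    """Compound annual growth rate implied by a total gain over N years."""
    return total_gain ** (1 / years) - 1

iops_growth = cagr(170, 2022 - 2008)  # ~0.44, i.e. roughly 44% per year
bw_growth = cagr(20, 2022 - 2008)     # ~0.24, i.e. roughly 24% per year
print(f"IOPS: {iops_growth:.0%}/yr, bandwidth: {bw_growth:.0%}/yr")
```

That gap reflects what Kim says next: bandwidth scaled with interface generations, while IOPS required controller-level rework.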
Yeah, so anybody who's ever been to a trade show where they were displaying a glass case with a Winchester disk drive, for example, you see it spinning and its actuator moving, and you go, wow, that's so fast. Well, no, that's like a tortoise; actually slower, it's like a snail compared to these systems. In a way, life was easy back in those days, because when you did a write to a disk, you had plenty of time to do stuff, right? Now that's changed. So I want to talk about Gen3 versus Gen4: what's new in Gen4 and the impacts of PCIe. You have a chart here that you shared with us that talks to that. I wonder if you could elaborate on it, Kim. Sure, but first, you said something that kind of hit my funny bone. I remember I made a visit once, about 15 or 20 years ago, to IBM, and this gentleman actually had one of those old drives in his office, and he referred to them as disk files. Until the day he retired, he never stopped calling them disk files. It's kind of funny to be a part of that history. Yeah, DASD, they used to call it. I used to see all that kind of thing, you know. Oh, you don't know what it was like back then. But nowadays we've got it quite easy, because back then we had DASD and all that, and then ATA and then SCSI. Well, now we've got PCIe, and what's fabulous about PCIe is that the generations are already planned out. It's incredible. Right now we're looking at Gen3 moving to Gen4, and that's a lot of what we're going to be talking about and what we're trying to test out: what is Gen4 PCIe going to buy us? It really is fantastic. PCIe came around about 18 years ago, and Broadcom participates in and contributes to the PCI-SIG, which is the group that develops the standards for PCIe. And both our host interface and our NVMe disks use those standards.
So this is really a big deal, really critical for us. If you take a look here, you can see that in terms of capabilities, it really is buying us a lot. Most of our NVMe drives right now tend to be x4, meaning four lanes of PCIe, and a lot of people will connect them at x1 or x2, depending on what their storage infrastructure will allow, but the majority of them are x4. So as you can see, we've gone from eight gigatransfers per second to 16 gigatransfers per second. What that means for an x4 connection is that we're going from one drive being able to do 4,000 megabytes per second to almost 8,000 megabytes per second. And in terms of those 4K IOPS that really evade us, that are really tough to squeeze out of these drives, we've gone from one million all the way to two million. It's just insane, the increase in performance. And there are a lot of other standards that are going to be sitting on top of PCIe, so it's not going away anytime soon. We've got open standards like CXL, we've got graphics cards, and you've got all of your host connections also sitting on PCIe. So it's fantastic: it's backwards and forwards compatible, and it really is going to be our future. So this is all well and good, and I really believe that a lot of times in our industry the challenges in the plumbing are underappreciated. But let's make it real for the audience, because we have all these new workloads coming out, AI, heavily data-oriented. I want to get your thoughts on what types of workloads are going to benefit from Gen4 performance increases. In other words, what does it mean for application performance? You shared a chart that lists some of the key workloads, and I wonder if we could go through those.
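The per-drive numbers Kim quotes fall straight out of the PCIe arithmetic: Gen3 and Gen4 both use 128b/130b encoding, so effective bandwidth per lane is the transfer rate times 128/130, divided by 8 bits per byte. A small sketch of that calculation:

```python
def pcie_bandwidth_mbps(gigatransfers: float, lanes: int) -> float:
    """Effective one-direction PCIe bandwidth in MB/s.
    Gen3/Gen4 use 128b/130b encoding, so 128/130 of raw bits are payload."""
    return gigatransfers * 1e9 * (128 / 130) / 8 * lanes / 1e6

gen3_x4 = pcie_bandwidth_mbps(8.0, 4)    # ~3938 MB/s, the "4,000" Gen3 figure
gen4_x4 = pcie_bandwidth_mbps(16.0, 4)   # ~7877 MB/s, the "almost 8,000" Gen4 figure
```

Real drives land slightly under these ceilings once protocol overhead (TLP headers, flow control) is accounted for, which is why "almost 8,000" is the honest phrasing.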
Yeah, I put together a large list of different workloads that are able to consume large amounts of data, whether it's in small or large chunks. As I said earlier, our ability to consume these compute and storage resources is amazing: you build it, and we'll use it. The world's data is expected to grow 61% to 175 zettabytes by the year 2025, according to IDC. That's just a lot of data to manage, and it's not just sitting around; to be useful, you have to actually be able to access it, and that's where we come in. So who is accessing it, and with what kinds of applications? I spend a lot of time trying to understand that, and recently I attended a virtual conference, SDC. What I like to do when I attend these conferences is try to figure out what the buzzwords are, what everybody's talking about, because every year it's a little different. This year it was edge, edge everything. So I put edge on there first. You can ask anybody what edge computing is and it's going to mean a lot of different things, but basically it's all the computing outside of the cloud that happens at the edge of the network. It tends to encompass a lot of real-time processing on instant data, and the data is usually coming from either users or different sensors. It's that last mile; it's where we put a lot of our content caching. And I uncovered some interesting stuff when I was attending this virtual conference: they say only about 25% of all the usable data will actually even reach a data center. The rest is ephemeral; it's consumed locally and in real time. So the goal of edge computing is to reduce bandwidth costs for these kinds of IoT devices that would otherwise send data over long distances.
But the reality is that the growth of real-time applications requiring this kind of local processing is going to drive the technology forward over the coming years. So Dave, your toaster and your dishwasher, they're IoT edge devices, probably within the next year if they're not already. So edge is a really big one; it consumes a lot of the data. Well, the buzzword du jour now is the metaverse. It's almost like the movie The Matrix is going to become real. But the fact is it's all this data, a lot of video. Some of the ones that I would call out here: you mentioned facial recognition, real-time analytics. A lot of the edge is going to be real-time inferencing, applying AI, and these are just massive data sets that you, and of course your customers, are enabling. When we first came out with our very first Gen3 product, our marketing team actually asked me, hey, how can we show users how they can consume this? So I decided I was going to learn how to do it, and I set up this massive Hadoop environment. At the time they called big data the three Vs. I don't know if you remember the big three Vs: volume, velocity, and variety. Well, Dave, did you know there are now 10 Vs? Besides those three, we've got veracity, value, variability, validity, vulnerability, volatility, and visualization. So I'm thinking we just need to add another V to that. Yeah, well, that's interesting. You mentioned that, and that sort of came out of the big data world, the Hadoop world, which was very centralized. You're seeing the cloud expanding, and data is by its very nature decentralized, so you've got to have the ability to do analysis in place. A lot of the edge analytics are going to be done in real time. Sure, some of it's going to go back to the cloud for detailed modeling, but the next decade, Kim, ain't going to be like the last, I often say.
I'll give you the last word. How do you see this evolving? Who's going to be adopting this stuff? Give us a sort of timeframe for this kind of rollout in your world. Oh, in terms of the timeframe, really nobody knows, but we've got Gen5 coming out next year. It may not be a full rollout, but we're going to start seeing Gen5 devices and Gen5 infrastructures being built out over the next year, followed very quickly by Gen6. And what we're seeing is these graphics processors, these GPUs, coming out that connect using PCIe interfaces as well. So being able to access lots and lots of data locally is going to be a really big deal, because worldwide, all of our companies are using business analytics. Data is money. The companies that can improve their operational efficiency, bolster sales, and increase customer satisfaction are the ones that are going to go on to win, and those are the companies that are going to be able to effectively store, retrieve, and analyze all the data they're collecting over the years. And that requires an abundance of data. Data is money. And again, it's interesting, it kind of all goes back to when Steve Jobs decided to put flash inside an iPhone and the industry exploded: consumer economics kicked in, and now 5G, edge, AI, a lot of the things you talked about, GPUs, the neural processing unit, it's all going to come together in this decade. Very exciting. Kim, thanks so much for sharing this data and your perspectives. I'd love to have you back when you've got some new perspectives, new benchmark data. Let's do that, okay? I look forward to it, thanks so much. You're very welcome. And thank you for watching this CUBE Conversation. This is Dave Vellante, and we'll see you next time.