Good morning, hardware nerds, and welcome back to Denver, Colorado. We're here at Supercomputing 2023. My name is Savannah Peterson, joined by my co-host, David Nicholson. Good morning. You're looking sharp today. Good morning, oh thank you. Brought out the houndstooth. Thank you, yeah, yeah. I don't know how it reads on the cameras, but I don't care. You're bringing a rebellious attitude today. I'm pumped for our conversation. We have some CUBE veterans with us and some fabulous smiling faces. Kimberly and Steen, thank you both so much for being here. How's the show going? So far, so good. Really great. It's huge. My watch is so proud of me for getting so many steps. We're all so proud of you, Kimberly. Great job. Way to prioritize your health. What about you, Steen? What's it like being here? Well, I'm just happy. Like you said, these hardware nerds finally have a software workload that takes full advantage of their hardware, right? So true. So, you know, the hardware teams get to step up, you know. Exactly. They're working very hard right now. It's a great moment.

No, I feel like hardware is having its moment in the sun in a really great way. Kimberly, when we were chatting before the cameras were on, I loved your insights, and so I want to dive in here. How do you define high performance computing? We talk about it all the time on the show. The theme here is "I am HPC," obviously, but we don't actually define it. So how do you define it? High performance computing is being able to come up with the results that you need using the software and the hardware that you have available, really. So to me, it's all about driving an answer. It all comes down to: are you able to develop the technology and results that you need today? To me, that's kind of what it is. You know, I don't really define it by bits of hardware and bits of software, but by the end results. You mentioned that you'd had a couple of conversations on the floor with a slightly varying scope of what that actually means right now, which I thought was fascinating.

Last time we chatted was at Supercomputing in Dallas, at least you and I. In the last four months, your job has shifted scope, not surprisingly, to AI. Tell us a little bit about what you're working on. So we actually have been working on AI/ML. I know that's not surprising, but I've spent the last 20-plus years doing storage performance, so this has been a really big shift in responsibility, and I'm trying to come up to speed now. I've always really enjoyed the AI/ML world and I've always tried to dabble in it a little bit, but now I actually have the resources available to me. Just like Steen was saying there. Yeah, yeah, that's great. Keep going, I don't want to interrupt. So anyhow, as you know, I work for Broadcom, and we kind of own a lot of the connectivity within the server. That's a big part of what we provide. We provide the network, all the NIC cards, and we have storage cards, and we also have PCIe switches, which is a big thing right now to allow you to kind of connect all this stuff. So my job is to actually look at what the workloads look like. So we actually have Llama models, and we're not training them yet, but we are running inference on very large Llama models, the 70B ones. We're working a lot with MLPerf, trying to drive these workloads, and then we have the tools to analyze what that workload looks like from the switch and from the network and from the storage.
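For readers who want a concrete picture, here is a minimal sketch of the kind of large-model inference run Kimberly describes, assuming a Hugging Face Transformers stack; the Llama 2 70B checkpoint name and settings are illustrative, not Broadcom's actual test harness:

```python
# Sketch: multi-GPU inference on a large Llama model. Assumes the
# transformers and accelerate packages plus access to the gated
# meta-llama weights; at 70B parameters the model must be sharded
# across several GPUs, which device_map="auto" handles.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-70b-chat-hf"  # illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.float16,  # half precision to fit in GPU memory
    device_map="auto",          # shard layers across all visible GPUs
)

prompt = "What is high performance computing?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Once a workload like this is running, tools on the NIC, switch, and storage side can observe the traffic it generates, which is the kind of analysis Kimberly's team is doing.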
So you use the term connectivity? Yeah. Just to be clear, for folks who maybe don't understand how supercomputing, high performance computing, works: typically, when you're deploying at any scale at all, you don't just have one box, right? I know. You have many of them, and they have to be connected to one another, and sometimes we get caught up in the hype around, well, what GPU, CPU, TPU, do we have to have the latest and greatest hardware in this box? Well, hold on a second. If those boxes can't talk to one another, fundamentally, you're going nowhere. Is that fair? That's absolutely fair, because a lot of people are deploying these in large clusters, and they have all these resources across the cluster. These things, they need to talk to each other. When you say large clusters, like three, four? No, I mean, we're actually talking to a company right now that is building a 500,000-node cluster. And they go much bigger than that. So, you know, but to me, just getting two nodes working within a cluster was a little bit of a challenge, but now we actually have much larger clusters going within our own labs here at Broadcom.

So, on that point, I want to transition to Steen for a second here, with Scalers AI. So, 500,000 nodes in a cluster: would your counsel be that the best way to deploy that is to find the absolutely most expensive hardware solution possible? Or are there a variety of ways, in your experience, that these things can be deployed? But first, tell us a little about Scalers AI. Yeah, well, now you're layering them up on the question, David. That's a nested question. Question 3A. Yeah, yeah, yeah. First, you made me salivate, you know, over that massive cluster.

So, Scalers AI is an enterprise AI software company, and we're really focused on, you know, making custom large language models for enterprises so they can transform their business. And we're really focused on industries like insurance, where they have, like, regulatory compliance and reporting compliance, and we can really take the AI up to that industry-specific expertise and help them not only, you know, solve a bottom-line productivity problem, but we're focused on top line, you know, driving top-line results for companies in those industries. And of course, you know, we don't do just language stuff, so we like to have a little fun and go, like, multimodal with our work. And so we also work with voice and vision models as well. And it's just, I mean, it's an incredible time to be in enterprise AI, because we can help, you know, enterprise decision makers transform their business. And, you know, the fear I would have if I was running a company, you know, in a lot of these legacy spaces right now, or just great businesses, is, you know, is my competition going to step up and take advantage of this incredible new infrastructure and capability in the market? Or am I going to step up and take advantage? And so we want to help those thought leaders in the industry step up and deliver outcomes for their business. And that's what we're focused on.

And part of the way we do that, you know, is we work with incredible companies like Broadcom and Dell PowerEdge, and we beg for massive clusters. But, you know, you don't want to just sit and wait for expensive hardware. You know, we have so much great capability today, and we all want the XE9680 from Dell, you know, with a bunch of Broadcom Ethernet in it, you know, ready to go. But we can't always get that all the time.
We want to make use of all of our infrastructure. And so, you know, we just talked about running inference, you know, on Broadcom Ethernet. What we've done is we've actually done fine-tuning of a model. So, particularly, what we've done is we've taken a medical dataset, this PubMed dataset, an open public dataset with tens of millions of clinical data points, and we trained that on top of a Llama 2 model. And we did that in a distributed cluster, you know, leveraging the networking capability there, and heterogeneous PowerEdge servers as well. So, we had a little H100, we had a little A100, we had an XE9680, we had a legacy, you know, XE8545, we had an R760xa, and then we had a bunch of Broadcom Ethernet to bring it all together. And then we distribute the model weights throughout that cluster, and, you know, they communicate over the Ethernet. You'll see the Ethernet spiking as you communicate those weights and those gradient updates for those weights as you fine-tune the model. And then you can take a model like Llama 2, which is fantastic, and turn it into a medical expert as well, and help transform that specific business. And so, yeah, just a fantastic opportunity in the market right now. Super exciting. Yeah.

I've got a follow-up question for you, using that example of the medical data, which I think there are going to be so many cool possibilities with. How long does it take you to do that? I mean, you know. To fine-tune that model with that customer? Yeah, I mean, to fine-tune that model, you can do it in a matter of days. Now, you want to, you know, really think about the accuracy and the data curation process, and, I mean, you know, setting up the cluster itself. To be clear, we want a model that doesn't kill patients. Does that help? Yeah, yeah. And I think what's most important, when you talk about the medical field, is we always talk about human in the loop and medical professional in the loop. And what we're trying to do is improve patient outcomes by keeping the medical professional in the loop. And so, you know, these are just tools to advance their services and capability. But I will tell you that the honest truth is, setting up that cluster takes more time than training that model. Because, you know, there's incredible software and tools and hardware, but it does take some expertise to go run your own infrastructure. And, you know, what's cool about the work we've done with Broadcom and the Dell teams is we're actually making that sample code and that step-by-step guide available to Dell and Broadcom partners. So they don't have to go through the pain and suffering we did. They can just focus on taking that industry-specific dataset and building a custom model and transforming their business. Steen, we appreciate you taking one for the team. Yeah. For all of us.
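As a rough illustration of the distributed fine-tuning pattern Steen describes, here is a minimal data-parallel sketch in PyTorch, where each worker holds a model replica and gradient updates travel over the cluster's network fabric. The checkpoint name is a placeholder and the data loader is a synthetic stand-in, not Scalers AI's actual pipeline:

```python
# Sketch: data-parallel fine-tuning across a cluster. Every worker keeps
# a full copy of the model; after each backward pass the gradients are
# all-reduced over the network (the Ethernet "spikes" described above).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from transformers import AutoModelForCausalLM

def get_batches(steps=10, vocab=32000, seq_len=512):
    # Stand-in for a real tokenized, rank-sharded PubMed-style corpus.
    for _ in range(steps):
        ids = torch.randint(0, vocab, (1, seq_len)).cuda()
        yield {"input_ids": ids, "attention_mask": torch.ones_like(ids)}

def main():
    # A launcher such as torchrun (or a Slurm/Kubernetes wrapper around it)
    # sets RANK, LOCAL_RANK, and WORLD_SIZE for every worker.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder 7B checkpoint; a real run would load the Llama 2 weights
    # actually being fine-tuned, likely with memory-saving techniques.
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf").cuda()
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    for batch in get_batches():
        # The backward pass all-reduces gradients across every node; this
        # is the traffic that stresses the Ethernet fabric.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```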
Kimberly, you've got to have one of the coolest jobs, because you get to see across the sector what the biggest and most exciting players in this game are doing. What has you most excited, not just as a leader at Broadcom, but as a human? What application in this space is making you smile the most? Yeah, because one of the... So, he trained his large language model, you know, how to do something very useful. You know what we trained ours to do? Talk like Wile E. Coyote. Who's to say that's not useful? There's value in that somewhere. So ours answers every question as though it were Wile E. Coyote answering it. That is so not the answer that I was expecting, and I am glad I have... Oh my God! Exactly, right?

So what has me most excited about this? Well, you know, I've come from the storage world, and I've been doing storage performance for so long that moving into something new has just been so exciting for me, just being able to see the whole thing instead of my own little storage part of the world. But in terms of what has me most excited: I did not realize until I stepped into this just how amazing these GPUs are. Just the kind of work that they can do, the kind of problems that they can solve. I find myself actually dreaming about all the things that I can ask this GPU to do, all the different kinds of models and the ways that you can train them. And I'm really excited about the future. I think that we're going to be able to solve so many problems, but you have to have the hardware there. And so, from my perspective, we are working on accelerating even Gen 6 PCIe and accelerating up to Gen 7 PCIe; we want to make sure that we have those pipes and that bandwidth available. Not only that, but our storage products are amazing. The performance on these, so these data lakes that are taking in all this data on ingest are doing it at a phenomenal rate. And so for me, I love just making sure that everything is working, all the pipes are there, the performance is there, to enable this kind of stuff for the future.

Let's double-click on that. What is the state of the art now? When you talk about connectivity, what are, I mean, feel free to talk in speeds and feeds, it gets me going. But what is the state of the art in terms of connectivity and in terms of PCIe? What are we looking at now? Well, as you probably know, last year Gen 5 PCIe came out in full force, but we're also looking here in the very near future at Gen 6 PCIe speeds. So going from 32 to 64 gigatransfers per second is just incredible. And as I said, we are working on accelerating that to get us to Gen 7 speeds even. So the bandwidth that is going to be available to people for this connectivity, to connect all these GPUs, we're going to provide that. In terms of network, right now we're at 200-gig NICs. And he's been testing them as well, and he's able to push them; it's a 200-gig NIC, but I believe they're saturated at 100 gig right now. If we had a few more, we could max out the 200 gig. Yeah, yeah. Can you get more? Yeah, yeah.

But just to complement what you're saying there, though: what you want to do is, you don't want to be network- or storage-constrained, right? Because the whole industry is talking about the compute capacity, the GPUs that we all want and desire, but the biggest mistake you can make is to not set up your cluster with the right networking capacity and the right storage capacity, because then your high-value compute capacity is now bottlenecked. And so that's why you want to use the best, most available networking and storage when you build out these clusters; they're very important. And the thing that makes networking really important in doing distributed training or fine-tuning of large language models is that you've got to use some of the capabilities at Broadcom, like the Ethernet NICs that have GPUDirect technology. In that case, we're bypassing the CPU, we love CPUs, we run a lot of AI on CPUs, but we're just running right by them and going straight to the GPU so we don't have a bottleneck there as well.
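To put those PCIe numbers in perspective, a quick back-of-the-envelope sketch; the transfer rates are the commonly quoted per-generation figures, and real throughput lands somewhat lower once encoding and protocol overhead are counted:

```python
# Rough peak bandwidth of an x16 PCIe link by generation.
# Treats the quoted GT/s as one bit per transfer per lane and ignores
# encoding/protocol overhead, so these are ceilings, not measurements.
TRANSFER_RATES_GT_S = {"Gen4": 16, "Gen5": 32, "Gen6": 64, "Gen7": 128}
LANES = 16

for gen, gt_per_s in TRANSFER_RATES_GT_S.items():
    gb_per_s = gt_per_s * LANES / 8  # bits per second -> bytes per second
    print(f"{gen}: ~{gb_per_s:.0f} GB/s per direction on an x16 link")
```

By the same arithmetic, a 200-gig NIC moves roughly 25 GB/s, which is why the network has to keep pace with the PCIe fabric feeding the GPUs if the cluster is to stay unbottlenecked.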
So just eliminating all those bottlenecks is really important throughout the process, especially as you put together those high-value assets. So that process of eliminating bottlenecks is a bit of a game of whack-a-mole over our lives, right? I'm obviously much, much younger than you guys, but you know, over the five years that I've spent in this... You're spicy today. You're spicy. In this business. You've got the lighting right? He's got a full head of hair. I mean, you can go with it. Actually, I do not. No? Okay. But where are the bottlenecks now? Are the bottlenecks in the hardware at all? Or are the bottlenecks in terms of expertise and the ability to execute? I'm hearing people say that it's cleansing data, it's data hygiene that's the bottleneck, and then the fear that in place of data hygiene, synthetic data is generated, and that that leads to other problems. But I'm not trying to lead the witnesses here. Where would you say the bottlenecks are? What are the issues?

Okay, so I'd love to take a first run at this, but I know that he has a lot of experience with it too. One of the things that I have learned is that there are multiple stages to this, right? So you have to bring the data in, and that has its own bottleneck. You've got to make sure the storage is fast enough, the networking's fast enough, that you have enough capacity to handle all this. And then you've got all the data munging that you have to do. And that, for me, was the most painful part of the process. It took so much memory to do that. It took so much bandwidth from the storage to do that. So that's where the bottleneck was then, in the CPU. And then we moved over to doing the training, and at that point, the GPU became the bottleneck. And then when we moved to inference, it depended on how we ran it. If we ran it using a GPU, we were able to run that GPU, multiple GPUs, at 100%. And with the CPU, we were able to max that out as well. So really, the bottleneck changes depending on which part of the process you're really focused on at any given time.
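Here is a minimal sketch of how you might see that moving bottleneck in practice, timing how long each step waits on data versus how long the accelerator spends computing; the model and loader interfaces are illustrative (Hugging Face style), not Broadcom's actual tooling:

```python
# Sketch: crude pipeline profiling for one training epoch. If data_time
# dominates, the job is storage/network bound; if compute_time dominates,
# the GPU is the bottleneck.
import time
import torch

def profile_epoch(model, loader, optimizer):
    data_time = compute_time = 0.0
    end = time.perf_counter()
    for batch in loader:
        data_time += time.perf_counter() - end  # time spent waiting on ingest

        start = time.perf_counter()
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        torch.cuda.synchronize()  # flush queued GPU work before timing
        compute_time += time.perf_counter() - start

        end = time.perf_counter()
    print(f"data wait: {data_time:.1f}s, GPU compute: {compute_time:.1f}s")
```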
I think some of the winners are going to be the ones who can anticipate what those bottlenecks are, like both of you are doing, and in your collaboration being able to tackle those challenges as they come through. Last question for you as we wrap up, since you're CUBE regulars: what do you hope that you can say next time you're sitting next to me and Captain Spicy that you can't currently say? Steen, let's start with you.

I think, just riffing off the bottleneck conversation, I mean, there's not a problem that the industry can't solve. I think the industry's really stepped up on compute, network, and storage capacity, and the software stack around AI is fantastic. And there are amazing contributions to open source. And there are real challenges around data curation and the data rights associated with that. But the area where I want to see change is that I think line-of-business decision makers are still hesitant to transform their business right now. And I think the amount of change and confusion in the market and the complexity of what's happening is slowing them down from making really good decisions about how they can transform their business and implement these technologies today, not tomorrow. And so I'd like to see a little bit more leadership from those line-of-business decision makers on implementing this technology. And of course, as an ecosystem and an industry, we've got to step up and provide them clarity, with stable tools and support that give them the confidence to run their critical applications with language models, generative AI, and other complementary AI tools.

Well, Steen, we look forward to hearing about it. Kimberly, what about you? So right now, as I'm still kind of in the learning process, it's the complexity of putting this together. As he mentioned earlier, he spent more time putting a cluster together than he did training a model. A very complex model. I think that's a remarkable point. I mean, that's how you know we're at a real inflection point. We really, really are. I'm going to go visit the Slurm booth today and see if I can't get a block of instruction from them. But he's been using Kubernetes and Slurm, so the orchestration management and the container management process has really been a big, steep learning curve for me. So I'd like to see maybe better documentation or more ease of use around trying to implement this, so it could be a lot easier. And then when it is easier, that's when you're going to see the technology adoption I think he's talking about, too. A lot of people are still kind of scared of it. You know, it's just a big unknown. We know it's powerful, but some people are just kind of afraid to pull back that curtain and look at the wizard.

Yeah, I have to say, what's powerful to me, knowing you, we've known each other for a while, Kimberly, knowing how much of a technical rock star you are, is seeing you come into something that is new for you, but with the depth of knowledge that you have, and hearing you be so honest about it. You've given an amazing assessment of the current landscape that I think people should pay attention to. I totally agree, great point. And you know, Kimberly, like you said, we're all still learning, just like the LLMs. Steen, thank you so much. Kimberly, always a pleasure to have you both on the show. Can't wait to have you back for more salivating over clusters and for teaching AI how to sound like Wile E. Coyote. And David, thank you for being your spicy self sitting next to me all morning. Can't wait to see what comes next here, live on theCUBE in Denver, Colorado, at Supercomputing 2023. My name is Savannah Peterson. Thank you for tuning in to the leading source for emerging tech news.