Welcome back to theCUBE's coverage of ISC High Performance 2023, where we're covering all things HPC. We've been reporting for years that AI and HPC are coming together in a big way, and we've seen that accelerate in 2023. Organizations are trying to figure out how to apply the potential of foundation models like GPT to make themselves more productive. The question is, how do they do it? How do they make it work for their business? In this segment, we're going to dig into the how-tos, detailing recent advancements from the industry, and specifically from Dell, on how to implement generative AI for your business. We're going to double-click on AI and maybe even touch on security. And with me is Ben Fauber, PhD, a senior AI research scientist at Dell Technologies. Ben, good to see you. Thanks for coming on. Great to see you too. Thank you for having me on. You're welcome. All right, first question. I had this conversation the other day with an AI expert. Is ChatGPT AI, in your view? And is it representative of what generative AI offers, or is there more to it? Yeah, I think that's a great question. I think it's representative of a very impressive tool in artificial intelligence right now. It falls into the subfield of generative AI. The broader field of AI does other things, but ChatGPT sits in the generative AI space, where the goal is to generate new content that's similar to the content the model was trained on. I think it's a very impressive tool, and it's specific right now to the language space, right? Text, human prose, the things we typically think of as language fall in that space, although there are expanded capabilities in new iterations where it can ingest images and tell you a little bit about what's in an image. But there's another side of the generative AI story out there that has to do with image generation.
Maybe you've seen these tools where you put in a few lines of text and you can create hundreds, if not thousands, of images that are all different from each other, based on just those few lines of text. There's also the voice synthesis side of things, where you can take a recording of someone's voice and, with just a few seconds of another person's voice, impart that style onto the longer recording of the first person. And in some cases, you can just type in text and it will generate that speech for you, the so-called text-to-speech space. So there are a lot of other sub-facets that maybe the public hasn't interacted with as much, but they're just as impressive, just in different domains. Right, and it didn't happen overnight. There have been folks like yourself working on this for years and years. This individual I talked to is very deep into AI, deep into a lot of government projects, so a super smart guy, but he was kind of poo-pooing ChatGPT and really tried to simplify it. Far be it from me to debate a brain like that, but I'd love to get your opinion. He said, Dave, look at this, you're talking about a database with a search engine on top and some NLP. And I was like, come on, it has to be more than that. Now, can you explain how it actually works? Yeah, sure. So there is no database, there is no search engine. Everything ChatGPT generates is generated from the information it was trained on. It was fed a bunch of information, and a way to think of it is like a giant matrix of information, lots of weights and numbers in there, and those numbers mean something to the deep learning network that sits beneath ChatGPT. As it's trained, those numbers are adjusted to better reflect the information it was provided. And the goal of all these large language models, whether it be ChatGPT or other things, is to generate text.
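That weights-that-generate-text mechanic can be sketched as a toy program. This is a hedged illustration only: the probability table below is hand-written and stands in for the billions of learned weights in a real model, and greedy pick-the-top-word decoding is a simplification of how real systems sample.

```python
# A toy stand-in for an LLM's learned weights: hand-written probabilities
# for which word tends to follow which. A real model learns billions of
# such numbers from its training data; these are made up for illustration.
NEXT_WORD_PROBS = {
    "the": {"cat": 0.5, "model": 0.3, "dog": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "sat": {"down": 0.7, "quietly": 0.3},
}

def most_likely_next(word):
    """Return the highest-probability continuation (greedy decoding)."""
    options = NEXT_WORD_PROBS.get(word)
    return max(options, key=options.get) if options else None

def generate(prompt_word, max_words=6):
    """Repeatedly append the most likely next word, one word at a time."""
    words = [prompt_word]
    while len(words) < max_words:
        nxt = most_likely_next(words[-1])
        if nxt is None:  # no known continuation; stop generating
            break
        words.append(nxt)
    return " ".join(words)

print(generate("the"))  # -> "the cat sat down"
```

A real LLM chooses among tens of thousands of tokens using a neural network rather than a lookup table, but the word-at-a-time generation loop is the same basic idea.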
So it's a pretty simple thing: I'm speaking right now, and as I speak, you can probably guess what the next word I'm going to say is, because you've heard people say things similar to me. It's no different than when you're texting on your phone and your phone is trying to guess the next word you're going to type. That's actually a large language model sitting beneath your text application. And that's how things like ChatGPT work: they're just producing what they think is the highest-probability next word. But because they've been trained on so much information, they can create an entire paragraph, or in some cases entire pages, of information; that's how powerful they've become. So people have this perception, because what they generate looks plausible or human-like, that they're attached to the internet or to some database, but it's actually just a standalone model. The only thing about them that's connected to the internet is the user interface you're interacting with. So I wonder, what's impressive about it is the speed with which it returns answers and text, and the high quality of it; it's like Ivy League-level prose. So my question to you, Ben, is, what's the infrastructure behind that that enables this? You know, if you talk to anybody who's trying to buy GPUs, graphics processing units, right now, they will tell you about the challenges they're facing, and it's because things like ChatGPT require hundreds if not thousands of GPUs behind the scenes to facilitate that type of instantaneous response across multiple users, right? At one point I think it was said that over 100,000 people had used it within the first couple of days, and then within a couple of weeks it was a million users, and I don't know how many people are using it simultaneously.
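The serving math behind "hundreds if not thousands of GPUs" can be sketched with rough arithmetic. All three numbers below are illustrative assumptions, not published figures for any real deployment.

```python
# Back-of-envelope serving math: how GPU demand grows with users.
# Every number here is an illustrative assumption.
gpus_per_instance = 16      # assumed GPUs needed to hold one model copy
users_per_instance = 8      # assumed concurrent requests one copy can serve
concurrent_users = 1_000    # hypothetical simultaneous users

# Ceiling division: a partially used instance still needs a full set of GPUs.
instances = -(-concurrent_users // users_per_instance)
total_gpus = instances * gpus_per_instance

print(f"{instances} model instances -> {total_gpus} GPUs")  # 125 instances -> 2000 GPUs
```

Even with these modest assumptions, a thousand simultaneous users lands in the thousands-of-GPUs range, which is why low-latency responses at ChatGPT's scale imply HPC-class infrastructure.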
But you can imagine, if running these models requires, let's say, 16 GPUs per model, and you're serving multiple users, it just scales to a tremendous amount of compute. So it's really high performance computing infrastructure just to run these models. We're not talking about training. This is something that's totally different about generative AI versus what folks typically think of as AI or machine learning, where you use lots of compute power to train whatever model you're building, but then at inference time you can run it on something sometimes as lightweight as your phone. In this case, you need lots of compute infrastructure just to run the model. For example, the language model that sits beneath ChatGPT, or that sat beneath it, I should say, since they have a new model now: the one that sat beneath it back in November, just the model itself, was about half a terabyte in size. So to run that in memory across multiple GPUs requires lots of high performance compute infrastructure, and that's just for one instance of the model. And again, if you're serving multiple users, it scales to hundreds if not thousands of GPUs running simultaneously in order to serve the users with low latency. So I want to get into what Dell's doing here, but first I have to pick your brain. Are you familiar with ELIZA? They called it a chatterbot. It was developed in the '60s. You've heard of this? I'm not familiar with that ELIZA. I come from a scientific background, so I'm familiar with Western blots and ELISA on the lab side. But no, I don't know about ELIZA. So for our audience, and anybody can look it up on Wikipedia: they called it a chatterbot, and it was an NLP computer program developed in the mid-1960s. I think it ran on a mainframe or something.
It was developed by a guy named Joseph Weizenbaum to explore communication between humans and machines, and it was basically a pattern matching system. And people, when they used it, thought the machine had emotion. We know these machines don't have emotion. I'm familiar with this story, yes. Yes. So, I've had people tell me that ChatGPT is just pattern matching, Dave, don't think of it as more than that. But I'm inferring from you it's more than that. No, I mean, it's not pattern matching. It's asking: what is the most likely next word, or what we in the language community call a token, which is usually a fragment of a word? What's the most likely thing to come next, given what's been said previously in this statement? It's just guessing, with high probability, the next most likely thing to come. And if it's been trained on lots of high quality data, it will produce some very high quality output. And it's quite convincing. Sometimes you think it's real. You think it has emotion, but all it's trying to do is generate, with high probability, the next most likely word or token. I say please and thank you to ChatGPT. That's because, hey, you never know; when the machines take over, maybe they'll remember me. Okay, so. There are some people who are very worried about that. Yes. What are some practical examples of how businesses can use generative AI to improve the business, and even make or save money, specifically? I mean, VCs are all thrilled now that valuations are down and people in tech are getting laid off, and they can start a company and get to MVP in two months with $50K. How real is that? And what can you actually do today with generative AI? Yeah, so, you know, I think I'll start on the image side and then we can move to the language side. I think one of the neat things on the image generation side is that there are a lot of quick-start capabilities to these tools, right?
So with a few lines of text, you can generate some speech, you can generate a song, you can generate images and things like that, if you're in a creative space where there's low risk associated with things being a little off or not entirely factually correct; it's just to start the creative juices flowing. These tools are very much in use right now by creatives across industries, in the marketing space as well. Real estate, architects. I mean, I saw something from one of the leading architecture firms in the world estimating that 80 to 90% of their staff is using image generation tools on a daily basis. And it's not to generate the final product that they show the customer, to provide that walkthrough experience before a building or a home is built; it's more to get the process kicked off, right? To help their teams bounce some ideas around. And I think that's the reduced barrier to entry in that space, with image generation as well as audio generation. I think on the language side, there are a lot of different capabilities. And one thing to be clear about is that most of us don't have businesses that trade in images or audio or video every day. A lot of our business tends to reside in text, and that can be verbal communication, written communication, things that companies put out, as well as numbers, right? So if you step back and ask what the market landscape looks like for most businesses, most businesses have numerical data and text data, but there are fewer businesses that really trade every day in image data. So the impact, I think, on the language side is much bigger. And there are lots of capabilities that these language models have. What I will caution people about is using language models to generate factual information.
I think it's been well described in the press, as well as the academic literature, that these models tend to hallucinate, right? In the spirit of trying to generate the next most probable word in a sequence, it's just trying to do that. It's not necessarily trying to tell a lie or tell the truth; it's just trying to go toward that objective, and sometimes in doing so, it generates false information. It can be very convincingly written, very elegant in the way it's described to you, and if you're not an expert, you may not know that it's lying to you. It certainly doesn't know itself that it's lying to you. But that is a risk that's important to consider. So instead of relying on them for information, I think using them for their capabilities, and we can go one by one through the capabilities, is where the opportunities reside. I think that's really where the power sits, and that's what a lot of companies that have started to deploy them today are using them for. Yeah, so true about the hallucinating. I was asking ChatGPT about myself one day, and it said I was at one time a writer for the Wall Street Journal, which of course I never was, and I think it said I was the founder of Computerworld, which was not true, but it was fun to see that. But you've got to be careful. Yes. So how impactful do you think this is? I mean, the world's gone crazy over the past many months. You, as a researcher in this space, have known the capabilities for quite some time, but do you feel like we're at an inflection point in terms of productivity, new ways to work, new opportunities, new companies that can be started? Are we living that now? I mean, I wish I were 25 again. I think it's an exciting time.
You know, what's neat about what is happening today, versus maybe a year ago, is that my parents and my grandparents are aware of what it is that I do every day, because they've had an opportunity to interact with these tools via free, easy-to-use web interfaces. And these capabilities are new, right? In the past, to interact with these language models you had to code it up from scratch and you had to have your own infrastructure to run it; there were a lot of barriers to entry. And I think as of November of 2022, that barrier was reduced by making ChatGPT free and placing it on an easy-to-use, easy-to-interact-with website. It really exposed where we are in the AI space, specifically the generative AI space, to the broader public, and I think it's opened a lot of people's imaginations as to how they can use these tools and where the opportunities reside. So, you know, I think it comes back to this notion that amazing advances are made not through people sitting back and thinking about what could be, but through tinkering. And it's allowed people to tinker and play with it and really push the boundaries of what is possible, where its limits sit, and maybe how to improve upon that. There are companies being started around these ideas and folks publishing things every day about what could be and what is. So it's a really exciting time. I agree. I'm curious, what's Dell doing in this space? How are you helping your customers implement foundation models, large language models? Where's the juice? Yeah, you know, these topics have been a hot conversation with every customer, regardless of industry or vertical. We've had folks running the gamut from finance to K-through-12 education asking how they can make use of tools like this, how they can implement their own custom language models within their organizations. And the motivations are varied.
A lot of them are really interested in the capabilities that these tools offer. I think the text summarization tools and capabilities are really powerful, right? So if you think about healthcare, doctors have to look at lots of text and notes and try to summarize: why was this patient here six months ago? Why are they here today? What is the status of this patient? And the text capabilities of these language models can really help them. Now, one of the challenges if you are, say, in healthcare is that you can't just feed your information to an open API like OpenAI's ChatGPT portal, because you're sending that information outside of your organization, and there are certain regulatory restrictions around that. So those types of customers are very interested in building their own tools. We're helping customers create those capabilities via our hardware as well as our services side of the house, because we also recognize that not every customer has a dedicated team with experience in this space that can just grab the hardware and run with it. They're going to need some assistance along the way, building and standing up their own instance within their own infrastructure. But what that does provide them with is security, privacy, control and availability. They own the entire pipeline, and they don't have any of the security and privacy concerns they might have going through a third party. Well, we've seen in the news that some companies have banned the use of ChatGPT because employees mistakenly put a bunch of code in there that was proprietary IP. Many people hadn't understood that, but you've got to be super careful. We talked a little bit earlier about GPUs; what's the relationship between generative AI and HPC? Does it require a supercomputing system to run? What's the relationship there, Ben? Yeah, so in my opinion, it really does.
If you want a low-latency, responsive system with multiple users, you really do need an HPC-like infrastructure. I gave the example of the model that sits beneath the first instance of ChatGPT. That model was public in the sense that it was written about; it wasn't open source, but the information about it was published. The newest version that sits beneath ChatGPT is less disclosed, because it is a hot commercial space, so there's less known about it. But what I can say about the model that was released back in November of 2022 is that just the model itself requires around half a terabyte of memory to run. So you start thinking about how to do that across multiple GPUs, across multiple nodes in a system, because even if you have eight GPUs, that's not enough memory, so then you have to have multiple nodes. And most GPU nodes are two to four cards per node, in some cases six to eight. But you can't host one of these things on a single server under standard conditions. There are some tricks you can play, but to get that sort of responsiveness, it really does require HPC infrastructure, with the low latency and rapid communication between the various nodes of your system. So it's a very interesting time, where all of a sudden everybody's clamoring to get a hold of GPUs and low-latency networking. We've had all sorts of customer conversations around this, so it's exciting. Yeah, I mean, wow, you would think this is a big tailwind for infrastructure. I mean, think about when the internet started to take off: they needed routers and servers. Then with Web 2.0 and social media, Facebook built big data centers, and they were sucking up a bunch of infrastructure. Then the big data era, which is still going on, I guess. And now the cloud, right? It sucked up a lot of infrastructure. Infrastructure just never goes away, does it?
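Ben's sizing argument works out roughly as follows. This is a back-of-envelope sketch only: the 80 GB per card and the 2x runtime overhead (activations, KV cache, and so on) are assumptions for illustration; only the half-terabyte figure comes from the conversation.

```python
import math

# Rough memory sizing for a ~half-terabyte model across GPU nodes.
# The overhead multiplier and per-card memory are illustrative assumptions.
model_bytes = 0.5e12      # ~half a terabyte of weights, per the conversation
overhead = 2.0            # assumed multiplier for activations, KV cache, etc.
gpu_memory_bytes = 80e9   # assumed 80 GB per card (A100/H100-class)
cards_per_node = 4        # "most GPU nodes are two to four cards per node"

gpus_needed = math.ceil(model_bytes * overhead / gpu_memory_bytes)
nodes_needed = math.ceil(gpus_needed / cards_per_node)

print(gpus_needed, nodes_needed)  # 13 GPUs spread across 4 nodes, for one copy
```

Under these assumptions, a single eight-GPU server indeed falls short of holding one model instance, which is why the multi-node, low-latency interconnect story matters.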
I want to ask you... It does not. It's changing a little bit, though. How so? Add a little color there. Yeah, so I think one of the classic things that was sold in the HPC space was CPUs, right? And that's because the jobs that were running required CPUs. With generative AI, a lot of these methods and algorithms are built on the premise of GPUs. Some people may have a handful of GPUs in their infrastructure, some people may have none at all. And it went from, we've got a few researchers here and there making use of GPUs to train some models, to now you have to use GPUs, and many of them, just for inference of these large language models, just running them. It's really changed the game. So the infrastructure is still required; it's just a different type of infrastructure, and the GPU memory-intensive workloads are new to a lot of folks in the IT space. Yeah, and we always think about the processor as the brains of the system. Do you buy the premise that increasingly the surrounding components are critical to not only system balance but system performance? Whether it's I/O or the network interface, Ethernet types of connections, that it increasingly becomes more of a connectivity-centric environment versus a processor-centric environment. Does that premise hold true for you? I think all of it starts to matter, right? I mean, the system's only as good as its weakest link. So I do think that what was maybe classically not that dependent upon connectivity starts to become more dependent upon connectivity, because these various components do need to talk to each other. Now, as with anything, there are diminishing returns in certain spaces. And the main key with a lot of these is that they're memory-intensive workloads.
So if you don't have available memory, and you don't have the overall orchestration to make optimal use of the memory space you have, then it's very difficult to run some of these workloads, some of these very large language models. But you're right in that networking speeds, and these are the networking speeds between the various nodes of the HPC system, start to come into play. And that's why we see things like NVLink being very important for communication between the multiple GPU cards within a node. At Dell, we've just released our PowerEdge XE9680 server platform, which has eight GPU cards in it. They can be either NVIDIA A100 or NVIDIA H100 cards, those are NVLink-connected, and the performance is outstanding. Just yesterday, one of my benchmarking papers came out on that server, and the results are amazing. We've got some additional results coming out where we're seeing a 10-fold improvement over some benchmarks that were run just a few months ago on a cloud system that used A100 cards. And it's very interesting to see the incredible performance improvement just by changing not only the card but also the fabric within the node, how the cards communicate with one another. How about security? We were at RSA last month, and generative AI was of course the big topic of conversation. It's a two-edged sword, right? On the one hand, hackers are going to use it to write better phishing emails. Oftentimes these phishing emails are replete with spelling errors, et cetera, and they can use it to clean that up. But the flip side is that defenders can use it too, to prioritize. What are your thoughts on that? I mean, quantum is another sort of security concern, but also a potential defense against hackers. What are your thoughts on security and AI? Yeah, I think security is important. It's an important consideration.
I mean, you raise a very good point, in that for every set of good people out there, there's somebody out there thinking about how to use this to defraud someone or take advantage of a situation. I find it very unfortunate that that happens. But as with anything, I think we need to keep our eyes and ears open. I joke with folks that if somebody ran up to you on the street and said, I'm an African queen or king and I've just been kicked out of my kingdom, and I need your bank account number and $5 million, and if you just give it to me now, I promise when I get back to my kingdom I will wire you $20 million, most people on the street would say, this is preposterous, you're crazy, no way. But for some reason, 10 or 15 years ago, when that came via email, folks said, oh, okay, that makes sense, and people were giving away their bank information over email. That same sort of sniff test needs to be foremost in people's minds when they start to interact with information, especially information that comes from people they don't know well, or from news organizations they maybe haven't heard of or have less trust in. So I think that level of skepticism is always important, regardless of what medium we're using to interact with each other. I was at a conference about a year ago, and there was a business professor, new to the faculty there, whose area of focus was fake news. And she said one of the things she's learned in researching this is that the surest source of information is the printed newspaper. It's kind of interesting to think that we've gone so digital with everything, yet the printed press still has the most human touch to it, right? More people see it, interact with it, and fix it before it goes out than anything digital, even if it is from a trusted news source, right?
And there are all sorts of things that go on with online news sources, or excuse me, web-based news sources, right? Because they're paid per click, versus what shows up on your doorstep if you're paying for the printed paper. So it is interesting to think about the printed paper still being a more singular source of truth than even the website for the same newspaper. Fascinating conversation, Ben. We'll leave it there. I really appreciate you coming on today. Yeah, it's great. Thank you for having me. Really appreciate it. You bet. All right. And thank you for watching our continuing coverage of ISC 23 and the innovations in high performance computing. You're watching theCUBE, your leader in enterprise and emerging tech coverage.