We are back, approaching the finish line here at Supercomputing 2022, our last interview of the day, our last interview of the show. And I have to say, Dave Nicholson, my co-host (my name is Paul Gillin), I've been attending trade shows for 40 years, Dave, and I've never been to one like this. The type of people who are here, the type of problems they're solving, what they talk about. Trade shows are typically so speeds-and-feeds, so financial, so ROI-driven that they all sound the same after a while. This is truly a different event. Did you get that sense?

100%. Now, I've been attending trade shows for 10 years, since I was 19, in other words, so I don't necessarily have your depth. No, but seriously, Paul, this is totally, completely different from any other conference. First of all, there's the absolute allure of looking at the latest and greatest, coolest stuff. I mean, when you have NASA lecturing on things, when you have Lawrence Livermore Labs, who we're going to be talking to here in a second, it's a completely different story. You have all of the academics, you have students who are in competition and also interviewing with organizations. It's just phenomenal. I've had chills a lot this week.

And I guess our last two guests sort of represent that cross-section: Armando Acosta, Director of HPC Solutions at Dell, and Matt Leininger, HPC Strategist at Lawrence Livermore National Laboratory. Now, there is perhaps, and you can correct me on this, no institution in the world that uses more computing cycles than Lawrence Livermore National Laboratory, and it's always on the leading edge of what's going on in supercomputing. And so we want to talk to both of you about that. Thank you for joining us today.

Thank you. Thank you for having us.

Let's start with you, Armando, on the HPC solution. Well, let's talk about the juxtaposition of the two of you.
I would not have thought of LLNL as being a Dell reference account in the past. Tell us about the background of your relationship and what you're providing to the laboratory.

Yeah, so we're really excited to be working with Lawrence Livermore, working with Matt. This process actually started about two years ago. We started looking at essentially what was coming down the pipeline. What were the customer requirements? What did we need in order to make Matt successful? And the beauty of this project is that we've been talking about this for two years, and now it's finally coming to fruition: we're actually delivering systems, delivering racks of systems. But what I really appreciate is Matt coming to us, the two of us working together for two years to really understand the requirements, the schedule, and what we needed to hit in order to make them successful.

At Lawrence Livermore, what drives your computing requirements? You're working on some very big, very complex problems. How do you decide what you need to procure to address them?

Well, that's a difficult challenge. Our mission is a national security mission: making sure we do our part to provide high-performance computing capabilities to the US Department of Energy's National Nuclear Security Administration. We do that through the Advanced Simulation and Computing Program. Its goal is to provide the computing power to make sure that the US nuclear stockpile is safe, secure, and effective. As for how we go about doing that, there's a lot of work involved. We have multiple platform lines that we accomplish that goal with. One of them is the advanced technology systems. Those are the ones you've heard about a lot; they're pushing towards exascale, with GPU technologies incorporated into them. We also have a second platform line called the commodity technology systems.
That's where we're partnering with Dell right now, on the latest generation of those. Those systems are a little more conservative. They're CPU-only right now, but they're also intended to be the everyday workhorses. Those are the first systems our users get on; it's very easy for them to get their applications up and running. They're the first things they use on a day-to-day basis. They run a lot of the small to medium-sized jobs you need in order to figure out how to most effectively use the even larger systems, and which workloads to move there to accomplish our mission goals. They're the workhorses.

What have you seen here these last few days of the show? What excites you? What are the most interesting things you've seen?

There are all kinds of things that are interesting. Probably the most interesting ones I can't talk about in public, unfortunately, because of NDA agreements. But it's always exciting to be here at Supercomputing. It's always exciting to see the products that we've been working with industry on, co-designing with them for several years before the public actually sees them. That's always an exciting part of the conference as well. Specifically with CTS-2, it's exciting. As was mentioned before, we've been working with Dell for nearly two years on this, but the systems first started being delivered this past August, so we're just taking the initial deliveries of those. We've deployed roughly 1,600 nodes now, but that'll ramp up to over 6,000 nodes over the next three or four months.

So how does this work intersect with Sandia and Los Alamos? Explain to us the relationship there.

Right, so those three laboratories are the laboratories under the National Nuclear Security Administration. We partnered together on CTS. You were asking how we define these things: it's the labs coming together, those three laboratories, to define what we need for that architecture.
We have a joint procurement that is run out of Livermore, but the systems are deployed at all three laboratories, and they serve the programs I mentioned at each laboratory as well.

I've worked in this space for a very long time, and I've worked with agencies where the closest I got to anything they were actually doing was the guest suite outside the secure area. And sometimes there are challenges when you're communicating. You have a partner like Dell who has all of these things to offer, all of these ideas. You have requirements, but maybe you can't share 100% of what you need to do. How do you navigate that? Who makes the decision about what can be revealed in these conversations? You talked about NDAs in terms of what's been shared with you; you may be limited in what you can share with vendors. Does that cause any inefficiency?

To some degree, but we do a good job within the NNSA of understanding what our applications need and then mapping that to technical requirements that we can talk about with vendors. We also have something in between; we've done this for many years, and a recent example is the Exascale Computing Project, which has created proxy apps, or mini apps, that are smaller versions of application areas that are important to us: hydrodynamics, materials science, things like that. We can collaborate with vendors on those proxy apps to co-design systems and tweak the architectures. In fact, we've done a little bit of that with CTS-2. Not as much in CTS-2 as in the ATS platforms, but that general idea of collaborating through proxy applications is something we've used across platforms.

Now, is Dell one of your co-design partners?

In CTS-2, absolutely, yep.

And what aspects of CTS-2 are you working on with Dell?

Well, the architecture itself was the first thing we worked with them on.
We had a procurement come out, and they bid an architecture on that. We had worked with them previously on our requirements, understanding what our requirements are. That architecture today is based on the fourth-generation Intel Xeon that you've heard a lot about at the conference; we were one of the first customers to get those systems in. All the systems are interconnected with the Cornelis Networks Omni-Path network, which we've used before and are very excited about as well. And we build up from there. The systems get integrated by the operations teams at the laboratory into our production computing environment. Dell is really responsible for designing these systems and delivering them to the laboratories; the laboratories then work with Dell. We have a software stack that we provide on top of that called TOSS, the Tri-Lab Operating System Stack. It's based on Red Hat Enterprise Linux. The goal is that it allows us to provide a common user environment, a common simulation environment, across not only CTS-2 but also older systems we have and even the larger systems we'll be deploying. So from a user perspective, they see a common interface, a common environment, across all the different platforms they use at Livermore and the other laboratories.

And Armando, what does Dell get out of the co-design arrangement with the lab?

Well, we get to make sure that they're successful. But the other big thing is that when people think about Dell and HPC, a lot of them don't make that connection. So what we're trying to do is make sure they know that, hey, whether you're a workgroup customer at the smallest end or a supercomputer customer at the highest end, Dell wants to make sure we have the right portfolio to match any needs across that range. And we were really excited about this; CTS-2 is kind of our big first thing we've done together.
And so, you know, hopefully this has been successful, we've made Matt happy, and we look forward to the future and what we can do with bigger and bigger things.

So would the labs be okay with Dell coming up with a marketing campaign that said something like, "We can't confirm that alien technology is being reverse engineered"?

Yeah, I don't know. I mean, that would be right, right?

And I have to ask you the question directly, and the way you can answer it is by smiling like you're thinking, "What a stupid question." Are you reverse engineering alien technology at the labs?

Yeah, you'd have to talk to the PR office.

Oh, okay. That's a good answer. But it is fascinating, because to a degree you can say, yeah, we're working together, but if you really want to dig into it, it's, well, I kind of can't tell you exactly how some of this stuff works. Do you consider anything that you do from a technology perspective, not what you're doing with it but the actual stack, to be proprietary? Do you try to design proprietary things into the stack, or do you say, no, we're going to go with standards, and what we do with it is proprietary and secret?

Yeah, it's more the latter.

The latter? So you're not going to try to reverse engineer the industry?

No, no. We want the solutions that we develop to enhance the industry, to be applicable to a broader market, so that we can gain from the volume of that market and the lower costs it would enable. If we go off and develop more and more customized solutions, that can be extraordinarily expensive. Sure. So we're really looking to leverage the wider market, but do what we can to influence it and develop key technologies that we and others need, technologies that can enable us in the high-performance computing space.

We were talking with Satish Iyer from Dell earlier about validated designs,
Dell's reference designs for pharma and for manufacturing in HPC. Armando, are you seeing HPC, traditionally more of an academic and research discipline, begin to come together with commercial applications? Are these two markets beginning to blend?

Yeah. What's happening is you have this convergence of HPC, AI, and data analytics. When you have that combination of those three workloads, they're applicable across many vertical markets, whether it's financial services, life sciences, or government and research. But what's interesting, and Matt won't brag about it, is that a lot of what happens in the DOE labs trickles down to the enterprise space, trickles down to the commercial space, because these guys know how to do it at scale, they know how to do it efficiently, and they know how to hit the mark. So a lot of customers say, hey, we want what CTS-2 does. And it's very interesting. What I love is their process, the way they do the RFP process. Matt talked about the benchmarks and helping us understand, here's the mark you have to hit. And at the same time, if we make them successful, then obviously it's better for all of us. I want a secure nuclear stockpile, and I hope everybody else does as well.

The software stack you mentioned, TOSS I think you called it, how did that come about? Why did you feel the need to develop your own software stack?

It originated back even 20 years ago, when we first started building Linux clusters, when that was a crazy idea. Livermore and other laboratories were the first to start doing that and then push them to larger and larger scales. And it was key to have Linux running on that at the time. And so we had the...

20 years ago, who knew you wanted to run Linux?

Yeah, yeah.
And we started doing that, but we needed a version of Linux that we could partner with someone on, someone who would do the support, just like you get from an OS vendor: security support and other things. Then on top of that, there's all the HPC stuff you need, whether to run the system, to set up the system, or to support our user base. That evolved into TOSS, which is the Tri-Lab Operating System Stack now. It's based on the latest version of Red Hat Enterprise Linux, as I mentioned before, with all that other HPC magic, so to speak. And all that HPC magic is open source. It may be things that we develop, but nothing is closed source. We run it across all these different environments, as I mentioned before. And it really originated back in the early days of Beowulf clusters, Linux clusters, as just something we could use to run on multiple systems and start creating that common environment at Livermore and then eventually at the other laboratories.

How is a company like Dell able to benefit from the open source work that's coming out of the labs?

Well, open source is good for everybody, right? Because if you make an open source tool available, people start using that tool, and if we can make that tool more robust and get more people using it, it gets more enterprise-ready. So with that, we're all about open source, we're all about standards, and really about raising all boats, because that's what open source is all about.

And with that, we are out of time. This is our 28th interview of SC22, and you're taking us out on a high note. Armando Acosta, Director of HPC Solutions at Dell, and Matt Leininger, HPC Strategist at Lawrence Livermore National Laboratory. Great discussion. I hope it was a good show for you; it was a fascinating show for us. Thanks for being with us today.

Thank you very much. Thank you for having us.
Dave, it's been a pleasure.

Absolutely. I hope we'll be back next year. It went by fast.

Absolutely, SC23. We hope you'll be back next year too. This is Paul Gillin; that's a wrap with Dave Nicholson for theCUBE. See you next time.