Hello, and welcome to theCUBE's coverage of ISC High Performance 2023. We're covering all things HPC, machine learning, AI, high performance analytics, quantum computing and more. And one of the most important topics in the HPC community is next-generation cooling and energy efficiency. We're joined here by David Hardy, PowerEdge Cooling Product Manager at Dell; Tim Shedd, Engineering Technologist in the Office of the CTIO at Dell; and Mohan Kumar, an Intel Fellow. Gentlemen, thanks for joining me today.

Thank you, good to be here. Thank you for having us.

So the big topic is power and cooling. How do you get the power needed to drive all these CPUs, GPUs, and processors, and at the same time stay sustainable? We'll start with Dell.

Well, I'll start as the Product Manager for PowerEdge. One of the biggest challenges is bringing sufficient power into these systems to support these high-performance processors, both CPUs and GPUs. Luckily, it's more than worth it: the performance gains relative to the increased power make it a no-brainer to go with the next-generation systems. The other piece of the equation, from an efficiency perspective, is how do we cool it? Luckily, generationally, we keep improving how much we can air cool, and we've got liquid cooling options that make everything run very efficiently. So again, it's more power consumption to deliver this high level of performance, but we can do it more efficiently this generation compared to the past.

What's the innovation behind this next generation, if you had to put a finger on it? What were the key aspects?

Well, actually, it's not just one thing. It's a bunch of incremental improvements in a variety of areas, be it power delivery, be it the design of the system so that we can more efficiently move air through it. It's the way that we bring cooling to the chips, and smartly controlled fans so that we're only moving as much air as needed at any given time, reacting dynamically to the workload inside the system. It's a lot of refinement. It's continuous improvement, generation over generation, that adds up to big differences at the system level.

Well, and what are we talking about in terms of power that we're going to be seeing that can maintain the cooling and also the sustainability requirements? There's a lot of green action going on, sustainability goals. This is a big part of this new metric.

Absolutely. As David mentioned, right now our processors consume, on the high end, about 350 watts, and GPUs can consume close to 1,000 watts as we look into the future, and you need an efficient solution at all levels to cool them. And when we talk about cooling solutions especially, it's not about whether you can cool with air or with liquid. The question is, can you cool economically and sustainably with any given solution? That's what you're looking for. So this is where it's the right solution for the right problem. You can always come up with a cooling solution, but the power of the cooling solution is going to put a dent in your pocketbook. And that's the point where you cut over to technologies like liquid cooling, which could be cold plate or immersion cooling, various things. So we're always driven by what we call TCO, total cost of ownership: what is the optimal solution for your total cost of ownership? If it's air, it's air. If it's cold plate, it's cold plate. If it's immersion cooling, it's immersion cooling.
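To put rough numbers on that TCO argument, here is a minimal back-of-the-envelope sketch in Python. Every figure in it, the rack power, the cooling overheads, the capex, the energy price, is a hypothetical placeholder rather than a Dell or Intel number; the only point is that the cheapest option depends on the inputs, which is exactly the "right solution for the right problem" idea.

```python
# Back-of-the-envelope cooling TCO comparison. All numbers are
# hypothetical placeholders for illustration, not vendor figures.

RACK_IT_POWER_KW = 40.0        # assumed IT load per rack
ENERGY_COST_PER_KWH = 0.12     # assumed utility rate, USD
HOURS_PER_YEAR = 8760
YEARS = 5

# option -> (cooling power as a fraction of IT power, up-front cooling capex per rack)
options = {
    "air":        (0.35, 10_000),   # assumed fan + room air handler overhead
    "cold plate": (0.15, 25_000),   # assumed pump + CDU overhead
    "immersion":  (0.08, 45_000),   # assumed tank + dielectric fluid cost
}

for name, (overhead, capex) in options.items():
    cooling_kwh = RACK_IT_POWER_KW * overhead * HOURS_PER_YEAR * YEARS
    tco = capex + cooling_kwh * ENERGY_COST_PER_KWH
    print(f"{name:>10}: 5-year cooling TCO ~ ${tco:,.0f}")
```

With these made-up inputs, cold plate comes out cheapest over five years, but shifting the energy price, utilization, or capex assumptions can flip the ranking, which is why the answer is air for some customers and liquid for others.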
And we have the spectrum covered so we can hit all points. As we increase the power of the platform and increase the efficiency of the platform, we're able to do that. And one additional point, since you brought up sustainability: even if there were no power issue to deal with, a lot of folks are looking at liquid cooling solutions simply because they're more sustainable. In general, liquids, take water for example, are an order of magnitude more efficient at conducting heat away compared to air. That gives you the efficiency that then translates to reduced power, which contributes to your sustainability value.

How do the liquid cooling solutions of today compare to previous generations?

So, you know, liquid cooling has an interesting history. I believe the first patent was somewhere in the fifties, for cooling capacitors on street transformers. Then it moved from there to the supercomputers sitting in liquid nitrogen baths in the good old days in the seventies. In technology, it's a very interesting phenomenon: we keep reinventing things. The domain shifts over is basically what happens. What used to be in the domain of capacitors moved over to supercomputers, and now it's moving to mainstream servers. So what we are doing now is taking those principles that have worked effectively elsewhere and applying the same thing to cooling chips and server platforms.

Yeah, we hear a lot of people talking about direct liquid cooling compared to other cooling solutions, especially in the racks. What are the advantages of direct liquid cooling? Can you quantify that or give some commentary?

I'll jump in. With direct liquid cooling, what we're doing is trying to match the heat load to the cooling system. If you go back to engineering and thermodynamics, that's the best way to be efficient: you don't want to overpower your cooling if you don't need it; you want to match it well. So when you place a cold plate, which is just a little box, typically with a copper base, and run water through it, you're putting that really effective cooling right on the heat source. Everything else in the chassis can typically be cooled pretty efficiently with relatively low-powered fans, so you're able to significantly decrease the total energy required to cool while enabling chip powers well past 1,000 watts. We don't see a real limitation right now from the silicon vendors' roadmaps in being able to use DLC to cool them.
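As a rough illustration of that "order of magnitude" point, and of why a cold plate sitting directly on the hottest parts lets the rest of the chassis get by with low-powered fans, the sketch below compares the volume of air versus water you would have to move to carry away the same kilowatt of heat at the same temperature rise. The property values are textbook approximations; real airflow paths and coolant loops are more complicated.

```python
# Rough comparison of the volume flow needed to carry 1 kW of heat
# with a 10 K coolant temperature rise, for air versus water.
# Textbook property values; real systems differ.

Q_WATTS = 1_000.0          # heat to remove
DELTA_T = 10.0             # allowed coolant temperature rise, K

# Volumetric heat capacity (density * specific heat), J/(m^3 K)
AIR = 1.2 * 1_005.0        # ~1.2 kg/m^3 * ~1005 J/(kg K)
WATER = 998.0 * 4_182.0    # ~998 kg/m^3 * ~4182 J/(kg K)

def flow_m3_per_s(q, delta_t, vol_heat_capacity):
    """Volume flow required so that q = vol_heat_capacity * flow * delta_t."""
    return q / (vol_heat_capacity * delta_t)

air_flow = flow_m3_per_s(Q_WATTS, DELTA_T, AIR)
water_flow = flow_m3_per_s(Q_WATTS, DELTA_T, WATER)
print(f"Air:   {air_flow * 1000:.1f} L/s")      # roughly 80+ L/s
print(f"Water: {water_flow * 1000:.3f} L/s")    # a few hundredths of a L/s
print(f"Water needs ~{air_flow / water_flow:.0f}x less volume flow")
```

That gap is why a small water loop on the CPU or GPU can remove most of the heat, leaving only a modest residual load for the fans.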
What role is the industry playing on standards? Is there lock-in? Is it open? Can you share? There's been discussion around worry about lock-in to a particular cooling solution or provider. What role do industry standards play in the cooling area? It seems super valuable, especially when you have racks exceeding some of the numbers you're quoting, with more GPUs and CPUs.

Yeah, again, I'll offer this: at this time, there aren't a lot of standards that exist. Every system tends to be kind of a one-off design, from the chip to the facility water. But there are efforts going on, from the Open Compute Project to, in the United States, ASHRAE, the American Society of Heating, Refrigerating and Air-Conditioning Engineers, to AHRI, the Air-Conditioning, Heating, and Refrigeration Institute. They're all involved now in developing liquid-cooled equipment standards that will open up the ecosystem and make it a lot easier for components to be interchanged. The idea is to open up the ecosystem for innovation and for more competition, which we anticipate will also make the technology more affordable. We're migrating toward standards, but we're not there yet. It's definitely really important for enabling the type of scaling that we see as necessary to support the computing innovations that are coming.

So you're hitting the levels now; you can support the heat. What are some of the reuse benefits? There have been discussions around possible solutions to these challenges and lowering the T-cases. What are some of the most effective solutions out there? How do we do this efficiently?

So I think you covered a couple of points there, so let me talk about reuse first. One of the reasons immersion cooling in particular is very interesting to folks is that it allows you to have an outlet temperature that's much higher, and you can utilize that higher outlet water temperature. Today, we essentially pay for removing the heat and then we pay for rejecting the heat; you pay twice. What you want to get to is that once that heat has been removed from the platform, you take that heat and make it do something useful. If you're in a building, maybe you heat the building. In the mid-latitudes in America, they're using it for greenhouses, essentially pumping the heat into a greenhouse where it's maybe 30 or 40 degrees Fahrenheit, so you can grow vegetables there. And in other countries that have hot water loops supplying homes, they're using it to supply heat and hot water into the homes from the data center. So instead of you paying to reject heat and causing an environmental impact, you're actually benefiting society through the data center business, which is an amazing transformation to happen. And that's real efficiency.

Talk about leverage there. I mean, that's benefit to society, going green and turning it into societal benefits. What about the other challenges around effective solutions for higher TDPs and lower T-cases?

Yeah, so one of the benefits of going down this path is that it allows us to go for a higher TDP, or thermal design power, which means we can deliver higher performance. And if you can deliver higher performance in a smaller footprint, the overall volumetric space in which you're delivering that performance goes down, so it's a much more sustainable solution to have. And having a more efficient cooling solution means you can go for a lower T-case, because you have the ability to reject that heat. That plays, once again, into higher performance that you can deliver to the customer.
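A simple way to see the relationship between TDP and T-case that Mohan is describing: for a fixed case-temperature limit, the sustainable power is roughly the temperature headroom divided by the thermal resistance of the cooling path, so a lower-resistance liquid path raises the power ceiling. The numbers below are illustrative assumptions only, not vendor specifications.

```python
# Illustrative thermal-headroom calculation. Resistances and temperatures
# are assumptions for the sketch, not product specifications.

T_CASE_LIMIT = 75.0     # deg C, assumed maximum allowed case temperature
T_AIR_INLET = 35.0      # deg C, assumed server inlet air temperature
T_WATER_INLET = 32.0    # deg C, assumed facility water supply temperature

R_AIR_HEATSINK = 0.080  # K/W, assumed case-to-air resistance of a large air heat sink
R_COLD_PLATE = 0.035    # K/W, assumed case-to-coolant resistance of a cold plate

def max_power(t_case_limit, t_coolant, r_thermal):
    """Highest steady-state power (W) keeping the case at or below its limit:
    T_case = T_coolant + R * P  =>  P_max = (T_case_limit - T_coolant) / R."""
    return (t_case_limit - t_coolant) / r_thermal

print(f"Air heat sink ceiling: {max_power(T_CASE_LIMIT, T_AIR_INLET, R_AIR_HEATSINK):.0f} W")
print(f"Cold plate ceiling:    {max_power(T_CASE_LIMIT, T_WATER_INLET, R_COLD_PLATE):.0f} W")
```

With the same case limit, the lower-resistance liquid path supports roughly two and a half times the power in this toy example, which lines up with the "well past 1,000 watts" figure mentioned earlier.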
Dave, let's bring you in here. You're the PowerEdge cooling product manager. You've got to make it all work with the products.

Yeah, I was going to say, I'm sitting here with these technologists; they're experts in this field. They, I'm sure, have excellent vision into the future of how the different technologies work and how they scale. I work with the customers today who are trying to take these great ideas and map them back into some of the constraints they face, when they may have a data center that was built 20 years ago. So we work with our customers to make sure they can transition to take advantage of these latest liquid-cooled solutions. And there are a variety of liquid-cooled solutions: as a standard, we offer direct liquid cooling, but we also support immersion cooling and other solutions. You can do it at the system level, you can do it at the rack level; there are a lot of ways to apply liquid cooling. So we work with our customers to figure out what works best for their constraints, what works best for their budgets, what works best for their timelines. And we're really at the beginning right now of deploying processors that are stressing that air-cooling threshold. So a lot of customers are still air-cooling. They're going to continue to leverage that equipment in their data center, and they see that their next step is when they're going to have to start considering liquid cooling. Others are already there. They're comfortable with it, it's running well, it's accomplishing the goals, and they're developing a skill set in how to manage it. So every customer runs at their own pace, and it's important that Dell and Intel, as we work with customers, help them at the pace that's comfortable for them.

Tim mentioned, and Mohan also had a comment on this side, that the power per rack could exceed the numbers you're quoting. Mohan talked about powering new use cases where benefits come out of heat reuse and water cooling. Customers have a lot of racks, and they could be old racks. This is where performance per rack and power per rack come in. What's the innovation around the racks, whether they have old racks or new racks? How are people stacking up their data centers? Because we're seeing more and more data centers being deployed, not only for the hyperscalers but for everybody. We've got edge coming around the corner, and you're going to have a lot of footprint challenges with the intelligent edge. So this is really going to be a power and cooling challenge as you get more density.

Yeah, that's a great question. And I would say, very roughly, you can break this into helping customers who have existing data centers, who are used to working with certain rack footprints and power distribution schemes: for them it looks like incremental improvements, and we try to make the latest technologies digestible in bites that work. For customers that are starting with a greenfield, a clean sheet of paper, there are a lot of opportunities to be creative from the start: getting power distribution oriented around density from the very beginning, high-power, high-voltage power distribution, tall racks, plumbing water into the data center for every rack position from the start so that you're future-proofed. But again, it gets back to customers moving at their own pace. We have to give them options.

Yeah. One of the things I want to ask you on the product side, because Dell is well known for modularity, interchangeability, increased innovation every year, lower costs. I mean, come on, that's the Dell formula. What are the benefits for the end customer in this area?
Because this is a very important area. They've got to get more power, and there are sustainability targets they want to meet too. What are the key benefits to the customer?

Well, I'll jump in. One of the innovations that we are driving is in partnership with members of the Open Compute Project: we are actively supporting DC-MHS, Data Center Modular Hardware System, along with Intel. At its core, this is about allowing flexibility, allowing OEMs like Dell to incorporate the latest and greatest silicon into a standard format that then slides into a rack with disaggregated power. So instead of having vertical PDUs in the rear of the rack, we now have power supplies in power shelves that are spread throughout the rack, and then we distribute DC power. This offers a lot of advantages, both in terms of more space in the compute platforms for doing compute, and in terms of efficiency and sustainability, because now we can have these optimized power supplies available to provide power to everything in the rack. That also extends to the cooling: now we can have manifolds in the back of the rack that the servers can just slide right into. It all comes back to standards, of course, and the ability to slide compute nodes in and out easily, but this offers the promise of modularity, affordability, and interoperability for the future. Now, that's not a today statement, but it's certainly publicly known that we're working together with Intel and others to develop these sorts of modular and efficient systems.

And by the way, the Open Compute Project is a phenomenal organization. We covered their inaugural event many years ago at theCUBE, and they've had a great track record, so congratulations. This area of sustainability and energy efficiency is super important for the industry. Mohan, I have a question for Intel. The Intel-Dell relationship is well documented, successful over many, many years and generations. Question for you: what is Intel doing to increase performance while being mindful of the cooling challenges around sustainability? And how are you working with OEMs such as Dell to create efficient cooling solutions for these new high-powered processors?

Thank you, John. So first of all, we have offerings that target these markets. We have SKUs targeted towards immersion, and we have optimized heat sinks that target liquid-cooling-based solutions. And above all, we have this "no watt left behind" approach to solving the performance problem: we want to maximize the performance at the optimal power footprint for you. So in every generation, we try to make our processors more power efficient. We have built-in accelerators that give you 10 to 15x energy performance improvement compared to the alternative. And we have the right solution for the right problem: not just processors, we have GPUs and AI solutions as well. Put it all together and we try to cover all the bases there. As far as our partnership with Dell, a few of these were touched on earlier. We work with them closely and directly as a partner, and also alongside them in public forums like OCP, The Green Grid, ASHRAE, and so on, to make sure the right standards are in place for us to take advantage of liquid cooling, immersion cooling, and sustainability efforts like the DC-MHS that was referred to here. So we have an approach to essentially provide them the solutions.
We partner with them closely when we go in, because these types of solutions are not a one-company problem. We definitely need the OEM partner; they need us and we need them. And so we work tightly together to give the customer a solution that they can utilize, as opposed to giving them ingredient pieces that they have to put together.

Yeah, and the needle has been moved on the sustainability side. You guys are doing a great job. PowerEdge, great name, I'll always love that name: more power, saves power. Next generation, you've got a great product there, David, and thanks for coming on. Tim, I'll give you the final word, since you're in engineering technology at the Office of the CTIO, which stands for Chief Technology and Innovation Office. What is going on there? What are you most excited about right now as you look out? You've got the standards bodies coming together, you've got real momentum accelerating into efficiency and energy savings, and sustainability is looking good. People are all on point, not just mailing it in; there's some real action. What are you most excited about?

Yeah, that's actually really true, and that has been one of the most encouraging and exciting things I've seen recently at Dell as we are developing these new high-performance platforms employing the OCP ORV3 racks and DC-MHS. Also up front and center is sustainability. How are we accounting for that? How are we best taking advantage of the power-saving features that Intel is providing us? How are we taking advantage of the power-saving features in these new power supplies, even in network cards and so on? It's been really exciting to be a part of this and to see how we can enable the compute solutions of the future in a way that our customers can really benefit from, while also being completely cutting-edge, high-powered, and sustainable.

Well, gentlemen, thank you for your time. And we did a full interview without talking about AI, so we can't leave it there; we have to bring it up as the final question. We did a survey of our CUBE alumni network, a technical network of infrastructure, cloud, and on-premises friends, with about 50 people. We asked, are you using AI? Most of them said they're going for the low-hanging fruit: helping with automation, cost optimization, network optimization, and similar use cases. So just a final question for each of you: are you seeing AI coming in to help with some of the hard, undifferentiated heavy lifting around getting more efficiency out? I just thought I'd throw that out there as a lightning round. Anyone want to take a shot at that?

I'll start with that, simply because the focus for my team is more about enabling people to use AI than how we're applying it to our product planning today. There are a lot of good new accelerator-based solutions to support customers in every industry in leveraging AI, and the efficiency factors in cooling these high-power systems are my maniacal focus.

Well, they eat the power, they want more power. GPUs, I mean, you can't get enough GPUs for training large multimodal models, foundation models.

Yeah, so on AI, it's a twofold answer. I would agree with David that our job is to enable the solutions that AI can play in, but we also see that AI has a tremendous role to play. As a platform and a server system, we have capabilities, and then there's the data center, with cooling and systems that typically tend to operate independently. And there is a way now for an AI to come in and essentially say, I see how these systems are being cooled, and on the basis of that cooling I can change the temperatures, I can move the needles on various things, and I can improve your performance and give you better sustainability. There are various knobs it can turn, and to that extent, machine learning can be applied in those spaces. It's an exciting place to be.
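A toy sketch of the kind of coordination Mohan describes: a controller, here just a brute-force search standing in for a learned model, looks at how the combined server-plus-facility system responds and picks fan and coolant setpoints that minimize total power while respecting a temperature limit. The plant model and every constant in it are invented for illustration; a real deployment would learn that response from telemetry.

```python
# Hypothetical example of jointly optimizing server and facility cooling
# setpoints. All constants are made up for illustration.
import itertools

IT_POWER_W = 30_000.0    # assumed IT load of the rack, watts
T_LIMIT_C = 80.0         # assumed hottest-component temperature limit

def plant(fan_pct, coolant_c):
    """Stand-in model of the combined system.
    Returns (hottest component temperature in C, cooling power in W)."""
    component_temp = coolant_c + 55.0 - 0.30 * fan_pct       # invented thermal response
    fan_power = 800.0 * (fan_pct / 100.0) ** 3               # fan affinity law: power ~ speed^3
    chiller_power = 150.0 * max(0.0, 30.0 - coolant_c)       # penalty for chilling below 30 C
    return component_temp, fan_power + chiller_power

best = None
for fan, coolant in itertools.product(range(20, 101, 5), range(18, 41)):
    temp, cooling_w = plant(fan, coolant)
    if temp <= T_LIMIT_C:
        total_w = IT_POWER_W + cooling_w
        if best is None or total_w < best[0]:
            best = (total_w, fan, coolant)

total_w, fan, coolant = best
print(f"Best setpoints found: fan {fan}%, coolant {coolant} C, total {total_w:.0f} W")
```

In a real system, a learned model would replace both the hand-written plant model and the exhaustive search, using telemetry to predict the response and adjust the knobs continuously.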
Yeah, it's exciting. Okay, Tim, take a shot at that. Any AI comments on your end?

I'll support that. I would say, if you've heard recent comments from our chairman, Michael Dell, basically if you're not using AI, you're leaving ideas and performance on the table. And so we are aggressively supporting our customers in this space while exploring how we can best use it to improve our products at a potentially faster pace and provide the sorts of efficiency gains and sustainability gains that Mohan just outlined.

Absolutely, it's part of the game now. You guys are right here in the center of the action, and you're on both sides: you're enabling more AI to be smarter, faster, and cheaper, and at the same time you can use it to drive efficiency on the sustainability and energy side. So super, super cool, pun intended. Thanks for coming on theCUBE, appreciate it. Gentlemen, thanks for your time.

Thank you very much.

Okay, this is theCUBE's coverage of ISC High Performance 2023. We're covering all things HPC, machine learning, AI, high performance analytics and computing. I'm John Furrier, your host. Thanks for watching.