Live from New York City, it's theCUBE, covering Lenovo Transform 2.0. Brought to you by Lenovo.

Welcome back to theCUBE's live coverage of Lenovo Transform. I'm your host, Rebecca Knight, along with my co-host, Stu Miniman. We're joined by Madhu Matta, the VP and GM of high-performance computing and artificial intelligence at Lenovo, and Dr. Daniel Gruner, the CTO of SciNet at the University of Toronto. Thanks so much for coming on the show.

Thank you for having us. Our pleasure.

So before the cameras were rolling, you were talking about Lenovo's mission in this area: to use the power of supercomputing to help solve some of society's most pressing challenges, among them climate change and curing cancer. Tell our viewers a little bit about what you do and how you see your mission.

Yeah, so our tagline is basically solving humanity's greatest challenges. We're also now the number one supercomputer provider in the world, as measured by the TOP500 rankings, and that comes with a lot of responsibility, which we take very seriously. More importantly, we work with some of the largest research institutions and universities all over the world as they do this amazing research, whether it's particle physics, like you saw this morning, cancer research, or climate modeling. We're sitting here in New York City, and our headquarters is in Raleigh, right in the path of Hurricane Florence. The ability to predict the next tsunami or the next hurricane is absolutely critical for early warning, and a lot of survival depends on that. So we work with these institutions and jointly develop custom solutions to ensure that all this research, one, is powered, and two, works seamlessly, so that their researchers have access to the infrastructure 24-7.

So Danny, tell us a little bit about SciNet, too. Tell us what you do, and then I want to hear how you work together. And no relation with SINET, I've been assured, right?

No, not at all. There's also no relationship with another network that's called the same, but it doesn't matter. SciNet is an organization that's basically the University of Toronto and its associated research hospitals, and we happen to run Canada's largest supercomputer. We're one of a number of compute sites around Canada that are tasked with providing resources and, most importantly, support to academia in Canada. Academics from all the different universities in the country come and use our systems, and people from the University of Toronto can likewise use the other sites' systems; it doesn't matter. Our mission, as I said, is to provide a system, or a number of systems, and to run them, but we're really about helping the researchers do their research. And we're all scientists. Everybody who works with me was a scientist first; we turned to computers because that was the way to do our research. You cannot do astrophysics other than observationally and computationally; there's nothing else. Climate science is the same story: you have so much data and so much modeling to do that you need a very large computer. It also takes very good algorithms and very careful physics modeling of an extremely complex system, but ultimately you need a lot of horsepower to do even a single simulation.
So what I was showing with Madhu at the booth earlier were the results of a simulation that was done just prior to our going into production with our Lenovo system, where people were doing ocean circulation calculations. The ocean is obviously part of the big Earth system, which is part of the climate system as well. They took a small patch of the ocean, a few kilometers in size in each direction, but modeled it at very, very high resolution, even vertically, going all the way down to the bottom so that the topography of the ocean floor could be taken into account. That lets you see, at a much smaller scale, the onset of tides, the small-scale tidal motions that mix the water, the cold water from the bottom with the warm water from the top. The mixing of nutrients, how life goes on, the whole cycle, is super important. That, of course, then gets coupled with the atmosphere, with the ice, with the radiation from the sun, and all of that. That particular calculation was run by a group whose lead scientist was from JPL in California. He was doing single runs on 48,000 cores, for about two to three weeks, and produced a petabyte of data, which is still being analyzed. That's the kind of calculation this system has enabled. It was not possible to do that in Canada before we had it.

I'll tell you, both when I lived on the vendor side and as an analyst, one thing about talking to labs and universities is that you love geeking out, because you always have a need for newer, faster things. With the example you just gave, it's: wait, if I can get the next-generation chipset, or if the network gets improved, I can process that petabyte of data so much faster; if only I could get more money to buy a bigger one. We've talked to the people at CERN and JPL and the like, and this is where, for most companies, a new system is a little bit better and makes things a little nicer, but here it's critical to moving the research along. So talk a little bit more about the infrastructure, what you look for, how that connects to the research, and how you help close that gap over time.

Before we go there, I just want to highlight a point Danny made about solving humanity's greatest challenges, which is our motto. He talked about the analysis they just did, where they looked at the surface of the ocean and also went down, what is it, 264 vertical layers beneath it, to analyze that much data and start looking at protecting marine life. As they come to understand the ocean at that vertical depth, they can start to figure out the nutrient values and the other contents of that water, and from there begin protecting the marine life it sustains. There again is another of humanity's greatest challenges, right there. Nothing happens in isolation; it's all interconnected.

So, when you finally get a grant and you're able to buy a computer, how do you buy the computer that's going to give you the most bang for your buck, the best computer to do the science that you're tasked with doing?

It's tough, right? We don't fancy ourselves computer architects, so we engage the computer companies, who really do know architecture, to help us. The way we did our procurement was: okay, vendors, we have a set pot of money, and we're willing to spend every last penny of it.
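[Editor's note: to put that ocean run in perspective, here is a quick back-of-envelope sketch using the figures quoted above. The 18-day duration is an assumption splitting the quoted "two to three weeks"; none of the derived numbers come from SciNet itself.]

    # Back-of-envelope on the ocean-circulation run described above.
    cores = 48_000         # "single runs at 48,000 cores"
    days = 18              # assumed midpoint of "two to three weeks"
    output_bytes = 1e15    # "produced a petabyte of data"

    seconds = days * 24 * 3600
    core_hours = cores * days * 24
    avg_rate_gb_s = output_bytes / seconds / 1e9

    print(f"core-hours consumed: {core_hours:,}")             # 20,736,000
    print(f"average output rate: {avg_rate_gb_s:.2f} GB/s")   # ~0.64 GB/s
    print(f"data per core:       {output_bytes / cores / 1e9:.1f} GB")  # ~20.8 GB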
You give us the biggest and the baddest for our money. Now, it has to have a certain set of characteristics. It has to run a set of benchmarks, sample calculations that we provided, and the proposal that delivers the best performance on them gets a bonus. It also has to do that with the least amount of power, so we don't heat up the world and pay through the nose for electricity. Those are objective criteria that anybody can understand. But then there are the other criteria: how well will it run? How is it architected? How balanced is it? Is the I/O subsystem for all the storage the one that actually meets the need? What extras are there that will help the system run more smoothly, and for a wide variety of disciplines? Because we run the biologists together with the physicists and the engineers and the humanities people; everybody uses the system. To make a long story short, the proposal we got from Lenovo won the bid, both for the hardware we got and for the way it was put together, which was quite innovative.

So I want to hear about that. You said, give us the biggest and the baddest, we're willing to empty our coffers for this. Where do you go from there? How closely do you work with SciNet? How does the relationship evolve, and do you keep innovating together?

I see high-performance computing not as a segment or a division but as a practice, and as with any practice, many pieces come together. You have a conductor and you have the orchestra, but at the end of the day, the delivery of all those systems is the concert. That's the way to look at it. To deliver this, our practice starts with multiple teams. One is a benchmarking team that understands the applications Dr. Gruner and SciNet will be running, because the performance of the cluster needs to be tuned to the application. The second is a set of solution architects, deep engineers who understand our portfolio. Those two work together to say: against this application, let's build, as he said, the biggest, baddest, best-performing solution. Then a third team kicks in once we win the business, coming on site to deploy, install, and manage. And when Dr. Gruner talks about the infrastructure, it's a combination of hardware and software that all comes together. The software is open-source-based, and we built it ourselves because we felt the industry simply didn't have the right tools to manage infrastructure at that scale. All of this comes together to essentially rack and roll onto their site.

Now, let me just add to that. It's not like we went to RFP in a vacuum. We had already talked to the vendors; we always do. They come to you, asking when your next round of money is coming, and there's always a dog and pony show where they tell you what they have. With Lenovo, the team as we know it now used to be the IBM team, the System x team, who built our previous system. So a lot of those people were already known to us, and we've always interacted very well with them. They were already aware of our thinking, of where we were going, and that we were open to suggestions for things that are non-conventional. Now, this can backfire, okay? Some data centers are very square.
They will only prescribe exactly what they want. We were not prescriptive at all; we said, give us ideas about what can make this work better. Those are the intangibles in a procurement process. Oh, and you also have to believe in the team. If you don't know the team, or you don't know their track record, that's a no-no, or at least it takes points away.

So we brought innovations like the Dragonfly network topology, which Dr. Gruner can talk about, and we also brought in, for the first time, Excelero, a software-defined storage vendor. It was a small part of the bid, but still, we were able to flex our muscles and be creative rather than just bidding the standard configuration.

All right. My understanding is you've been using water cooling for about a decade now. Maybe you can give us a little bit about your experience and how it's matured over time, and then Madhu will bring us up to speed on Project Neptune.

Okay, so our first procurement was about 10 years ago, and again, that was the model we came up with. After years of racking our brains, we could not decide how to build a data center and what computers to buy; it was a chicken-and-egg problem. So we ended up saying: here's the money, and here is the total cost of operation we can support. That included the power bill, the water, the maintenance, the whole works. So much could go to infrastructure, and the rest was for operations. And we said to the vendors: you do the work. We want, again, the biggest and the baddest that we can operate within this budget, so obviously it had to be energy efficient, among other things. We couldn't design a data center and then put in systems we didn't know existed, or vice versa. That's how it started. The initial design was built by IBM, and they designed the data center to use water cooling for everything. They put rear-door heat exchangers on the racks instead of just blowing air around and trying to contain it, which is both less efficient and much more difficult; water moves heat very efficiently. You open the door of one of these racks and hot air is coming out, but you capture the heat right there, in situ, and remove it through a radiator. It's just like your car radiator, and it works very well. Now, it would be nice if we could do even better with hot-water cooling and so on, but we're not on a university campus; we're in a strip mall out in the boonies, so we couldn't reuse the heat. Places like LRZ reuse the heat produced by the computers to heat their buildings, and if we were next to a hospital that always needs hot water, we could have done that too. But it's really interesting that this design gave us the most efficient data center in Canada, and one of the most efficient in North America, 10 years ago. Our PUE was 1.16; that was the design point. And that's without direct water cooling to the chip.

Right, right. All right, so bring us up to speed on Project Neptune in general.

So Neptune, as the name suggests, is named for the god of the sea, and we chose it to brand our entire suite of liquid-cooling products. The suite is end to end, in the sense that it's not just hardware but also software.
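[Editor's note: power usage effectiveness (PUE) is the ratio of total facility power to the power delivered to the IT equipment, so the 1.16 design point quoted above means roughly 16% overhead for cooling and power delivery. A minimal sketch follows; the wattage figures are invented for illustration and are not SciNet's actual loads.]

    # PUE = total facility power / IT equipment power (1.0 is ideal).
    def pue(it_kw: float, cooling_kw: float, distribution_kw: float) -> float:
        """Return PUE given the IT load and facility overheads, all in kW."""
        return (it_kw + cooling_kw + distribution_kw) / it_kw

    # A hypothetical 1 MW IT load with 120 kW of cooling and 40 kW of
    # distribution losses lands exactly on the quoted design point:
    print(f"PUE = {pue(1000.0, 120.0, 40.0):.2f}")   # PUE = 1.16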
And the other key part of Neptune is that most of these products were designed and built not in a vacuum but in conjunction with key partners like the Barcelona Supercomputing Center and LRZ in Munich, Germany. These were real-life customers working with us jointly to design the products. Very simplistically put, Neptune is an entire suite of hardware and software that lets you run very high-performance processors at the power and cooling levels you would normally associate with much lower-powered parts, because the heat is carried away so effectively. The other key part is the water itself. The normal way to cool anything is with chilled water; we don't use chilled water, so you save the cost of the chillers. We use ambient-temperature water, up to 50 degrees at the inlet, with roughly 90 percent heat-removal efficiency: 50 degrees goes in, 60 degrees comes out. It's really amazing, the entire suite.

That's 50 Celsius, not Fahrenheit.

Celsius, correct. And Dr. Gruner talked about SciNet with the rear-door heat exchanger. You really have to stand in front of one to feel the magic of it, as geeky as that sounds. You open the door and it's hot, 60, 65 degree C air; you close the door and it's cool, 20-degree air coming out. So the cost of running a data center drops dramatically, whether with the rear-door heat exchanger, with the direct-to-node product we just released, the SD650, or with what we call the thermal transfer module, which replaces a normal heat sink and brings water-cooled goodness to an air-cooled product.

Danny, I wonder if you can give us the final word on climate science in general. How's the community doing? Is any technology holding you back right now, and what excites you about the research?

Technology holds you back by virtue of the size of the calculations you need to do, but physics holds you back too, because doing the actual modeling is very difficult, and you have to be able to believe that the physics models actually work. This is one of the interesting things Dick Peltier has shown. He happens to be our scientific director, and he's also one of the top climate scientists in the world. He has demonstrated, through some of his calculations, that the models are actually pretty good. The models were designed for current conditions, with current data, so that they reproduce the evolution of the climate we can measure today. But what about the climate of 10,000 years ago? Climate has been changing forever; there have been glaciers, there have been all these events. It turns out the record shows oscillations in temperature and other quantities roughly every 1,000 years, and nobody had been able to explain why they happen. And it turns out that if you take the same models we use for today's climate calculations and do what's called paleoclimate, starting from an approximation of the conditions 10,000 years ago and running forward, they reproduce those oscillations exactly. So it's very encouraging that the climate models actually make sense. We're not talking in a vacuum, and we're not predicting the end of the world on a whim: the calculations are right. They predict that the temperature of the Earth is climbing, and it's true, we're seeing it.
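[Editor's note: a rough aside on the warm-water figures quoted earlier in this segment (50 degrees C in, 60 degrees C out). The heat a water loop carries is Q = mass flow x specific heat x temperature rise, so a 10 C rise fixes the flow needed for a given load. The 40 kW rack load below is invented for illustration.]

    # Water flow needed to carry a given heat load at a 10 C rise.
    C_P_WATER = 4186.0   # J/(kg*K), specific heat of water
    DELTA_T = 10.0       # K: 60 C outlet minus 50 C inlet

    def flow_l_per_min(load_kw: float) -> float:
        """Liters/minute of water to remove `load_kw` of heat at DELTA_T."""
        kg_per_s = load_kw * 1000.0 / (C_P_WATER * DELTA_T)
        return kg_per_s * 60.0   # ~1 liter of water per kilogram

    # A hypothetical 40 kW rack needs under a liter per second:
    print(f"{flow_l_per_min(40.0):.1f} L/min")   # ~57.3 L/min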
But it will continue unless we do something. So it's extremely interesting. And now he's beginning to apply those paleoclimate results to studies with our anthropologists and archaeologists, trying to understand events that happened in the Levant, in the Middle East, thousands of years ago, and to correlate them with climate events. Now, is that cool or what?

It's very cool.

Solving humanity's greatest challenges again. He just added global warming to the list.

You have a fun job, you have a fun job.

And it's all the interdisciplinarity that has now become possible. Before, we couldn't do this; 10 years ago we couldn't run those calculations. Now we can. So it's really cool.

Great. Well, Madhu, Danny, thank you so much for coming on the show. It was really fun talking to you.

Thanks.

I'm Rebecca Knight, for Stu Miniman. We will have more from Lenovo Transform just after this.