We're here at the supercomputing conference in Denver. Who are you?

My name is Wu Feng. I'm a professor of computer science and of electrical and computer engineering at Virginia Tech. I'm also the founder of the Green500, which maintains a list of the most energy-efficient supercomputers in the world.

So with a supercomputer, you don't want to just use a lot of power? That's not very efficient, right? It's expensive?

It is expensive. I think we as a community are starting to realize that we don't want to build the Formula One race car of supercomputers. A Formula One race car requires a lot of maintenance: you have pit crews that change the tires and maintain the running of the system. The idea is, can we back off of that a little bit, still be high performance, but also be able to do highly efficient computing? So instead of the Formula One race car of supercomputing, you'd build the Nissan 370Z of supercomputing: something that is very fast but reliable, that doesn't necessarily require a pit crew to maintain. You don't have to take it to the shop or fuel it up often, it gets good gas mileage, and it's still very fast.

It's a Prius? A Toyota Prius?

Well, maybe a souped-up Toyota Prius. A Toyota Prius is at the other extreme: one wouldn't consider it a high-performance machine, though it is very energy efficient. A better example might be a Tesla, a Model S or Model 3, a car that's energy efficient but also very fast, high performance. You might be old enough to know this: there's a movie called Cannonball Run, with Burt Reynolds. I don't know if you remember that movie.
Maybe.

Basically, it's a race across the United States; I forget whether it was Los Angeles to New York or New York to Los Angeles. The fastest car the racers take isn't a Formula One race car; they're not even looking at one. They're maybe getting a Lamborghini, or a Camaro, some fast car that's reliable and gets reasonably good gas mileage, so that they're not always fueling up or stuck in the shop having something repaired. It's a race from one side of the continent to the other, and it's all about miles driven in a short amount of time. Relative to supercomputing, it's saying I'm more concerned about answers per month than about floating-point operations per second.

So is the only way to do supercomputing in the future to be very power efficient? Is that the only way to physically make it happen, because you need to fit in a certain space and not use too much power, and if you just don't care about that, then you're not going to be able to do it at all?

That's true to a certain degree, but I think it's more a fiscal challenge.

A fiscal challenge? The price could be too high?

Yes. If you have a 50-megawatt system, the amount of money it's going to cost to power and cool that system is going to be very expensive relative to the cost of the supercomputer itself.
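A rough sketch of that expense, using the community rule of thumb cited in the conversation of roughly one million US dollars per megawatt per year for power and cooling (the constant and function names here are illustrative):

```python
# Rule of thumb from the conversation: ~$1M per megawatt per year to power and cool.
COST_PER_MW_YEAR_MUSD = 1.0

def annual_power_cost_musd(megawatts):
    """Estimated yearly power-and-cooling cost, in millions of US dollars."""
    return megawatts * COST_PER_MW_YEAR_MUSD

print(annual_power_cost_musd(50))  # a 50 MW system -> ~$50M per year
```

At that scale the operating cost over a machine's lifetime rivals its purchase price, which is the fiscal challenge being described.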
The general rule of thumb, and it's just a rule of thumb, is that for every megawatt of power consumption you need roughly a million US dollars a year to power and cool it. So if you have a 50-megawatt system, that's 50 million dollars annually. So there are some real constraints. Part of what the Green500, which has now been integrated with the TOP500, is trying to look at is whether we can get to exascale in a 20-megawatt power envelope. To do that, machines have to be able to sustain 50 gigaflops per watt. Right now, and you'll hear this officially on Wednesday from the people in the number-one slot on the Green500, they've managed to reach over 16 gigaflops per watt.

Who's number one?

A supercomputer called Shoubu, at RIKEN, R-I-K-E-N, so it's a Japanese machine.

Is it using ARM chips? We're standing right here by the ARM booth, and ARM is trying to claim that they're very power efficient, right?
Yeah, in general ARM is a very efficient architecture. The Isambard one, though, my understanding from Simon McIntosh-Smith is that it uses a traditional ARM processor. It's not going to have particularly efficient processing cores; it's going to be a typical Intel- or AMD-style processor targeted at the server space. But ARM of course has a whole line of processors in the embedded and low-power space that certainly can be used. In fact, I was just saying that the Barcelona Supercomputing Center looks like it's going to pair an ARM chip with an NVIDIA Volta GPU for its supercomputer upgrade; I think they're right over there.

So you specialize in parallel computing? Everybody's been talking about that forever, and it's a very important thing to master when you build a supercomputer, right?

Right. In fact, we were just talking about Isambard with the University of Bristol and Simon McIntosh-Smith. Back in August we had the 46th, or was it the 47th, I've got to remember what number we're at, I think the 47th International Conference on Parallel Processing.

It's been going on for that long?

Yeah, it started in 1971; there were parallel processing computers starting in 1971.

So in all these 47 years, has it been solved yet? I'm joking, but it's still a huge challenge, right?

Sure. Parallelism can come at all different levels, so I don't know about mastering it.
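An aside on Amdahl's law, which the next part of the conversation turns on: if a fraction p of a program can be parallelized and that fraction is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal numerical sketch:

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only part of a program is accelerated (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)

# Half the program parallel, made a "bazillion" times faster:
# the untouched serial half caps the overall speedup at a factor of two.
print(amdahl_speedup(0.5, 1e12))  # ~2.0
```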
It's more a matter of being able to extract the most parallelism at every level, from instruction-level parallelism up through thread-level and data-level parallelism to inter-node parallelism. You've got this hierarchical span of parallelism that you're trying to extract out of the codes. The challenge right now is with codes that are not as inherently parallelizable: do those codes need to be refactored in a way that's more amenable to parallelization, or are they just inherently sequential, so that there's no way to solve them except to handle that part sequentially and solve whatever you can in parallel? Ultimately you're limited by something called Amdahl's law. If half your code is serial and half is parallel, you can make the parallel part infinitesimally fast, so that it executes in essentially zero time, a bazillion-fold speedup that takes that half of the program down to nothing, but you're still left with the other half, which is serial. So even though you parallelized half of the program to be a bazillion times faster, your overall speedup is only a factor of two, because you're still stuck with the serial half.

So there's a lot of optimization work being done, or that has been done, over the last 47 years?

Yeah. And with the advent of all the different architectures out there now, there are certain ways you have to look at writing your algorithms for the appropriate parallel computing platform. Some of the work being done looks at irregular algorithms, in the sense that they have irregular execution flow or irregular computational granularities: one thread will be very short while another takes a very long time to run. Those types of irregular codes
generally do not do as well on GPUs. So if you're really good at algorithms and parallelization, you might revisit those irregular algorithms and find some way to refactor them into regular algorithms so that they map well to the GPU; that's just one example. In parallel computing you have to be cognizant of this notion of co-design, across the hardware, the software, and the algorithms, each leveraging the others in a way that delivers significant speedup.

So your Green500 list is the most important of them all, because we want to get to exascale? Right now it's 16 and we just need to get to 50; that's not very far, right? How far is it? It's a factor-of-three improvement in gigaflops per watt; that's just two years?

Oh, how long does it take to get to the dream? Well, if you look at the DARPA exascale computing study, they were targeting a 20-megawatt exascale supercomputer, and they were talking about it in the year 2015. We clearly found that getting to an exaflop, to exascale, in 2015 wasn't possible. China is looking at making it in 2020, although my understanding is they've now loosened the envelope from 20 megawatts to 30 megawatts. So they're going to do an exaflop in 30 megawatts, which, I can't do the math off the top of my head, but that's much lower than 50 gigaflops per watt. The US, I think, is targeting 2021 or 2022. So we've bought ourselves five to seven years, depending on how you count, of runway in terms of reaching that 20-megawatt thermal envelope.

But when you have exascale, what can you do with it? Is there a reason people want exactly that?
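For the record, the arithmetic set aside a moment ago is straightforward: dividing an exaflop by the power envelope gives the required sustained efficiency, and the jump from 16 to 50 gigaflops per watt is roughly a 3.1x improvement. A quick sketch (function name illustrative):

```python
EXAFLOP = 1e18  # floating-point operations per second

def gigaflops_per_watt(flops, watts):
    """Sustained efficiency needed to reach a flop rate within a power envelope."""
    return flops / watts / 1e9

print(gigaflops_per_watt(EXAFLOP, 20e6))  # 20 MW exascale target -> 50.0 GF/W
print(gigaflops_per_watt(EXAFLOP, 30e6))  # loosened 30 MW envelope -> ~33.3 GF/W
print(50 / 16)                            # improvement needed from ~16 GF/W -> ~3.1x
```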
Well, I don't know; it's just a number, another goal to be reaching for. Before the exaflop it was the petaflop, before we had that. Each step is a thousand-fold improvement. If, or I should probably say when, we reach exascale, the next one is zettascale, I believe. I think that's right: peta, exa, zetta, and then yotta.

Yottascale. I want the yottascale. We'll have the yottascale in ten years or so, right?

Well, with the rate at which we've been slowing down in terms of reaching these higher compute speeds, it may be a lot more than ten years.

Okay, but we'll still be around, right?

I hope so.

Doing those 500 lists, does that mean all these supercomputing guys give you access to their hardware so you can remotely execute your benchmarks? Is that how you do it?

No, it's a self-reporting system. The TOP500, and arguably by association the Green500, reserve the right to try to validate the results of different systems, but it is a self-reporting system. And knowing the community the way it is, it's going to be pretty obvious if somebody is stretching the truth in terms of what their performance or power consumption is.

How many secret supercomputers does the Pentagon have? And China and Russia, do they have a bunch of secret supercomputers? Or you don't have to say, I don't know. Is it possible?

Yes, certainly it's possible.
I don't know the answer to that. What I think is interesting, if you look at the Green500 and TOP500 lists, is that we've been seeing occasional forays into the list by companies like Amazon and Facebook, and we see Google here with one of their booths. Most people don't understand that they're using a supercomputer on a daily basis: whenever they do a search, or when they're doing social media, there is effectively a supercomputer backing those applications.

So when they're getting into our brains and figuring out what we want to buy next, they're using a supercomputer?

Yeah. Amazon, for example, has these recommender systems, and they've got to run the algorithm someplace to come up with the recommendation. Now, I don't work at Amazon, so I can't speak for them, but depending on how sophisticated their recommender algorithm is and what different things it takes into account, it may or may not need a really big supercomputer; it might just need four or five nodes of a modest cluster to run the recommendation system. But the point is that in order to do it in a timely manner, they have to do it relatively fast, which means they do need some computational power.

And you're going to have some fun here? The supercomputing event, this is a cool event. Is this the prime event for supercomputing?

Yeah, this is probably one of the main ones. It brings together technical researchers, vendors, educators, and government lab folks, a diverse set of people from all around the world, to tackle the challenge of parallel and distributed computing.
I mean, this phone is a parallel computer. There are 15 cores in it: two CPUs, four GPUs, and nine accelerators that folks don't really know are in there.

And it's still a challenge to get all these apps to use all this power?

Yeah. Phones have had multi-core for so long, but I don't know how good they are at multi-tasking across those cores. Sometimes it's the applications themselves: you can download applications and they only make use of one core. I teach a class on parallel computation, and when the students come in, they've been programming their codes using one core. So they start out in my class and I say, all right, you're going to use the four cores that are in your laptop. You've got a powerful quad-core CPU; you might as well use it. And they succeed; for the most part they all pass the class, and I think they're pleasantly surprised at how much faster some of the codes they were running serially are now running in parallel. On some of the machines, like the MacBook Pros, they have a spare GPU: one GPU drives the graphics display and another is free for general-purpose computation, and they're able to accelerate codes eightfold, tenfold, twelvefold faster than what they had been doing serially before.

Cool, I'll have to check that out. So, the future of supercomputing: your list has been published since this morning, right?

Yes, the latest list is out. The formal announcements and award sessions will be tomorrow at, I believe, 5:15, as part of the TOP500 birds-of-a-feather session, or BoF. And then we will have another BoF, the Green500 BoF, which has Natalie Bates, who's part of the Energy Efficient
High-Performance Computing Working Group, and Erich Strohmaier, who's part of the TOP500. We're running a BoF on the Green500, and we will talk about the trends that we see on the list. We will also have the number-one green supercomputer talk about their energy-efficient machine and what they did to make it so energy efficient, at over 16 gigaflops per watt. And we will have one or two additional talks on what we call Level 2 and Level 3 measurement methodologies: higher-quality, higher-fidelity measurement methodologies to get more accurate power-consumption readings of the supercomputers in question.
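As a closing footnote, the kind of quad-core laptop exercise described earlier can be sketched with Python's standard library. The workload here, counting primes by trial division, and all the names are illustrative, not taken from the course:

```python
from concurrent.futures import ProcessPoolExecutor

def count_primes(bounds):
    """Count primes in [lo, hi) by trial division -- deliberately CPU-bound."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Split the range below 100,000 into four chunks, one per core.
    chunks = [(i * 25_000, (i + 1) * 25_000) for i in range(4)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(total)  # 9592 primes below 100,000
```

On a quad-core laptop this typically runs close to four times faster than the equivalent serial loop, modulo process-startup overhead, the same flavor of speedup the students in the class see.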