This started a long time ago, actually before that, and it was nice to see Filippo up here earlier, because I was part of his project, Mont-Blanc 2. We did lots of work that proved that this stuff should work, that it should be a viable approach, and as my part of Mont-Blanc 2 was finishing, I felt that one of the next logical steps was to do this for real: build a real machine and run it as a service. Don't run it as an experiment or a prototype, run it as a real production science machine, and that's the way to find out what the last missing gaps were going to be.

So we've been planning this for a long time, and we've been dying to tell you all of the details. Some of these details have been out for a while, but not very much, so I was really thrilled when I heard that Cray were going to fully announce the product line today. At last we can tell you some things, and we have some real performance numbers we can share as well, so hopefully this is going to be pretty exciting.

We all know why we're interested in this stuff; there are lots of advantages to having Arm as one of the members of the high performance computing community. And it really is thanks to Mont-Blanc that a lot of this is ready to try, we can't emphasize that enough. A lot of the software stack exists because, behind the scenes, lots of people have worked on it for years to get it into a pretty good state, and you'll see the benefits of that in a minute.

So, the Isambard project. It's a new, what we in the UK call a tier two high performance computing service. It's named after this quite elegant gentleman, a very famous British Victorian engineer, Isambard Kingdom Brunel. If you come to the UK and travel around the south of the country, you'll be going on his railways, through his tunnels, over his bridges; he was a real genius of his time, so we've named our machine after him. The project is a collaboration of four universities, all in the south west of England and Wales: Bristol, Bath, Exeter and Cardiff. We teamed up with the Met Office, who were very interested in trying this technology and could provide a home for the machine as well. We're funded by EPSRC, our main science research council in the UK. Obviously we've involved Arm, and the reasons for that should become clear over the next few slides; Cray and Arm together have been quite key to making all of this happen.

Just to clarify what I mean by tier two: in our categorisation, tier zero is the big international machines, tier one is our national services, so Archer in the UK is a tier one, and Isambard is one of the tier two centres in the UK. We currently have six of those funded, all exploring different architectural approaches: there are dense GPU machines, there are regular GPU machines, all sorts of different things, and Isambard is the only Arm system we're exploring. Tier three tends to be our local machines, so in Bristol we have our own 15,000 core cluster at the university. That's what I mean by saying Isambard is tier two.

When it goes live we're going to run it like any other supercomputer: scientists in the UK will just apply for time to run their science codes and get science results. The fact that it's an Arm machine should be irrelevant; that's what we're aiming at. It's not just a machine to play with Arm on, it's there to do real science on Arm. That's our goal. So what is it?
So at last I can tell you what it really is. It is an XC50 machine, we think one of the very first XC50 machines, which means it does have an Aries interconnect; we've not been able to tell you that before. It's just over 10,000 cores, just north of 160 dual socket nodes, and it's based on Cavium ThunderX2; we've not been able to admit that before either. The exact SKU we're going to use is still being decided, but it's currently looking like 32 core processors at over 2 GHz, and we'll know exactly what they'll be soon.

It's got the full Cray software stack, which is another thing we haven't really clarified before: the Cray compiler, Cray math libraries, Cray profilers, Cray debuggers, MPI libraries, all of that kind of stuff. That's one of the reasons we really wanted to do this with Cray. I have a different hat where I'm involved in the Archer UK national service; Archer is also a big Cray machine, and we really wanted to generate some good comparison data, as close to apples-to-apples as possible. That's why we wanted a shared software stack between Isambard and Archer, and a shared interconnect between Isambard and Archer; it's really mostly the processors that are different, which is going to be very useful in generating lots of results.

Within Isambard we've also got a small number of other nodes, a few Xeon Phis, some x86, some Pascal GPUs, again just to aid the comparison exercises. The first part of the machine is already there, which is everything apart from all the really interesting Arm stuff; the interesting 10,000 core part will arrive early next year. But we do have eight early access nodes, which we've had for about two weeks, so all the numbers I'm going to show you were generated just in the last ten days or so. This is all very much hot off the press; in fact we were still tweaking the numbers yesterday, and I expect them to keep improving.

So what are we trying to do? We're taking some of the most heavily used codes from Archer, our national machine in the UK, and getting them running on Isambard, to find out: how hard is it, is it easy, does it just work? And how well does it work: do the codes run at good performance out of the box, or do we have to spend a lot of time tweaking and tuning? Hopefully this experience will be useful to anyone who's interested in trying this.

These are the top 10 most heavily used codes on Archer over the last 12 months, ordered from most heavily used, so VASP is the most heavily used code on Archer. Many of the codes are in Fortran, which is important to point out. There are other codes outside the top 10 that we also have an interest in, things like OpenFOAM and IFS, which is one of the other weather and climate codes, and others you would recognize. And we're not planning on trying to do all of this ourselves; in fact we'd really rather not. If anyone else is working on any of these codes, please let us know, and if anyone would like to use any of the codes that we've been getting running, please come and ask us. We're going to share everything we're doing as openly as possible. That's the plan.
So, another part of the plan: about two weeks ago we had a hackathon, the first Isambard hackathon. We got a bunch of smart people into the Cray offices in Bristol, we locked them in, and we didn't let them out until they had all their codes working and optimized. It turned out that was actually quite good fun, and it went better than we expected. We were using our early access nodes, so it was alpha hardware and beta software, and we only crashed a few nodes a few times, which we thought was pretty good. We had access to the Cray toolchain for the very first time, and we have some real performance numbers we're going to share. There's lots of detail, and I won't go through all of it, but it's in the slides, which are already on the GoingArm website, so you can go and dig in and see what we've done.

The first thing we did was look at some mini apps, because you can learn a lot just from those. These are absolute performance results that we've measured for real on real systems, normalized to the performance of Broadwell. The Broadwells are 18 core parts; we had a short window of remote access to some 22 core Skylake parts; and our own early access nodes are 28 core, but after we'd done some runs we got access to a 32 core part as well and wanted to see what difference that made, so these numbers are actually from the 32 core part, which is what we'll have in Isambard.

From left to right, we've got STREAM first. Everyone knows STREAM; you'd expect it to scale roughly with external memory bandwidth, and we've basically got four memory channels, six memory channels, and eight memory channels across the three platforms. That is roughly what we see performance-wise, which is great. Then we moved on to the kinds of mini apps that are interesting for the people my research group works with: hydrodynamics mini apps, heat diffusion mini apps, neutral particle transport mini apps. Again, they're mostly memory bandwidth bound, so we would expect performance to increase with more memory bandwidth, and that's pretty much what we see in almost every case, which was really great.

We also used all three different compilers, GNU, Clang with Flang, and Cray. We tried all of them and just picked whichever one gave the best performance, and that's what we're showing; later on I've got the list of which compiler was best on which code, so you can have a look at that if you want.

Those are mini apps, that's great, but they're just mini apps. How about some real codes? So then we started to dig into our top 10 codes from Archer, and we have some other numbers. We don't have Skylake results for these yet, because we didn't have enough time, but we should be able to add those over the next week or two. GROMACS is an interesting case: we had one of the GROMACS developers come and join our hackathon, and he confirmed that the code is actually fairly compute bound, so you would expect processors with more flops and wider vectors to do better on GROMACS. In this case we actually didn't beat Broadwell, so that seemed to be playing out, but we weren't too far behind. That was with GCC; we always use the Intel compiler on the Intel platforms, because we wanted to make the comparison exercise as hard as possible for ourselves.
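Just to make that memory-channel arithmetic concrete, here is a minimal STREAM-style triad in C with OpenMP; it's a sketch of the reasoning, not our benchmark code, and the DDR4 speeds and channel counts in the comment are illustrative assumptions rather than the measured numbers from the slides:

```c
/* Minimal STREAM-style triad, illustrating why these mini apps track
 * external memory bandwidth. Peak bandwidth scales with channel count:
 * assuming (hypothetically) DDR4-2400, one channel moves
 * 8 bytes x 2400 MT/s = 19.2 GB/s, so 4, 6 and 8 channels give roughly
 * 77, 115 and 154 GB/s per socket. Build with: cc -O3 -fopenmp triad.c */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1 << 26)   /* ~64M doubles per array (~512 MB), well beyond cache */

int main(void) {
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));

    #pragma omp parallel for
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];      /* triad: 2 reads + 1 write per element */
    double t1 = omp_get_wtime();

    /* three arrays of 8-byte doubles are moved through memory */
    double gbytes = 3.0 * N * sizeof(double) / 1e9;
    printf("triad bandwidth: %.1f GB/s\n", gbytes / (t1 - t0));

    free(a); free(b); free(c);
    return 0;
}
```

Run with all the cores of a socket (e.g. OMP_NUM_THREADS set to the core count) and the reported figure should sit at some large fraction of the per-socket peak, which is why a triad like this is a reasonable proxy for how bandwidth-bound mini apps will scale across 4, 6 and 8 channel parts.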
The next two codes, I should emphasize, were actually worked on by the UK's Met Office. The first is the Unified Model, the weather forecast code: if you come to the UK and watch the weather forecast on the BBC, that's generated using the UM. This is the real production code, millions of lines of nasty Fortran which are hard to get compiled on anything; that's what the Met Office will tell you, it's a real challenge to get it working on anything. NEMO is one of the ocean models which often gets coupled with the Unified Model. Those codes are again fairly memory bandwidth bound, so we expected to do well, but we were very pleased with the results.

In fact, I think one of the most impressive outcomes of the hackathon was this: the Met Office brought the production build system that they use day to day for the real weather forecast, with all of their standard compiler flags and all their standard makefiles, and they didn't change anything. I thought that if we got that working within two days, that would be a fantastic result. The first time they tried it, it just worked. The whole thing correctly built, it built a binary that ran properly and produced correct results, and the performance was good out of the box on a really nasty production code. That is a very interesting, very important data point that I think everyone should take on board. So that's very promising.

Last of all, OpenFOAM, which is one of the CFD codes. Traditionally, performance on OpenFOAM closely correlates with STREAM, so we hoped we would see good performance, and we do. OpenFOAM is a fairly hairy code to build; that one we used GCC for, because it's a fairly complicated C++ code, and that worked really well as well.

So we were very pleased with how all of this went in a very short amount of time: just two days of effort, plus a little bit of effort from a few people since, has got us to where we are. Everything we have tried, all of these codes, has just worked out of the box. That is a very important thing for people to take away.

Just to give you a little bit more detail, which I won't go over: here are the various processors we had access to and what they are, so you can check those out. I've also listed for each benchmark which of the compilers ended up being best for each one, so you can go and see how that worked out, and I've included details of what the benchmark cases were, so you can go and try to reproduce these yourselves as well.

Where are we going next? This is very exciting, now we can finally tell you these things. We have a whole bunch more codes, again from Archer, that we want to start working on. We're starting to pick some of these off, and we've got many of them already compiled and running. The results are very early, but the way I would summarize them so far is like the GROMACS case: if you have a code that's more compute bound, you'd expect processors with more flops to do better, and for codes that are more memory bandwidth bound, you'd expect processors with more memory bandwidth to do better, and that is what we are seeing. That's quite good, because it means we can understand what we're seeing. Fortunately we've got many codes that suit these processors with lots of memory bandwidth very well indeed, so that's very exciting.
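To make that compute-bound versus bandwidth-bound rule of thumb concrete, here is a tiny roofline-style estimate in C. The peak flop and bandwidth figures are made-up placeholders, not vendor numbers for any of the processors mentioned; the point is just the shape of the reasoning, that attainable performance is capped by whichever of the two ceilings a kernel's arithmetic intensity hits first:

```c
/* Back-of-the-envelope roofline check: a kernel is limited either by
 * peak flops or by peak memory bandwidth x arithmetic intensity,
 * whichever is lower. All peak numbers below are hypothetical. */
#include <stdio.h>

int main(void) {
    double peak_gflops = 500.0;    /* hypothetical per-socket peak, GFLOP/s */
    double peak_bw     = 150.0;    /* hypothetical peak bandwidth, GB/s     */

    /* Arithmetic intensity = flops per byte moved from memory.
     * A STREAM triad is 2 flops / 24 bytes ~= 0.083 flop/byte;
     * a GROMACS-like compute-heavy kernel might be 10+ flop/byte. */
    double ai[] = { 0.083, 1.0, 10.0 };

    for (int i = 0; i < 3; i++) {
        double bw_ceiling = peak_bw * ai[i];   /* bandwidth-imposed limit */
        int mem_bound = bw_ceiling < peak_gflops;
        double bound = mem_bound ? bw_ceiling : peak_gflops;
        printf("AI %6.3f flop/byte -> attainable %5.0f GFLOP/s (%s bound)\n",
               ai[i], bound, mem_bound ? "memory" : "compute");
    }
    return 0;
}
```

With these placeholder peaks, anything below about 3.3 flop/byte is memory bound, which is why the bandwidth-heavy Archer codes favour an eight-channel part while a compute-bound code like GROMACS favours wider vectors and higher flop rates.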
That's pretty much it. The last thing I'll leave you with is where to find more information. The top link is the Isambard website, so you can go and have a look at that; if you Google search Isambard and HPC, I think you'll find everything about the project, and there isn't really anything else called Isambard, so it's quite useful. Cray's press release is there too. The one below is a GitHub page where we're keeping information about how to actually apply for an account on the system, if people in the UK would like to try it; you can let me know. And I generally tweet about high performance computing related things, if you're interested. So that's everything I had to say. I'm very pleased we can finally tell you all about this, and there will be more to come, so watch this space. Thank you very much.