Hi everyone. This is a full house, so I think that's a really good sign for ARM. I'm Kevin Pedretti from Sandia, and I'm going to be talking to you today about the Vanguard project and our efforts to mature the ARM ecosystem, the software stack, for our computing workloads. This is a tri-lab project within the NNSA, the National Nuclear Security Administration; the three labs are Sandia, Los Alamos, and Livermore, and those are their logos there.

For an outline of this talk, I'm going to start with a short overview of Vanguard. Then I'll talk about the effort to mature the software stack for ARM for our computing workloads. It's still the early days of this project, so we're getting organized and making plans, but I'll tell you what some of our current thinking is. Then I'll give some preliminary results. We're not quite ready to share absolute numbers at this point, but I'll give a qualitative feel for the different workloads we've evaluated and the different compiler stacks, a red/yellow/green type status for each of them, and then conclude and hopefully leave some time for questions.

The Vanguard project is an effort we're starting up to expand our choices in the HPC ecosystem, so that when we procure the large systems we buy, we have multiple technology choices. To do that, we have to build confidence that those technologies can actually run the large-scale production multiphysics applications we've spent many years developing. The idea behind Vanguard is to take promising technologies and put a focused effort on them that builds that confidence, so we can procure large systems with it. We're addressing the hardware platforms and software stacks together; with any new hardware technology, there are certain to be gaps in the software stack that have to be addressed. That's what we need to do in this project: buy down the risk so we can be successful in the procurements.

Okay, so this slide shows where Vanguard fits in the spectrum of computing systems we procure. On the left here we have small-scale test beds that we procure, evaluate, and run our workloads on. The goal there is breadth of architectures: figure out which are the more promising ones by evaluating them on our workloads. In this regime the systems are relatively small, 10 to 100 nodes, and really brave users get on them, encounter all the bugs, and try to crank through and get things working. On the other side of the spectrum, we have our production systems, the larger-scale systems that the code teams focus on getting their applications working on, running at scale, and doing real work. We have the CTS systems, commodity turnkey-class systems we buy, but they're large, they have to be production, and it's hard to do development on them; and then we have the advanced technology platforms, which also have a locked-down software stack and have to be used for the real workload. Vanguard fits in the middle: rather than a breadth of architectures, we pick certain ones to make a focused effort on. We buy a big enough machine that we can get the code teams interested in putting a focused effort around developing for it.
We have a broader user base on this type of platform, and it's not targeted for production use, so we can leverage that to be more flexible and agile in doing the software development we need to do to mature it. Okay, so that's Vanguard.

For ARM specifically, we've fielded a number of test beds over the last several years to get some experience. The first was the Hammer test bed, about 50 nodes of the X-Gene from Applied Micro, which I think was one of the first 64-bit ARM server processors, so we started working with that. More recently, we've evaluated the Cavium ThunderX1 in a small-scale test bed called Sullivan. We've been working with HPE on evaluating the Comanche platform, and we're going to be getting a single-cabinet Comanche system; it might be getting delivered this week. We've had a few pre-production boards that we've been working through, figuring out how to boot them, so we're ready to go once that arrives. That system is called Mayer; it's got the Cavium ThunderX2 in it, and it'll be about 32 nodes. And then where we're going, what we're planning, is a larger petascale-class platform that we plan to procure in 2018, which is what we're calling Vanguard. That's what our efforts are moving towards.

On schedule: we've been putting out requests for information on a periodic basis, and this slide just gives a timeline. We're going to continue doing that, issue an RFP, go through negotiations, and hope to start standing up the platform in the July-August timeframe. We're also building a new computing facility at Sandia that this platform will go into. It's an addition to what was originally the Red Storm building; Red Storm was one of the first Cray XT systems, one of the first large-scale 64-bit x86 systems. We're adding an energy-efficient data center that has some unique capabilities to improve energy efficiency and reduce cooling needs, and it's being designed to be primarily water-cooled, so we're speccing the system out to be water-cooled. It has power infrastructure expandable from 7 to 15 megawatts. This is a rendering of what it looks like, the new building being built onto the existing building, and this is a fun picture of some people with gold-plated shovels at the groundbreaking.

Okay, so that's Vanguard, quickly. Now I'm going to talk about the tri-lab software stack effort. A lot of people are working on the ARM software stack; it's maturing, and there's been a lot of great progress, but we need to mature it for our computing workloads. We have large applications that we've spent decades and billions of dollars building, and we have to get some confidence that we can actually crank them through the compiler toolchains and get them running on the system. We need to harden the compilers, we need solid math libraries, we need all the tools users expect, and we need really high-performance communication libraries. Many of our applications now use highly templatized C++ code, which can cause issues for immature compilers. We also mix Fortran and C++ code, so we need to be able to call between the two and have them interoperate well.
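To make that interop point concrete, here's a minimal sketch of what mixed-language calling looks like. The names (update_field, advance_timestep) are illustrative, not from an actual lab application; the point is that both languages meet at the C ABI, with the Fortran side exposed through iso_c_binding.

```cpp
// Minimal sketch of C++ calling Fortran through the C ABI. The Fortran
// side would expose itself with iso_c_binding, e.g.:
//
//   subroutine update_field(n, field) bind(C, name="update_field")
//     use iso_c_binding
//     integer(c_int), value :: n
//     real(c_double)        :: field(n)
//   end subroutine
//
#include <vector>

// Declaration matching the Fortran bind(C) interface sketched above.
extern "C" void update_field(int n, double* field);

void advance_timestep(std::vector<double>& field) {
    // Hand the raw buffer across the language boundary; both sides
    // agree on the C calling convention, which is what bind(C) provides.
    update_field(static_cast<int>(field.size()), field.data());
}

// Stub definition so this sketch compiles and links on its own; in a
// real build this symbol would come from the Fortran object file.
extern "C" void update_field(int n, double* field) {
    for (int i = 0; i < n; ++i) field[i] *= 2.0;
}

int main() {
    std::vector<double> field(8, 1.0);
    advance_timestep(field);
    return field[0] == 2.0 ? 0 : 1;
}
```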
Sometimes we end up with multi-gigabyte binaries, which causes linking issues and relocation errors that we're working through now, and all the templatized C++ often leads to long compile times; it's not unusual for a single object file to take a couple of hours to compile with certain compiler toolchains. We need to work through all these issues as part of the software stack effort, and then we need to optimize performance as well. There are a lot of things in the x86 ecosystem that we take for granted, like having good versions of memcpy and other intrinsics built into the glibc library. Simple things like that need to be worked through and fixed to improve performance, and the same goes for the math libraries (a small benchmark like the sketch at the end of this part is how we catch these gaps). And then we also have to verify that we're getting the expected results, comparing against our existing runs and making sure that on ARM we get results comparable to other platforms. This is very important for us.

As part of this effort, we need to build an integrated software stack all the way from the user-facing programming environment, with all the compilers and things I just talked about, down to a low-level optimized OS with a very streamlined computing environment for high-performance computing, plus job scheduling and management. We need all that in place, and we need to be able to manage and boot the system in an effective way, so tools like system monitoring also need to be working on ARM. Overall, what we want to do is improve the zero-to-60 time: the time from when an ARM system arrives to when we can use it to do productive work.

The way we're speccing out the Vanguard procurement, we expect the vendor to provide a baseline software stack that works and is capable of launching parallel applications. But from there on, we want it to be a collaboration with the vendors on the software stack. The tri-lab team needs to do the work to integrate it into our computing environment, which is a big job, and to identify gaps and collaborate with the vendors to resolve them. We also want to leverage the non-production status of the system to get lab-developed software into the software stack, working with the vendor to do that. And regardless of which vendor we choose, we want the flexibility to develop alternative software stacks, deploy them on the platform, and test them out, so we can look at various options.

Okay, this slide is going to be really hard to read for folks out there, but it shows how we're thinking about staging the effort and how we're going to do acceptance for the platform. There are multiple columns here that progress this 2018 system through different networks. We'll start out on the open network, where we don't expect the software stack to be fully baked, but we want to do some simple tests, like running HPCG and HPL, run some other micro-benchmarks, stand the system up, and gain some confidence with it. Then we'll move it into the next column, a more restrictive network, where we can run more realistic workloads that are export-controlled and get more users on the system building things.
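Here's the kind of quick check I mentioned for something like glibc's memcpy; a rough sketch, with the buffer size and timing method chosen for illustration rather than taken from our real harness.

```cpp
// Rough micro-benchmark sketch: is memcpy getting a tuned path for this
// core, or a generic fallback? Sizes and rep counts are illustrative.
#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    const size_t bytes = 1 << 28;  // 256 MiB per buffer
    std::vector<char> src(bytes, 1), dst(bytes, 0);
    const int reps = 20;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i)
        std::memcpy(dst.data(), src.data(), bytes);
    auto t1 = std::chrono::steady_clock::now();

    // Touch the destination so the copies can't be optimized away.
    volatile char sink = dst[bytes - 1];
    (void)sink;

    double secs = std::chrono::duration<double>(t1 - t0).count();
    // Count both read and write traffic.
    std::printf("memcpy bandwidth: %.1f GB/s\n",
                2.0 * reps * bytes / secs / 1e9);
    return 0;
}
```

If a check like this reports a small fraction of the machine's streaming bandwidth, that's usually a sign the C library is falling back to a generic copy path rather than one tuned for the core.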
And then finally, we want to move it into a closed network where we can actually run the production workloads and demonstrate that we can run real multiphysics applications at scale. In the first row here, we have the applications we're targeting from each of the labs. In the middle row, for the software stack, we have some of the milestones we want to meet. For milestone one, one thing that's very important is that the vendor provides us with all of the proprietary and closed-source pieces we need to stand up an alternative software stack on the platform; the milestone is that we take all of that, rebuild it, and demonstrate we can boot a slightly modified stack on the system. At milestone two, we want to demonstrate that we can build an alternative software stack, boot it on the system, and show it running jobs. And at milestone three, probably on the open network, we want to demonstrate that we can containerize these environments and run them on demand as user-defined software stacks on the platform. That will also let us explore some additional workloads in machine learning and AI that are also within scope for this effort. Then on tools: again, at milestone one we don't expect things to be fully baked, so we're asking for one solid compiler and one MPI runtime capable of running our target applications. At milestone two, we want to expand that to two of each, where one can be vendor-proprietary but at least one has to be open: one open compiler and one open MPI runtime. So that's a high-level overview of our milestones. We're of course going to be doing more than this, looking at additional applications, but these are the ones we're focused on moving through these different networks.

Okay, so that's the high-level overview of the software stack effort. Now I'll talk about some of the different options we're thinking about incorporating, and we may use some combination of all three. The first is the Tri-Lab Operating System Stack, TOSS, led by Lawrence Livermore. They develop a software stack based on Red Hat Enterprise Linux 7 that supports three architectures, x86, PowerPC, and ARM, from a single source base. They build this in house, and their specific target is the commodity technology systems we buy, the turnkey clusters: a vendor provides us with the hardware, and then we stand up the software stack on it. There are about 4,000 packages that build for all three architectures, plus a couple hundred packages custom-built for TOSS by Livermore that add optimized compilers, vendor compilers, optimized MPI, and the other things you need to stand up an integrated software stack. The baseline is built, tested, and integrated, but it's not optimized for any particular system; the end labs usually do that sort of optimization, typically building a tuned programming environment for their site. And they've partnered with Red Hat to work on pre-GA hardware, such as the Cavium ThunderX2 systems. Now, some concerns they've run into as they've been doing this. There are some pieces missing on ARM: the NVIDIA CUDA toolkit and driver are missing, and AMD ROCm support isn't there.
It has some x86-embedded dependencies; things we need to run systems, like security scanners and backup tools, are missing, and similarly some third-party software. These are gaps that just aren't filled yet due to maturity issues; over time they will be, but currently these are the gaps that have been identified. The stack has been built, but we've only had a handful of nodes to test on so far, so we're going to be doing larger-scale testing on the Comanche systems we're getting very soon. And lastly, right now, due to license restrictions, it's hard to give TOSS to people outside the labs. We can give it to vendors, and we can collaborate with vendors that are bidding on proposals to work with TOSS, but it's difficult to distribute externally, and that's a negative for developing a wider, broader ARM ecosystem with this software.

Okay, then, we heard today that Cray has announced their XC50 ARM plans. This is based on ThunderX2: they're building a blade that fits into the Cray XC50 architecture and uses the Aries network. So you can have a high-end, production-type system that mixes and matches different processing technologies, all plugged into the same high-performance network. Along with that, Cray announced that they're porting their software stack to ARM, and that includes their optimized Linux stack for HPC, based on SLES 12; a user-facing programming environment with all the tools you'd expect; and the Cray compiling environment for ARM as well. This is big, because a lot of our codes depend on robust Fortran support, and the Cray compilers have that, so this is a promising direction for supporting Fortran on ARM. Similarly, it's production-proven: our users have a lot of experience with it, and it's the same kind of software stack that runs on our current ATS-1 Trinity system at Los Alamos. Some concerns about it: it's vendor-proprietary. Ooh, I'm out of time. Okay, sorry, I guess I didn't really time this out, so I'm going to skip over some slides. It's vendor-proprietary, and we've been working with Cray to figure out how we can plug our own software components into it.

On the compiler dashboard, I wanted to say that things are working surprisingly well. We've been able to crank a bunch of codes through. There are some issues, but we're working with the vendors to rapidly resolve them. Performance is looking good: there's really excellent memory bandwidth on ThunderX2, and it's on par for compute. So for realistic workloads, which are what we actually care about, this looks like a really balanced and really good platform for us to run on. And we expect it to get a lot better with the general-availability hardware (this was pre-GA hardware) and with the software tuning and performance optimization we're going to do. Over here in this column we've got a bunch of different workloads: different mini-apps the labs have developed that stress different capabilities we care about. We've looked at the GCC compiler toolchain and two vendor stacks that we've anonymized here. The high-level picture is that there's a lot of green; things are working really well. There are some issues compiling Open MPI, and some issues compiling heavily templatized C++ code, but we file bugs and issues, and we're getting great responses on resolving those.
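To give a feel for what "heavily templatized C++" means for a compiler, here's a toy sketch (not from any lab code base) of the deep, recursive template instantiation that tends to expose front-end bugs and long compile times in immature toolchains.

```cpp
// Toy stress test: force the compiler to instantiate a long chain of
// distinct class template specializations. Compile with, e.g.:
//   g++ -std=c++11 template_stress.cpp
#include <cstdio>

// Each Chain<N> instantiation pulls in Chain<N-1>, recursively.
template <int N>
struct Chain {
    static constexpr long value = Chain<N - 1>::value + N;
};

// Base case terminates the recursion.
template <>
struct Chain<0> {
    static constexpr long value = 0;
};

int main() {
    // Forces ~400 specializations at compile time.
    std::printf("%ld\n", Chain<400>::value);
    return 0;
}
```

Real multiphysics codes instantiate orders of magnitude more specializations through expression templates and Kokkos-style abstraction layers, which is how a single object file ends up taking hours to compile on an immature compiler.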
Okay, so that's it. Through Vanguard, we're trying to mature the ARM ecosystem and work on the software stack. So thank you.