At the beginning of the 21st century, the Earth needed to find a new way to keep up with the data from over 30 billion connected devices. Just 30 billion. A bold group of researchers and computer scientists in Silicon Valley had a breakthrough. They called it The Machine. It changed computing forever, and it's been part of every new technology for the last 250 years. Everything? Everything. This year, Hewlett Packard Enterprise will preview The Machine and accelerate the future.

Our next speaker is someone I've known for a very long time, and I think most of you know him as well. Grant Likely has been in the kernel community for over a decade and has worked on a variety of things to make this community stronger and better. Recently, Grant joined HPE, where he's working with that organization on Linux technology and helping them be a better partner in our community. Welcome, Grant Likely.

I'm well aware that I am the last thing standing between you and the activities and beer tonight, so I figured it was appropriate to wear the red shirt. So, yes, as Jim said, I have joined Hewlett Packard Enterprise. I've been working in the kernel for many years, and I'm working with their Linux team to help be involved with the community and be part of the upstream development for the products that we ship.

But what I'm here to talk to you about today is a research project that we have within Hewlett Packard Enterprise. With absolutely no hubris whatsoever, I'm saying we're talking about the future of computing with this project that is called The Machine. Now, you've probably heard other people from Hewlett Packard talking about The Machine over the last couple of years. Keith Packard gave keynotes at LinuxCon and LinuxCon Europe last year, and you can go and see videos of Martin Fink from when he announced it originally. I don't want to rehash things that we've already talked about, so you can consider this a sequel: I'll give a brief summary of what The Machine is, and then I want to talk about how we design software to actually make use of this architecture.

To begin with, I just want to remind you that this is a research project. We are exploring new architectures in computing, so this is subject to change, but this is where we see the technology right now.

To get started, I'd like to take a look at how we build machines today. When you look at computer architecture, one of the things you will see is that we have a hierarchy of storage. We've got the CPU with its local registers and caches. You've got main memory, which is direct load/store access. Off of the SATA bus, you'll have your SSDs or your rotating disks. And then you'll have a network connection, where you can have data stored over on a storage area network or use RDMA to access it. Each one of these locations where we store data has a different software stack that's used to access it. And then we spend a bunch of time in our applications moving data around, because we need to get data out of storage into memory to do the work we need to do, and then send it back to storage.

So The Machine project is really asking one very simple question: what would computer architecture look like, what would software look like, if we were able to collapse the storage hierarchy? How would our software change if we were able to have all of our data in one large pool of memory, directly accessible with load/store instructions?
How would that change the performance of our machines if the entire data set could be instantly available to your software, without having to worry about different storage stacks depending on where the data was (a contrast I'll make concrete with a small code sketch in a moment)? That is what we are attempting to build with The Machine.

Let's look at it another way. When we build data centers right now, this is what we do: we take a bunch of machines that have their own local storage, some of them compute nodes, some of them storage nodes. We put them in racks, we put them in the data center, and we connect them all together with a network. And then we have software to move the data back and forth. We've been calling this processor-centric computing, because we are designing our software solutions around the limitations of each of those individual machines and the storage limitations that come with them. That's why we have all the different technologies.

Now, imagine if instead we could start our architecture by taking all of the storage and all of the memory and creating one large pool that we put at the center, then attaching all of our processing elements around the outside and giving them all equal access to it. We haven't been able to do this before because the technology hasn't been there, but we think the technology is moving in that direction, and we're getting prepared for it now. This is exactly what we're building with The Machine project, so I would like to show it to you right now.

Currently we're working on the first generation of research hardware so that we can explore and develop the software that works with memory-centric computing, and this is a picture of one of the nodes in that hardware. When you look at this, there are three important parts to each node in The Machine. The first thing you'll notice is that we've got four terabytes of fabric-attached memory. Second, we've got a high-performance multi-core SoC. And third, we have a fabric switch interconnecting them, which gives the SoC secure access to any location in the fabric-attached memory, so that memory can be allocated to our applications.

One node on its own is actually not very interesting. Yes, there's a lot of compute power; yes, there's a lot of bandwidth; yes, there's a lot of memory; but it's still just one node. Where it becomes interesting is when we take ten of these, populate an enclosure, and connect the fabric switches between all the nodes into one large pool. Each one of those nodes has four terabytes, so in an enclosure we have 40 terabytes, but each processor sees it as one large pool: each SoC can map, with direct access, any location in that 40 terabytes of fabric-attached memory. We then have the capability of putting up to eight of these enclosures in a rack, and finally we top it off with a bog-standard PC at the top of the rack, simply to manage the allocations and manage the whole machine.

So this is the hardware; this is the capability we're working with, and it allows us to do the research on what the software would look like. However, how do we actually use it? Sure, we've got this huge pool of fabric-attached memory, but now we need the software to make use of it, and we live in the real world. We know that The Machine isn't going to be useful if we can't run existing workloads on it, if we can't run all of the existing software: MapReduce, Apache Spark, the big data workloads.
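Here is that code-level sketch of the contrast. This is ordinary Linux, not anything Machine-specific, and the file path is hypothetical; it just shows how stored data is reached through a system call and a copy today, while mapped memory is reached with a plain load:

```c
/* Illustrative contrast only: the same 8 bytes of application data,
 * reached first through a storage stack, then as ordinary memory.
 * The file name is hypothetical. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    uint64_t value;

    /* Processor-centric path: the data lives in storage, so we pay for
     * a system call and a copy through the filesystem/block stack. */
    int fd = open("/data/records.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    if (pread(fd, &value, sizeof(value), 0) != (ssize_t)sizeof(value)) {
        perror("pread"); close(fd); return 1;
    }
    printf("via storage stack: %llu\n", (unsigned long long)value);

    /* Memory-centric path: once the data is mapped, it is reached with
     * plain load/store instructions; no per-access software stack. */
    uint64_t *p = mmap(NULL, sizeof(*p), PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
    printf("via load/store:    %llu\n", (unsigned long long)*p);

    munmap(p, sizeof(*p));
    close(fd);
    return 0;
}
```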
So very early on, we knew that we were going to be running Linux on The Machine, and we knew we were going to be running a lot of existing software. The trick is then teaching that software how to make use of the large pool of memory, and I want to step through each of the layers involved.

Let's start with the operating system; let's start with Linux. Fabric-attached memory is a big pool: it's big, it's fast, it's persistent, but it's not coherent. So we're not able to run one SMP operating system across all of the nodes. We'd have to either come up with a new operating system or adapt Linux to deal with this non-coherent memory. But take another look at the diagram I have here, and you'll notice that each of the SoCs also happens to have 256 gigabytes of DRAM as local memory. When we've got a processor and we've got memory, it becomes very easy to put a Linux instance directly on that SoC, and that's what we do: every single node runs a separate instance of Linux.

This does a couple of very useful things. To begin with, the computer now looks like a familiar software environment; we can run all the existing software we have now. The other useful thing is that a fully populated machine, with all its nodes, looks an awful lot like any other Linux cluster. We can use all of the same tools that we currently use for managing Linux machines, for deploying workloads, and for managing the virtual machines and containers that would be running on the architecture. The significant difference between The Machine and a traditional cluster, though, is that every single node on The Machine has attached to it this slightly odd but very large peripheral that represents the fabric-attached memory. And that is a concept we were able to build useful abstractions on.

So let's go up to the next layer of the software. The big pool of memory isn't useful until we start slicing it up and assigning parts of it to applications. The hardware we have allows us to do allocations eight gigabytes at a time, and we're able to securely choose how blocks are assigned and which nodes have access to those blocks. So we have eight-gigabyte allocation units, and we've been calling those books. We can do logical allocations of one or more books so that applications can get very large regions. And what do you put books on? Well, you put them on shelves, of course. So the logical allocations we've been calling shelves. Finally, we have software running on that top-of-rack management server, which we call the librarian, that manages the shelves and the allocations. That gives us the capability of allocating from the memory.

Now the final piece is exposing that to the Linux operating system running on each of the nodes. The way we've done that is we have written a new file system for Linux called the librarian file system, whose sole purpose is to expose each one of those shelves as a file in the virtual file system layer. It's the simplest possible file system; we just want to expose the shelves as files. User-space applications can open them, read from them, and write to them, and it becomes interesting when we mmap them: that gives user-space applications direct load/store access to fabric-attached memory.
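As a rough sketch of what that looks like from user space, assuming a hypothetical mount point and shelf name (the talk only says that shelves appear as files that can be opened, read, written, and mmap'd):

```c
/* Hypothetical sketch of using a shelf exposed by the librarian file
 * system. The mount point (/lfs) and shelf name are assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHELF_SIZE (8ULL << 30)   /* one 8 GB book, the allocation unit */

int main(void)
{
    int fd = open("/lfs/my_shelf", O_RDWR);
    if (fd < 0) { perror("open shelf"); return 1; }

    /* Map the shelf; from here on, fabric-attached memory is reached
     * with ordinary load/store instructions. */
    char *fam = mmap(NULL, SHELF_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (fam == MAP_FAILED) { perror("mmap shelf"); close(fd); return 1; }

    strcpy(fam, "hello, fabric-attached memory");  /* a direct store */
    printf("shelf says: %s\n", fam);               /* a direct load  */

    munmap(fam, SHELF_SIZE);
    close(fd);
    return 0;
}
```

Once the mmap() succeeds, there is no storage stack left in the data path; every access through fam is an ordinary load or store.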
And the best part about this is that we don't assign one shelf to only one node: we're able to have multiple nodes all sharing the same shelf. And this is where things get interesting, because, again, we can build on top of that.

A trivial example: we can take a shelf, assign it to a node, and loopback-mount it. That gives us a file system; we can put a persistent ext4 file system on a shelf for each node, which works great, but it's not really very interesting. It's not where we want to go. Where things get interesting is when we have software libraries that allow for shared access to that region. We have a library we've been working on that provides a structured object store within a shelf that can be shared between nodes. We have another library that provides mutual exclusion between nodes, because especially with a non-coherent block of memory, we need to coordinate between the nodes: your application can directly access its data, but it still needs to coordinate with the other nodes when that data is going to change (there's a toy sketch of what such a handoff might look like at the end of this section). And then we can have application-specific libraries, for example for in-memory databases. These are areas of research that we're able to work on now that we have hardware that implements memory-centric computing, and everything builds up on top of that.

So here is what we've got. To begin with, the operating system: we've got the Linux kernel, and we are endeavoring to use upstream as much as possible. We're building all of this on top of Debian, because we see Debian as a very stable base that gives us the capability we want and the ability to get the pieces and the technology we need back into Debian. We have the drivers for the kernel. We have the librarian software, both the part that runs on the management server and the part that runs on the nodes. We have been looking at RDMA because, again, this is all about enabling applications we already have: applications that are already using RDMA can be ported and make use of the speed benefits that come from having direct access to all of the memory. And finally, we've also got tools that we've written. As we've been waiting for this hardware, we've already started in on the software development, so we've used QEMU to implement an emulator: you can take a regular laptop and bring up virtual nodes that have the same behavior as nodes of The Machine. We also have performance tests for it.

So why am I talking about this and all the projects that we have internally on The Machine? A fundamental thing that we decided very, very early is that we were going to make use of open source software as much as possible. That's hardly a novel strategy; everyone is depending on open source these days. But beyond that, before any of this hardware is even available, we want to open source what we're working on, because we think memory-centric computing is how data centers are going to be built in the near future. This is where we see computer architecture going. And if this is where computer architecture is going, then being ready for memory-centric computing is a fundamental thing that needs to get out there, and it's bigger than Hewlett Packard Enterprise. So we are releasing as much of this as we can early, before hardware is available.
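Here is that toy sketch of cross-node coordination. To be clear about the assumptions: none of this is the actual API of the mutual-exclusion or object-store libraries mentioned above. The struct, the function names, and the use of msync() as a stand-in cache flush are all hypothetical; it only illustrates the general shape of publishing data through non-coherent shared memory:

```c
/* Hypothetical handoff through a shared, non-coherent shelf mapping.
 * Assumes `r` points at a page-aligned mmap of a shelf visible to both
 * nodes. msync() and the compiler barrier stand in for whatever flush,
 * invalidate, and ordering primitives the real fabric requires. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

struct shared_region {
    volatile uint64_t ready;      /* 0 = empty, 1 = payload published */
    char payload[4096];
};

/* Writer (node A): fill the payload, then raise the flag, then push
 * both out of the local cache so another node can observe them. */
static void publish(struct shared_region *r, const char *msg)
{
    strncpy(r->payload, msg, sizeof(r->payload) - 1);
    __sync_synchronize();               /* order payload before flag  */
    r->ready = 1;
    msync(r, sizeof(*r), MS_SYNC);      /* stand-in for a cache flush */
}

/* Reader (node B): wait until the flag is visible, then consume. */
static const char *consume(struct shared_region *r)
{
    while (!r->ready)
        ;                               /* real code would back off   */
    __sync_synchronize();               /* order flag before payload  */
    return r->payload;
}

int main(void)
{
    /* Single-process demo on an anonymous shared mapping; on The
     * Machine, `r` would instead come from mmap'ing a shared shelf. */
    struct shared_region *r = mmap(NULL, sizeof(*r),
                                   PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r == MAP_FAILED) { perror("mmap"); return 1; }
    publish(r, "block 42 updated");
    printf("reader sees: %s\n", consume(r));
    munmap(r, sizeof(*r));
    return 0;
}
```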
And we're not only releasing this into open source and making it available; all of the development that we're doing, we are moving outside of the company firewall. We're putting it up on GitHub, and that is where we're doing the development, even internally, because we think there are others who are also interested in this, who will want to see what we're doing and will want to collaborate on making the application stacks we have right now work in a memory-centric computing environment.

So this is my call to you. We are releasing as much as we can under the GPL, with a DCO process, because we're trying to eliminate barriers to collaboration. It is all there, ready to go. I encourage you to go and take a look at our homepage, which has links to the projects we've already open sourced and is also the place to look as new stuff is released.

And finally, in my very short amount of time here, I've not been able to go into detail, but I don't need to, because there's already a bunch of information out there. Go take a look at Keith Packard's talk from linux.conf.au earlier this year, or Rocky Craig's presentation at Vault. There's lots of information, and there are lots of ways to get started. Finally, come to the booth upstairs. We have the hardware that I showed you today, and we have some scale models there, and I'm looking forward to working with you on The Machine. Thank you very much.

Thank you very much. We have a reception upstairs at the hub directly following this, and we also have a scavenger hunt going on through our new mobile app, which is called Pokemon Go. No, it's actually our mobile app. So please use it. See you up at the reception. Thank you, everyone. Thanks, Grant.