All right, so hi, everyone. I'm Jonathan Balkind, here from Princeton University, and I'm representing my research group, which is led by Professor David Wentzlaff. I'm going to be talking about our OpenPiton+Ariane platform, and how we run that on Amazon F1 and make it available for you to use for software or firmware development in the cloud, without needing RISC-V development boards. So this work is part of the DECADES project, which is funded under DARPA's Software Defined Hardware program. The goal of the program in general is building runtime-reconfigurable hardware to accelerate data-intensive applications. So we have a big focus in DECADES where we're trying to build a full-stack system which boots Linux, has a large number of accelerators, and has what we call intelligent storage tiles to make data movement in the system more efficient. This is a collaboration between our group, Professor Wentzlaff's group at Princeton, Professor Martonosi's group at Princeton, and Professor Luca Carloni, who you heard from earlier, at Columbia. A big thing here is that we make all of our tools open source and available under permissive licenses like BSD and Apache. So OpenPiton+Ariane is a collaboration between ourselves at Princeton and the PULP team at ETH Zurich. Essentially, we want to build a permissively licensed, Linux-capable many-core research platform which uses the RISC-V ISA. At ETH, they developed the Ariane core, and we saw them give a presentation at a RISC-V Workshop, and at the end they said, we want to do multi-core, but we're not sure how to do it. We had an open-source platform called OpenPiton, and so we got in touch with them and we said, hey, we could make a really cool many-core RISC-V platform, and so we started that collaboration. Because these were both mature and extensible designs, we were able to bring the core and the many-core platform together. 
Within about six months, we were booting SMP Linux, and it was the first open-source, SMP-Linux-booting RISC-V many-core. So Ariane itself, you may have heard of already. It's being used in a large number of projects at the moment. It's an application-class processor, and it's written in SystemVerilog. In my opinion, the RTL is very elegant and easy to read, and it was designed to be Linux-capable. It was designed specifically to boot Linux, but it can boot other Unixes as well, and it's got all that you need to do that. So it's got the L1 caches you need, it's got the privilege modes, the TLBs, and so on. It really is designed for extensibility. We had a Google Summer of Code project this past summer, and our student within three months was able to build a new branch predictor and extend the core to be a dual-issue core instead of a single-issue core. That was a newly graduated undergraduate student who was able to do that. From our side, the OpenPiton platform is an open-source many-core research platform. We've been developing this for about six years at this point. It's been contributed to the open-source hardware community for about five, and it's written in Verilog RTL. Everything we write is in Verilog, and everything we write comes under the BSD license. The idea here is building an extreme-scale many-core which is still OS-capable, and so we have a cache-coherent system that we call P-Mesh, which can scale up to half a billion cores, and we provide a lot of configurability options so that you can choose to enable or disable certain features or change the size of structures in order to test a research idea at multiple design points and figure out where the optimal point is for your design. Our RTL has been around for quite a long time, and as a result, we support a variety of different tools, both in simulation and for back-end and synthesis and so on, and I have an asterisk here for Riviera support. 
This week, we got a PR from the developers of the Riviera simulator. They had just implemented all the support and submitted it as a PR to us, and so that's now upstream in OpenPiton. Another important part here is it's all been verified for ASIC, and we've done a very thorough characterization and released all that characterization data open source at openpiton.org. So you can go and see real data from a real chip that we taped out, a 25-core many-core, and use that for furthering your research, but also just in terms of philosophy, we want to make as much open as we possibly can. And the chips that we have are back in the lab and they boot Debian Linux, which is pretty great as well. So OpenPiton+Ariane is a tiled many-core, and so you can see a single tile here. On the left, we have the Ariane core and its L1 caches, and on the right, we have the P-Mesh cache system from OpenPiton, and we have our private second level of cache, which is the L1.5. We've got our L2 cache, which is our LLC, and that's distributed and shared across all of the tiles in the system, and so there's a shard in each tile, and then we have our three NoC routers, which implement the P-Mesh cache coherence protocol. And so we started out with the SPARC ISA, and when we wanted to move to RISC-V by integrating Ariane, we had to make changes on both sides. On the Ariane side, they implemented a new cache subsystem that's specifically tailored to our cache interface, and then we, on our side, not only improved that cache interface to work better with Ariane, but we also had to add the RISC-V atomics, and so those are all supported in a scalable way in our cache coherence system. 
And then once you've got your whole many-core building up your chip, we also have the chipset, which is on the right-hand side here, and that's where all the peripherals live, it's where accelerators live at the moment, and it has things like the boot ROM, which is generated when you compile your design, and that includes the device tree so that you can go ahead and run Linux and run the same Linux image across a variety of different OpenPiton designs. We provide the RISC-V debug-spec-compliant debug unit, and so you can actually plug into the JTAG of your FPGA and then use GDB to step through your Linux boot, which is pretty cool. I was really impressed that this was actually possible. I didn't think that was possible before we did it. And then we have all the interrupt controllers and so on that you need. In terms of configurability, we have a variety of parameters that you can configure, plus more than you see here, and again, this is really about giving a variety of design points. People can design the chips that they want to design and come up with what the optimal point is for what they're trying to do, and you can also see at the bottom here, we have a variety of boot loading methods, including the JTAG that I just mentioned. In terms of FPGA prototyping, we support a variety of Xilinx FPGAs at the moment, and these vary in price, the number of cores you can get, the speed you can run at, and so on, but the one that we're going to be talking about is Amazon F1. The great thing about F1 is that you can rent it by the hour, and so you don't need any infrastructure on your local machine, and it costs, I don't know what it is in euros, it's about $1.60 an hour, and you can get up to about 15 cores here. So this is a lot more than you could get on any available board that boots Linux today, and so if you want to check out how scalable your application is for RISC-V on a real multi-core system, you can do this and get good scaling. 
So the reason that we did this is that we really care about making OpenPiton available freely in open source, and so we want people who don't have FPGAs to be able to use the platform. That means, you know, hey, you don't have the thing in front of you on the desk, you don't have to go and spend the money to buy one, you don't need to buy Xilinx Vivado licenses, that sort of thing. We also wanted to make it available so that you don't need to recompile the hardware, so you can do this sort of software and firmware development. F1 gives us this kind of cloud rental, and this is a great step forward in that direction. You can do things like rebuild in the cloud, so when you're renting your FPGA system from them, you don't need to buy a separate Xilinx license, it's all available for you on the system. And so if you're kind of bursting, in the sense that you want to use a bunch of FPGAs at once and maybe you don't have the hardware for that, you can use this, or if you just want something that's much lower cost than spending $8,000 on an UltraScale+ FPGA like the one you're getting here, then you can rent this for about a buck sixty an hour. Another great thing here is that there's a repository of bitfiles for the FPGAs that are available, what they call AGFIs, and we make available an AGFI for a nine-core OpenPiton+Ariane system. This is really great as an alternative to a real hardware system, so if you want to do your software development, you want to test things out, you can do that using this without having to touch Vivado, without having to go and look at RTL. You can just go on, grab that image, and get going. Now if you do want to build your own, then we try to make everything as push-button as possible, so you can run a single command and get back a bitfile on the other end and then use that to program the FPGA in F1 or other FPGAs, and in this case, it takes a few hours to generate this. 
The FPGA is really big, it's awesome, but it means it takes a while to do synthesis and place and route for it. And you'll note here, it produces a Xilinx design checkpoint, which then gets fed into some other system on Amazon in a queue, and you kind of wait for that process to produce the final AGFI image. You can run one core at about 125 megahertz here. This is reasonably in line with what the real hardware would be. It's maybe like 10X off or something like that, so for computer architects, that's really close. We're used to talking about simulation where you're running 1000X slower or whatever, and you can get up to 12 or 15 cores on this board. So if you want to go ahead with synthesis, you can go and check our GitHub and give that a shot. The design itself here, you can see what ends up on the Amazon FPGA in the end. On the left-hand side, we've got the OpenPiton side of things, so this is the stuff that we contribute. So our system is the chip and the chipset, all the peripherals, all the cores, caches, and so on. And then that connects to this UART up here. The UART gives you a serial terminal, and we have a driver in Linux which will poll on that and give you a serial terminal that you can connect to. In the middle, we have LEDs, switches, clock trees, et cetera. These are all virtual, which is an interesting thing. So normally we just have the FPGA board on the desk and you can flick the switches, you look at the LEDs, you're trying to figure out what's going on. But it's all virtual in F1. So in your terminal, you can run a command and it'll show you a virtual LED flickering, just going from zero to one. And then we've also got our AWS crossbar here, which connects to the AXI interface for memory and for DMA. And so that's where the main memory is. That's also where we put our virtual SD card, comparable to the real SD card we have in our other FPGA designs. 
And you can really quickly DMA onto that to start booting your OS. So in terms of booting the OS, the first step of this is that we have what we call a zero-stage bootloader. This is really straightforward. It's executing from the boot ROM, so it was compiled in when you built your design or when we built the design that you use in F1. And the thing it does here is parse the GPT partition table on the SD card, grab that first partition, and dump that into memory. And so then that can contain whichever bootloader you want to run, or whichever other kind of operating system or low-level firmware you want to run. In our case, we're using BBL at the moment, and that does some standard things like setting up the trap table, starting SMP, and getting things ready for Linux, so parsing the device tree and so on. Then in our case, we're producing BBL blobs that have Linux and a Buildroot environment built in. And so that starts booting Linux. It does all the driver loading from the device tree that's there and specified. So if you add a new peripheral, you add that to the device tree, and then Linux will be able to pick that up automatically. If you go between boards, Linux will know whether or not to load the driver for a particular device depending on whether it's there. And then we finally get into our BusyBox environment. And so to get running, you can do the whole thing with basically four commands. So you can load the image, which is the AGFI that we have available. You start up the UART and get your serial connection, and then you DMA whatever your operating system blob is, whatever you want to be on the virtual SD card, you can DMA that on, and then just reset the FPGA and it will be going. So I'll show you a quick video here to give you an idea of what that looks like. And thanks to my groupmate, Gregory, for doing this. So you can see here, first we load the image on the FPGA. It only takes an instant. We DMA the operating system here.
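The zero-stage bootloader step described in the talk (find the first GPT partition and copy it into memory) can be sketched in a few lines. This is only an illustration written in Python, not the OpenPiton boot ROM code: the function names `first_partition_extent` and `load_first_partition` are hypothetical, a 512-byte sector size is assumed, and the field offsets are the standard GPT header and partition-entry offsets.

```python
import struct
from typing import Tuple

SECTOR = 512  # assumed logical block size of the (virtual) SD card


def first_partition_extent(image: bytes) -> Tuple[int, int]:
    """Return (byte_offset, byte_length) of the first GPT partition."""
    # The primary GPT header lives at LBA 1, right after the protective MBR.
    header = image[SECTOR:2 * SECTOR]
    if header[:8] != b"EFI PART":
        raise ValueError("no GPT header found at LBA 1")
    # Offset 72: starting LBA of the partition entry array.
    (entries_lba,) = struct.unpack_from("<Q", header, 72)
    # Offset 80: number of entries; offset 84: size of each entry.
    num_entries, _entry_size = struct.unpack_from("<II", header, 80)
    if num_entries == 0:
        raise ValueError("GPT has no partition entries")
    # In each entry, offset 32 is the first LBA and offset 40 the last LBA
    # (inclusive). Only the first entry is needed here.
    first_lba, last_lba = struct.unpack_from("<QQ", image, entries_lba * SECTOR + 32)
    return first_lba * SECTOR, (last_lba - first_lba + 1) * SECTOR


def load_first_partition(image: bytes) -> bytes:
    """Return the bytes a zero-stage loader would copy into memory."""
    offset, length = first_partition_extent(image)
    return image[offset:offset + length]
```

A real boot ROM would do this in C or assembly against the SD interface and would likely also check the header CRC; the sketch skips that to keep the control flow visible.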
I'll note that the DMA only took like two seconds to copy two gigabytes. It's really great. It just restarted for some reason. And then you will open up your UART and then reset the FPGA. We just have a simple reset command that clicks the virtual button. And then we can open up our serial terminal, and by the time that we actually get the serial terminal opened, you'll see in a second, we've already run through the bootloaders and Linux is starting to boot. And then we'll skip through the Linux boot here toward the end. You can see standard Linux boot, standard driver bring-up. We have a specific driver for our virtual SD, and all of this is really broadly unmodified. And once you log in, you can run, you know, cat /proc/cpuinfo, and you can see that with the AGFI that we uploaded, you get nine Linux-capable Ariane cores. And so from there, you can put other things in your Buildroot environment. So the SDK that we use, you can check out that repo and use it really straightforwardly. You either provide your own cross compiler or it will build a cross compiler for you, and then you just run make all and it will generate everything you need. And so the components of that are, there's the cross compiler toolchain. We have the modified version of BBL that we use. We have a Buildroot environment. Buildroot is really great. It's been used a lot in the cloud, especially to provide small Linux images. And our friends at lowRISC have actually got to the point of having a full graphical environment running on Ariane using Buildroot. So you can have mouse, keyboard, VGA, and it's all through Buildroot, even before you have to go to the point of having a distro. We also provide a rootfs overlay. So if you want to just bring along a binary that you built to run your application, so you want to test scalability, something like that, you're not running a full distro. 
You can just copy this into the rootfs overlay and it will show up inside the file system that you're running on the F1 FPGA. And so that's extremely useful too. In terms of customization, if you have new drivers you want to add, if you want to change the config, maybe you want to change the scheduler or you want to add more debug information, you can do all that through Buildroot in a really straightforward way. And then you can save that and have your own configuration. Through the configs down here, you can even replace which Linux kernel you're grabbing, whether you want to grab an upstream one instead of the one that we have, or you want to make your own patches and so on. And then on the right, in terms of Buildroot and BusyBox, there's a really large number of packages that are available. As I mentioned, you can have basically a full GUI environment, but there's all sorts of things like databases and stuff like that out there that you can put in Buildroot. And so you can choose whatever applications you want and test that those work on RISC-V using OpenPiton+Ariane. And this is a really convenient environment. Now I'll talk a little bit about the things that we're still looking to do. So we are looking for assistance in making this a more usable and better platform. We're in the process of moving from BBL to OpenSBI for firmware. I've got that to the point of entering an infinite loop somewhere that I need to debug. We're in the process. And so OpenSBI is kind of becoming the standard there. We're interested in other bootloaders for real bootloading. As I mentioned, we have this environment where you've got the first partition on the card and it copies stuff over into DRAM to get running. We would like to actually use a real bootloader and have real drivers in the bootloader to pull things off the SD card in a more elegant way. We are in the process of doing distro bring-up. 
So we have a Debian chroot environment in F1, so you can just go in and chroot into Debian and then you can run whichever applications you want to run that will work without systemd. And so we've tested all sorts of applications there. We can natively compile. We can do all sorts of other stuff. But the bit that we're missing is getting the whole way through a regular systemd boot. And then you can actually have the full Debian environment. We're also interested in other distros. We know that people have got Fedora and other things running. So if you work on another distro, we would love to see that running on the platform, and you can go on and rent these FPGAs and give that a try yourselves. And then the other thing is testing parallel software scalability. You've got a bunch of cores available here. You can go on and start to make optimizations and understand, if you have this kind of cache-coherent many-core booting Linux, what optimizations can you make and so on. And so we're interested from our perspective, but we also know there's a lot of applications out there that you might have that you want to see running on RISC-V and understand what the benefits of RISC-V are. And so with that I'm gonna finish off. We have a variety of sponsors, ourselves and ETH, to thank, but most recently we received some funding from Amazon. It turns out if you're doing research on these sorts of open-source platforms, Amazon is very willing to help fund your exploration on F1, so you can put in requests and get funds from them that way. And so I'd like to open up for any questions. You can check our websites, pulp-platform.org and openpiton.org, if you wanna find more. We have a Google group, we are on GitHub, we will accept issues and so on. So you only have about 18 seconds.
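As a rough illustration of the scalability testing mentioned in the talk, the following Python sketch times a CPU-bound workload at several worker counts and turns the timings into speedup and parallel efficiency. Everything here is an assumption for illustration: the names `busy_work`, `time_run`, and `scaling_report` are hypothetical, the workload is a placeholder for a real application, and nothing in it is specific to OpenPiton+Ariane.

```python
import time
from multiprocessing import Pool
from typing import Dict, List, Tuple


def busy_work(n: int) -> int:
    """Placeholder CPU-bound task; swap in the real application kernel."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_run(workers: int, tasks: int = 36, size: int = 200_000) -> float:
    """Wall-clock seconds to finish a fixed batch of tasks with `workers` processes."""
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        pool.map(busy_work, [size] * tasks)
    return time.perf_counter() - start


def scaling_report(seconds_by_cores: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows relative to the 1-core timing."""
    base = seconds_by_cores[1]
    rows = []
    for cores in sorted(seconds_by_cores):
        speedup = base / seconds_by_cores[cores]
        rows.append((cores, speedup, speedup / cores))
    return rows
```

On a nine-core image one might call `scaling_report({n: time_run(n) for n in (1, 3, 9)})` from under an `if __name__ == "__main__":` guard. Efficiency well below 1.0 at higher core counts would suggest contention or serial sections worth investigating.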