So, welcome, all. Thank you for coming to our little talk today about reconfigurable computing for Linux. My name is Vince Bridgers, and this is Yves Vandervennet. I work in the OpenCL HLS development team at Altera, now Intel PSG, and Yves runs the Linux development team. I'll start by presenting a few slides, then I'll turn it over to Yves, and then I'll continue. Today we're going to give a brief introduction to heterogeneous computing. We'll cover a range of system structures that we think need to be supported and some interesting use cases, and then we'll present what we consider, at this point in time, to be a proposal for a heterogeneous computing architecture for Linux. This is nowhere near final; it's just something that we're thinking through, and we'd like to get your feedback at some point.

So, how many people have tried to program an FPGA and use it in an embedded system? Was it easy? I guess it depends on the problem you're trying to solve, doesn't it? Did you write HDL, or did you use some of the higher-level languages or other tools? This can be quite difficult depending on the problem. Intel PSG has OpenCL tools that help with data-parallel tasks, but we hear from customers that these are not easy to use. This architecture tries to address that problem. There's a really nice quote from Steve Jobs; I won't read it to you, but as programmers you know that parallel programming can be quite difficult depending on the kinds of problems you're trying to solve. So with this, I'll turn it over to Yves.

Thank you. So, we have a few objectives with this architecture proposal. The first one, of course, is to use open source software and to have a reference implementation as a starting point for developers. As far as FPGAs are concerned, there's already some infrastructure in the Linux kernel, so we want to leverage it.
We want to accelerate the adoption of acceleration technologies across market segments, from embedded devices to larger systems like data centers. We would also like to make the interfaces open, that's clear, but also possibly allow vendor-specific plugins for innovation and differentiation, since everybody needs them. If we look at a heterogeneous system, we have different components. We have the CPUs, obviously, which can be in an SMP configuration. We can also have GPUs and DSPs, which everybody knows, but these days we can also have FPGAs. All these components can be connected over AXI, PCIe, you name it, and the reconfigurable part comes from the fact that some of these components can be reprovisioned with different features. Today's focus will be on FPGAs.

Since there were just a few hands up for FPGAs, I'm going to quickly remind everybody what they are. An FPGA is a field-programmable gate array. That means it's an array of programmable logic blocks. Some are generic: they provide features such as gates and flip-flops. Others are specific, like multipliers and transceivers. The FPGA is designed to be configured after manufacturing. The user design is written either in a hardware description language or, these days, also in C, where you can take your C code and translate it directly to gates. This design is then compiled into a bitstream, and the bitstream is used to program the FPGA itself. In embedded devices this usually happens at boot time: the FPGA programs itself from non-volatile storage. But we can also configure it at runtime from the OS. There are two types of configuration. There's full configuration, which most people are probably aware of, where you configure the functions and the I/O at the same time. You can also do partial reconfiguration, where you target a section of the FPGA to change a function, for example. So in the typical workflow, you start with your FPGA design, you turn that into a bitstream, and then you load it into the FPGA.
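The difference between full configuration and partial reconfiguration can be sketched with a small Python model. This is an illustration under stated assumptions, not a vendor tool or a kernel API: the `FpgaModel` class, the region names `pr0`/`pr1`, and the `.rbf` file names are all hypothetical. In the model, a full configuration sets the I/O and every region from one bitstream, while a partial reconfiguration swaps the function in a single region and leaves the I/O and the other regions alone.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FpgaModel:
    """Hypothetical FPGA split into reconfigurable regions (illustration only)."""
    regions: Dict[str, Optional[str]]   # region name -> bitstream currently loaded
    io_config: Optional[str] = None     # I/O setup from the last full configuration

    def full_configure(self, bitstream: str) -> None:
        # Full configuration sets the I/O and all functions at the same time,
        # so in this model every region now comes from the same bitstream.
        self.io_config = bitstream
        for name in self.regions:
            self.regions[name] = bitstream

    def partial_reconfigure(self, region: str, bitstream: str) -> None:
        # Partial reconfiguration changes one region only; in this model it
        # requires a prior full configuration to provide the static I/O.
        if self.io_config is None:
            raise RuntimeError("partial reconfiguration needs a prior full configuration")
        if region not in self.regions:
            raise KeyError(f"unknown region: {region}")
        self.regions[region] = bitstream


fpga = FpgaModel(regions={"pr0": None, "pr1": None})
fpga.full_configure("base.rbf")             # e.g. at boot, from non-volatile storage
fpga.partial_reconfigure("pr1", "fft.rbf")  # e.g. at runtime, driven by the OS
print(fpga.regions)    # {'pr0': 'base.rbf', 'pr1': 'fft.rbf'}
print(fpga.io_config)  # base.rbf
```

The model only tracks which bitstream owns which region; timing, bridges, and the actual programming interface are deliberately left out.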
There are some use cases that I wanted to list here. In the industrial market, a lot of people do motor control with FPGAs. In the multimedia world, they do video and image processing. In telecommunications, it could be packet offloading. And in high-performance computing, it could be search engine acceleration.

These are two typical systems that we need to take into account. On the left-hand side, you have what could be a typical embedded system, with the CPUs, the FPGA, possibly a cache coherency unit, and then your interconnect. On the right-hand side, you have more of a server-type system, with CPUs in their sockets connected to one another, like Intel does over UPI, and each socket has its own PCIe root port and connects to the I/O that way. So these are different interconnects that the architecture is going to have to support. When it comes to use cases, there is a range of use cases with different demands on performance. If we go to the far right, we have high-performance computing, for example climate modeling. These are going to need a lot of CPUs or a lot of offloading to have acceptable runtimes. Another aspect of FPGAs is that many studies show that FPGAs tend to consume less power than GPUs. For people in data centers that's critical, because a lot of the bill goes into air conditioning.

There are a few existing technologies that support reconfigurable computing; this is not a comprehensive list. The first one is the Linux kernel FPGA manager. It is currently upstream and is being actively developed. Then there's OpenCL. OpenCL is a framework for developing complete software applications. It was intended to be a very, very low-level development tool, but it's being used to solve problems.
So I think one of the things that we'd like to do with this reconfigurable computing architecture is bring that up to a somewhat higher level, to support not only OpenCL but other ways of solving these types of problems. Then there's high-level synthesis, which is a component of reconfigurable computing but not really the focus of our talk today. I wanted to include it to be a little bit inclusive, because it is part of the OpenCL product offering that we have at Intel PSG.

This is our current Linux kernel FPGA manager framework. It provides a higher-level interface, with APIs to control the FPGA. These are basically just for programming the FPGA, reconfiguring the FPGA, and enumerating the types of FPGAs that are available. This is still under development. Below that, there are low-level FPGA device driver interfaces that talk directly to the different FPGAs.

This is our OpenCL programming and development flow, and you'll see the same thing across the different vendors that offer OpenCL. On the left-hand side is the host application. This program can be written in C or C++, and it contains the low-level API calls that download and run your kernel. This typically runs on Linux. You compile it using GCC, and you use the vendor runtime libraries to get the support necessary to download and use your kernel. On the right-hand side is your typical kernel. This is written in C99 today, and this is where the work is done to offload certain types of tasks. In the FPGA case, the kernel is compiled, synthesized, and programmed directly into the FPGA. We also support x86 and ARM today.

We'll go through some definitions first as we work through the slide. We have CPUs: there's a CPU core cluster, and there can be any number of CPUs in your system. There's shared memory that is used by the CPUs and your offload components.
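The enumeration side of the FPGA manager framework is visible from user space through sysfs: each registered manager appears under `/sys/class/fpga_manager/` with attributes such as `name` and `state`. A minimal Python sketch that lists them follows. The helper name `list_fpga_managers` is hypothetical, and the `root` parameter exists only so the function can be pointed at a test directory. Actually loading a bitstream goes through the in-kernel API (typically driven by FPGA regions and device tree overlays) and is not shown here.

```python
from pathlib import Path
from typing import Dict


def list_fpga_managers(root: str = "/sys/class/fpga_manager") -> Dict[str, Dict[str, str]]:
    """Return {device: {"name": ..., "state": ...}} for each FPGA manager found."""
    managers: Dict[str, Dict[str, str]] = {}
    base = Path(root)
    if not base.is_dir():
        return managers  # framework not present, or no FPGA managers registered
    for dev in sorted(base.iterdir()):
        info: Dict[str, str] = {}
        for attr in ("name", "state"):
            attr_path = dev / attr
            if attr_path.is_file():
                info[attr] = attr_path.read_text().strip()
        managers[dev.name] = info
    return managers


if __name__ == "__main__":
    for dev, info in list_fpga_managers().items():
        print(f"{dev}: {info.get('name', '?')} ({info.get('state', '?')})")
```

On a system without FPGA managers the function simply returns an empty dictionary rather than raising.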
And the interconnect can be anything, as Yves mentioned. It can be PCI Express or AXI. It can also be other technologies that are being developed and will come out sometime in the near future. This is another reason why it's important to talk now about what sort of reconfigurable computing architecture we'd like to support, so that we can support these different interconnects. There's a layer that provides bus management for the offload components, and a device manager layer that allows you to enumerate the different FPGAs in the system and the different accelerator functions that you could program into your FPGA or your offload elements. For the purpose of this presentation, we'll refer to these functions as accelerator functions, or AFs. These AFs can be dynamically inserted and removed, and one of the problems you encounter when you start dynamically inserting and removing them is resource management. In one case, you may want to insert a device. In the event that you don't have enough resources to insert that device, you may want to evict a device. This is one of the jobs that the reconfigurable computing architecture would take on.

So what are some of the management actions that this framework would support? One, as I mentioned, is inserting an AF, and you would get some sort of success or failure. Another is removing an AF, again with success or failure. You'd want to query the capabilities of your underlying offload components, and you might want to initiate an evict action as well. This is not intended to be exhaustive, just a few examples of the things that you could do.

One of the issues you run into when you consider PCI Express and AXI is that PCIe supports the notion of discovering devices on the bus, and AXI does not. So the reconfigurable computing architecture needs to be able to support both of these.
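The insert, remove, query, and evict actions can be modeled in a few lines of Python to show how eviction interacts with a resource check. This is a toy sketch under assumptions that go beyond the talk: capacity is a single abstract number of resource units, an AF's `evictable` flag stands in for the eviction policy, and the manager evicts the largest evictable AFs first. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Af:
    """Hypothetical accelerator function (AF) with an abstract resource cost."""
    name: str
    cost: int                # resource units (an assumption of this sketch)
    evictable: bool = True   # policy: may the framework evict this AF?


class ResourceManager:
    """Toy model of the insert / remove / query / evict actions."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.loaded: Dict[str, Af] = {}

    def free(self) -> int:
        return self.capacity - sum(af.cost for af in self.loaded.values())

    def query(self) -> Dict[str, object]:
        return {"capacity": self.capacity, "free": self.free(), "loaded": sorted(self.loaded)}

    def remove(self, name: str) -> bool:
        # Success only if the AF was actually loaded.
        return self.loaded.pop(name, None) is not None

    def evict_for(self, needed: int) -> List[str]:
        """Evict evictable AFs, largest first, until `needed` units are free."""
        evicted: List[str] = []
        candidates = sorted((af for af in self.loaded.values() if af.evictable),
                            key=lambda af: af.cost, reverse=True)
        for af in candidates:
            if self.free() >= needed:
                break
            self.remove(af.name)
            evicted.append(af.name)
        return evicted

    def insert(self, af: Af, allow_evict: bool = False) -> bool:
        if af.name in self.loaded or af.cost > self.capacity:
            return False
        if self.free() < af.cost and allow_evict:
            # Only evict if doing so can actually make room for the new AF.
            reclaimable = sum(a.cost for a in self.loaded.values() if a.evictable)
            if self.free() + reclaimable >= af.cost:
                self.evict_for(af.cost)
        if self.free() < af.cost:
            return False
        self.loaded[af.name] = af
        return True


rm = ResourceManager(capacity=100)
print(rm.insert(Af("crypto", 60)))                  # True
print(rm.insert(Af("video", 30, evictable=False)))  # True
print(rm.insert(Af("fft", 50)))                     # False: only 10 units free
print(rm.insert(Af("fft", 50), allow_evict=True))   # True: "crypto" is evicted
print(rm.query())  # {'capacity': 100, 'free': 20, 'loaded': ['fft', 'video']}
```

Checking up front whether enough capacity can be reclaimed avoids evicting AFs for an insertion that would fail anyway.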
And not only that: the other interconnects that will be coming along in the near future may or may not support these kinds of discovery features either. So there needs to be a way to support this. For the purposes of discussing this architecture, we refer to these two types of interconnects as discoverable and non-discoverable.

Here is one example with PCI Express and AXI. We're proposing the use of device tree overlays. You can have two different types of applications in this example, and the applications are described using an AF descriptor. The AF descriptor contains information about whether the AF needs a partial or a complete configuration, the I/O resources and interrupts required, if any, the class of device, the transceivers and I/O pins required, and policies for configuration. An example of a policy would be: do I want to allow the framework to evict my AF or not? Another one would be assigning an AF an affinity to a particular socket in the system, so that it has proximity to a particular core for latency optimization. These AF descriptors are compiled to device tree overlays. This step is done by the vendor libraries, which can be partially open source and can include vendor-specific plugins. The device tree overlays may be used for both discoverable and non-discoverable interconnects, as mentioned on the previous slide. The idea is that the device resource manager finds a matching FPGA with the required attributes and assigns the AF to resources that can host it.

Yes. For compiling the AF, sorry, the AF descriptors? Well, I would refer to the device tree overlays at this point. Yes. Yeah, those are two very good questions. For the first question, that was a really nice segue into the next slide: we view OpenCL as embedded within this framework.
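In the proposal the AF descriptor is compiled to a device tree overlay by the vendor libraries, and the device resource manager then looks for an FPGA that satisfies it. The matching step alone might look like the following Python sketch. The field names, the family strings, and the rule "prefer an FPGA on the requested socket" are assumptions made for illustration, not part of a defined descriptor format.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class FpgaInfo:
    """What the device resource manager knows about one FPGA (hypothetical fields)."""
    name: str
    family: str
    socket: int
    supports_partial: bool
    free_io_pins: int
    transceivers: int


@dataclass
class AfDescriptor:
    """Hypothetical AF descriptor carrying the kinds of fields listed in the talk."""
    bitstream: str
    partial: bool                            # partial vs. complete configuration
    device_class: str
    io_pins: int = 0
    transceivers: int = 0
    interrupts: int = 0
    families: FrozenSet[str] = frozenset()   # empty means "any family"
    evictable: bool = True                   # policy: may the framework evict it?
    socket_affinity: Optional[int] = None    # policy: preferred socket for latency


def find_match(desc: AfDescriptor, fpgas: List[FpgaInfo]) -> Optional[FpgaInfo]:
    """Return a matching FPGA, preferring the requested socket; None if nothing fits."""
    candidates = [
        f for f in fpgas
        if (not desc.partial or f.supports_partial)
        and (not desc.families or f.family in desc.families)
        and f.free_io_pins >= desc.io_pins
        and f.transceivers >= desc.transceivers
    ]
    if desc.socket_affinity is not None:
        # Stable sort: FPGAs on the preferred socket move to the front.
        candidates.sort(key=lambda f: f.socket != desc.socket_affinity)
    return candidates[0] if candidates else None


fpgas = [
    FpgaInfo("fpga0", "family-A", socket=0, supports_partial=False, free_io_pins=40, transceivers=0),
    FpgaInfo("fpga1", "family-B", socket=1, supports_partial=True, free_io_pins=8, transceivers=4),
    FpgaInfo("fpga2", "family-B", socket=0, supports_partial=True, free_io_pins=16, transceivers=4),
]
desc = AfDescriptor("aes.rbf", partial=True, device_class="crypto",
                    io_pins=4, transceivers=2, socket_affinity=0)
print(find_match(desc, fpgas).name)  # fpga2
```

Here `fpga0` is rejected because it cannot do partial reconfiguration, and `fpga2` wins over `fpga1` only because of the socket-affinity policy.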
So an OpenCL package would be described by the AF descriptor. One thing we didn't mention is CUDA; CUDA could be treated the same way. But again, as I mentioned early in the presentation, many of these things are concepts at this point in time, so the details have not quite been worked out.

It's the same work again. We, for instance, are facing this: with suspend/resume, you may want to unload the FPGA to save, like, 20 watts. After resume, it shouldn't just be reloaded automatically; it should be reloaded with the latest firmware, because we have different firmware for different modems. So can this framework be used not only for this OpenCL approach, but also for generic suspend/resume, managing your FPGAs in a generic way that isn't tied only to OpenCL?

Yeah, those are good points, and we will take them into consideration. The idea is that we will consider these things and they will be part of the framework. Did that address your question? Yeah. Yeah, look. The presentation of our use case. Yes, yes. So a compliant version. Yes, yes, let's do that.

This slide is very similar to the previous slide. The difference is that it uses OpenCL as an embedded AF within the AF descriptor. The AF descriptor describes an OpenCL kernel and an OpenCL host application. This could be a single-stage or a multi-stage offload to the offload devices. And to answer your second question about PCI Express: the idea is that the differences between AXI and PCI Express would be abstracted in the device resource manager layer. Yes, yes. That's right. So you wouldn't need a device tree overlay portion for a device attached through PCI Express. Yes. But one of the benefits of using device tree overlays for an FPGA attached through either PCI Express or AXI is that you get to describe resources such as I/O pins that may be on a particular fixed portion of a board.
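The idea of hiding the PCI Express versus AXI difference inside the device resource manager can be sketched as a common interface with two backends: one that discovers devices by probing the bus, and one that takes them from a description, as a device tree would provide for AXI. In this Python sketch the class names are hypothetical, and the probe results are passed in rather than read from real hardware.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Interconnect(ABC):
    """Hypothetical bus backend as seen by the device resource manager."""

    @abstractmethod
    def enumerate(self) -> List[Dict[str, str]]:
        """Return one dict per FPGA reachable over this interconnect."""


class DiscoverableBus(Interconnect):
    """PCIe-style bus: devices are found by probing the bus itself."""

    def __init__(self, probe_results: List[Dict[str, str]]) -> None:
        self._probe_results = probe_results  # stands in for real bus enumeration

    def enumerate(self) -> List[Dict[str, str]]:
        return [dict(dev, source="probe") for dev in self._probe_results]


class DescribedBus(Interconnect):
    """AXI-style bus: nothing to probe, so devices come from a description
    (in the proposal, a device tree or device tree overlay)."""

    def __init__(self, described: List[Dict[str, str]]) -> None:
        self._described = described

    def enumerate(self) -> List[Dict[str, str]]:
        return [dict(dev, source="description") for dev in self._described]


class DeviceResourceManager:
    """Callers see one device list, regardless of how each bus finds devices."""

    def __init__(self, buses: List[Interconnect]) -> None:
        self.buses = buses

    def devices(self) -> List[Dict[str, str]]:
        return [dev for bus in self.buses for dev in bus.enumerate()]


drm = DeviceResourceManager([
    DiscoverableBus([{"name": "pcie-fpga0"}]),
    DescribedBus([{"name": "axi-fpga0"}]),
])
for dev in drm.devices():
    print(dev["name"], dev["source"])
# pcie-fpga0 probe
# axi-fpga0 description
```

The `source` field is kept only to make the two paths visible; code above the resource manager would not need to look at it.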
So that could be one thing that could be done. Here we talk about AF descriptors a little bit. AF descriptors contain the information about the AF that the framework requires to instantiate the device. For each offload device needed, a descriptor contains a reference to the FPGA bitstream; constraints such as the FPGA family, special resources, and pins; policies such as priority requests, affinity, or proximity to a socket or CPU; and a list of the devices. It can also contain nested blocks of descriptors. At the resource management framework layer, this would be a vendor-agnostic API, because the idea is that we want to support a broad range of applications from a broad range of users. If you look at the different OpenCL offerings today, they're very vertically integrated. You have the offering from Altera, for example, and it may not necessarily be compatible with other vendors' offerings. This is very limiting for people who want to develop OpenCL applications that work across a broad range of devices, so this is one of the things we'd like to address with this framework.

And this is pretty much the last slide. We'd like to support the notion of exposing an AF to a virtual machine, and that would be done through VFIO; that's how we see this being done. Some of the earlier pieces are done. As you get to the later parts of this slide deck, it's a little further from reality at this point. As I said earlier, this is meant to be more of a proposal to gather feedback. So yes, that's the idea, yes. I'm not sure if you're aware, but there's a reconfigurable computing group that's being hosted and run by Jon Masters. They met earlier this year, and they'll be meeting again later this year. So in summary, we'd like to support as many embedded systems and client and server systems as possible. We'd like to support as many different types of interconnects as possible, and as many different offload elements as possible.
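Because descriptors can nest (for example, a multi-stage offload with one block per offload device), a framework needs some way to turn a descriptor tree into one request per device, each carrying its bitstream reference, constraints, and policies. This Python sketch assumes that a child block inherits its parent's policies and may override them; that rule, the `.rbf` names, and every class and function name here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DescriptorBlock:
    """Hypothetical AF descriptor block; blocks may nest (e.g. multi-stage offload)."""
    name: str
    bitstream: Optional[str] = None   # set when this block targets an offload device
    constraints: Dict[str, str] = field(default_factory=dict)  # e.g. family, pins
    policies: Dict[str, str] = field(default_factory=dict)     # e.g. priority, socket
    children: List["DescriptorBlock"] = field(default_factory=list)


def offload_requests(block: DescriptorBlock,
                     inherited: Optional[Dict[str, str]] = None) -> List[Dict[str, object]]:
    """Flatten nested blocks into one request per offload device.

    Assumption of this sketch: a child inherits its parent's policies and
    may override individual entries.
    """
    policies = {**(inherited or {}), **block.policies}
    requests: List[Dict[str, object]] = []
    if block.bitstream is not None:
        requests.append({
            "device": block.name,
            "bitstream": block.bitstream,
            "constraints": dict(block.constraints),
            "policies": policies,
        })
    for child in block.children:
        requests.extend(offload_requests(child, policies))
    return requests


pipeline = DescriptorBlock(
    name="video-pipeline",
    policies={"priority": "high", "socket": "0"},
    children=[
        DescriptorBlock("decode", bitstream="decode.rbf", constraints={"family": "family-A"}),
        DescriptorBlock("scale", bitstream="scale.rbf", policies={"priority": "low"}),
    ],
)
for req in offload_requests(pipeline):
    print(req["device"], req["bitstream"], req["policies"])
# decode decode.rbf {'priority': 'high', 'socket': '0'}
# scale scale.rbf {'priority': 'low', 'socket': '0'}
```

The top-level block has no bitstream of its own, so it contributes only policies; a vendor-agnostic resource manager could then match each flattened request independently.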
And we'd like to support exposing AFs through a hypervisor to virtual machines. So with that, that is our presentation. Thank you for your patience and attention. If there are any questions, we'll try to answer them.

Yes. That's a very sensitive topic; we should take that offline. Sharon, do you have any comments on that one? We hear you, and we would like that too, but it's a little beyond the scope of this discussion, I'm afraid. It's way above our pay grade.

Let me take this a bit closer. We really put a lot of effort into this: we have a consistent FPGA driver, which handles interrupts in a really generic way. You can trigger DMA transfers and everything. We've also got user notifications, where you can configure all the FPGA registers, with transactions and DMAs and everything. So I would really love for you to get this to the community. It describes all the features, and how the FPGA can be linked over SPI or PCI Express and so on. All the capabilities are actually described in XML. And for every camera manufacturer and every capability of the FPGA, you have so-called device drivers. You just implement them, you have a back end, and it's ready to use. I would really love to see this open sourced, if there is by any chance a way, since it actually allows for introspection. Our plan is to have the bitstream and then also the XML alongside it. The plan is that the XML is either written to the FPGA and attached to the bitstream, or shipped with the entire archive or whatever. The details are all still provisional. We have something ready; it's about three man-years of work by now, and the system is working as a proof of concept. So it's really about what you should... Okay. We also have continuous integration testing in mind. We have a complete continuous integration process: we change it, we compile the source code, we synthesize the FPGA, and then we test again.
We have test cases, and then we sort of do it all again. So if there is a platform to contribute to, I would love to.

Okay. I'd like to get your contact information. Yeah.

Yes. Yes, that is intended to be supported by this framework. If you think about the OpenCL use case, that is a requirement, so yes, it will be supported. Any other questions? We have a booth. No? Have you been by our booth? Yeah. Okay. We'll be back again. Yeah. More questions? Okay. Great. We'll be there. Yes. The only showcase would be the ideas in our brains. Yeah. That's right. Yes. We've not started that yet, but it is something we intend to start. Yes. The OpenCL standard was modified, or influenced, by Altera to add support for FPGAs, so we hope that everybody uses it. At the OpenCL level. Yeah. At the OpenCL level: the standard, the group. Okay. With that, unless there are any other questions, thank you for coming and for your attention. You can come by the booth and ask us all the questions you want. And I'd definitely like to get your contact information. That's awesome. Okay. Thank you.