Hi, good morning. We can start now. My name is Javi, Javi Garcia, and I come from Spain. Today I want to talk about high-level synthesis: what is the status of this kind of technology with respect to free and open source software tools? The point is that, as we will see, we do not have many alternatives for doing this using only free and open source software. The main idea of this talk is to raise a flag about all the issues we have right now and to try to mobilize the community to take on these issues. At our company we have started to do some work towards this, but as we will see, there is still a lot of work to be done.

OK, let's start. I want to make some comments about the state of the art in high-level synthesis. Why is this becoming so important right now? Maybe you are already aware of this, but in the last years a lot of different algorithms have started to gain a lot of traction, and these algorithms require a lot of computational power: embedded speech, embedded vision, deep machine learning, big data, data mining. These kinds of algorithms are not very complex by design, but they require a lot of computational power, a lot of math operations, and very high-bandwidth links to the data. In this situation the von Neumann CPU, the well-known CPU that used to be accelerated by the pace of Moore's law, is not scaling anymore. We need to develop alternative architectures in order to take on these kinds of tasks, and this is where FPGAs come to the rescue. Most of these very computationally intensive tasks are currently taken on by using GPUs.
For example, NVIDIA CUDA is very well known: you can program at a very high level, and it provides a lot of floating-point parallelism, a very powerful one. The point is that this looks very promising, but there are some algorithms for which this approach does not fit very well. For example, with inference in artificial vision you might suppose that using floating-point arithmetic is a very good idea, but it is not: it has been empirically shown that by using fixed-point arithmetic you can improve almost all of the metrics. You save silicon real estate, because you need fewer logic gates to implement the design; you also have lower power consumption; and, surprisingly enough, the algorithm can even be more accurate using fixed-point arithmetic than using floating-point arithmetic. This may be counterintuitive, but it is something that really works. Some hardware vendors, for example Google, are starting to develop their own brand-new hardware architectures, but if you develop a chip from the ground up the costs are very, very high. So for these kinds of algorithms, where the algorithm can change and the software is constantly evolving, flexibility is a must, and this is where the FPGA comes to the rescue because, by definition, you cannot find anything more flexible than an FPGA. And the thing is that the whole industry is looking at FPGAs for answers. Let's see some examples. For example, we have Wintel striking back. Wintel, Microsoft plus Intel, is what we used to call that duo, and it is striking back.
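The fixed-point idea mentioned above can be made concrete with a tiny sketch: in a Q-format representation, a multiply is just an integer multiply plus a shift, which is exactly why it needs fewer gates than a floating-point unit on an FPGA. The Q4.12 format and the helper names here are my own choices for illustration, not taken from any particular HLS tool.

```python
# Minimal Q4.12 fixed-point sketch: 4 integer bits, 12 fractional bits.
FRAC_BITS = 12

def to_fixed(x: float) -> int:
    """Quantize a float to a Q4.12 integer."""
    return round(x * (1 << FRAC_BITS))

def to_float(q: int) -> float:
    """Convert a Q4.12 integer back to a float."""
    return q / (1 << FRAC_BITS)

def fixed_mul(a: int, b: int) -> int:
    """Fixed-point multiply: an integer multiply plus a shift."""
    return (a * b) >> FRAC_BITS

a, b = 1.5, 2.25
result = to_float(fixed_mul(to_fixed(a), to_fixed(b)))
print(result)  # 3.375: exact here, since both inputs fit the format
```

The price you pay is quantization error for values that do not fit the format exactly, which is the trade-off the empirical tests mentioned above are measuring.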
I don't know if you were aware of this, but Intel has just bought Altera, which was the second FPGA brand in market share. Funnily enough, a couple of years ago, in the first FOSDEM EDA devroom, in the digital design panel, we started to talk about rumors of Microsoft getting involved in FPGAs, and about Project Catapult, an SDK that Microsoft has released in order to accelerate software by using FPGAs. They are partnering in some way with Intel; for example, they are already using this in the Bing search engine, and they are planning to use it in Office 365 as well. And there is this feature I really like: a Xeon processor and an Altera FPGA, the multicore CPU plus the FPGA, all packaged in the same socket. So this kind of technology is going to be pushed hard in the data center. But it is not only the consortium formed by Intel and Microsoft that is using this approach; there are other examples too. For example, we have Baidu, which is accelerating SQL at the data center by using Xilinx Kintex UltraScale parts. I don't know if you were aware of this, but it is very interesting. As Philipp said before, Amazon Web Services is already offering instances in which you can accelerate software with up to eight Xilinx FPGAs. And IBM, with its intelligence augmentation concept, has started to work with Xilinx in order to integrate its POWER CPUs with the FPGAs coming from Xilinx. Even more, IBM has designed its own chip for deep learning, called TrueNorth, and not many people know that this chip was actually designed by an FPGA expert from Cornell University. So, after these examples, let's focus on how high-level synthesis works. The target is to accelerate a software algorithm by using an FPGA. What does this mean?
OK, we want to automatically generate code from a high-level definition. For example OpenCL, the language used by NVIDIA and other hardware acceleration platforms, is becoming the de facto standard for most of the FPGA vendors too, namely Xilinx and Intel, formerly Altera. The idea is to take this growing community of GPU-based programmers, who are very used to OpenCL and this kind of technology, and provide them with FPGA tools that allow them to program an FPGA without even noticing that they are using an FPGA. These are very complex and expensive commercial tools: you provide an application, the application gets profiled by some kind of profiling engine, the most computationally intensive tasks are identified, and the FPGA code covering those most intensive parts of the software is automatically generated. Then the software part is compiled, and the hardware part, the gateware, the part that is going to be deployed on the FPGA, is also synthesized. Another piece of these tools is a special kernel module that is in charge of loading and unloading the FPGA cores as they are demanded by the algorithms. How are we approaching this kind of stuff at our company? In the past there have been some attempts to develop high-level synthesis tools in which you can program almost anything and it gets automatically profiled, like in the tools the FPGA vendors are providing, but this is very complex. Actually, there is no free and open source tool working this way right now. So our idea is to provide a set of hardware-accelerated libraries for specific purposes.
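The profile-then-offload flow those commercial tool chains automate can be mimicked in miniature with an ordinary software profiler. In this toy sketch the "accelerator" is just a plain Python function standing in for a generated FPGA kernel; all the names are made up for illustration.

```python
# Toy version of the hotspot-identification step in an HLS flow:
# profile the application and see which function dominates runtime.
import cProfile
import pstats
import io

def hot_kernel(data):
    # The computationally intensive part a real tool would turn into gateware.
    return [x * x for x in data]

def cold_setup(n):
    # Cheap housekeeping that should stay on the CPU.
    return list(range(n))

def application():
    data = cold_setup(10_000)
    return hot_kernel(data)

profiler = cProfile.Profile()
profiler.runcall(application)

# Rank functions by cumulative time, as a hotspot report.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print("hot_kernel" in report)  # the hotspot shows up in the profile
```

A real tool would then emit gateware for `hot_kernel` and patch the call site to dispatch to the FPGA; here the profile report is the whole point of the sketch.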
Let's see: for example, if we are going to program digital signal processing, we want a set of libraries in which the most common operations are already pre-synthesized, so that when you call a function, what it is actually doing is using that pre-synthesized core. This also has applications for other libraries, such as embedded vision, inverse kinematics, and so on. The provided libraries must include a software side, providing a wrapper-like API in order to link with standard software, so that you can write a standard program and call these libraries; and a gateware side, the FPGA side, in which the optimized HDL code is already pre-synthesized. This must be transparent for the programmers, so that they can just add some includes and call these functions, and, as in the previous case, the system takes care of loading and unloading the FPGA parts as the binary running on the CPU needs them in order to accelerate the computation. How can we integrate the FPGA into the operating system in order to achieve this?
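The wrapper-like API idea can be sketched as follows: the caller uses an ordinary function, and the wrapper dispatches to an FPGA core when one is loaded, falling back to plain software otherwise. The `FpgaCore` class and its methods are hypothetical stand-ins, not a real driver API.

```python
# Sketch of a transparent software wrapper around a pre-synthesized
# FIR filter core. All FPGA interaction is simulated.

def _fir_software(taps, samples):
    """Reference software implementation of the FIR filter."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * samples[i - j]
        out.append(acc)
    return out

class FpgaCore:
    """Stand-in for a pre-synthesized FIR core loaded on the FPGA."""
    def __init__(self, taps):
        self.taps = taps
    def run(self, samples):
        # In reality this would go through a kernel driver; here we
        # just compute the same convolution in software.
        return _fir_software(self.taps, samples)

_loaded_core = None  # set by the (hypothetical) bitstream loader

def fir_filter(taps, samples):
    """Public API call: transparent to the programmer."""
    if _loaded_core is not None and _loaded_core.taps == taps:
        return _loaded_core.run(samples)   # accelerated path
    return _fir_software(taps, samples)    # software fallback

print(fir_filter([0.5, 0.5], [1.0, 1.0, 1.0]))  # [0.5, 1.0, 1.0]
```

The key design point is that the public signature never changes: whether `_loaded_core` is populated by the loader or not, the caller gets the same result.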
For sure, first of all, we are only focused on Linux; I am not going to talk about Windows at all. What we need, as I said before, is to have hardware-accelerated libraries with their own associated bitstream for the FPGA. We need to be aware that different library versions, talking about libraries from the standard software point of view, might require different bitstream versions too, so we need to match the software part and the bitstream part. What we are doing is packaging the associated bitstream for each of these libraries as a Linux package, so that we can establish dependencies between the FPGA part and the software part: when you install an instance of one of these software libraries, the bitstream is also deployed, and it matches the version of the software library. It is also very important to note that the bitstream needs to be pre-synthesized. Why is this important? Because synthesizing a bitstream is a very time-consuming task; you can lose a lot of time, so you cannot be synthesizing on the fly. Even more important, only x86 processors are able to run most of the synthesis tools available today, maybe Yosys is an exception, but we can talk about that later, so if you are using an ARM processor you cannot synthesize the design on the chip itself. We are using the Yocto Project as the build system, so that everything, including the different bitstreams associated with the libraries, is managed there, and we use HDLmake in order to integrate the bitstream synthesis into the operating system generation: by using HDLmake we specify to Yocto, the build system, how it needs to handle the HDL code in order to generate the package including the bitstream. As a couple of examples, we have used this with the open hardware kit, just for internal use for the moment, to
provide automatic support for all the different hardware modules that can be inserted into a carrier board. Sorry, I have not explained this because I am running a bit out of time: this is a board in which you can plug different hardware modules, each of them with a different functionality, all controlled by an FPGA, and depending on the module you plug in you will need a different bitstream on the FPGA too. By using HDLmake we have generated different Linux packages for each of these modules, which include not only the kernel module but also the user-space program and the FPGA bitstream, so that in a Linux runtime, when one of these modules needs to be supported, by just installing one package all the different pieces are installed: the user-space program, the kernel module, plus the FPGA bitstream. Another very interesting example we are working on is robotic control. I don't know if you have heard about ROS, the Robot Operating System; one part of this project is called ROS-Industrial, which is devoted to applying the Robot Operating System to industrial manipulators, and it is also a free and open source SDK. In this kind of robotic control there is a need for massive computational power. We are part of this consortium, and we are in charge of trying to accelerate the most intensive algorithms these robots use by means of FPGAs. For example, one of the most appropriate candidates for this is inverse kinematics: you take a robotic arm, tell it where it needs to put the tooltip, and it automatically calculates the movements of all the different servos it has. It is a very computationally intensive task: in order to plan a path or a movement, a robot may spend tens of seconds, maybe even a minute if it is complex enough. Another one is point cloud processing, which is
more related to artificial vision and so on, and we are working on applying the previous approach to these two examples. There is a missing feature that needs to get into the FPGAs in order to be able to use this approach of having libraries accelerated by FPGAs: partial reconfiguration. Maybe you have already heard about the Pareto rule, which applies to libraries in general: not all the libraries are used at the same time, and in a processor most libraries are loaded into cache memory as they are required. In order to implement this approach efficiently on an FPGA, to accelerate libraries in an efficient way, we need a way to reprogram only slices of the FPGA, not the whole FPGA: we need to handle the FPGA area the same way the CPU handles cache memory. There are already some tools able to do this, but they are commercial only, with no free alternatives, they are only supported on high-end devices, and they are very expensive. Some conclusions: there are not a lot of free/libre open source tools for doing this. HDLmake is a very interesting tool that allows us to manage all the pain of handling FPGAs and libraries together. We also need an HDL library standard; maybe IP-XACT can be a good candidate, or maybe we need other alternatives. An IP integrator is also very important, and maybe FuseSoC can be a very good option for this kind of thing. Finally, we need a workaround for partial reconfiguration. Sorry — you had a video, right?
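The cache analogy above can be sketched as a tiny LRU loader: a few partially reconfigurable slots hold the accelerator cores currently in use, and the least recently used core is evicted when a new one is needed, much as a CPU cache holds the hot parts of libraries. Slot counts and core names here are hypothetical.

```python
# Toy "FPGA regions as a cache" loader using LRU eviction.
from collections import OrderedDict

class FpgaSlotCache:
    def __init__(self, n_slots: int):
        self.n_slots = n_slots
        self.slots = OrderedDict()  # core name -> loaded payload

    def request(self, core: str) -> str:
        """Ensure `core` is loaded; return 'hit' or 'load'."""
        if core in self.slots:
            self.slots.move_to_end(core)             # mark as recently used
            return "hit"
        if len(self.slots) >= self.n_slots:
            self.slots.popitem(last=False)           # evict the LRU core
        self.slots[core] = f"bitstream:{core}"       # partial reconfiguration
        return "load"

cache = FpgaSlotCache(n_slots=2)
print(cache.request("fir"))   # load
print(cache.request("fft"))   # load
print(cache.request("fir"))   # hit
print(cache.request("ik"))    # load, evicts "fft"
print(cache.request("fft"))   # load again (it was evicted)
```

Each "load" would correspond to a partial bitstream write in a real system, which is exactly the capability that is still missing from free tooling.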
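HDLmake, mentioned in the conclusions, drives synthesis from a Python manifest file. As a rough sketch, going from memory of HDLmake 2-era manifests, a `Manifest.py` looks something like the following; the device, grade, package, and file names are hypothetical placeholders, not a real project.

```python
# Hypothetical HDLmake Manifest.py sketch for a Xilinx synthesis run.
action = "synthesis"
target = "xilinx"

syn_device = "xc6slx45t"   # hypothetical Spartan-6 part
syn_grade = "-3"
syn_package = "fgg484"
syn_top = "top_module"
syn_project = "demo.xise"

# Dependent HDL modules to fetch (the "fetch" step shown in the demo).
modules = {
    "local": ["../rtl"],
}

files = [
    "top_module.vhd",
]
```

Because the manifest is plain Python, a build system like Yocto can generate or template it per package, which is how the bitstream synthesis gets folded into the operating system generation.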
Yes, but I was counting your minutes without the video. OK, OK. I have told you before about HDLmake, and I have told you about high-level synthesis, but this is my core tool: an assistant, a command-line tool, to make life easier for FPGA designers. I have some slides, but I want to show you a video of the brand-new graphical user interface I am developing for HDLmake. Let's see. OK, can you see it? In this video I am just going to clone an HDLmake project taken from the Open Hardware Repository, which is maintained by CERN. I have already downloaded this repository, because HDLmake is very useful for keeping your HDL code under version control. I have already loaded the Xilinx ISE tools, and now I am going to launch what right now I call the HDL maker: it is like a JavaScript wrapper for HDLmake, so that you can open it in a web page and run the synthesis on your own computer through the graphical user interface, or even run it on a server with all the tools installed and just access it from any other computer. We have some different controls. Now I am fetching the code, because it automatically fetches the IP core libraries it needs in order to run the synthesis. I can pause, because the synthesis process is a very time-consuming task. Here we can see that the fetch is running, and now I am going to run the auto step, so that a synthesis Makefile is generated — HDLmake produces Makefiles. Now I am going to run the make command. We are running in local mode, and all the ISE output from Xilinx starts appearing on the screen. Now we need to skip ahead, because this is very time-consuming; after some minutes it finally ended. OK.
The synthesis has finally... oh, I thought the video was shorter, sorry. OK, now we have a programming file: we have already generated the bitstream, so we can go to a file browser, grab the bitstream, and download it. This is very useful if you are running the synthesis on a different machine: the synthesis is performed on another big machine, and you can just access it through your web browser and grab the bitstream. One of the cool things we can do with this, maybe the most interesting one, is that once you have parsed the design and you have the relations between the inner parts of the design itself, we have added a project navigator so that you can see the whole structure of the design. You can browse the design, and you can collapse and expand its different branches. I think there is not much more to explain here: you can see that we can zoom and navigate, and collapse the different branches in order to focus on the parts of the design we really want to focus on. I think there is a minute of video left, but it is all the same; I did not move from the browser.
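The project-navigator idea boils down to representing the parsed HDL hierarchy as a tree whose branches can be collapsed so the view shows only the parts of interest. This is a toy sketch of that structure; the module names are hypothetical.

```python
# Toy design-hierarchy tree with collapsible branches, as in the
# project navigator shown in the video.

class DesignNode:
    def __init__(self, name, children=(), collapsed=False):
        self.name = name
        self.children = list(children)
        self.collapsed = collapsed

    def render(self, indent=0):
        """Return the currently visible tree as a list of text lines."""
        marker = "+" if self.collapsed and self.children else "-"
        lines = [" " * indent + f"{marker} {self.name}"]
        if not self.collapsed:
            for child in self.children:
                lines.extend(child.render(indent + 2))
        return lines

top = DesignNode("top", [
    DesignNode("uart", [DesignNode("tx_fifo"), DesignNode("rx_fifo")]),
    DesignNode("dsp_core", [DesignNode("mac_unit")]),
])

top.children[0].collapsed = True  # collapse the uart branch
print("\n".join(top.render()))
```

Collapsing `uart` hides its FIFOs while `dsp_core` stays expanded, which is the "focus on the part you care about" behavior from the demo.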