Okay, our next talk will be by Gwenhael Goavec, and he's talking to us about platform-independent CPU/FPGA co-design. Thank you. This talk is about the ecosystem we are developing to assemble and generate designs in the FPGA, and to use the CPU, for example on a Zynq board, to configure the design and each block at run time. It's possible to modify some parameters live. Everything is fully pipelined: no FIFO, nothing; samples are processed as they arrive. We try as much as possible to avoid using RAM, to simplify the design.

Since we use a CPU running Linux, this ecosystem complies with the architecture of an operating system. We have IPs; the IPs are connected to the CPU to communicate with the rest of the system, and where possible we provide a driver for each IP. Finally, some IPs have complex configurations, so we also provide libraries to avoid duplicating pieces of code. The idea is that the user has to focus on the design and the application, and nothing else. The last wish for this framework is to be as independent of the hardware as possible: the same design must be generatable for a Zynq platform, for a Cyclone V platform, and maybe other new ones. We have validated it, not exhaustively, on Zynq and on Cyclone V, with several different boards.

This is not the only ecosystem of this kind. The first one, of course, is Ettus RFNoC, used in the USRP. It's a great thing: your USRP has a firmware like a toolbox, with a set of IPs available to be used. It's possible to choose which processing is done in the FPGA and which is done in the CPU just by adding an RFNoC block or a GNU Radio block. But if the firmware doesn't include an IP you need for your specific application, you have to generate a new firmware with that IP. And the RFNoC structure has a limited number of slots available for your IPs, five or seven, I don't remember exactly.
And I have tried to use RFNoC on a ZedBoard with an A/D converter. The FPGA part was okay, not really hard. But UHD — that was a fail. UHD makes some assumptions about the hardware: you must have an EEPROM to detect the motherboard and the daughterboard, you need to have a TX chain and an RX chain, and finally it's just too complex for some small designs. So it's really good, but with a USRP.

Another ecosystem worth mentioning is Pavel Demin's red-pitaya-notes. This ecosystem has been presented here before. The good thing about it is that the repository provides some ready-made designs: you just run make and you have an SD card ready to be used. Each project has documentation with everything, and it's directly compatible with GNU Radio. Finally, gr-osmosdr acts as a client for the server running on the Red Pitaya's CPU, and you can use a Red Pitaya exactly like a DVB-T key, a USRP or anything else. But it's dedicated to the Red Pitaya platform, and it's more or less limited to the already available projects. Moving to another board, or creating your own project, is less documented.

Okay, now our ecosystem looks like this. We have the IPs; the IPs have a dedicated repository. The IPs are connected to the CPU using a driver to communicate, with a library on top or not. And there is the device tree, because with an SoC the way to describe the platform is the device tree, and an overlay is used to extend the default device tree at run time. Finally, we provide some tools to build the application and some wrappers to generate parts of the design.

First, maybe the most important part: the FPGA and the IPs. To simplify simulation, each IP is split between the implementation part — the algorithm — and the communication part for the configuration of the design. Exactly as in GNU Radio, we use a custom interface. This allows the user to plug one block into another, and thanks to this interface the data stream flows between blocks.
For the user, it's just a wire like this one, but to carry more information the interface bundles a series of signals: data, control, and clock and reset, to be sure to keep coherence across clock domains. More precisely, for the content: for a complex stream you have i and q data buses; for a real stream, just one data bus; and a data-enable to specify when a new sample is available to be processed. And you have some control signals: start of frame and end of frame. An A/D or D/A converter is just an infinite data stream — no start, no stop, nothing. But in some designs, when we want to propagate data only on an event, it's mandatory to know the first and last sample of the data set. It's exactly the same for, for example, the FFT, whose frames have a well-defined first and last sample. These two signals simplify the logic each IP needs to handle a data set.

Okay, we have the IPs. Now we must connect them and generate the design. It's possible, of course, to use the graphical approach with Vivado, or with Quartus and Qsys, but all vendor tools provide procedures to build a design using TCL. The interest of this is that it's small and versionable, which is a great thing. But Vivado provides its own functions — create_bd, create_bd_cell, etc. — specific to Vivado, and it's the same story for Quartus. And Vivado needs one TCL file to generate everything, while Quartus needs two. So we have added our own procedures and some makefiles on top. Each procedure is implemented once in the Vivado context and once in the Quartus context; if you want to add a new vendor tool, you just need to fill in these procedures to be able to build the design for a new platform with a new tool. I've done some tests with two different FPGAs and two different vendor tools: the same design, the same application, works in all cases. Some parts are simply regenerated for each platform, but it's not a problem.
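The start-of-frame / end-of-frame idea described above can be sketched in software. This is only a toy Python model, not the actual HDL: it wraps a continuous, delimiter-less sample stream (like an A/D converter output) into tagged data sets.

```python
def frame_stream(samples, frame_len):
    """Tag each sample of a continuous stream with start-of-frame /
    end-of-frame flags, like the control signals of the custom interface."""
    for i, s in enumerate(samples):
        pos = i % frame_len
        yield s, pos == 0, pos == frame_len - 1

# An A/D converter is an endless stream with no delimiters; a downstream
# block such as an FFT only needs the two flags to delimit a data set.
tagged = list(frame_stream(range(8), 4))
```

In hardware these two flags let each IP handle data sets with almost no extra logic, instead of every block re-counting samples.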
They are just regenerated; the rest is perfectly compatible.

This design is just the support for a project. A project looks like this: you have the project directory, with a sub-directory where we store everything about the design. Once the design is created, it's possible to use a TCL script to analyze the Vivado project (or the Quartus equivalent), extract which IPs are connected to the CPU and the base address of each of them, and generate an XML file. This XML file is used by a tool to generate a template application, and the last thing for the user is just to write a C++ or Python file to configure everything in the FPGA. The XML is mostly generated automatically: you have the project name, a first IP, a second IP; the IP name refers to the IP in the repository, and for each of them you have an instance with a specific name and a specific base address.

Based on this, it's possible to generate a script that simplifies running the application: it copies the bitstream, flashes the FPGA, updates the device tree — appending the nodes for your specific case — and loads the drivers required for this specific project. That's the first generated file. The second is the device tree fragment you append: it contains an fpga-full node, dedicated to handling the FPGA, with the firmware name, and one node per driver with the same base address as in the XML. And finally a Makefile, which is just used to cross-compile the application and install everything on the board.

That's all for the ecosystem itself. We provide these blocks — it's not all the blocks available, but a subset. You have a local oscillator for frequency transposition; an IP used to handle the A/D converter and D/A converter of the Red Pitaya; a sound card; some blocks used to start propagation when a specific event happens on a pin of the FPGA, for triggering; a C/A code generator, used for GNSS/GPS; FIR filters.
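As a rough illustration of the XML-to-template step: a small tool can walk the generated project description and emit one configuration line per CPU-connected IP. The element and attribute names below are invented for this sketch, not the ecosystem's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical project description; names and attributes are invented
# for illustration, not the real generated XML.
XML = """
<project name="demo">
  <ip name="nco" instance="nco0" base_addr="0x43c00000"/>
  <ip name="fir" instance="fir0" base_addr="0x43c10000"/>
</project>
"""

def template_lines(xml_text):
    """Emit one driver-instantiation line per IP connected to the CPU."""
    root = ET.fromstring(xml_text)
    return ["%s = %s_driver(base_addr=%s)" % (
                ip.get("instance"), ip.get("name"), ip.get("base_addr"))
            for ip in root.findall("ip")]

lines = template_lines(XML)
```

The same XML drives both the application template and the device-tree fragment, so the base addresses stay consistent between the two.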
You have FIR filters with a real interface and with a complex interface; some PRN generators; a cross-correlator, again used for example with GPS; and some utilities: one to add a specific constant to every received sample — exactly an offset — a subtractor, and an adder that adds one input to the other, meaning a simple accumulator. There are also utilities to convert from one interface to another within the custom interface, because the interface as such is one-to-one and sometimes we want one-to-many, so these blocks allow that. The expander/shifter is for bit manipulation — bit shifting or expanding. The switch is just a selector: you have two inputs and you pick one. You have a mixer, finally. And there are some bridges between our custom interface and the AXI interface, used for example when you need a FIFO provided by Xilinx to do a clock-domain crossing.

Now, we have seen the ecosystem, but why do all this? The first application is GPS decoding. The idea: we have a PlutoSDR, which is able to directly sample the signal. We have a satellite constellation, all on the same carrier frequency but each with a Doppler shift, and each satellite has a unique PRN code. But when you use the PlutoSDR with the stock firmware, it does almost nothing by itself: receive data, transmit data, and that's all. To detect which satellites are present, you need to loop over the frequencies and loop over the PRNs, so you need to cross-correlate the signal with all possible PRN codes; with GNU Radio on the CPU this takes one or two seconds per satellite. The idea is to update the firmware by adding a specific IP that does exactly the same thing, but in the FPGA. It looks like this: you have your RF front end; the local oscillator of the RF front end takes some time to be reconfigured, so we have added an NCO — a local oscillator and mixer — to do a second-step frequency transposition. And you have our C/A code generator: this block can provide all the possible C/A codes.
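The acquisition loop described here — wipe off a Doppler hypothesis, then cross-correlate against a PRN code at every code phase — can be sketched in a few lines of Python. This is a toy model of what the FPGA correlator does per PRN: a random ±1 sequence of 64 chips stands in for a real 1023-chip C/A (Gold) code, and the Doppler search grid is made up for the example.

```python
import cmath, random

# Toy stand-in for a C/A code: a real Gold code has 1023 chips; a short
# random +/-1 sequence keeps this sketch fast.
random.seed(1)
N = 64
code = [random.choice((-1, 1)) for _ in range(N)]

# Simulated received signal: the code delayed by 17 chips, with a small
# Doppler shift modelled as a complex exponential.
delay, doppler = 17, 0.01
rx = [code[(n - delay) % N] * cmath.exp(2j * cmath.pi * doppler * n)
      for n in range(N)]

def acquire(rx, code, doppler_bins):
    """Search over Doppler bins and code phases; return the best pair.
    This double loop is what the pipelined FPGA correlator performs."""
    best = (0.0, None, None)
    for f in doppler_bins:
        # Wipe off the Doppler hypothesis (the NCO's job), then
        # circularly correlate against the local code.
        wiped = [rx[n] * cmath.exp(-2j * cmath.pi * f * n) for n in range(N)]
        for phase in range(N):
            c = sum(wiped[n] * code[(n - phase) % N] for n in range(N))
            if abs(c) > best[0]:
                best = (abs(c), f, phase)
    return best[1], best[2]

f_hat, phase_hat = acquire(rx, code, [0.0, 0.005, 0.01, 0.015])
```

On the CPU this search costs bins × phases × N multiplies per PRN, which is why doing it sample-by-sample in the FPGA, in parallel over PRNs, pays off.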
So we use user space to configure the C/A code, to loop over the frequencies and to select which PRN we want to use. And you have the cross-correlator; the result is just sent to the RAM and retrieved by the CPU. The most important thing: we have added this part, but we don't break anything. The Pluto works exactly as with the default firmware, but with a bonus block. Thanks to this modification, it's possible to divide by five the time to process all satellites over all frequencies. But now we are limited by the Zynq: the FPGA part is a bit small, so it's not possible to add more than one cross-correlator. So we have moved to an Analog Devices board with a bigger FPGA, to be able to cross-correlate all satellites at the same time. It's a work in progress: currently the design, with all correlators in parallel on the same data stream, fits in the FPGA, but we still need to verify that it works.

The next demonstration is a more fun one. Again, the PlutoSDR does almost nothing by itself, but this board uses Buildroot to generate the firmware — the root file system, the Linux system, etc. Buildroot supports GNU Radio, so it's possible to add GNU Radio to the firmware, on the board. We have done this, and the second idea was to have something totally embedded, so we have added a sound card in the FPGA. And finally we have an application — C++, Python, whatever you want — that just does a low-pass filter, a WBFM receiver, and output on the sound card. It looks like this: you have the default RX chain, with IIO used as a source by GNU Radio; the processing is done by GNU Radio; and an audio sink is connected to the ALSA framework. We provide an ALSA-compatible driver to communicate with the sigma-delta IP in the FPGA, followed by an RC filter. We use this approach just because of the small number of pins available on the FPGA.
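At its core, the WBFM receiver in that chain is a quadrature demodulator: the instantaneous frequency is recovered as the phase difference between consecutive complex samples (this is what GNU Radio's quadrature demod block computes). A minimal pure-Python sketch, checked on a constant-frequency FM tone:

```python
import cmath

def quadrature_demod(iq):
    """Phase difference between consecutive complex samples: the core of
    a WBFM demodulator."""
    return [cmath.phase(iq[n] * iq[n - 1].conjugate())
            for n in range(1, len(iq))]

# A tone at constant frequency deviation f (in cycles/sample) must
# demodulate to the constant value 2*pi*f.
f = 0.05
iq = [cmath.exp(2j * cmath.pi * f * n) for n in range(100)]
audio = quadrature_demod(iq)
```

In the demo, this step runs in GNU Radio on the Zynq's ARM cores, while the sigma-delta output stage sits in the FPGA behind the ALSA driver.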
And for the fun: I have just launched a specific application, providing a frequency, and that's all — the PC does nothing, it just connects over SSH. Everything is done on the PlutoSDR board: the reception, the processing and the output on the audio sink.

Now, to conclude: this is a flexible framework. We have demonstrated it with some specific end-to-end designs, and we were able to use our approach within the Analog Devices environment, mixing our approach and theirs, and it definitely works. It's platform independent: it's possible to use it with Altera, Xilinx, maybe others. And it respects the structure of a Linux system, where an IP is treated exactly like any other controller around the core in an SoC. Perspectives: the first is to finalize the GNSS parallel Gold-code correlator, verify it, and push it to GitHub for everyone. Then to improve the documentation: some parts are, I suppose, quite well documented — there are tutorials for the Red Pitaya and for the PlutoSDR — but some parts around the TCL approach are missing, and that's a work in progress. And finally, RISC-V currently seems to be a serious competitor to ARM processors, so it may be interesting to try a demonstration with a soft-core RISC-V instead of a hard-core ARM. Of course, we have the Buildroot extension for the Red Pitaya, and the same thing for the PlutoSDR, with a default configuration and a configuration with GNU Radio already enabled. And of course the oscimp digital GitHub, where everything is available: you just have to git clone the repository and read the docs and tutorials to see how to start with this ecosystem. Thank you very much.

Do I use UIO? No, not currently. Actually, for some blocks I use a specific character device, but it's true that I should use IIO, UIO or another standard kernel mechanism. The configuration is done from user space.
The split of the configuration between userland and kernel land depends on the IP. The driver definitely knows how to communicate with the IP, but part of the work may be done in user space. For example, for the NCO you have an input frequency and an output frequency, and the phase-accumulator value is computed in user space, through the library. For the FIR filter, you just have a set of coefficients: you write them with write() on the file descriptor and the kernel does everything.

Sorry, can you repeat? ... The comparison — the speed comparison between the hardware implementation and the software implementation. In the FPGA, we are just limited by the input sample rate. If the sample rate is one megasample per second, we have one microsecond between two new samples; if you need to cross-correlate over one kilopoint, that takes just one millisecond. It's not possible to reduce this, because the RF front end is the limitation. And finally, it's just not necessary to speed it up, because it's already as fast as the input allows. The real gain is that, afterwards, you have results in the CPU, not samples left to compute: there is just the latency between the RF front end and the CPU. With a classical approach, with Octave for example, you need to store a big data set and then process it; that's the big difference. Okay, afterwards it's possible to use a GPU to improve that, but you always have the acquisition time before being able to process the data.
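The arithmetic in that last answer can be written out explicitly: at the stated rate the pipelined correlator is bounded by the front end, not by the logic.

```python
# At 1 Msample/s the front end delivers one sample per microsecond, so a
# 1000-point correlation window necessarily spans 1 ms of signal: the
# FPGA correlator can never finish sooner than the RF front end allows.
sample_rate = 1_000_000                      # samples per second
sample_period_us = 1e6 / sample_rate         # microseconds per sample
window = 1000                                # correlation length in samples
window_ms = window * sample_period_us / 1000 # time to fill one window
```

The software approach pays this same acquisition time and then adds the processing time on top, which is where the factor-of-five gain comes from.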