The next speaker is a member of AdaCore who contributes to the GNU Ada compiler. He is Tristan Wingold, and he will talk about 64-bit bare metal programming. Let's applaud. Lights up. Yeah, let's take care of the light. Here you are. OK. Hello. Thank you for coming. So this is a talk about bare metal platforms, which are usually things that come without a box, like this one, and in particular without any operating system. When you program on a bare metal platform, you don't use any operating system. Why would you want to do that? The main reason is that there are not enough resources to run an operating system. For example, on a small board there is just not enough memory to have an operating system. But there are other reasons. It's fun. It's different from the usual. You can learn a lot of things, low-level things. There is a lot to learn when you do bare metal programming. And I have chosen the Raspberry Pi 3. Why? Mainly because it's very, very popular, which means there are a lot of forums and tutorials on the web about how to program directly on the Raspberry Pi. And also because it's a very safe platform. You cannot break it. It will always work. However, there are some drawbacks with the Raspberry Pi 3. Because it's based on a Broadcom system on chip, there is very little documentation about it. Here is the page of Raspberry Pi 3 platform documentation, which basically says: OK, it's an ARMv8 CPU. Thank you. That is also written in the marketing documentation. And for more documentation, see Raspberry Pi 2 or Raspberry Pi 1. End of the documentation. Not enough, but we can deal with that. So maybe you know about the Raspberry Pi family. The first one was the Pi 1, which is based on a rather old ARM core. The Pi 2 was much more interesting because it's based on a newer core, and there are four cores. So I wanted it. And the last one is even better because it has four 64-bit cores. So I wanted it, and I want to use it.
The architecture of the Raspberry Pi is a little bit weird. There are four 64-bit ARM CPUs that share a level 2 cache. There is also the VideoCore GPU, which runs the firmware. And they share the memory. The boot process of the platform is interesting because it's unusual: it is the GPU that starts first, running its firmware and then loading the application from the SD card into memory. Once the application is loaded, it starts all the CPUs. So the nice thing about the Raspberry Pi platform is that the CPUs start directly from your code, not from firmware code. Only the GPU uses firmware. There are a couple of files that need to be present on the SD card: some files that are used by the GPU to boot, the configuration file, which is interesting, and your image, which will be loaded into memory and executed by the ARM CPUs. If you want to execute 64-bit code, you have to specify some settings in the config.txt file, but that is explained. So let's start our first bare metal program. Usually, we do things like either blinking lights or writing a message on the console. So I will do something quite common, which is a hello world on the console. For that, you need a terminal emulator connected to a serial-to-USB converter. This is the URL of the code I will show you. So this is the USB-to-serial converter, and you connect it directly to some pins of the header of the Raspberry Pi 3. Very bare metal. Quickly, this is the Makefile. There are two main files: the crt0, which is the assembly code that is executed first, and the main C code. We don't use any C library. We use a linker script to tell how the sections are grouped. And we create not an ELF file, but a binary file. At the end, you have to copy this file onto the SD card. crt0 is the usual name for the C runtime zero, which means the first file to be executed by the, in fact non-present, C runtime.
The crt0 is generally written in assembly because it does such low-level things that they cannot be expressed in C code. It has to initialize the board or the card. But on the Raspberry Pi, it's very easy because the GPU does most of the initialization. For example, it sets up the RAM. It sets up the video. So everything is much easier on this platform. However, you still have to create the environment that is necessary to execute C code. So this is the whole assembly code for the Hello World. This is the first instruction executed. There are four CPUs, and all four CPUs are started together. So you need to put three CPUs into a busy loop and keep only one, and then you have to initialize that first CPU, the main CPU. Here you load the stack pointer. Here you clear the memory that has to be cleared for the C environment, because uninitialized variables are expected to be zero. This is done here. And finally, you call main. So our C code starts with main, like a normal application. And this is what the code we have seen previously calls: main. You can do whatever you want to do in C, but there is no C library, so you have to write everything you want to execute. This is the main code. The next slide shows how the UART, so the serial console, is initialized. And then we just do a puts of hello world. Puts is here. It prints every character. And while printing every character, we also expand a newline into a carriage return and line feed pair. And this is how to print one character. We wait until the UART is ready, and when it is ready, we write one byte at one specific location, which has a side effect: the byte is sent over the serial line. This is how to initialize the UART. So this is most of the code. This is very bare metal stuff. We change some bits at some specific addresses that are specified here.
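The output path described above (wait until the UART is ready, write one byte, expand newlines) can be sketched in C roughly as follows. This is a minimal sketch and not the code from the slides: the `uart_regs` layout, the `UART_TX_READY` bit and the function names are hypothetical, and the registers are plain memory so that the sketch also runs on a host. On the real board they would be volatile memory-mapped addresses taken from the peripheral documentation.

```c
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
struct uart_regs {
    volatile uint32_t status; /* assumed: bit 0 set when a byte can be sent */
    volatile uint32_t data;   /* writing here sends one byte on the line */
};

#define UART_TX_READY 0x1u    /* assumed bit position */

/* Send one raw byte: busy-wait until ready, then write it. */
void uart_raw_putc(struct uart_regs *u, char c)
{
    while ((u->status & UART_TX_READY) == 0)
        ; /* no operating system, so we simply poll */
    u->data = (uint8_t)c;
}

/* Send one character, expanding a newline into CR followed by LF. */
void uart_putc(struct uart_regs *u, char c)
{
    if (c == '\n')
        uart_raw_putc(u, '\r');
    uart_raw_putc(u, c);
}

/* Send a NUL-terminated string, one character at a time. */
void uart_puts(struct uart_regs *u, const char *s)
{
    while (*s != '\0')
        uart_putc(u, *s++);
}
```

The registers are declared `volatile` so that the compiler keeps every read of the status word and every write of the data word, which matters because these accesses have side effects on the device.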
The writes to these registers have the side effects of enabling the UART, specifying the number of bits that will be transmitted, and specifying the speed of the UART. And here we have to specify that the pins are in fact used for the UART. OK, this is documented in the Raspberry Pi documentation. And this is very, very bare metal stuff. To correctly gather all the pieces and specify addresses, we use a linker script. OK, nothing very interesting. And then you have your first Hello World program. So what can you do next, except things like Hello World in different languages? You can write your own drivers. If you want to start, you can start with GPIO because that's very easy: it is just a way to send a signal to the headers. I2C and SPI are ways to communicate using serial protocols, and they are very easy too. The SD card isn't very difficult to program, but there is, well, a small stack of things to do to communicate with the SD card. Video is very easy because most of the work is done by the GPU firmware. You just have to say, OK, I want a frame buffer, and you get a reply saying: this frame buffer is at this address, with that width and that height. You can write drivers if you want for USB, Bluetooth, Wi-Fi, or Ethernet, but that is much more difficult, and the documentation is not very extensive on these topics.
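Both the step of telling the chip that the pins are used for the UART and a first GPIO driver come down to setting a few bits in the GPIO function select registers. Here is a minimal sketch, assuming the layout given in the Broadcom BCM2835 peripherals document (ten pins per 32-bit register, three bits per pin). The function name is hypothetical, the register array is passed in as a pointer so the sketch can run on a host, and the choice of ALT5 for pins 14 and 15 assumes the mini UART is the one in use, which the talk does not state.

```c
#include <stdint.h>

/* Function codes as listed in the BCM2835 peripherals document. */
enum gpio_func {
    GPIO_INPUT  = 0,
    GPIO_OUTPUT = 1,
    GPIO_ALT0   = 4,
    GPIO_ALT1   = 5,
    GPIO_ALT2   = 6,
    GPIO_ALT3   = 7,
    GPIO_ALT4   = 3,
    GPIO_ALT5   = 2
};

/* Select the function of one pin.
 * gpfsel points at the first function select register; on the
 * Raspberry Pi 3 that block is usually given as 0x3F200000, but here
 * any array of six words works (hypothetical helper, not from the talk). */
void gpio_set_function(volatile uint32_t *gpfsel, unsigned pin,
                       enum gpio_func func)
{
    volatile uint32_t *reg = &gpfsel[pin / 10]; /* ten pins per register */
    unsigned shift = (pin % 10) * 3;            /* three bits per pin */
    uint32_t value = *reg;

    value &= ~(UINT32_C(7) << shift);           /* clear the old function */
    value |= (uint32_t)func << shift;           /* set the new one */
    *reg = value;                               /* single write back */
}
```

For example, `gpio_set_function(regs, 14, GPIO_ALT5)` followed by the same call for pin 15 would route both header pins to the mini UART under the assumption above. The read, modify, write sequence leaves the other pins in the same register untouched.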
If you want more performance, you have to enable the cache, because the CPU starts with the cache disabled, which gives abysmal performance. So you really want to enable the cache for performance. Except that if you enable the cache, you have to specify that I/O regions are not cacheable, because accesses to I/O regions have side effects: stores must go to the device and reads must come from the device. And if you want to specify that some regions are not cacheable, you have to set up the MMU, which is a little bit complex. The simplest way to set up the MMU is to use a one-to-one mapping, so no translation, just access rights on regions. You can also try to use the four cores. As we have seen, all the processors start, and we have put three of them in a busy loop. There is a specific register that gives the core number, so you get a number from 0 to 3. You have to specify a stack for each processor and execute a specific start routine for each processor, but don't forget to initialize the hardware only once. If you want to go even further, you have to know that the cores start at the highest protection level, EL3. You can switch to a lower level and execute code there: from the highest level to hypervisor level, then from hypervisor level to kernel level, and if you want, you can also go to user level. There is code in the SMP directory that does exactly that: it enables the cache, it sets up the MMU, and it starts all four cores. What have we done with that? Well, one colleague, it's not me, sorry, one colleague has done a ray casting demo, which uses the four cores and the 2D DMA to speed things up, except that we can't use the GPU, and we have reached 60 frames per second. This is a screenshot, well, not a screenshot, a photo. And this, if I can play it, is a video of the demo. That's it. Thanks.
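The multi-core startup described in the talk (read the core number, give each core its own stack, initialize the hardware only once) can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the code from the SMP directory: the function names and `STACK_SIZE` are hypothetical, and the value of the core identification register is passed in as a parameter so that the logic can run on a host. On the board it would be read with an `mrs` instruction from `MPIDR_EL1`, whose low byte holds the core number on this CPU.

```c
#include <stdint.h>

#define NUM_CORES  4u
#define STACK_SIZE 0x4000u /* hypothetical per-core stack size */

/* Extract the core number from an MPIDR_EL1 value.
 * On the target the value would come from:  mrs x0, mpidr_el1  */
unsigned core_id_from_mpidr(uint64_t mpidr)
{
    return (unsigned)(mpidr & 0xFFu); /* low byte: core number */
}

/* Stacks grow downward, so each core gets its own slice below
 * stack_area_top: core 0 at the top, core 1 one slice lower, etc. */
uint64_t stack_top_for_core(uint64_t stack_area_top, unsigned core)
{
    return stack_area_top - (uint64_t)core * STACK_SIZE;
}

/* Only one core should initialize shared hardware such as the UART.
 * Returns 1 for the core that must do it, 0 for the others. */
int core_must_init_hardware(unsigned core)
{
    return core == 0;
}
```

Picking core 0 as the one that initializes the hardware is a convention chosen for this sketch; any single core would do, as long as the others wait until it has finished before touching the devices.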