So, the next talk will be about embedded system security, and Pascal, the speaker, will explain how you can hijack debug components for embedded security in ARM processors. Pascal is not only an embedded software security engineer but also a researcher in his spare time. So, please give a very, very warm, welcoming, good morning applause to Pascal.
Okay, thanks for the introduction. As was said, I am an engineer by day in a French company, where I work as an embedded system security engineer. But this talk is mainly about my spare-time activity, which is researcher, or whatever you call it. For this talk I worked with a PhD student called Mohamed Abdul-Wahab, who is a third-year PhD student in a French lab. So this talk will mainly be a presentation of this work on embedded system security, and especially on the debug components available in ARM processors. Don't worry about the link; there will also be a link with all the slides, documentation and everything.
Before the Congress, I didn't know what kind of background you would need for my talk. So I put some links there, I mean references to some talks where you will find all the vocabulary needed to understand at least some parts of my talk. About computer architecture and embedded system security, I hope you attended the talk by Arista about the formal verification of software, and also the talk by Kegan about trusted execution environments, so TEEs, such as TrustZone. In this talk, I will also talk about FPGAs. There was a talk on day two about FPGA reverse engineering, and if you don't know about FPGAs, I hope you have some time to go to the OpenFPGA assembly, because these guys are doing a great job on open source FPGA tools.
So yeah, when you see this slide, the first question is why I wrote that TrustZone is not enough. Just a quick reminder of what TrustZone is.
TrustZone is about separating a system into a non-secure world, in red, and a secure world, in green. When we use the TrustZone framework, we have lots of hardware components and lots of software components allowing us to run a secure OS and a non-secure OS separately. In our case, what we wanted to do was to use the debug components, which you can see on the left side of the picture, to see if we can do some security with them. Furthermore, we wanted to use something other than TrustZone because, if you attended the talk about the security of the Nintendo Switch, you saw that the TrustZone framework can be bypassed in specific cases. This talk is also quite complementary, because we work at a low level, I mean at the processor architecture level. I will talk later about what we can do between TrustZone and the approach developed in this work.
So the presentation will start with a quick introduction. I will talk about some works aiming to use the debug components for security. Then I will talk about ARMHEx, which is the name of the system we developed to use the debug components in hardcore processors, and finally some results and the conclusion.
In the context of our project, we were working with systems-on-chip. Systems-on-chip are devices where we have, in the green part, a processor, which can be a single-core, dual-core or even quad-core processor. Another interesting part, in yellow in the image here, is the programmable logic, which is also called an FPGA in this case. So in this kind of system-on-chip, you have the hardcore processor, the FPGA, and some links between those two units. You can see here, in the little red rectangle, one of the two processors. This picture shows a system-on-chip called Zynq, provided by Xilinx, which is also an FPGA provider.
In this kind of chip, we usually have two Cortex-A9 processors and some FPGA logic to work with. What we want to do with the debug components is dynamic information flow tracking. So what is information flow? Information flow is the transfer of information from an information container C1 to a container C2 in a given process P. In other words, if we take this simple code over there with four variables, for instance a, b, w and x, the idea is that if you have some metadata in a, the metadata will be transmitted to w.
What kind of information will we transmit through the code? The information I'm talking about in the first block says: this data is private, this data is public, and we should not mix public and private data together. So the information can be binary, public or private, but of course we could have several levels of information. In the following parts, this information will be called a taint or a tag, and to keep it simple, we will use colors: my tag is red or green, just to say whether it's private or public data. As I said, if the tag contained in a is red, the data contained in w will be red as well, and the same goes for b and x.
Let's take a quick example and look at a buffer overflow. In the upper part of the slide you have the assembly code; in the lower part, the green column is the color of the tags, and on the right side of this column you have the status of the different registers. In this code, my input is red at the beginning, and we load the tainted input into the index variable. So the register which contains the idx variable will be red as well.
Then, when we want to access buffer[idx], which is the second line of the C code at the beginning, the information there will be red as well. And of course the result of the operation, which is x, will be red as well. That means that if there is a tainted input at the beginning, we must be able to transmit the information all the way to the return address of this code. If this tainted input is private, the return address at the end of the code should be private as well.
What can we do with that? There is a simple code over there saying: if you are a normal user, you just open the welcome file; otherwise, if you're a root user, you open the password file. If we open the welcome file, this is public information and you can do whatever you want with it. Otherwise, if it's a root user, maybe the password file contains, for instance, a cryptographic key, and we should not reach the printf function at the end of this code. So the idea is to check whether the fs variable containing the data of the file is private or public.
There are mainly three steps for that. First of all, the compilation gives us the assembly code. Then we must modify the system calls to send the tags. The tags will be, as I said before, the private or public information about my fs variable. And, I will talk a bit about that later, maybe in future works the idea is to build, or at least compile, an operating system with integrated support for DIFT.
There were already some works about dynamic information flow tracking. This kind of information flow tracking has been done in two manners. The first one is at the application level, so working at the Java or Android level. Some works also propose solutions at the OS level, for instance KBlare.
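To make the tag propagation and the welcome/password check described above concrete, here is a minimal Python sketch. It is a toy model and not the system presented in the talk: the `Tagged` class, the `PUBLIC`/`PRIVATE` constants and the `read_file` and `checked_print` helpers are all hypothetical names, and the sketch assumes a two-level policy (green or red).

```python
PUBLIC, PRIVATE = 0, 1  # assumed two-level tags: green (public) and red (private)

class Tagged:
    """A value paired with its tag (hypothetical container, for illustration only)."""
    def __init__(self, value, tag=PUBLIC):
        self.value = value
        self.tag = tag

    def __add__(self, other):
        # The result is private as soon as one of the operands is private.
        other = other if isinstance(other, Tagged) else Tagged(other)
        return Tagged(self.value + other.value, max(self.tag, other.tag))

def read_file(name):
    # Assumption: the modified system call attaches the tag to the data it returns.
    tag = PRIVATE if name == "password" else PUBLIC
    return Tagged(f"<contents of {name}>", tag)

def checked_print(data):
    # Policy check: private data must never reach the output function.
    if data.tag == PRIVATE:
        raise PermissionError("private data reached the output")
    return data.value

a = Tagged(1, PRIVATE)
b = Tagged(2, PUBLIC)
w = a + 10  # w inherits the red tag of a
x = b + 20  # x inherits the green tag of b
```

In this sketch, `checked_print(read_file("welcome"))` returns the file contents, while the same call on `"password"` raises an error, which mirrors the idea that the private data should not reach printf.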
But what we wanted to do here is to work at a lower level, so not at the application or the OS level, but at the hardware level, or at least at the processor architecture level. If you want some information about OS-level implementations of information flow tracking, you can go to blare-ids.org, where you have implementations of an Android port and a Java port of intrusion detection systems.
In the rest of my talk, I will go through the existing works and see what we can do about that. When we talk about dynamic information flow tracking at the low level, there are mainly three approaches. The first one is on the left side of this slide. In the upper part of the figure, we have the normal processor pipeline: decode stage, register file, and arithmetic and logic unit. The basic idea is that when we want to process tags or taints, we just duplicate the processor pipeline, with the gray pipeline under the normal one processing the tag data. It implies two things. First of all, we must have the source code of the processor itself to duplicate the pipeline and build the DIFT pipeline. This is quite a drawback, because getting the source code of the processor is not always easy. On the other hand, the main advantage of this approach is that we can do nearly anything we want, because we have access to all the code. We can pull all the wires we need from the processor to get the information we need.
The second approach, on the right side of the picture, is a bit different. Instead of having a single processor doing the normal application flow plus the information flow tracking, we separate the normal execution from the information flow tracking. So this is the second approach over here.
This approach is not satisfying either, because you will have one core running the normal application. That's okay. But core number two in the figure over there will only be able to do the DIFT controls. It's a bit of a shame to use a whole processor just to do DIFT controls. So the best compromise is to build a dedicated coprocessor just for the information flow tracking processing. The most interesting option is to have a main processor running the normal application and a dedicated coprocessor doing the DIFT controls, with some communication between those two cores.
Let's make a quick comparison between the different approaches. You can run dynamic information flow tracking in pure software; I will talk about that in the next slide. But this is really painful in terms of time overhead, because you will see that the time needed to do information flow tracking in pure software is really unacceptable. Regarding the hardware-assisted approaches, the main advantage in all cases is that we have a low overhead in terms of silicon area. That means that, on this slide, the overhead between the main core alone and the main core plus the coprocessor is not so important. And we will see in my talk that the dedicated DIFT coprocessor also makes it easier to support different security policies.
As I said, in the pure software solution, the first line of this table, the basic idea is to use instrumentation. If you were there on day two, instrumentation is the transformation of a program into its own measurement tool. That means that we put some sensors in all parts of the code to monitor its activity and gather some information from it.
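As a small illustration of what putting sensors in the code means, here is a hedged Python sketch. The function names and the `access_log` list are hypothetical; the point is only that the instrumented version does extra work on every memory access, which is where the overhead comes from.

```python
access_log = []  # filled in by the instrumented code

def sum_plain(buf):
    """Original code: sum the elements of a buffer."""
    total = 0
    for i in range(len(buf)):
        total += buf[i]
    return total

def sum_instrumented(buf):
    """Same computation, with one sensor added before every memory read."""
    total = 0
    for i in range(len(buf)):
        access_log.append(("load", i))  # sensor: record which element is read
        total += buf[i]
    return total
```

Both functions return the same result, but the instrumented one executes one additional operation per loop iteration, so the cost of the sensors grows with the amount of code that is monitored.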
If we want to measure the impact of instrumentation on the execution time of an application, you can see in this diagram the normal application level, which is normalized to one. When we use instrumentation, the minimal overhead we get is about 75%, and most of the time the execution time with instrumentation is twice as high as the normal execution time. This is completely unacceptable, because it just makes your application run slower.
So, as I said, the main concern of my talk is reducing the overhead of software instrumentation. I will also talk a bit about the security of the DIFT coprocessor, because we can't include a DIFT coprocessor without taking care of its security. And, to my knowledge, this is the first work about DIFT on ARM-based systems-on-chip. In the talk about the security of the Nintendo Switch, the speaker said that black box testing is fun, except that it isn't. In our case, we only have a black box, because we can't modify the structure of the processor. We must do our job without decapping the processor and so on.
This is another schematic of our architecture. On the left side, in light green, you have the ARM processor; in this case, this is a simplified version with only one core. On the right side, you have the structure of the coprocessor we implemented in the FPGA. You can notice two things for the moment. The first is that you have some links between the FPGA and the CPU. These links already exist in the standard chip. The other thing is that, regarding the memory, you have separate memories for the processor and for the FPGA. We will see later that we can use TrustZone in this context to add a layer of security, to be sure that we won't mix the memory between the CPU and the FPGA.
Since we want to work with ARM processors, we must use ARM datasheets. We must read ARM datasheets. First of all, don't be afraid of the length of ARM datasheets: in my case, I used to work with the ARMv7 technical manual, which is already 2,000 pages. The ARMv8 manual is about 6,000 pages. And of course, what is also difficult is that the information is split between different documents.
Anyway, when we want to use the debug components in the case of ARM, we have this register over there, which is called DBG, blah, blah, blah. We can see in this register that writing the key value 0xC5A, blah, blah, blah, to this field locks the debug registers, and if you write any other value, it unlocks those debug registers. So the first step to enable the debug components was just to write a random value to this register to unlock my debug components.
Here is again a schematic of the overall system-on-chip. As you can see, you have the two processors, and on the top part you have what are called the CoreSight components. These are the famous debug components I will talk about in the second part of my talk. Here is a simplified view of the debug components we have in Zynq SoCs. On the left side, we have the two processors, CPU 0 and CPU 1. The CoreSight components are the PTM, which is the one in the red rectangle, the ECT, which is the Embedded Cross Trigger, and the ITM, which is the Instrumentation Trace Macrocell. When we want to extract some data from these CoreSight components, the basic path is this: we use the PTM, follow the red line, go through the funnel, and at this step we have two choices to store the information taken from the debug components. The first one is the Embedded Trace Buffer, which is a small memory embedded in the processor.
Unfortunately, this memory is really small: it's only about four kilobytes, as far as I remember. The other possibility is to export the data to the trace packet output. This is what we use to export data to the coprocessor implemented in the FPGA.
So what is the PTM able to do? The first thing is that it can trace whatever you want in your memory. For instance, you can trace all your code, so all the blue sections. But you can also trace a specific region of the code. That means you can say: I just want to trace the code in my section 1, or section 2, or section n. Then the PTM is also able to do branch broadcasting. That is something that was not present in the Linux kernel, so we submitted a patch, which was accepted, to manage branch broadcasting in the PTM. And we can do timestamping and other things to store information in the traces.
So what does a trace look like? Here is the simplest code we could have, just a for loop doing nothing, with the assembly code over there. The trace will look like this. In the first five bytes, we have a kind of start packet, which is called the A-sync packet, just to say: this is the beginning of the trace. In the green part, we have the address which corresponds to the beginning of the loop. And in the orange part, we have the branch address packets. You can see that there are 10 iterations of this branch address packet, because we have 10 iterations of the for loop. This just shows the general structure of a trace. In other words, this is a control flow graph, just to show what we could get from this.
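The general structure of the trace described above (a start packet, the loop address, then one branch address packet per iteration) can be sketched with a toy Python model. This is not the real PTM byte encoding: the packets are symbolic tuples, and the function names and the address `0x8000` are hypothetical.

```python
def trace_loop(loop_start, iterations):
    """Build a toy trace for a simple loop, using symbolic packets."""
    trace = [("A-sync", None)]              # marks the beginning of the trace
    trace.append(("address", loop_start))   # address of the beginning of the loop
    for _ in range(iterations):
        # one branch address packet each time the loop branches
        trace.append(("branch", loop_start))
    return trace

def count_branches(trace):
    """Count the branch address packets found in a toy trace."""
    return sum(1 for kind, _ in trace if kind == "branch")

trace = trace_loop(0x8000, 10)  # 0x8000 is an arbitrary example address
```

With a 10-iteration loop, `count_branches(trace)` gives 10, matching the ten branch address packets mentioned for the for loop. A second loop in the program would simply append more packets to the same trace.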
Of course, if we have another loop at the end of this control flow graph, it just makes the trace a bit longer, with the information about the second loop, and so on. Once we have all of these traces, the next step is to say: I have my tags, but now how do I define the rules for each instruction in order to transmit my tags? This is where we use static analysis. In this example, we have the instruction: add register 1 and register 2 and put the result in register 0. Static analysis allows us to say that the tag associated with register 0 will be the tag of register 1 or the tag of register 2. The static analysis is done before running my code, so that I have all the rules needed for all the lines of my code.
So now we have the traces, and we know how to transmit the tags all over my code. The static analysis is done in the LLVM backend. The final step is about instrumentation. As I said before, we can recover all the memory addresses we need through instrumentation. Otherwise, as a second possibility, we can instrument only to get the register-related memory addresses. In the first case, on this simple code, we just instrument all the code, but the main drawback of this solution is that it considerably increases the execution time. Otherwise, with the section shown over there, we can get data from the trace: we use the program counter address from the trace; then, for the stack pointer, we use the static analysis to get information about the stack pointer; and finally, we need only one instrumented instruction at the end.
If I go back to this system, the communication overhead will be the main drawback, as I said before.
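The tag rule given above for an addition (the tag of register 0 is the tag of register 1 or the tag of register 2) can be illustrated with a short Python sketch. This is an assumption-laden toy, not the LLVM-based analysis from the talk: the textual instruction format, `rule_for` and `apply_rule` are hypothetical, and tags are modeled as booleans where `True` means red.

```python
import re

def rule_for(instruction):
    """Derive a tag rule (dest, sources) from a three-register instruction such as 'add r0, r1, r2'."""
    m = re.fullmatch(r"\s*(\w+)\s+(r\d+)\s*,\s*(r\d+)\s*,\s*(r\d+)\s*", instruction)
    if m is None:
        raise ValueError(f"unsupported instruction: {instruction!r}")
    _mnemonic, dest, src1, src2 = m.groups()
    return dest, (src1, src2)

def apply_rule(tags, rule):
    """Apply one rule: the destination tag is the OR of the source tags."""
    dest, sources = rule
    tags[dest] = any(tags.get(src, False) for src in sources)
    return tags

# "Static" phase: rules are computed before the code runs.
rules = [rule_for(line) for line in ["add r0, r1, r2", "sub r3, r0, r4"]]

# "Dynamic" phase: rules are applied in execution order to the current tags.
tags = {"r1": True, "r2": False, "r4": False}  # True = red (private)
for rule in rules:
    apply_rule(tags, rule)
```

Here the red tag of r1 flows into r0 through the addition, and then from r0 into r3 through the subtraction, while the rules themselves never need to be recomputed at run time.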
Because if we have the processor and the FPGA over there running in different parts, the main problem is how to transmit data in real time, or at least at the highest speed we can, between the processor and the FPGA.
So here is the time overhead of the PTM, depending on whether we enable the CoreSight components or not. In blue, we have the baseline execution time when the traces are disabled. We can see that when we enable traces, the time overhead is nearly negligible. Regarding instrumentation, with strategy 2, which uses the CoreSight components, the static analysis and the instrumentation together, we can lower the instrumentation overhead from 53% down to 5%. So we still have some overhead due to instrumentation, but it's really low compared to the related works where all the code was instrumented. This overview shows, in the gray lines, the overheads of related works with full instrumentation. With our approach, the green lines over there, the time overhead is much, much smaller.
So how can we use TrustZone with this? This is an overview of our system: we can use TrustZone to separate the CPU from the FPGA coprocessor. If we make a comparison with related works, we can see that, compared to the first works, we are able to do information flow control with a hardcore processor, which was not the case with the first two works in this table. It means that you can use a standard processor to do the information flow tracking instead of needing a specific processor. And of course the area overhead, which is another important topic, is much, much lower compared to the existing works. So it's time for the conclusion.
As I presented in this talk, we are able to use the PTM component to obtain runtime information about my application. This is non-intrusive tracing, because we have a negligible performance overhead. And we also improve the software security, because we are able to enforce some security on the coprocessor. The future perspective is mainly to work with multi-core processors, and to see if we can use the same approach for Intel and maybe ST microcontrollers, to see if we can also do information flow tracking in those cases. So that was basically it for my talk. Thanks for listening.
Thank you very much for this talk. Unfortunately, we don't have time for Q&A. So please, if you leave the room, take your trash with you and make the angels happy.
I was a bit long in my talk, yeah.
And another round of applause for Pascal.