So, I'm presenting joint work with my colleagues from the University of Leuven, with Benzy from NXP Semiconductors, and with Nigel Smart, who is affiliated with the University of Bristol. Let's start with a taste of physical attacks, since that is the main topic of our paper. Physical attacks refer to a really broad family of attacks that can be performed against cryptographic implementations, and we can split this huge family into roughly two broad categories. The first one is side-channel attacks. Here we have a picture of the typical setting we would use to perform a side-channel attack on a regular processor, like the ones we have in our phones: there is a pickup coil that is acquiring some electromagnetic emanations coming from the chip, and by applying some well-known statistical processing, such as differential power analysis, it is fairly easy to extract secrets from an unprotected implementation. This assumes the implementation is running as is; we are not interfering with the flow of the computation, we are just observing some intermediates stemming from that computation. On the other hand, an equally large family of attacks are fault attacks. Here the adversary is actively trying to modify intermediates during the computation; this is essentially the cryptanalyst's paradise. Usually we are used to not being able to modify intermediates, only to observing inputs and outputs, and here we completely change the game: we can inject differences in any round, we can skip rounds, et cetera.
This is a really powerful attack model. Here we have a picture of a microcontroller that we have decapsulated, so we have access to the die, in this case the rear side of the die, and we have placed this chip under a microscope so that we can focus a laser beam very precisely onto a specific spot in the chip. We can, for example, target individual memory elements, so that at any point during the computation we can flip some intermediate. We can insert differences or, as I said, skip rounds. This is a very powerful adversary, and we will come back later and try to model these two categories of attack. So the problem we are addressing here is very well known, and it appears in many fields of crypto: we try to implement some cryptographic algorithm in a hostile environment. What we are doing in this paper is porting some ideas that are fairly common in modern MPC protocols to this embedded-security setting. We are concerned with hardware implementations, for example for FPGAs or ASICs, and also with embedded software implementations; I use these terms interchangeably. I am not trying to be exhaustive in this slide, but there are hundreds of countermeasures that we could implement to end up with a fairly secure implementation. We can go very low level and try to suppress the signals that an adversary might observe, using on-chip noise generators or signal filters, or we can use a different logic style whose purpose is to equalize the power consumption, so that the power consumption becomes data independent. On the more formal side, we can apply concepts from secret sharing; one example is masking, or ISW private circuits. We can also add countermeasures against fault attacks, for example at the circuit level by having detectors for light or voltage glitches, or we can try to randomize the layout.
For example, if we are concerned about microprobing, we can add a layer of obscurity to the circuit. So there is a huge variety of countermeasures against physical attacks. Since we are trying to solve more or less the same problem, we can establish some analogies, not perfect ones, but analogies nonetheless, with multi-party computation protocols. Again, I am not trying to be exhaustive here, so if your favorite protocol is not listed, apologies. There are a lot of modern MPC protocols that we can use to jointly compute a function, even in the presence of some adversaries or some colluding parties, and you could argue that in the limit fully homomorphic encryption would also help with the problem we have at hand. What we try to do here is take ideas that are common, for example, in SPDZ, and bring them down to the world of physical attacks on cryptographic implementations in hardware. So one thing we are introducing in this paper is the tile-probe-and-fault model. This means we assume a certain architecture for the chip: we partition the chip into a series of tiles, a series of areas, and we make the high-level analogy that each tile is a party. We assume that each tile has its own combinational and sequential logic, its own memory elements, such as registers or RAM, and also its own control. That is not typical in modern hardware implementations; of course, we are inspired by current multiprocessor designs, where you have different cores that are more or less the same and can perform computation concurrently.
In this case we further assume that there is some communication between these tiles, so every party can send messages to any other party. We are making the analogy that each tile belongs to a party, so it is natural that what we are trying to achieve is to provide security even if some tiles are compromised by an adversary. More precisely, the adversary can probe or even tamper with any intermediate, as long as it stays within the set of tiles that the adversary controls, namely all but one. This is of course also inspired by the wire-probing model, but where the wire-probing model limits the number of wires the adversary can probe, here we instead limit the number of tiles: within a tile, the adversary can probe any number of intermediates. So we assume a certain structure in the underlying architecture of the chip on which we want to implement our design. Now let me talk about the different attacker models. First, for side-channel security, as I said, the adversary is allowed to probe any number of intermediates as long as they stay within the compromised tiles. For fault analysis, we actually consider two different adversaries. The first one is a quite powerful adversary that is allowed to fault any intermediate, as long as it also stays within these d − 1 tiles. This is naturally inspired by SPDZ, and with it we try to capture the capabilities of multiple-shot DFA, or even multiple lasers. There are already labs that do multiple-shot DFA, and labs that can position several lasers over the same chip, but this is fairly complicated. So I think we are capturing a little bit more than what we can perform in the lab nowadays, but it is only a matter of time and practice before we get more lasers shooting at the same chip.
In addition, for fault analysis we also consider another adversary that can fault everywhere in the chip, that is, affect all the tiles. This is quite relevant because there is also a broad family of attacks that consist of, for example, generating an electrical arc in the vicinity of the chip. This arc induces some strong electromagnetic field pulses, and when this field gets into the chip it induces currents that flip values everywhere in the chip. Normally here you do not have a lot of resolution or precision; you are affecting everything. This attack is like hitting the chip with a big hammer: you do not know exactly what you are affecting. We try to model this adversary with random faults everywhere. This is a picture of our laser setup in Leuven. There is a microscope, with a microscope objective that we use to focus laser light; when that light hits the silicon, by the photoelectric effect it generates some currents, and then it can flip some bits, set bits to one or to zero, et cetera. It is really powerful, and it is a great deal of an art to have fine control over it. The adversary model is of course bounded by the resolution, whether in time or in location, and by whether the fault is transient or permanent; we try to account for all of them. The laser normally comes either from the top here, or through an optic fiber connected to a laser that we don't see. And here is an example of an instantiation of the attacker that we try to model with the second adversary model: this is essentially a short loop antenna together with some electronics. We discharge a capacitor rapidly over this antenna, and this generates a pulse that goes into the chip and flips random values everywhere.
This is just to give you an example of the two different adversarial models we try to account for. I will skip this and maybe come back to it later if I have time, but I want to start with a description of how we compute in this model. I will split it into two parts: first, how we represent data, and then how we do useful computation with this data. The main idea, as in modern successful MPC protocols, is not to start with a verifiable secret sharing scheme to obtain active security, but to start with a passively secure secret sharing scheme and attach some information, the MAC tags, to the values being handled. That is the approach we follow here. Say we want to represent a value a: we handle shares of that value and shares of its MAC tag. The sharing of the value is just additive secret sharing: you split the value into a tuple of d elements. The splitting is probabilistic, in the sense that any d − 1 shares give you nothing about the underlying secret a, and all d shares sum back to the original secret a. The MAC tag, which you never handle in the clear, is the multiplication of the value by a secret constant, the MAC key. We handle this MAC tag shared as well, by the same procedure; otherwise it would leak and it would be more susceptible to fault attacks. So we also apply additive masking to this tag: we split τ(a), the MAC on a, into d shares that sum to the original tag. The MAC tag is shared, but does not carry a tag itself. If you are already familiar with SPDZ, this sounds very familiar; it is exactly the same. So now we can explain how to perform computation.
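The representation just described can be sketched in a few lines. This is a toy single-process simulation, and it works over a small prime field purely for brevity (the scheme in the talk operates over binary fields such as GF(2^k)); all names here are illustrative, not from the paper.

```python
import secrets

# Toy parameters, assumed for illustration only.
P = 2**31 - 1                  # field modulus (CAPA uses GF(2^k) instead)
D = 3                          # number of tiles / parties / shares
ALPHA = secrets.randbelow(P)   # MAC key; in the real scheme it is itself shared

def share(v, d=D, p=P):
    """Additive secret sharing: any d-1 shares are uniformly random."""
    shares = [secrets.randbelow(p) for _ in range(d - 1)]
    shares.append((v - sum(shares)) % p)
    return shares

def unshare(shares, p=P):
    return sum(shares) % p

def represent(a):
    """SPDZ-style authenticated sharing: shares of a and of its tag alpha*a."""
    return share(a), share(ALPHA * a % P)
```

Note that the tag shares are ordinary additive shares of `ALPHA * a`; as the talk says, the tag itself carries no further tag.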
Linear operations are very easy, because this sharing is amenable to any linear operation, such as addition or multiplication by a known constant, so I will instead explain how we do a nonlinear operation, for example multiplication. Say we want to multiply x times y. We are given shares of x and shares of the tag of x, and similarly for y, and we are also given an auxiliary triple: sharings of three values a, b and c. We will talk more about this later, but for the moment it is sufficient to say that a and b are random values, shared as I just described, and that c is the multiplication of a times b. We assume this falls from the sky, and then we can proceed with the computation, which is performed in four steps. The first one is blinding: we blind our inputs x and y with the corresponding random values a and b. This step is local, and it applies both to the value shares and to the MAC tag shares; the tag shares stay local, because in the next step we unmask only the values we just computed. You can do this because a and b are random. This unmasking is performed by broadcasting the shares: the parties are connected, as we have said, so they can send their shares to each other and unmask ε and η here. If this broadcast is implemented in hardware, it requires a synchronization element, otherwise you could incur some unmasking via glitches, so you have to be careful; we acknowledge this in the paper. You have to capture the intermediates into a register before performing this XOR. In the next step, we try to detect whether somebody cheated while broadcasting these values, so we perform a MAC tag verification on them.
Here we differ slightly from SPDZ, which defers this check until the end of the computation; this operation also requires some communication. After that, we perform the actual multiplication. Until now, nothing useful in terms of computation has happened. Here we use the Beaver relation to mix local shares of x, local shares of a, that is, part of the auxiliary data, and the publicly opened, unmasked values, to compute shares of z = x · y, as well as shares of its tag. So this auxiliary data is required; actually, all the magic comes from having this triple satisfying that relation. And of course it does not automatically fall from the sky: we cannot use a simple PRNG as in conventional masking, but have to compute it beforehand. We start by generating the triples with a passively secure multiplier, and then we add security against active adversaries with a relation-verification step; you have the details in the paper. What we can prove here is that the union of any d − 1 tiles does not disclose any secret, so we can argue security against (d − 1)-th order DPA attacks, in the same way that we go from probing security to statistical DPA security. We also provide some security against fault attacks. Successful faults are actually always possible, in the sense that the probability is non-zero, but you have to get lucky, and the probability that you get lucky depends on the length of the MAC key. This security is parameterized: if you can afford it, you can use longer MAC keys. You will pay for that in area, but you get better bounds on the detection probability. It is maybe worth remarking that this detection probability does not depend on the structure of the fault.
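The four steps of the multiplication (blind, open, MAC-check, Beaver recombination) can be sketched end to end. This is again a toy single-process simulation over a small prime field, with the triple generated in place rather than by the preprocessing described in the talk; all names are illustrative.

```python
import secrets

# Toy parameters (assumed for illustration; the real scheme works over GF(2^k)).
P = 2**31 - 1
D = 3

def share(v):
    sh = [secrets.randbelow(P) for _ in range(D - 1)]
    sh.append((v - sum(sh)) % P)
    return sh

ALPHA = secrets.randbelow(P)   # MAC key
alpha_sh = share(ALPHA)        # each party i holds alpha_i

def authenticated(v):
    return share(v), share(ALPHA * v % P)

def mac_check(opened, tag_sh):
    # Each party contributes tau_i - opened*alpha_i; these are shares of zero
    # iff the opened value is consistent with its MAC tag.
    return sum((t - opened * a) % P for t, a in zip(tag_sh, alpha_sh)) % P == 0

def beaver_mul(x_sh, x_tag, y_sh, y_tag):
    # Auxiliary triple a, b, c = a*b "falls from the sky" (preprocessing).
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    a_sh, a_tag = authenticated(a)
    b_sh, b_tag = authenticated(b)
    c_sh, c_tag = authenticated(a * b % P)
    # 1. Blinding (local), on value shares and tag shares alike.
    eps_sh = [(xs - as_) % P for xs, as_ in zip(x_sh, a_sh)]
    eta_sh = [(ys - bs) % P for ys, bs in zip(y_sh, b_sh)]
    eps_tag = [(xt - at) % P for xt, at in zip(x_tag, a_tag)]
    eta_tag = [(yt - bt) % P for yt, bt in zip(y_tag, b_tag)]
    # 2. Broadcast and unmask (here simulated by summing the shares).
    eps, eta = sum(eps_sh) % P, sum(eta_sh) % P
    # 3. MAC-check the opened values: detects a cheating tile.
    assert mac_check(eps, eps_tag) and mac_check(eta, eta_tag)
    # 4. Beaver relation: z = c + eps*b + eta*a + eps*eta.
    z_sh = [(cs + eps * bs + eta * as_) % P
            for cs, bs, as_ in zip(c_sh, b_sh, a_sh)]
    z_sh[0] = (z_sh[0] + eps * eta) % P   # public constant added by one party
    z_tag = [(ct + eps * bt + eta * at + eps * eta * al) % P
             for ct, bt, at, al in zip(c_tag, b_tag, a_tag, alpha_sh)]
    return z_sh, z_tag
```

Expanding step 4 shows why it works: c + (x−a)b + (y−b)a + (x−a)(y−b) = xy, and the tag shares satisfy the same identity scaled by the MAC key, so the product comes out authenticated without ever opening x or y.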
This is contrary to other approaches, for example those using linear codes: there are many schemes that admit undetectable faults, so if you are able to inject a fault such that, for example, the result is again a codeword, it passes undetected, or the scheme only targets faults with high weight, with many bits set. Here we try to stay away from that. We also acknowledge the existence of combined adversaries, and we inherit this kind of property from the MPC protocols. Not everything is covered, because for example we are not using commitments, but it still covers the cases that are relevant to us. I have some minutes left, so I will talk about some of the instantiations that we built. The first one is an AES in hardware, for FPGAs or ASICs. As you know, the most difficult part of AES to compute is the S-box; it is the only nonlinear element. There is an expression for the S-box using operations in GF(2^8). It looks pretty ugly to implement, but you could actually do it in something like six squarings, seven multiplications, and 13 cycles. You can also split the S-box into two stages, first an inversion and then an affine operation over bits, and this is the route we take. We compute the inversion as x to the power of 254, and we decompose it using an exponentiation chain that requires only two kinds of operation: the first is raising to the fifth power, x^5, and the other is x^4 · y^2. I have only explained how to do multiplications, but in the paper we also explain how to do more complex operations, such as a multiplication composed with linear operations, which is what you need to carry out x^4 · y^2. Naturally, for these you do not need triples, but a different kind of auxiliary data that is even bigger and carries more operations.
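One chain using exactly those two operations is x^254 = x^4 · (x^125)^2 with x^125 = ((x^5)^5)^5, i.e. three applications of x → x^5 followed by one x^4 · y^2. This is my reconstruction from the two operations named in the talk, not a chain quoted from the paper; it does give four multiplication stages, consistent with the four-cycle inversion mentioned next.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES field polynomial

def gf_mul(a, b):
    """Carry-less multiplication in GF(2^8), reduced modulo the AES polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return r

def gf_pow(x, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, x)
    return r

def op5(x):
    """x -> x^5: one shared multiplication, since x -> x^4 is GF(2)-linear."""
    return gf_mul(gf_pow(x, 4), x)

def op42(x, y):
    """(x, y) -> x^4 * y^2: again a single multiplication after linear maps."""
    return gf_mul(gf_pow(x, 4), gf_pow(y, 2))

def inverse(x):
    """x^254 = x^4 * (x^125)^2, with x^125 = ((x^5)^5)^5. Maps 0 to 0."""
    return op42(x, op5(op5(op5(x))))
```

The point of choosing these two operations is that squaring and raising to the fourth power are linear over GF(2), so each of the four stages costs only one shared multiplication's worth of auxiliary data.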
We synthesized the design using these primitives: you get four cycles for the inversion and one extra cycle for the affine transform, making five cycles in total. This is a picture of the data path. As for synthesis results, we synthesized this using standard cells for ASIC. The size is quite large. It is worth noting, for example, that the actual computation takes about 30 kGE for two shares, but the preprocessing takes probably three times more. So most of the work is pushed to the preprocessing, as we expected. I have to go a little bit faster. We also wanted a simpler block cipher as a proof of concept, so we went for KATAN, whose only nonlinear operation is an AND. This one we could fit into an FPGA, take measurements, and empirically confirm that we are reducing leakage. What we have here are the results of a t-test evaluation, a non-specific leakage-detection test; this is fairly standard practice. On the left are the results of the evaluation with the countermeasure switched off, by switching off the randomness, and on the right the change when the only thing we do is switch the randomness on. You can see that on the left the statistic surpasses the threshold, indicating evidence of leakage, and when we turn the countermeasure on, the statistic remains within bounds, indicating that the leakage is reduced. This is for first order, and these are for second order; as expected, there is second-order leakage, because we are only handling two shares. We also synthesized another version with three shares, and similar results hold. And we have also prototyped a bit-sliced software version for an ARM core. I think that is enough; on to future work.
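The non-specific (fixed-vs-random) test just described boils down to Welch's t-statistic per sample point, with the customary |t| > 4.5 threshold. A minimal sketch, using synthetic stand-ins for the power traces (the numbers and distributions below are invented for illustration):

```python
import math
import random

def welch_t(group_a, group_b):
    """Welch's t-statistic between two sets of measurements."""
    na, nb = len(group_a), len(group_b)
    ma = sum(group_a) / na
    mb = sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

rng = random.Random(0)
THRESHOLD = 4.5  # usual pass/fail bound for leakage-detection testing

# Countermeasure off: the mean power draw differs between the fixed-input
# and random-input trace classes (simulated leakage).
fixed_off  = [1.0 + rng.gauss(0, 0.2) for _ in range(1000)]
random_off = [0.8 + rng.gauss(0, 0.2) for _ in range(1000)]

# Countermeasure on: both classes draw from the same distribution.
fixed_on  = [0.9 + rng.gauss(0, 0.2) for _ in range(1000)]
random_on = [0.9 + rng.gauss(0, 0.2) for _ in range(1000)]

t_off = welch_t(fixed_off, random_off)   # expected to exceed the threshold
t_on  = welch_t(fixed_on, random_on)     # expected to stay within bounds
```

In a real evaluation this statistic is computed at every time sample of the acquired traces, and for higher orders the traces are first centered and raised to the corresponding power, which is why second-order leakage still shows up with two shares.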
We definitely would like to improve the cost, probably by simplifying even further, especially the preprocessing stage. Okay, let's thank all the speakers of the session. Thank you.