Well, my name is Paolo Savini. I am a compiler engineer intern, as you can see, at Embecosm. And I'm here to talk a little about security and how the compiler can help the programmer to strengthen their code against some kinds of threats. Actually, how many of you were in this devroom last year? Well, quite a few. Yeah. And some of you may recall that there was a similar talk last year from Jeremy Bennett. Please raise your hand, Jeremy. He made me raise my hand last year, so now it's my turn, right? And he talked about the SECURE project, which is an open source project that aims to add to LLVM some tools to improve the security of code. Anyway, I'll talk about it later. This talk is a kind of follow-up of that talk, because it's about my contribution to that project. So, I'll talk to you about the threats that are based on information leakage from devices. Then I'll introduce the two projects in which I was involved in doing this work. Then I'll talk about bit slicing, because bit slicing is basically the technique on which the tool I'm developing is based. And then I'll talk about the tool, the bitslicer — you may guess what it does, right? And then we'll make a few final considerations about what we've seen. So, firstly, when we talk about information leakage, a huge variety of things may come to mind. But what I mean here is information leakage related to small cryptographic devices — specifically, devices designed to perform encryption. So we are talking about small chips, for instance on smart cards or smartphones. Not like the chips of general purpose computers — small chips that perform few operations and host few processes. And why is that important?
Because on such chips, some interesting features — like the power consumption of the chip, or the execution time of the program, or any other kind of emission that's quite unavoidable, like electromagnetic leaks — can be related more easily than on a general purpose CPU to what's going on on the chip. So, if somehow these interesting features, these behaviours, can be related to the sensitive data that is being processed, you have what is called a side channel. A side channel can give an attacker the opportunity to get some clues about what the sensitive data are, without the need to find a flaw in the algorithm itself or to use a brute force attack. That's why it's so dangerous: this way, some algorithms may be quite powerless against this kind of threat. I'd like to give you an example before moving to a real one. Imagine, for instance, that we are using our smart card on a reader, and the receptionist of the hotel puts a device near it that can, for instance, intercept the electromagnetic emission — a sophisticated tool. If, for any reason, there is a different electromagnetic emission according to whether a zero or a one of the key is being processed at a certain moment, that could give the device a clue about whether at a certain position in the key there is a zero or a one. That's the kind of effect I'm talking about. This is quite a silly example, but it should give you an idea. I've also put in some real examples from real encryption libraries. This one is quite old, actually, and it has been solved — don't worry. I also put a reference at the bottom of the caption so you can find these security issues in the CVE database. Just to give you an example, this is an example of a timing side channel, because someone put the padding in the buffer `buff` on the top row and then used `p` to point into `buff`, but `p` was also used in the condition of an if clause.
An if clause that, if its condition is satisfied, makes the subroutine return. That means that the control flow here depends on how long the padding is. With the proper tools — of course, sophisticated tools based on statistics and so on; it's not so easy — someone could work out how long the padding is, and we don't want that to happen, of course. Another example of a timing side channel happens when we use sensitive data to access memory. In this example, the highlighted variables are the sensitive ones. You can see that `y`, while it contains some sensitive data, has been used to index the array `r`, and in this case an attacker who has access to the system, or who can influence the cache, may work out the content of `y` by monitoring cache hits and cache misses. Again, you need the proper tools, but this is a possibility anyway, and we have to deal with it too. Several projects have risen up against some of these kinds of issues. These are a couple of them. The latter project, run by the Cryptology Research Group of the University of Bristol, particularly aims at developing tools that help programmers test their devices or their implementations against these kinds of effects — side channels based on information leakage. Their aim, actually, is to provide these tools to any programmer, because testing devices and systems against these kinds of threats requires a deep knowledge of side channels and sometimes also big resources that not all labs have. So that's the aim of that project: to bring expertise in leakage attacks to the table of any developer. And they partnered with Embecosm in order to achieve their goal, and that's how the SECURE project was born.
As an open source project, it is more focused on the compiler, because it aims at adding to open source compilers like GCC and LLVM some tools that seamlessly help the programmer to write code that is more secure against these kinds of threats. Here are some of the things we are developing — these projects are still active and we are working on these tools: a tool that can automatically bit slice some selected region of your code; a tool that erases sensitive data left on the stack by a subroutine, for instance, which could otherwise be collected by a suitably equipped attacker; and then some warnings that let you know whether in the code you are writing there are some bad practices, some bad implementation choices, that could lead to a side channel later. So I have to ask you: how many of you have heard about bit slicing? More than I expected, right? Well, just for the sake of completeness, I will briefly explain what it is. Starting from its earlier use: before we had the microprocessor, bit slicing was basically used to obtain, let's say, a processor with a longer word by putting together a number of one-bit processors to build a virtual n-bit processor — like a single instruction, multiple data system. Of course, in order to do that you also need to transform the software that has to run on this system. That means that you had to bit slice the data and the algorithm as well. As I'm going to show you now, you can also do it in software by, let's say, simulating this virtual processor on a general-purpose CPU — by, of course, bit slicing the data and bit slicing the algorithm — and then I'll explain why we should do that nowadays. Before that, I would like to show you an example of simple bit slicing. Let's take, for instance, that array as an input on the left, and let's imagine we want to bit slice it.
This array on the right is not complete, of course, because it should be eight times longer than the original one, since it will have one element per each bit of the original array — so much longer — and we simply put each bit of the original array into a new element of our array of slices. We'll call them that from now on: the slices. What about the algorithm? Here is a simple example. Let's imagine that the algorithm we meant to run on that input was that loop at the top — just a simple XOR operation between the elements of the arrays. If we want to bit slice the algorithm, we have to substitute each XOR operation of that loop with a set of eight XOR operations, one performed for each of the bits we've seen before. Right, so you're wondering why we should do that — dividing the data and adding instructions. Besides, you may have thought that only some algorithms can be bit sliced, and moreover that only some of those would benefit from this kind of representation. For instance, single instruction, multiple data workloads would benefit from this because they gain a better throughput, as you can see from this example. This is an evolution of the previous example in which we are taking more input instances at the same time, and we fill in the remaining bits of the slices with bits of the other inputs, as you can see here, in an orthogonal way. This way — just make a simple calculation — instead of doing eight times as many XOR operations to process just one array, we do eight times as many XOR operations but we process eight input instances. So basically, in this case, we balance out the loss of throughput; but we could also use longer slices, for instance 32 bits, and in that case we would gain throughput because we would process 32 input instances at the same time. But this is just about efficiency.
In cryptography, instead, bit slicing, as I suggested, is quite an interesting technique to address the problem of timing side channels. Why is that? Because, as I told you before, the transformation of an algorithm implies that the original algorithm is turned into an equivalent version made only of atomic Boolean operations — operations performed on just a single bit. And as you may know, the atomic Boolean operations have an execution time that does not depend on the input. Think, for instance, of the XOR operation, or logic operations like AND or OR. And so, if we manage to translate a whole algorithm into this equivalent atomic Boolean version, we obtain an equivalent version of that algorithm whose overall execution time does not depend on the input. And this is crucial for a block cipher, for instance, which, moreover, is usually also a single instruction, multiple data workload — so it would gain throughput from this technique too. As I was saying here. So now, the bitslicer. And, as I said, you may guess what the bitslicer does. Practically, it is an LLVM pass — as many of you may have heard in the first talk what an LLVM pass is. A pass that automatically bit slices, as I said, a selected area of your source code. And I also mean to add the possibility to manage some bit sliced data yourself. Because the aim of the bitslicer would be to spare you the need to bit slice your data and your algorithm, which can be quite painful: you have to isolate all the bits and put them in the proper place, and then you have to transform the algorithm into an equivalent form. And if you do not know how to do it properly, you can also end up making mistakes. So we'd like to provide an automatic mechanism that does it for you, since it's quite mechanical. But from time to time you may need to manage the slices on your own, and I'll show you how that should work. So that's how it should work.
We'd like to see it work with automated bit slicing — I say "we'd like" because it's still a work in progress. So I mean to introduce a pragma that takes as arguments, for instance, the data structures that need to be bit sliced, and that encloses the part of the code that has to be bit sliced. And that's what would happen with the previous example: the compiler would create the bit sliced version of the code while hiding it from you — I mean, you don't need to care about it; it does it automatically. Now, about the other behaviour I was talking about — the case in which we need some bit sliced data, for instance because our implementation of the block cipher requires us to handle the slices on our own — it works this way. We need to allocate an array of slices of the proper length ourselves, but that's not so difficult, of course. And then I mean to add a builtin function like that, which takes the data you want bit sliced — contained in an array, at the top — and then the array of slices you want this data to be bit sliced into. Of course, I also mean to create some builtin functions that can take more input instances and interleave them into a single array of slices, for the reason you've seen before for the SIMD-like systems. Right. Now let's take a step back, because bit slicing might sound wonderful, but the problem is that it is not, because we must not forget the side effects of bit slicing. As you've seen before, bit slicing implies an increase of the allocated space and also an increase in the operations that need to be performed, because you need to decompose the data and you need to manage the slices too. And as I said before, only some algorithms can be efficiently bit sliced. We are not talking only about security but also about efficiency, because efficiency is almost as important as security in block ciphers, and so we have to consider that aspect too.
So as I said, SIMD-like workloads, as we've seen before, are quite good candidates, because they may gain a lot of throughput from bit slicing. And block ciphers may also gain the precious feature of an execution time that is independent of the input, and with it that resistance against the timing side channel attacks we've seen before. But of course, never forget that any dependency that may occur between the several bits of the same input instance may cause a loss of efficiency, because it may prevent you from processing several slices in parallel, and sometimes it could even prevent the bit sliced transformation altogether. And also remember that, although block ciphers look like the best candidates ever, there might be some implementations of those same block ciphers that would not benefit from bit slicing, just because of the implementation choices. And so that's why I conclude here by saying that this kind of tool may be useful for these reasons, but should be used very carefully, because if you care about both efficiency and security, you first need to understand whether your block cipher implementation really fits it. Thank you for your patience. And if there are any questions or suggestions — I'm really open to suggestions — please ask. Yes, please. In block ciphers you have S-boxes, where the operation is that you take a byte, you go through the S-box, and you get a byte. These S-boxes are not linear by design, because we don't want to have a linear operation — so how do you bit slice this kind of operation? Sorry, could you repeat the question — you were saying that usually in block ciphers...? Yeah — so how do you bit slice this operation, the S-box operation? Can you bit slice this one? So he's asking me about the substitution operation that happens in some block ciphers, like in the AES algorithm for instance — the substitution boxes that are used to remove the linearity of the block cipher itself.
You raise a very good point, because that is one of the first features I was trying to bit slice in the PRESENT block cipher. I don't know if some of you have heard about it — it is a kind of lightweight version of AES. In that case, just to explain how it works, the plaintext — or the ciphertext of that round, anyway — is encrypted by substituting its values with the values found in a table, the substitution table. So in this case you are just fetching data from memory. Actually, I kind of solved that issue in the PRESENT block cipher by implementing those small substitution S-boxes — the substitution tables — with logic functions that corresponded to them, using methods like the method of the minterms and that kind of thing. The problem is that in the PRESENT block cipher those S-boxes are quite simple, because they are just 16 elements, each four bits long. But AES uses quite a bit bigger substitution tables, and many other block ciphers also use tables to do their job, because it's much more efficient. So that problem is still open, as I discussed with Daniel Page of the Bristol project. It's still very open, because some of those tables — all of them, actually — are based on complex mathematical calculations and studies. Sorry, you're asking me whether I used a mixed form — I mean, whether I bit sliced part of the program while the other part remained just a table? No. So you're asking whether I used just XOR operations, or other operations too — you mean to implement those S-boxes, that's what you mean, all right. Actually, I implemented it quite a while ago, almost a year ago, and I think I used AND operations too, right. Because I used the method of the minterms — I don't know if you know it, the method of the minterms. Since they were small S-boxes, I just listed all the outputs and then implemented them with a logic function made of minterms.
So, OR and AND operations, basically. It's not so efficient — that's the problem. I think that's also why some of those tables are not translated this way. But at least for those small S-boxes it worked. Yes, please. In your examples, you showed XOR as the operation, turning one 8-bit XOR into eight 1-bit XORs. I would expect XOR to be time independent even if it operates on 8 bits. Is there an example which shows it more clearly — an operation on 8 bits that you could not expect to be implemented as a time independent operation? Right. So, okay, you asked me about an example in which we have an operation whose execution time depends on the input and that is transformed into an equivalent version that does not depend on the input. Well, unfortunately I didn't bring that kind of example here, and I didn't get so far as to transform that kind of thing, but just to let you know how it works: usually, when you have that kind of operation that depends on the input, you may choose to implement an equivalent version of it with logic functions, as I said a moment ago, and that's how they usually do it in the papers you read about it. There are several papers on bit slicing, and they, for instance, implement these equivalent versions with really long expressions of atomic operations on the bits. I think I put some examples here — that's quite interesting, because I put links to a couple of papers. The first one, actually, is about a bit sliced version of the AES algorithm, and it's quite interesting to see how they solve the kind of problems we were talking about a while ago; if I remember well, they use a lot of logic functions like that. So if you had an algorithm where, for example, there was a multiplication or a division operation, and you ran on a core where that has a data-dependent timing, you'd basically have to reimplement that whole algorithm?
Well, it's all a matter — okay, you ask me what if I have a multiplication, or an addition, or a subtraction, a division, that kind of stuff. I think it is just about compromise. I mean, whether you prefer to lose a little efficiency so as not to have to transform your block cipher completely, because your security needs are quite well met anyway; or whether you are willing to do it and transform it completely. There are some implementations of those operations that are a little more efficient for this purpose, but of course there's still the dependency — for instance, in addition, on the carry and that kind of stuff — so that kind of dependency cannot be completely avoided, but at least it can be improved with a particular implementation of that algorithm. But if you want to change your block cipher completely, because you think that otherwise its security would be compromised, that could be your choice. I mean, yeah. If that answers your question. Yeah. In one of your slides you had this function that turns regular data into sliced data. You mean the one with multiple inputs, or — no? Just — sorry. Yeah, this one. Right. Yeah. So you're losing time here, because you're changing your data representation, so even if you increase your throughput here, it's like a load each time — you're losing time. Is there any trick to implement that efficiently, or is it basically just setting the bits? Is there anything special you can share? Right. So you are pointing out that by using this function — I mean, by performing the transformation of the data, right? — you may lose time, right? Well, yeah, that's true. I mean, one of the warnings I'd like to include in the instructions of this tool is to be careful about how many times you do something like that, because every time you do it you lose time on the transformation that the compiler introduces into the program, of course. And so, as I suggested, one has to do it carefully.
And then you ask me about any trick to do the same thing more quickly, right? For example, using SIMD instructions if it looks possible, or... So you're probably talking about an implementation that from the beginning is already, let's say, bitslice oriented, right? So, yeah, that's possible, of course. But I would suggest it only if you know what you're doing, actually. Yeah, because what I was telling you about this tool is that it aims at sparing you this kind of work, at sparing you the need to, you know, be too much... What about a qualifier on the array type that states that this will be transformed into a sliced version by the compiler? So you write it the normal way, and any data access will be done in the sliced way. So you don't need to perform the transformation, because from the beginning it's in the sliced... Right, so you're suggesting simply adopting a different way to access the data? Yeah. So, for instance, you're suggesting that instead of using this function we might access the bits of the array — called `array`, sorry for this example — directly, in a proper way, just to avoid the transformation, yeah? Oh, yes, of course. I mean, some implementations already do that. The implementations that don't need my tool already do that on their own. But yeah, again, that's the case where some people may just prefer to start with bit slicing by using a tool that already does it for them. Of course, any more efficient choice is preferable to that one, yeah? Is that answering your question? Perfect. Right. Any other questions or suggestions? Right. Okay, thank you.