So now to the last talk of FSE 2019, which is "Towards Low Energy Stream Ciphers" by Subhadeep Banik, Vasily Mikhalev, Frederik Armknecht, Takanori Isobe, Willi Meier, Andrey Bogdanov, Yuhei Watanabe, and Francesco Regazzoni. And Vasily will present.
Thanks a lot, Stefan, for the introduction. So can you hear me well? Perfect. This is work done by many authors, as you can see. So I'm going to talk about energy efficiency in stream ciphers.
So a short outline of our work. I will start with an introduction: I will explain why we are doing this, give some motivation, and explain the current state of the art. Then we will go to the comparison of stream ciphers with block ciphers, to figure out in which scenarios which ones are more efficient energy-wise. And then we will discuss the energy impact of individual components of stream ciphers and of different design decisions. So in principle, our goal here was to figure out how to design stream ciphers that are as energy efficient as possible. And then I will come to the conclusion.
So although these two terms, energy and power, are closely related, they are in fact different, and people often confuse them. Energy is the time integral of power, so energy is linked to the total electrical work done by the device. So we can consider it probably the fairest metric to analyze how lightweight this or that design is. The applications where energy is crucial are battery-operated devices, implants, and different IoT devices.
So when we talk about designing a cryptographic primitive, there is usually a trade-off between speed, cost (meaning the amount of resources), and security. Security is something that is fixed, so we cannot do anything about it. But usually the goals are either to increase the throughput or to reduce the resources.
So if we just want to increase the speed, I mean, if we don't care about anything else and our ultimate goal is just throughput, that's fairly easy: we just use more resources. For example, we can parallelize everything, or we can copy the primitives and implement them twice. In the case when our goal is just to reduce the area and the power, which is closely related to area, it's also quite simple. In principle, we do our design in such a way that we use minimal logic and minimal area, but we don't care about the speed. So we apply very simple Boolean functions many, many times, and after a sufficient number of updates our state is well confused and all the initial bits are well diffused. So this is another way to go.
But when we actually want to optimize energy, this is not so simple, because here we have to consider both: we want to increase the throughput and/or decrease the power consumption. So in principle, the main idea when optimizing energy is to increase the throughput by a larger factor than you increase the power, or the other way around, to reduce the power consumption by a larger factor than you reduce the throughput.
So what do we have so far? This area of lightweight cryptography, I don't know how old it is, maybe 15 or 20 years already. But until recently, most of the focus has been put on designs for low area and, I mean, quite many designs for low power. There was not much about low energy consumption until the 2015 work of Banik and others at SAC, where they carefully investigated the energy consumption of block ciphers. And following this research, the block cipher Midori, which was specifically designed for low energy consumption, was developed. However, the energy efficiency of stream ciphers was never investigated thoroughly.
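The trade-off between throughput and power described in this part of the talk can be put into a few lines of arithmetic. The following is only a minimal sketch: the function names and the numbers in the usage lines are illustrative assumptions, not values from the talk or the paper.

```python
def energy_joules(power_watts, cycles, clock_hz):
    """Energy = average power x time, where time = cycles / clock frequency."""
    return power_watts * cycles / clock_hz


def energy_ratio(throughput_factor, power_factor):
    """Relative energy for a fixed amount of data after a design change.

    Run time scales with 1 / throughput_factor and power with power_factor,
    so the energy scales with their quotient. A result below 1 means the
    change saves energy.
    """
    return power_factor / throughput_factor


# Hypothetical example: doubling the throughput while power grows by 1.5x
# lowers the energy for the same data to 75% of the original.
print(energy_ratio(throughput_factor=2, power_factor=1.5))

# Hypothetical example: doubling throughput but tripling power costs energy.
print(energy_ratio(throughput_factor=2, power_factor=3))
```

A design change is worthwhile for energy only when the first argument grows faster than the second, which is the rule of thumb the speaker states.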
So this is actually our goal here: to investigate different design components and to see how stream ciphers perform energy-wise.
So what we start with is the comparison of stream ciphers with block ciphers, in order to see if it makes sense at all. Because if, let's say, the energy of stream ciphers is much worse than that of block ciphers, maybe it doesn't make sense to consider them. And actually, there is a common belief that, because of the long initialization phase of stream ciphers, they only make sense when we want to encrypt really huge amounts of data, but for short amounts of data, which is usually the case in lightweight crypto, stream ciphers are much worse compared to block ciphers. So we wanted to figure out if this is true.
In this work, we analyzed the following ciphers. For the stream ciphers, we took the two eSTREAM finalists, Grain and Trivium. These two have different concepts behind their design. Grain was developed to have the minimum internal state size: its internal state is just twice the key size, which is the bound given by the time-memory-data trade-off attack. Trivium was designed to have update and output functions that are as small as possible. So they didn't care too much about the internal state size, which is much bigger than this trade-off bound; however, they managed to use really, really simple functions. We also took a look at Kreyvium, which is a tweak of Trivium for 128-bit security: just some small changes were included in order to get rid of the attack that would apply to the normal Trivium if a 128-bit key was used. And we also looked at two recent examples of stream ciphers, Plantlet and Lizard. Some tricks were used there in order to decrease the internal state size below this trade-off curve. So in principle, they use a short internal state, but they have to access the key more often.
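The common belief about the initialization phase can be framed as a simple amortization model: a stream cipher pays a one-time setup cost and then a per-block cost, while a block cipher pays the same cost for every block. The sketch below is a hypothetical model for illustration only; the function names and all numbers are assumptions, it ignores any key setup of the block cipher, and it is not the measurement method used in the work.

```python
import math


def stream_energy(n_blocks, e_init, e_block):
    """Stream cipher: one-time initialization plus a per-block cost."""
    return e_init + n_blocks * e_block


def block_energy(n_blocks, e_enc):
    """Block cipher: the same cost for every block, no setup phase modeled."""
    return n_blocks * e_enc


def break_even_blocks(e_init, e_block, e_enc):
    """Smallest block count at which the stream cipher uses less energy.

    Returns None if the stream cipher's per-block cost is not lower,
    in which case it never catches up.
    """
    if e_block >= e_enc:
        return None
    # Need e_init + n * e_block < n * e_enc, i.e. n > e_init / (e_enc - e_block).
    return math.floor(e_init / (e_enc - e_block)) + 1


# Made-up energy units: setup costs 10, a keystream block 1, a block encryption 6.
print(break_even_blocks(e_init=10, e_block=1, e_enc=6))  # prints 3
```

In this model the question of whether stream ciphers pay off for short messages reduces to how large the setup cost is relative to the per-block saving.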
And regarding block ciphers, we considered PRESENT, which was standardized and was also shown to be extremely efficient in terms of energy consumption, and of course Midori, which was specifically designed for it.
So if we take a look at the best cipher configurations with respect to energy consumption (I will explain later what the best cipher configurations are, when I talk about the design decisions), we can see that Trivium is actually much more energy efficient than the best Midori implementation. But this is when we want to encrypt 1,000 blocks, where one block is 64 bits. So this is nothing really surprising; this is more or less what we expected.
But let's also see what happens when we look at small amounts of data. In this graph, you can see the energy consumption for different numbers of blocks encrypted by the different ciphers. And what we can see here is that Midori provides the lowest energy consumption when we want to encrypt just one block, just 64 bits. However, when we want to encrypt two blocks of data, Midori is already worse than, for example, Grain v1 and Grain-128. And after six blocks of data, Trivium outperforms all other candidates. So we focused our research on these two ciphers, Grain and Trivium.
So now we're going to discuss the insights, what leads to this behavior. The most efficient thing you can do in order to optimize the energy consumption of stream ciphers is to unroll rounds. The idea is, again, to increase the throughput at the cost of area. But this is not just doubling everything. We only replace the logic designed for one round by logic which implements several rounds, as you can see here. However, we keep the same register.
So there is no need to copy the registers in order to double or triple the throughput; we just copy the logic. And stream ciphers were especially designed to allow this to be done easily. In fact, most stream ciphers use feedback shift registers. And if the last bits of each feedback shift register are used neither in the update function nor in the output function, we can simply copy these functions. Then, instead of shifting the values in the state by one position at every clock cycle, we can shift by two. This allows us to double the throughput at the cost of just copying the functions. The same design strategy was followed in Trivium, for example. So in fact, many stream ciphers were designed to allow this easy unrolling.
In the case of Grain v1, the 16 last bits are used neither in the output function nor in the feedback functions. This allows us to easily unroll 16 rounds. So this is the 16-bits-per-clock-cycle version of Grain v1. Further unrolling is also possible, but it requires a more complicated treatment of the algebraic structure of the update functions, because simply adding more copies of the round functions will no longer lead to correct functionality.
So as we increase the throughput by unrolling further rounds, we still get improvements in energy. But this happens only up to a certain point. Beyond a certain degree of unrolling, a further increase in throughput actually results in an increase of the energy consumption. So there is a sort of parabolic behavior. If you take a look at the table, for example, we can see that up to 20 rounds the energy consumption (not the power, the energy consumption) goes down, but after this it goes up again. In the case of Trivium, this number of rounds is 160. So up to 160 rounds of unrolling, we get a decrease in the energy consumption per block, and afterwards it increases.
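The unrolling idea described here, copying the logic while keeping one register and shifting by several positions per clock, can be illustrated on a toy shift register. This is a minimal sketch only: the register length and all tap positions are hypothetical and chosen for illustration; they are not the taps of Grain or Trivium. The toy register is built so that its newest positions are never read, which is the property that makes simple copying of the functions valid.

```python
N = 32                    # register length (hypothetical toy size)
XOR_TAPS = (0, 3, 11)     # linear feedback taps (hypothetical)
AND_TAPS = (5, 16)        # one nonlinear AND term (hypothetical)
OUT_TAPS = (1, 7, 15)     # keystream output taps (hypothetical)

# The highest position read by any function is 16, so copy number j
# (reading positions shifted by j) stays inside the register for j <= 15.
MAX_UNROLL = N - max(XOR_TAPS + AND_TAPS + OUT_TAPS)   # 16 for these taps


def feedback(state, j=0):
    """Feedback bit produced by the j-th copy of the update function."""
    bit = 0
    for t in XOR_TAPS:
        bit ^= state[t + j]
    return bit ^ (state[AND_TAPS[0] + j] & state[AND_TAPS[1] + j])


def output(state, j=0):
    """Keystream bit produced by the j-th copy of the output function."""
    bit = 0
    for t in OUT_TAPS:
        bit ^= state[t + j]
    return bit


def clock_unrolled(state, r):
    """One clock cycle with r copies of the logic and a shift by r positions."""
    assert 1 <= r <= MAX_UNROLL, "simple copying is only valid up to MAX_UNROLL"
    new_bits = [feedback(state, j) for j in range(r)]
    key_bits = [output(state, j) for j in range(r)]
    return state[r:] + new_bits, key_bits


def keystream(seed, nbits, r=1):
    """Generate nbits of keystream, producing r bits per clock cycle."""
    state, out = list(seed), []
    while len(out) < nbits:
        state, bits = clock_unrolled(state, r)
        out.extend(bits)
    return out[:nbits]


seed = [(i * i + i // 3) & 1 for i in range(N)]
reference = keystream(seed, 96, r=1)
# Every unrolled version must produce the same keystream as the serial one.
print(all(keystream(seed, 96, r) == reference for r in (2, 4, 8, 16)))
```

Each of the r copies reads only bits that are already present in the register at the start of the clock cycle, so no copy depends on the output of another copy. Going beyond `MAX_UNROLL` would require feeding freshly computed feedback bits into later copies, which is the more involved case the speaker mentions.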
So in order to figure out what's actually happening internally, we took a look at the power shares of Grain and Trivium for different degrees of unrolling. Recall that the update function of Grain is much more complicated than the update functions of Trivium, but Grain uses a smaller internal state and actually fewer clock cycles in the initialization phase.
For one degree of unrolling, so the classical implementation, we can see that the logic of Grain takes an extremely small part of the entire energy. But as soon as we start increasing the degree of unrolling, the logic takes most of the energy. Already at 32 degrees of unrolling, around 64 percent of the energy is taken by the logic. And for 64, it's almost everything which is taken by the logic. As you can see here, for example, we increase the throughput twice, so 32 versus 64, but the power consumed by the logic increases more than four times.
In the case of Trivium, this does not happen so fast. In the beginning, again, the logic takes just a small portion of the entire energy. But even in the case of Trivium 160, so when we unroll 160 rounds, the logic only takes around one half of the entire energy, and the other half is taken by the register. So the lesson that we can learn from here is that in order to really optimize energy, you should design a cipher which has very small functions and not care too much about the state. This is the major thing that you can do.
But we also decided to look at smaller design decisions which could also help us to improve the power consumption. The first one was to consider different types of flip-flops. In fact, the registers of a stream cipher normally have two inputs. In the beginning, they have to be initialized with the key and the IV. And during the clocking process, in the initialization phase or the keystream generation phase, the registers are being updated by some functions.
Therefore we have to have two inputs, and therefore we need a multiplexer to select which input goes into the register. There are two ways of realizing this: either we take a simple flip-flop with a multiplexer in front of it, or we use a scan flip-flop, which already provides this functionality. So this is the result, and we can see that in all cases the scan flip-flops outperform the regular ones. So when you're interested in energy consumption, use scan flip-flops.
Another thing is the architecture. The feedback shift registers can be realized in Galois or in Fibonacci configuration. In fact, the difference is whether we have just one update function feeding one register, or we split the update function into smaller ones and feed them into different registers. There is no big difference, but the Fibonacci ones were a bit better, and they also allow us to unroll more easily. So it's better to use the Fibonacci ones.
And another parameter that we looked at is how we actually implement the round function. There are different ways: a lookup table; simply giving the functionality to the synthesizer and asking it to optimize; or using the decoder-switch-encoder configuration, which was shown to be optimal, for example, for implementing the AES S-boxes. In the case of the stream ciphers, the most efficient option was always to give it to the synthesizer to optimize.
So just to summarize the lessons that we've learned: it's better to use scan flip-flops in the Fibonacci configuration and to let the synthesizer optimize everything. And the really key way to optimize energy consumption is to use simple update functions. The state size is less important. And in principle, the initialization time can also be large, because after a certain degree of unrolling it becomes less important.
So we looked at different parameters, and we can summarize that for longer data streams, stream ciphers with multiple rounds unrolled perform better than block ciphers. Even for two blocks, so if you want to encrypt just, let's say, 128 bits, the energy consumption will be lower if we use certain stream ciphers rather than the best block ciphers. And the key to it is simple functions.
So the further steps would be to come up with even more energy efficient designs. We tried to beat Trivium, and it turned out to be not so easy. So it seems that the design was really good in terms of energy consumption. But, for example, Trivium only allows for 80-bit security. If we want to go for 128-bit security, it seems that we can beat Trivium, at least slightly. And another possible further step could be to fix some parameter and try to optimize energy under this restriction. For example, if we limit the area to, I don't know, 2,000 gate equivalents, which is a common threshold for lightweight devices, we can optimize the energy consumption there. Or we say that only a limited power consumption is supported by the device; what will happen then? Or if we need to have some fixed throughput, we can also take a look at what would then be the best way to optimize the energy. So thank you very much.
Any questions? And I have a question, actually. Could you maybe give some insight into how you arrived at these energy numbers, like what is the target platform or architecture? Or is there a difference depending on which architecture you aim for regarding the energy consumption?
So in principle, we were working with a 90 nanometer logic process. And I think the frequency was set to 10 megahertz, just because when we go for a low frequency, the role of the static power becomes quite high. So we actually wanted to see how it deals with the dynamic power.
And when we talk about 100 megahertz or higher frequencies, the static power plays almost no role. So does it answer your question? So it's ASIC, 90 nanometers.
Any other questions? If not, then we can close the session, and thanks to Vasily again.