Welcome to this presentation of the paper "Perfect Trees: Designing Energy-Efficient Symmetric Encryption Primitives", which originally appeared at FSE 2022. This is joint work by Andrea Caforio, Subhadeep Banik, Yosuke Todo, Willi Meier, Takanori Isobe, Fukang Liu, and Bin Zhang.
The energy consumption of cryptographic algorithms and hardware is a key aspect when it comes to their integration into low-resource environments. This is underlined by the ongoing NIST lightweight cryptography standardization process, in which energy consumption is one of the selection criteria.
The cryptographic literature features a handful of works that investigate the energy consumption landscape of hardware ciphers. Trailblazers in this regard were the papers by Kerckhof et al. in 2012 and Batina et al. in 2013, both of which benchmark various block ciphers in hardware with a particular focus on energy. In 2016, Banik et al. devised an energy model for r-round unrolled block ciphers as a quasi-quadratic equation, where capital R denotes the total number of rounds in the block cipher and the constants encode various circuit aspects. In 2018, Banik et al. identified stream ciphers as the most suitable choice when it comes to the encryption of larger quantities of data. In particular, among the investigated stream ciphers, Trivium outperformed all other schemes, including supposedly efficient block ciphers. Nevertheless, those results were observational in nature, and a comprehensive energy model for stream ciphers remained an open problem.
In our work, we fill this gap and in the process propose a straightforward recipe for designing energy-optimal hardware stream ciphers. It is important to note that a heuristic energy model for stream ciphers is harder to conceive because of the high unrolling degrees r, which enormously complicate the underlying algebraic expressions and thus their study. In contrast, for block ciphers, r is usually relatively small.
Spelling out our contributions: first, we devise the first heuristic energy model in the realm of stream ciphers that links the algebraic topology of the update function to the consumptive behavior. It is applicable to a wide range of Trivium-like, Grain-like and Subterranean-like constructions. A building block of our energy model is a re-implementation of r-round unrolled Trivium that tessellates the update function strands into individual circuits, which yields a more gradual and slower increase in power consumption with respect to the unrolling degree. This then provides a natural starting point for our energy model. Second, we leverage this model and propose new energy-optimal stream ciphers in the Trivium family that reduce the energy consumption by up to 25% compared to the original specification. More importantly, however, for the first time it is now possible to design stream ciphers that specifically optimize for energy.
Power and energy consumption in semiconductor circuits is due to two main sources: static consumption, that is, leakage current and other currents drawn continuously from the power source, and dynamic consumption, that is, the charging and discharging of load capacitances during 0-to-1 and 1-to-0 transitions. Power is the rate of energy consumption. In other words, energy is the time integral of the power, which is approximately the product of the average power and the execution time. By this logic, the energy consumption should be independent of the actual clock frequency. A fast clock leads to a smaller execution time but a higher average power. Analogously, a slow clock increases the execution time but yields a lower power consumption.
Let us now transition to the meat of the paper. The Trivium update function consists of three independent logic blocks, t_1, t_2 and t_3, tapped from the state register x_1 to x_288. We define each individual logic block as a strand of the form a + b + c·d + e.
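To make the strand form concrete, here is a minimal software sketch of one Trivium round written in terms of strands. The tap positions are those of the original Trivium specification; the function names `strand` and `trivium_round` and the list-based state layout are assumptions made for illustration, not part of the paper.

```python
def strand(a, b, c, d, e):
    """One update-function strand: a + b + c*d + e over GF(2)."""
    return a ^ b ^ (c & d) ^ e


def trivium_round(s):
    """Apply one Trivium round to a 288-bit state.

    s is a list of 288 bits in which s[k - 1] holds state bit x_k.
    Returns (keystream_bit, next_state).
    """
    x = lambda k: s[k - 1]  # 1-based access, as in the specification

    # Keystream bit: XOR of the six linear taps at the register ends.
    z = x(66) ^ x(93) ^ x(162) ^ x(177) ^ x(243) ^ x(288)

    # The three strands t_1, t_2, t_3.
    t1 = strand(x(66), x(93), x(91), x(92), x(171))
    t2 = strand(x(162), x(177), x(175), x(176), x(264))
    t3 = strand(x(243), x(288), x(286), x(287), x(69))

    # Shift each register by one position and feed in the strand outputs:
    # t_3 enters the first register, t_1 the second, t_2 the third.
    nxt = [t3] + s[0:92] + [t1] + s[93:176] + [t2] + s[177:287]
    return z, nxt
```

An r-round unrolled circuit computes r such rounds combinationally in a single clock cycle, which is why the inputs of later strands can be outputs of earlier ones.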
A feature-rich library with three-pin linear cells can implement one strand with three gates: one two-input NAND, one two-input XOR, and one three-input XOR. Hence, the entire Trivium combinatorial layer consists of 10 gates: three strands of three gates each and one three-input XOR for the keystream bit generation.
We investigated several circuits and compilation directives supported by the Synopsys Design Compiler. Regular: a run-of-the-mill synthesis of the entire circuit that leaves it up to the tool to map the individual entities into valid circuits. Ultra: a high-effort routine that optimizes circuit area while not respecting entity boundaries. And finally, Restricted: each update function strand is synthesized individually using three gates, as mentioned above. We observed that the restricted mode outperformed the other directives, moving the point of optimality to r = 288. In Banik's paper from 2018, the optimal degree of unrolling was reported as r = 160 using the regular synthesis option. This trend is clearly seen in the following example using the TSMC 90-nanometer cell library. Plotted is the energy consumption for the encryption of 1.28 million bits of data for all three synthesis directives, different unrolling degrees r, and clock frequencies.
Having established that Trivium benefits from a restricted compilation approach in which each update function strand is synthesized individually, let us investigate this avenue more thoroughly. It is not hard to see that we can recursively enumerate the strands for each unrolling degree r, namely t_1(r), t_2(r), and t_3(r). Recall that in the fully unrolled setting, i.e., when r = 288, there are 3 × 288 = 864 strands. Each of them is composed of three logic gates after synthesis, so it is relatively straightforward to measure the power consumption of each strand. Plotted are the singled-out power measurements for each strand t_i(r) using the TSMC 90-nanometer cell library.
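The recursive enumeration of the strands can be sketched as follows. This is an illustrative reconstruction that assumes the tap positions of the original Trivium specification; the names `REGISTERS`, `OFFSETS`, `TAPS`, `locate` and `inputs` are hypothetical. For a strand t_i(r) it lists which inputs are still read directly from the state register and which are outputs of earlier strands.

```python
# Register layout of baseline Trivium: (length, index of the strand feeding it).
# Register 0 is fed by t_3, register 1 by t_1, register 2 by t_2.
REGISTERS = [(93, 3), (84, 1), (111, 2)]
OFFSETS = [0, 93, 177]  # absolute index of the bit just before each register

# Absolute tap positions (a, b, c, d, e) of each strand a + b + c*d + e.
TAPS = {
    1: (66, 93, 91, 92, 171),
    2: (162, 177, 175, 176, 264),
    3: (243, 288, 286, 287, 69),
}


def locate(tap):
    """Map an absolute tap position to (register index, position inside it)."""
    for reg in (2, 1, 0):
        if tap > OFFSETS[reg]:
            return reg, tap - OFFSETS[reg]


def inputs(i, r):
    """Inputs of strand t_i(r) in an unrolled circuit, for r >= 1.

    Each input is either ("x", k), a bit read directly from the state
    register, or ("t", j, u), the output of the earlier strand t_j(u).
    """
    result = []
    for tap in TAPS[i]:
        reg, pos = locate(tap)
        feeder = REGISTERS[reg][1]
        if r <= pos:
            # After r - 1 shifts the tapped bit still stems from the register.
            result.append(("x", tap - (r - 1)))
        else:
            # The tapped bit was produced by the feeding strand pos rounds ago.
            result.append(("t", feeder, r - pos))
    return result


# All strands of the fully unrolled circuit (r = 288).
strands = [(i, r) for r in range(1, 289) for i in (1, 2, 3)]
```

For example, `inputs(1, 67)` contains `("t", 3, 1)`: one input of t_1(67) is the output of t_3(1), while the remaining four are still register bits.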
Intuitively, we would expect the power in the strands to increase monotonically with r, as in block ciphers, but the figure clearly suggests that the increase is far from monotonic. The red data points represent the strands whose power consumption experiences a sudden dip. This phenomenon also occurs for different cell libraries, such as Nangate 15-nanometer or UMC 65-nanometer, and is thus inherent to the structure of the restricted unrolled Trivium circuit. Why is this?
A first observation is that all t_1(r) for 1 ≤ r ≤ 66 consume the same power, until t_1(67), whose power consumption is considerably larger. All inputs to t_1(r) for 1 ≤ r ≤ 66 come directly from the register. Thus, in some sense, their input nodes are all at distance 0 from the register. However, one of the inputs of t_1(67) comes from the output of t_3(1), and thus not all its inputs are at distance 0 from the register. This delay imbalance at the input wires gives rise to more glitches in the internal circuitry of t_1(67), and thus hints at one of the reasons why it consumes more. Further, consider the boundary around r = 93. At r = 94, the power consumption of t_1(94) drops. It is easy to see that all the inputs of t_1(94) are at distance 2 from the register, whereas the inputs of t_1(93) are still imbalanced with respect to the delay from the register. It appears that balanced strands consume less power than unbalanced ones.
More formally, the circuit strands are connected naturally in a well-defined graph topology. Each unrolled strand can be translated into a 5-ary tree with the output bit as the root node, whose subtrees are other unrolled strand trees or leaf nodes. Let capital T_i(r) be the 5-ary unrolled strand tree corresponding to the unrolled strand equation t_i(r).
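The balance argument around r = 66, 67, 93 and 94 can be checked mechanically. The following is a minimal sketch, assuming the baseline Trivium tap positions expressed relative to the start of each register; the names `TAPS` and `leaf_depths` are hypothetical. Depth is counted here in edges from the root of the tree, so a direct register input has depth 1, and a strand is balanced exactly when the returned set has a single element.

```python
from functools import lru_cache

# (register-relative position, feeding strand) for the five taps of each
# strand of baseline Trivium. The 93-bit register is fed by t_3, the 84-bit
# register by t_1 and the 111-bit register by t_2.
TAPS = {
    1: [(66, 3), (93, 3), (91, 3), (92, 3), (78, 1)],
    2: [(69, 1), (84, 1), (82, 1), (83, 1), (87, 2)],
    3: [(66, 2), (111, 2), (109, 2), (110, 2), (69, 3)],
}


@lru_cache(maxsize=None)
def leaf_depths(i, r):
    """Set of depths at which register bits occur in the strand tree of t_i(r)."""
    depths = set()
    for pos, feeder in TAPS[i]:
        if r <= pos:
            depths.add(1)  # direct register input: a leaf child of the root
        else:
            # Input is the output of an earlier strand: descend one level.
            depths |= {d + 1 for d in leaf_depths(feeder, r - pos)}
    return frozenset(depths)


for r in (66, 67, 93, 94):
    print(r, sorted(leaf_depths(1, r)))
# prints:
# 66 [1]
# 67 [1, 2]
# 93 [1, 2]
# 94 [2]
```

Under this model, t_1(66) and t_1(94) have all register inputs at a single depth, whereas t_1(67) and t_1(93) mix two depths, which matches the strands where the power measurements jump and dip.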
The child nodes of the strand tree T_i(r) are therefore the five nodes T_j(u) for which the corresponding terms t_j(u) are present in its recursive definition from before. To make the link between unrolled strand equations and their respective trees clear, we give three examples of varying complexity. Displayed are the unrolled strand trees T_i(1), T_i(100), and T_i(200).
We say that a perfect unrolled strand tree is a tree in which all non-leaf nodes have n children and all leaf nodes are at the same depth. The unrolled strand trees in Trivium are 5-ary. T_i(1) and T_i(200) are perfect unrolled strand trees, while T_i(100) is imperfect because it has leaf nodes at different depths. Clearly, T_1(66) and T_1(94) are perfect trees, whereas T_1(67) and T_1(93) are not. A strand evidently consumes less power if the node it occupies in the circuit graph houses a perfect tree. In the baseline Trivium design, there are 339 perfect unrolled strand trees, thus fewer than half of all 864 trees.
This raises the question of what happens when the tap positions of Trivium are altered in such a way as to obtain configurations that yield more perfect unrolled strand trees. Plotted is the power consumption of several hundred Trivium-like constructions that differ from the original specification in the position of their nine taps, using the UMC 65-nanometer cell library. The trend is clearly visible: the higher the number of perfect unrolled strand trees, the lower the power consumption. Having established a strong correlation between the power and energy consumption of random Trivium instances and their corresponding total number of perfect unrolled strand trees, we can start to look for potential Trivium-like derivatives with different tap positions that consume less energy.
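The count of perfect unrolled strand trees can be reproduced with a short recursion. This is a sketch under the assumption that the taps are those of the original Trivium specification, given relative to the start of each register; the names `TAPS`, `perfect_depth` and `count_perfect` are hypothetical. It uses the fact that a tree is perfect exactly when all of its children are perfect trees, or leaves, of one common depth.

```python
from functools import lru_cache

# (register-relative position, feeding strand) for the five taps of each
# strand of baseline Trivium.
TAPS = {
    1: [(66, 3), (93, 3), (91, 3), (92, 3), (78, 1)],
    2: [(69, 1), (84, 1), (82, 1), (83, 1), (87, 2)],
    3: [(66, 2), (111, 2), (109, 2), (110, 2), (69, 3)],
}


@lru_cache(maxsize=None)
def perfect_depth(i, r):
    """Depth of the tree T_i(r) if it is perfect, otherwise None."""
    child_depths = set()
    for pos, feeder in TAPS[i]:
        if r <= pos:
            child_depths.add(0)  # register bit: a leaf node
        else:
            child_depths.add(perfect_depth(feeder, r - pos))
    # Perfect only if every child is perfect and all children agree in depth.
    if None in child_depths or len(child_depths) != 1:
        return None
    return child_depths.pop() + 1


def count_perfect(rounds=288):
    """Number of perfect trees among T_i(r), i in {1, 2, 3}, 1 <= r <= rounds."""
    return sum(
        perfect_depth(i, r) is not None
        for i in (1, 2, 3)
        for r in range(1, rounds + 1)
    )


print(count_perfect())  # 339
```

With the baseline taps this yields 201 perfect trees of depth 1, 111 of depth 2 and 27 of depth 3, which adds up to the 339 quoted in the talk. Swapping in a different `TAPS` table is one way to score alternative tap configurations.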
The search space over all potential tap positions is too large to fully enumerate, hence we limit the candidate space as follows. All linear tap positions are multiples of three, as was the case in the original Trivium construction. The locations of the nonlinear taps are not multiples of three. The leftmost tap of each register is at least at position 64, to ensure easy parallelization up to r = 64. These three filtering criteria allow us to reduce the number of candidates to roughly 2^25.
Out of all those potential candidate constructions, we picked two promising energy-efficient Trivium replacements. Trivium-LE(F): this design features 495 perfect trees and a security level equivalent to the baseline Trivium specification. It reduces the overall energy consumption by 15%. Trivium-LE(S), with 665 perfect trees: this design reduces the energy consumption by 25% across different cell libraries, at the cost of more initialization rounds to reach a security level comparable to Trivium-LE(F). Both proposals stand as the currently most energy-efficient symmetric encryption primitives known in the literature.
Depicted is the register structure of Trivium-LE(F). Note that the individual sizes of the registers are more balanced than in Trivium, which is a good heuristic to obtain more perfect unrolled strand trees. Further note that the placement of the leftmost taps is shifted towards the middle of the register to allow for more perfect unrolled strand trees of depth one.
The perfect tree phenomenon translates seamlessly to other existing Trivium-like stream ciphers: Trivium-MB, the Trivium tweak proposed by Maximov and Biryukov; Trivia, a Trivium-like cipher with a state of 384 bits; Kreyvium, the stream cipher by Canteaut et al., proposed for efficient homomorphic encryption; and Triad-SC, a former first-round NIST LWC candidate. Our model is even extendable to non-Trivium-like designs and ciphers that follow the Grain or Subterranean philosophies.
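The three filtering criteria can be written as a simple predicate over a tap configuration. This is only a sketch of one possible reading of the criteria: it assumes that "multiple of three" refers to absolute state positions, that the linear taps of a strand a + b + c·d + e are a, b and e, and that the leftmost-tap bound is measured relative to the start of each register. The function name `is_candidate` and its parameters are hypothetical.

```python
from itertools import accumulate


def is_candidate(lengths, taps, min_leftmost=64):
    """Check the three filtering criteria for a Trivium-like configuration.

    lengths: the three register lengths, in state order.
    taps:    dict mapping strand index to absolute positions (a, b, c, d, e).
    """
    # Absolute index of the bit just before each register.
    starts = [0] + list(accumulate(lengths))[:-1]
    leftmost = [None] * len(lengths)

    for a, b, c, d, e in taps.values():
        # Criterion 1: linear taps are multiples of three.
        if any(t % 3 != 0 for t in (a, b, e)):
            return False
        # Criterion 2: nonlinear taps are not multiples of three.
        if any(t % 3 == 0 for t in (c, d)):
            return False
        # Track the leftmost tap of every register (relative position).
        for t in (a, b, c, d, e):
            reg = max(k for k, s in enumerate(starts) if t > s)
            rel = t - starts[reg]
            if leftmost[reg] is None or rel < leftmost[reg]:
                leftmost[reg] = rel

    # Criterion 3: the leftmost tap of each register is far enough right.
    return all(m is not None and m >= min_leftmost for m in leftmost)


# Baseline Trivium as a sanity check.
TRIVIUM_TAPS = {
    1: (66, 93, 91, 92, 171),
    2: (162, 177, 175, 176, 264),
    3: (243, 288, 286, 287, 69),
}
print(is_candidate((93, 84, 111), TRIVIUM_TAPS))  # True
```

The original Trivium taps satisfy all three conditions under this reading, with leftmost register-relative taps at positions 66, 69 and 66.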
In retrospect, in this paper we proposed the first heuristic energy model for stream ciphers, applicable to a wide range of constructions. Our model opens the door to the design of future energy-efficient stream ciphers. In addition, we proposed two new energy-optimal Trivium-like stream ciphers that consume 15% and 25% less energy than the baseline specification.