Hi everyone, I'm Kazuhiko Minematsu from NEC Corporation. In this talk, I would like to talk about Romulus and Remus, families of lightweight authenticated encryption schemes based on tweakable block ciphers. This is joint work with Tetsu Iwata, Mustafa Khairallah, and Thomas Peyrin.

The topic is nonce-based authenticated encryption with associated data, also called AEAD. It is a symmetric-key encryption that provides confidentiality and authenticity. The encryption produces a ciphertext and a tag, and the decryption returns the decrypted plaintext if the whole input is authentic. This picture shows the standard input and output of AEAD.

We describe Romulus and Remus, families of AEAD schemes based on a tweakable block cipher, or TBC. Romulus uses SKINNY as its internal tweakable block cipher and has standard-model security. Remus is a derivative of Romulus, and it uses SKINNY with a shorter tweakey than that used by Romulus. There, SKINNY is used as a block cipher, and the security relies on the ideal-cipher model. In this talk, we primarily focus on Romulus, as it is a second-round candidate of the NIST Lightweight Cryptography project, and we briefly describe Remus in the last part of my talk.

Romulus has two variants, namely the nonce-based N variants and the misuse-resistant M variants. Both consist of three members, but we are planning to simplify the family by reducing the members, as we announced at the NIST Lightweight Cryptography Workshop 2020.

Our design goal is the best lightweight AEAD built on a TBC. Romulus is designed to have a small state and rate-one operation, and it enjoys strong security both qualitatively and quantitatively. We also stress that the security proofs are very clean, which is an important feature for reliability. Finally, the structure of Romulus is simple and streamlined.

This slide shows the family members of Romulus. N1 is the primary member in our submission to NIST Lightweight Cryptography, and as I mentioned, we plan to reduce the family and keep only N1 and M1.
The internal SKINNY has a 128-bit block and a 384-bit tweakey. We will also reduce the number of rounds of SKINNY from 56 to 40 to boost the performance while maintaining a sufficiently large security margin for SKINNY.

This shows Romulus-N. E tilde denotes the tweakable block cipher, A_i denotes an associated data block, and M_i and C_i denote the plaintext and ciphertext blocks. N is the nonce and T is the tag. As you can see, there is a rho function that is defined over two n-bit blocks. It works as a state-update function. For processing AD, the first output of rho is ignored. The structure is based on an idealized version of the COFB block cipher mode proposed in 2016, but we applied lots of changes and improvements.

Let me describe some of the design details. We carefully designed the algorithm to reduce the state size and the branching, in order to improve the hardware performance. To avoid any additional state increase, we recover the initial tweakey at the end of each TBC call by rewinding the tweakey to its initial value. This is possible with a very small amount of extra circuitry thanks to the simple linear tweakey update. We use the same rho function for AD and message blocks. In fact, a plaintext-feedback mode would enable limited parallelizability, but it needs more control logic because we would need to implement two different rho functions, so we decided not to employ it. Finally, the tag is obtained from the same output slot as the ciphertext block, which contributes to reducing the multiplexers.

The rho function is a simple operation defined over bytes. It uses a byte matrix denoted by G, as shown in the upper right figure, and this function needs only a single-block state, since both the red and blue lines can be computed independently. Partial input can be handled by truncation and padding. The security condition for rho is the same as that for COFB, which we skip here. Note that G is applied to the output side, not the inner state side.
This simplifies the AD processing to plain XOR chaining. Our matrix G has a modular form suitable for various small data paths without needing many multiplexers, and it needs only a small number of XORs, which is good for both software and hardware.

The internal SKINNY tweakable block cipher was proposed at CRYPTO 2016. This picture shows its round function. It is one of the most popular TBCs, and there are a large number of third-party analysis papers. It is going to be an ISO standard. Even after all this cryptanalysis work, it has quite a large security margin. For the member of SKINNY we use, namely the one with a 128-bit block and a 384-bit tweakey, the best attack reaches only 28 rounds out of 56, with an impractical complexity much larger than 2 to the 128. For attack complexity of up to 2 to the 128, only 22 rounds are attacked. This implies that, even if we reduce it to 40 rounds, we still have about a 45% security margin. The performance of SKINNY is excellent on hardware and good on software.

This slide summarizes the features of Romulus-N. It has a small state size, which is the same as that needed to implement the TBC. And it is rate one. More precisely, it processes an n-bit message block per TBC call and n plus t bits of associated data per TBC call. Here, t equals n for Romulus-N1. Because the computation overhead is quite small, it is particularly good for short messages.

For security, it has n-bit security with an n-bit block TBC. The security proof is in the standard model, that is, the security of Romulus-N is reduced to the CPA security of the TBC, also called TPRP security. This is a conservative model, and it is desirable since we do not have to worry about the gap between the model and the instantiation. This is in contrast to the case of some sponge constructions, where permutations with a small number of rounds are used internally and non-ideal behaviors have been observed for these permutations.

The limitation of our scheme is that it has a serial operation for both encryption and decryption.
We think this is acceptable and reasonable for the applications of lightweight cryptography. If we need to process many messages on the server side, for example, parallel operation is still possible in general, and parallelizability is not much of a benefit to constrained devices.

This slide shows the security bounds of Romulus-N. As you can see here, the privacy notion is tightly reduced to the TPRP advantage of the TBC. The authenticity is reduced to the TPRP advantage plus, roughly, the number of decryption queries divided by the tag space size. These bounds are equal to those of the famous ThetaCB3, an idealized version of OCB3, up to constants. Also, assuming SKINNY is secure, the bounds are equal to those achieved by an ideal AE scheme, which has a privacy bound of zero and an authenticity bound of q_d over 2 to the tau, where q_d is the number of decryption queries and tau is the tag length. The point is that we have no degradation in terms of the number of encryption queries and the query lengths. This is a strong feature achievable by a TBC-based mode with a standard-model reduction.

To understand the power of full n-bit standard-model security, let me show some examples. If we encrypt 2 to the 50 bytes of data with 2 to the 50 decryption queries, the security bound of Romulus still guarantees that the success probability is at most epsilon plus 2 to the minus 75, for epsilon being the TPRP advantage of the TBC. In addition, suppose that there are 2 to the 20 users. Then it still guarantees that the success probability is at most epsilon prime plus 2 to the minus 55, for epsilon prime being the multi-user variant of TPRP security. This is obtained from the standard conversion to the multi-user setting, and it is much stronger than the common birthday-type bound that contains sigma squared over 2 to the n, where sigma is the total number of processed blocks. With such a bound, for example 1 the bound is epsilon plus 2 to the minus 32, and for example 2 the bound is further increased to epsilon prime plus 2 to the minus 12.
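As a rough sketch of the arithmetic behind these two examples, the following Python snippet works in log2 units. It assumes n = tau = 128, models the non-epsilon part of the authenticity bound as const * q_d / 2^tau with const = 5 (the constant is not stated in the talk; it is an assumption that happens to reproduce the quoted 2 to the minus 75 after rounding), and models the standard multi-user conversion as a multiplication by the number of users. The function names are illustrative only.

```python
from math import log2

TAG_BITS = 128  # assumed tag length tau (equal to the block size n here)


def log2_auth_term(log2_qd, tag_bits=TAG_BITS, const=5.0):
    """log2 of const * q_d / 2^tag_bits.

    const=5 is an assumption, not a figure from the talk.
    """
    return log2(const) + log2_qd - tag_bits


def log2_multi_user(log2_single_user, log2_users):
    """Generic single-user to multi-user conversion: multiply by #users."""
    return log2_single_user + log2_users


if __name__ == "__main__":
    # Example 1: 2^50 decryption queries, single user.
    print(f"auth term, single user: 2^{log2_auth_term(50):.1f}")  # about 2^-75.7

    # Example 2: 2^20 users, starting from the quoted single-user figures.
    print(f"Romulus-N, multi-user:  2^{log2_multi_user(-75, 20)}")
    print(f"birthday,  multi-user:  2^{log2_multi_user(-32, 20)}")
```

Starting from the quoted single-user figures, the conversion gives 2 to the minus 55 for the Romulus-style bound and 2 to the minus 12 for the birthday-type bound, matching the numbers on the slide.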
Let me move to the description of Romulus-M, our nonce-misuse-resistant version. It is based on the SIV mode by Rogaway and Shrimpton, as are many other nonce-misuse-resistant schemes. It shares almost all components with Romulus-N, so it is pretty easy to implement both. The security proof is in the standard model and uses the proof techniques introduced by Naito and Sugawara and the NaT MAC proposed by Cogliati et al. in 2017.

This slide shows the security bounds of Romulus-M. For nonce-respecting adversaries, the bounds are essentially equal to those of Romulus-N. For nonce-misusing adversaries, with at most r repetitions of a nonce in encryption, the privacy bound is r times the total number of queried blocks in encryption divided by 2 to the n, and the authenticity bound is roughly r times the total number of queries divided by 2 to the n. There is no degradation in input length except for nonce-misuse privacy, and it enjoys graceful degradation, which means that the security is gradually reduced with respect to the parameter r. That is, if the number of nonce repetitions is small, the security is almost n bits, and it maintains half of the n-bit security even in the worst case of using a fixed nonce.

Recently, we showed that the security of Romulus-M is preserved under the scenario of release of unverified plaintext, RUP. RUP is a kind of misuse that enables the adversary to access the unverified plaintext at decryption regardless of the verification result. It can happen when the decryption device does not have enough memory, for example, and this may break many common AE schemes such as OCB. There are two privacy notions under RUP, called plaintext awareness 1 and 2, or PA1 and PA2, and one authenticity notion called INT-RUP. We proved that Romulus-M is PA1 secure and INT-RUP secure. This is the same as the original SIV. Here PA2 is impossible to meet with SIV-style constructions.
The INT-RUP bounds are equal to the original authenticity bounds for both nonce-respecting and nonce-misusing adversaries.

This shows a brief comparison with other TBC-based schemes, assuming we use the same TBC. See the paper for more comprehensive tables. It shows that Romulus-N and Romulus-M have smaller state sizes than previous schemes with standard-model security. They also achieve the best encryption rate.

Now I move to the implementation aspects of Romulus. First, the ASIC performance. This slide shows the ranking taken from the public site that benchmarked eight second-round candidates. For details, please refer to the report. This table shows that Romulus is among the top performers for various measures. This shows the throughput and area of the submitted schemes, and this shows the energy and area of the submitted schemes. In both cases, Romulus presents a pretty good trade-off in these measures. We show some concrete figures of the top performers from this benchmark. As you can see, Subterranean is the best, but Romulus is also very competitive.

Let me describe some hardware implementation details. First, we utilize SKINNY's fully linear tweakey scheduling. As I mentioned, it reverses the tweakey to the initial value at the end of every TBC call, and this requires only 67 extra gates. If we were to maintain the tweakey state, at least 320 flip-flops would be additionally needed. The scheme is very lightweight, so it is also suitable for unrolled circuits. For example, the speed will be doubled by 2-round unrolling, which needs about 1,000 additional gate equivalents, and that increases the total area by only 20%.

For FPGAs, there is a comprehensive benchmark led by the people from GMU. From their report, we found that Romulus-N1 is the second smallest design on Artix-7 in terms of the number of slices, and seventh in terms of the throughput per cycle, out of 22 benchmarked candidates.
Also, Romulus-N1 is one of the very few designs to achieve competitive performance with less than 1,000 lookup tables.

For microcontrollers, there is also a comprehensive benchmark led by Renner, Pozzobon, and Mottok. From this benchmark, for 32-bit platforms, Romulus-N1 is ranked in the middle regarding the throughput. Interestingly, for 8-bit platforms, Romulus-N1 is in the top tier, and our new 40-round version improves performance by a factor of 1.4 and is among the top candidates, as shown on the right. We think the 8-bit platform is important because it is very constrained and AES is not very efficient on it.

In the rest of my talk, let me briefly describe Remus. The structure of Remus is close to Romulus, but the internal TBC is built on a block cipher, which is modeled as an ideal cipher in the security proofs. The conversion from an ideal cipher to a TBC is called ideal-cipher-based encryption, or ICE for short. This conversion is a variant of XHX proposed in 2017, but we optimized XHX to reduce the state size and the computation for counter incrementation. A block cipher with an n-bit block and an n-bit key is used to implement a TBC with an n-bit block, a 2n-bit tweak, and an n-bit key. We developed three versions of ICE, namely ICE1, ICE2, and ICE3, having different nonce-based mask generations. The masks are L and V, shown in the slides.

The security bounds of Remus are those of Romulus plus the bound of ICE. The latter is of the order of sigma squared over 2 to the c, where c equals n for ICE1 and ICE3, and c equals 2n for ICE2. The proofs are modular and clean. Namely, we first show that ICE is a secure TBC, and then show that the Remus mode is secure if the underlying TBC is secure.

To conclude, we proposed Romulus and Remus, and this talk focused on Romulus. Romulus is what we believe is the best we can do for lightweight, highly reliable AEAD with a TBC. It enjoys very strong provable security bounds in the standard model for both the nonce-respecting variant, Romulus-N, and the nonce-misuse-resistant variant, Romulus-M.
The security of Romulus is reduced to the high security of SKINNY. Moreover, we only need SKINNY's CPA security in the single-key setting. The mode of Romulus achieves rate 1 and a minimal state size as a tweakable-block-cipher-based AE with standard-model security. That's the end of my talk. Thanks for listening.