Hi everyone, I'm Siang Meng, and I will be presenting Pyjamask, a block cipher and authenticated encryption scheme with highly efficient masked implementations. This is joint work with Dahmun, Jeremy, Stefan, Thomas, Matthieu and Yu. First I will give some preliminaries, followed by the specification of Pyjamask. After that, Dahmun will continue with the implementation.

So first, the preliminaries. Authenticated encryption with associated data, or AEAD for short, is a symmetric cryptographic algorithm that provides confidentiality, integrity and authenticity. The typical inputs are a key that is shared by both parties, a nonce, associated data that requires authentication but not encryption, and a message that requires both. From these, AEAD encryption outputs a ciphertext and a tag. For decryption, you take the same nonce, the associated data, the ciphertext and the tag, and it performs the verification and the decryption. If the verification fails, it does not give any output, so it is a null output; if the verification succeeds, it outputs the decrypted message.

In August 2018, NIST published a call for algorithms to initiate a lightweight cryptography standardisation process. It is similar to past standardisation processes like AES or SHA-3. This time they want algorithms with AEAD functionality. So the main requirement is AEAD functionality, optionally with hashing functionality. The call for algorithms also mentions that the ability to provide side-channel resistance easily and at low cost is highly desired. In April 2019, 56 submissions were selected as round 1 candidates, and after a few months of analysis this was reduced to 32 submissions in round 2. Pyjamask is one of the round 2 candidates.
Pyjamask is designed to enjoy fast bitsliced implementations in the presence of high-order masking. Specifically, we favour a minimal number of AND operations for efficient implementation on 32-bit platforms, and a degree of parallelisation that addresses 64-bit platforms or processors with vector instructions. We also designed it to have reasonable performance for unmasked and masked hardware implementations. The design relies on the well-studied SPN structure, which involves an S-box layer, a diffusion layer and a key addition.

Now I'll talk about Pyjamask itself. There are two members in the Pyjamask family. The first, the primary member, is Pyjamask-128-AEAD, and the second is Pyjamask-96-AEAD. Both use the well-established OCB mode as the AEAD scheme. OCB is in the CAESAR final portfolio for high-performance applications because it is highly parallelisable. The underlying block ciphers are our Pyjamask-128 and Pyjamask-96. Pyjamask-128 is a 128-bit block cipher, while Pyjamask-96 is a 96-bit block cipher, and both have a 128-bit key. The input nonce for Pyjamask-128-AEAD is 96 bits and for the other it is 64 bits, and the output tag lengths are 128 and 96 bits respectively.

First I will talk about the OCB mode, which is highly parallelisable. As you can see here, to authenticate the associated data you partition it into blocks, XOR each block with an offset value that is derived from the secret key, and encrypt it. Then you XOR everything together to get the authentication value. For the encryption you do something similar: we partition the message into blocks and XOR an offset value before and after the encryption, and this offset value is derived from both the secret key and the given nonce.
So this is different from the offset value used for the associated data. Finally, for the tag generation, we take the XOR sum of all the message blocks, XOR it with another offset value and perform the encryption. The authentication value generated from processing the associated data is then XORed here to give the final tag value.

In the original OCB, the nonce-dependent offset value is computed as follows. I will not go through it line by line, but essentially, in the specification there is this 8-bit shift over here. This number is not selected arbitrarily: it is chosen such that the last two steps here act as a strongly XOR-universal hash function. This is a very small detail that we have to handle when we want to convert OCB to use a 96-bit block cipher. We did a quick check and realised that when we use this exact same parameter for the 96-bit variant, it does not form a strongly XOR-universal hash function. So one of our tweaks to OCB to handle the 96-bit version is to perform an analysis similar to that of the OCB designers, and we selected our shift parameter to be 9. There are several other candidates as well, but we chose 9 because it is marginally better on some platforms, especially some 8-bit microcontrollers, where a shift close to a multiple of 8 is preferred because it is less costly than an arbitrary shift. So for the OCB mode, the 96-bit variant has this small tweak to achieve similar security.

For the main block cipher, both Pyjamask-128 and Pyjamask-96 have 14 rounds, and each round function consists of AddRoundKey, SubBytes and MixRows, which is the typical SPN structure. The key schedule is also quite similar: it has MixColumns, MixAndRotateRows and AddConstant. So here you can see the block cipher operation is very simple.
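
To make the role of the shift parameter in the nonce-dependent offset concrete, here is a minimal Python sketch of the initial offset as it is defined for 128-bit blocks in OCB3 (RFC 7253). The function name `initial_offset` and the encoding of bit strings as integers are assumptions made for illustration. The block cipher call that produces `ktop` is left out, and the 96-bit variant with shift 9 is not reproduced because the talk does not give its exact bit widths.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def initial_offset(ktop: int, bottom: int, shift: int = 8) -> int:
    """Initial nonce-dependent offset for a 128-bit block, after RFC 7253.

    ktop   : 128-bit integer, encryption of the formatted nonce block
             with its last 6 bits cleared (the cipher call is not shown)
    bottom : the last 6 bits of the formatted nonce block (0..63)
    shift  : the shift parameter from the talk (8 in the original OCB)
    """
    if not (0 <= ktop <= MASK128 and 0 <= bottom < 64 and 0 < shift <= 64):
        raise ValueError("argument out of range")
    left = ktop >> 64                          # Ktop[1..64]
    right = (ktop >> (64 - shift)) & MASK64    # Ktop[1+shift..64+shift]
    stretch = (ktop << 64) | (left ^ right)    # 192-bit Stretch
    # Offset_0 = Stretch[1+bottom..128+bottom]
    return (stretch >> (64 - bottom)) & MASK128
```

With `bottom = 0` the offset is simply `ktop`; other values of `bottom` slide a 128-bit window over the stretched value, which is where the choice of shift matters.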
First, the round function. This is an illustration of the round function: we have the bitwise key addition, followed by SubBytes, which is done column-wise, and then MixRows, which is performed row-wise. For the 128-bit variant, we arrange the data as a 4x32 array, so we apply 32 S-boxes in parallel and apply MixRows to the 4 rows. The 96-bit variant is very similar, but we omit one row, so it is 3x32, and we use a 3-bit S-box in that case.

The key schedule is the same for both variants. We have MixColumns, which is done vertically. For MixAndRotateRows, only the first row goes through a diffusion layer, and the other 3 rows are rotated. Finally, we have AddConstant to break the symmetry in the key schedule.

Now I will talk a bit about the properties and the design rationale. For the S-boxes in the round function, we have the 4-bit and the 3-bit S-box for the two different variants. Both have optimal differential and linear properties, and they also help differential or linear trails propagate to other rows. So they are not selected arbitrarily; there are some special criteria that we impose on the S-boxes. The constructions of the 4-bit and 3-bit S-boxes use only 4 and 3 AND operations respectively, so masking is more efficient because there are fewer non-linear gates to mask. The diffusion matrices are very simple 32x32 binary matrices; they are circulant matrices, for compact representation, and they are sparse with a high branch number of 12.

The key schedule update is fully linear, so it is more efficient to mask. MixColumns is a 4x4 binary matrix with branch number 4, namely the matrix with all ones and zeros on the diagonal.
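
As a small illustration of the MixColumns matrix just described (all ones, with zeros on the diagonal), here is a Python sketch that applies it to a key state held as four 32-bit row words. The row layout and the name `mix_columns` are assumptions for illustration and are not taken from the reference code.

```python
def mix_columns(rows):
    """Apply the 4x4 binary matrix (ones everywhere, zeros on the diagonal)
    to every column of a state stored as four 32-bit row words."""
    if len(rows) != 4:
        raise ValueError("expected four rows")
    total = rows[0] ^ rows[1] ^ rows[2] ^ rows[3]
    # Output row i is the XOR of the three other rows, i.e. total ^ rows[i].
    return [total ^ r for r in rows]
```

A single active bit in one row becomes an active bit in each of the three other rows, which is consistent with branch number 4, and applying the function twice returns the original state.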
For MixAndRotateRows, the rotation constants are selected to optimise the diffusion, and also to be close to a multiple of 8, for the same reason that we chose 9 for the 96-bit OCB variant.

Now I'll quickly talk about the security analysis. The OCB mode itself is already proven secure, so we don't really need to analyse much there. For the Pyjamask block cipher, we show that it is resistant against differential and linear cryptanalysis, and also invariant subspace cryptanalysis. Pyjamask has actually attracted some attention, and there is a third-party analysis, which is also published in the same venue. They applied higher-order differential cryptanalysis to Pyjamask-96, using a full-codebook attack to perform key recovery. However, due to the data limit on Pyjamask when we consider the OCB mode, their attack only applies to a reduced-round variant and does not threaten Pyjamask-96-AEAD. Next, I'll pass to Dahmun to talk about the implementation.

I'm Dahmun, and now I will explain our implementation strategy and how we produced an implementation for a target device, namely the ARM Cortex-M4. First of all, the implementation strategy that we decided to use for Pyjamask is bitslicing. It was initially proposed to perform several parallel evaluations of a Boolean circuit in software, in order to make DES evaluations faster. Basically, in this strategy you look at the function you want to evaluate as a Boolean circuit, composed of logical operations such as XORs and NOTs working bitwise. You then transform this circuit into an implementation that works on registers of a certain size, namely the architecture size; for the Cortex-M4, that is 32 bits. You evaluate the circuit using the equivalent CPU instructions, so for XORs you use the CPU XOR.
One of the main advantages is that, since the Boolean circuit works on single-bit inputs, you get a high level of parallelisation, because each of these instructions evaluates 32 bits in parallel. So it is a very efficient strategy. More precisely, work from the last couple of years showed that it is a very sound strategy for higher-order masking, because it allows you to produce some of the fastest implementations.

So what does this mean for Pyjamask? How do we implement the two main building blocks, the S-boxes and the diffusion matrix? For the S-boxes, we decided to find the best Boolean circuits to evaluate them. For the two S-boxes, for Pyjamask-96 and Pyjamask-128, you can see that you have a Boolean circuit composed of XORs, ANDs and NOTs that evaluates the S-box very efficiently. This is straightforward to implement: it is just XORs and ANDs between registers, and it allows us to evaluate 32 S-boxes in parallel since we are working bitwise.

For the diffusion matrix, because we chose a circulant matrix, it is going to be very efficient too. For each register containing a row of the state, we compute the product between the matrix and this register: each bit of the register is multiplied by the corresponding column of the matrix, and since the matrix is circulant, we just need to rotate the column for the corresponding bit in order to get the correct output. So for each of the 32 bits in the register, we extract the bit that we are evaluating and derive a mask from it that tells us whether we need to add the column to the accumulator or not. If the current bit is zero, the mask is zero; otherwise, it is a register filled with ones. Then we update the accumulator accordingly by XORing it with the mask ANDed with the column of the matrix.
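
The loop just described can be sketched in Python as follows. This is only an illustration under stated assumptions: `col0` is a hypothetical first column of a 32x32 circulant binary matrix, packed with the most significant bit as row 0, and the names `rotr32` and `circulant_mul` are made up for this example. Python integers are unbounded, so the 32-bit register width is emulated with an explicit mask.

```python
MASK32 = 0xFFFFFFFF


def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n positions."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def circulant_mul(col0: int, x: int) -> int:
    """Multiply a circulant binary matrix (given by its first column)
    by the 32-bit vector x, using the mask-and-accumulate loop."""
    acc = 0
    col = col0
    for j in range(32):
        bit = (x >> (31 - j)) & 1      # extract bit j, most significant first
        mask = (-bit) & MASK32         # 0 if bit is 0, all ones if bit is 1
        acc ^= mask & col              # conditionally add column j
        col = rotr32(col, 1)           # next column of a circulant matrix
    return acc
```

For instance, with `col0 = 0x80000000` the matrix is the identity, so `circulant_mul(0x80000000, x)` returns `x` unchanged. The loop contains no branch on the data, which is the point of deriving a mask from the bit instead of testing it.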
This is very efficient in both C and assembly. In C, computing this mask is almost free: you just subtract the value of the bit from zero, which gives you either zero or an integer filled with ones, since you work with 32-bit integers. On ARM, you have a barrel shifter that allows you to shift an operand of an instruction for free, so you can use the arithmetic shift right. What it does is take the leftmost bit, the sign bit, and spread it through your register. So you can do this accumulator update for free using the barrel shifter. This is very convenient and allows you to produce a very efficient implementation.

That is how to implement the building blocks for the unmasked version of Pyjamask. Now we will look at how to mask Pyjamask, since the main goal is a block cipher with very efficient higher-order masking. For masking, you split each sensitive value into d values such that their sum is equal to the original one. Here we split the state into d shares, where d is the masking order, such that their XOR is equal to the initial state.

The linear parts are straightforward to implement: you apply each linear operation to each share of the state separately. So for MixRows, we apply MixRows to each of the d shares, and the same goes for AddRoundKey and the key schedule. This only implies a linear overhead in the cost of higher-order masking. The critical parts are usually the non-linear parts, which imply a blow-up that is at least quadratic. To implement these, we replace the AND instructions by ISW multiplications.
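
The share-wise treatment of linear operations can be checked with a short Python sketch. The helper names `share`, `unshare` and `masked_linear` are hypothetical, and the stand-in linear map (an XOR of rotations) is chosen only because it is linear over XOR; it is not Pyjamask's MixRows.

```python
import secrets

MASK32 = 0xFFFFFFFF


def share(value: int, d: int) -> list:
    """Split a 32-bit value into d Boolean shares whose XOR is the value."""
    shares = [secrets.randbits(32) for _ in range(d - 1)]
    last = value & MASK32
    for s in shares:
        last ^= s
    return shares + [last]


def unshare(shares: list) -> int:
    """Recombine shares by XORing them together."""
    acc = 0
    for s in shares:
        acc ^= s
    return acc


def rotl32(x: int, n: int) -> int:
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def linear_map(x: int) -> int:
    """Stand-in linear layer (assumption): XOR of three rotations."""
    return x ^ rotl32(x, 3) ^ rotl32(x, 11)


def masked_linear(shares: list) -> list:
    """A linear map is applied to each share independently."""
    return [linear_map(s) for s in shares]
```

Recombining `masked_linear(share(v, d))` gives `linear_map(v)` for any `d`, at a cost of `d` applications of the map, which is the linear overhead mentioned above.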
The ISW multiplication is a multiplication proposed by Ishai, Sahai and Wagner, and it is one of the most used multiplications in practical masked implementations. Here we apply a small tweak where we also accumulate the result into an initial register. This is basically the same algorithm as ISW, but instead of directly storing the output into C, the output register, we XOR it into the previous value of C. You can easily verify that the resulting output is C XORed with the product of A and B.

Why did we decide to use such a tweak to ISW? If you remember the S-box circuits, at steps 3, 4 and 5 for both versions you can see that whenever there is a non-linear operation, its result is XORed into a previous value. In assembly it is very costly to always load and store data. So instead of computing the multiplication and then loading the result and the output register that we want to XOR it into, we do it directly in place, so that we avoid such loads. This gives us a multiplication algorithm for higher-order masking that still has the desired properties but fits our Boolean circuits for the S-boxes.

In terms of performance, we implemented it on an ARM Cortex-M4, which as I said was our target architecture, and we compared it to an implementation of AES that Matthieu and I proposed at Eurocrypt 2017 and that we ported to that architecture.
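
The accumulating variant of the ISW multiplication described above can be sketched in Python as follows. This is a functional illustration only, not the authors' assembly routine: the function names are hypothetical, shares are 32-bit words so that each call performs 32 bitsliced ANDs at once, and Python's `secrets` module stands in for the device RNG. It demonstrates correctness of the recombined value, not side-channel security.

```python
import secrets

MASK32 = 0xFFFFFFFF


def share(value: int, d: int) -> list:
    """Split a 32-bit value into d Boolean shares whose XOR is the value."""
    shares = [secrets.randbits(32) for _ in range(d - 1)]
    last = value & MASK32
    for s in shares:
        last ^= s
    return shares + [last]


def unshare(shares: list) -> int:
    acc = 0
    for s in shares:
        acc ^= s
    return acc


def isw_and_accumulate(c: list, a: list, b: list) -> None:
    """In place: the value shared in c becomes c XOR (a AND b).

    Same cross-term structure as the ISW multiplication, except that each
    partial result is XORed into the existing share of c instead of
    overwriting it.
    """
    d = len(a)
    for i in range(d):
        c[i] ^= a[i] & b[i]
        for j in range(i + 1, d):
            r = secrets.randbits(32)                 # fresh randomness per pair
            c[i] ^= r
            c[j] ^= (r ^ (a[i] & b[j])) ^ (a[j] & b[i])
```

Each pair of shares consumes one fresh random word, so the number of random words and cross terms grows quadratically with the number of shares, which matches the quadratic blow-up mentioned for the non-linear parts.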
For each implementation we propose two versions, depending on how the randomness is produced; we call them the pooling strategy and the fast strategy. In the first strategy, the RNG routine checks the availability of fresh randomness before reading the RNG output register. This takes a few clock cycles for testing, possibly waiting up to 65, so this is the slow version for randomness. In the fast mode, the RNG routine simply reads the output register without checking whether fresh randomness is ready, and in this version we assume that a very fast RNG is available.

We can see that Pyjamask indeed performs very well compared to AES for higher-order masking. For small masking orders the speed gain is close to a factor of 2, and the bigger the masking order, the bigger the gain: we gain up to a factor of 4 for d = 128 in the fast mode between the two implementations. This shows that Pyjamask is indeed very efficient with masking. Both RAM and code size are also quite low, very lightweight compared to AES.

Recently, further work was done by Matthieu and other authors, where they propose a tool that automatically generates sound and secure bitsliced masked implementations. They apply it to several of the lightweight candidates in order to have a fair comparison of their efficiency, and also to produce secure implementations using formal methods. I invite you to check their work, which was published at Eurocrypt 2020. In this work we can see that Pyjamask also performs very well with higher-order masking compared to the other candidates.

Thank you for your attention.