So, the next talk is about Rubato, a noisy cipher for approximate homomorphic encryption. This is joint work by Jincheol Ha, Seongkwang Kim, Wonseok Lee, Jooyoung Lee, and Mincheol Son, and the talk will be given by Seongkwang. — Good morning, everybody. Welcome to my talk. This is Seongkwang Kim, and this is joint work with Jincheol Ha, Wonseok Lee, Jooyoung Lee, and Mincheol Son from KAIST; I'm from Samsung SDS. Today, what I want to talk about is Rubato, a family of noisy ciphers for approximate homomorphic encryption. Let's begin with homomorphic encryption. Homomorphic encryption (HE) is an encryption scheme that enables addition and multiplication of encrypted data. Some might think of partially homomorphic encryption, but in this talk "HE" means exact or approximate homomorphic encryption that supports both addition and multiplication — for example, the well-known BFV scheme over modular rings and the CKKS scheme over the complex numbers. For this reason, HE can protect data even while it is being used; for example, homomorphic encryption is used for machine learning inference and for statistics on sensitive data while preserving privacy. However, current homomorphic encryption schemes have two demerits. The first one is slow encryption speed: current HE schemes use very large parameters for their RLWE samples, so encryption is very slow compared to conventional symmetric ciphers. The second one is large ciphertext expansion. Ciphertext expansion refers to how much a ciphertext is expanded relative to its plaintext: an HE ciphertext is ten times to a million times larger depending on the choice of parameters, which causes large memory and network bandwidth overhead. To resolve this problem, Lauter et al. proposed the transciphering framework, which is a conversion from a symmetric ciphertext to a homomorphic ciphertext. Let's suppose a client wants to delegate a computation to a server.
One might think the client could naively send the server a homomorphic ciphertext and the computation, but as I said earlier, that causes a large network bandwidth overhead. Using the transciphering framework, the client instead sends only a homomorphically encrypted key, encrypts all the messages with a symmetric cipher, and sends those to the server. Given the symmetric ciphertexts, homomorphically evaluating the decryption circuit yields homomorphic ciphertexts of the messages. So with the transciphering framework, the client can encrypt faster and gets smaller ciphertexts. For real numbers, at Asiacrypt 2021, Cho et al. proposed the RtF framework, a transciphering framework for approximate arithmetic. On the client side, the real messages are scaled and rounded so that they are converted into integers modulo t, and a keystream of a cipher is added. On the server side, BFV evaluation of the cipher and CKKS bootstrapping result in a CKKS ciphertext of the messages. In the transciphering framework there is a symmetric cipher, and the cipher is evaluated both in clear and under encryption, so the cipher should be efficiently evaluable by homomorphic encryption; we call such a cipher HE-friendly. In most hardware, AND gates and XOR gates need roughly the same resources. In homomorphic encryption schemes, however, multiplication is much more expensive than addition, so designing HE-friendly ciphers requires low multiplicative depth and complexity. HE-friendly ciphers are also domain-critical: once the domain of the cipher is fixed, the further computation after transciphering is done on that domain. At first, HE-friendly ciphers were proposed for binary use cases — LowMC, Kreyvium, FLIP, Rasta, and Dasta. After that, it became known that modular rings are more appropriate for integer arithmetic and for the batching technique of HE, so HE-friendly ciphers over modular rings were proposed, such as Masta, Pasta, and HERA.
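As an aside on the RtF client side just described — scale a real message, round it into Z_t, add a keystream word — here is a minimal sketch in Python. Everything is illustrative: the modulus, scaling factor, and function names are invented, and the "server" step is done in clear only to check correctness (in RtF it would be done homomorphically).

```python
import random

t = 2**16          # toy plaintext modulus (not a real RtF parameter)
scale = 2**8       # toy scaling factor for fixed-point encoding

def client_encrypt(messages, keystream):
    # Scale and round each real message into Z_t, then add the
    # symmetric keystream word modulo t.
    return [(round(m * scale) + k) % t for m, k in zip(messages, keystream)]

def server_decrypt(ciphertext, keystream):
    # Done in clear here purely to check correctness; the server would
    # subtract the keystream homomorphically, then rescale.
    def centered(x):  # map Z_t back to a signed representative
        return x - t if x >= t // 2 else x
    return [centered((c - k) % t) / scale for c, k in zip(ciphertext, keystream)]

random.seed(0)
msgs = [3.14159, -2.71828, 0.5]
ks = [random.randrange(t) for _ in msgs]
ct = client_encrypt(msgs, ks)
dec = server_decrypt(ct, ks)
# each recovered value is within the rounding error 1/(2*scale)
assert all(abs(m - d) <= 1 / (2 * scale) for m, d in zip(msgs, dec))
```

Messages that are exact multiples of 1/scale round-trip exactly; everything else is recovered up to the rounding error, which is the "approximate" part of the framework.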
Finally, HE-friendly ciphers for approximate homomorphic encryption have been proposed, and Rubato, which I'm talking about today, is in this class. One of the main questions is: is there any way to reduce the multiplicative depth drastically? There is an observation. For deterministic ciphers, there seems to be a critical line in multiplicative depth for every key size: in the case of FLIP, the key size is 1,394 bits and the multiplicative depth is 4, and in the case of Rasta, the key size is 351 bits and the multiplicative depth is 6. However, in the case of noisy encryption, we found that LWE encryption requires no non-scalar multiplication at all. Here is a table of HE-friendly ciphers and related metrics. In the columns you can see the plaintext modulus, the number of key words, the multiplicative depth, the number of multiplications per output word, and the number of random bits per output word; in the rows, the HE-friendly ciphers are sorted in chronological order, except LWE. In the blue row you can see that the multiplicative depth has never gone below 4 since FLIP was proposed, but LWE achieves multiplicative depth 0 and zero multiplications. That would be great for the transciphering framework — but in fact it is not, because it takes too long to evaluate in clear. So we have to modify it. So we got an idea: we tried to mix a stream cipher and LWE encryption together — that is, to design a stream cipher with Gaussian noise, so that the noise contributes security against algebraic attacks. For algebraic attacks there are two aspects. The first one is the Gröbner basis attack. Here GB(n, m, d) denotes the complexity of solving a system of m equations of degree d in n variables. When an error is added, the complexity of guessing the error is multiplied into the complexity of the Gröbner basis attack, which implies that for the same security the degree d can decrease to some d'. The second one is the Arora-Ge attack, an algebraic algorithm for solving LWE.
It works on dot products: because of the LWE structure, the dot product appears in its equations, but in the stream cipher case the dot product becomes a polynomial. So the whole system becomes higher degree, and we can use a smaller number of variables for the same security level. In fact, LWE decryption needs a rounding function, which is not easy to evaluate in exact homomorphic encryption. But in approximate homomorphic encryption, the LWE noise can be regarded as approximation error, and we don't need to round at all. So here we introduce a family of noisy ciphers, Rubato. Rubato is named after the musical tempo rubato, which means expressive, rhythmic freedom; by this name we mean that the parameters can be chosen freely for their purpose. Rubato is a word-wise stream cipher with Gaussian noise, as in this figure, and the stream cipher part is an SP-network with a randomized key schedule. For the round function, it uses a HERA-like linear layer and a Pasta-like S-box layer — you could see Pasta last Sunday at the FHE.org conference. Rubato is composed of a fixed constant input, a randomized key schedule, truncation, and Gaussian noise addition. Rubato supports three block sizes — small, medium, and large — where small is 16 words, medium is 36, and large is 64. When the block size is larger, the required number of rounds decreases, so a larger block size means larger throughput, and a smaller block implies smaller latency. We adopt HERA-like linear layers, composed of MixColumns and MixRows. In the figure, a v-by-v state X becomes a state Y: MixColumns multiplies a fixed MDS matrix with the state column-wise, and MixRows multiplies the same matrix row-wise. The MDS matrix is defined as you can see in the figure — but only the matrix for a block of 16 words is defined in HERA, so we had to find matrices for 36 and 64.
So we found small v-by-v MDS matrices by a brute-force search over circulant matrices. For the non-linear layer we adopt a Feistel network; it is used in Pasta, and it is a quadratic invertible function. A quadratic function gives the least multiplicative depth for a fixed algebraic degree. In HERA the cube function is used — a cubic function, to defend against an algebraic meet-in-the-middle attack — but in Rubato, truncation defends against the algebraic meet-in-the-middle attack. Finally, adding Gaussian noise generates the keystream of Rubato. Here is the round function of Rubato: you can see the Feistel layer, the linear layer, and the key addition. Here is a comparison of multiplication-related values between Rubato and other HE-friendly ciphers. You can see that Rubato achieves multiplicative depth 2 with a moderate block size and few random bits per output word; furthermore, Rubato achieves 2.1 multiplications per output word. Let me briefly introduce the security analysis of Rubato. For symmetric-key cryptanalysis, we guess the Gaussian noise and then apply the usual symmetric-key cryptanalysis. Linear and differential cryptanalysis cannot be combined with guess-and-determine attacks, so we compute the linear and differential probabilities without the Gaussian noise. For LWE cryptanalysis, we linearize all the monomials and apply lattice attacks or the BKW attack to the linearized lattice; for the SVP oracle we consider both sieving and enumeration. For Arora-Ge, linearization is not the best approach, so we replace the dot product by a polynomial — that is just the stream cipher part. We give the selected parameters in the table: security level, block size, truncation size (truncation size means how much is left, not truncated), the base-2 logarithms of the plaintext modulus and of αq (the noise standard deviation is σ = αq over the square root of 2π), and the number of rounds. We measured the complexities of the attacks and arranged them in a table.
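Putting the pieces of the round function together — quadratic Feistel S-box, circulant linear layer, randomized key addition, then truncation and Gaussian noise — here is a toy keystream generator in Python. Every constant is invented for illustration (the matrix is a small circulant stand-in, not checked to be MDS, and real Rubato derives its randomness from an XOF shared with the server, not a local RNG); the ordering of the layers is also only a sketch.

```python
import random

t = 65537            # toy plaintext modulus (prime)
BLK = 4              # toy block size (real Rubato uses 16/36/64 words)

# small circulant matrix as a stand-in for the HERA-like MDS matrices
M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]

def sbox(x):
    # Pasta-like quadratic Feistel layer: y_0 = x_0, y_i = x_i + x_{i-1}^2
    return [x[0]] + [(x[i] + x[i - 1] ** 2) % t for i in range(1, len(x))]

def linear(x):
    # one matrix multiplication stands in for MixColumns/MixRows
    return [sum(M[i][j] * x[j] for j in range(BLK)) % t for i in range(BLK)]

def keystream_block(key, rng, rounds=3, sigma=1.6):
    # randomized key addition: fresh masks would come from an XOF in Rubato
    state = [(k + rng.randrange(t)) % t for k in key]
    for _ in range(rounds):
        state = sbox(linear(state))
        state = [(s + k + rng.randrange(t)) % t for s, k in zip(state, key)]
    # truncate one word, then add rounded Gaussian noise to the rest
    return [(s + round(rng.gauss(0, sigma))) % t for s in state[:BLK - 1]]

rng = random.Random(42)
key = [rng.randrange(t) for _ in range(BLK)]
ks = keystream_block(key, rng)
assert len(ks) == BLK - 1 and all(0 <= w < t for w in ks)
```

The server, holding the homomorphically encrypted key and the same public randomness, can recompute this keystream under encryption; the Gaussian noise is what becomes the CKKS approximation error after transciphering.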
In the attack-complexity table, in log-2 scale, you can see, for example, that the guess-and-determine attack on the 80-bit small parameter costs 2 to the power of 393.6, and that the Gröbner basis attack and Arora-Ge are the dominant attacks on Rubato. One might ask where the security margin is — it is a symmetric cipher, after all — and we set the linear algebra constant ω to 2 as the security margin. Here is the performance. Performance is evaluated with AVX2 instructions for client-side encryption, and with the RtF framework implemented in the Lattigo library for server-side encryption. We chose SHAKE256 as the extendable output function, and we fixed the RLWE dimension, the number of slots, and the remaining level. You can see the ciphertext size, the ciphertext expansion ratio, client and server performance, and precision. A larger block size gives larger throughput and a smaller block size gives smaller latency — but not for the 80-bit parameters, because there the number of rounds does not decrease for larger block sizes; it's more complicated for those. Here is a performance comparison. We compare Rubato to HERA, to LWE-to-RLWE conversion, and to a CKKS-only environment. The LWE-to-RLWE conversion is from Pegasus, published at IEEE S&P 2021. For a fair comparison we tried to match the remaining level, the log of the number of slots, and the precision. But you can see that the LWE-to-RLWE conversion has low log-slots and low precision, because its implementation supports only those parameters, so we could not do better there. You can see that both throughputs of Rubato outperform the others. Here is the conclusion. We presented a family of noisy ciphers for approximate homomorphic encryption, a combination of a stream cipher and Gaussian noise. We gave a modular cryptanalysis for noisy ciphers, and we showed that noisy ciphers are efficient in approximate homomorphic encryption. There are a few further questions. Is there any other application of noisy ciphers?
So far we have only found the transciphering application for approximate homomorphic encryption, but there could be other applications, for MPC- or ZK-friendly ciphers. And the second one: is there any cryptanalysis that exploits both the stream cipher structure and the noise? We analyzed the linearized lattice problem, but there could be a more efficient algorithm for the linearized lattice problem, like there is for the ideal lattice problem. So thank you. Check out the full version at the link below. — Okay, same game again. Do we have a question for our speaker? Yes — not everyone was partying last night, then. — Hi, thank you for your talk. In one of your slides, you mentioned a circulant matrix. Could you clarify again why we need this circulant matrix to be MDS? — That's a good question, but in fact we don't: we don't have to choose a circulant matrix, but to keep the brute-force search small, we just restricted the search domain to circulant matrices. — Okay, thank you. — If no one has another question, I have one. Can you show us again the slide comparing the depths and the randomness of the various ciphers? Yes, I think it was this one. I'm a bit confused by the LWE parameters here. For example, you could choose one of the NIST finalists, and that would have a much smaller modulus and much fewer random bits than what is given here. Can you comment? — Sorry, can you...? — The LWE parameters given here look huge compared to standard learning-with-errors schemes such as the NIST candidates — Saber, or Kyber, or NTRU. — Right, right. But this is a transciphering application, so we need to match the modulus to the size of the messages we transport. — Is the modulus here the plaintext modulus or the ciphertext modulus? — It's the plaintext modulus. — I see. Okay, thank you. — And if there's no other question, let's give another thanks to our speaker.
And the final talk of this session is called Field Instruction Multiple Data. This is joint work by Khin Mi Mi Aung, Enhui Lim, Jun Jie Sim, Benjamin Hong Meng Tan, Huaxiong Wang, and Sze Ling Yeo, and the talk will be given by Jun Jie. — Thank you. Good morning, everyone. My name is Jun Jie, and today I'll be presenting our work on Field Instruction Multiple Data. I'll give a quick introduction to homomorphic encryption first. Homomorphic encryption, as we heard twice earlier this morning, is essentially a way to perform computations on encrypted data: we have a message, we perform some function on it, and when we decrypt, we get the plaintext as if we had applied the function to the plaintext message itself. There are applications in bioinformatics and in finance. One feature of homomorphic encryption is SIMD packing. This allows us to pack multiple data into a single ciphertext, so that applying a function on the ciphertext is equivalent to applying the same function on every individual message. There are also rotation and shift operations that allow us to do intra-vector operations, so we can shift the vectors around. This essentially improves the efficiency of homomorphic encryption schemes by reducing the number of ciphertexts needed in general computations. This is possible because of the encoding: the data is first encoded into the plaintext space R_t before we encrypt the message. The plaintext space can be decomposed into slots, as you can see here, by the Chinese Remainder Theorem. One interesting feature is that each slot is actually isomorphic to some finite extension field of degree d.
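The slot decomposition just described can be seen in miniature. Below is a toy Python example — not an HE library, and there is no encryption here at all: in the ring Z_5[x]/(x^2 + 1), the modulus x^2 + 1 factors as (x - 2)(x - 3) mod 5, so each polynomial carries two "slots" (its values at the two roots), and ring addition and multiplication act slot-wise.

```python
t = 5                       # toy plaintext prime, chosen so x^2 + 1 splits
ROOTS = [2, 3]              # roots of x^2 + 1 mod 5: the two "slots"

def encode(slots):
    # Lagrange interpolation: find a + b*x with value slots[i] at ROOTS[i]
    r0, r1 = ROOTS
    s0, s1 = slots
    b = (s0 - s1) * pow(r0 - r1, -1, t) % t
    a = (s0 - b * r0) % t
    return [a, b]           # coefficients (constant, linear)

def decode(poly):
    a, b = poly
    return [(a + b * r) % t for r in ROOTS]

def ring_add(p, q):
    return [(a + b) % t for a, b in zip(p, q)]

def ring_mul(p, q):
    # multiply in Z_t[x]/(x^2 + 1), using x^2 ≡ -1
    a, b = p
    c, d = q
    return [(a * c - b * d) % t, (a * d + b * c) % t]

u, v = [1, 2], [3, 4]
assert decode(ring_add(encode(u), encode(v))) == [4, 1]   # slot-wise sums mod 5
assert decode(ring_mul(encode(u), encode(v))) == [3, 3]   # slot-wise products mod 5
```

In real schemes the cyclotomic polynomial factors into higher-degree pieces, so each slot is an extension field of degree d rather than Z_t — which is exactly the structure the talk exploits next.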
Usually, when we instantiate homomorphic encryption schemes, we choose power-of-two cyclotomics, mainly because of the standardization effort and because of the fast negacyclic FFT algorithms that speed up the ring operations. When we try to maximize the number of slots in an HE ciphertext, it turns out we always have to use large primes; in this case, we have to use a prime of around 41,000 to get the maximum number of slots in a ciphertext. But if we consider using smaller primes like three or seven, we are left with only two slots and a very large extension degree. So in this work, we try to answer the question: can we use smaller primes and still encode almost as much data? To this end, we use something called RMFEs — reverse multiplication-friendly embeddings — which I'll explain shortly. Let me give an introduction to what RMFEs are about. You can think of an RMFE as a pseudo-homomorphism from a vector space to an extension field: we can embed some length-k vector into an extension field of degree w. Here we want w to be at most d, the degree of a slot in the homomorphic ciphertext. We build this map using so-called Riemann-Roch spaces; for those who are not function field theorists, you can think of them as special sets of polynomials characterized by some curve C. The homomorphism in this setting takes coordinate-wise addition and multiplication to addition and multiplication in the extension field. Here are the basic maps that come with RMFEs. First we define two maps on the Riemann-Roch spaces, called π and τ, and we compose them to get encode and decode in the right directions: encode goes from the vector space to the extension field, and decode goes from the extension field to the vector space. π and τ are linear maps.
So encode and decode are linear maps too, which makes them very easy to implement in HE. We also have another basic map called the recode function, which maps the extension field back to itself. Why is this needed? It is because of a subtlety in how we use the Riemann-Roch spaces. Function field theory tells us that if we have two polynomials f and g in L(G) — that's the Riemann-Roch space — their product lives in a higher-dimensional space, L(2G). That is why π is defined from L(G) and τ from L(2G): after we do the multiplication in the extension field, we can map it to an actual object in the L(2G) space. Hence, we can only support one multiplication, and after every multiplication we have to do a recode, which goes to the vector space and back to the extension field. If you are familiar with relinearization in HE, it is almost the same idea. In this work, we describe how to use RMFEs with homomorphic encryption; we call it Field Instruction Multiple Data, mainly because we are using extension fields. This is how we encode data with RMFEs and homomorphic encryption. We have a string of data — multiple data — and we first chop it up into sections of k pieces. Then we take each section of k, like the one in orange, put it into one RMFE encoding, and so on and so forth, so we get many finite field elements. Then we pack them all into one SIMD ciphertext and encrypt as per usual in HE. Decoding works in the other direction, trivially. As for the operations that come with SIMD: we have addition, which is a natural HE consequence; we have multiplication, where, as I mentioned earlier, we do a normal HE multiplication and then a recode, because we only support one multiplication with RMFEs; and we also have rotate and shift functionality.
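The one-multiplication property and the recode step can be illustrated with the simplest RMFE of all: polynomial interpolation (the projective-line construction). This is a toy in pure Python — a real RMFE embeds into a degree-d slot, whereas here we just keep raw polynomials and watch the degree bound: after one product the degree is below 2k - 1, still decodable, and recoding brings it back below k.

```python
p = 7                   # toy base field F_p
POINTS = [0, 1, 2]      # k = 3 evaluation points

def poly_eval(f, x):
    return sum(c * pow(x, i, p) for i, c in enumerate(f)) % p

def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def encode(vec):
    # Lagrange interpolation: the unique f with deg f < k and f(x_i) = vec[i]
    f = [0] * len(POINTS)
    for xi, yi in zip(POINTS, vec):
        li = [1]
        for xj in POINTS:
            if xj != xi:
                li = poly_mul(li, [(-xj) % p, 1])
        scale = yi * pow(poly_eval(li, xi), -1, p) % p
        f = [(a + scale * b) % p for a, b in zip(f, li)]
    return f

def decode(f):
    return [poly_eval(f, x) for x in POINTS]

u, v = [1, 2, 3], [4, 5, 6]
prod = poly_mul(encode(u), encode(v))      # degree < 2k - 1: still decodable
assert decode(prod) == [a * b % p for a, b in zip(u, v)]
# recode: go back to the vector space, re-encode at degree < k
recoded = encode(decode(prod))
assert len(recoded) == len(POINTS) and decode(recoded) == decode(prod)
```

Multiplying `prod` again without recoding would push the degree past what a fixed-degree slot can hold, which is exactly why a recode (like relinearization) follows every FIMD multiplication in the basic scheme.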
The rotate and shift operations, though, are a little more complicated: we first have to count the number of SIMD slots to rotate by — that's the "SIMD, k" in the notation — and after that we shift internally in the RMFE setting. The RMFE rotation is also a linear map, so we can compose them all together nicely. We have two extensions of how we apply RMFEs in our work: the first one is the r-fold RMFE, and the second is a three-stage recode process for composite RMFEs. I'll start with the r-fold RMFEs. Recall that earlier I mentioned we define τ from L(2G). In this extension, we instead define τ from L(2^r G). In some sense this allows us to support r multiplications before actually needing to do a recode. So we assign a tag to the ciphertext and say that when the tag reaches the value r, we do a recode. The point is that the extension fields used in homomorphic encryption are usually much larger than what the RMFE Riemann-Roch spaces need, so this lets us use the whole field extension, and it is more efficient this way. It also reduces the number of RMFE recodes, which are a costly process. An interesting feature is that there is interoperability between data that has been multiplied different numbers of times: if we have two pieces of data in, say, L(2G) and L(4G), they can still be multiplied together. Next, I'll move on to the three-stage recode for composite RMFEs. Composite RMFEs are just a way to build big RMFEs from smaller ones, instead of the direct construction I mentioned earlier. We have an inner one and an outer one: the inner one starts from the base field F_t, and the outer one starts from some intermediate extension field. We essentially stack the RMFEs together: we take k_in many data and encode them with the inner RMFE, then take k_out many of such encodings and feed them into the outer RMFE.
So it's a similar composition idea to what we had earlier. Why composition works is because we can decompose extension fields into towers of field extensions, as given there. This reduces the cost of the linear maps, which is why we are looking at it. For example, when we want to map F_3^k into an extension field of degree 2048, the direct encode map costs a matrix with 2048 columns; but using a composite RMFE with an intermediate field of degree 16, we instead use many matrices with 16 columns and one outer matrix over the intermediate field — so the composite maps are much cheaper. The three-stage recode process for composite RMFEs exists because we do not want to recode directly from the extension field all the way down to the inner base field. The first step is to perform a decode from the extension field to the intermediate field; then we do a recode over the intermediate field down to the base field; and then we encode back up to the extension field. So steps one and three are performed over the intermediate field, and the second step, the recode, is over the smallest base field. We optimize this further by extending the inner and outer RMFEs to r-fold RMFEs, which delays the inner recode: when we do step one, we can keep recoding at the intermediate field until it is necessary to do the inner recode, and then we go back. Now I'll share some experimental results that we have with RMFEs. For the RMFE parameters: earlier I mentioned that a Riemann-Roch space is characterized by a curve C, and we chose three different curves here — the projective line, an elliptic curve, and the Hermitian curve. Each curve gives us a different value of k.
You can see that the projective line gives us the smallest number of points that we can encode, while the Hermitian curve gives us the most. The Hermitian curve is also defined over the base field F_{t^2}, so when we apply it in HE, the slot degree is effectively halved. For the HE setup, we chose small primes like three or seven and set 80-bit security. What we do is prepare two ciphertexts: the first is a usual HE ciphertext, and the second is encoded with the FIMD encoding described earlier; then we do repeated squarings until the multiplication fails, and we record the time and the number of multiplications that still allow us to decode correctly. I then compared the amortized speed-up between FIMD ciphertext multiplication and HE ciphertext multiplication for some r-fold RMFE results. You can see that we can perform fewer FIMD multiplications than HE multiplications — around three to four, while HE has five. This gives us a sense of how much noise a FIMD multiplication consumes compared to an HE multiplication, as the two ciphertexts were initialized with the same noise budget. I would say this is mainly because of the recode process that is baked into the FIMD multiplications. Moreover, when I set r = 1, meaning I do a recode after every FIMD multiplication, versus r = 4, where I do one recode after four multiplications, you can see a time difference of about 2.5 seconds. And since we noticed that the HE ciphertexts only support up to four FIMD multiplications, we did another experiment where, after four FIMD multiplications, we skip the recode — that's the starred row you can see there — and we get a much better amortized speed-up in that setting.
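The tag-and-recode bookkeeping behind the r = 1 versus r = 4 comparison can be modeled with a toy cost count. All numbers here are invented — this only illustrates the scheduling logic (lazy recoding once the multiplication budget r is exhausted), not real timings or real Riemann-Roch degrees.

```python
RECODE_COST = 10      # pretend cost of one recode (invented)
MULT_COST = 1         # pretend cost of one multiplication (invented)

def square_chain(n_mults, r):
    # Repeated squaring with lazy recodes. A ciphertext carries a tag:
    # a product of elements with tags a, b lands in L(2^(max(a,b)+1) G),
    # so squaring bumps the tag by one; at tag == r we must recode.
    tag, cost, recodes = 0, 0, 0
    for _ in range(n_mults):
        if tag == r:              # budget exhausted: recode before multiplying
            cost += RECODE_COST
            recodes += 1
            tag = 0
        tag += 1                  # squaring: tag -> max(tag, tag) + 1
        cost += MULT_COST
    return cost, recodes

# with r = 4, eight squarings need a single recode ...
assert square_chain(8, 4) == (18, 1)
# ... while r = 1 (recode after every multiplication) needs seven
assert square_chain(8, 1) == (78, 7)
```

The same model also shows the interoperability point: since the tag of a product is max of the inputs plus one, operands that have been multiplied different numbers of times can still be combined, as long as the result stays within the budget r.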
Comparing the types of curve chosen — and consequently the number of points we can encode — it appears that curves that give more points yield a better amortized speed-up without increasing the value of r, compared to the smaller curves like the projective line and elliptic curves. Next, I'll move on to the composite RMFE results. Here is a comparison between the r-fold RMFEs and the composite RMFEs. The first thing you'll notice is that we do far fewer FIMD multiplications for composites than for r-fold, mainly because the costs of the linear maps in composites are much bigger than those in r-fold. However, the saving grace of composite RMFEs is that their packing efficiency is much better. I compute a value d over k — where d is the degree of the slot and k is the number of points you can pack into a slot; smaller is better — and you can see that composite RMFEs offer much better packing efficiency and, as a result, a better amortized speed-up time. This is a comparison between the three-stage recode and a direct recode for RMFEs. We didn't compute the direct recode map for the composite RMFE over the projective line with t = 7, because it was too large; we estimated it using the closest r-fold RMFE with a similar k value. By extrapolating the recode timing, we find that the three-stage recode is much faster, at 0.75 seconds per FIMD map, compared to the extrapolated direct recode timing of 1.2 seconds. We also looked at how to vary the value of r for the inner RMFEs: when we increase r from 1 to 2, we get a faster time for the FIMD multiplications. This is mainly because we delay the inner recode, as mentioned earlier, so we perform the recode only at the intermediate field until we have to go further down — hence it is much faster that way.
However, increasing the r value of the inner recode means we can encode a smaller total number of points than before. So in some sense there is some balancing needed between speed and the amount of data you wish to pack for composite RMFEs. To conclude: in this work we showed a way to use small primes with homomorphic encryption while packing almost the same amount of data on the whole. We have two RMFE extensions, the r-fold RMFE and the three-stage recode for composite RMFEs. There are some trade-offs when using FIMD: FIMD multiplications consume more noise, but we get a better amortized speed-up, and we also need to balance the running time against how much data we want to pack. With that, thank you for listening. — Thank you again. — Yes, thank you for the talk. Usually when we run bootstrapping, there is one step that is somewhat costly, where we have to homomorphically unpack the message; for BGV or BFV you would have to run some kind of homomorphic decoding, let's say. It seems that your unpacking would be harder to run. Do you have an idea about how your method would impact the current bootstrapping methods? — I assume you're referring to the thin bootstrapping for BGV? — We use thin bootstrapping when we are not packing several bits in each slot — so you would be packing, say, half of the number of bits that you could pack in each slot when you use your method.
Okay, I need to check that again, but essentially, because bootstrapping uses small primes — the more efficient version uses small primes — we are hoping that this will allow us to pack more coefficients in that kind of setting. I think the issue currently is that if you try to instantiate the homomorphic encryption scheme with small primes, you get very few slots, so when you do bootstrapping you need more ciphertexts; with our work, you can pack more data, so you don't need so many ciphertexts to do the bootstrapping. Does that answer your question? And also, we think we can actually get some kind of speed-up when we use these smaller primes, so we see this as a contribution there, yeah. — We have lots of time; feel free to ask more questions. — Thank you for the good talk; I have a short question: what kinds of use cases or applications are there for plaintexts over small primes such as F_3 or F_7, as in your presentation? — Okay, as in the earlier question, bootstrapping uses small primes for the plaintext, so that was one application we were looking at. There is also a series of works, cited in our paper, where people use small primes for Boolean-circuit-style homomorphic functions: when you use small primes in the BGV/BFV case, instead of designing the usual word-oriented functions, you can define Boolean circuits on that kind of ciphertext, so you can do things like equality and comparison circuits. We have a list of that literature in the paper, yeah. — Thank you for your talk. I want to ask: when the plaintext modulus is a power of two, if I remember correctly, SIMD packing does not work. Will your new packing technique apply in that case? — No, it will not work either.
Okay, so it's still limited to prime powers, yeah. — Yes, because it's based on the original SIMD factorization here, so this only works with primes — well, if you're using power-of-two cyclotomics, that is — if I'm not wrong, yeah. — Okay, thank you. — I have one question related to your title, not your talk: it is called Field Instruction Multiple Data, so I was hoping that maybe instead of just addition and multiplication we would get division for once. Apparently it's not that, but do you see a way of getting divisions directly in FHE, or does it seem completely out of bounds? — Gee, that's a good question. I don't think we can get division nicely in this setting, but I think it would be interesting to look at whether, when we do division of the field elements, we can get a corresponding division homomorphism back in the vector space. So yeah, I think that would be interesting to look at; I don't know whether it is possible or not. — Okay, thank you. Thank you. Okay, I guess that's it, so let's thank all our speakers and have a coffee break.