Hello, my name is Antonio Guimarães, I'm a PhD student at the University of Campinas, and I will present our work titled Revisiting the Functional Bootstrap in TFHE. So, a little bit of context. Current FHE schemes are capable of evaluating linear arithmetic circuits very efficiently. However, non-linear functions are still a challenge for them, especially at high precision. Some schemes, such as CKKS, implement non-linear functions using approximations, for example Taylor, Fourier, and Chebyshev series. They achieve relatively good performance because we can evaluate them in batches using SIMD computation. However, if we need to evaluate functions with high precision, the cost of these approximations grows super-linearly, sometimes exponentially, and it can become prohibitive. TFHE, on the other hand, implements circuits using binary logic gates, which can implement basically any function at any desired precision. However, complex functions require a very large number of logic gates, which hurts performance. In this context, there is a newer alternative approach called the functional bootstrap, which maintains basically the same versatility as binary logic gates while being capable of implementing much more complex operations, and thus presents better performance. In this paper, we revisit this technique and present new methods and optimizations for improving it, especially for high-precision functions. I will start this presentation with a very brief overview of TFHE and its functional bootstrap, and then I will present our contributions and results.

TFHE. TFHE is based on the Learning with Errors problem, and it is basically an instantiation of the problem using elements of the real torus, which is the set of real numbers modulo one; in our representation, these are the numbers in this interval. It has three types of ciphertexts, and in this work we will mostly use the first two of them, which I will detail in the next few slides.
TLWE samples. A TLWE sample is a pair (A, B), where A is a vector of n scalars in the torus, sampled from a uniform distribution; here we have an example with n equals five. B is calculated using this equation, where S is the secret key; TFHE uses binary secret keys, and here is an example. E is an error sampled from a Gaussian distribution with mean zero and standard deviation sigma. The sample we just generated is not encrypting anything, so we call it a sample of zero. To encrypt something, we take this fresh sample of zero and add the message to B. So, in this example, we are adding the message, 0.2, to B, and we have the pair. To decrypt, we first calculate the phase, which is the message plus the error, in this case 0.23. Since we want to work with exact computation, we need to round this phase to some discretized space; in this case, we round to multiples of 1/10, and we get back the original message, 0.2.

The second type of ciphertext in TFHE, the TRLWE sample, is basically the same thing I just explained, with one major difference: instead of scalars in the torus, each element is a polynomial with coefficients in the torus. So, for example, this is the message M, which is a polynomial with coefficients in the real torus. In this presentation, to avoid confusion, I will call TLWE samples scalar samples, or scalar ciphertexts, and TRLWE samples polynomial samples, or polynomial ciphertexts.

For arithmetic, additions and scalings are performed directly: we simply add each element of the pair, or multiply each element of the pair by an integer. TFHE does not support multiplications between these ciphertexts; it relies on external products with a third type of ciphertext. In any case, the important thing to note is that these arithmetic operations increase the error, and eventually it will affect significant bits of the message.
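The encryption and decryption just described can be sketched in a few lines of plain Python. This is a toy model with illustrative parameters only, not the real implementation:

```python
import random

# Toy TLWE over the real torus [0, 1); parameters are illustrative only.
n = 5          # LWE dimension (the talk's example uses n = 5)
sigma = 0.005  # standard deviation of the Gaussian error

def encrypt(m, s):
    """Sample of zero plus message: b = <a, s> + e + m  (mod 1)."""
    a = [random.random() for _ in range(n)]           # uniform torus vector
    e = random.gauss(0.0, sigma)                      # Gaussian error
    b = (sum(ai * si for ai, si in zip(a, s)) + e + m) % 1.0
    return a, b

def decrypt(a, b, s, base=10):
    """Phase = b - <a, s> = m + e (mod 1), rounded to multiples of 1/base."""
    phase = (b - sum(ai * si for ai, si in zip(a, s))) % 1.0
    return (round(phase * base) % base) / base

s = [random.randint(0, 1) for _ in range(n)]          # binary secret key
a, b = encrypt(0.2, s)
assert decrypt(a, b, s) == 0.2                        # noise rounded away
```

The rounding in `decrypt` is what makes the computation exact despite the noise, as long as the error stays below half the slot width.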
If we want to continue the computation, we need to perform a bootstrap procedure, which resets the error to some predefined amount.

Building blocks of TFHE. TFHE has three main building blocks that are necessary for the bootstrap we use in this work. The first one is the key switching. It has several uses, such as changing keys and parameters, but the main one we use here is the packing of scalar samples into polynomial samples. For example, here we have four scalar samples, each one encrypting Mi, and we can use the packing key switch to pack these four scalar samples into one polynomial sample. The sample extract does exactly the opposite: it receives a polynomial sample and extracts the coefficient of some monomial. In the example, we are extracting the monomial of degree zero. The third building block of TFHE is the blind rotate. It receives two inputs: ACC, which is a polynomial sample, and C, which is a scalar sample. It rotates ACC based on the phase of C. So, in this example, we have N equals four and the phase of C is 0.25, so it will rotate ACC by multiplying it by X to the power of two, rotating it two positions; here we have the result. The monomials that wrapped around are now negative, because this multiplication occurs modulo the 2N-th cyclotomic polynomial, so we have a negacyclic property.

With these building blocks, we can define the bootstrap. The original TFHE only works with Boolean values; it represents them as minus 1/4 or plus 1/4. The bootstrap receives two inputs: ACC, which is a polynomial sample, and C, which is the scalar sample to be bootstrapped. In this example, C represents a bit one, so its phase should be 1/4, but let's say that for some reason the phase is 1/8 and we want to correct this error. What the bootstrap does is rotate ACC based on the phase of C, using the blind rotate, and then extract the constant term of the rotated ACC.
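The negacyclic behavior of the blind rotate can be illustrated with a small plaintext sketch of polynomial multiplication by X^k modulo X^N + 1 (plain integer coefficients here, not ciphertexts):

```python
def rotate(coeffs, k):
    """Coefficients of p(X) * X^k mod X^N + 1 (negacyclic rotation)."""
    N = len(coeffs)
    k %= 2 * N                      # X^(2N) = 1 modulo X^N + 1
    out = [0] * N
    for i, c in enumerate(coeffs):
        j, sign = i + k, 1
        while j >= N:               # each wrap past X^N flips the sign
            j -= N
            sign = -sign
        out[j] = sign * c
    return out

p = [1, 2, 3, 4]                        # N = 4, as in the talk's example
assert rotate(p, 2) == [-3, -4, 1, 2]   # wrapped terms come back negated
assert rotate(p, 2 * 4) == p            # a full 2N rotation is the identity
```

Reading the constant coefficient of the rotated polynomial is exactly what the sample extract then does in the bootstrap.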
Basically, if the phase of C is between minus 0.5 and zero, this multiplication will be by X to a power between zero and N, so, thanks to the negacyclic property we just saw, the constant term becomes negative and we have minus 1/4. Otherwise, the rotation amount is bigger than N, so it rotates more than once around and the constant term becomes positive again. When extracting, we will have one of these two values, which are the expected values for Booleans in TFHE.

The functional bootstrap. The basic idea of the functional bootstrap is to evaluate a function within the bootstrap, instead of just resetting the error. In TFHE, the functional bootstrap evaluates lookup tables. Here we have an example: we have the input, or selector, in this example two; we have the lookup table itself, which here encodes the square function; and we have the output, four, which is the square of two. The functional bootstrap is pretty similar to the regular bootstrap, with two main differences. First, we are no longer encrypting Boolean bits; we are encrypting integers in some base, base four in this example. Second, ACC is no longer a fixed polynomial; it now encodes the lookup table that we want to evaluate. In this example, the lookup table has four slots, since we are working with base four, and each slot is mapped to a sequence of 256 monomials in ACC. The bootstrap algorithm is the same: we use the blind rotate to rotate ACC and the sample extract to extract the constant term. The procedure that is actually performing the lookup here is the blind rotate. However, it rotates ACC based on the phase of C, and this phase contains an error, which is scaled by 2N; furthermore, this value is rounded. So we have two sources of error in this rotation. In practice, let's say we are working with a large base and we want to rotate 100 positions in ACC. For these parameters, the blind rotate would actually rotate something between 70 and 130 positions.
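The redundant slot encoding that copes with this imprecision can be sketched in plaintext. Parameters are illustrative; the real ACC holds torus coefficients, and the negacyclic sign is ignored here because these example rotations stay within the first N coefficients:

```python
N, base = 1024, 4                    # polynomial size and LUT base
run = N // base                      # 256 coefficients per LUT slot

def encode_lut(table):
    """Replicate each LUT entry over a run of N//base coefficients."""
    return [table[i // run] for i in range(N)]

def lookup(acc, selector, noise=0):
    """Rotate by selector*run plus a half-run offset (plus rotation noise)."""
    shift = selector * run + run // 2 + noise   # offset centers the landing
    return acc[shift]                           # read the constant term

square = [0, 1, 4, 9]                  # the talk's square-function LUT
acc = encode_lut(square)
assert lookup(acc, 2) == 4             # exact rotation
assert lookup(acc, 2, noise=30) == 4   # a rotation off by +-30 positions
assert lookup(acc, 2, noise=-30) == 4  # still lands inside the right run
```

Because each slot spans 256 coefficients, a rotation that misses its target by a few dozen positions still reads the correct table entry.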
So this rotation is not precise at all. That's why we need to map each slot to a large sequence of monomials, and the lookup table needs to be relatively small compared to the size of the polynomial. We also need to add a precision offset to C, so that the blind rotate lands exactly in the middle of these sequences. Another technique we use in this work is the multi-value bootstrap. The idea here is that we can evaluate many lookup tables with the same selector at once, which of course greatly improves performance. However, it also increases the output error variance by s times (q minus 1) squared, where s is the input base and q is the output base. In this work, one of our contributions is showing how we can remove this square power from the equation.

So, the functional bootstrap is a great improvement compared to logic gates, but it still has some problems with high-precision functions. Taking a few examples from the literature, we can see that the sign function, which only requires one bit of precision, can be evaluated using a polynomial with 1024 coefficients: it takes just 13 milliseconds, and the error rate is negligible. A 6-bit to 6-bit lookup table, on the other hand, requires a polynomial 16 times bigger and takes around one second to be evaluated; the error rate is also no longer negligible. Basically, the problem here is that the execution time grows super-linearly with N, and N grows super-linearly with the desired precision. In this work, we introduce new methods to evaluate functions with high precision without increasing the parameters, so we can evaluate the 6-bit lookup table with a polynomial of 1024 coefficients, and we achieve a much better error rate and a much better execution time.

So, our contributions: we introduce two methods to combine multiple functional bootstraps, so that we can evaluate large lookup tables without increasing the parameters of the system.
We present optimizations to the core procedures of our methods, and we perform a complete error analysis, including experimental validation, of our two methods and optimizations. We also implement several common functions and compare our results with the previous literature.

The basic idea behind our two methods is to decompose the ciphertext into multiple digits and encode the function in several small lookup tables. There are two ways of combining them. In the chaining method, the output of a lookup is used to create the selector of the next. In the tree-based method, the output of a lookup is used to create the next lookup table. The chaining method is a generalization of the integer comparison algorithm presented by Bourse et al., and it can only evaluate functions following this definition. However, it presents a much better error variance than the other method. This definition doesn't help much when defining new functions, but we found a few families of functions that usually present very good results with this method: mostly, they follow test-like logic, such as integer comparison, sign, and parity, or carry-like logic, such as addition and multiplication. We show results for an addition implemented with this method.

The tree-based method, on the other hand, is capable of evaluating any function. In this example, we are evaluating an 8-bit input in base four. Instead of having one very big lookup table covering the whole interval, we have several small lookup tables; each square in this image is a different one. Then we evaluate all lookup tables in a level with the same selector: first C0, which is the least significant digit. We take the results and create new lookup tables, which we evaluate with the selector C1, the second least significant digit, and so on, until we reach the final result. At first sight, this method would require an exponential number of functional bootstraps.
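Ignoring encryption, the tree traversal just described can be sketched in plain Python. Note how the first level needs base^(digits-1) small tables, which is why the method looks exponential at first; the digit layout below is an illustrative assumption:

```python
base, digits = 4, 4                  # four base-4 digits cover 8 bits

def tree_eval(f, x):
    """Evaluate f(x) one base-4 digit at a time, least significant first."""
    ds = [(x >> (2 * i)) & 3 for i in range(digits)]
    # level 0: one small table per value of the remaining (higher) digits
    tables = [[f(p * base + d) for d in range(base)]
              for p in range(base ** (digits - 1))]
    for d in ds[:-1]:
        outs = [t[d] for t in tables]                 # select with digit d,
        tables = [outs[i * base:(i + 1) * base]       # then regroup outputs
                  for i in range(len(outs) // base)]  # into the next level
    return tables[0][ds[-1]]

f = lambda x: (x * x) % 256          # any 8-bit-to-8-bit function works
assert all(tree_eval(f, x) == f(x) for x in range(256))
```

Each loop iteration shrinks the number of tables by a factor of `base`, so after `digits - 1` selections a single small table remains.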
However, all lookup tables in a level are evaluated with the same selector, so we can use the multi-value bootstrap to evaluate all of them at once, with a single bootstrap. Then we have a linear number of bootstraps for any arbitrary function. This method also enables optimizations based on specific properties of the function. The sigmoid, for example, has two intervals with almost constant values, and it also has one interval where it is almost a linear function. We can replace all of them in the tree, reducing the number of lookup tables that we have to evaluate.

This is the algorithm for this method, and here we see its two main building blocks: the multi-value bootstrap and the packing key switch. We present optimizations for both. First, we present a specialized version of the packing key switch, and then we introduce a multi-value extract procedure, which allows ciphertext scaling with linear error growth, and therefore improves the output error variance of the multi-value bootstrap.

The basic idea behind the key switching is to homomorphically calculate the phase. For packing scalar samples into a polynomial sample, this is what we need to calculate: f is the packing function, and KS is the key switching key, which is basically an encryption of the secret key itself. The most expensive part here is this multiplication, between KS and the polynomial generated by the packing function. This method is capable of packing up to N samples. However, we just want to pack B samples, where B is our base, so this polynomial will have sequences of N over B repeated monomials. Instead of having these repetitions in the polynomial, we can pre-calculate them in the key switching key. In this way, the key switching key becomes B times bigger, but the multiplication becomes N over B times faster and generates N over B times fewer errors. In our parameters, N over B is 256, which is a very significant gain.

The second building block optimization is the multi-value extract.
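The linear-versus-quadratic error growth that the multi-value extract exploits can be checked numerically before looking at the variance equation. This is a toy simulation with assumed noise parameters, not the actual ciphertext noise:

```python
import random

random.seed(0)
k, trials, sigma = 16, 20000, 1.0    # scaling factor, samples, noise std-dev

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# scaling one noisy sample by k: variance grows by k^2 (quadratic growth)
scaled = [k * random.gauss(0, sigma) for _ in range(trials)]
# adding k independent noisy samples: variance grows only by k (linear)
summed = [sum(random.gauss(0, sigma) for _ in range(k))
          for _ in range(trials)]
assert var(scaled) > 10 * var(summed)    # ~k^2*sigma^2 vs ~k*sigma^2
```

The sum has the same expected value as the scaled sample, but its noise variance is k times smaller, which is exactly the gain of replacing a scaling by a sum of independent extractions.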
This equation defines the error-growth variance of a linear combination of samples: Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab ρ σX σY, where ρ is the correlation between the variables. When adding independent variables, we have linear error growth; when scaling a variable, that is, multiplying it by an integer, ρ is one, and the error growth is quadratic. The idea here is simple: can we implement scalings as sequences of additions of independent scalar samples? We can always implement a scaling as a sequence of additions, but how could we obtain independent samples? Well, if we remember our lookup table encoding, each element of the table is mapped to a sequence of 256 monomials, and, according to the independence heuristic, they should be, and should remain, independent after the bootstrap. So, at first, what we could do is simply extract multiple samples from the polynomial. However, we noticed that, although the independence heuristic holds for the Gaussian error from the encryption, the errors introduced by the approximate gadget decomposition and by the FFT are not independent. We needed to increase the precision of the gadget decomposition from 20 to 25 bits so that we could achieve the result we were looking for. Here we have the multi-value extract with linear error growth, while the direct scaling still has quadratic error growth. So, in general, the multi-value extract allows evaluating any ciphertext scaling with linear error growth, provided that we have executed a bootstrap recently, which is almost always the case in TFHE. In the multi-value bootstrap of Carpov et al., this square power comes from a ciphertext scaling, so we can just apply the multi-value extract here and reduce the error-growth variance from quadratic to linear, removing the square power.

Finally, our results. We implemented several functions to compare with the previous literature.
We achieved gains of up to 3.2 times compared to works that were already using the functional bootstrap, and of up to almost 9 times compared to works using logic gates. I will highlight a few of these results. This is the 6-bit to 6-bit lookup table. The difference between our method and the implementation of Carpov et al. is that they use just one functional bootstrap with very large parameters, while we use several bootstraps with small parameters. We have gains in basically all aspects: the keys are smaller, the error rate is also much smaller, and in execution time we got gains of up to 2.5 times.

Next, the 32-bit integer comparison. We compare our work with the one of Bourse et al., which was the basis for our chaining method and uses the same number of functional bootstraps. We were still able to get a 3.2 times speedup over it, as a result of our improved error analysis, which allowed us to use a much better parameter set. We also compare with the work of Zhou et al., which uses logic gates. We consider the error rate for logic gates to be negligible because it exceeds the precision of our estimations, which was 500 bits; however, this value here is also negligible compared to the security level.

Eight-bit addition. We implemented this algorithm using the chaining method, and we compared it with works using logic gates. The algorithm is linear in the precision, so we have an execution time of approximately 10 milliseconds per bit for integers of any size; for example, we could add integers of 100 bits in just one second. Our gains here reach almost 9 times, and these two values are also negligible compared to the security level.

To briefly summarize our work: we presented two methods for combining multiple functional bootstraps, and we achieved gains of almost 9 times over the previous literature. Our methods also enable the possibility of efficiently implementing functions with even higher precision.
We presented building block optimizations, which were made for our methods but can also be used in other contexts. We showed a specialized packing key switch, which for our parameters is 256 times faster and presents less error than the generic technique. We also presented a multi-value extract procedure, which enables scaling with linear error growth, and we presented a complete error analysis, including experimental validation. In general, the functional bootstrap is still a new technique, but it looks very promising, and there is a lot of room for improvement in many of the techniques that we presented here. Thank you all for watching this video.