We have three talks for you. The first one is called "Faster Packed Homomorphic Operations and Efficient Circuit Bootstrapping for TFHE". It is a paper by Ilaria Chillotti, Nicolas Gama, Mariya Georgieva, and Malika Izabachène, and the talk will be given by Ilaria and Mariya. — Thanks for the introduction, and thank you everyone for being here. Today we are going to present improvements to TFHE, the scheme we presented last year. It is joint work, as was said, with Nicolas Gama, Mariya Georgieva, and Malika Izabachène. TFHE is a homomorphic encryption scheme, which means it is able to perform computations on encrypted messages without ever decrypting them. In this presentation we distinguish two main kinds of constructions: leveled ones and bootstrapped ones. In a leveled construction, we fix the function to be evaluated at the beginning, and then we search for parameters that allow us to homomorphically evaluate this function. With a bootstrapped construction, instead, we fix the parameters at the beginning, and with those parameters we are able to evaluate any function. Of course, these two constructions have pros and cons. Leveled constructions perform fast evaluations, but the depth of the function has to be bounded in advance, so they are not practical when, for instance, we want to evaluate operations on dynamic datasets. Bootstrapped constructions have no depth limitation, but the operations are very slow, because we have to evaluate bootstrappings very often, which is costly in terms of time and memory. I will present the improvements to the constructions in the first part of this presentation, and then I will leave Mariya to continue. Generally, when we think about homomorphic schemes, we think about a single message space in which we can perform additions and multiplications, but our scheme is actually richer than that.
Our scheme, TFHE, is a GSW-based homomorphic scheme in which we use three kinds of ciphertexts. I am going to describe the operations on the plaintext spaces, but of course everything is done on the ciphertexts by the homomorphic scheme. The first kind is LWE ciphertexts, on which we encrypt torus messages: the torus here is the real torus, the reals modulo 1, and on it we can perform additions, or more generally linear combinations with integer coefficients. Then we have RingLWE ciphertexts, with torus polynomial messages of degree up to N, on which we again perform linear combinations. And then we have RingGSW ciphertexts, on which we encrypt integer polynomials, and here we can perform both linear combinations and an internal product. Thanks to the fact that the torus is a module over the integers, we can also define an external product; it was defined last year, between RingGSW and RingLWE ciphertexts. There is, however, no product between the torus and itself. We can go from the torus message space to the torus polynomial message space via key switching. The key switching here is not the traditional one: it allows us to change the key, but also to evaluate a morphism, which can be private or public depending on the application. We can come back from the torus polynomials to the torus via an extraction. And then we define a new bootstrapping technique, called circuit bootstrapping, to go from the torus message space to the integer polynomial message space; we can also use key switching to go between the integer polynomial message space and the integer space. So, the first improvement we made to TFHE is a new packing technique. It is known that a RingLWE ciphertext consists of two polynomials, and a polynomial can be seen as a big container in which each slot is an element of the torus. Until now it was unclear how to use the full potential of this packing in a GSW context.
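The structure of the three message spaces can be sketched on plaintexts. This is only an illustration of the algebra (the torus as reals modulo 1, integer linear combinations, and the external product pairing an integer with a torus element), not the TFHE library API; the names `t`, `lin`, and `ext` are ours.

```python
# Plaintext-level sketch of the TFHE message spaces (illustration only).
# The torus T = R/Z supports addition and multiplication by integers,
# but it has no internal product of its own.
from fractions import Fraction

def t(x):
    """Reduce a rational to the torus [0, 1)."""
    return Fraction(x) % 1

# LWE layer: torus messages, linear combinations with integer coefficients.
m1, m2 = t(Fraction(1, 8)), t(Fraction(3, 8))
lin = t(3 * m1 + 2 * m2)   # 3/8 + 6/8 = 9/8, reduced to 1/8 on the torus

# GSW layer holds integer messages; the external product pairs an integer
# (GSW side) with a torus element (LWE side) and lands back on the torus.
k = 5                      # integer plaintext
ext = t(k * m1)            # 5 * 1/8 = 5/8

print(lin, ext)
```

On ciphertexts the same operations go through the homomorphic layer, but the plaintext algebra above is what makes the external product well-defined while a torus-times-torus product is not.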
We are going to show how it works with an example: we use it to evaluate lookup tables, which are interesting because they can represent arbitrary functions. A lookup table looks like this: on the left we have the inputs, which are bits, and on the right the corresponding outputs, which are torus values. We evaluate a lookup table via a CMux tree, where the CMux gate is a ternary gate choosing between a value d0 and a value d1 depending on a bit c, which is called the selector. In the classical way, we could encrypt all the outputs, one per input, inside a RingLWE ciphertext and then evaluate the CMux tree to obtain the final result. The problem is that, as I said, the RingLWE ciphertext is a container, and in our case it contains 1,024 slots, and it is generally not so common to find a function that has 1,024 outputs; so some slots remain empty, and this does not use the full potential of the packing. Our idea was to pack in a different way, in a vertical way if you like. This packing technique can be used even if the function has a single output, and it is quite easy to fill all the slots, because it is sufficient to have a 10-bit input. So how do we use this packing to evaluate the lookup table? I will show it with an example. In this figure, suppose that I was able to pack all the outputs into four RingLWE containers and that the output I am looking for is in the second container. We first evaluate a CMux tree that selects the correct container, then we perform a blind rotation, which rotates the slots inside the container and brings the output we are searching for to the first position, and then we extract the first slot via the extraction operation. The techniques of vertical packing and of batching, or horizontal packing, are compatible and can be mixed together depending on the application you are interested in. The second improvement we made to TFHE is a new computational model based on automata.
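The three steps just described (CMux tree over the high bits, blind rotation by the low bits, extraction of the first slot) can be sketched on plaintexts. This is a toy model with 4 slots per container instead of 1,024, and the function names are ours, not the library's; on ciphertexts each step would of course be homomorphic.

```python
# Plaintext sketch of vertical packing for a lookup table (illustration only).

def cmux(c, d1, d0):
    """Ternary CMux gate: returns d1 if the selector bit c is 1, else d0."""
    return d1 if c else d0

N = 4                                    # slots per container (1024 in the talk)
LOG_N = N.bit_length() - 1               # low bits of the input address a slot

def lookup(table, x, n_bits):
    containers = [table[i:i + N] for i in range(0, len(table), N)]
    hi, lo = divmod(x, N)                # high bits select the container
    sel = containers
    for j in range(n_bits - LOG_N):      # CMux tree over the high bits of x
        b = (hi >> j) & 1
        sel = [cmux(b, sel[2 * i + 1], sel[2 * i]) for i in range(len(sel) // 2)]
    container = sel[0]
    rotated = container[lo:] + container[:lo]   # blind rotation by lo slots
    return rotated[0]                           # sample extraction of slot 0

table = list(range(16))                  # identity LUT on 4-bit inputs
print(lookup(table, 13, 4))              # → 13
```

With a 4-bit input and 4-slot containers, every slot is filled: two input bits select among four containers and two bits drive the rotation, which is the point of packing vertically.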
Last year we presented an automata model based on deterministic finite automata. Deterministic finite automata are decisional automata: they output a single bit of information. This year we use instead a technique from automata theory called deterministic weighted finite automata, which, instead of being decisional, are computational: they return a weight, which is a torus polynomial, and the idea is that the weight acts like a memory that accumulates partial results all along the computation. In both kinds of automata the transitions have the same cost. I will show the difference between the two with an example: we evaluate the max function between two integers x and y, both composed of n bits, and to evaluate the max we use the automaton corresponding to the comparison. This automaton reads as input all the bits of x and y, starting from the most significant ones, and switches between states. While x and y are equal it stays in the central state, which we call E for equal; if x is smaller than y it moves to the state B and stays there until the end of the evaluation; if instead x is greater than y it moves to the state A and stays there until the end of the evaluation. With deterministic finite automata, if we want to compute, for instance, the second bit of the output, we have to deviate the automaton to a final state after reading the first two bits of x and y, and to evaluate the automaton we start from the end and go back to the beginning. The problem is that when we do that, the previous computation gets lost, because we lose all the suffixes, and so we have to construct a different automaton for every output bit.
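The comparison automaton just described can be sketched in plaintext. The state names E, A, B follow the talk; the helper names are ours, and the homomorphic evaluation would replace each transition by CMux gates.

```python
# Plaintext sketch of the comparison DFA from the talk: states E (equal so
# far), A (x greater), B (y greater); bits are read MSB-first and A/B are
# absorbing states.

def compare(x_bits, y_bits):
    """Run the DFA on MSB-first bit lists; return 'E', 'A' or 'B'."""
    state = 'E'
    for xb, yb in zip(x_bits, y_bits):
        if state == 'E':
            if xb > yb:
                state = 'A'          # x is greater
            elif xb < yb:
                state = 'B'          # y is greater
    return state

def to_bits(v, n):
    return [(v >> (n - 1 - i)) & 1 for i in range(n)]

def dfa_max(x, y, n):
    return y if compare(to_bits(x, n), to_bits(y, n)) == 'B' else x

print(dfa_max(9, 12, 4))             # → 12
```

Note that this single run only tells us which of the two numbers is the max; to output each bit of the max with plain DFAs, one deviated copy of the automaton per output bit is needed, which is exactly the cost the weighted variant removes.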
With weighted automata, instead, we continue evaluating the same comparison automaton, but we mark some of the transitions, drawn here as colored arrows, with specific weights, and we evaluate the entire automaton once; at the end we extract the final weight, which contains all the results, in a single pass. To give an example of the idea: if the second bit of the output is equal to one, this means that the automaton went through one of the orange transitions. The third improvement is a new homomorphic counter that we call TBSR. It is a homomorphic counter that is able to do three basic operations in an efficient way: the extraction of bits, incrementation, and division by two. It can be used to perform efficient evaluation of arithmetic functions, such as schoolbook multiplication or multi-addition, and of course it can be combined with the previous constructions, but for time reasons we cannot give more details. So I let Mariya continue the presentation. — Thank you, Ilaria. The second part of the presentation shows how we can compose the building blocks that Ilaria explained to you, in order to obtain a faster evaluation of larger circuits. In particular, we are interested in the question of whether we have to use bootstrapping after each gate, or only after many gates. Most GSW constructions are based on the MUX gate, and historically the outputs and inputs of the MUX gates are of the same type, as in GSW. However, the noise level and the precision are not the same, and we slow down whenever we need to increase the level: in particular, at levels 0, 1, and 2 we can map the operations directly to native 32- or 64-bit floating-point operations, whereas at levels 5 and 6 we have to use multiprecision operations in GMP, and those are extremely slow.
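The weighted-automaton idea from the first part (one pass over the comparison automaton, with weights on the marked transitions accumulating the output) can be sketched on plaintexts as follows. The weight here is a list of bits standing in for the torus polynomial of the talk; the function name is ours.

```python
# Plaintext sketch of a weighted automaton computing max(x, y): the
# comparison automaton is run ONCE, and each transition contributes a
# weight (the current output bit), so all bits of the max come out of a
# single pass instead of one deviated automaton per output bit.

def wfa_max(x_bits, y_bits):
    """MSB-first bit lists; returns the bits of max(x, y) as the final weight."""
    state, weight = 'E', []
    for xb, yb in zip(x_bits, y_bits):
        if state == 'E' and xb != yb:
            state = 'A' if xb > yb else 'B'
        # weight carried by this transition: the current bit of the max
        weight.append(yb if state == 'B' else xb)
    return weight

print(wfa_max([1, 0, 0, 1], [1, 1, 0, 0]))   # max(9, 12) → [1, 1, 0, 0]
```

The transitions are the same as in the plain DFA, so the cost per transition is unchanged; only the accumulated weight is new, which is why a single evaluation now suffices.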
Different MUX gates operate at different noise levels, and for this reason some composition patterns of lookup tables and deterministic finite automata are very fast while others are very slow. With the improvement of the external product, proposed last year, some of the GSW-to-GSW products are replaced by products between RingGSW and RingLWE ciphertexts. This makes the evaluation of lookup tables and of deterministic automata even faster, but it is no longer possible to connect the RingLWE outputs to GSW inputs, so many compositions become impossible. To summarize: on the one hand we have a huge running-time penalty between the levels, especially from level 0 to level 3; on the other hand, all the transitions inside an automaton operate at the same level, so in order to optimize, we should compose automata, and the number of compositions of automata is equal to the total number of levels. In this picture you see an example of a composition of automata over three levels, where for each level we have one automaton per bit of output. To give an idea of the running times, we use Google-traffic-style colors: the dark red automata run for hours, while the orange ones run in a few seconds. We can apply last year's optimization of the external product to the lowest level to improve the bottleneck, but only to that last level, because the outputs are then not of the same type and we cannot connect them to the other inputs.
We can apply the optimization of the weighted finite automata that Ilaria explained to you in order to replace these many automata with only one, and this optimization improves the performance at each level of the composition; but, as you see, we still have a red part. We therefore propose a new method in which all the automata run at level 1, and for that we propose a new tool that we call circuit bootstrapping, which allows us to switch from a ciphertext of LWE type to one of GSW type. The previous solutions for this kind of circuit bootstrapping homomorphically evaluate the decryption equation to produce the GSW ciphertext: this involves either evaluating the decryption automata at level 2, or evaluating compositions of automata at levels 3 and 2. Both solutions allow us to decrease the noise and also to switch the type of the ciphertext, but both are very slow, and in this work we propose a faster circuit bootstrapping that uses directly the internal structure of the GSW ciphertext. A GSW ciphertext is a matrix, and each line of the matrix is an LWE ciphertext. Here we have the generic formula of the ciphertext, and the goal is now to reconstruct this matrix line by line. Starting from an LWE ciphertext, the first step is to apply a bootstrapping, in order to decrease the noise and to rescale the message; the second step is to apply a private key switch, in order to multiply the message by the key. Now we have all the needed elements and we can reconstruct the matrix of the GSW ciphertext. In TFHE we now have two different strategies for bootstrapping: the first one is the circuit bootstrapping mode that I just explained to you, and the second one is the gate bootstrapping that we presented last year. That bootstrapping is faster, but it has to be applied after each binary gate.
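The internal GSW structure being exploited is, at its core, the gadget: each line encrypts the message scaled by a gadget coefficient 1/Bg^(i+1), and decomposition against the gadget recombines exactly. A hedged plaintext illustration with toy parameters (the values of `Bg` and `ell` here are ours, not the paper's):

```python
# Plaintext illustration of the gadget structure behind a GSW ciphertext
# (toy parameters, not the library's layout): the i-th line carries the
# message times 1/Bg^(i+1), and base-Bg decomposition inverts the gadget.
from fractions import Fraction

Bg, ell = 4, 3                       # toy gadget base and number of levels
gadget = [Fraction(1, Bg ** (i + 1)) for i in range(ell)]

def decompose(t):
    """Base-Bg decomposition of a torus element t in [0, 1)."""
    digits = []
    for _ in range(ell):
        t *= Bg
        d = int(t)                   # next base-Bg digit
        digits.append(d)
        t -= d
    return digits

t = Fraction(13, 64)                 # 64 = Bg^ell, exactly representable
digits = decompose(t)
recombined = sum(d * g for d, g in zip(digits, gadget))
print(digits, recombined == t)
```

On ciphertexts, the two steps of the new circuit bootstrapping produce exactly these gadget-scaled lines: the bootstrapping outputs the message at each scale, and the private key switch multiplies in the key components, line by line.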
For circuit bootstrapping, one bootstrapping runs in 134 milliseconds, but it is applied only once before many gates, and each transition is then very fast, only 34 microseconds. The circuit bootstrapping is used for the compositions of automata. On the other hand we have the gate bootstrapping mode: the gate bootstrapping is applied after each binary gate, and a bootstrapped binary gate runs in 13 milliseconds. All our binary gates have the same cost, and composability is very easy: we can compose all of them. Now we want to compare the two modes, circuit and gate bootstrapping, on some benchmark functions. In this picture we have in red the gate bootstrapping mode, and in purple and blue the circuit bootstrapping mode. For the multiplication of two numbers of d bits, we see that gate bootstrapping is the best strategy up to 4 bits, and after that the best strategy is circuit bootstrapping. The good news is that for the whole computation we need only one set of parameters, so we can always switch between the different modes and therefore always use the optimal solution. For an arbitrary function represented by a lookup table, shown here on a log scale, we can observe that the speedup of circuit bootstrapping over gate bootstrapping reaches almost a factor of one million. TFHE is an open-source library and you can visit it at its GitHub address; the circuit bootstrapping is also implemented and available in the experimental branch. We have tested our library on a very large circuit, with more than 150 million binary gates and more than 100,000 circuit bootstrappings. The objective of the test was to verify the correctness of the evaluation and also to verify the independence heuristic: in this heuristic we assume that the noises of the inputs are independent and Gaussian-distributed, and therefore that the noises of the outputs are also independent and Gaussian-distributed, with a standard deviation
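The crossover between the two modes follows directly from the timing figures quoted above (13 ms per bootstrapped gate, 134 ms per circuit bootstrapping, 34 µs per CMux). The gate and CMux counts in the sketch below are illustrative assumptions of ours for an n-bit multiplication, not the paper's circuit sizes:

```python
# Back-of-the-envelope comparison of the two bootstrapping modes, using the
# timings from the talk; circuit sizes are illustrative assumptions.
GATE_BS_MS = 13.0        # one bootstrapped binary gate
CIRCUIT_BS_MS = 134.0    # one circuit bootstrapping
CMUX_MS = 0.034          # one CMux transition (34 microseconds)

def gate_mode_cost(n_gates):
    """Gate bootstrapping mode: every binary gate pays one bootstrapping."""
    return n_gates * GATE_BS_MS

def circuit_mode_cost(n_inputs, n_cmux):
    """Circuit bootstrapping mode: one bootstrapping per input, cheap CMuxes."""
    return n_inputs * CIRCUIT_BS_MS + n_cmux * CMUX_MS

# Toy model of an n-bit multiplication: assume ~6*n^2 binary gates in gate
# mode, versus 2n circuit bootstrappings plus ~6*n^2 CMuxes in circuit mode.
for n in (2, 4, 8):
    g = gate_mode_cost(6 * n * n)
    c = circuit_mode_cost(2 * n, 6 * n * n)
    print(n, round(g, 1), round(c, 1), 'gate' if g < c else 'circuit')
```

Under these assumed circuit sizes, gate bootstrapping wins for very small inputs and circuit bootstrapping takes over around 4 bits, consistent with the crossover reported in the talk; since both modes share one parameter set, the evaluator can pick the cheaper mode per subcircuit.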
that we can precompute. In this curve we see the predicted values in purple and the experimental ones in blue, and we can see that they match almost perfectly and that they stay well below the red curve, which is the critical threshold for correct evaluation. To conclude: in this work we propose a new packing technique, the vertical packing; we propose a new computational model, the weighted finite automata, as well as a new homomorphic counter, the TBSR; and we propose a faster circuit bootstrapping that makes both the leveled and the fully homomorphic modes more practical. In the paper you can also find some examples of parameter sets and more comparisons between the different optimizations, and for the future we keep working to make fully homomorphic encryption more and more practical. Thank you very much for your attention. — Any questions for Ilaria or Mariya? — Can you go back a little bit, to where you talk about this particular function? If the input of the function is very long, maybe 20 bits or 30 bits, the table will be very large. With the vertical packing you are talking about, how can you avoid that? — You can pack larger tables. Can you give me an example of what is large for you? — If you have 20 bits of input, you have to store the whole table. — With a 10-bit input you need a single container, one RingLWE ciphertext; with a 20-bit input you need 2 to the power of 10 containers. — But if you have a 40-bit input, you will have 2 to the 30 containers, which still looks too large to me. So this method doesn't work for complicated functions, am I correct? Does it work for very large functions with long inputs, for example
if you do some Hamming-distance computation on long strings? — For such functions this method still gives an improvement of a factor of about 1,000, because we pack about 1,000 values together; with the actual parameters it still works, and the improvement is a factor of roughly 1,000 compared to before. We can always choose to apply an operation to only one bit, or to apply it at the same time to 1,000 of them. — Okay, I still think this doesn't work, but I think it is best to take the discussion offline. So, if there are no further questions, let's thank the speakers again.