All right. Thank you, Otec, for the introduction. This work is about evaluating neural networks homomorphically, that is, evaluating neural networks over encrypted inputs. As we know, machine learning as a service is gaining more and more importance every day, and new applications come out constantly. In the usual scenario, we have a remote party that holds a model, for example a trained neural network, and a user, let's say Alice, who wants this model to be evaluated on some input x that she holds. The easiest model of interaction is for Alice to send x to the remote party and receive the model evaluated on x. Of course this works, but there is a huge problem with Alice's privacy, because her data x is leaked to the remote party, and this might be sensitive information that we do not want to be leaked.

One possible solution to this problem is fully homomorphic encryption: instead of sending x in the clear, Alice sends an encryption of x and gets back an encryption of the result. This is good for privacy because the data is encrypted both ways, meaning that the remote party learns neither x nor the output M(x). The usual problem is efficiency, because solutions based on fully homomorphic encryption are known to be somewhat cumbersome, inefficient, or in any case complicated. So the goal of this work is to evaluate neural networks homomorphically as efficiently as possible. Note that we do not care about training from encrypted data: we assume that the model already exists and has already been trained somehow with some data; we don't know how that happened. We also assume that the model is available in the clear, so the weights of the neural network will not be encrypted.

A very quick refresher on neural networks; some of these things have already been said by Shafi in her lecture, so we will just repeat a few key concepts. Neural networks are collections of computational units called neurons, which are arranged in several layers. We have an input layer that receives inputs from the external world, then several hidden layers, whose number defines the depth of the model, and finally an output layer that communicates the result back to the external world. If we zoom in on one single neuron, what happens inside it is basically this: there are inputs x_i that come into the neuron on wires, each associated with a weight w_i. Two things then happen in the neuron. First, we compute what we call the multisum: we scale the inputs x_i by the weights w_i, i.e., we multiply them, and then we sum everything together. Then we apply a nonlinear function, called an activation function. The result, which we call y, is the output of the neuron, and it is propagated to the following layer. Please note that here everything is over the reals: in general there is no restriction, so the x_i's, the w_i's, and y are all real-valued numbers.

In this work we consider, as a specific use case, the problem of digit recognition: we have a picture that represents a handwritten digit, this picture is given to our model, and the model has to predict which digit is depicted in the image.
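To make the single-neuron computation concrete, here is a minimal sketch in plain Python (no encryption involved; the input values, the weights, and the choice of a sigmoid activation are purely illustrative, not taken from any actual model):

```python
import math

def neuron(xs, ws, activation):
    # Multisum: scale each input x_i by its weight w_i and add everything up.
    multisum = sum(w * x for x, w in zip(xs, ws))
    # Apply the nonlinear activation function to the multisum.
    return activation(multisum)

# Illustrative real-valued inputs and weights, with a sigmoid activation.
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
y = neuron(xs=[0.5, -1.2, 3.0], ws=[0.8, 0.1, -0.4], activation=sigmoid)
print(y)  # this value would be propagated to the next layer
```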
The point is that we will do everything homomorphically, meaning that we start with an encryption of an image and recover an encryption of the predicted label. As a dataset we use the well-known MNIST dataset, which contains several thousand images of handwritten digits.

The current state of the art for this problem is CryptoNets; Shafi also mentioned this work. It was proposed in 2016 by people at Microsoft Research, and it achieves blind, meaning homomorphic, non-interactive classification: it operates on encrypted data, and non-interactive means that it does not rely on multiparty computation, so there is no interaction between the user and the server. It also achieves almost state-of-the-art accuracy; there is a small loss, but it is fairly limited, so almost 99% accuracy. One issue is that the activation functions, which are usually sigmoidal functions shaped like an S, are replaced with low-degree polynomials, because these are easier to evaluate homomorphically; this has been shown to work pretty well. But the biggest problem is that it uses somewhat homomorphic encryption, meaning that the parameters have to be chosen at setup time, taking into account the entire structure of the network. So the main limitation of this approach is that the computation for each neuron depends on how complex the entire network is, i.e., on the total multiplicative depth of the whole network. This is particularly bad for deep learning, where models can have tens or even hundreds of layers one after the other: the complexity is very high, the parameters have to be chosen very large, and this quickly makes the approach inefficient. In short, it is an approach that does not scale well.

Instead, in this work we want to make the computation scale-invariant, meaning that what happens at the neuron level does not depend on how big the total network is. In order to achieve this, we rely on bootstrapping, which is a technique to refresh a ciphertext so that it can support more computations.

First of all, a restriction on the model. We said that one of the steps we have to perform is homomorphically computing the quantity we called the multisum. We are given the weights of the model in the clear and encryptions of the inputs, so what we do is simply take the encryptions, scale them by the corresponding weights, and sum everything together. The thing we have to take care of is that, in order to maintain correctness, we can scale only by integer constants: if we scale by floating-point values, we lose correctness. So we have to make sure that the weights are integers, whereas before we had no such limitation. Of course, this induces a trade-off between efficiency and accuracy, because the way we discretize the weights impacts both. It is a matter of precision: how precisely do we want to discretize the weights?
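As an illustration of this precision trade-off, here is a minimal sketch of one possible discretization, namely scaling by a constant and rounding to the nearest integer; the scale factor and weight values are made up, and this is not necessarily the exact conversion procedure used for our models:

```python
def discretize_weights(real_weights, scale=10):
    # Scale the real-valued weights and keep only the integer part.
    # A larger scale preserves more precision (better accuracy) but yields
    # larger integers, hence more noise growth in the homomorphic multisum.
    return [int(round(w * scale)) for w in real_weights]

print(discretize_weights([0.173, -0.052, 0.981], scale=10))   # -> [2, -1, 10]
print(discretize_weights([0.173, -0.052, 0.981], scale=100))  # -> [17, -5, 98]
```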
So again, the goal is to define an FHE-friendly model of neural networks, that is, a model of neural networks that can be efficiently evaluated homomorphically. We define this object, called a discretized neural network, or DiNN. I put the definition on the slide, but it is actually very simple: we take integer values for the inputs, integer values for the weights, and an activation function that maps whatever comes out of the multisum back to the input space. This kind of cyclical structure ensures that we can go on computing for an unbounded number of operations.

First of all, this model is not as restrictive as it might seem. For example, something similar has already been done, in an even more restrictive fashion, with binarized neural networks, where everything is binary: the inputs, the weights, and of course the activations. And it has been shown to work well. There is, of course, a trade-off between size and performance, because if we discretize a network, it has to become bigger in order to maintain the same level of accuracy. Finally, a basic conversion from a generic neural network over the reals to a discretized neural network is extremely easy: we can just chop off the decimal part, and the result is a DiNN. Of course, this might not be the best way to do the conversion, but it already works, in the sense that it yields a model that respects this definition.

So, once we have a DiNN, how do we homomorphically evaluate it? First of all, we want to evaluate the multisum, and this is easy, because we just need a homomorphic encryption scheme that supports linear operations. Then we want to apply the activation function, and this is the tricky part, because it depends on which function we choose; I will tell you more about this in a moment. Then we have to bootstrap in order to refresh the ciphertext and be able to go on computing, and finally we repeat this process for all the layers of the network.

We also have some issues to face. First, we have to choose the message space of the encryption scheme. We have several ways of doing this: we could guess a number based on some prior information; we could take statistics on the training set and hope that the test set, or new instances, will behave in the same way; or we could take the worst case over the entire input space, meaning that we support even the worst possible input we could receive. This last option is usually the safest approach, but it is not guaranteed to be the best, because if the worst case happens with very low probability, it might be better to sacrifice perfect correctness in order to gain efficiency. Another issue is that the noise grows with homomorphic operations, so we have to start from a very small noise: even though we apply bootstrapping, just computing the multisum already requires some homomorphic operations, and these make the noise grow. Starting with a very small noise while maintaining security means we have to compensate by using larger parameters for the encryption scheme.

Finally, the main question is: how do we apply the activation function homomorphically? The basic idea is to activate while we apply the bootstrapping procedure. We combine bootstrapping and the activation function, meaning that we move from an encryption of a value x to a refreshed encryption of the value f(x): we apply the activation function f and we bootstrap at the same time.
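To show the shape of the computation for one hidden layer, here is a minimal Python sketch. To keep it runnable it only simulates the control flow on plaintext integers: the "ciphertexts" are plain numbers and bootstrap_to_sign simply returns the sign, so it is a stand-in for the homomorphic operations, not an implementation of them:

```python
def bootstrap_to_sign(value):
    # Stand-in for the real operation: bootstrapping a ciphertext while
    # applying the sign activation, yielding a fresh encryption of sign(x).
    return 1 if value >= 0 else -1

def eval_hidden_layer(inputs, integer_weights):
    """One hidden layer of a DiNN: multisum with integer weights, then sign.

    inputs          : one (simulated) ciphertext per input wire
    integer_weights : one list of integer weights per neuron
    """
    outputs = []
    for ws in integer_weights:                # one neuron at a time
        # Multisum: only additions and scalings by integer constants,
        # which is exactly what the linearly homomorphic layer supports.
        multisum = sum(w * x for x, w in zip(inputs, ws))
        outputs.append(bootstrap_to_sign(multisum))
    return outputs

print(eval_hidden_layer([3, -1, 2], [[1, -2, 4], [-3, 0, 1]]))  # -> [1, -1]
```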
So what happens is basically this: it is the same picture as before, only that now the inputs and the output are encrypted. We have encrypted inputs coming in, they are weighted with these weights, and they are activated and refreshed; the result is a refreshed version of the y that we saw before. So again, two steps: first we compute the multisum, and then we bootstrap to the activated value.

The starting point is a work called TFHE. Here TFHE does not stand for threshold fully homomorphic encryption but for torus fully homomorphic encryption; there is a collision in the acronyms. It is a work by Chillotti et al., presented in 2016, which is nowadays, to the best of my knowledge, the fastest implementation of the bootstrapping procedure. The basic assumption is LWE over the torus, where the torus is the reals modulo 1. In this work, the authors define several flavors of LWE-based encryption schemes; the two that we will use are LWE and TLWE. With LWE we encrypt a scalar into n + 1 scalars, whereas with TLWE, which is roughly ring LWE, we encrypt a polynomial into k + 1 polynomials, where n and k are security parameters.

I will not go into the details of the bootstrapping procedure, but to give a quick intuition, the analogy the authors use is the wheel of fortune. When we want to refresh a ciphertext, we prepare a wheel, divided into slices, where each slice contains a possible result of the bootstrapping procedure. Then we homomorphically spin the wheel, and we take whatever ciphertext is pointed to by the arrow at the end of the procedure. Homomorphically spinning the wheel means that, instead of having access to the secret key s, we have access to an encryption of the secret key s, which is called the bootstrapping key. So basically, we start with a ciphertext that contains a certain message, we homomorphically spin the wheel by a quantity that corresponds to this message, and then we pick whatever ciphertext we end up on. This is the intuition behind the bootstrapping procedure.

In this work we focus on the sign as activation function: we want to homomorphically compute the sign of a message, and the way we do it is this. We start with the wheel on the left-hand side, which contains our inputs, so the values from 1 up to I, from -1 down to -I, and 0. We map this wheel to the wheel on the right-hand side, where the top points are associated with +1 and the bottom points are associated with -1. Then we homomorphically spin this wheel and pick the ciphertext that we end up on, and that will be an encryption of the sign of the message.

Along the way, we also introduce some refinements. First of all, we reduce the bandwidth usage with the standard packing technique of encrypting a polynomial instead of many scalars. So instead of encrypting the pixels one by one, we encrypt a polynomial that contains an entire image: we start with a TLWE encryption of this polynomial, where the p_i's are the pixels, and we obtain a ciphertext ct. Then, in the first hidden layer, we prepare a polynomial w_poly that contains the weights w_i times X^(-i). Now it is easy to see that, if we multiply ct and w_poly, the constant term of the result will be exactly an encryption of the multisum that we want to compute.
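As a plaintext sanity check of this packing trick (no encryption here; a toy ring dimension N = 8 and random small values are used purely for illustration, whereas the real scheme works with much larger parameters), one can verify that the constant term of the product of the two polynomials, reduced modulo X^N + 1, is exactly the multisum:

```python
import numpy as np

N = 8                                    # toy ring dimension
rng = np.random.default_rng(0)
p = rng.integers(0, 4, size=N)           # "pixels" packed as coefficients of p(X)
w = rng.integers(-3, 4, size=N)          # integer weights of one neuron

def negacyclic_mul(a, b, N):
    """Multiply two integer polynomials modulo X^N + 1 (so X^N = -1)."""
    res = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                res[i + j] += a[i] * b[j]
            else:
                res[i + j - N] -= a[i] * b[j]
    return res

# Weight polynomial carrying w_i at "position" X^(-i); in this ring X^(-i) = -X^(N-i).
wpoly = [0] * N
wpoly[0] = w[0]
for i in range(1, N):
    wpoly[N - i] = -w[i]

product = negacyclic_mul(list(p), wpoly, N)
print(product[0], int(np.dot(p, w)))     # the two values coincide: the multisum
```

In the actual scheme, the same identity is exploited with the TLWE ciphertext playing the role of p(X), since multiplying a ciphertext by a known plaintext polynomial is a supported linear operation.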
Also, we can dynamically change the message space of our encryption scheme. We could keep the message space constant throughout the entire evaluation of the neural network: we just take a bound on all the values we want to support, declare that to be our message space, and never change it. But a better idea is to change the message space as we go through the evaluation of the network. The intuition is that we can take larger slices when we need fewer slices: if we don't need a very large message space, we can take larger slices and thus accommodate a larger error, and this allows us to choose less aggressive parameters. How do we do this? The details are in the paper, but the quick intuition is that we change what we put inside the wheel. Every time we have to compute a bootstrapping, we have to set up the wheel I showed you before, and by choosing what we put in the wheel, we can dynamically change the message space of the encryption scheme. The bottom line is that we can start with any message space we want at encryption time and then change it dynamically during the bootstrapping procedure.

Let me show you an overview of the process. Say we want to evaluate a discretized neural network with 30 neurons in the hidden layer. The user starts from an image, encrypts it as a polynomial, and obtains one TLWE ciphertext; here we are in the input layer of the neural network. The user then sends this ciphertext to the server. The server multiplies whatever it receives by the weight polynomials w_poly and obtains 30 TLWE ciphertexts, one for every neuron in the hidden layer. It then extracts the constant terms and obtains 30 LWE ciphertexts corresponding to the multisums. It bootstraps to the sign, which means homomorphically computing the activation function, and obtains 30 LWE ciphertexts once again, one per neuron. It then computes a weighted sum, moving towards the output layer, and obtains 10 LWE ciphertexts, one per output neuron. It sends everything back to the user, who decrypts and obtains 10 scores, which can be seen as the scores the network assigns to each digit for this image. The user takes the argmax, which means selecting the most likely label according to the network, and, if everything works correctly, the user should recover 7 here, that is, the label corresponding to the image that was sent at the beginning.
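To recap the flow just described, here is a compact Python-style pseudocode outline of the whole pipeline. Every function name in it (tlwe_encrypt, mul_by_weight_polynomial, extract_constant_term, bootstrap_to_sign, lwe_weighted_sum, lwe_decrypt) is a hypothetical placeholder for the corresponding operation, not an actual library API:

```python
# --- Client side: pack the image into one polynomial and encrypt it (1 TLWE ciphertext).
ct_image = tlwe_encrypt(pack_pixels_as_polynomial(image), secret_key)

# --- Server side: the weights are in the clear, the data stays encrypted.
cts_hidden = [mul_by_weight_polynomial(ct_image, wp) for wp in hidden_weight_polys]  # 30 TLWE cts
cts_multisum = [extract_constant_term(ct) for ct in cts_hidden]                      # 30 LWE cts
cts_signs = [bootstrap_to_sign(ct, bootstrapping_key) for ct in cts_multisum]        # 30 refreshed LWE cts
cts_scores = [lwe_weighted_sum(cts_signs, ws) for ws in output_weights]              # 10 LWE cts

# --- Client side: decrypt the 10 scores and pick the most likely digit.
scores = [lwe_decrypt(ct, secret_key) for ct in cts_scores]
predicted_digit = max(range(10), key=lambda d: scores[d])
```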
So here we have some numbers: we implemented the solution, and here are some experimental results. We can focus, for example, on these lines. On top we have the results on inputs in the clear, so this is the original neural network. I would like to stress that we did not put too much effort into optimizing the machine learning model; we just wanted something we could evaluate homomorphically, to see how the framework performs. We start from a neural network over the reals, which has this accuracy. Then we discretize it and use the sign as activation function. This induces a certain loss, both because we lose precision during the discretization and because we change the activation function to the sign. So this is the accuracy we have in the clear, and when we evaluate the network homomorphically we get, for example, these values. The two values correspond to two different implementations of the bootstrapping procedure; if you want the details, they are in the paper. As we can see, the accuracy drops a bit. This is because we set aggressive parameters in order to gain as much efficiency as we could, and sometimes the bootstrapping returns the wrong result because of too much noise. The thing is that neural networks are supposed to be resilient to noise, and in fact we verified that, even if we occasionally flip some bits here and there, the results are mostly correct.

Here we have some more numbers for our framework. The first line refers to a discretized neural network with 30 hidden neurons, the second to one with 100 hidden neurons. First of all, the size of the ciphertext is very limited: only eight kilobytes for an entire image. Then we have the accuracies, the encryption time, the evaluation time, and the decryption time. I would like to stress that encryption and decryption times do not depend on the shape of the network, so they are completely independent of the model that will be evaluated on the ciphertext. Another important feature of this work is that the evaluation time scales only linearly with the number of neurons: we do not have to change any of the other parameters, so the cost depends linearly on the number of neurons, or on the number of layers.

We have several open problems. First of all, we would like to build better discretized neural networks. As I told you, we did not pay too much attention to refining the model; we just converted a normal neural network into a discretized one by multiplying by a constant and removing the decimal part. There might be better ways to do this conversion, and also, once we have obtained a discretized neural network, it might be possible to retrain it, that is, to fine-tune the integer weights we have found, in order to squeeze all the accuracy out of the model. Also, we would like to implement everything on GPU: within a layer, all the neurons can be processed completely independently of one another, so this is the perfect scenario for parallel computing and GPUs. More importantly, we would like to apply this framework to more models, for example convolutional neural networks, and to more machine learning problems. But in order to do this, we have to be able to support more activation functions. Here we were quite limited by the fact that we can only compute the sign function; if we were able to homomorphically and efficiently compute the max function, or the so-called rectified linear unit (ReLU), which is a very popular activation function nowadays, this would open the way to the homomorphic evaluation of very complex deep learning models. Okay, so this is it. Thank you for your attention.