Thank you for the introduction. Yeah, my name is Leon, and I'll be presenting this work, which is joint work with Andreas Hülsing, Tanja Lange, and Yuval Yarom. This work is about lattice-based cryptography, which is a very promising post-quantum-secure alternative to the systems we use today. The key sizes are small enough to be practical, and the ciphertexts and signatures are of a reasonable size as well. Now, there's a lot of active research on theoretical and practical security, but also more and more implementations are becoming available. So what about the security of these implementations? This work tries to answer that question by presenting the first side-channel attack on a lattice-based signature scheme. It exploits information leakage from the discrete Gaussian sampler via cache memory. There are more schemes that use discrete Gaussians, but our attack target is BLISS, which is an efficient lattice-based signature scheme with several implementations available. For instance, BLISS is also included in strongSwan, which is a library for an IPsec-based VPN. We didn't attack strongSwan; we attacked the research-oriented implementations made available by the authors. I will briefly introduce cache timing attacks. Yuval is going to give a talk after this one, and he knows a lot more about this, so I will keep it to the basics. Cache memory is a small, fast bank of memory shared among all threads, and it basically tries to bridge the gap between processor speed, which is quite fast, and memory speed, which is quite slow. Data is stored in cache lines, which are typically 64 bytes big. Here you see three threads that share the L1 cache, and all these threads compete for the same resources, so they compete for the same cache lines.
Now, what an attacker can do is fill specific cache lines of a shared cache with his own data. He can also flush them, depending on what he wants to do, but for this case just assume he fills specific cache lines with his data. So in this example, the top two cache lines are filled with attacker data. The attacker then waits for the victim to perform cryptographic operations, and he will notice that some part of the cache has been used by the victim in this operation. In this case, you see that the first cache line now contains victim data, so the attacker knows his data has been evicted. He learns the cache line of the data used by the victim, and from it he can derive which part of the data the victim actually used in the cryptographic operation.

So I will briefly introduce BLISS. It's a lattice-based signature scheme, so you need quite a bit of lattice theory for this, but I will keep it to the basics needed to understand our attack. BLISS stands for bimodal lattice signature scheme, and it was introduced at Crypto 2013 by Léo Ducas and his co-authors. As I said, there are some implementations available, and they all use NTRU lattices. NTRU lattices are basically built from polynomials in this ring R_q: a polynomial ring where the polynomials are reduced modulo x^n + 1, the degree n is a power of two, and each coefficient is reduced mod q, where q is a prime. Now, if you want to add two polynomials, it's basically just adding the coefficients, but if you multiply polynomials, you typically get a polynomial of higher degree, so you also have to reduce by x^n + 1. There are two ways of doing this.
You can do this via, for instance, the number-theoretic transform, but I find it easier, also for this attack, to think about it as a vector-matrix multiplication. These boldface letters f and g are the coefficient vectors of the polynomials, and the capital F and G are the matrices whose columns are the rotations of these coefficient vectors; the wrapped-around entries get an opposite sign because of the plus one in x^n + 1. So basically, multiplying polynomials is just a vector-matrix multiplication.

Now, the secret key in BLISS consists of two polynomials, f and 2g + 1, where both f and g are sparse with entries in plus one, minus one, and zero: they have a lot of zeros, some plus ones, and some minus ones. The public key A also consists of two polynomials, and they satisfy the equation you see here: a1 times s1 plus a2 times s2 is q mod 2q. How this is typically computed is to take a polynomial a_q, which is simply 2g + 1 divided by f, and you restart if f is not invertible. So it's basically the second secret key polynomial divided by the first secret key polynomial. Then you can take a1 to be two times this quotient polynomial, and a2 the constant q minus two. Now an attacker can validate correctness of any candidate he has for the first secret key polynomial simply by plugging it into this equation, and from it he can derive the second one, and if they are both small, they suffice as the secret key. What's also important here is that both s and minus s are good enough to sign with, so either can be used as the secret key.

I will give a simplified version of the BLISS signing algorithm. There are a lot more steps involved, but this is basically what's being used in the attack. So the first step is to sample a discrete Gaussian vector.
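As a toy illustration of this vector-matrix view (my own sketch, not the authors' code; names and parameters here are made up), this is multiplication in Z_q[x]/(x^n + 1), where each column of the rotation matrix is a shift of the coefficient vector with a sign flip on wrap-around, because x^n = -1 in this ring:

```python
def rotation_matrix(f, q):
    """Matrix whose column j holds the coefficients of x^j * f in R_q."""
    n = len(f)
    cols, col = [], list(f)
    for _ in range(n):
        cols.append(list(col))
        # multiply by x: shift down, negate the coefficient that wraps around
        col = [(-col[-1]) % q] + col[:-1]
    # transpose the list of columns into a row-major matrix
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def mul(f, g, q):
    """Multiply polynomials f and g in Z_q[x]/(x^n + 1) as F * g."""
    n = len(f)
    F = rotation_matrix(f, q)
    return [sum(F[i][j] * g[j] for j in range(n)) % q for i in range(n)]
```

For example, with n = 2 and q = 17, (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2 reduces to 12 + 10x, since x^2 = -1.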
This is a noise vector, and it has to have this specific distribution; I'll come back to that later. You use this noise vector y together with the public key A to construct a vector u, and from it you derive a challenge vector c. The challenge c is the output of a hash function that outputs sparse binary vectors: here you see the sparsity is a parameter kappa, and kappa is usually much smaller than the dimension n. Then you pick a random bit b, and you set the first signature vector to be the sum of the noise vector plus the secret key times the challenge, all reduced mod 2q. And then this is the signature for a message mu. Now note that I'm using a subscript one; there's also a second signature vector z2, but it loses information about the secret key, so we will not use it. And also, as I said previously, you only need the first secret key polynomial anyway. Now note that the secret key is sparse, and the challenge is also sparse and binary, so we don't need the modulo 2q: this is simply an equation that holds over the integers. Here C is the rotation matrix, as with the NTRU lattices before, of the challenge vector. So what we get eventually is an equation, hidden in a signature, that holds over the integers: the first signature part is the sum of the noise vector and the secret key times this matrix C, which we construct from the challenge vector. And the unknowns to the attacker are the noise vector, this bit b, and the secret key, of course.

So why do we use a discrete Gaussian distribution? Well, this is used to achieve both provable security and the smallest signature size possible. It's actually not straightforward to sample from this distribution in practice. It really looks like the normal Gaussian distribution, but it's only defined over the integers, and it's not the same as sampling a continuous Gaussian and then rounding to the nearest integer.
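To make the simplified flow concrete, here is a toy sketch (my own illustration, not BLISS itself: the stand-in sampler, the hash-based challenge, and all names are assumptions, and as noted above, real implementations must not sample the Gaussian by rounding a continuous one):

```python
import hashlib
import random

def toy_sign(s, kappa, sigma, msg):
    """Toy version of the simplified signing flow; NOT real BLISS."""
    n = len(s)
    # Step 1: noise vector y (stand-in: rounded continuous Gaussian,
    # which skews the distribution and is exactly what BLISS avoids).
    y = [round(random.gauss(0, sigma)) for _ in range(n)]
    # Step 2: sparse binary challenge c with kappa ones, from a hash.
    h = hashlib.sha256((msg + repr(y)).encode()).digest()
    ones = random.Random(h).sample(range(n), kappa)
    c = [1 if i in ones else 0 for i in range(n)]
    # Step 3: S*c, where column j of S is the j-th negacyclic rotation of s.
    sc = [0] * n
    for j in range(n):
        if c[j]:
            for i in range(n):
                # coefficient i of x^j * s, sign flip on wrap-around
                sc[i] += s[i - j] if i >= j else -s[n + i - j]
    # Step 4: random bit b; z1 = y + (-1)^b * S*c, over the integers
    # (the mod 2q is dropped because s and c are sparse and small).
    b = random.getrandbits(1)
    z1 = [y[i] + (-1) ** b * sc[i] for i in range(n)]
    return z1, c, y, b
```

The returned y and b are of course hidden from a verifier; they are exactly the unknowns the attack goes after.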
You get a slightly different distribution, which makes the proofs not hold anymore. So this makes it a good target for a side-channel attack, because it's also not known yet whether this sampling is doable in constant time. But first let us ask the question: how do we use this additional knowledge of the noise vector to find the secret key s? Note that I dropped the subscript one, but think of it as still being there, just hidden.

So I'll be discussing three attack scenarios. The first one is just an example of what you could do; the second and the third are attack scenarios we have actually implemented. So again, we have this signature equation that holds over the integers. And suppose we can determine some y completely from a side-channel attack, so we just know every noise coefficient. Well, then we only need one signature, and we can solve the following equation with a linear solver. We still don't know the bit b, but as I said before, minus s and s are both valid as a secret key. So this is doable. But it might be rather unlikely to have so much power in a side-channel attack; I don't think the Gaussian samplers are that bad. But yeah, let's move on to the second scenario.

So I've zoomed in a bit, because we don't get everything anymore, but suppose we get some of these noise values: sometimes we get y_0, sometimes we get y_{n-1}, we sometimes get some of these coefficients. Now, we assume this set is small, and this is also what we've seen in practice: typically you get either none of these noise coefficients or one per signature. Now, if we zoom in, we still have this equation, so we just have the inner product with the challenge. Oh yeah, these are the rotations of the challenge vector; it's not that we have n challenge vectors, these are all the rotations we had before.
So one thing we can do is: if we know a noise coefficient, we save the corresponding row of the challenge matrix, the corresponding rotation, and we save it together with the corresponding signature coefficient. You basically save everything you know about this equation. Now, we can acquire enough of these from multiple signatures and form the following set of equations. As you see here, this bit b is randomized for each signature. Like I said before, you would need n of these to get to this set of equations, so you cannot use a linear solver, and you cannot guess this bit b; we assume this bit b is well protected. So unfortunately, all bits b are unknown, but we can apply a small trick: if we know the noise coefficient, we can be selective and only use equations where the noise coefficient is equal to our signature coefficient, which is something we can simply check. And then we save the corresponding challenge rotation, because this equality eliminates the bit b. So here you see the equation again, but with z_i now equal to y_i, so we get an inner product that is equal to zero. This happens with quite a high probability, because s is sparse and the challenge is also sparse. So that's something we can do, and then we can acquire enough of these vectors from multiple signatures. And then we know that the secret vector s times the matrix we form from the collected challenge rotations is equal to the all-zero vector. So with very high probability, the secret vector s will be the only vector in the integer left kernel of the matrix we just formed. So this is something we can solve for.

So now let us go one step further. We don't have specific values for the noise coefficients anymore, but we have tuples. So for example, in BLISS-I we got the situation that we know it's either a seven or an eight. So one of these values is the true one.
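A toy version of this kernel step (my own sketch with made-up numbers; the real attack works in dimension n = 512): stack the collected challenge rotations as columns of a matrix and compute its left kernel, here with a small rational Gaussian elimination:

```python
from fractions import Fraction

def left_kernel(M):
    """Basis of {v : v*M = 0} for a row-major integer matrix M."""
    r, c = len(M), len(M[0])
    # Left kernel of M = null space of M transposed.
    A = [[Fraction(M[i][j]) for i in range(r)] for j in range(c)]
    piv, row = [], 0
    for col in range(r):
        p = next((i for i in range(row, c) if A[i][col] != 0), None)
        if p is None:
            continue
        A[row], A[p] = A[p], A[row]
        A[row] = [x / A[row][col] for x in A[row]]  # normalize pivot row
        for i in range(c):
            if i != row and A[i][col] != 0:
                f = A[i][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[row])]
        piv.append(col)
        row += 1
    # One basis vector per free column of the reduced system.
    basis = []
    for free in (col for col in range(r) if col not in piv):
        v = [Fraction(0)] * r
        v[free] = Fraction(1)
        for i, col in enumerate(piv):
            v[col] = -A[i][free]
        basis.append(v)
    return basis
```

For instance, with s = (1, 0, -1) and three collected rotations all orthogonal to s, the kernel is one-dimensional and spanned by plus or minus s, and as noted earlier, either sign works as the secret key.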
However, if we look at the outcome, then 90% of the time it was a seven that was sampled and not an eight. So there's a bias: it's not a 50/50 probability within these tuples; there's a bias towards one of them. So with very high probability, one of them is the sampled value. So we can apply the same method as before: if we know the coefficient is in this tuple, we keep the equation only if the corresponding signature coefficient equals the most likely sampled value, and then we save the corresponding challenge rotation. Now we do not get the all-zero vector anymore as the outcome of this multiplication, but we know it's small: we didn't make that many errors, because of this bias towards one outcome. Now we can use this magical LLL algorithm to compute small vectors and search for s in the unimodular transformation matrix. So s pops up among the small vectors and the unimodular transformation matrix; we simply search through this transformation matrix, and we can always verify correctness of a candidate with the public key. So this is also something we can do.

I will now briefly give a feeling for how we attack BLISS when it uses CDT sampling. So just to wrap up the whole story: in BLISS we need to sample this noise vector, which has this ugly, weird distribution that is hard to sample from, and we just discussed three attack scenarios using additional knowledge of this noise vector. Now as a result, we implemented cache attacks on two discrete Gaussian samplers, called CDT sampling and Bernoulli-based sampling. Both of these samplers use table lookups, so they are vulnerable to cache attacks. So how does CDT sampling work? Well, you basically save the cumulative distribution function in a table T, and at sampling time you generate a random value between zero and one and you simply perform a binary search to find the sample you want.
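A sketch of this basic CDT approach (illustrative only, not the BLISS implementation: real tables use fixed-point entries and different tail cuts):

```python
import bisect
import math
import random

def build_cdt(sigma, tail):
    """Cumulative table for the non-negative half of a discrete Gaussian."""
    rho = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(tail + 1)]
    rho[0] /= 2  # zero is shared between both signs, so halve its weight
    total = sum(rho)
    T, acc = [], 0.0
    for w in rho:
        acc += w / total
        T.append(acc)
    return T

def cdt_sample(T):
    u = random.random()                # random value in [0, 1)
    x = bisect.bisect_left(T, u)       # binary search in the table
    x = min(x, len(T) - 1)             # guard against float round-off
    return random.choice((-1, 1)) * x  # random sign for the one-sided table
```

Every call does a handful of table lookups for the binary search, and it is exactly the memory addresses of those lookups that the cache attack observes.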
So this looks like a straight line, but if you look very closely, it's all dots, because this function too is only defined over the integers. You just look at where this random value is located. Now, some speedups are used in practice. For instance, you only store the non-negative values and you just pick a random sign at the end, because this discrete Gaussian distribution is centered around zero. You can also use an additional table I with intervals, because this binary search can be quite long: you have to sample a whole vector of these discrete Gaussians, and if you have a large table, even the binary search can take a while. So you use an additional table with intervals: you basically first pick an interval, and within this interval you perform the binary search, and this greatly speeds up the whole algorithm.

Now, note that there are two tables and a lot of table lookups, so we can look for specific access patterns that give us a lot of information about a sampled noise coefficient. We found two types of cache weaknesses where you get really good precision on the sampled value. The first weakness is called the intersection weakness: you basically use the knowledge of the access in I, so the interval that has been sampled, combined with the binary search in the main table T. There's also the last-jump weakness: you basically track the binary search using multiple accesses in this big table T, because you have more than one access while doing a binary search. Sometimes the binary search stays within one cache line and then at the end jumps to a second cache line, and then you get really precise information.
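To see why these lookups leak, consider how table indices map to cache lines (a sketch with assumed sizes: 64-byte lines and 8-byte entries, so eight entries share a line; the real tables' entry size and layout may differ):

```python
LINE_BYTES, ENTRY_BYTES = 64, 8
PER_LINE = LINE_BYTES // ENTRY_BYTES  # 8 table entries per cache line

def cache_line(index):
    """Which cache line a lookup at this table index touches."""
    return index // PER_LINE

def candidates(line, table_len):
    """The set of samples consistent with seeing only this cache line."""
    lo = line * PER_LINE
    return list(range(lo, min(lo + PER_LINE, table_len)))
```

Observing one line narrows the sample to at most eight candidates; observing a last jump between two adjacent lines narrows it to the values at the line boundary, which is where the tuples of two values in scenario three come from.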
So what you can do is simply brute-force search for all these cache weaknesses in the tables T and I for a specific parameter set. We assume the tables are constructed in a specific way and that we know the public parameters, so this is something you can do, and then you're just being selective again: you only pick those weaknesses that allow you to satisfy scenario three. Note that we get tuples again, not a specific value, because in the last table lookup you return one of two values: you're doing a binary search, and the random value is either bigger or smaller, so you return one of two values. This is the best precision you can get. And you also get this biased outcome with high probability, simply because the table entries are the probabilities themselves.

So we performed some experiments. We first did a perfect side-channel experiment, and here you see five different lines for five different BLISS parameter sets; the authors made them available and we just used them. Note that you get a success probability: if you collect just enough signatures to form the lattice that is input to LLL, then yeah, the success probability can be quite low, but if we collect a little more and simply randomize the whole process, picking a bunch of vectors as the lattice basis and performing LLL, then it quickly grows to 90% success probability. We also did a proof-of-concept attack using the Flush+Reload technique. So here you see a visualization of the last-jump weakness. We want two cache lines to be hit: here you see one of them is hit, but we also want the other one, this last jump to the second cache line.
So if both of them are hit, then we know we get the information we want, and experiments with BLISS-I, which targets 128 bits of security, succeeded 90% of the time. For more details you can look at the full paper. We have a similar attack method and achieved similar results for the Bernoulli-based sampling method; we also did experiments with that. And in our full paper, which we updated yesterday, there's also an analysis of the cache weaknesses of the Knuth-Yao sampler and the discrete ziggurat sampler. So yeah, for more details, check the full paper. Thank you for your attention. Thank you.