Okay, before starting my talk, I checked the schedule for both CHES and CRYPTO, and it seems that this is the only talk about code-based cryptography. So I wonder how many of you have heard of code-based cryptography; can you raise your hand? Okay, some of you. Anyway, let me begin with the background of code-based crypto first.

So code-based crypto is a branch of post-quantum crypto, and right now the situation seems to be that we have very confidence-inspiring post-quantum signatures in the form of hash-based signatures, while for encryption schemes, either code-based or lattice-based constructions seem to be promising. That's why we care about code-based crypto in the first place.

As the name implies, code-based crypto is based on coding theory, so let me give a little background on coding theory. In this talk, whenever I talk about a code, I mean a linear subspace of F_2^n. Since it's a linear subspace, you can of course define it as the kernel of a matrix, and in this case we call the matrix a parity-check matrix, usually denoted by H. One important task in coding theory is error correction, that is, decoding, so let me define it first. You can view decoding through two definitions. In the first one, you are given a vector c + e, where c is a codeword and e is an error vector of small weight, say weight up to t, and your job is to recover e, or equivalently c. In the second view, you are given a so-called syndrome, which is a matrix-vector product: H times e, or H times (c + e), which is the same thing since Hc = 0. Your job is similar: you want to recover e. So we have two different views of decoding.

And what is code-based encryption about? Basically there are two types of encryption schemes, McEliece and Niederreiter, and each of them corresponds to one of the definitions I introduced on the previous slide. In the McEliece setting, the plaintext is encoded as a codeword, and your ciphertext is that codeword plus an error; you can see that if you want to recover the plaintext, you have to do decoding. The Niederreiter setting uses the second definition: the plaintext is encoded as an error vector, and the ciphertext is the corresponding syndrome. Our job is to recover e, which is, again, decoding. And please notice that here I put a star on this parity-check matrix H, because this is a crypto scheme, so we want the public key to not reveal the structure of the underlying code. So we somehow scramble the parity-check matrix a little bit, and that scrambled matrix becomes our public key. The receiver, who has the structured parity-check matrix H, can do decoding and therefore decryption; that's roughly how it works.

In general, code-based encryption schemes are very simple in some sense, because they just use McEliece or Niederreiter together with some code, and whether the scheme is secure or not depends on which code you are using. Of course, ever since code-based crypto was invented, many codes have been proposed for it, and unfortunately many of them have been broken. But there are at least two notable families that haven't been broken: binary Goppa codes and QC-MDPC codes.
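To make the syndrome view of decoding concrete, here is a minimal sketch with made-up toy parameters (the matrix H, the error e, and all the sizes here are purely illustrative, not from any real scheme):

```c
/* Minimal sketch (not QcBits code): the syndrome view of decoding.
 * H is a toy parity-check matrix with one uint16_t per row; e is a
 * low-weight error vector. The syndrome is s = H*e over GF(2). */
#include <stdint.h>
#include <stdio.h>

static int parity(uint16_t x) {          /* XOR of all 16 bits of x */
    x ^= x >> 8; x ^= x >> 4; x ^= x >> 2; x ^= x >> 1;
    return x & 1;
}

int main(void) {
    uint16_t H[3] = {0x00ff, 0x0f0f, 0x3333};  /* 3 parity checks, n = 16 */
    uint16_t e    = 0x0005;                    /* error of weight 2       */
    for (int i = 0; i < 3; i++)                /* s_i = <H_i, e> in GF(2) */
        printf("s[%d] = %d\n", i, parity(H[i] & e));
    return 0;
}
```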
So here I have a timeline. The use of binary Goppa codes in code-based crypto was proposed in 1978; you can see it's almost as old as public-key crypto itself, a very long time ago. QC-MDPC codes, on the other hand, were only introduced in 2013, so you can see there is a very big gap between the confidence we have in these two codes. Now you may wonder: we already have this very confidence-inspiring original McEliece cryptosystem using binary Goppa codes, so why do we need to talk about QC-MDPC codes? Well, the issue is that the original binary Goppa construction uses a very big public key, around several hundred kilobytes. The benefit of using QC-MDPC codes is that the public-key size is reduced a lot, to something like several kilobytes only. That's why people proposed QC-MDPC codes.

Of course, because of the advantage QC-MDPC codes bring, many papers have been published since 2013 studying how efficient the scheme can be on different platforms. And this year I published the QcBits software, written by myself. The reason I wrote this software is that the previous papers didn't really take care of the constant-time issue. For example, the PQCrypto 2014 paper provides some constant-time operations, like encryption and decryption, but on a platform that doesn't really have a cache, so you can see that's a problem if we really want to deploy the scheme. QcBits provides constant-time key generation, encryption, and decryption, so everything is constant time, and it works on many platforms, not just platforms without caches; essentially all reasonable 32- or 64-bit platforms can be used.

Okay, now let's see some performance results. Here I have a table for the 80-bit pre-quantum security parameters, and today I'll focus on decryption, so let's look at the decryption column. What's being shown here is basically that QcBits is much faster than the previous implementations, whether constant-time or non-constant-time ones, and this is because I'm doing the computation in a very different way from the previous works. You might have noticed that I didn't show higher security parameters; this is because of an annoying issue with failure rates. I couldn't achieve good enough failure rates for higher security levels, so I didn't show them here. I'll come back to this issue at the end of the talk.

So now let's take a look at what a QC-MDPC code is. It's actually not as scary as the name implies. MDPC stands for moderate-density parity check, which just means that the parity-check matrix is going to be sparse: there are not going to be many nonzero entries in the parity-check matrix. And QC stands for quasi-cyclic, which basically means that your parity-check matrix is a concatenation of several square blocks, where each square block is cyclic, in the sense that each row is the previous row shifted by one position to the right. Here I have a simple example: a parity-check matrix with two square cyclic blocks. One small thing you can see is that you can view it as rows shifted by one position or as columns shifted by one position; it doesn't really matter. Once we have this parity-check matrix, in order to use the whole cryptosystem we need to be able to decrypt, and of course to do that we need to decode.
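Here is a tiny sketch of that circulant structure: it prints an 8 x 8 cyclic block from its first row (the row value 0x8a and the block size 8 are arbitrary toy choices; real parameters are much larger):

```c
/* Toy sketch of one circulant block of a QC parity-check matrix:
 * every row is the previous row rotated right by one position, so
 * the whole block is determined by its first row alone. */
#include <stdint.h>
#include <stdio.h>

static uint8_t rotr8(uint8_t x, unsigned s) {
    return (uint8_t)((x >> s) | (x << (8 - s)));  /* for 0 < s < 8 */
}

int main(void) {
    uint8_t row = 0x8a;                  /* sparse first row (arbitrary) */
    for (int i = 0; i < 8; i++) {
        for (int j = 7; j >= 0; j--) putchar('0' + ((row >> j) & 1));
        putchar('\n');
        row = rotr8(row, 1);             /* next row: rotate right by 1 */
    }
    return 0;
}
```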
So how can we decode a QC-MDPC code? The algorithm is called statistical decoding, or, as you maybe hear more frequently, the bit-flipping algorithm. This algorithm runs for many, many iterations; you repeat the iteration until success. At the beginning, we start with some vector v = c + e; for example, in the McEliece setting this is just your ciphertext. The first step in each iteration is to compute the syndrome Hv. The next step is to look at which entries of the syndrome are ones, pick the corresponding rows of H, and add them together to form a vector u. Please note that here the addition is performed in the integer ring, not in GF(2). At this moment I claim that the larger the entry u_i is, the higher the probability that the corresponding entry v_i is in error. If you assume this claim is right, then what we should do is kind of clear: if u_i is large, we assume position i is more likely to be in error, so we flip the bit v_i whenever u_i is large. We cannot guarantee that one iteration will succeed, so we have to do this several times.

If some of you actually know something about coding theory, you probably feel that this is a very weird decoder, but let me try to explain the rationale behind it. Consider a parity check whose syndrome bit is zero. In this case, the algorithm cannot tell whether any of the positions of v corresponding to the ones in that row are in error: it could be that none of them is, or that an even number of them are. So in some sense this gives the algorithm no information at all. But in the case where the parity is one, we actually get a small piece of information, because we know for sure that, for example in the first row, either the first, the third, the seventh, or the tenth entry of v is in error; at least one of them is, although of course there can be more. So this gives us some information, even though it's not a lot, since we don't know which one is in error. So we basically give each of those positions one point of score. That's the rationale behind the algorithm.

At this moment, let's think about how to perform this algorithm in constant time. One obvious requirement is that we want a constant number of iterations; that's very obvious, so let's not discuss it here and just think about how to do each iteration in constant time. I think the straightforward way is to treat the whole matrix as a dense matrix and just perform dense matrix operations; then everything easily becomes constant time. That's basically what the previous works are doing; it's not quite as simple as that, but basically like this. But you can see that this is not that satisfying, not that efficient, because the parity-check matrix is very highly structured and has only a few nonzero entries. So what I did is very simple: I would like to make use of this sparsity. Of course, while using the sparsity, I don't want to leak any information to the adversary. For example, you cannot just pick the first, third, and seventh entries of v and then do the computation with them; that's not constant time. The idea behind QcBits is actually quite simple: I basically consider everything as polynomial operations.
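To recap the decoder itself before we get to the optimizations, here is a plain dense sketch of one iteration (toy sizes, a made-up threshold T, and none of the constant-time tricks discussed next; note that it even branches on secret data, which is exactly what a real implementation must avoid):

```c
/* Plain dense sketch (not QcBits) of one bit-flipping iteration.
 * H is r x n over GF(2), v = c + e is the received vector, T is a
 * made-up flipping threshold. */
#include <stdint.h>

#define R 8                 /* rows of H (toy size)   */
#define N 16                /* code length (toy size) */

void bitflip_iteration(const uint8_t H[R][N], uint8_t v[N], int T) {
    uint8_t s[R];
    int     u[N] = {0};

    /* Step 1: syndrome s = H*v over GF(2). */
    for (int i = 0; i < R; i++) {
        s[i] = 0;
        for (int j = 0; j < N; j++) s[i] ^= (uint8_t)(H[i][j] & v[j]);
    }
    /* Step 2: for every unsatisfied check (s_i = 1), add that row of H
     * into u over the integers, so u_j counts the unsatisfied checks
     * involving position j. (This branch leaks; QcBits avoids it.) */
    for (int i = 0; i < R; i++)
        if (s[i])
            for (int j = 0; j < N; j++) u[j] += H[i][j];
    /* Step 3: flip the positions with a large count. */
    for (int j = 0; j < N; j++)
        if (u[j] >= T) v[j] ^= 1;
}
```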
So for example, the first step in each iteration is the syndrome computation. Now, you see we have two blocks of the parity-check matrix, and we consider the first column of a block as a polynomial in the ring F_2[x] modulo x^n - 1. You can easily see that, because we build it as a polynomial this way, the second column is going to be the first column multiplied by x, and similarly for the remaining columns: each of them is the previous one multiplied by x. And when you do this multiplication, each entry of v becomes the coefficient of a power of x. So eventually the syndrome computation becomes two polynomial multiplications in this ring. That's how I do syndrome computation as polynomial operations.

And how can we actually carry out these polynomial multiplications? First of all, you need to see that these are not generic polynomials. Here we have a dense polynomial v, which you can represent as an array of 32- or 64-bit words. And for f, which is sparse, we represent it using its nonzero terms: whenever f_i is one, we put i into an array, so we have an array of indices. Then you compute v times f as x^{i_1} times v, plus x^{i_2} times v, and so on. Each x^i times v is simply a rotation of v, which is easy to see, and to carry out the summation you just need XOR, because we work in GF(2). So everything seems okay; the one remaining question is whether this rotation is constant time. If you don't do it carefully, it won't be, so I still have to deal with this issue.

What QcBits does is to use a very well-known technique called the barrel shifter. Say we want to shift the vector v by s positions; what we do is view s in its binary expansion s_1, s_2, s_3, and so on. It's not so hard to imagine what's going on. We deal with the first bit as follows: we first compute v shifted by 2^(l-1) positions, where l is the number of bits of s, as if assuming that s_1 is 1. So now we have two candidate results: one corresponds to the original v, for s_1 = 0, and one corresponds to s_1 = 1. Then we just select one of them in constant time, depending on what s_1 is; this is not so hard to do. Dealing with the remaining bits is just similar: shift by the value corresponding to the second bit, pick between the two values in constant time, and so on. Eventually you have v shifted by s.

The second step in each iteration of the decoding algorithm is the computation of u. In this case we have a similar situation: you have the syndrome here and the parity-check matrix here, and we again consider each row as a polynomial, but this time in Z[x] modulo x^n - 1, so we no longer work over the binary field. If you still remember how we compute u, we pick the rows corresponding to the ones in the syndrome, and you can translate this as: s_0 times this row, plus s_1 times that row, and so on, all summed up. So again you see the same pattern: s_0 becomes the constant term, s_1 becomes the coefficient of x, and so on. So eventually it becomes polynomial multiplications again, this time something like s times f and s times g. At this moment you can imagine that we can also use the constant-time rotation, the barrel-shifter technique, to do this.
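As an illustration of the barrel-shifter idea, here is a minimal constant-time rotation sketch on a single 64-bit word, with a secret rotation amount s up to 63 (QcBits rotates vectors spanning many words, so treat this as a toy version of the technique):

```c
/* Toy barrel shifter: constant-time rotate-left of one 64-bit word by
 * a secret amount s (0..63). For each bit of s we compute the rotated
 * candidate and select it with a mask, never branching on s. */
#include <stdint.h>

static uint64_t rotl64(uint64_t x, unsigned k) {   /* 0 < k < 64 */
    return (x << k) | (x >> (64 - k));
}

uint64_t ct_rotl64(uint64_t v, unsigned s) {
    for (unsigned b = 0; b < 6; b++) {             /* 6 bits cover 0..63 */
        uint64_t rotated = rotl64(v, 1u << b);     /* shift by 2^b       */
        /* mask is all-ones iff bit b of s is set; the select below is a
         * data-independent choice between v and its rotation. */
        uint64_t mask = (uint64_t)0 - ((s >> b) & 1u);
        v = (v & ~mask) | (rotated & mask);
    }
    return v;
}
```

The point of the design is that the sequence of instructions and memory accesses is the same for every s; only the mask values differ.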
But one thing that's different is that we no longer work over GF(2); we are working in the integer ring, and that makes a small difference, because we cannot use XOR again. So how do I do this? The most straightforward approach is, say you want to store the entries of the vector u: you just store them in an array of bytes. Here I'm assuming that each entry fits into maybe six bits, but of course when you are writing your program you don't have a six-bit data type; you have bytes. So the most natural, most straightforward way looks like this: you can imagine that whenever we have a new rotated vector and want to add it into this bunch of counters, we need something like one addition per counter. But we can actually do much better by working in a bit-sliced way. Here you see that I'm essentially transposing the whole matrix, so now you have six words of bits, one word per bit position of the counters. Let me try to finish up quickly. Then, working in the bit-sliced way, you mimic the addition circuit, and eventually you see that each addition here needs much, much less than one addition instruction per counter; a small sketch of this appears after the talk. If you understand the idea, then similarly you can compute, in a bit-sliced way, the bits that indicate whether we need to flip each bit or not.

So this is my last slide; I want to talk about the future of the scheme. There is actually a problem, namely the failure rate: we have a failure rate of something like 10^-8 for the 80-bit parameters. That doesn't sound very secure, but all the implementations so far have a similar problem. What I really hope is that some people can do a real analysis of how low the decoding failure rate needs to be to make the scheme confidence-inspiring. Maybe like this paper. That's all of my talk, and everything has been put at this link, so if you are interested, please check it out. Thanks.
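A minimal sketch of the bit-sliced counting mentioned above, assuming 6-bit counters and 64 positions handled per machine word (the names plane and inc are made up for illustration): adding one incoming 0/1 bit per position is a ripple of half adders, one XOR and one AND per bit plane, rather than one integer addition per position.

```c
/* Toy bit-sliced counters: plane[k] holds bit k of 64 six-bit counters,
 * one counter per bit position of the word. bitsliced_increment adds an
 * incoming 0/1 bit per position (packed in inc) into all 64 counters at
 * once, by rippling half adders through the bit planes. */
#include <stdint.h>

#define BITS 6              /* each counter fits in 6 bits */

void bitsliced_increment(uint64_t plane[BITS], uint64_t inc) {
    uint64_t carry = inc;
    for (int k = 0; k < BITS; k++) {
        uint64_t sum = plane[k] ^ carry;   /* half adder: sum bit   */
        carry        = plane[k] & carry;   /* half adder: carry bit */
        plane[k]     = sum;
    }
    /* parameters are chosen so the counters never overflow 6 bits */
}
```

The flip decisions, comparing each counter against the threshold, can then be computed in the same bit-sliced fashion, as mentioned in the talk.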