I don't know if you saw at CHES this year, there was a competition: many people could submit white-box implementations of AES, and then people could also try to break them. And it was very impressive to see that most of the solutions could be broken in just a day. So I think it will be very interesting to have Pascal's point of view on white-box crypto. So please join me in welcoming Pascal for this invited talk. Thank you.

Thank you. You're making me blush. Thank you very much, first of all, to Yoshi and Toma for having me. It's an honor to address the Asiacrypt audience. What I'm going to talk about today is white-box cryptography. I think it's a subject that has been largely overlooked in cryptography. Essentially, if you look at the literature, you can see that some papers have been published on designs for white-box implementations, and all of them have been broken. So basically, when you talk to people about white-box crypto, they roll their eyes and say this is snake-oil security, this cannot exist. But actually I think there is a lot more to it, and I wanted to share today a few thoughts that I had recently about white-box crypto, in a talk I entitled "White-Box Cryptomania".

So first of all, what is white-box crypto? It's not necessarily completely trivial to define. The concept at a very high level is pretty simple, and you can explain it in a few words to kids -- kids who know how to program. Basically, you take a cryptographic program; we don't care exactly about the functionality, it could be a block cipher, it could be a signature scheme, it could be whatever. You have a program which is generic in the sense that it takes as input the algorithm's input -- like the plaintext, if you would like to encrypt, for instance -- the key as a separate input, and then it outputs, let's say, the ciphertext. So this program is very generic, and this is exactly what you find in libraries.
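To make the generic-program versus keyed-program distinction concrete, here is a minimal Python sketch. The cipher is a toy stand-in (a single XOR, not a real block cipher), and `whitebox_compile` is just ordinary partial evaluation -- it fixes the key but, unlike a real white-box compiler, hides nothing about how the key is used.

```python
def generic_encrypt(key: int, plaintext: int) -> int:
    # Toy stand-in for a real block cipher: key and input are separate
    # arguments, exactly as in a generic library implementation.
    return (plaintext ^ key) & 0xFFFFFFFF

def whitebox_compile(key: int):
    # Emits a one-argument program with the key fixed inside. A real
    # white-box compiler would additionally obfuscate the key's use.
    def program(plaintext: int) -> int:
        return generic_encrypt(key, plaintext)
    return program

program = whitebox_compile(0xDEADBEEF)
assert program(42) == generic_encrypt(0xDEADBEEF, 42)
```

The interface change is the whole point: whoever runs `program` never handles the key explicitly -- it only exists inside the code.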
So when we talk about white-box implementations, we're changing that paradigm to hard-code a given key into the program. The program is not generic anymore with respect to the key: the key is completely fixed, hard-coded into the code. The program takes an input and produces the correct output, but with that key embedded inside. So it seems very easy to explain and understand. But usually there is some confusion between what white-box crypto is as opposed to what obfuscation is. If you think about general-purpose obfuscation, it's something that is way, way stronger than white-box crypto. For obfuscation, we're talking about a transformation that takes as input any program and outputs another, obfuscated one. And what we want to hide is basically everything about the program: any property the program may fulfill, we want to hide, in such a way that if you're given the obfuscated program, you don't have a choice -- the only thing you can do is execute it as if it were a black-box oracle. There is nothing that you can extract from the obfuscated code. And there is, of course, the question of how realistic obfuscation is. The paradigm of obfuscation has been around for years; we still don't have satisfactory solutions for that problem, and we also have known impossibility results. But if we come back to white-box crypto, we can see that it is very different, in the sense that it's way more restricted. Here we don't want to hide any general program; we want to consider programs that code a certain function F, where F is a cryptographic primitive. So not every program is a cryptographic program, right? What we're looking at is a very restricted class of programs.
And also, we don't want to hide every property, but just some property. For instance, you don't want to hide that it's AES that you're using; you just want to protect the key. And when you try to model the security of white-box programs, there are cases where viewing the code as a black-box oracle makes sense only in some contexts, and others where it doesn't. Also, we do not know of any impossibility results that would apply in a very general way. And at this point, it's very unsatisfactory, because we don't know of any example of a construction, even for AES, for instance, that would be provably secure. There have been proposals, but basically they've all been broken. We don't know if white-box crypto even exists at all. So it's a very mysterious subject, and you can approach it in two ways: the practical side and the theoretical side.

On the practical side, it's a very active area in industry, and there are people in industry who claim to have secure solutions. Of course, they do not reveal the design, meaning they will not tell you how their white-box programs were designed and generated -- this remains proprietary -- but you can have access to the programs and you can try to reverse-engineer them. And so we've tried that with the white-box contest, this year's CHES capture-the-flag event, a competition that we co-organized. It was about AES implementations; you can look it up, the URL is here. There were 94 challenge implementations posted on the submission system. They have all been broken. There was a considerable effort invested: 200 people were working day and night. You could see that some of the submitted implementations were broken like half an hour after they were posted, and the longest an implementation survived is 28 days, basically. It was eventually broken by Junwei Wang, a PhD student at CryptoExperts.
And that winning challenge, Adoring Poitras, reached 406 strawberries -- there was a system of points with strawberry points and banana points, but whatever. If you were attending CHES, you probably know more about what happened during the competition; there was a presentation about it in its own session. The conclusion is that everything that was submitted has been broken. And in the competition, you didn't have to reveal how your challenge implementation was made; you just had to provide it, without telling how it was designed. Also, the competition was anonymous, to invite people from industry to propose their own challenge implementations. But the conclusion is that everything was broken, and it remains to be seen whether, in practice, there are solutions out there -- programs -- that could survive longer than 28 days.

On the theoretical side, it's still very unclear, because we can define white-box crypto in theory, we can define security notions, right? But the big question is the question of existence. Can we actually build a construction -- even if it's theoretical, even if it has nothing to do with practice -- can we have even just a single construction of white-box crypto out of indistinguishability obfuscation, for instance, or can we do it the other way around? We don't know. Apparently, there is no obvious connection between the two. So we don't even know if it's possible to achieve, and to me it's one of the biggest open questions.

Usually when we talk about white-box crypto, it's about block ciphers -- we implement AES, typically. Here I wanted to focus on something else, which is public-key signatures, and I'm going to talk about public-key encryption after that. So how does it work?
So of course, when we talk about the security of white-box implementations, we're not talking about breaking one particular program, but breaking a distribution of programs. The distribution of programs is defined by how they are built by a code generator, which includes the design and all the intricacies of how the program is generated. We call that a white-box compiler. So we have a white-box compiler -- but what is it exactly? Well, you generate a key pair for your signature scheme, you take the signing key and give it to the compiler along with some randomness, and possibly the public key; you press the button, and the code generator gives you a program that embeds that key and that signs anything.

And here I would like to stress the big differences between a function, an oracle, and a program. A function is an algorithmic description of a mathematical object; it's a specification, if you will. Oracles are what we use in provable security. An oracle is basically a process running somewhere in the sky. It can be stateful, that is, it remembers over time all the queries that you make to it. You have only remote access; you don't know anything about how it works inside; you just have input-output access to it. And it can use some private randomness if it wants. It's like you're interacting with something on the other side of the planet, and you have no way of knowing what's going on inside that process -- all its internal variables and so on. A program is completely different. A program is basically a word in a programming language; it's a string. And it is completely stateless, because you can always reboot it. You can always copy it, transfer it to somebody else. You can observe each and every internal variable. You can modify each and every internal variable.
And of course, assuming that the program is allowed to make some system calls, you can just capture the system calls, simulate the system as you want, and reply with anything of your choice. So a program is an object that is dreaming about being an oracle -- but it cannot be an oracle, because basically it's memory-less: you can just erase it, start it from scratch, and repeat. It's completely different. If you will, on one side you have a tamper-resistant smart card, which you cannot open if you assume it really is tamper-proof, and on the other you have executable software. As an attacker, you have much more power over the software than over the smart card.

So first of all, there's the question of whether it makes sense at all to try to have a white-box implementation of a signature scheme. You would be given a program that embeds, that hard-codes, the signing key; you give it a message, and it returns the signature on that message. If you're given such a program, what is the meaning of trying to extract the signing key from it? You're given the functionality already, right? There are answers to that, actually. Typically, a white-box implementation would be password-protected, the same way a smart card that signs e-mails is protected. When the program is password-protected, it means that at some point when it was generated, a password was given to the compiler, and if you give the correct password together with the message, you get the signature. If you give something different, it gives you some object, something that does not validate as a signature on the message.
And so in that case, if you're not given the password -- usually the password is there to protect against code lifting, the attack where the attacker remotely accesses your device, like the mobile phone where the program is running, steals the program, and tries to recover the key or use the functionality -- extracting the key actually makes sense. Also, you can have what are called incompressible white-box implementations, where the code is voluntarily made very, very large, or artificially consumes a lot of RAM, for instance. There you would like to extract the key so that you can perform the signing operation much quicker or with fewer resources. Also, the code could be traceable, in the sense that even if the same signing key is used, we could have several different programs given to different users, and if at some point you find one of those programs over the internet, you can incriminate one of the users -- the code is traceable, and you cannot remove the watermark. And also, you could have a restriction on messages: for instance, the program could only agree to sign messages that lie in a certain subspace or satisfy a certain format. Here, I'm just going to assume that there is no password and no restriction on the message space, but that we're typically in the scenario where we're given an incompressible or traceable implementation.

Right. So now, Schnorr signatures. I assume you're all familiar with Schnorr signatures. We use a group of prime order, and we're given a generator of this group. To sign, we hash the message together with a random element of the group to get the commitment part of the signature, and then we compute the other part. And we have a verification equation which involves the hash function and the two parts of the signature. It's known that Schnorr signatures are existentially unforgeable in the random oracle model.
We know that this is not provable the same way in the standard model, but forging is plausibly still a difficult problem there. So this is a very basic signature scheme, and I want to look at how we can actually implement it as a white-box program and what happens when we try.

So this is the Schnorr signing procedure again. We take a random value k, exponentiate to get g^k, hash that together with the message to get the commitment part e of the signature, multiply e by x, the private key, and subtract that from k to get the other part, s = k - e·x. So an implementation of a Schnorr signature should resemble something like that, right? Except that everything inside this blue perimeter -- we don't know exactly how it's implemented. It does something functionally equivalent to that, but it could be obfuscated; we don't know exactly what it does.

But wait a minute. I said earlier that if we're given a program, we can simulate all the system calls, right? And here we're making system calls: we're calling a random number generator to get our value k. So if we take the textbook version of Schnorr, and if the program really makes a system call to get this k, we can just intercept that call and put in whatever value we choose. And then, knowing k, we can trivially recover the private key from just a single signature. So the textbook version of the Schnorr signature scheme cannot be implemented as is. We already need to make modifications to ensure that it's even realistic to try to have a white-box implementation of it. We can try to use a pseudorandom generator inside the program. But if its seed depends only on a system call, again we can intercept it: if we know the seed, we know k, and we need only one equation.
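To make the textbook scheme and this nonce-interception attack concrete, here is a sketch over a deliberately tiny toy group (p = 167, q = 83, g = 4; real parameters are of course far larger). The nonce k is passed in explicitly, modeling an attacker who intercepts the RNG system call; the hash is reduced into [1, q-1] so the toy challenge is always invertible.

```python
import hashlib

p, q, g = 167, 83, 4   # g generates the subgroup of order q in Z_p^*

def H(r: int, m: bytes) -> int:
    # Hash of (commitment, message), reduced into [1, q-1] so the
    # challenge e is always invertible mod q in this toy.
    d = hashlib.sha256(r.to_bytes(4, "big") + m).digest()
    return int.from_bytes(d, "big") % (q - 1) + 1

def sign(x: int, m: bytes, k: int):
    # k is the nonce; in the textbook scheme it comes from the system RNG.
    r = pow(g, k, p)           # commitment g^k
    e = H(r, m)                # challenge
    s = (k - e * x) % q        # response
    return e, s

def verify(y: int, m: bytes, sig) -> bool:
    e, s = sig
    r = (pow(g, s, p) * pow(y, e, p)) % p   # g^s * y^e = g^k
    return e == H(r, m)

x = 17                  # private key
y = pow(g, x, p)        # public key g^x
sig = sign(x, b"hello", k=29)
assert verify(y, b"hello", sig)

# The attack from the talk: intercept the RNG call, so k = 29 is known.
# From s = k - e*x it follows that x = (k - s) * e^{-1} mod q.
e, s = sig
recovered = ((29 - s) * pow(e, -1, q)) % q
assert recovered == x
```

One known nonce and one signature suffice, which is exactly why the textbook scheme cannot survive in the white-box setting.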
If we don't know the seed but can run the program on two different messages with the same seed, we get two equations and can recover the key as well. The program could also derive k using a PRF on the message m alone, but that's no good either: if you can extract this PRF, then the same attack applies again. So the only solution that seems possible is that the program doesn't make any call to an external source at all, but computes the randomness it needs pseudorandomly from both the message and the private key. In that case we can assume that, yes, a white-box secure implementation could exist.

Now, white-box cryptomania -- what do I mean by that? White-box cryptomania is the place where the program that signs with an embedded secret key is safe and cozy. It's safe and cozy in the sense that we have this security game: I'm facing an adversary; I generate a random key pair; I give the adversary the public key as well as a program that embeds the secret key; and I ask the adversary to send me back the secret key. If we're in cryptomania, we believe that this is hard to do. So what we mean by cryptomania is that if we define this game, played between the challenger and the adversary, at some point we can claim that there is no efficient adversary that succeeds. We don't know exactly how, but we assume that at some point -- maybe in two years, maybe in five years, maybe never -- we have a security proof, and we can prove that such an attacker essentially does not exist: it cannot be efficient and have a non-negligible probability of success. So, once again using the techniques of provable security, we have created an implementation that is provably secure, right? We're in cryptomania, so we've solved the problem.
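Coming back to the one workable option identified above -- deriving the nonce pseudorandomly from both the private key and the message, inside the program -- here is a sketch in the spirit of RFC 6979 / EdDSA, over the same toy Schnorr group. The derivation below is illustrative, not the actual RFC 6979 procedure.

```python
import hashlib

p, q, g = 167, 83, 4   # toy Schnorr group: order-83 subgroup of Z_167^*

def H(r: int, m: bytes) -> int:
    d = hashlib.sha256(r.to_bytes(4, "big") + m).digest()
    return int.from_bytes(d, "big") % (q - 1) + 1

def derive_k(x: int, m: bytes) -> int:
    # Pseudorandom in BOTH the private key and the message, computed
    # entirely inside the program: there is no randomness system call to
    # intercept, and two runs on the same message reuse the same k, so
    # repeated executions leak nothing new.
    d = hashlib.sha256(b"nonce" + x.to_bytes(4, "big") + m).digest()
    return int.from_bytes(d, "big") % (q - 1) + 1   # k in [1, q-1]

def sign(x: int, m: bytes):
    k = derive_k(x, m)
    r = pow(g, k, p)
    e = H(r, m)
    return e, (k - e * x) % q

def verify(y: int, m: bytes, sig) -> bool:
    e, s = sig
    return e == H((pow(g, s, p) * pow(y, e, p)) % p, m)

x, y = 17, pow(g, 17, p)
assert sign(x, b"msg") == sign(x, b"msg")   # deterministic signing
assert verify(y, b"msg", sign(x, b"msg"))
```

Distinct messages get distinct-looking nonces, and the attacker can no longer inject or observe k through the program's environment.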
So now it means essentially that we can distribute in the wild programs that are safe, because you cannot extract the secret keys from them. It means basically that we have solved a problem which is enormous in this industry: having tamper-proof software. We could get rid of the whole immense zoo of secure elements and trusted hardware and essentially replace them with software, because by some magical means we found a way to prove that extraction is hard, so we essentially get the same security. White-box cryptomania assumes that, right? It's an assumption. I'm not telling you how to prove it, or how to design the white-box compiler so that you can prove that this is hard. I'm just assuming that we have a compiler for which we have a proof that this is hard.

Before I go on, we need the classical security notions for signature schemes. There are plenty of different security notions, but there are at least these nine, crossing three adversarial goals with three attack models. The weakest one is unbreakability under a key-only attack (UBK-KOA), meaning you're given the public key and asked to find the secret key. Then you have unbreakability under a known-message attack (UBK-KMA), where you're given message-signature pairs, and UBK-CMA, where you're given an oracle that signs whatever you want. That's the column where all the adversaries try to recover the key. Then we have adversaries that try to forge a signature on a randomly chosen message (universal forgery), and existential forgery, where the adversary can choose the message. If we look at Schnorr signatures, we know that UBK-KOA is hard -- it's exactly the discrete logarithm problem -- and we also know that the other two unbreakability notions are fulfilled under the one-more discrete logarithm assumption. But these notions are not good enough to capture what we mean by a white-box secure implementation. We need to introduce something more: known-program attacks.
So in a known-program attack -- it's what I described before -- instead of providing the adversary with a signing oracle, we provide it with a program. This gives unbreakability under a known-program attack (UBK-KPA): I give one program, and I expect the adversary to output the signing key. The first thing you can see is that, defined that way, unbreakability under a known-program attack has an obvious connection with unbreakability under a chosen-message attack. Why? Because if, as an attacker, you can extract the secret key from the public key given just oracle access to the signing function, then in particular, if you're given a program that signs, you can do the same thing, using the program in a black-box fashion, the same way you would make a call to the oracle. I'm not saying anything fancy here; it's very simple, right? So we have a reduction that plays the role of the UBK-KPA attacker: you give it a program, and the reduction just uses that program as an oracle to simulate the signing oracle toward the adversary. So we can convert a UBK-CMA adversary into a UBK-KPA adversary.

But now we come to the point where we need to make precise what we mean by white-box cryptomania. A priori, giving the adversary the program gives it more power than oracle access, so UBK-KPA security could be strictly harder to achieve than UBK-CMA security -- we may be losing something when switching from oracle access to the signing function to program access to it, if you will. In cryptomania, we assume that we're not losing anything, meaning these two notions are equivalent. In that world, there must also be a reduction in the other direction: a security reduction that takes a UBK-KPA attacker and converts it into a UBK-CMA attacker.
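The easy direction just described -- using the program strictly as an oracle -- can be written down in a few lines. This is a sketch with illustrative names (`reduction`, `ubk_cma_adversary`); the stand-in objects below only demonstrate the plumbing.

```python
def reduction(pk, program, ubk_cma_adversary):
    # We play the UBK-KPA attacker: given a white-box signing program,
    # we expose it to a UBK-CMA adversary as if it were a remote oracle.
    def signing_oracle(message):
        return program(message)   # strictly black-box use of the program
    return ubk_cma_adversary(pk, signing_oracle)   # hopefully: the key

# Toy demonstration with stand-in objects:
toy_program = lambda m: ("sig", m)               # the white-box signer
toy_adversary = lambda pk, sign: sign(b"probe")  # makes one oracle call
assert reduction("pk", toy_program, toy_adversary) == ("sig", b"probe")
```

Nothing about the program's internals is ever inspected, which is exactly why a key extractor that only needs oracle access also works when handed a program.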
So there must be a reduction that does exactly that: you give it a public key and access to a signing oracle; it runs an adversary that, given a public key and a program, returns the secret key; and the reduction completes the game and returns the secret key, with a success probability depending on the success probability of the adversary. We assume some reduction of that nature exists -- that's what white-box cryptomania means here.

So now we can actually build a meta-reduction. If we forget about the "white" part, what we have here is exactly the same as the previous slide: we assume there exists a reduction R that takes the UBK-KPA adversary and converts it into a UBK-CMA adversary. Under that assumption, we can build a meta-reduction that simulates the adversary toward the reduction, simulates the challenger, and solves the following problem: the meta-reduction M is given a random public key and access to a signing oracle, and will output a program. The way it works is very easy to see. M just launches R and forwards the public key to R; every time R makes an oracle call to the signing function, M forwards the query to its own signing oracle, and the same for the reply. At some point, M receives the input intended for the UBK-KPA adversary, which is a public key and a program. What happens then is that M just stops and outputs that program. Does that make sense? Yeah, everybody seems to be staring at the screen. Okay, there are a few more details in there, but take my word for these.
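Here is a skeleton of the meta-reduction M in illustrative Python (the reduction `R` and its calling convention are hypothetical stand-ins): M forwards the public key and all signing queries, and the moment R hands a program to its simulated UBK-KPA adversary, M captures it and halts.

```python
def meta_reduction(pk, signing_oracle, R):
    captured = {}

    def simulated_adversary(pk2, program):
        # M plays the UBK-KPA adversary toward R: instead of extracting
        # the key, it captures the program and stops R's execution.
        captured["program"] = program
        raise StopIteration

    try:
        # R expects (public key, signing oracle, UBK-KPA adversary);
        # R's signing queries go straight through to our own oracle.
        R(pk, signing_oracle, simulated_adversary)
    except StopIteration:
        pass
    return captured["program"]   # a signing program built from oracle access

# Toy demonstration: a "reduction" that simply hands over a program.
toy_R = lambda pk, oracle, adv: adv(pk, "PROGRAM")
assert meta_reduction("pk", None, toy_R) == "PROGRAM"
```

The interesting part is what this implies: M has converted mere oracle access to the signing function into a standalone signing program.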
So basically, what this means is that if we assume white-box cryptomania, we also have a meta-reduction that transforms access to a signing oracle into an actual program that does the same thing -- and that seems a very, very strong statement. There's a small problem with what I said, though: I assumed that the public key given to the adversary was the same as the one we forwarded to the reduction, and in practice it could be different. The reduction could apply some transformation to the public key and provide a program that actually signs with respect to a different public key, and then the meta-reduction would not work, because we're given a public key and access to a signing oracle for the signing key matching that public key, while the reduction -- we don't know how it works, we just launch it -- may give us a different public key and a different program. So what can we do? We could return that program, but we don't know how it relates to solving any problem with respect to the initial public key.

This is where we need algebraic programs. We're going to assume that the reduction is algebraic. Algebraic is a very easy notion: it just means that when an algorithm that is algebraic over a group outputs a new group element, that element must have been built from previously seen group elements, with exponents that can be reconstructed given the precise code of the algorithm.
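A toy illustration of this algebraic notion, in the order-83 subgroup of Z_167^* (all names illustrative): whatever group element the algorithm outputs, an extractor with access to its code can recover exponents expressing that element over the inputs.

```python
p = 167   # the group is the order-83 subgroup of Z_167^*

def algebraic_algorithm(inputs):
    # Outputs a new group element built from its inputs; being algebraic
    # means the representation (the exponents 5 and 7 here) can be
    # reconstructed by an extractor that inspects the algorithm's code.
    g1, g2 = inputs
    z = (pow(g1, 5, p) * pow(g2, 7, p)) % p
    return z, [5, 7]   # the element and its extracted representation

g1, g2 = 4, 16
z, exps = algebraic_algorithm([g1, g2])
assert z == (pow(g1, exps[0], p) * pow(g2, exps[1], p)) % p
```

A generic algorithm is restricted to handling group elements abstractly; an algebraic one may compute however it likes, as long as its group-element outputs come with such a representation, which is why the model is more permissive.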
You can look at it like this: we have an algorithm; you run it; among its inputs there are a number of group elements. If at some point the program outputs a group element, then, given the code that was executed between input and output, there is a way -- just by using that code -- to extract the exponents showing how that element was built by composing the previously seen elements, to reconstruct exactly the exponents that were used. Essentially all the security reductions that we know of are algebraic. It's a notion that is very simple to describe, but it's actually more powerful than the generic group model. Okay. So now we come back to our problem with the reduction, which is: we're given a signing