Hello, my name is Mark Zhandry, and today I will be talking about white box traitor tracing. Let us first recall the goal of traitor tracing. Here we have a content distributor who is broadcasting encrypted messages to a large set of recipients, who each have their own personalized secret key. We are concerned that an unscrupulous user, called a traitor, may leak their key to an unauthorized user, thereby allowing the unauthorized user to read the encrypted messages. There is no way to really prevent a user from becoming a traitor. However, if the content distributor comes across the leaked key, we want the content distributor to be able to identify the user it came from. Then the content distributor can take remedial actions, such as prosecuting the traitor or at least revoking the traitor's credentials. In this way, traitor tracing serves to deter piracy. I will now discuss some of the main features one may want out of a traitor tracing system. First and foremost, the traitor may not simply output their key. Instead, they may embed the key in a tamper-resistant hardware device, or potentially an obfuscated decoder program, in an attempt to hide their identity from the tracer. Another important feature is collusion resistance, which means that we can identify a traitor even if multiple traitors collude in producing the pirate key or decoder. We would naturally want to use minimal computational assumptions in achieving our goal. For example, we would hope to rely just on public key encryption, as opposed to heavy algebraic tools. Next, we would ideally be able to trace decoders that don't work all the time, or which don't perfectly recover the message. For example, in a movie streaming scenario, you could imagine a decoder outputting a low-quality version of the movie instead of the original high-quality version. Finally, we would ideally have public tracing, which allows anyone, not just the content distributor, to trace.
This provides for maximal deterrence, since anyone, including the unauthorized user, will immediately learn the traitor's identity from any decoder program or device. There are also advantages of public tracing for prosecuting traitors, but we will not discuss them. Throughout this talk, private tracing, as opposed to public tracing, will mean that only the content distributor can trace; public tracing means anyone can trace. One feature that has largely been taken for granted in prior works is the tracing model. You can imagine the decoder to be an actual hardware device, which may be difficult to break open or inspect. Even with a software decoder, it may be difficult to analyze, since program code is notoriously hard to reverse engineer. As such, the vast majority of the traitor tracing literature works in a black box tracing model, where the tracer only makes queries to the decoder and sees the responses, without having to know how the decoder actually works. This black box tracing model is so ingrained in the literature that it is written explicitly into the syntax of essentially all prior traitor tracing definitions. There is basically one exception to the black box model: some early schemes did in fact look inside the decoder itself. But all such schemes required that the traitor actually output a valid key for the system, as opposed to a general decoder program. And importantly, in all prior schemes that did not use black box tracing, you could easily defeat tracing by outputting a general decoder program rather than a key. So all of the prior tracing that works for general decoder programs operates in the black box model. Now, there is a trivial traitor tracing solution, which gives each user their own ciphertext component and decryption key. This scheme does really well according to all the features we've discussed so far.
It's collusion resistant, requires minimal assumptions, allows for tracing arbitrary imperfect decoders, and gets public tracing as well as black box tracing. The main problem is that the ciphertext, as well as the public keys, are quite large: they grow linearly with the number of users, and this can become very impractical even for a modest number of users. The bulk of the traitor tracing literature has therefore focused on producing schemes with shorter parameters, in particular shorter ciphertexts. There have been a number of improvements over this trivial scheme, often using algebraic tools to compress terms, to the point where we now have asymptotically optimal schemes under certain assumptions such as learning with errors. This talk will instead focus on a limitation of prior work which has nothing to do with parameter sizes. To start, I will recall a previous work from 2016 with my co-authors Ryo Nishimaki and Daniel Wichs. Here, we proposed using public tracing plus setting each user's identity to contain super sensitive information, say a bank account number or something like that. This way, the moment a traitor hands a decoder to an unauthorized user, that user can trace using the public tracing algorithm and then learn the sensitive information of the traitor. The result is that the only way a traitor can distribute their key is to also distribute their sensitive information. Therefore, such a mechanism would be a very good deterrent, even in a situation where the content distributor may never come across the pirate decoder. The problem is that if the identity embedded in the secret key is very sensitive, an honest user would clearly want to keep it secret. In that prior work, we proposed keeping the identity hidden from the content distributor by using a multi-party computation protocol to generate the user's secret key. Based on the guarantees of multi-party computation, this indeed keeps the identity hidden during the setup phase.
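The trivial linear-size scheme mentioned a moment ago can be sketched in a few lines. This is a toy illustration only: XOR with a per-user random pad stands in for real public-key encryption, and all names are hypothetical.

```python
import secrets

class TrivialTraitorTracing:
    """Toy sketch of the trivial linear-size scheme: each user gets an
    independent key, and every ciphertext carries one component per user.
    XOR with a random per-user pad stands in for real public-key
    encryption -- illustration only, no actual security."""

    def __init__(self, n_users):
        self.keys = [secrets.randbits(64) for _ in range(n_users)]

    def encrypt(self, msg):
        # One component per user: ciphertext size grows linearly in n.
        return [msg ^ k for k in self.keys]

    def decrypt(self, user, ct):
        # User i only ever reads slot i.
        return ct[user] ^ self.keys[user]

    def trace(self, decoder):
        """Black box linear tracing: garble one slot at a time; the slot
        whose corruption breaks the decoder identifies the traitor."""
        msg = secrets.randbits(64)
        for i in range(len(self.keys)):
            ct = self.encrypt(msg)
            ct[i] = secrets.randbits(64)  # corrupt user i's component
            if decoder(ct) != msg:
                return i
        return None
```

Tracing here is black box and works one slot at a time; the cost is that every ciphertext carries one component per user, which is exactly the linear growth the literature has worked to eliminate.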
In this work, we take things a step further and consider keeping the identity hidden even after setup, and moreover keeping the identity hidden from potentially other users of the system, besides the content distributor. Consider, for example, an encrypted group chat application where the users are encrypting their messages using a traitor tracing scheme with embedded sensitive identities. Now consider a malicious user who can send a message to the chat and see how an honest user responds. In essence, they can mount a chosen ciphertext, or CCA, attack against the honest user, and can potentially try to use such an attack to learn the sensitive information of that user. Ensuring privacy of the honest users in this attack scenario will therefore be the main focus of this talk. Our first result is an impossibility showing that, with CCA attacks and black box tracing, anyone who is capable of tracing can break privacy of the honest users. In the case of private tracing, where only the content distributor can trace, this means that the content distributor can use CCA queries against an honest user to learn their identity. In the case of public tracing, which is really what you'd want for maximal deterrence, this means that anyone can trace, and therefore anyone can use CCA queries to learn the identity of honest users. The proof of this impossibility is simple. Anyone who can trace with a black box tracing algorithm can just mount the tracing algorithm over the network. The tracing guarantee means that the adversary learns the user's identity from the user's answers to the CCA queries. We note that tracing usually only needs to know whether or not a user decrypts, so we don't even need the full power of CCA attacks. We just need to know whether a user decrypts a given ciphertext or not in order to mount the attack.
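The impossibility argument can be made concrete with a toy sketch: a tracer that only needs to observe whether decryption succeeded can be run by a remote attacker, with the honest user's responses to chosen ciphertexts playing the role of the tracing oracle. The XOR-based linear scheme below is a hypothetical stand-in for any scheme with black box tracing (in a real public-tracing scheme, the attacker's queries would be formed from public parameters rather than key material).

```python
import secrets

# Toy linear scheme (XOR in place of real encryption) standing in for any
# traitor tracing scheme with a black box tracer. Names are hypothetical.
N = 8
keys = [secrets.randbits(64) for _ in range(N)]

def honest_user_responds(uid, ct, expected):
    """Models the honest user over the network: all the attacker observes
    is whether the user's decryption of ct came out as expected."""
    return (ct[uid] ^ keys[uid]) == expected

def cca_identity_attack(victim_responds):
    """The attacker simply runs the black box tracer, using the victim's
    responses to chosen ciphertexts as the tracing oracle."""
    m = secrets.randbits(64)
    for i in range(N):
        ct = [m ^ k for k in keys]
        ct[i] = secrets.randbits(64)     # garble slot i only
        if not victim_responds(ct, m):   # decryption fails iff i is the victim's slot
            return i                     # identity leaked
    return None
```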
In the public tracing setting, what this means is that even an outside user, who doesn't have any secret keys at all, can potentially mount the attack. All they have to do is see whether a user responds to their messages; they don't actually need to see the response itself, and they can nevertheless mount the attack. For our first positive result, we give a feasibility showing how to obtain public tracing and simultaneously preserve user privacy, even under these chosen ciphertext attacks. In light of our impossibility result, this necessarily requires the use of white box tracing, meaning that the tracing algorithm actually has to inspect the actual program code of a decoder program; it cannot just trace by making queries to the program. In terms of building blocks, our scheme requires functional encryption and non-interactive zero-knowledge proofs, or NIZKs. Different assumptions can be used to instantiate the functional encryption scheme, resulting in different parameter size trade-offs. In particular, if we don't care about parameter sizes, we can use generic public key encryption together with NIZKs to achieve our scheme. If we want optimal parameters, we can use indistinguishability obfuscation. I will now discuss the high-level idea behind our proof of this theorem. The proof idea is to use something called an unobfuscatable program. These were defined by Barak et al., they were originally used in the context of proving an impossibility for strong forms of obfuscation, and they can be constructed from minimal computational assumptions. What is an unobfuscatable program? Unobfuscatable programs are programs with two properties. The first is that they are non-learnable. This means that there is no efficient way to recover the program just given black box access to its functionality. So if you have an adversary that can only make queries to the program, it will not be able to produce any code for the program.
The second property is reverse-engineerability, which says that given any code that has the same functionality as the original program, even if it's been obfuscated or adversarially manipulated in some way, there is an automated process to actually recover the original program code in its entirety. So superficially, unobfuscatable programs seem to be very analogous to what we need, which is the ability to do something with code, namely trace, that you can't do given black box access. But perhaps it's not immediately clear how to actually turn these programs into a traitor tracing scheme with decryption functionality, collusion resistance, et cetera. So here's a failed attempt to try to build a privacy-preserving traitor tracing scheme from an unobfuscatable program. We'll assume any traitor tracing scheme, say from the prior literature, which may have black box tracing. The idea is to set the identity of a user to be the code of some unobfuscatable program. This unobfuscatable program may have the desired sensitive information of that user embedded, for example, into the comments of the program code. This way, if you can recover the unobfuscatable program code, including comments, you would learn the sensitive information. And so the hope is that by setting the identity to be this program code, you can then trace and recover the sensitive information. The problem, of course, is that prior works use black box tracing. So what a remote adversary can do is simply trace over the network to recover the unobfuscatable program. They actually recover the program code, by the guarantees of the traitor tracing scheme. And once they have the program code, they themselves can also see the comments. Even if you somehow obfuscated the code, you could still reverse engineer it to learn the original program code with the comments embedded, and therefore learn the sensitive information.
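To fix the shape of the two algorithms, here is a deliberately insecure toy family: each program is P_a(x) = x XOR a, and any functionally equivalent implementation determines the original source, sensitive comment included. Note this toy is fully learnable from black box queries, which is exactly the non-learnability the real Barak et al. construction adds; it only illustrates the reverse-engineering interface. All names are hypothetical.

```python
import secrets

SENSITIVE = "# embedded sensitive info goes here"

def sample_program():
    """Sample P_a(x) = x ^ a from the toy family, as source code that
    carries a sensitive comment."""
    a = secrets.randbits(32)
    return f"lambda x: x ^ {a}  {SENSITIVE}"

def run(source, x):
    # Execute the program; the comment is inert at runtime.
    return eval(source.split("#")[0])(x)

def reverse_engineer(equivalent_impl):
    """Given ANY functionally equivalent implementation -- obfuscated or
    adversarially rewritten -- recover the original source in its
    entirety. For this toy family, a = P(0) determines everything."""
    a = equivalent_impl(0)
    return f"lambda x: x ^ {a}  {SENSITIVE}"
```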
OK, so our solution is to embed the program more carefully into the traitor tracing system. And we do this using a new object we call function-embedded traitor tracing. Here, we embed a function into a user's key, as opposed to an identity, and augment the tracing procedure to additionally take as input an input to the function. The tracing algorithm won't learn the function, and instead will just allow for evaluating the function on the given input. Importantly, the tracing algorithm never learns any code for the function. And this is actually a security requirement: the tracer cannot learn the code for the function just given black box access. All they can do is evaluate it. We show how to construct such an object from functional encryption and NIZKs. Basically, prior work already uses functional encryption to build traitor tracing, and we augment these works with an evaluation functionality. NIZKs come in to guarantee that the tracer can't learn the function F itself, and I'll elaborate on this a little more in a couple of slides. All right, so once we have a function-embedded traitor tracing scheme, we then set the function to be an unobfuscatable program. And the result is a white box traceable scheme with public tracing and user privacy. The idea for tracing is that any code for decryption can be turned into code for evaluating the embedded function F using the tracing algorithm. And then, once you have code for evaluating the function F, you can use the reverse-engineerability of the unobfuscatable program to recover the original code, which then contains the sensitive information as a comment. Note that for our construction, it is totally fine if the function-embedded traitor tracing system uses black box tracing. And in fact, our construction leverages the existing work, which uses black box tracing.
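As a structure-only sketch, the interface of function-embedded traitor tracing might look as follows. The dict of functions is a placeholder with no security whatsoever; a real instantiation would come from functional encryption plus NIZKs. All names are hypothetical.

```python
class FunctionEmbeddedTT:
    """Interface sketch of function-embedded traitor tracing: a user's key
    embeds a function f rather than a plain identity, and tracing returns
    evaluations of f, never f's code. The dict below only fixes the API."""

    def __init__(self):
        self._embedded = {}    # uid -> embedded function f (code never exposed)

    def keygen(self, uid, f):
        # Issue a key embedding the function f for user uid.
        self._embedded[uid] = f
        return ("sk", uid)     # placeholder secret key

    def make_pirate_decoder(self, uid):
        # Stand-in for a decoder built from user uid's leaked key.
        return ("decoder", uid)

    def trace(self, decoder, x):
        """Tracing additionally takes an input x and returns f(x) for the
        function embedded behind the decoder: black box evaluation of f."""
        _, uid = decoder
        return self._embedded[uid](x)
```

In the construction, f is then set to an unobfuscatable program: tracing turns a working decoder into code for evaluating f, and reverse-engineerability recovers f's original source, sensitive comment and all.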
And you can view our conversion as nevertheless upgrading a black box tracing scheme into a white box tracing scheme that ensures user privacy. OK, so let's now take a quick detour to discuss a limitation of functional encryption that arose when trying to use it to build our traitor tracing system. Consider the canonical motivating example for functional encryption, which is a spam filter on a remote email provider. Here, a user wants to outsource email management to a remote provider, but the remote provider needs to figure out how to route spam messages to trash. The obvious solution is to have the provider just hold the decryption key, allowing it to decrypt the messages and apply the spam filter on the plaintext. But the user would like to keep the messages hidden, even from the provider. So what the user will do is employ a functional encryption scheme, and give a special secret key corresponding to the spam filter to the email provider. This key will allow the provider to learn the result of applying the spam filter to the contents, but it won't actually reveal anything else about the contents, besides the single bit of whether the spam filter decided the contents were spam or not. This will allow the provider to route the messages accordingly, but will maintain the privacy of all other information in the message. But now we ask: what about security against the spammer? Concretely, the spammer may try to learn how the spam filter works in order to design a ciphertext that can circumvent the filter. In particular, you can imagine the spammer sending various ciphertexts and then trying to learn whether or not the spam filter classified them as spam, by observing, for example, whether the user clicked on a link in the email. If the user clicked on a link, then the spammer is reasonably confident that the email circumvented the spam filter. And if the user doesn't click on the link, then the spammer may hypothesize that the spam filter caught the message.
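The spam filter scenario, and what the spammer observes, can be sketched as follows. These are toy stand-ins with no actual functional encryption, and the filter's contents are hypothetical.

```python
# Hypothetical filter internals -- precisely what black box function
# privacy is meant to keep hidden from the spammer.
BLOCKLIST = ("win money", "free pills")

def spam_filter(message):
    return any(phrase in message for phrase in BLOCKLIST)

def provider_route(message):
    """The provider holds an FE key for spam_filter: per ciphertext it
    learns only the single spam/not-spam bit, modeled here by applying
    the filter to the (notionally encrypted) message."""
    return "trash" if spam_filter(message) else "inbox"

def spammer_observation(message):
    """The spammer's attack boils down to this black box query: send a
    message and observe whether it reached the inbox, e.g. via a link
    click. Black box function privacy says this bit is ALL it can learn."""
    return provider_route(message) == "inbox"
```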
So basically, the spammer in this setting is mounting a CCA attack against the provider's secret key in an attempt to learn about the spam filter. And the natural question is: what can the adversary learn through such an attack? Certainly, it's unavoidable that the spammer can make black box queries to the spam filter. Given a message, you can just encrypt it, send it to the spam filter, and see if it gets caught by the filter or not. But the usual notions of security for functional encryption actually potentially allow the spammer to learn a lot more than just black box access. They potentially allow the spammer to learn the actual code of the spam filter. As the Barak et al. unobfuscatable functions show, knowing the code may reveal more information about the spam filter than just having black box access. We therefore propose a new notion of black box function privacy for functional encryption to capture this scenario. Essentially, black box function privacy guarantees that the spammer is limited to just black box queries to the spam filter, and can't learn anything beyond what can be learned by black box queries. And we note that this notion is also needed for our traitor tracing application, to ensure that CCA attacks don't reveal the code of the embedded programs and only allow the programs to be queried on function inputs. We show a simple transformation from any functional encryption scheme to one with black box function privacy, and this is where the NIZKs come in. All right, so that summarizes our privacy result. We now briefly discuss another potential limitation of the prior traitor tracing literature that we explored in this work. To motivate things, consider a typical modeling assumption in the MPC literature, which assumes a reliable broadcast channel. This means any message sent to the broadcast channel is identically received by all parties. Why is such a model important?
Well, if you have an unreliable channel, some users may know the adversary is malicious, but others may think the adversary is honest, and this would break the guarantees of many MPC protocols. We note that some works have developed protocols with point-to-point communication, but typically at the cost of many more rounds. We now ask: what if the broadcast channel is encrypted under a traitor tracing scheme? We will assume the ciphertexts themselves are sent over a reliable broadcast channel, but the question then becomes: will the virtual plaintext channel remain consistent? And we show that for existing traitor tracing schemes, the answer is no. In particular, we show an impossibility: if you have black box public tracing, this allows anyone to compute ciphertexts with inconsistent decryptions. Basically, since the tracer learns the identity of a user just from queries in the black box setting, different users must respond to these queries differently, thereby having inconsistent decryptions. On the other hand, we give a partial positive result showing how to use fully homomorphic encryption and lockable obfuscation to achieve consistency. The limitation of this result, however, is that we are only able to obtain tracing under a constant number of collusions, and the efficiency of our scheme is exponential in this collusion bound. Note that even in the setting with a constant number of collusions, however, white box traitor tracing is still necessary, per our impossibility. Our protocol and proof for this are pretty complicated. But the basic idea is to start with a scheme where tracing requires a certain secret. This prevents users of the system from running the traitor tracing algorithm themselves to learn the inconsistent decryptions, because only the person who has the secret can run tracing. But we want to allow public tracing. So what do we do?
We encrypt the secret under the fully homomorphic encryption scheme, and have the tracer then perform the tracing algorithm homomorphically using the decoder program. Note that this homomorphic tracing cannot be accomplished with just black box access, because you actually need the program code in order to do the homomorphic operations. The problem, though, is that the results of tracing remain encrypted under the FHE scheme. So to get the results of tracing in the clear, we need to use lockable obfuscation, and due to a number of subtle issues that come up in making everything work, this outline appears to be stuck at handling a constant collusion bound. We therefore leave improving our results, or showing an impossibility for fully collusion-resistant consistent tracing, as an interesting open question. I'll now conclude with an interesting direction for future work. Note that traitor tracing can be seen as a special case of a more general problem of watermarking software, where traitor tracing is the special case of watermarking decryption programs. All prior watermarking results, analogous to traitor tracing, use black box tracing algorithms to extract the watermark. As a consequence, similar privacy and consistency issues may arise in the more general watermarking setting. So an interesting direction is to explore white box techniques in this more general setting. Another interesting direction: it is known that some programs cannot be watermarked. For example, unlearnable, or even unobfuscatable, programs cannot be watermarked. On the other hand, we have a number of positive results for watermarking, but there is a large gap between these positive and negative results. So an interesting question is whether white box tracing can allow for watermarking more general programs than is possible using just black box tracing. All right, this concludes my talk. Thank you for listening.