This talk is about traceable PRFs, a form of traitor tracing for pseudo-random functions, and is joint work with Rishabh, Sam, and Brent. Software watermarking has been proposed as a mechanism for proving software ownership and protecting against the unauthorized distribution of software. Typically, in a watermarking scheme, a user can embed a tag or a mark within a program so that the resulting marked version of the program is functionally equivalent to the original program. Here, the mark could be a username, a serial number, or some other kind of identifier. The security requirement is that if an adversary manages to obtain a marked version of the program and tries to remove the mark, then in doing so, the adversary necessarily destroys the functionality of the program. More precisely, a watermarking scheme consists of two main algorithms: a marking algorithm that takes as input a Boolean circuit C and a mark m, and outputs a marked version of the circuit; and an extraction algorithm that, given a circuit C', extracts the mark from the underlying circuit. The two requirements on a watermarking scheme are, one, functionality preservation, which essentially says the marked version of the circuit and the original circuit behave identically in terms of their input-output behavior on almost all inputs in the domain of the function. The second requirement is unremovability, a security requirement which says that an adversary, given a marked program C', will not be able to construct a new program C* such that C* and C' compute exactly the same functionality on most inputs, and yet C* does not contain the mark with respect to the extraction algorithm. So essentially, an adversary either has to construct a new program that is very different from the marked program, or the adversary will produce a program that is similar to the original program but which preserves the watermark.
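To pin down the syntax just described, here is a minimal Python sketch of the two-algorithm interface. This toy version is purely illustrative and offers no unremovability at all (the mark is stored as a plain attribute); all names are my own, not from the paper.

```python
# Toy watermarking interface: Mark(C, m) -> C' and Extract(C') -> m.
# Illustrative only -- this "scheme" has no security whatsoever.

def mark(circuit, mark_id):
    """Mark(C, m): return a marked circuit, functionally equivalent to C."""
    def marked(x):
        return circuit(x)          # toy: identical input-output behavior
    marked._mark = mark_id         # toy: the "embedding" is just an attribute
    return marked

def extract(circuit):
    """Extract(C'): recover the embedded mark, or None if there is none."""
    return getattr(circuit, "_mark", None)
```

Functionality preservation here is trivial (the marked circuit forwards every query), which is exactly what a real scheme must achieve while also making the attribute impossible to strip.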
These two requirements together imply that watermarking is only achievable for functionalities that are not learnable. As such, the study of software watermarking has so far primarily focused on watermarking cryptographic functions, a natural class of non-learnable functions. Many previous works have in fact focused on watermarking pseudo-random functions, or PRFs, where essentially the program has hardwired inside it the key for a pseudo-random function, and on input x will simply output the evaluation of the pseudo-random function on x. This is useful because pseudo-random functions are the workhorses of symmetric cryptography, and so once we can watermark a pseudo-random function, we can use it to watermark the signing key for a message authentication code, or the decryption key in a symmetric encryption scheme. The starting point of our work is to revisit the security definitions underlying these kinds of software watermarking schemes, especially in the setting of pseudo-random functions. Let us take a look at the unremovability requirement in greater detail. Under the current formulation of unremovability, we require security to hold against any adversarial strategy that preserves the functionality of the marked program, and we capture this notion of functional equivalence by requiring that the adversary's program preserve the exact input-output behavior of the program. So consider the case of a pseudo-random function, and consider in particular an adversarial strategy that produces a program that only outputs the first n/4 bits of the pseudo-random function: on any input x, it will evaluate the PRF at x and then truncate the output to keep only the first n/4 bits. Under the existing security notion, this strategy does not preserve the exact input-output behavior.
In fact, it preserves the exact input-output behavior on none of the inputs, so this is considered not functionality preserving, and as such, the definition of unremovability stipulates no guarantees on whether this new program contains the mark or not. In fact, all existing constructions of software watermarking for pseudo-random functions would be unable to recover the watermark from this kind of program. It turns out that this can be quite problematic in concrete applications. Imagine a scenario where we have a watermarkable PRF, and we want to use it to protect the decryption keys in a symmetric encryption scheme. In this case, a program that outputs the first n/4 bits of the pseudo-random function might completely break semantic security of any encryption scheme that depends on it, and yet it would not be possible to recover the mark from this decryption program, because under the existing definitions, it does not preserve functionality. Let's illustrate this with a simple example. On the left here, we have an image encrypted using a standard PRF-based encryption scheme, say counter mode, and now suppose we run our decryption program, which only recovers the first n/4 bits of each PRF output, on this encrypted image. Just by visual inspection, it is pretty clear what the image is encrypting, and for all intents and purposes, this program functions as a pretty good decryption algorithm. However, the watermarking security definitions provide no guarantees on whether we can extract a mark from this program. So if we are depending on the watermarking scheme to identify who leaked or compromised a particular key, or who the owner of a particular program is, then for such a decryption program, which essentially breaks security of the encryption scheme, we would not get any of the watermarking benefits.
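The truncation attack is easy to reproduce. Below is a small Python sketch, with HMAC-SHA256 standing in as an illustrative PRF in a counter-mode-style stream cipher (all names my own): the adversarial program agrees with the honest decryptor on no full PRF output, yet still recovers a quarter of every plaintext block.

```python
import hmac, hashlib

def prf(key, counter):
    # Illustrative stand-in PRF: HMAC-SHA256 of the block counter.
    return hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()

def encrypt(key, plaintext):
    # Counter-mode style: XOR each 32-byte block with a PRF keystream block.
    out = b""
    for i in range(0, len(plaintext), 32):
        pad = prf(key, i // 32)
        out += bytes(a ^ b for a, b in zip(plaintext[i:i+32], pad))
    return out

def truncated_decrypt(key, ciphertext):
    # Adversarial program: uses only the first quarter (8 of 32 bytes) of
    # each PRF output. It matches the real PRF on *no* full output, yet
    # recovers the first quarter of every plaintext block.
    out = b""
    for i in range(0, len(ciphertext), 32):
        pad = prf(key, i // 32)[:8]
        out += bytes(a ^ b for a, b in zip(ciphertext[i:i+32][:8], pad))
    return out
```

Run on an encrypted image, the output of `truncated_decrypt` is exactly the partially recovered picture described above.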
And this, I believe, illustrates why, when we consider protecting building blocks like pseudo-random functions, restricting the security definition to only allow adversarial strategies that preserve the exact input-output behavior is overly restrictive and does not capture the full range of potential attack strategies. Strategies that would be considered to have defeated the scheme in any realistic application are not permitted under the existing security model. More broadly, the problem with the existing definition is that it fundamentally ties the adversary's goals to the functionality requirements themselves, while typically in cryptography we try to decouple these. For instance, in the case of encryption, we require that the honest parties are able to recover the message in its entirety; but when we consider the security notion and what the adversary's goals are, we say that recovering any information about the message from the ciphertext constitutes a successful attack. Naturally, the same philosophy should also apply to software watermarking schemes when we consider primitives like pseudo-random functions. Here, exact functionality preservation seems like a very natural notion for capturing correctness, but it does not seem like the right notion for security. Namely, what we desire from a watermarking scheme for pseudo-random functions is that whenever the adversary produces a program that breaks the security of the primitive, the watermark should be preserved and should be extractable given access to such a program. This motivates the main primitive that we introduce in this work to address the shortcomings in existing definitions of security for watermarkable PRFs.
We introduce the notion of a traceable PRF, where essentially we require this marking security, or this unremovability guarantee, to hold not just against programs that preserve the exact input-output behavior, but more generally against any program that manages to break security of the pseudo-random function. This notion is very similar to the setting of traitor tracing, where we require that any program that is able to decrypt ciphertexts must necessarily preserve the embedded identifier, namely the mark. In the case of a traceable PRF, we require that any program that is able to break PRF security must contain the watermark, and the watermark should be extractable from any such program. We can model this more generally as follows. Suppose an adversary is given a marked version of a pseudo-random function, and this adversary produces a new circuit C. We say that the circuit C is good, or that it breaks pseudo-randomness, if essentially it can win the PRF security game: given input-output pairs from either the pseudo-random function or a truly random function, the circuit C is able to tell these two distributions apart. Now, if we look at this definition as formulated so far, there is actually a problem. It is problematic because the circuit C could just contain a single hard-coded input-output pair of the pseudo-random function. The circuit C is derived from the original pseudo-random function, so it contains information about its input-output behavior. So we construct a distinguisher that has a single value, the PRF at x*, hardwired inside it. It can always trivially distinguish these two distributions just by evaluating at x* and checking whether the response matches the pseudo-random value; if it does not match, we can conclude that we are interacting with a truly random function.
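As a quick illustration of this problem, here is a Python sketch of such a trivial distinguisher (HMAC-SHA256 again stands in as an illustrative PRF; names are my own): it hardwires one input-output pair and nothing else, yet wins the standard PRF game with advantage essentially 1.

```python
import hmac, hashlib

def prf(key, x):
    # Illustrative stand-in PRF.
    return hmac.new(key, x, hashlib.sha256).digest()

def make_point_distinguisher(key):
    # Hardwire a single input-output pair of the PRF...
    x_star = b"some fixed point"
    y_star = prf(key, x_star)
    def distinguisher(oracle):
        # ...and check the oracle against it. Output 1 means "this is
        # the PRF"; a truly random oracle matches y_star only with
        # probability 2^-256.
        return 1 if oracle(x_star) == y_star else 0
    return distinguisher
```

This distinguisher knows the PRF at one point only, which is why the definition is relaxed to require distinguishing from evaluations on random points.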
In this case, the circuit C is not very interesting, because it only contains information about the pseudo-random function at a single point, and in some sense it does not really capture the full capabilities of the PRF. So in order to make this definition meaningful and useful, we relax the security requirement and say that the adversary has to produce a circuit C that contains some kind of global structure of the PRF's behavior. Namely, the circuit C should be able to distinguish the pseudo-random function given only evaluations of the PRF on random points in the domain. In other words, we require that the distinguisher not merely break strong pseudo-randomness, but break weak pseudo-randomness. The adversary has to produce a circuit C that contains some global structure of the pseudo-random function, information about the outputs of the PRF on a large fraction of the domain. More formally, the syntax of a traceable PRF consists of the following four algorithms: a setup algorithm that samples the master key for the pseudo-random function; a key-generation algorithm that embeds a mark or an identifier, here denoted id, within the key; an evaluation algorithm that implements PRF evaluation; and finally a tracing algorithm that, given oracle access to some distinguisher, produces an identity or a set of identities that were used to construct that distinguisher. The properties we require are correctness, which says that functionality preservation holds: the unmarked master PRF key should implement roughly the same functionality as the marked version of the key. Pseudo-randomness requires that we still have a pseudo-random function, so evaluation using the PRF key induces a pseudo-random distribution.
And finally, we have the tracing security definition, which, as I described earlier, says that any adversary that manages to produce a useful distinguisher, one that breaks weak pseudo-randomness given input-output pairs on uniformly random inputs, must necessarily preserve the watermark. More formally, we model this as a game between an adversary and a challenger. The adversary starts by specifying a mark or an identity, it gets the secret key for that identity, and then it constructs a distinguisher. In the single-key setting, we only allow the adversary to request a single key. The requirement is that, as long as this distinguisher is useful, namely it breaks weak pseudo-randomness with non-negligible advantage epsilon, the tracing algorithm will successfully recover this identity with probability close to epsilon. One nice property of traceable pseudo-random functions is that we can directly use them to obtain traceable notions of other primitives built from pseudo-random functions. For instance, if we consider the standard construction of symmetric encryption from pseudo-random functions and we instantiate it with a traceable pseudo-random function, then we get a traceable encryption scheme. In fact, this gives us a secret-key traitor tracing scheme. I'll note here that this is not the case if we start with a watermarkable PRF, for the reasons I outlined at the beginning of this talk. So traceable PRFs are useful in the sense that they allow us to directly take an application that depends on pseudo-random functions and derive a traceable version of that application. With watermarkable PRFs, because they satisfy a much weaker security notion, these types of results do not apply. In terms of constructions, we show in this work that under the learning with errors assumption we can construct a single-key traceable PRF.
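To make the composition concrete, here is a Python sketch of the textbook PRF-to-symmetric-encryption construction (for messages up to one PRF output block; names are my own). If `prf_eval` is evaluation under a marked key of a traceable PRF, then any decryption box that breaks semantic security of this scheme is in particular a weak-PRF distinguisher, so the tracing guarantee carries over to it.

```python
import os

def encrypt(prf_eval, message):
    # XOR the message (up to one PRF output block) with the PRF evaluated
    # at fresh per-message randomness.
    r = os.urandom(16)
    pad = prf_eval(r)
    return r, bytes(a ^ b for a, b in zip(message, pad))

def decrypt(prf_eval, ciphertext):
    # Recompute the pad from the randomness and XOR it off.
    r, body = ciphertext
    pad = prf_eval(r)
    return bytes(a ^ b for a, b in zip(body, pad))
```

Nothing here is specific to the traceable PRF: that is the point, since any PRF-based application can be instantiated with a traceable PRF unchanged.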
In this single-key construction from LWE, tracing security holds against an adversary that obtains just a single key, and the resulting construction is in the secret tracing model: only an authority that knows a secret tracing key is able to extract the mark from a particular circuit. Then, assuming indistinguishability obfuscation and one-way functions, we can actually get a fully collusion-resistant traceable pseudo-random function, and this even supports public tracing, where any party is able to trace given just the public parameters of the scheme. Something noteworthy here is that the assumptions we rely on, in both the single-key setting and the collusion-resistant setting, are exactly the same as the assumptions we need for watermarkable PRFs. So even though we have dramatically strengthened the achievable security with this new definition of a traceable PRF, we can still obtain similar kinds of constructions from the same underlying assumptions, and they all rely on similar building blocks. However, there is no, say, black-box way of taking a watermarkable construction and translating it into a traceable construction. In this talk, I'm mostly going to focus on the construction based on LWE. To construct a traceable pseudo-random function, we are going to rely on a new intermediate notion that we call a private linear constrained pseudo-random function. This very much parallels the constructions of traitor tracing from a notion called private linear broadcast encryption: you can view private linear constrained PRFs as essentially the analog of private linear broadcast encryption in the PRF world. So let's define this notion more precisely. In a constrained PRF, the holder of the PRF key can constrain it with respect to some constraint, here again modeled as a Boolean circuit C, to produce a constrained key.
The constrained key can in turn be used to evaluate the pseudo-random function at all points x in the domain that satisfy the constraint. In the case of a linear constrained family, the input points are associated with an index t, which you can view as a number between 0 and 2^ℓ. Each constrained key is also associated with an identifier, here a number between 0 and 2^ℓ - 1. We say that it is a linear constrained family when a constrained key with identifier id can be used to evaluate on exactly the domain elements whose index is at most the identifier associated with the key. We say that a linear constrained PRF is a private linear constrained PRF if the index associated with a particular domain point is hidden. Finally, there is a sampling algorithm that can be used to sample inputs with a specified index: an index is a number between 0 and 2^ℓ, and the sampling algorithm takes a particular index and samples an element in the domain of the PRF that has that index. Now, we require some additional security properties of a private linear constrained PRF. The first property is that the points in the domain of the pseudo-random function with the smallest index, namely index 0, are indistinguishable from random elements of the domain: the distribution over points with index 0 is computationally indistinguishable from the uniform distribution over the full domain of the PRF. Next, we require that a PRF input with index i and one with index j are computationally indistinguishable unless the adversary possesses a key for an identity that lies in the interval between them.
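In code, the linear-constraint rule just defined is very simple. Here is a toy Python sketch (HMAC-SHA256 as an illustrative stand-in PRF; names my own) of the plain, non-private version, where a point's index is just the point itself:

```python
import hmac, hashlib

def prf(msk, t):
    # Stand-in PRF under the master key (illustrative only).
    return hmac.new(msk, t.to_bytes(8, "big"), hashlib.sha256).digest()

def constrain(msk, key_id):
    # Linear constraint: a key for identity key_id evaluates exactly on
    # the domain points whose index t satisfies t <= key_id. In this
    # plain version the index *is* the point itself, so it is completely
    # public -- making it hidden is what "private" adds.
    def c_eval(t):
        return prf(msk, t) if t <= key_id else None
    return c_eval
```

The security properties that follow are all about what this comparison leaks, and to whom.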
Notice that if the adversary possesses a key for an identity id with i ≤ id < j, then that key can be used to evaluate the PRF at the point with index i but cannot be used to evaluate the PRF at the point with index j. However, if the adversary does not have such a key, then either it is unable to evaluate the PRF at both points, or it is able to evaluate the PRF at both points. So identity hiding says that in this case, the adversary is not able to distinguish a point with index i from a point with index j. And finally, the last requirement is pseudorandomness, which says that the value of the pseudorandom function on inputs with the largest index, namely 2^ℓ, appears pseudorandom even if the adversary has many constrained keys. This follows because no constrained key enables evaluation at points with the largest index. Once we have a private linear constrained PRF satisfying these three properties, it is fairly straightforward to construct a traceable PRF. So let me show how these three properties give us the tracing algorithm. The assumption is that we have an adversary who produces a distinguisher that breaks weak pseudorandomness of the traceable PRF with non-negligible advantage epsilon, and suppose now that we want to trace this distinguisher. The first property says that domain elements sampled with index zero are indistinguishable from random domain elements. Because the distinguisher breaks weak pseudorandomness with advantage epsilon given input-output pairs on random domain elements, if we replace those random domain elements with random elements of index zero, the distinguisher still succeeds with advantage negligibly close to epsilon.
At the other end of the spectrum, if instead of sampling domain elements with index zero we sample domain elements with index 2^ℓ, then by the pseudorandomness property, the decoder, or the distinguisher, has advantage negligibly close to zero, because the distinguisher is not able to distinguish in this setting. Now, if we look at the points in between, for two indices i and j, we know by the identity-hiding property that the distinguishing advantage can only change by a negligible amount, as long as the adversary does not have a key whose identity lies in that interval. So if we take these three properties together, the implication is: at one end, the distinguishing advantage is epsilon; at the other end, the distinguishing advantage is zero; and in between, across any interval that does not contain an identity known to the adversary, the distinguishing advantage cannot change except by a negligible amount. So there must be a jump somewhere, and that jump can only appear in intervals that contain an identity. Now we can actually trace that identity, or set of identities in the case where there are multiple, using an algorithm for the oracle jump-finding problem introduced by Nishimaki, Wichs, and Zhandry. This is basically the same kind of algorithm used in traitor-tracing constructions. So once we have a private linear constrained pseudo-random function, it is quite straightforward to construct a traceable PRF using techniques previously developed in the setting of traitor tracing. All that remains, then, is to construct a private linear constrained PRF. The starting point for our construction is a standard constrained PRF for general circuit constraints. So suppose the domain of our PRF consists of ℓ-bit strings; we consider a circuit-constrained PRF over this domain.
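The jump-finding step just described can be sketched in Python. This is a noiseless toy version, in which `advantage(i)` stands for an estimate of the decoder's distinguishing advantage when it is fed samples of index i, and we binary-search for the index where the advantage drops; the actual oracle jump-finding algorithm of Nishimaki, Wichs, and Zhandry handles noisy estimates and multiple jumps.

```python
def trace(advantage, n_indices, threshold):
    # Invariant: advantage(lo) - advantage(hi) > threshold, so some jump
    # lies in (lo, hi]. Halve the interval until it is pinned down.
    lo, hi = 0, n_indices
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if advantage(lo) - advantage(mid) > threshold:
            hi = mid          # the jump is in the left half
        else:
            lo = mid          # the jump is in the right half
    return lo                 # advantage drops just past index lo
```

Because the advantage can only jump inside an interval containing an identity known to the adversary, the index returned is a traitor identity, and the search costs only logarithmically many advantage estimates.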
If we only want to support linear constraints, this is quite simple: we can just encode a linear constraint as a circuit. The problem, though, is that the indices of the domain elements are then the domain values themselves, and these are completely public, whereas in a private linear constrained pseudo-random function we require the index associated with a particular domain element to be hidden; this is necessary for properties like identity hiding. A natural solution is the usual one in cryptography: if we want to hide something, we apply encryption. So instead of having the points themselves be the indices, we take our domain to be the ciphertext space of a symmetric encryption scheme, and the index associated with a point is the value obtained by decrypting that point. Now, when we encode the linear constraints as circuit constraints, we include the decryption key of the encryption scheme as part of the constraining circuit: on input a domain element, which is a ciphertext, the circuit first decrypts it to recover the index, then compares that index to the identifier associated with the key, and depending on that comparison either allows evaluation or not. The only problem now is that the constrained key could leak information about the indices: from the constrained key, you might learn information about the decryption key, and once you know the decryption key, you can decrypt the index associated with any domain element. So somehow we need to make sure that the decryption key is hidden inside the constrained keys of our private linear constrained PRF.
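Here is a toy Python sketch of this encrypt-the-index idea (HMAC-based toy primitives as illustrative stand-ins; all names my own). The constraint circuit holds the symmetric decryption key, which is exactly what a private constrained PRF must keep hidden.

```python
import hmac, hashlib, os

def enc(sk, index):
    # Toy randomized symmetric encryption of an 8-byte index.
    nonce = os.urandom(16)
    pad = hmac.new(sk, nonce, hashlib.sha256).digest()[:8]
    return nonce + bytes(a ^ b for a, b in zip(index.to_bytes(8, "big"), pad))

def dec(sk, x):
    nonce, body = x[:16], x[16:]
    pad = hmac.new(sk, nonce, hashlib.sha256).digest()[:8]
    return int.from_bytes(bytes(a ^ b for a, b in zip(body, pad)), "big")

def prf(msk, x):
    # Stand-in PRF over the ciphertext space.
    return hmac.new(msk, x, hashlib.sha256).digest()

def constrain(msk, sk, key_id):
    # Constraint circuit: decrypt the domain point to recover its hidden
    # index, then apply the linear comparison. Note that sk sits inside
    # this circuit -- a *private* constrained PRF is what keeps it hidden.
    def c_eval(x):
        return prf(msk, x) if dec(sk, x) <= key_id else None
    return c_eval

def sample_input(sk, index):
    # Sampling a domain element with a given index requires sk,
    # which is why this construction only supports secret tracing.
    return enc(sk, index)
```

The domain is now the ciphertext space, so to an adversary without sk, points with different indices look alike.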
It turns out we can do this by using, instead of a plain constrained pseudo-random function, something called a private constrained pseudo-random function, where the constrained key hides the underlying constraint. With this, we can build a private linear constrained PRF, which in turn implies a traceable PRF. So just to summarize: to construct a private linear constrained PRF, we combine a private constrained PRF with any symmetric encryption scheme. Here, in order to sample points with a particular index, it is necessary to know the secret encryption key, and this is why this scheme only provides a secret tracing capability. This then gives us a traceable PRF. Starting from the LWE assumption, we can obtain private constrained PRFs that are secure against a single key, meaning an adversary that has a single key is not able to break any of the corresponding properties. In turn, this gives us a single-key private linear constrained PRF, which implies a single-key traceable PRF. To summarize, in this work our focus was on revisiting some of the definitional foundations of software watermarking, especially in the case of pseudo-random functions. Hopefully I have convinced you that in many settings where we use a pseudo-random function to build a more general primitive, the existing security definitions for software watermarking are insufficient. To address that deficiency, we introduce the notion of a traceable pseudo-random function, which requires the watermarking, or traceability, guarantee to hold against any program that is able to break pseudo-randomness of the underlying primitive, and not only against programs that preserve the exact input-output behavior, which can be problematic depending on the application.
I think this illustrates a more general point: when we study notions of software watermarking, we should not always immediately tie functionality preservation to a notion of input-output preservation. In particular, there are many natural settings where we want to distinguish between what the honest parties should be able to do, the functionality-preserving property for honest parties, and what it means for the adversary to break security of the scheme. Namely, we should be able to capture adversarial strategies that do not necessarily replicate the exact input-output behavior but could still break the scheme in other contexts. So with that, I will wrap up; for full details on the constructions and the proofs, I refer you to our paper on ePrint. Thanks.