I'm Julia Sauvage, and I will present our results on practical seed recovery for the PCG pseudo-random number generator. This is joint work with Charles Bouillaguet and Florette Martinez. We decided to study PCG, the Permuted Congruential Generator, because PCG is claimed to be hard to predict: on its website, pcg-random.org, attacking it is said to be challenging. So, challenge accepted. Our main result is a practical procedure that, from a certain amount of consecutive outputs, predicts all subsequent outputs of this generator. It is based on a guess-and-determine attack, and most of the running time is spent solving closest vector problems in low dimension. It is practically doable, and we have done it. PCG was designed by Melissa O'Neill in 2014. It is widely used; for instance, it is the default generator in Python's NumPy package. We studied the presumably most secure version, PCG64, which is precisely the version used in NumPy. The seed is composed of the initial internal state and possibly the increment: depending on the variant, the increment is either the default one or part of the seed. Now let me present the algorithm of PCG. Each time the generator is clocked, the internal state is updated linearly: it is multiplied by a constant A, then the increment is added to it. Once this is done, a 64-bit output is extracted from the 128-bit internal state: the two 64-bit halves of the state are XORed together, and the result is rotated by a variable amount. More precisely, this amount is given by the six most significant bits of the state, which form a number between 0 and 63 that controls the rotation. The general idea of our attack is to guess some well-chosen parts of the internal states. The goal is to get rid of the annoying rotation and reduce the problem to a truncated linear congruential generator, which is easy to break using Euclidean lattice techniques.
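The update and output extraction just described can be sketched in a few lines of Python. The constants A and C below are the 128-bit multiplier and default increment from the PCG reference implementation, quoted from memory here, so treat them as assumptions; the structure of the step is what matters.

```python
MASK128 = (1 << 128) - 1
MASK64 = (1 << 64) - 1
# Multiplier and default increment from the PCG reference implementation (assumed)
A = 0x2360ED051FC65DA44385DF649FCC6D61
C = 0x5851F42D4C957F2D14057B7EF767814F

def rotr64(x, r):
    """Rotate a 64-bit word right by r positions."""
    r &= 63
    return ((x >> r) | (x << (64 - r))) & MASK64

def pcg64_step(state):
    """One clock of PCG64 (XSL-RR): linear state update, then output extraction."""
    state = (state * A + C) & MASK128          # linear congruential update
    rot = state >> 122                         # six most significant bits: 0..63
    out = rotr64((state >> 64) ^ (state & MASK64), rot)  # XOR halves, then rotate
    return state, out
```

Note that the rotation amount depends on the state itself, which is exactly what the guess-and-determine attack has to work around.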
There are two cases, depending on whether the increment is known (when it is the default increment) or unknown (when it is given by the seed). In the easy case, we reconstruct the full state directly. In the more complicated case, where we do not know the increment, we first get only partial differences, then the next rotations, which we use in another algorithm. In both cases, we are capable of detecting bad guesses without error. In the easy case, when the increment is known, we work with an alternative sequence of states in which the increment's contribution is removed. This gives us a geometric sequence, which will be very useful. Here are the details of the attack. We consider some consecutive states (three in the case where the increment is known) and we guess the ℓ least significant bits of the first internal state. This gives us the ℓ least significant bits of all the following states, because the modulus is a power of 2. We also guess the six most significant bits of each state, because this allows us to undo the rotations and gives us access to the XOR of the two halves of each state. With this information, we get the yellow parts on the slide by XORing the known bits with the output. The rest of the attack uses this yellow middle part to reconstruct everything. We remove from this yellow part its C component to get a truncated geometric sequence. We are now facing the problem of reconstructing a geometric sequence. This problem is equivalent to breaking a truncated linear congruential generator, which in this case is easy and practical because the multiplier and the modulus are known and there is no increment. It is a well-known problem, and we solve it using Euclidean lattices. Any n consecutive states of a geometric sequence with common ratio a form a vector that belongs to a lattice L.
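For concreteness, this lattice has a simple row basis: the first row holds the powers of a, and the remaining rows, multiples of q on the diagonal, absorb the reductions modulo q. The function name below is mine, for illustration only.

```python
def geometric_lattice_basis(a, q, n):
    """Row basis of the lattice containing every vector
    (x, a*x mod q, ..., a^(n-1)*x mod q) for x in [0, q)."""
    basis = [[pow(a, i, q) for i in range(n)]]   # first row: (1, a, a^2, ...)
    for i in range(1, n):
        row = [0] * n
        row[i] = q                               # q*e_i rows absorb the mod-q reduction
        basis.append(row)
    return basis

# Toy check: a geometric sequence mod q agrees with x times the first row, mod q
a, q, x = 7, 101, 5
B = geometric_lattice_basis(a, q, 4)
seq = [x * pow(a, i, q) % q for i in range(4)]
```

Any such sequence is therefore an integer combination of these rows, which is what makes lattice reduction applicable.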
T can be seen as an approximation of the geometric sequence we are looking for, in which the least significant bits are unknown. This is why T is close to the original, non-truncated vector. We can recover the missing bits by finding the vector closest to T in a lattice. A lattice is a discrete subgroup of R^n; here we have an example of a lattice in dimension 2, where the two red vectors form a basis. The lattice problem we are interested in is the CVP, the closest vector problem: given a point (the green one), we search the lattice for the point closest to it. This is a hard problem, NP-hard in fact, so we are not going to solve the CVP exactly. Instead, we use an algorithm that approximately solves the CVP: Babai's rounding algorithm, which only uses one rounding and two matrix products. For this algorithm to work, we need a good basis of the lattice, and the red basis here is not a good one because one of its vectors is far too long. Before applying Babai's rounding algorithm, we need to reduce the basis of the lattice, for example to obtain the blue basis here, whose vectors are shorter. To reduce the basis, we use the LLL algorithm, which is polynomial in the dimension of the lattice. To summarize the easy case: we take three consecutive outputs, which makes 192 bits, and we guess 37 internal state bits. Then, for each of the 2^37 possibilities, we solve an instance of the CVP in dimension 3, reconstruct the internal state, and check whether it produces the expected outputs. We put some effort into the implementation: each trial only requires about 25 CPU cycles, so the whole procedure runs in 23 CPU minutes. That was the easy case, where the increment was known and the basic strategy was to remove it, get a truncated geometric sequence, then reconstruct the missing bits using lattice techniques. Now let's move on to the hard case.
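Babai's rounding step is short enough to sketch in full: express the target in the (reduced) basis, round the coordinates to integers, and map back. This is a minimal illustration, not the talk's actual implementation, and it assumes the basis rows are already LLL-reduced.

```python
import numpy as np

def babai_round(B, t):
    """Babai's rounding: approximate closest vector to t in the lattice
    spanned by the rows of B (assumed reduced). One rounding, two solves/products."""
    coeffs = np.rint(np.linalg.solve(B.T, t))   # coordinates of t in basis B, rounded
    return B.T @ coeffs                         # back to ambient space: a lattice vector

# With an orthogonal basis the answer is exact
B = np.eye(2)
v = babai_round(B, np.array([0.4, 2.6]))        # -> [0., 3.]
```

With a skewed basis the result is only an approximation, which is exactly why the LLL reduction comes first.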
Now for the hard case, where the increment is unknown because it is part of the seed; that is the default situation in NumPy. If we want to make the same strategy work again, we have to find another truncated geometric sequence, and this sequence is formed by the differences between successive states. This allows us to run essentially the same attack as before, except that we need to guess one more rotation (we need k+1 states to get k differences), and we also need to guess the least significant bits of the increment C. In detail, we focus on five consecutive states. As before, we guess some least significant bits of the first one and all the rotations. Using the same trick as before, we get to know the least significant bits of all five states, and because we have guessed the rotations, we can, as before, access the middle part by exploiting the XOR between the two halves. As before, we get this middle yellow part of the states, and to obtain our geometric sequence, we compute the differences between these partial states. So by guessing parts of successive states, we can reconstruct parts of the differences between these states, and because the differences form a geometric sequence, we can reconstruct the missing bits as before by solving an instance of the closest vector problem, this time in dimension four. The problem we now have is to check whether the guesses are valid, because we are no longer rebuilding the actual full state and cannot generate the outputs directly. It turns out that we have a powerful consistency check: we are capable of computing a part of all subsequent states in two different ways. First, we can use the partial differences that we have computed, because combining the first state with the i-th partial difference yields a part of the i-th state. That is one way.
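The key observation, that the increment cancels in differences of successive states, can be checked in a few lines with toy parameters (these are small made-up constants, not the real PCG ones):

```python
def lcg_states(a, c, x0, m, k):
    """First k+1 states of the recurrence x_{n+1} = a*x_n + c mod m."""
    xs = [x0]
    for _ in range(k):
        xs.append((a * xs[-1] + c) % m)
    return xs

# The differences d_n = x_{n+1} - x_n satisfy d_{n+1} = a*d_n mod m:
# a geometric sequence with no increment, whatever c is.
a, c, m = 5, 123, 1 << 16
xs = lcg_states(a, c, 42, m, 5)
diffs = [(xs[i + 1] - xs[i]) % m for i in range(5)]
assert all(diffs[i + 1] == a * diffs[i] % m for i in range(4))
```

This is why the unknown increment disappears from the lattice problem, at the cost of one extra state and one extra rotation to guess.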
The other way is to use the output and the rotation: with the i-th output, the i-th rotation, and the least significant bits of the i-th state that we know, we can compute the same part of the i-th state just by XORing. This requires knowledge of the i-th rotation, which we do not have, but we can simply try all 64 possible values. If there is no match, then we know that our guesses are wrong, and we can do this for any value of i. It is in fact a very powerful consistency check, and it discards all wrong guesses immediately. Once this guess-and-determine phase is over, we know the correct values of the guessed bits and the correct values of the partial differences. From there, we need to recover the actual full value of the initial state, but we are in a much better situation than before, because we can reuse our consistency check to compute the values of all subsequent rotations. Indeed, we know the correct values of the partial differences and the correct values of the least significant bits of each state, so we just have to try all possible values of each rotation until it matches on the middle of the state. This allows us to compute all rotations, which means that for all subsequent states we can get the correct values of the six most significant bits. By subtracting, this gets us the six most significant bits of all full differences. The full differences, on 128 bits, form a geometric progression, and we know their most significant bits. Using the same technique as before, this time in a slightly larger dimension, we can reconstruct all the missing bits, which gives us the complete full differences on 128 bits; obtaining the actual value of the first state from there is relatively easy. To summarize, in the hard case we have to guess between 51 and 55 bits of internal state.
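A toy version of this rotation check: undo the rotation of the output for each of the 64 candidates and keep those that match the known bits of the XOR of the two halves. The function names and the mask argument are mine, for illustration; the real attack matches on the reconstructed middle part of the state.

```python
MASK64 = (1 << 64) - 1

def rotl64(x, r):
    r &= 63
    return ((x << r) | (x >> (64 - r))) & MASK64

def rotr64(x, r):
    r &= 63
    return ((x >> r) | (x << (64 - r))) & MASK64

def consistent_rotations(output, known_xor, mask):
    """Keep the rotations r for which un-rotating the output matches
    the known bits (under mask) of hi XOR lo."""
    return [r for r in range(64)
            if ((rotl64(output, r) ^ known_xor) & mask) == 0]

# The true rotation always survives the check
xor_halves = 0x0123456789ABCDEF
out = rotr64(xor_halves, 22)
assert 22 in consistent_rotations(out, xor_halves, (1 << 20) - 1)
```

The more known bits in the mask, the fewer wrong rotations survive, which is what makes the check so effective at discarding bad guesses.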
We have to guess the first five rotations, which already makes 30 bits, and we have to guess between 11 and 13 least significant bits of both the first internal state and the increment. For each guess, we have to solve an instance of the closest vector problem in dimension four, which we again do by rounding, then perform the consistency check; that is the bulk of the computation. We have proved that the attack is correct if we guess 14 least significant bits, but in practice it works fine with only 13, and with probability 66% if we guess only 11 bits. Concretely, doing all that takes 55 CPU cycles per guess, so the running time of the attack is estimated between 12,000 and 20,000 CPU hours. Running the attack for real therefore requires more powerful hardware. We used Jean Zay, a supercomputer located in a French national computing center; more precisely, we used 512 cluster nodes, each with 40 cores, and on this machine, running the attack takes 35 minutes. So it is fairly practical, even though it requires a more powerful setup. To conclude, reconstructing the seed of PCG is fairly practical: it requires a small amount of keystream and is computationally feasible. This means that PCG is not cryptographically secure. Well, it never claimed to be; it just claimed to be challenging. And in any case, don't use NumPy to generate cryptographic nonces. It's a bad idea.
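A quick sanity check on these figures: 512 nodes of 40 cores running for 35 minutes lands right at the low end of the 12,000 to 20,000 CPU-hour estimate.

```python
cores = 512 * 40                      # nodes x cores per node = 20,480 cores
cpu_hours = cores * 35 / 60           # 35 wall-clock minutes on every core
assert 11_000 < cpu_hours < 12_500    # roughly 11,900 CPU hours
```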