Welcome to this talk on the cryptanalysis of the GPRS ciphers GEA-1 and GEA-2. When Apple launched the iPhone in 2007, it made the internet accessible to everyone via a mobile device. The first iPhone supported only a 2G data connection. And even though we talk about 5G today, 14 years later, 2G data connections are still around us, for example in case of poor network reception or in IoT applications. Initially, 2G had four encryption algorithms: two for voice and text messages, and two for data transmissions. All algorithms take a 64-bit key. They were designed in the 1980s and 1990s under the principle of security by obscurity. About the voice algorithms A5/1 and A5/2 we know a lot, as they were reverse engineered early on. Already in 1999, A5/2 was broken by Ian Goldberg and David Wagner. A5/1 is less weak, but still breakable. Later on, four additional algorithms based on KASUMI were added to 2G.

The weakness of A5/2 might be connected to export restrictions. Some design documents of the ETSI group state that the algorithms, and here I quote, "should be generally exportable taking into account current export restrictions." Another statement says: "Within the operational context, the algorithms provide an adequate level of security against eavesdropping of GSM and GPRS services." The main question now is: what does "exportable" mean? There is no direct statement related to the ETSI algorithms, but we know from laws and other sources from this time that exportability in this kind of context means 40-bit security.

While A5/1 and A5/2 have been analyzed, we know very little about the security of GEA-1 and GEA-2. Both algorithms are stream ciphers designed by ETSI in 1998 and are responsible for encrypting the data transmissions of GPRS. It is interesting to see that ETSI has forbidden the implementation of GEA-1 in phones since 2013. This fact might be related to a presentation by Nohl et al. GEA-2 is still mandatory to implement in phones.
Still, neither algorithm had been publicly analyzed against its security claims. And for this I would like to hand over to Christoph.

Thank you very much, David, for the introduction. I will now talk about the ciphers GEA-1 and GEA-2 in detail and also present the attacks, starting with GEA-1. Eventually we obtained the source code of GEA-1 from a source, and as you can see here, it is an LFSR-based stream cipher. There is a 64-bit session key, and together with the initialization vector this is non-linearly mapped to a 64-bit seed. I will concentrate on this seed from now on: once you have recovered the seed, you can recover the session key by inverting this non-linear mapping. The 64-bit seed is then linearly mapped to a 96-bit internal state, which is depicted by these blue cells here. It is divided into three registers A, B and C, of lengths 31, 32 and 33 bits. In each clock cycle, these LFSRs are clocked by feeding the output bit back into several positions of the register, which is indicated by the XOR operations. Then from each LFSR, seven taps go into a function called f, and the output bits of these three functions are XORed together to produce one bit z_i of the keystream. In each frame, 1,600 bytes of keystream are generated, so i runs up to 12,800 if you count in bits.

The goal of the attacker is to recover the 64-bit seed — as I said, you can then recover the session key — from some bits of known keystream, where the amount of known keystream is limited by these 12,800 bits per frame. And it is a valid assumption for an adversary to have some bits of known keystream, for example from header information. The exact weakness of the cipher is that, after this linear initialization process from the seed to the register states, the joint 64-bit initial state of registers A and C can only take one of 2^40 possible values.
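The keystream generation just described can be sketched roughly as follows. The register lengths and the overall shape (Galois-clocked LFSRs, seven taps into a degree-4 filter function, three outputs XORed) match the talk, but the tap positions and the function f below are placeholders, not the real GEA-1 constants, which are given in the paper.

```python
# Structural sketch of the GEA-1 keystream loop.  Tap positions and the
# filter function f are ILLUSTRATIVE PLACEHOLDERS, not the real constants.

def f(x):
    # Stand-in for the 7-input, degree-4 Boolean filter function.
    return (x[0] & x[1]) ^ (x[2] & x[3] & x[4]) ^ (x[1] & x[2] & x[5] & x[6])

def clock(state, n, fb_taps):
    """Galois-style clocking of an n-bit LFSR: the bit shifted out is
    fed back by XOR into the positions listed in fb_taps."""
    out = state & 1
    state >>= 1
    if out:
        for t in fb_taps:
            state ^= 1 << t
    return state & ((1 << n) - 1), out

def keystream(a, b, c, nbits):
    # (length, feedback taps, taps feeding f) per register -- placeholders.
    cfg = [(31, (30, 27, 23), (0, 4, 8, 12, 16, 20, 24)),
           (32, (31, 28, 22), (1, 5, 9, 13, 17, 21, 25)),
           (33, (32, 29, 21), (2, 6, 10, 14, 18, 22, 26))]
    regs = [a, b, c]
    out = []
    for _ in range(nbits):
        z = 0
        for i, (n, fb, ftaps) in enumerate(cfg):
            regs[i], _ = clock(regs[i], n, fb)
            z ^= f([(regs[i] >> t) & 1 for t in ftaps])  # XOR the three f-outputs
        out.append(z)
    return out
```

The point of the sketch is only the data flow: each keystream bit is the XOR of three filtered register outputs, which is what the divide-and-conquer attacks exploit.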
So there are 64 bits in this joint state, but only 2^40 possible values instead of 2^64 — quite a loss of entropy. More precisely, you can write the linear initialization from the 64-bit seed to the register states as matrices: for each register you have a matrix M_A, M_B and M_C. Mathematically, what we have is that the dimension of the image of the matrix M_A concatenated with M_C is only 40, not 64, while the dimension of the image of M_B, the initialization matrix of register B, is 32, which is what we expect.

This weakness leads to a very straightforward meet-in-the-middle attack on the cipher. I will present a very simplified version here; for the exact attack that we use, which is slightly more complicated, I refer to the paper. It first consists of an offline step, where the adversary tabulates the first bits of the output stream of register B: the adversary initializes register B with all 2^32 possible values and generates the output stream of this register through the f function. The number of bits stored depends on the desired success probability of the attack; for our attack it is 65, or about 70 in the simplified version — so very, very little data. These outputs are all stored in a hash table, which requires about 45 gigabytes. Since this is an offline step, the table can be kept on a hard drive, and you only need to precompute it once. Then there is an online step: given the first bits of the keystream, the attack exhaustively searches over all 2^40 values of the joint registers A and C. As I said before, there are only 2^40 possible states for registers A and C together, so we exhaustively search over all of them and compute the corresponding output bits of the f function, that is, the sum f(A) + f(C).
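The offline/online split can be demonstrated end to end on a toy version of the cipher, with tiny registers so the full search runs in milliseconds. The LFSRs here are generic stand-ins, not the real GEA-1 registers, and the toy combines plain LFSR outputs instead of f-outputs; only the table-based meet-in-the-middle structure is the point.

```python
# Toy meet-in-the-middle, assuming the structure from the talk:
# z_i = out_A(i) ^ out_B(i) ^ out_C(i).  Register sizes and taps are
# illustrative stand-ins so that all states can be enumerated quickly.

def lfsr_stream(state, n, taps, nbits):
    """Fibonacci LFSR of length n; returns its first nbits output bits."""
    out = []
    for _ in range(nbits):
        out.append(state & 1)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << (n - 1))
    return tuple(out)

A_N, A_TAPS = 5, (0, 2)   # toy registers (illustrative taps)
B_N, B_TAPS = 6, (0, 1)
C_N, C_TAPS = 7, (0, 1)
NBITS = 20                # keystream bits stored per table entry

def toy_keystream(a, b, c):
    za = lfsr_stream(a, A_N, A_TAPS, NBITS)
    zb = lfsr_stream(b, B_N, B_TAPS, NBITS)
    zc = lfsr_stream(c, C_N, C_TAPS, NBITS)
    return tuple(x ^ y ^ w for x, y, w in zip(za, zb, zc))

def mitm_attack(z):
    # Offline: tabulate the output prefix of every possible B state.
    table = {}
    for b in range(1, 1 << B_N):
        table.setdefault(lfsr_stream(b, B_N, B_TAPS, NBITS), []).append(b)
    # Online: each (A, C) guess forces the B prefix; look it up.
    hits = []
    for a in range(1, 1 << A_N):
        za = lfsr_stream(a, A_N, A_TAPS, NBITS)
        for c in range(1, 1 << C_N):
            zc = lfsr_stream(c, C_N, C_TAPS, NBITS)
            need = tuple(x ^ y ^ w for x, y, w in zip(z, za, zc))
            for b in table.get(need, []):
                hits.append((a, b, c))
    return hits
```

In the real attack, the online side only has to cover 2^40 joint (A, C) states rather than 2^64, which is exactly what the low image dimension buys.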
And then we try to find a match for f(B) in the hash table; this lookup is very fast. Once a match is found, we have a candidate for all the register states A, B and C. If we choose the number of bits stored in the hash table large enough, we can eliminate most false positives, so we get very few candidates for the register states. And once we have these register states, we can compute the 64-bit seed. So this is a very, very simple idea.

The attack was possible because the image of the joint linear initialization matrix of two registers has a relatively low dimension — in this case, dimension 40. The question is whether this weakness is likely to have occurred intentionally or not. So we experimentally checked what would happen for two random primitive LFSRs, where the rest of the initialization method stays exactly the same: we took the initialization process of GEA-1 but replaced the feedback polynomials of the two LFSRs with random primitive feedback polynomials. We ran 10^6 trials and computed the dimension of the image of the joint initialization matrix of the two registers. The lowest dimension we observed in all these 10^6 trials was 54, so we never even got close to dimension 40. And you see in this table that from one column to the next, the number of cases we found drops roughly by a factor of 4. If we assume this would go on — if we extrapolate the data — then we would estimate a probability of 2^-47 of obtaining an image of dimension 40. So it seems very unlikely that this property occurred by accident.

Now let me talk about the cryptanalysis of the GEA-2 cipher. Again we obtained the source code of GEA-2, and it looks quite similar to GEA-1.
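Before moving on to GEA-2: the core measurement in the randomness experiment above is the rank of a 64x64 matrix over GF(2). A minimal sketch of that computation, with uniformly random matrices standing in for the matrices induced by random primitive feedback polynomials, looks like this:

```python
# GF(2) rank of the joint 64x64 initialization matrix M_A || M_C.
# Random matrices are used here as a stand-in for those induced by random
# primitive feedback polynomials; the talk's experiment saw a minimum
# dimension of 54 over 10^6 trials, far above GEA-1's actual 40.

import random

def gf2_rank(rows):
    """Rank over GF(2); each row is an int bitmask of the matrix row."""
    rank = 0
    rows = list(rows)
    for i in range(len(rows)):
        pivot = rows[i]
        if pivot == 0:
            continue          # row was reduced to zero: linearly dependent
        rank += 1
        low = pivot & -pivot  # pivot column = lowest set bit of this row
        for j in range(i + 1, len(rows)):
            if rows[j] & low:
                rows[j] ^= pivot
    return rank

random.seed(1)
ranks = [gf2_rank(random.getrandbits(64) for _ in range(64)) for _ in range(200)]
print(min(ranks), max(ranks))  # random matrices stay close to full rank 64
```

Even this quick stand-in shows how far from full rank a random 64x64 GF(2) matrix typically is — a few units at most — which makes a dimension of 40 extremely conspicuous.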
The main difference is that there is one additional register, D, so the initial state is bigger. And the idea we had before, targeting the initialization process, does not quite work here: once we look at the joint initialization of two registers, there are many more bits left in the other two registers, so the meet-in-the-middle attack would be quite expensive. The idea here is instead to target the keystream generation by a combination of algebraic attacks and divide-and-conquer methods.

If we look at a straightforward approach based on algebraic attacks — basically linearization — we first observe that the algebraic degree of f is 4, and since we just XOR up the outputs of the LFSRs, the registers are only linearly mixed. This means that if we want to count the number of monomials in the algebraic normal form of each keystream bit, we can compute it by summing up these binomial coefficients, where 29, 31, 32 and 33 are the lengths of the registers. So we count all monomials of degree 1, 2, 3 and 4 that can occur in the computation of each z_i; there is also the constant monomial, which is why we add 1. In total, roughly 152,000 monomials can occur in each algebraic normal form. But the problem is that we only have 12,800 bits of keystream available. A very simple linearization attack would not work in our case, because we need more equations than monomials, which is not possible with the small amount of data in each frame.

If we look at a straightforward guess-and-determine approach, the idea would be to reduce the number of monomials below these 12,800 equations we get from the keystream. This means we need to guess some bits of the register states in advance.
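The monomial count, and the effect of guessing state bits, can be checked directly. The split of the guessed bits below is one illustrative choice, not the one from the paper:

```python
# Counting degree <= 4 monomials over the four GEA-2 register states
# (lengths 29, 31, 32, 33), plus the constant term.

from math import comb

lengths = [29, 31, 32, 33]
monomials = 1 + sum(comb(n, k) for n in lengths for k in range(1, 5))
frame_bits = 1600 * 8                  # 12,800 keystream bits per frame
print(monomials)                       # 152682 -- "roughly 152,000"
print(monomials > frame_bits)          # True: too few equations to linearize

# Guessing bits shrinks the upper argument of the binomial coefficients.
# One illustrative split of 59 guessed bits in total:
guessed = [12, 14, 16, 17]
reduced = 1 + sum(comb(n - g, k)
                  for n, g in zip(lengths, guessed) for k in range(1, 5))
print(reduced, reduced < frame_bits)   # 11459 True
```

So with 59 guessed bits the monomial count does drop below the number of available equations, but as the talk notes, paying 2^59 guesses plus the linear-system solving already exceeds exhaustive search.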
So let us denote the number of bits guessed in the initial states of registers A, B, C and D by n_A, n_B, n_C and n_D. In this way we can reduce the number of monomials by subtracting the number of guesses from the upper argument of these binomial coefficients. The problem is that if we want to get below these 12,800, we need to guess at least 59 bits of the state, and this is quite a lot. If you also take into account solving the linear system, this gives a complexity higher than exhaustive search. So this approach does not work either.

What we do instead is combine it with a divide-and-conquer approach, which is basically what we also did for GEA-1. The problem we consider here is: we are given two finite sets S1 and S2, and two functions f1 and f2 that map from these sets into the binary vector space of length c. Given an element t from this vector space, we want to find all pairs of elements from S1 x S2 whose function values add up to t. This is basically the same problem we already had for GEA-1, and the approach is: for each s1 in S1, we store s1 in a hash table at position f1(s1). Then for each s2 in S2, we check the table for corresponding values of s1, and if we find a match, we have likely candidates for s1 and s2. Here c, the length of the string t, is the parameter with which we can control the success probability of the attack — basically the number of false positives we get.

In the attack we combine all three of the previously mentioned methods. And again I will present a very simplified version here; for the actual attack that we use, I refer to the paper. It uses some shortcuts and optimizations which lead to a slightly better complexity.
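The generic matching problem just described is independent of the cipher, so it can be written down directly; the two functions below are arbitrary toy examples for the usage demo:

```python
# Given finite sets S1, S2, functions f1, f2 into c-bit strings (ints here)
# and a target t, find all pairs (s1, s2) with f1(s1) ^ f2(s2) == t
# using a hash table, as described in the talk.

def match_pairs(S1, S2, f1, f2, t):
    table = {}
    for s1 in S1:                          # store s1 under key f1(s1)
        table.setdefault(f1(s1), []).append(s1)
    pairs = []
    for s2 in S2:                          # a match needs f1(s1) == t ^ f2(s2)
        for s1 in table.get(t ^ f2(s2), []):
            pairs.append((s1, s2))
    return pairs

# Toy usage with c = 8, i.e. 8-bit outputs (arbitrary example functions):
f1 = lambda x: (x * 167) & 0xFF
f2 = lambda y: (y * 91 + 3) & 0xFF
pairs = match_pairs(range(256), range(256), f1, f2, 0x2A)
print(len(pairs))                          # 256: here f1 is a bijection,
assert all(f1(a) ^ f2(b) == 0x2A for a, b in pairs)
```

The cost is |S1| + |S2| table operations instead of |S1| * |S2| pair checks, and a larger c cuts down the false positives among the returned pairs.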
But for the basic idea, I think this slide shows it quite clearly. The first step is to guess n_A bits of register A and n_D bits of register D, where the parameters n_A and n_D will be discussed on the next slide. As a second step, we derive c linear combinations of the keystream bits that do not depend on the variables of registers A and D that we have not guessed. In other words, we want to derive linear masks m_i, for i = 1 up to c, such that the inner product of m_i with s_{A+D} — the combined output stream of registers A and D after the f function — is a constant function. Constant means it does not depend on the variables we have not guessed, so we know its value.

Then we can apply step 3, where we define the vector t in F_2^c; this t corresponds to the t on the previous slide. In each bit position, t has the XOR of two terms: the inner product of m_i with s_{A+D}, which is the known constant we computed in step 2, and the linear mask applied to the keystream, that is, the inner product of m_i with z, which is what we observe. So we derive this vector t, and this is exactly what we need to apply the technique from the previous slide: we want to find all beta and gamma from S1 and S2 such that f1(beta) + f2(gamma) equals t, where S1 is the set of all initial states of register B, S2 is the set of all initial states of register C, and the functions map these initial states to the corresponding output streams of the registers, taken under the inner products with the linear masks. In the simplified version, if you choose n_A = 11, n_D = 9 and c = 64,
we obtain a state-recovery attack with a complexity of about 2^53.7 GEA-2 evaluations and 32 GB of memory. The improved version we have in the paper needs roughly 2^45.1 GEA-2 evaluations. But these attacks use almost all of the available data per frame, that is, all 1,600 bytes. You can also do some time-versus-data trade-offs, which are shown in this graph: on the x-axis you see the number of keystream bits available to run the attack, and on the y-axis the binary logarithm of the time complexity. If you just want to go below exhaustive search, you need fewer than 2,000 bits of keystream. And for the conclusion, I would like to hand back to David.

Thank you, Christoph. Now that we knew both algorithms are weak, we needed to estimate the impact of our findings. So we tested multiple phones for support of GEA-1 and GEA-2, and basically all phones supported both algorithms. Remember, support of GEA-1 was prohibited by ETSI in 2013 — still, even the latest devices we tested supported it. For our responsible disclosure we had two main aims: we wanted GEA-1 disabled in all phones, in accordance with the specification and standard, and we wanted GEA-2 sunset as soon as possible by the specification. For this we needed to notify a large group of parties before going public, including baseband vendors, phone manufacturers and operating system suppliers like Apple and Google. For that we used the GSMA and ETSI Coordinated Vulnerability Disclosure programs. The GSMA is the organization of all mobile network operators, and ETSI is the European specification organization. In particular, the GSMA CVD program was very helpful in coordinating the communication with all parties involved. In the end, most of the phones we tested again had disabled GEA-1 via a software update, and now a conformance test case also ensures that GEA-1 will be disabled in phones before they enter the market.
For GEA-2 we achieved the following: GEA-2 is deprecated, and the specification discourages its implementation; for newer phones the implementation is even prohibited. Coming to our conclusion: GEA-1 offers only 40-bit security, and GEA-2 is less weak but still breakable. What we can learn from this is that the insecurity of these algorithms has affected our communication until today. With that, I would like to thank you for your attention, and we are happy to take any questions.