Hello, I'm Andre, and I'm going to present our paper "Low Weight Discrete Logarithms and Subset Sum in 2^(0.65n) with Polynomial Memory", which is joint work with Alexander May.

For the structure of the talk, we will first discuss the subset sum problem and some known algorithms that solve it with a polynomial amount of memory. Then we dive into the low-weight discrete logarithm problem, and in the end we come back to the subset sum problem and our results there.

The subset sum problem is defined as follows: given n numbers a_1, ..., a_n modulo 2^n, a target t, which is also a number modulo 2^n, and a weight parameter ω, we are asked to find a subset of the a_i that sums to t modulo 2^n, where the subset has size ωn. We represent this subset via a binary solution vector e. We only consider random instances, meaning instances where the a_i are chosen uniformly at random. Why should we consider the subset sum problem at all? From a cryptographic perspective it is mostly because of its cryptanalytic applications: several problems of cryptographic interest can be expressed as vectorial or approximate variants of the subset sum problem, for example the decoding of random binary linear codes or the low-weight discrete logarithm problem. Hence, more efficient algorithms for subset sum often translate directly into more efficient algorithms for these problems.

Let us start with a very basic algorithm that solves the subset sum problem with only a polynomial amount of memory. This algorithm uses a meet-in-the-middle strategy to find the solution vector e. Here e has weight n/2, so we fix the weight parameter ω to one half, since in terms of weight this is the worst case for the problem. It is easy to generalize the algorithms to arbitrary weight, but for ease of exposition we stick with one half for the moment; in the low-weight discrete logarithm case we will have to keep track of the weight parameter.

We split the vector e into two addends x and y, where x has its weight only in the first half of the coordinates and y only in the second half. A meet-in-the-middle algorithm would now enumerate all x and y to finally find the solution vector e, but we cannot afford this, as we want to use only a polynomial amount of memory. Instead we define two functions f and g, which map a subset to the corresponding subset sum, respectively to t minus the corresponding subset sum, both modulo 2^(n/2). This modulus is mostly a technical requirement that lets us apply memoryless collision search algorithms to these functions. We now search for collisions between f and g, which we can do with a polynomial amount of memory using standard techniques. Such a collision corresponds to some x + y whose subset sum equals the target at least on the lower half of the bits, and obviously a solution also fulfills this identity. So the algorithm works as follows: we search for a collision between f and g, which gives us some x + y whose subset sum matches t on the lower bits, and then we check whether the identity is fulfilled on all bits. If not, we have to search for a new collision; if it is, we can output the solution.
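To make this concrete, here is a minimal Python sketch of the folklore approach on a toy instance. It is a sketch under simplifying assumptions of my own: the helper names, the per-flavor side selector and the re-randomization are illustrative choices, the exact weight constraint on the addends is ignored, and the parameters are far too small to reflect the asymptotic 2^(0.75n) behaviour.

```python
import random

def toy_instance(n, seed=1):
    """Random a_i mod 2^n with a planted weight-n/2 solution (illustrative only)."""
    rng = random.Random(seed)
    a = [rng.randrange(2 ** n) for _ in range(n)]
    support = rng.sample(range(n), n // 2)
    t = sum(a[i] for i in support) % 2 ** n
    return a, t, sorted(support)

def folklore_subset_sum(a, t, n, max_flavors=20000):
    half, mod = n // 2, 2 ** (n // 2)

    def f(x):   # subset of the first half  ->  its sum mod 2^(n/2)
        return sum(a[i] for i in range(half) if (x >> i) & 1) % mod

    def g(y):   # subset of the second half ->  t minus its sum mod 2^(n/2)
        return (t - sum(a[half + i] for i in range(half) if (y >> i) & 1)) % mod

    def full_sum(x, y):
        return (sum(a[i] for i in range(half) if (x >> i) & 1)
                + sum(a[half + i] for i in range(half) if (y >> i) & 1)) % 2 ** n

    for flavor in range(max_flavors):
        rng = random.Random(flavor)
        mask = rng.randrange(1, mod)                 # per-flavor random side selector
        side = lambda z: bin(z & mask).count("1") & 1
        F = lambda z: f(z) if side(z) == 0 else g(z)

        # Floyd cycle finding on the flavored walk: polynomial memory only.
        s = rng.randrange(mod)
        tort, hare = F(s), F(F(s))
        while tort != hare:
            tort, hare = F(tort), F(F(hare))
        tort, hare, pt, ph = s, tort, None, None
        while tort != hare:                          # rewind to locate the colliding pair
            pt, ph, tort, hare = tort, hare, F(tort), F(hare)
        if pt is None or side(pt) == side(ph):
            continue                                 # trivial or same-side collision
        x, y = (pt, ph) if side(pt) == 0 else (ph, pt)
        if full_sum(x, y) == t:                      # the collision only guarantees the low half
            return sorted([i for i in range(half) if (x >> i) & 1]
                          + [half + i for i in range(half) if (y >> i) & 1])
    return None

n = 16
a, t, planted = toy_instance(n)
print("planted:", planted)
# The returned subset also sums to t, but may differ from the planted one.
print("found:  ", folklore_subset_sum(a, t, n))
```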
If you calculate the running time of this algorithm, which is simply the time it takes to find a collision between these functions times the number of collisions that exist, since we need to make sure we find our golden collision, you will conclude that the time complexity is 2^(0.75n).

Next we will see an improvement of this algorithm by Becker, Coron and Joux. They improved it using the representation technique, which originally goes back to Howgrave-Graham and Joux. This technique is also a key ingredient of our new algorithms, and hence we describe its idea in the context of memoryless collision finding on the next slides.

Our setting is the following: we have two functions f and g, which map from some domains to the same range, and there exists a collision between some x and y that relates to our e; in our case x + y equals e. We chose x and y in this weight-disjoint form, and therefore there is only a single decomposition of e into an x and a y of that form. The goal of the representation technique is to enlarge the domains of the functions. This might sound bad at first, because the collision search then needs longer to find a collision, but hopefully, after enlarging the domains, there are several useful collisions, meaning collisions that actually relate to our e, that is, collisions where x + y equals e. Becker, Coron and Joux achieve this by spreading the weight of the addends x and y over all n coordinates instead of using the weight-disjoint form. By doing this, they obtain many representations of the same vector e in the form of different pairs x and y that add up to e.

In summary, the Becker-Coron-Joux algorithm works similarly to the one before, but uses enlarged domains, which leads to more collisions overall but also to many good collisions, that is, collisions relating to our e. The time complexity depends heavily on the proportion of good collisions among all collisions, since the inverse of this proportion is the expected number of collisions we have to examine until we find a good one, and of course on the time T_C it takes to find a single collision. Even though T_C grows due to the enlarged domains, the technique of Becker, Coron and Joux improves this proportion so much that they end up with a better algorithm of time complexity 2^(0.72n).

As I said in the beginning, it is quite easy to generalize these algorithms to arbitrary weight. On the next slide we see a plot of the time complexity exponent of both algorithms as a function of ω: on the y-axis the time complexity exponent, on the x-axis the weight ω. For weight 0.5 we see the known exponents, 0.72 for BCJ and 0.75 for the folklore algorithm, but not only there: regardless of which weight you choose, the BCJ algorithm outperforms the folklore approach.
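To get a feeling for why spreading the weight helps, here is a small illustrative count (my own toy numbers, not taken from the paper): for a solution e of weight w, the disjoint split admits exactly one decomposition e = x + y, while the BCJ split into two weight-w/2 addends over all n coordinates admits binomial(w, w/2) decompositions, since every one-entry of e can be assigned to either x or y.

```python
from math import comb, log2

n, w = 10_000, 10_000 // 2      # toy parameters: a weight-n/2 solution (omega = 1/2)

# Disjoint split (folklore): x lives on the first half, y on the second half,
# so there is exactly one way to write e = x + y.
reps_disjoint = 1

# BCJ split: x and y both have weight w/2 but may use all n coordinates.
# Every one-entry of e is assigned to either x or y, giving C(w, w/2) representations.
reps_bcj = comb(w, w // 2)

print("representations (disjoint):", reps_disjoint)
print("representations (BCJ)     : 2^(%.3f n)" % (log2(reps_bcj) / n))
# Roughly 2^(0.5n) representations: exponentially many useful targets for the
# collision search, which is what lets BCJ tolerate the larger domain.
```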
We keep this in mind and now want to talk about the low-weight discrete logarithm problem. The low-weight discrete logarithm problem is defined over some group G of order roughly 2^n, generated by a known generator g. We are given a group element β and a weight parameter ω, and we are looking for the discrete logarithm of β to the base g, which we call α; this α satisfies g^α = β, and the weight of α shall be ωn, where weight refers to the Hamming weight of the binary representation.

First, let us cover some facts about this problem. There is a time lower bound for all algorithms solving it in the generic group model, namely the square root of the size of the search space. The search space consists of all n-bit numbers of weight ωn in binary representation; there are n choose ωn of them, and the square root of this quantity is a time lower bound for generic algorithms. The only algorithm known so far that achieves this lower bound for all ω is the meet-in-the-middle algorithm; its downside is that its memory complexity equals its time complexity, and hence also the time lower bound. The discrete logarithm problem without the low-weight restriction is subsumed by the above definition for ω equal to one half, since a random discrete logarithm has, with reasonable probability, weight around n/2 in binary representation. There is another algorithm for solving discrete logarithms, Pollard's rho algorithm, with a time complexity of 2^(0.5n) and a polynomial memory requirement. Unfortunately, this algorithm does not depend on the weight, so its time complexity exponent stays at 0.5 regardless of ω; but for ω equal to one half, which is the usual case, this matches the time lower bound, so there Pollard's algorithm achieves the lower bound with polynomial memory.

In our paper we give a nice relation between the low-weight discrete logarithm problem and the subset sum problem, showing that both algorithms we have seen before, the folklore algorithm and the BCJ algorithm, are also applicable to the low-weight discrete logarithm problem. Let us have a look at the landscape of algorithms known so far for solving the low-weight discrete logarithm problem with only a polynomial memory requirement: we have the folklore algorithm and the BCJ algorithm, now also applicable to the low-weight discrete logarithm problem, the algorithm by Pollard, whose time complexity exponent is 0.5 for all weights, and the lower bound as a dotted line. But there is no algorithm with only polynomial memory that achieves this lower bound for all weights.

So now we want to present an improved algorithm for the low-weight discrete logarithm problem that leverages the representation technique a little more. For this we again define two functions f1 and f2, now mapping not to subset sums but to g^x, respectively β·g^(-y). We again search for collisions between these functions, where a collision now relates to some x and y with g^(x+y) = β, which is of course also equal to g^α. In other words, x + y = α, which we will use to represent α as the sum of many different x and y.
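Here is a tiny sanity check of this collision relation in a toy group. The setup is my own illustration (Z_p^* with p = 2^16 + 1, generator 3, a planted weight-4 exponent), and the real algorithm would find such pairs with a memoryless collision search rather than by enumeration; the snippet also counts, for several addend weights, how many pairs (x, y) sum to α, i.e. how many useful collisions exist.

```python
from itertools import combinations
import random

p, g = 65537, 3                     # toy group Z_p^* with p = 2^16 + 1, order 2^16 (n = 16)
order, n, w = p - 1, 16, 4          # planted dlog alpha of weight w = omega*n = 4

alpha = sum(1 << i for i in random.Random(7).sample(range(n), w))
beta = pow(g, alpha, p)

f1 = lambda x: pow(g, x, p)                          # f1(x) = g^x
f2 = lambda y: beta * pow(g, (-y) % order, p) % p    # f2(y) = beta * g^(-y)

# For each addend weight k, count pairs (x, y) of weight k with x + y = alpha
# (mod the group order).  Every such pair is a useful collision: f1(x) = f2(y).
for k in range(w // 2, w + 2):
    pairs = 0
    for pos in combinations(range(n), k):
        x = sum(1 << i for i in pos)
        y = (alpha - x) % order
        if bin(y).count("1") == k:
            assert f1(x) == f2(y)      # g^x = beta * g^(-y)  <=>  g^(x+y) = beta
            pairs += 1
    print(f"addend weight {k}: {pairs} useful collisions (representations of alpha)")
```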
Recall that in the subset sum case we represented our solution e as the sum of addends x and y of half the weight of the solution vector. There this makes a lot of sense, since the addition is performed component-wise over the integers: if we added extra weight to x and y, we could not end up with e as their sum, because either we would have too many non-zero coordinates or we would have coordinates equal to two in the resulting vector, neither of which is what we are looking for. In the discrete logarithm case, however, x + y is computed over the integers modulo the group order. Let us forget about the modular reduction for a moment: then x + y is simply the sum of two natural numbers, and in the binary representation of such a sum we can have carry bits. One plus one in binary produces a zero in that position and a carry, so two ones in x and y produce a single one in the output α. Hence, even if we add some weight to x and y, we can still end up with an α of weight ωn.

The idea is now to increase the weight of the addends a little more, which increases our domain size but hopefully also increases the number of representations. The question is which weight we have to choose for x and y so that we can expect them to add up to α. We call this addend weight φ(ω) and have computed it; in the plot, the green line is the weight you should choose for the addends, compared to ω/2, and we see that in most cases we have to choose a considerably higher addend weight than before.

In summary, we again use the functions f1 and f2, but now the domain consists of all numbers of weight φ(ω)·n in binary representation, which increases the domain size but also gives us more representations. Again we can compute the time complexity as the inverse of the proportion of good collisions among all collisions, times the time it takes to find a single collision. If you work this out, you end up with a formula in terms of several binary entropy functions, which is admittedly hard to parse.

On the next slide we see the updated landscape of algorithms for the low-weight discrete logarithm problem: again the folklore and BCJ algorithms, the algorithm by Pollard and the lower bound, but now additionally our new algorithm as a solid red line. It interpolates to Pollard's algorithm at weight 0.5 and outperforms the BCJ algorithm for all weights, but it still does not attain the lower bound by Shoup. To reach this lower bound, let us for once forget about the polynomial memory restriction: we now present a time-memory trade-off, obtained by employing a technique called parallel collision search, which trades memory for time in collision finding. I do not want to go into the details of this technique; I will just tell you this much: if we run our algorithm with as much memory as it needs to find the collisions, which is exactly this inverse proportion, then the time complexity improves by a square-root factor, and this lets the time complexity exactly hit the time lower bound for generic algorithms.
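For reference, here is a quick numerical look at that generic lower bound. This is my own helper, using the standard estimate that n choose ωn is about 2^(H(ω)n) with H the binary entropy function; it is not code from the paper.

```python
from math import comb, log2

def lower_bound_exponent(omega, n=10_000):
    """log2( sqrt(C(n, omega*n)) ) / n  ~  H(omega) / 2  for large n."""
    return 0.5 * log2(comb(n, round(omega * n))) / n

for omega in (0.1, 0.25, 0.5):
    print(f"omega = {omega}:  generic lower bound ~ 2^({lower_bound_exponent(omega):.3f} n)")
# For omega = 0.5 this gives roughly 0.5, which is what Pollard's rho achieves,
# and what our algorithm matches for all omega once parallel collision search is applied.
```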
Of course we can also apply this parallel collision search technique to the BCJ algorithm, and on this slide we compare our algorithm improved by parallel collision search with the BCJ algorithm improved by parallel collision search. In green we have the BCJ time and memory complexity, with the memory as a dash-dotted line, and in red our time and memory complexity. We see that in terms of memory as well as in terms of time we outperform the BCJ technique, and recall that this solid red line is exactly the time lower bound for generic algorithms. If we compare our algorithm to the meet-in-the-middle algorithm, we achieve the same time complexity but improve on the memory complexity. That is what I wanted to tell you about the discrete logarithm problem; let us now go back to the subset sum problem.

Recall our folklore algorithm for subset sum, where we had the functions f and g and searched for collisions between them. A collision gave us some x + y whose subset sum equals t on the lower half of the bits, and we then checked whether the identity holds on all bits; if it does not, we search for a new collision, and we have to repeat this until we find a collision that actually corresponds to our e. In other words, we perform an exhaustive search over the remaining half of the bits, and we want to get rid of this exhaustive search by introducing a layered collision search.

Instead of the exhaustive search we introduce a new function h1 that outputs the upper, not yet matched bits of the corresponding subset sum. As it is a function, it also needs an input; the input is some s, where s is a starting point for a collision search between f and g. Usually a collision search starts at some point of the domain and then computes an iterated chain of applications of the function until the chain collides with itself, and this starting point is the input to our outer function h1. Then we need a second function h2, which is quite similar: it also maps the starting point of a collision search between two other, inner functions to upper bits, but not the upper bits of the subset sum itself; rather, the upper bits of t minus the corresponding subset sum. The other difference is that the subset sum of a collision in these inner functions matches zero on the lower bits, not the target. So it does not matter which outputs of h1 and h2 we take: if we add the corresponding sums together, the result matches t on the lower bits.

Now we can search for a collision between these outer functions. If we find such a collision, it relates to elements x1, y1, x2, y2 whose combined subset sum matches t on all coordinates. Of course, x1 + y1 + x2 + y2 still does not have to be a binary vector, so we have to check this at the end. By introducing this layered collision search we end up with a nested rho structure, which is inspired by the work of Dinur, Dunkelman, Keller and Shamir from CRYPTO 2016: the small rhos correspond to an application of the outer function, that is, to a collision search in the inner functions, and if you then perform a collision search between the outer functions, you end up with a giant rho structure in which the collision between the outer functions is represented by the collision of this outer rho. Again, the time complexity equals the inverse of the proportion of good collisions among all collisions, times the time it takes to find a single collision, and here this proportion is the probability that the resulting vector x1 + y1 + x2 + y2 is binary.
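The following Python sketch illustrates the two layers on a toy instance. It is schematic and uses simplifications of my own: the bit split is fixed to n/2, the weight constraints and parameter optimization are dropped, and for brevity the collision between the outer functions is found with a small table, whereas the actual algorithm runs the outer layer memorylessly as well and keeps producing outer collisions until the resulting vector is binary.

```python
import random

# Toy instance: random a_i mod 2^n and a planted weight-n/2 solution.
n, half = 16, 8
mod = 2 ** half                          # inner collisions match the n/2 low bits
rng = random.Random(3)
a = [rng.randrange(2 ** n) for _ in range(n)]
t = sum(a[i] for i in rng.sample(range(n), n // 2)) % 2 ** n

def sum_first(x):                        # x encodes a subset of the first half
    return sum(a[i] for i in range(half) if (x >> i) & 1)

def sum_second(y):                       # y encodes a subset of the second half
    return sum(a[half + i] for i in range(half) if (y >> i) & 1)

def floyd(F, s):
    """Memoryless collision search: returns u != v with F(u) = F(v), or None."""
    tort, hare = F(s), F(F(s))
    while tort != hare:
        tort, hare = F(tort), F(F(hare))
    tort, hare, pu, pv = s, tort, None, None
    while tort != hare:
        pu, pv, tort, hare = tort, hare, F(tort), F(hare)
    return None if pu is None else (pu, pv)

def inner_collision(target_low, flavor, s):
    """Inner layer: collision between f(x) = sum_first(x) mod 2^(n/2) and
       g(y) = target_low - sum_second(y) mod 2^(n/2), started at s."""
    mask = random.Random(flavor).randrange(1, mod)
    side = lambda z: bin(z & mask).count("1") & 1
    F = lambda z: sum_first(z) % mod if side(z) == 0 else (target_low - sum_second(z)) % mod
    col = floyd(F, s)
    if col is None or side(col[0]) == side(col[1]):
        return None                                  # unusable inner collision
    return (col[0], col[1]) if side(col[0]) == 0 else (col[1], col[0])

def h1(s, flavor):
    """Outer function 1: upper n/2 bits of the subset sum of the inner collision
       reached from starting point s (its low bits already match t)."""
    pair = inner_collision(t % mod, 2 * flavor, s)
    if pair is None:
        return None, None
    return ((sum_first(pair[0]) + sum_second(pair[1])) % 2 ** n) >> half, pair

def h2(s, flavor):
    """Outer function 2: upper n/2 bits of t minus the subset sum of an inner
       collision whose low bits match 0."""
    pair = inner_collision(0, 2 * flavor + 1, s)
    if pair is None:
        return None, None
    sigma = (sum_first(pair[0]) + sum_second(pair[1])) % 2 ** n
    return ((t - sigma) % 2 ** n) >> half, pair

# Exhibit one collision between the outer functions h1 and h2.
for flavor in range(100):
    srng, table = random.Random(flavor), {}
    for s in srng.sample(range(mod), 32):
        v, pair = h1(s, flavor)
        if v is not None:
            table[v] = pair
    found = None
    for s in srng.sample(range(mod), 32):
        v, pair = h2(s, flavor)
        if v is not None and v in table:
            found = (*table[v], *pair)
            break
    if found:
        x1, y1, x2, y2 = found
        total = (sum_first(x1) + sum_second(y1) + sum_first(x2) + sum_second(y2)) % 2 ** n
        assert total == t                # the combined sum matches t on all n bits
        is_binary = (x1 & x2) == 0 and (y1 & y2) == 0
        print(f"outer collision found; sum matches t on all bits; binary vector: {is_binary}")
        break
else:
    print("no outer collision found within the budget")
```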
If you calculate this probability, you end up with an algorithm attaining time complexity 2^(0.65n).

Let me summarize our results. In our paper, though not in this talk, we define a more general problem, which we call group subset sum and which subsumes both problems, the low-weight discrete logarithm problem and the subset sum problem. This enables us to apply algorithms that were known for only one of the problems, for example the folklore algorithm, as well as algorithms known only for the subset sum problem, to both problems. Additionally, we give improved algorithms for both problems: for the low-weight discrete logarithm problem we give improved polynomial-memory algorithms and we reduce the memory complexity of the meet-in-the-middle algorithm, and in the case of the subset sum problem we introduce a nested collision search, which enables us to construct an algorithm of time complexity 2^(0.65n) that uses only a polynomial amount of memory. Thank you very much.