Okay, it's time to start the session. We have three talks. The first talk is "A New Algorithm for the Unbalanced Meet-in-the-Middle Problem" by Ivica Nikolić and Yu Sasaki. The talk is given by Ivica. Thank you.

Okay, so first let's try to understand the meaning of this unbalanced meet-in-the-middle attack with a simple example. Imagine you were lucky enough, or smart enough, and managed to invert the compression function of SHA-256: say in 2^64 you can find a pre-image for the compression function. Now you want to convert this pseudo-pre-image attack into a pre-image attack on the hash function. How do you do it? Of course, you use the well-known meet-in-the-middle attack. Basically, you are using two compression function calls: this is your IV, and this is the target hash that you want to find a pre-image for. First you create a set of 2^96 pseudo-pre-images using the pseudo-pre-image attack and store this set in a hash table. Then from the IV you shoot with 2^160 images of the first compression function, and because the middle space is only 256 bits wide, you can hope to find, on average, one collision. So this is a typical meet-in-the-middle attack. Why is it unbalanced? Because, as you have seen, the cost of the second function is 2^64, whereas the cost of the first function is only one. That's why we call it the unbalanced meet-in-the-middle problem. That's the only difference: one function costs more than the other; everything else is pretty much standard.

Now, Diffie and Hellman introduced the meet-in-the-middle problem some 40 years ago, for finding the keys of double DES, and they called it meet-in-the-middle because there you indeed have two encryptions with two independent keys meeting in the middle. However, today when we talk about meet-in-the-middle, we have in mind a more general meaning. For example, the latest meet-in-the-middle attacks on AES, the Demirci-Selçuk attacks, have nothing to do with meeting in the middle of the cipher. It's not that you have, I don't know, 8-round AES and after four rounds the two computations meet. No: there you have an offline phase where you create some precomputation table, and then in the online phase you shoot into this table. So it's not meeting in the middle. Basically, today when we talk about meet-in-the-middle, what we mean is nothing more than a collision search between two functions. Whenever you hear the words meet-in-the-middle, forget about meeting in the middle of the cryptographic primitive and think about a collision search between two functions; that's how we're going to regard the problem in this talk, and that's how you should always see the problem. We are trying to find a collision between two functions f and g, that is, two values x and y such that f(x) = g(y). That's all.

Now we can differentiate several types of collisions between f and g. The most common split: the first type is when f and g have a range larger than the domain, and the second type is when the range is not larger than the domain. In this talk I'm going to focus on the second case, and we can simplify even further and assume that we are dealing only with n-bit functions f and g. We are trying to find a collision between these two functions, but we are looking for unbalanced collisions.
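To make the numbers in the SHA-256 example concrete, here is a small back-of-the-envelope check of the conversion described above; the variable names are only for illustration and the costs are the assumed ones from the example:

```python
# Back-of-the-envelope check of the SHA-256 example: converting a 2^64
# pseudo-pre-image attack on the compression function into a pre-image
# attack on the hash function via meet-in-the-middle.
n = 256                       # width of the middle (chaining value) space
pseudo_preimage_cost = 2**64  # assumed cost of one pseudo-pre-image

backward_set = 2**96          # pseudo-pre-images stored in the hash table
forward_set = 2**160          # compression-function calls made from the IV

# The product of the two set sizes matches the 2^256-wide middle space,
# so about one collision is expected:
assert backward_set * forward_set == 2**n

# Both phases cost about 2^160 (well below the generic 2^256 bound):
assert backward_set * pseudo_preimage_cost == forward_set == 2**160
```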
So we introduce this value r: we assume that one of the functions, let's say g, is r times more expensive than f. In the previous SHA-2 example, g was 2^64 times more expensive than f, so r = 2^64 / 1 = 2^64.

Now let's take a look at the state of the art: what are the best known algorithms that solve the unbalanced collision search problem? When r = 1, this is the balanced collision case, and of course we use Floyd's cycle finding algorithm: we can find a collision between two functions in time 2^(n/2), and this algorithm doesn't require any memory. When r > 1, as we saw in the case of SHA-2, we use meet-in-the-middle: we store sqrt(2^n / r) images of g in a hash table and then we produce around sqrt(r · 2^n) images of f to shoot into the hash table. Because the meet-in-the-middle space has size 2^n, when you multiply the sizes of these two sets you get 2^n, so on average you can expect one collision. You can see why we disbalance the numbers of calls here: because g costs r times more, even though we produce only sqrt(2^n / r) images of g, the total time complexity for producing the first set is sqrt(r · 2^n), the same as for the second set. So this meet-in-the-middle requires time sqrt(r · 2^n) and memory sqrt(2^n / r). In the general case, I'm quite sure all of you know that meet-in-the-middle follows the famous trade-off time × memory = 2^n, but keep in mind the time cannot be less than sqrt(r · 2^n), and you can easily show this. So let's remember that the minimal, optimal time is sqrt(r · 2^n).

Okay, so we have this nice algorithm that has been used for many, many years. Why do we have to come up with a new algorithm? It works perfectly fine. Well, it's a bit strange. On this chart I show how much memory you need, depending on the ratio of the costs of the two functions, if you want to find the collision in optimal time. The x-axis is log r, the y-axis is log M. When log r = 0, in other words when r = 1, the balanced case, we don't need any memory to find the collision. And now look at this: we increase r just slightly, and the required amount of memory jumps to 2^(n/2). So even though we increased r only slightly, the memory jumps to 2^(n/2), and as you can see it's kind of counterintuitive: the larger the ratio between the two functions, the less memory we need. Imagine one of the functions is only four times more expensive; then, in order to produce the collision in optimal time, we need 2^(n/2) memory. Because of this discrepancy, we are trying to come up with a new algorithm.

So let's take a look at the new algorithm. The new algorithm combines two ideas: the first is unbalanced interleaving, or an unbalanced selection function, and the second is van Oorschot-Wiener parallel collision search. Let's first take a look at the first idea.
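Before moving on, here is a rough Python sketch of this standard unbalanced meet-in-the-middle on toy n-bit functions. The functions f and g and all parameters are made up for illustration, and one call to g is simply counted as r calls to f:

```python
import math
import random

def unbalanced_mitm(f, g, n, r):
    """Standard unbalanced meet-in-the-middle: store ~sqrt(2^n / r) images of
    the expensive function g in a table, then shoot ~sqrt(r * 2^n) images of
    the cheap function f at it.  Counting one g-call as r units, both phases
    cost about sqrt(r * 2^n), and about one collision is expected."""
    N = 2 ** n
    table_size = math.isqrt(N // r)      # ~sqrt(2^n / r) entries
    shots = r * table_size               # ~sqrt(r * 2^n) calls to f

    table = {}
    for _ in range(table_size):
        y = random.randrange(N)
        table[g(y)] = y                  # each entry costs r "units" of work

    for _ in range(shots):
        x = random.randrange(N)
        fx = f(x)
        if fx in table:
            return x, table[fx]          # f(x) == g(y): the unbalanced collision
    return None                          # unlucky run; only ~1 collision expected

# Toy 20-bit functions and a cost ratio r = 2^6 (all made up):
n, r = 20, 2 ** 6
f = lambda x: (x * 0x9E3779B1 + 1) % 2 ** n
g = lambda y: (y * y + 3) % 2 ** n
print(unbalanced_mitm(f, g, n, r))
```

Since only about one collision is expected, a run may occasionally return None; in practice one would simply rerun with fresh random points.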
So, as I mentioned before, how do we find collisions between two balanced functions? Of course we use Floyd's cycle finding algorithm. In other words, we define a new function h(x) depending on some selection function σ that is either 0 or 1, so h(x) equals either f(x) or g(x), depending on this function that outputs 0 and 1 with equal probability of one half. Then we run Floyd for this function h(x) and find a collision, and after finding a collision we check whether it is a collision between f and f, f and g, g and f, or g and g; in half of the cases it is the good collision we are looking for. If it's a bad collision we have to repeat the search, but on average we have to repeat it only twice. To put this into perspective: this is the iterated function, this is the rho of the cycle finding algorithm, and as you can see below, each step is either g or f, randomly. Once we find a collision, here, say it is between g and g, this is obviously not good, so we have to repeat; after repeating we find a collision between f and g, good, we can use this as our collision.

So this is the balanced case. Now, with unbalanced interleaving everything is exactly the same, except that the selection function outputs 0 r times more often than it outputs 1, meaning this function h(x) equals f(x) r times more often. Now, when you produce a collision for h(x), it is a collision between f and f with much higher probability; the probability that it is a collision between f and g is only about 1/r, and that's why you have to repeat the search around r times. Again, to put this into perspective: you can see there are now many more invocations of the function f than of g. We run Floyd's cycle finding algorithm, we find a collision, but of course it is between f and f; we run again, it is again f and f, and after repeating this around r times we are going to find a collision between f and g. So we have this unbalanced collision. Why don't we use this as the new algorithm for the search of unbalanced collisions? The reason is the complexity: one cycle finding costs 2^(n/2), and we have to repeat it r times, so the complexity is r · 2^(n/2). But the optimal time complexity was sqrt(r · 2^n), so we have added a factor of sqrt(r) to the time complexity. This is not optimal; we need a more advanced idea.

The second, more advanced idea is that instead of finding collisions with Floyd's cycle finding algorithm, we are going to use the van Oorschot-Wiener algorithm. This is a very famous and very frequently used algorithm whenever you want to find multiple collisions: if you are looking for a single collision you always run Floyd; if you want multiple collisions you switch to van Oorschot-Wiener parallel collision search, or, as I like to call it, multiple collision search. However, unlike Floyd's algorithm, this algorithm requires memory. Let's take a look at how it works. It works in two phases. In the first phase you build a hash table: assume we are trying to find collisions for some function f; you start with 2^m random points, and you iteratively evaluate the function f on those points, so for each of the points you build a chain of length 2^((n-m)/2), and then you store in a hash table only the ending and the beginning points of each chain.
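A minimal sketch of the first idea, unbalanced interleaving combined with Floyd's cycle finding, might look as follows. The toy functions f and g, the selection function sigma, and all parameters are assumptions chosen only for illustration:

```python
import random

def floyd_collision(h, x0):
    """Floyd's cycle finding on the iterated map x -> h(x).  Returns a pair
    (a, b) with a != b and h(a) == h(b), or None if x0 starts on the cycle."""
    tortoise, hare = h(x0), h(h(x0))
    while tortoise != hare:
        tortoise, hare = h(tortoise), h(h(hare))
    # Walk one pointer from x0 and one from the meeting point; the step just
    # before they coincide gives the two colliding predecessors.
    tortoise, prev_t, prev_h = x0, None, None
    while tortoise != hare:
        prev_t, prev_h = tortoise, hare
        tortoise, hare = h(tortoise), h(hare)
    return (prev_t, prev_h) if prev_t is not None else None

def unbalanced_interleaving_floyd(f, g, n, r):
    """Repeat Floyd on h(x) = f(x) or g(x), where sigma selects f about
    r times more often than g, until the collision mixes f and g.
    Expected ~r repetitions, i.e. ~r * 2^(n/2) work: a factor sqrt(r)
    above the optimal sqrt(r * 2^n)."""
    sigma = lambda x: 1 if (x * 0x9E3779B1) % (r + 1) == 0 else 0
    h = lambda x: g(x) if sigma(x) else f(x)
    while True:
        pair = floyd_collision(h, random.randrange(2 ** n))
        if pair is None:
            continue
        a, b = pair
        if sigma(a) != sigma(b):                        # mixed f-g collision
            return (a, b) if sigma(a) == 0 else (b, a)  # (x, y) with f(x) == g(y)

# Toy 20-bit functions and ratio r = 2^4 (all made up):
n, r = 20, 2 ** 4
f = lambda x: (x * 0x9E3779B1 + 1) % 2 ** n
g = lambda y: (y * y + 3) % 2 ** n
x, y = unbalanced_interleaving_floyd(f, g, n, r)
assert f(x) == g(y)
```

The sketch is correct but, exactly as described above, it spends a full 2^(n/2) rho walk for every repetition, which is where the extra factor sqrt(r) comes from.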
Okay, so we create the hash table. Then, when we are trying to find a collision, we take a random value and start building such an iterative chain again, and every time we extend it by one value we check whether the newly constructed value coincides with one of the stored points of the chains. If it does coincide, then obviously our chain collided somewhere with that stored chain, and because we also have the beginning point of that chain in the table, we can backtrack and find where the collision occurred, and thus find a collision for f. As you can see, during the construction of the table we have 2^m chains, and the length of each chain is 2^((n-m)/2); if you multiply these, you see that while constructing the table we pass through 2^((n+m)/2) values. Because in total there are 2^n values, a new chain of length L hits those stored values with probability about L · 2^((n+m)/2) / 2^n, so if the length of the new chain is about 2^((n-m)/2), then on average we are going to find one collision. So the algorithm works nicely: the total time complexity to build the hash table is 2^((n+m)/2), it requires 2^m memory, and to find s collisions we pay s · 2^((n-m)/2), because for each collision we pay 2^((n-m)/2).

Now we can present our new algorithm for unbalanced collision search, or unbalanced meet-in-the-middle. We define the function h as the unbalanced interleaving of f and g, where σ outputs 0 r times more often than 1, so basically h(x) equals f(x) r times more often than g(x). Then, for this function h(x), we construct the hash table according to the van Oorschot-Wiener algorithm, with 2^m entries, that is, 2^m memory, and we find a collision for h(x). If it is a collision between f and g, excellent; if not, we have to repeat the collision search. Basically, because h(x) equals f(x) r times more often, we have to find on average r collisions in order to solve the unbalanced collision search problem. Again, to put this into perspective, this is the hash table, and as you can see there are many more evaluations of f than of g. Now, the whole trick of why this works: even though I have many more f's than g's, keep in mind the cost of g is r times more than the cost of f, so basically I pay the same time complexity for f and for g. That is the whole trick of why the algorithm works, because of the construction of the hash table. Then, of course, I find a collision and check it: no, it's a collision between f and f; I find another collision: no, again between f and f; and after repeating this around r times I am going to get a collision between f and g and solve the unbalanced collision search problem.

So the total memory complexity is 2^m; we pay 2^((n+m)/2) for the construction of the table and r · 2^((n-m)/2) to find the r collisions. When the second term dominates the first, in other words when 2^m does not exceed r, we end up with a very nice trade-off that says T^2 · M = r^2 · 2^n. So in order to solve the unbalanced collision search problem you can use this algorithm, which follows this trade-off curve. In comparison to the standard meet-in-the-middle algorithm, this algorithm has a better time for certain values of M and a better memory for certain values of M, and, what is most important, or one of the most important things, the unbalanced collision search problem, where one of the functions is r times more expensive than the other, can be solved in the optimal time T = sqrt(r · 2^n) with not more than r memory.
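Here is a rough Python sketch of how the full new algorithm could look on toy functions, combining the unbalanced interleaving h with a van Oorschot-Wiener-style table. For brevity this sketch uses the distinguished-point variant of parallel collision search rather than the fixed-length chains described above; the toy f, g, sigma and all parameters are assumptions for illustration only:

```python
import random

def locate_collision(h, chain1, chain2):
    """Two chains of h end at the same distinguished point; align their
    lengths and walk them together until they are about to merge."""
    (s1, l1), (s2, l2) = chain1, chain2
    if l1 < l2:
        (s1, l1), (s2, l2) = (s2, l2), (s1, l1)
    for _ in range(l1 - l2):
        s1 = h(s1)
    if s1 == s2:
        return None                       # one chain is a tail of the other
    while True:
        n1, n2 = h(s1), h(s2)
        if n1 == n2:
            return s1, s2                 # s1 != s2 and h(s1) == h(s2)
        s1, s2 = n1, n2

def new_unbalanced_search(f, g, n, r, d=6):
    """Unbalanced interleaving h of f and g plus van Oorschot-Wiener-style
    parallel collision search.  Chains of expected length 2^d are stored by
    their distinguished endpoints; every merge yields a collision of h, and
    collisions are harvested (about r on average) until one mixes f and g."""
    N = 2 ** n
    sigma = lambda x: 1 if (x * 0x9E3779B1) % (r + 1) == 0 else 0
    h = lambda x: g(x) if sigma(x) else f(x)
    distinguished = lambda x: x % (2 ** d) == 0

    table = {}                            # distinguished endpoint -> (start, length)
    while True:
        start = x = random.randrange(N)
        length = 0
        while not distinguished(x) and length < 20 * 2 ** d:
            x, length = h(x), length + 1
        if not distinguished(x):
            continue                      # stuck in a small cycle; restart
        if x in table and table[x][0] != start:
            pair = locate_collision(h, table[x], (start, length))
            if pair is not None:
                a, b = pair
                if sigma(a) != sigma(b):                        # f-g collision
                    return (a, b) if sigma(a) == 0 else (b, a)  # f(x) == g(y)
        table[x] = (start, length)

# Toy 20-bit functions and ratio r = 2^4 (all made up):
n, r = 20, 2 ** 4
f = lambda x: (x * 0x9E3779B1 + 1) % 2 ** n
g = lambda y: (y * y + 3) % 2 ** n
x, y = new_unbalanced_search(f, g, n, r)
assert f(x) == g(y)
```

With the fixed-length chains used in the talk the bookkeeping differs slightly, but the counting is the same: 2^m stored chains cover about 2^((n+m)/2) points, and each further collision costs about 2^((n-m)/2) evaluations of h.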
Okay, so this is Floyd for the balanced case, this is the old, standard meet-in-the-middle, and this is our new algorithm, when we are trying to find the collision in optimal time. Now the amount of memory is not 2^(n/2) when the ratio of the costs is close to one; instead, the smaller the ratio, the less memory we need. So obviously here, when log r is between 0 and n/3, our new algorithm outperforms the standard meet-in-the-middle algorithm in terms of memory.

However, you should also know a few things. In some problems the ratio r of the costs of the two functions itself depends on the memory, so you cannot simply use the trade-off that I presented; you have to come up with a new trade-off. Also, the new algorithm will not work if one of the functions is given as a set, because in order to work we have to evaluate both f and g on certain random points, and because we need to evaluate them on random points we cannot use our algorithm in a known-plaintext attack; it is always chosen plaintext. Also, in some papers I have seen people say "we actually do not care about memory complexity, we are just interested in time complexity." If that is the case, keep in mind that both the standard and the new algorithm achieve the minimal time complexity; it cannot go any lower. So if you don't care about memory you can just ignore the new algorithm; if you do care, you should probably take a look. But keep in mind that algorithms that use an infeasible amount of memory are completely, entirely useless in practice, so you should care about memory.

So the new algorithm may replace the standard one depending on the ratio r, as we saw from the chart. In our paper we have a certain number of applications; you can take a look there. Of course, we didn't go through all the papers that use the standard meet-in-the-middle algorithm and try to replace it with ours, because that is kind of trivial to do. What is more interesting is that certain balanced collision search problems can be reduced to unbalanced ones and then solved using our algorithm. For example, when you are trying to find collisions between two functions but for one of the functions you have to pay data complexity, so each evaluation of that function means one query to the cipher, then if you want to reduce the number of queries you may want to switch to the unbalanced case and use our algorithm. Or, if one of the functions has a reduced domain size, again this is reducible to unbalanced collision search.

To conclude: you should consider the new algorithm whenever you are dealing with unbalanced meet-in-the-middle problems. The rule of thumb is: if the ratio of the costs of the two functions is less than 2^(n/3), then you should probably use our algorithm rather than the standard meet-in-the-middle algorithm. Among the open problems, you can try to find more tricky use cases of the new algorithm, and maybe try to find a new algorithm for the known-plaintext case, so basically when one of the functions is given as a set; that would be really nice. That's all, thank you.
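As a quick check of that rule of thumb, compare the memory needed at the optimal time T = sqrt(r · 2^n): the standard meet-in-the-middle needs sqrt(2^n / r), the new algorithm needs about r (both figures are the ones stated in the talk), and the new algorithm wins exactly when

```latex
r < \sqrt{2^{n}/r}
\;\Longleftrightarrow\;
r^{3} < 2^{n}
\;\Longleftrightarrow\;
r < 2^{n/3}.
```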
Any questions or comments?

A small comment about the case in which the function g is a function, not a permutation, so it is not uniquely invertible. If you are trying to build the tree backwards, it actually has only a linear number of possible values: if I go back for t steps, the expected number of predecessors which are t steps away from the final point is only of order t, because even if some points have two or three possible predecessors when I try to go back, others might not have any predecessors, so some branches die out. This is a well-known combinatorial property: the number of predecessors, if you go a very long way backwards, is very limited. So you might want to use it in your attack even when one of the functions is given as a set.

Okay, thank you for the comment.

I have one question. In your algorithm you consider two functions, f and g. What about more functions, say three or four, for this problem?

Three is terrible; actually, we cannot do anything for three. For four we can do something, because four is kind of two plus two. Three is one of the most interesting problems, I think, that we have, that everybody in computer science has. For the case of four, probably something can be done, but I cannot say right now.