So I'm glad to tell you that the problem has recently been settled by a real experiment, so we now understand the time complexity of the problem. A guy in France actually constructed a haystack, put a needle into it, and it took him exactly 48 hours to find the needle. The only problem is that he didn't provide the asymptotics: it's one data point, and it's not clear how the complexity grows with the size of the haystack or with the size of the needle. So there is still something to do on this problem.

The problem is actually very, very general, and almost anything we do in cryptography can be viewed as some kind of search for needles in exponentially large haystacks. Just to give you a few examples: whenever we are looking for encryption keys, there is a needle sitting at an unknown location in a huge haystack of possible keys. The same is true when we look for hash collisions, when we look for chosen plaintexts or ciphertexts which are useful for attacking a cryptosystem, when we look for S-boxes with good differential or linear properties, when we search the outputs of stream ciphers for biases and try to determine whether any such biases exist, and so on and so on. Searching for rare needles in huge haystacks is useful both when you are designing and when you are breaking cryptosystems: when you design a cryptosystem, you try to show that it behaves as randomly and as uniformly as possible and doesn't have any unusual probabilistic behavior, and when you are trying to break cryptosystems, you are trying to find deviations from randomness and then to exploit them.

So the kind of search problem that we will consider in this talk is the following. The haystack is an almost uniform probability distribution, because I assume that the cryptosystem is actually quite good and behaves reasonably well; the stream cipher has outputs which are close to uniform. I'm looking for one unusual event, and it's unusual in the sense that it has a higher than usual probability. Here is the picture you should keep in mind: these are all the possible events, this is the probability, and most of them have a probability of about 1 over N, where N is the number of possible events. Sometimes it's 0, sometimes it's 1 over N, sometimes 2 over N, but there is one needle which is a probability peak, and you have to find it in the most efficient way in terms of time and memory.

You'll all agree that this is a very, very general problem, and you'll probably assume that it has been studied to death and that we have the best possible algorithms for finding such needles in probability distributions. But we were actually surprised: after we found the first few non-trivial algorithms, it became a very rich and varied research area, and as I'll say in the conclusion, we are not at all sure that we have found the best possible algorithms.

Okay, so introducing our notation: as I said before, there are big N, which is 2 to the small n, possible events, so exponentially many, and we denote them 1 to N. One particular event, which I denote by y0, happens with a larger than usual probability p. All the other events have probability of approximately 1 over N, and throughout the talk I'm going to assume that p is N to the minus c for some constant c between 0 and 1.
So p could be 1 over the square root of N, 1 over the cube root of N, 1 over the tenth root of N, whatever; these are the kinds of probabilities I'll be interested in. So p is somewhere between 1 over N and 1.

Now, most people thinking about this from the point of view of probability theory or statistics would consider the following model. You have a black box, which is the distribution generator, with a large red button on top, and every time you press the button out comes one sample from the distribution. The samples arrive according to this slightly non-uniform probability, so sometimes you get y1 and y2, and then y0, and then y0 again because it's more likely, et cetera, et cetera. In this model there is indeed very little you can do: you can slightly improve over the obvious algorithms, but not by much. This is not the model I'll be talking about in a moment, but just to establish the baseline, the algorithms we are competing against, let's have a quick look at it. We are given a black box which can sample the distribution with a big red button, we want to minimize the asymptotic number of samples, ignoring constants and lower-order terms, and I assume that you know, or guess, or are only interested in a certain value of p. The goal is to find y0.

The simplest possible algorithm is to use a large array with N counters: you start pressing the button, and every time you see the value 5 you add one to the counter at location number 5. You basically build an experimental description of your distribution. You expect to see the first occurrence of y0 after about 1 over p presses of the button, and if you do 10 over p presses you expect the entry for y0 to have a count of about 10, while almost all the other counters are 0, because p is much larger than 1 over N; occasionally a counter is 1, and rarely 2, due to the birthday paradox. So in the experimental distribution you will see, the highest counter is going to be y0. Very easy, but the problem is that it needs a huge amount of memory. The overall complexity is time which is some constant over p, where the constant determines the expected count of y0, which should be much higher than the rest, but the memory is order N.

We can do better quite easily, and reduce the memory by first sampling the distribution order 1 over p times; with good probability y0 will be among the values sampled in this first round. So I sample 1 over p times and use the values I sampled as the names of the bins I'm now going to count into. I'm not giving a bin to every possible value, only to the values suggested by the first 1 over p samples. With good probability y0 will be among those bins, I'll count into it and see that its value increases rapidly, and I don't need most of the memory, which would have contained zeros anyway. The overall complexity is time order 1 over p and memory order 1 over p, so if I think about N which is 2 to the 100, for example, and p which is 2 to the minus 50, I need time 2 to the 50 and memory 2 to the 50. The time is okay, but the memory is too much, so I want to reduce the memory.
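Here is a minimal sketch of that reduced-memory baseline, under toy assumptions that are not from the talk: `draw` stands in for the black box with the red button, and the needle, the toy parameters, and the factor of 10 are illustrative choices only.

```python
import random

def sampled_bins_search(draw, p, factor=10):
    # Phase 1: draw about 1/p samples and use them as the only candidate bins;
    # with good (constant) probability the needle y0 is among them.
    budget = int(round(1 / p))
    counts = {draw(): 0 for _ in range(budget)}
    # Phase 2: draw about factor/p more samples, counting only the candidates.
    for _ in range(factor * budget):
        y = draw()
        if y in counts:
            counts[y] += 1
    # The needle should have a count of about `factor`, the rest close to 0.
    return max(counts, key=counts.get)

if __name__ == "__main__":
    # Toy haystack: N = 2**20 events, one needle y0 of probability p = N**-0.5.
    N, y0 = 1 << 20, 123456
    p = N ** -0.5
    rng = random.Random(1)
    draw = lambda: y0 if rng.random() < p else rng.randrange(N)
    print(sampled_bins_search(draw, p), "expected", y0)
```

Like the algorithm as described, a single pass only succeeds with constant probability (when y0 shows up in the first round); repeating a few times boosts that, while keeping time and memory of order 1 over p.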
Okay, so what can I do? I can get a completely memoryless algorithm by looping over about 1 over p candidate values. I sample the distribution once to get a value y, and now I sample about 1 over p more times, counting only how many times I see that particular y. Even if y0 occurred many times in those samples, I'm not aware of it, because I'm memoryless and I'm not counting any value other than the one I'm currently concentrating on. So I try the first candidate and don't see an unusually high count, I try the next one, and after about 1 over p choices of the value to count on I'm likely to hit y0, at which point the counter for it goes up dramatically and I choose it. So I get running time of about 1 over p squared with memory of order 1: all I have done is trade lower memory for more time, and in fact there is a general time-memory tradeoff, where you can easily get any point in between with time times memory equal to 1 over p squared.

Okay, so in this sampling model this is the best known algorithm for finding needles in haystacks, but I'm going to show you that you can do much better. In real cryptography and cryptanalysis we are usually interested in a deterministic function applied to a truly random input. So instead of the box with the red button, we have a deterministic function f, we apply it to an input x which is absolutely random, and we get y. If I try x equals 0, I always get the same particular value y. Think about the outputs of a pseudo-random sequence, or about the differential properties of an S-box: if I always give the same input to the S-box I get the same output, because it's a deterministic function. The only randomness comes from the input: I choose a random input to the S-box, or a random key for the stream cipher. It looks as if this makes no difference, it's just a slightly different way to generate random outputs with a possible deviation from randomness, but it is going to make a huge difference in the complexity of the algorithms, as I'll show you in the next few minutes. So I'm thinking about this box as a transformation from a uniform distribution to a possibly non-uniform distribution, where the transformation itself is deterministic.

Now here is a way to think about it. I have all the possible x's that I could feed into my deterministic box, and each one deterministically chooses a particular output value. So here are the N possible inputs, and let's assume there are N possible outputs. This output is never chosen by any input. This one is chosen by only one of the ten inputs in the picture, so the probability of getting it is 0.1. And here is the unusually large needle, because it is chosen by four different inputs. So I have now transformed the problem of finding the needle into the problem of finding a node in the output layer which has an unusually large indegree (a toy version of this picture is sketched below).
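To make the deterministic-function view concrete, here is a purely hypothetical toy construction, not from the talk: a function f over N = 2^n values in which roughly a p-fraction of all inputs are wired to one fixed output y0, while every other input goes to a pseudo-random output. Brute-force counting of indegrees (feasible only because the toy N is tiny) confirms that y0 is exactly the high-indegree node in the picture.

```python
import hashlib
from collections import Counter

def _h(tag, x, seed=1):
    # Small deterministic helper standing in for "random-looking" behaviour.
    digest = hashlib.sha256(f"{seed}|{tag}|{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def make_needle_function(n_bits=16, c=0.6, seed=1):
    """Toy deterministic f over N = 2**n_bits values, one needle of probability about N**-c."""
    N = 1 << n_bits
    threshold = int(N ** (1 - c))            # about p*N inputs will hit the needle
    y0 = _h("needle", 0, seed) % N
    def f(x):
        x %= N
        if _h("select", x, seed) % N < threshold:
            return y0                        # a ~p fraction of inputs choose the needle
        return _h("value", x, seed) % N      # everything else behaves like a random function
    return f, y0, N

if __name__ == "__main__":
    f, y0, N = make_needle_function()
    indegree = Counter(f(x) for x in range(N))   # brute force, only possible at toy sizes
    print("largest indegree:", indegree.most_common(1), "needle:", y0)
```

The later sketches reuse this toy f in place of a real S-box or stream cipher.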
Now, this is going to give us the following memoryless algorithm: consider the graph G which is created by iterating f. So I have a deterministic function f which behaves randomly, with the possible exception of y0, and here I drew such a random graph. You start from a random point and follow the edges deterministically, but because it's a finite graph you must eventually close a cycle. This is called a rho structure; it has been studied by many people, for example in order to factor numbers, and Floyd has his famous cycle-finding algorithm, et cetera. The rho algorithm for finding the cycle, and especially the entry point into the cycle, can be run with constant memory: basically two fingers, one running at single speed and one at double speed, chasing each other. If the size of the graph is N, then the expected length of the path until you get a birthday collision and repeat yourself is about the square root of N, so the algorithm explores only a tiny part of the graph.

Now let's look again at this random graph, but with a slight difference. Most points in the graph behave randomly, but there is one point with an unusually large indegree. So think about a process in which I choose a particular value, let's say 8, and make it more and more popular: I pick another random point, like 93, cut its outgoing edge, which used to go to 68, and redirect it to point at 8, and I keep making more and more random points point toward that particular y0 that I chose. So this is the model, and the question is whether the graph changes its properties in any measurable way as more and more edges concentrate on 8. Here is an example where 8 gets more and more edges pointing at it, and there is a sudden phase transition: when the probability p of y0 exceeds 1 over the square root of N, we claim that y0 suddenly becomes likely to jump into the cycle, and in fact to become the cycle's entry point, which is very easy to find with the rho algorithm in square root of N time.

Why is that? Let me give you the intuition. When the probability is larger than N to the minus 0.5, y0 is likely to occur several times along the path of length square root of N, which is generated as usual by all the regular points rather than by the high-probability point. I start from a random point and follow the path; as long as I haven't hit a repetition I keep extending it, until it reaches length about square root of N. But within square root of N steps you are likely to have several occurrences of y0, so the second occurrence of y0 closes the rho structure, and therefore y0 is the entry point into the cycle.

Okay, so this gives a memoryless algorithm for finding y0 with essentially optimal time: you couldn't do any better, since you need about 1 over p evaluations just to encounter y0 once. Compared to the previous algorithm, which was order 1 over p squared, you save something like a factor of square root of N in the running time, which for N equals 2 to the 100 could be a huge saving factor.
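Here is a minimal sketch of that constant-memory two-finger (Floyd) cycle-entry finder; it takes any deterministic f and a starting point, and nothing in it beyond the standard algorithm comes from the talk.

```python
def floyd_cycle_entry(f, x0):
    """Return the entry point of the cycle reached by iterating f from x0, using O(1) memory."""
    # Phase 1: the slow finger moves one step at a time, the fast finger two steps,
    # until they meet somewhere on the cycle.
    slow, fast = f(x0), f(f(x0))
    while slow != fast:
        slow, fast = f(slow), f(f(fast))
    # Phase 2: restart the slow finger from x0; moving both one step at a time,
    # their first meeting point is exactly the entry point of the cycle.
    slow = x0
    while slow != fast:
        slow, fast = f(slow), f(fast)
    return slow
```

Applied to the toy needle function from the earlier sketch with c below 0.5, that is with p above the N to the minus 0.5 threshold, the returned entry point is, with good probability rather than with certainty, the needle y0.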
So how about finding smaller needles? If, for example, N is 2 to the 100 and p is 2 to the minus 60, this algorithm will not succeed, because p is not bigger than N to the minus 0.5, which is 2 to the minus 50. The intuitive approach is to repeat the rho algorithm from many different random starting points, but there is a problem: the graph is fixed, so if y0, with its relatively low probability, does not sit on the cycle, it doesn't matter how many starting points you choose, it will never be the entry point into the cycle, and repeating the algorithm will not help you at all.

So instead we are going to use multiple flavors of f. I define the i-th flavor of f as a new function which takes f and adds i to its input. If you look at the local properties, they are all preserved: a popular value remains popular, because flavoring only changes the names of the vertices that point towards it, not how many there are. But the global structure is different: iterating f, versus repeatedly adding i and then applying f, takes you in totally different directions, so the structure of the graph will be completely different. Because I get a new graph in which y0 remains as popular as ever, I can now try another time, and with some probability y0 will be the cycle entry point this time.

But just repeating the rho for many flavors only gives a slight improvement over the previous algorithm; you can do much better, and here is the algorithm. It turns out, if you analyze it a bit, that the probability that y0 is selected as the cycle entry point for a particular flavor is a new probability p prime, which is p squared times N, compared to the original probability, which was just p. Because p is larger than 1 over N, this new probability is always larger than the probability you started with. So here is the picture: you started with an almost uniform probability distribution with one needle whose height was p, and if you now ask with what probability each value is chosen as the cycle entry point, y0 is chosen with a higher probability, while all the other values remain uniformly distributed. I have enhanced the probability of y0 being chosen by running the rho algorithm. In particular, if this is my N to the minus 0.5 threshold, where the rho algorithm is likely to succeed, it could be that I was too low to begin with, but after the amplification I am now above the threshold, so I can apply the same algorithm recursively to the new problem, which enhances the height of the needle.

So that is the analysis, and the resulting two-rho algorithm has the following form. It runs multiple flavors, multiple versions of the rho algorithm, sequentially and deterministically (this is very important), and it uses the cycle entry point found in the previous step in order to deterministically determine the new flavor and the new starting point for the next run of the rho algorithm. Then it looks for the entry point into the cycle which consists of cycle entry points. All of this is best described by this picture. I start from a random point, I apply the first flavor, flavor 0, of my function, I run a rho, and I find this point as the cycle entry point. I take the identity of that value, which is not likely to be y0, and I use it to determine deterministically, by hashing it, a new start point and a new flavor, the blue flavor rather than the black one. I run the rho algorithm, I get its cycle entry point, again not likely to be y0, I use its identity to choose a new starting point and now the red flavor, and I keep jumping. But I also run a higher-order two-finger algorithm which follows all of this and determines when a cycle entry point repeats among the cycle entry points.
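Here is a minimal structural sketch of that nested procedure, over the same toy assumptions as above: the flavoring by adding i to the input is as described in the talk, but the particular hash used to derive flavors and start points, and all parameter choices, are hypothetical.

```python
import hashlib

def _derive(tag, value, n_bits):
    # Deterministic derivation of a new flavor / start point from the previous
    # cycle entry point (illustrative choice of hash, not from the talk).
    digest = hashlib.sha256(f"{tag}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % (1 << n_bits)

def floyd_cycle_entry(f, x0):
    # Same constant-memory two-finger cycle-entry finder as in the earlier sketch.
    slow, fast = f(x0), f(f(x0))
    while slow != fast:
        slow, fast = f(slow), f(f(fast))
    slow = x0
    while slow != fast:
        slow, fast = f(slow), f(fast)
    return slow

def two_rho(f, n_bits, seed=0):
    """Sketch of the two-rho search: a rho over the cycle entry points of flavored rhos."""
    N = 1 << n_bits
    def outer(prev_entry):
        i = _derive("flavor", prev_entry, n_bits)        # new flavor, chosen deterministically
        start = _derive("start", prev_entry, n_bits)     # new start point, also deterministic
        flavored = lambda x: f((x + i) % N)              # i-th flavor: add i, then apply f
        return floyd_cycle_entry(flavored, start)        # inner (local) two-finger rho
    return floyd_cycle_entry(outer, seed)                # outer (global) two-finger rho
```

The repeated value found by the outer rho is the candidate for y0; heuristically this should work when the amplified probability p squared times N clears the N to the minus 0.5 threshold, and a third or fourth level of nesting follows the same pattern.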
So you are running the local two-finger algorithm and the global two-finger algorithm, and when you get exactly the same entry point a second time, even though the rho itself was different, you can detect it with Floyd's algorithm, and the claim is that this repeated entry point is going to be your y0. So we call this the two-rho algorithm. And I could go further and do a three-rho and a four-rho iteration, each time nesting a rho algorithm inside the rho algorithm, and each one of them gives the best known algorithm within a certain range of parameters.

This summarizes the whole picture: if the probability is larger than N to the minus 0.5, these are the time complexities I get, and this holds for a certain range; if it is between N to the minus 0.75 and N to the minus 0.5, I get another complexity; and there are many cases, in some of which you want to use two rhos, in some three rhos, in some four rhos. And this is the graph: the previously best algorithm is this curve and the new algorithm is this one, plotted as the log of p versus the log of the running time, and you can see that over a large range, for example between N to the minus 0.5 and N to the minus 0.75, I get an improvement of about square root of N, which for N equals 2 to the 100 means that I am 2 to the 50 times better than the obvious algorithm.

Okay, so in the full paper I also analyze what happens when you are not memoryless but are allowed to use some memory, and this gives me another graph which is even stranger, because it bends in and out and is complicated.

So, to conclude, there are some open problems. Is our new algorithm optimal? Prove, disprove, or improve. My gut feeling is that for the memoryless algorithm I don't see how you could do any better, but I am not at all sure that our way of using memory is optimal, and those crazy graphs going in and out could get better; it's a fascinating area to look at. Find additional applications for the new algorithm: we applied it directly to finding the best differential probabilities of S-boxes, and you could use it to find deviations from randomness in stream ciphers, and so on. And analyze other types of needle-in-haystack search problems: I only analyzed the case in which the needle is defined by having a higher than usual probability; if it had a lower than usual probability, my algorithm would not be applicable at all, and you could think about other definitions of needles. Thank you very much.