Okay, ladies and gentlemen, welcome to the first of the two invited talks. The first invited talk will be given by Antoine Joux, and I will just say a few words about Antoine and his work. Antoine got his education at École Polytechnique, class of '86, and then he was a member of the crypto team of Jacques Stern at ENS in Paris. He is now a chief engineer at DGA and also a professor at the University of Versailles Saint-Quentin. Antoine has been really broad in his research, working on both asymmetric and symmetric cryptography. When I asked around a little to collect his main contributions, I got a whole list of different things, so I can only mention a few of them. In symmetric cryptography, Antoine's work on hash functions is very well known, with the collision on SHA-0 and the introduction of multicollisions, and he has also contributed to correlation attacks on stream ciphers and is the inventor of ARMAC. In asymmetric cryptography, he proposed the tripartite Diffie-Hellman protocol, he has worked on the cryptanalysis of HFE and on the use of pairings in cryptography, which are major contributions, and of course there is a lot of work on discrete logarithms and factoring; for example, this morning Antoine was a co-author of the paper receiving the best paper award. Finally, he has also given service to the community: he has been a director of the IACR. So let me welcome him here with a big round of applause.

Thank you, Thomas, for the introduction. First of all I would like to thank the organizers for kindly inviting me to give this talk here. As you know, when you receive an invitation to give an invited talk, the first reaction is: oh, wait, am I that old now? But after some time you think about it and say: of course I'm going to accept, it's a great honor. So, what can I speak about? I'm not going to give the usual talk I give when I have a new algorithmic idea; I need to speak about something else. The idea was that I would speak about high performance computing in cryptanalysis and try to give, well, a tutorial is really too much, just my view of what happens when you want to do big computations in cryptanalysis. That is what this talk is going to be about.

You are probably going to ask me: why do you want to speak about this? What is the point? There is one very easy-to-explain motivation: the historical link. You all know that there has been a long-standing relationship between computation and crypto, and the place where we are, with the proximity of Bletchley Park, is clearly there to emphasize this link. All of you will be going to Bletchley Park and will probably visit both the museum of computing and the museum of crypto and cryptanalysis, so you have all the links you want from a historical point of view. But of course I am not a historian, so I don't know anything about this; this cannot be the reason I am giving this talk. The other reason is that doing big computations in cryptanalysis is really a kind of background activity that you do in support of research, at least that I do in support of research, which is very important for putting new ideas into concrete form, and moreover it is something we never speak about. As in the fairy tales, you never know what happens after the happy ending: everything is fine, the algorithm is magic, and it is going to do everything.
Okay, you may get a glimpse of the prince at the end: I am going to tell you that we have done such and such a big computation, just as Vanessa did during the first talk. We have done a big computation, we give you a few details, and that's it. But what happened while doing the computation? Well, it's mundane, it's boring, so nobody wants to hear about it. In fact it's not that boring; it's even fun. Sometimes frustrating, because it often doesn't work, but still it can be really fun. So that's my motivation for speaking about this.

Next you can tell me: okay, but large computations are not done only in cryptanalysis, so why speak about this very specific kind of computation? Obviously, because it's the kind I do, but there are also some specific things that are quite important when you do computations in cryptanalysis and that differ from other computations. The first thing is that we are really simple-minded: we want either to demonstrate that some algorithm is working, or to break a record. We don't want computations that you run every day to do basic things; we just want to do it once and be done with it. This has a nice consequence: we don't need to reuse our code, so we can program in whatever way we want. By the way, if someone here wants to do nice programming in a company, just close your ears and don't listen to me, because I will give bad ideas. Another thing is that, since we are doing record breaking, we need to find computing power, and you all know the saying about gift horses: you just use whatever is available. So sometimes you run computations on strange things, which makes it even funnier. And the final point, which is really crucial, is that when you do a computation in crypto, the end result is usually very easy to check. If I have computed large discrete logs, I can just give you the discrete logs and you can check them very easily. This is not the case for many other kinds of computation: in physics, if I am doing weird stuff to compute the behavior of planets or whatever, the only way you can check that what I am doing is correct is by redoing the computation or by having a real-world model; you cannot just look at the final data and say it's okay. In our case this is a very important point. One consequence is that we can do whatever we want: as long as the end result is correct, everything is fine. That's the positive way of saying it. The negative way of saying it is that if something fails, then at the end you have nothing to show.

Okay, so when you are doing this kind of big computation there are a few main steps. The first step is to have an algorithmic starting point: you need something you want to do a computation on, and usually you validate it with some kind of toy implementation in a high-level language, in Magma if you are doing number theory, or whatever. The second point is rather political: you need to go out and beg for computing power, or just find computing power somehow, and once you have it you need to choose a target computation which is compatible with it.
Then there is the easy part: you just need to program the stuff. After that you need to run the computation, and this is, you know, the boring part that nobody cares about; but in fact it is not that easy, it is really something you need to work on, and one of the points of high performance computing is enabling people to manage large computations without too much hassle.

Okay, so what kind of starting point can you start from? Basically anything in crypto. I have written down a personal sample of the kinds of things you can use, and you can add your own favorites to it. You can of course do things with lattice reduction, which I did a long time ago during my PhD; you can have fun with collisions and multicollisions; you can use elliptic curves, pairings and even more complicated objects; you can do index calculus, of course; you can have fun with the new kind of decomposition algorithms that you can use for knapsacks, codes or whatever; and you can do some Gröbner bases. There are hundreds of things you can do; this is just a short sample of problems which are not trivially easy to compute and which can be great fun for a large computation.

But even with a nice starting point, sometimes it is not suited to a big computation. Why? Because sometimes, just by having a toy implementation, you already get something very nice that doesn't need much more to become interesting. One example is pairings. As Thomas said, one of the things I did in the past was proposing this tripartite Diffie-Hellman stuff. How did this come about? It was back in 1999, and at Eurocrypt 99 there was a paper comparing the Menezes-Okamoto-Vanstone reduction with the Frey-Rück one, that is, comparing two kinds of pairings for the cryptanalysis of discrete logs on elliptic curves. What happened is that when I read this paper I said: okay, wait a minute, I have a toy implementation on my computer and it is much faster than what the authors are reporting. So what is going on? What should I do with this?
It was at that point that I realized that if it is that much faster (it was a toy implementation in PARI/GP, a program you have probably used, and it took a few seconds to compute a pairing), then probably we can use pairings constructively, and you all know the story. So there was no big computation to do there. Another example is a recent paper with Sorina Ionica called Pairing the Volcano, where we do some crazy computations with isogeny volcanoes. There are some very nice algorithmic techniques proposed by Sorina, the implementation is just basic Magma code, and it is more than enough to set records well beyond what could be done previously. So there is no need for more than a toy implementation, and okay, we fail, we have nothing more to do.

If we don't fail at this early step, then we need to find computing power, and there are many ways to do this. The first is the old-fashioned technique, which we have all used in the past, or at least I did: just look around, find machines you have access to, and use them. The nice thing is that this is easy to arrange, especially if you can buy a few extra machines; you usually control the kind of machine you buy, you know the resources are there and you can use them. The problem is that it doesn't scale: even if you have money to buy 10 computers, maybe 100, if you have money to buy 100 you don't have money to buy 1,000, so it stops quite soon. The next option, which was proposed for factoring, is to use all the idle cycles on machines around the internet: all of you have computers which are doing nothing, so I just beg computing power from you and try to use it. I must admit that I have never done this, because there are really big requirements which make it too much of a hassle. One requirement is that you must be very user friendly, which is not easy; you really need to program nice stuff, have a nice screensaver or whatever, and I have no time for this. Some people can do it, but I can't. The next issue is that if you go for this, you are running your computation in an adversarial model: you never know, some of the computers might send you crap data just to corrupt your computation, and this is not easy to deal with either. And finally, a very important problem is the limited communication bandwidth, which is a real issue when you want to do huge computations, especially for the linear algebra. So I have never done this. Another option, which is very nice and is becoming available these days, is to go to one of the large computing centers, there are several around Europe, and say: I would like power. If you are lucky enough, they give you some, and this is very nice because you get a very high-end, dedicated computer. You don't control the architecture, you have to fit within it, but that's still okay; the problem is that the job management is not always easy, so running your computation on such a machine is not necessarily easy. And if you are not lucky enough to be able to apply and get resources there, you can also go for high performance computing in the cloud and just buy your computing power from whichever provider you want. Personally I prefer to try the free options, because in France using money to buy this kind of computing power is an administrative nightmare, and I don't want to do this. So now, assuming you have your computing power, you need to choose a target.
The target must correspond to the computing power you have. It can be a proof of concept, which is sometimes enough; it can be a real-size demonstration; and in the best case you can attack cryptographic-size parameters or set a new record, though you can't always do that. When you choose your target, you should be reasonably sure that you are not going to work for 4 or 6 months and get nothing at the end. So be reasonable.

So when is a proof of concept enough? I have a few examples. One of my old ones is the SHA-0 computation: we just did a collision on a 35-round reduced version of SHA-0. It was a few minutes of computation, so it was just a demo, but it was the only point in the attack where we could get a full collision, and at that time, stupidly, we had no clue that near-collisions could be interesting. The next example of a proof of concept is what we have been doing for all the new algorithms for hard knapsacks. The reason a proof of concept is enough here is that all these new algorithms are trivially parallelizable: there is some value somewhere that you just need to loop over, so you can really benchmark the thing by choosing the correct value. If you know the solution, you know the correct value, and you benchmark the algorithm very easily without having to run for a very long time. So a proof of concept is enough, and it's the same for decoding linear codes, which you will see in a few days.

A more interesting case is when you do something which was not previously possible, but using a very moderate amount of computing power. For example, we did something like this with Granboulan a long time ago: we broke a knapsack-based hash function proposed by Damgård, which was very interesting but happened to be very easy to break by lattice reduction using only a short computation. Similarly, with Éliane Jaulmes we proposed a new cryptanalysis of an old system called PKP; the full run would have been very long, but once again, knowing the correct value, you could demonstrate the attack with a reasonable computation, and the main point there was that we were able to reduce the memory. Another example is what Thomas mentioned, correlation attacks on LFSRs, where we could do a real-life example: a fast correlation attack on a 40-bit LFSR. This is not uninteresting, because some cryptosystems in use combine several such registers, and if you can improve the correlation attack you can attack the individual LFSRs one by one. But it was only a few CPU-days of computation, so it is a medium, fairly easy case. A similar thing is what we did with Vanessa much more recently, when we considered the discrete logarithm problem in extension fields of degree 5, not 6. At that time the full computation was totally out of range, but what we could do was demonstrate how the partial sieving, the partial relation construction, was going, and, by using an adapted version of the Gröbner basis computation, we were able to do it with a small amount of computing power. But it was just a demo case.
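To make the benchmarking trick mentioned above concrete, here is a minimal sketch, not the actual attack code: a toy 40-bit search whose space is split into chunks that would run as independent jobs. Because the solution is planted and known, you can time only the chunk that contains it and extrapolate the cost of a full run. The function f, the 40-bit size and the chunk width are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Toy stand-in for the expensive test of one candidate value. */
static uint64_t f(uint64_t x) {
    x ^= x >> 17;
    x *= 0x9E3779B97F4A7C15ULL;
    x ^= x >> 29;
    return x & 0xFFFFFFFFFFULL;               /* keep 40 bits */
}

int main(void) {
    const uint64_t secret = 0x12345678AULL;   /* planted solution, known to us */
    const uint64_t target = f(secret);
    const int chunk_bits = 28;                /* each independent job scans 2^28 values */
    const uint64_t nchunks = 1ULL << (40 - chunk_bits);

    /* Benchmark trick: run only the chunk that contains the planted solution
     * and extrapolate, instead of running the whole 2^40 search. */
    uint64_t chunk = secret >> chunk_bits;
    uint64_t found = 0;
    clock_t t0 = clock();
    for (uint64_t i = 0; i < (1ULL << chunk_bits); i++) {
        uint64_t x = (chunk << chunk_bits) | i;
        if (f(x) == target)
            found = x;
    }
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    printf("found %llx in %.1f s; a full run would cost about %.1f CPU hours\n",
           (unsigned long long)found, secs, secs * (double)nchunks / 3600.0);
    return 0;
}
```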
So once you have eliminated all these options, you know that you are going to aim at some record or some very large computation: you know the computing power, you know your target, you know what you are doing. What you need to do now is code the stuff. I will keep this short, because it really is the easy part: you spend a few weeks coding, and the main thing is to keep it very simple and stupid. Avoid all the fancy stuff. Object-oriented programming? What is that? You don't need it; just stay at a low level. C and assembly are very nice, all the rest is fancy stuff that is going to get in your way later. For the same reason, avoid libraries everywhere; unless it's some very minor task that you need to do once and don't want to rewrite, that's the only exception. Avoid adding more and more stuff that will make the computation impossible to manage. Don't care about reusability or portability of your code: you will throw it away anyway, because that is the best way to improve it; just make sure your code will be easy to change, because you will need that. Optimize, but not too much. The main rule is to avoid nasty surprises, because lots of nasty surprises are creeping around; that's why you avoid libraries, program from scratch, and use very conservative, defensive programming. I will explain later what I mean by defensive, but it is very important.

And then you get to the tedious, boring step of running the computation, but you will see that the surprises are there. The first thing: you have your target in mind, but don't jump straight to it, because the landing will be hard. You need to scale up slowly to the intended size, so you start with examples that should be easy and go up from there. Expect problems. The first problem is that the software can easily fail: all the easy parts you have not looked at while preparing the computation, all the easy phases, do not scale at all, so expect to have to reprogram them on the fly as the computation size grows, because they stop working. Of course, since you have programmed everything in a rush, there are hundreds of bugs around just waiting for you, and they are rare bugs that only hit you at the worst moment. What you need to do is make sure they are not going to make you spend the whole computation just to fail at the end, because that would be ridiculous. So, as far as you can, you stop at several positions in your computation and check the data using independent, easy-to-write code that looks at it and says: is this reasonable, or complete crap? If it's reasonable, or even better if it is guaranteed correct, then you are fine; if it's crap, it's time to go back and do something else.

After the software problems, expect hardware problems. The first hardware risk is that electricity is very nice magic, but it fails. You might say that in real life it doesn't fail that often, but when doing large computations it is a real issue. If you are computing on shared computers, availability of the computing power is a real problem, so avoid having too tight a schedule, because usually you will not meet it. And the worst thing: I don't know why, but whenever I try to do a very big computation I get nasty surprises. Everything seems fine, but when you re-read the data at some point there is something wrong; all the numbers are correct except one. This cannot be a bug; unless you are extremely unlucky there is no way it could be, because usually when a single bit goes wrong somewhere, the error amplifies. So the most probable explanation is a hardware fault: depending on the machine you can get a bit corrupted in memory, or, probably more likely, something goes wrong when you write your results to disk. You are using complicated protocols to share the disks between machines, and who knows what happens there; not me. So once again, check your data, because if it is wrong, the sequel is not going to be fun.
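As an illustration of what such an independent, easy-to-write checker might look like, here is a minimal sketch. Everything in it is a made-up toy setting, not the format of any real computation: a 61-bit prime field, a five-element factor base, and a text format "a e1 e2 e3 e4 e5" claiming that g^a equals 2^e1 * 3^e2 * 5^e3 * 7^e4 * 11^e5 modulo p. The point is only that the verifier is written from scratch, separately from the main program, so a bug in the main code or a flipped bit on disk gets caught before the next phase consumes the data.

```c
#include <stdint.h>
#include <stdio.h>

#define NB 5
static const uint64_t P = 2305843009213693951ULL;   /* toy prime: 2^61 - 1 */
static const uint64_t G = 3;                         /* toy generator       */
static const uint64_t FB[NB] = {2, 3, 5, 7, 11};     /* toy factor base     */

static uint64_t mulmod(uint64_t a, uint64_t b) {
    return (uint64_t)((__uint128_t)a * b % P);
}
static uint64_t powmod(uint64_t b, uint64_t e) {
    uint64_t r = 1;
    for (b %= P; e; e >>= 1, b = mulmod(b, b))
        if (e & 1)
            r = mulmod(r, b);
    return r;
}

int main(void) {
    unsigned long long a, e[NB];
    long line = 0, bad = 0;
    /* Each input line "a e1 ... e5" claims g^a = prod FB[i]^ei (mod P). */
    while (scanf("%llu %llu %llu %llu %llu %llu",
                 &a, &e[0], &e[1], &e[2], &e[3], &e[4]) == 6) {
        line++;
        uint64_t lhs = powmod(G, a), rhs = 1;
        for (int i = 0; i < NB; i++)
            rhs = mulmod(rhs, powmod(FB[i], e[i]));
        if (lhs != rhs) {
            bad++;
            fprintf(stderr, "relation %ld does not check out\n", line);
        }
    }
    printf("%ld relations read, %ld bad\n", line, bad);
    return bad != 0;
}
```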
Okay, so just to give you an idea of the scale of big computations, I looked around and found a few reference points, mostly from crypto but a few from other places. Looking at the crypto examples: the largest discrete log computation in a finite field I am aware of was done by Kleinjung in 2007; it was 160 digits, and the total computing effort was about 17 CPU years. By the way, when I write CPU throughout this talk I mean a core, because now you have processors with four or eight cores or whatever and you never know how to count them; so I mean one core. If you used a single core, you would be running for 17 years, so this is not a small computation, but there are larger ones. The largest finished computation I am aware of is the RSA-768 factorization, which took 1,500 CPU years for the sieving and 150 more for the linear algebra, which really is big. It is so big that I have not written the full list of authors, it is a really long list, and I think this kind of effort really needs people to be there to run the thing, fix the computation and do everything; it is a major project just to run this kind of computation. The one before it, the 200-digit RSA factorization, was much smaller. And if you really want a huge computation, ask Dan: they are still running this elliptic curve discrete log over a 130-bit binary field, using a generic algorithm. This is something which is feasible, but at the very limit of feasibility: the estimated effort is around 16,000 years on a single core, which is really a huge computation. Now some computations from other fields: some people like computing digits of pi, and the last record is 10 trillion digits; it took 3 CPU years, a rather small computation compared to the crypto examples above. And to have one more data point, I looked at PRACE, a European organization where you can apply for computing power: I looked at their site and at the time allotments given during their last call for projects, and the biggest allocation was for a climate simulation, with a total computing power of 16,000 years, exactly what would be needed for the biggest crypto computation above. So you see, big computations in crypto are not that ridiculous compared to big computations at large.

Okay, so now let's go through a few examples of computations I have done in the more or less recent past, just to see the kinds of things that can happen. One of the oldest examples I want to speak about is point counting on elliptic curves, which we did back in 1998 with Reynald Lercier; the starting point was Reynald's PhD thesis from 1997. You probably know that when you want to count points on an elliptic curve using the Schoof-Elkies-Atkin algorithm, there are two main phases: you first compute partial information about the number of points modulo small numbers, and once you have this you need to put everything back together. The classical technique at that time was called the match and sort algorithm, and the idea was just to do a collision search on the elliptic curve. This costs a lot of memory: for this kind of algorithm, time and memory are of the same order, because you need to keep big lists in memory, sort them, and then look for a collision.
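The sort-and-match skeleton behind this is generic, so here is a minimal sketch of it, with toy stand-in functions cand1/cand2 in place of the real candidate values (in the point-counting setting they would be derived from the possible values of the trace modulo the small primes); the list sizes and the 40-bit values are illustrative assumptions. Note that the whole first list has to sit in memory, which is exactly the cost being discussed.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define N1 (1u << 20)
#define N2 (1u << 20)

typedef struct { uint64_t val; uint32_t idx; } entry;

static uint64_t mix(uint64_t x) {                 /* toy stand-in */
    x ^= x >> 33; x *= 0xFF51AFD7ED558CCDULL; x ^= x >> 33;
    return x & 0xFFFFFFFFFFULL;                   /* toy 40-bit values */
}
static uint64_t cand1(uint32_t i) { return mix(i); }
static uint64_t cand2(uint32_t j) { return mix(j + 0x80000000u); }

static int by_val(const void *a, const void *b) {
    uint64_t x = ((const entry *)a)->val, y = ((const entry *)b)->val;
    return (x > y) - (x < y);
}

int main(void) {
    entry *L = malloc((size_t)N1 * sizeof(entry));  /* the memory cost lives here */
    if (!L) return 1;
    for (uint32_t i = 0; i < N1; i++) L[i] = (entry){cand1(i), i};
    qsort(L, N1, sizeof(entry), by_val);

    for (uint32_t j = 0; j < N2; j++) {             /* match: binary search in L */
        uint64_t v = cand2(j);
        size_t lo = 0, hi = N1;
        while (lo < hi) {
            size_t mid = (lo + hi) / 2;
            if (L[mid].val < v) lo = mid + 1; else hi = mid;
        }
        if (lo < N1 && L[lo].val == v)
            printf("match: i = %u, j = %u, value = %llx\n",
                   L[lo].idx, j, (unsigned long long)L[lo].val);
    }
    free(L);
    return 0;
}
```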
At that point we had the modular data available for a large computation, and, naturally, Reynald had started the match and sort run; it required one month. Easy, just wait. The problem is that we had a power failure after three weeks. The real story is even worse than that: two weeks after the computation started, we got a call from the people doing the maintenance of the building, and they told us that in one week the power would be shut down for some electrical maintenance. Fine, except for one problem: the program was not saving any data to disk, so we could not restart it. So during that week we worked hard to find a way to checkpoint the program so that we could resume the computation. By Thursday noon we were ready: we knew exactly how we would dump the data and restart after the power cut. We went for lunch, and when we came back from lunch, the power had somehow already gone out, earlier than the electricians had told us. So, in this situation, do we restart the computation or not? We went back to the drawing board: could we solve the problem in a different way? We could, and we found a new algorithm with the same asymptotic complexity as far as time is concerned, but using a much smaller amount of memory, and we could do the computation with 4 CPUs in a single night. During a single night you usually don't get a power failure, so it was fine, and the description of the algorithm went into this paper one year later. And as I said, we reduced the memory cost.

I told you before that the first SHA-0 computation I did was just a toy example with nothing more to it, but a few years later things had progressed and it became possible to really attack SHA-0. It was based on an improved version of the analysis, and essentially what you had to do was find a differential path and try to follow it long enough. This really looks like a brute-force algorithm, so it is what people in high performance computing call embarrassingly parallel: there is nothing to organize, you take all the machines you have, start them at different points and tell them to keep going. So this is quite easy, and it was the first time I did a big computation on borrowed power: some people had big machines and wanted to give computing power to people to try them out, so that was a good idea, we could try it. After 80,000 CPU hours, which is roughly 9 CPU years, not that big, three weeks of real time on 160 CPUs, we got a collision, which was published one year later in this paper. This was really easy: no bad news, no power failure. The only fun thing is that at some point I went to see the machine, and the guy managing it told me: you know, these ones are cold, and these ones are hot; the hot ones are running your computation. So that was fun.

The next, more recent example is the triple collision algorithm from 2009 with Stefan Lucks. We had a paper about this at Asiacrypt 2009, and we really wanted to illustrate that this algorithm was interesting in practice. The idea is that if you look at the early literature, to find a triple collision in a random function you need a lot of memory, and when you need a lot of memory it's not possible to do the thing. What we found was a simple way to reduce the amount of memory, and with this reduced amount you get a computation with three phases. Phase one: you compute many iterations of the function for which you want to find collisions; you start from a random value and iterate until you reach some distinguished point, then you stop, and you do this many times in parallel on many machines, so this is easy. Then you centralize everything, sort the values and find triples with the same endpoint, and you rerun roughly phase one on these triples to make the sequences converge; if you are lucky enough, you get the triple collision.
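Here is a minimal sketch of the distinguished-point phases just described: phase one runs independent chains of a function f until each chain hits a "distinguished" value (low 16 bits equal to zero here, an arbitrary choice), and phase two sorts the chain endpoints and looks for three chains ending at the same point. The 40-bit toy function, the chain count and the defensive cap on chain length are illustrative assumptions, not the parameters of the real computation; the important property is that memory grows with the number of chains, not with their length.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NCHAINS 128
#define DPMASK  0xFFFFULL            /* distinguished point: low 16 bits are zero */
#define MAXLEN  (1u << 22)           /* defensive cap against chains stuck in a cycle */

typedef struct { uint64_t start, end, len; } chain;

static uint64_t f(uint64_t x) {      /* toy 40-bit stand-in for the real function */
    x ^= x >> 30; x *= 0xBF58476D1CE4E5B9ULL; x ^= x >> 27;
    return x & 0xFFFFFFFFFFULL;
}

static int by_end(const void *a, const void *b) {
    uint64_t x = ((const chain *)a)->end, y = ((const chain *)b)->end;
    return (x > y) - (x < y);
}

int main(void) {
    static chain tab[NCHAINS];       /* memory ~ number of chains, not their length */

    /* Phase 1: independent chains; in the real run, one per core or GPU thread. */
    for (uint32_t i = 0; i < NCHAINS; i++) {
        uint64_t start = 0x1234567ULL + i, x = start, n = 0;
        do { x = f(x); n++; } while ((x & DPMASK) && n < MAXLEN);
        if (x & DPMASK)              /* never reached a distinguished point */
            x = ~(uint64_t)i;        /* unique sentinel, cannot match a real endpoint */
        tab[i] = (chain){start, x, n};
    }

    /* Phase 2: centralize, sort by endpoint, report triples of chains with the same
     * endpoint.  Phase 3 (not shown) reruns those three chains step by step until
     * they merge, which pinpoints the actual 3-collision. */
    qsort(tab, NCHAINS, sizeof(chain), by_end);
    for (int i = 0; i + 2 < NCHAINS; i++)
        if (tab[i].end == tab[i + 1].end && tab[i].end == tab[i + 2].end)
            printf("candidate triple ending at %llx\n",
                   (unsigned long long)tab[i].end);
    return 0;
}
```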
The problem is that we were looking for power, and at that time we only found some rather strange machines where most of the computing power was in graphics cards. It was the period when people were saying: oh, graphics cards are very efficient when you compare their cost to their computing power, so let's use them for computation. And we did. It was a strange experience to program these things; thankfully the algorithm was simple enough to keep the code reasonably short and easy, and it really was 8 times faster than doing the same thing on the CPUs of the machine. Knowing that the CPUs cost about the same as the graphics cards, it really means that in terms of raw computing power the graphics cards were indeed much more efficient. Phase two was easily done on a single CPU: you need to synchronize three things, and it's really more complicated than the stupid phase one, so for that reason, and since it is also less costly, it was easier to code it for the CPUs of the machine. In the end we had a triple collision on a 64-bit cryptographic function built from two DES computations, and it was only a 100 CPU-days computation, so roughly three months of single-core time, something small.

So now my last example, which is probably the most interesting one, is kind of a sequel to Vanessa's talk; I'm going to give you a bit more detail about what happens when we do this kind of big computation. Index calculus is an old friend: I have been doing this kind of computation since, well, probably 1998. Discrete logs in GF(p), discrete logs in GF(2^n), discrete logs in GF(p^n) for not-so-small primes, and other less classical stuff. All of this is index calculus, so it is a well-known landscape and it should be very easy. Well, I can promise you it is not a routine task, even if you have done it many times before, and you are going to see why.

Just to give the magnitude of the previous computations: you will see they are quite small, with everything expressed in CPU days or years. The biggest one we did was this discrete log in GF(2^613), which was only 3 CPU years. You also see the kind of architecture we were using: in the early days it was single-core machines, which were very easy to use, then quad-core machines, then machines with 16 processors; this last one used 4 machines with 16 processors each. So it is still small, something you could mostly do at home, and all these records were done with very small computing power.

So now let's start from the initial view we had for GF(p^6). The theoretical view is very easy, Vanessa just gave it before: first sieve, then do the linear algebra, then compute the individual logarithms for all the extra values you want. That is the theory: very easy, three lines, nothing to explain.
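For readers who want the three lines slightly expanded, the generic shape of index calculus is the following schematic, written multiplicatively as in the classical finite-field case (in the elliptic curve setting of this talk the relations are additive, factor-base points summing to zero, but the linear-algebra shape is the same). Here the pi_i are the factor-base elements and ell is the prime group order; none of this notation is specific to the GF(p^6) computation.

```latex
\begin{align*}
\text{Sieving:}\quad
  & \prod_i \pi_i^{\,e_{ij}} = 1
    \;\Longrightarrow\;
    \sum_i e_{ij}\,\log \pi_i \equiv 0 \pmod{\ell},
    \qquad j = 1,\dots,M,\\[2pt]
\text{Linear algebra:}\quad
  & \text{solve the sparse } M \times N \text{ system }
    E\,x \equiv 0 \pmod{\ell}
    \text{ for } x_i = \log \pi_i,\\[2pt]
\text{Individual logarithm:}\quad
  & h = \prod_i \pi_i^{\,f_i}
    \;\Longrightarrow\;
    \log h \equiv \sum_i f_i\,x_i \pmod{\ell}.
\end{align*}
```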
Well, in practice: phase one, you need to sieve. Strangely enough, after sieving, some of the relations on disk are incorrect, nobody knows why, so you verify the relations by testing whether they really sum to zero on the elliptic curve; it cannot hurt, and it's better than keeping something wrong. The linear algebra: that was the theory, but in practice it is more complicated than that. We do structured Gaussian elimination, then Lanczos's algorithm, because Lanczos is what we had been using previously, and then we complete the logarithms, which is quite fast, a kind of backward Gaussian elimination to get the extra logarithms. So this initial view is slightly more complicated than the theoretical one, but it worked quite well, and we had everything confirmed with a small computation on 130 bits. A bit more data: the sieving was just one hour on 200 CPUs, so very easy; we got 50 million equations in 2 million variables, and after the structured Gaussian elimination we were down to 600,000 equations and variables. The Lanczos step of the linear algebra was one full day on 128 CPUs, which is still okay, and all the rest was easy. The total was half a year of computing power. So: very easy, simple computation, no problem.

Clearly, as I told you before, we try to scale up slowly. Don't go from 130 bits to 160; take steps, because it is useful. The first two steps were 6 x 23 and 6 x 24, meaning GF(p^6) with p of 23 and 24 bits. As I told you, the easy steps do not scale at all: when we tried to do the structured Gaussian elimination for 6 x 23 it was roughly okay, but already at 6 x 24 we didn't have enough memory on the machine, so we had to do the structured Gaussian elimination on disk, which makes things a bit harder; then it became too slow and we had to make it multithreaded. And even after doing this, a few of the equations coming out of the structured Gaussian elimination were wrong. It might be a bug, it might be another failure, you never know. Thankfully, in this specific case the equations coming out of the elimination still make sense on the elliptic curve, so you just go back to the curve and check them: if they are correct, keep them. The second problem we had is that Lanczos was getting slow, and the problem is made harder because the machine has a time limit on jobs: you can't run for more than a day, then your job is killed, so you need an automated process to save and restart without too much trouble.

So for 6 x 23, the sieving was 3 hours on 1,000 CPUs, something very easy, you don't need to do anything. The structured Gaussian elimination: not enough memory, rewritten to work on disk, then multithreaded. We started from almost 900 million equations in 4 million variables and ended with 1 million equations; the number of variables was divided by 4, which is quite good for this kind of elimination, and in the end it was just a few hours on 32 CPUs, so it's okay. Corrupted equations: we added the check. Lanczos was 3 days, good. Strangely enough, completing the logarithms, which was 10 minutes before, is now 17 hours; it is somehow related to the structured Gaussian elimination, because the fact that this step is becoming harder makes this one harder too. Still, the final phase, the individual logarithms, was a few minutes; at the time we used Magma code, now it would be 14 seconds. So the total is about one year. You see: half a year, then one year. Scaling is finally not that bad.
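Since the structured Gaussian elimination step keeps coming up, here is a toy sketch of its simplest ingredient: repeatedly set aside (equation, variable) pairs where the variable occurs in only one remaining equation, since those can be recovered at the end by back-substitution, and removing them shrinks the system that the iterative solver has to handle. Coefficients are omitted, the random skewed system below only mimics the fact that "large" factor-base elements occur rarely, and the real code of course had to work on disk and multithreaded, as described above.

```c
#include <stdio.h>
#include <stdlib.h>

#define M 50000          /* equations */
#define N 20000          /* variables */
#define W 6              /* variables per equation in the toy system */

static int eq[M][W];
static int count[N];     /* occurrences of each variable in live equations */
static char dead_eq[M], deferred[N];

int main(void) {
    srand(1);
    for (int i = 0; i < M; i++)
        for (int k = 0; k < W; k++) {
            /* skewed choice: small indices (think small primes) are frequent,
             * large indices are rare and often end up as singletons */
            double u = (double)rand() / RAND_MAX, w = (double)rand() / RAND_MAX;
            eq[i][k] = (int)(u * w * (N - 1));
            count[eq[i][k]]++;
        }

    int live = M, changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < M; i++) {
            if (dead_eq[i]) continue;
            for (int k = 0; k < W; k++) {
                int v = eq[i][k];
                if (!deferred[v] && count[v] == 1) {
                    deferred[v] = 1;      /* solve for v later by back-substitution */
                    dead_eq[i] = 1;
                    live--;
                    for (int k2 = 0; k2 < W; k2++) count[eq[i][k2]]--;
                    changed = 1;
                    break;
                }
            }
        }
    }

    int vars = 0;
    for (int v = 0; v < N; v++)
        if (!deferred[v] && count[v] > 0) vars++;
    printf("after singleton removal: %d equations, %d variables (from %d x %d)\n",
           live, vars, M, N);
    return 0;
}
```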
For 6 x 24, you can see that the real problem is Lanczos, because the sieving is still very fast, less than a day, while Lanczos is getting slower. The problem is the imbalance: here we are using 1,000 CPUs and there only 64, so clearly something is wrong, and the total is almost 4 CPU years.

So the next logical step is the one Vanessa presented this morning: we go to 6 x 25. As I said, Lanczos is getting very slow and we need to split the job in a nice way, which is very complicated. Completing the logarithms is becoming harder, and, even worse than before, the corruption on disk is back: after completing the computation of the discrete logarithms, something really strange occurred. Most of the logarithms were correct, almost all of them to tell the truth, but a few here and there were incorrect, and trying to do the backward Gaussian elimination with a few false values does not lead to any correct result. So we had to add a correction step to remove the incorrect logs, which was a pain. As Vanessa said this morning, the sieving was very easy: 62 hours, two and a half days, on 1,000 CPUs, giving 14 billion equations in 16 million variables. We went down to 3 million, so a five-fold improvement, and this took about a day. When I say about a day, it means that when we do the Gaussian elimination on disk, we in fact need to run it several times to adjust some strange, crazy parameter until we find the right one; once you have the right parameter it's just a few hours, but you need to find it by dichotomy search or whatever. But Lanczos was a full month, and this is really a pain. The total was 12 CPU years.

So the next logical step, the one you probably expected to see here, was 6 x 26. The theory was there, the view had been confirmed by 6 x 25, so everything was fine, we should be able to do it, and Lanczos should take 4 months: long, but achievable. The sieving: this time we were in a rush, so we sieved on 8,000 processors and it took just a day. First problem: for some reason, doing the full sieving with the parameterization we had would have taken 27 hours, so the sieving was cut off in the middle. Not a problem, except that it was also cut in the middle of an equation, so to re-read the files we had to patch the program again. Not a real problem, but in practice it can be inconvenient. So now we have 40 billion equations in 33 million variables; we do the Gaussian elimination and go down to 6 million, which is large but feasible, and we expect Lanczos to be done in 4 months. Okay, just launch it. We started on September the 22nd, so of course it should have finished... Well, it was slower than expected in real time, because the machine was very busy and we had to wait between the runs; the jobs were just sitting there waiting for computing power to become available again, which is a mess. In the end we expected the computation to finish on the 4th of February, and we were there waiting for it to end. You know: 1,000 rounds of Lanczos left, then 100, then 10, then 5... and then, why didn't it stop? The orthogonalization process did not stop. Mathematically this is impossible: if you have a matrix of dimension 6 million, you cannot expect to find more than 6 million mutually orthogonal vectors. This is not possible, but it still happened. It might be a bug, it might be whatever; in any case, it failed.

So how do we proceed? The first option was: add a sanity check. This can be done cheaply in Lanczos, because when you save your data to disk, one thing you can do is take the vectors you currently have and check whether they are still orthogonal to the very first vector of the computation. If they are, everything is probably fine; if they are not, everything is probably wrong.
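A sketch of that kind of cheap invariant check, assuming dense vectors over a word-sized toy modulus (the real vectors are of course enormous and live in the distributed job's state), is given below. The exact invariant to test depends on the Lanczos variant actually used (plain orthogonality or A-orthogonality to the initial vector); the code only shows the mechanics: at every checkpoint, compute one inner product against the saved initial vector, and if it is not what the invariant says it should be, stop rather than burn another month.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

static const uint64_t ELL = 2305843009213693951ULL;   /* toy 61-bit prime modulus */

static uint64_t dot_mod(const uint64_t *a, const uint64_t *b, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t p = (uint64_t)((__uint128_t)a[i] * b[i] % ELL);
        s += p;
        if (s >= ELL) s -= ELL;    /* both terms are < ELL, one subtraction suffices */
    }
    return s;
}

/* Called whenever the iteration state is about to be written to disk. */
static int checkpoint_is_sane(const uint64_t *v_current, const uint64_t *v_first,
                              size_t n) {
    uint64_t ip = dot_mod(v_current, v_first, n);
    if (ip != 0) {
        fprintf(stderr, "sanity check failed: <v_current, v_first> = %llu != 0\n",
                (unsigned long long)ip);
        return 0;                  /* abort, restart from the last good checkpoint */
    }
    return 1;
}

int main(void) {
    /* Tiny self-test: v_current is orthogonal to v_first modulo ELL. */
    uint64_t v_first[2]   = {1, 1};
    uint64_t v_current[2] = {12345, ELL - 12345};
    printf("sane: %d\n", checkpoint_is_sane(v_current, v_first, 2));
    return 0;
}
```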
So this could be done, but it would mean running again for four months, especially annoying since we only have access to the machine until the end of May. The second option is to try to improve Lanczos to use more CPUs, but because of the communication, even if we could do this, we would not gain enough in scale to make it reasonable. So option 3: go back to the drawing board. The solution is known: we should use block Wiedemann, by Coppersmith. It has already been used several times, by Thomé for this discrete log computation and by Kleinjung for this one; each time it was something like a one-month computation, and their systems were smaller than ours, so it is not clear we can really do it. Still, there are three phases. The first phase is very nice, because instead of having a single run of matrix-vector multiplications you do several runs in parallel, independently from each other, so you can use computing power easily. Then you need to find a linear relation between all these sequences, and there is Thomé's algorithm, which explains how to do this efficiently. Finally, you need to recompute part of what you did in the first phase and derive a solution from it by feeding these data back in. But clearly, what existed was not good enough for our purpose; if we want this to work, we need to scale up the approach and get something better.

So we developed this program, and, following the earlier advice to scale up slowly, we restarted from the previous example to see what would happen; recall the Lanczos figures for comparison. First phase: we scaled up, and we decided to do 32 independent matrix-vector multiplication sequences. This can be done in 33 hours using 1,000 cores; it means that each computer with 32 cores is doing one run of matrix-vector multiplications, and it can be done very quickly. Next we use Thomé's algorithm: this is done on a single computer with 32 cores and it is quite fast; in 9 hours we are done and we get the linear relations we need. Then we redo half of the matrix-vector multiplication phase, and in 15 hours we get the result. The total CPU time is a bit bigger than what we had before, but the real time went down from 28 days to two and a half days. So now the complete computation, including the sieving, is 5 days of real time, and the magnitude of the computation is 14 CPU years. The only thing left is to scale up to the next size. I emphasize that this real time does not count the waits between the one-day runs; we are doing nothing then, but we are still waiting.

So what is happening now? The expected time for the first phase is 125 hours, so 5 days. We started on March the 28th, so it should be finished by now. Well, there was a power problem, the power was back the day after, but the machine is very busy these days and the first phase is still running. I checked this morning before the talk: of the 32 sequences, only 24 are finished, so I can't give you any more results today. And that's it.

So we may have time for one or two questions; do we have a microphone? Probably, yes. "When you talk about CPU time, a CPU of which year?"
Okay, that's a very good question. When I give my timings in CPU years, I mean CPUs that were current at the time the computation was done, so Moore's law is not taken into account, at least for the period when single-processor performance was still going up. Why did I make this choice? Because when you perform the computation, that part is irrelevant: managing a one-processor computation is not harder if the processor runs 10 times faster. What does happen is that doing a bigger computation with more cores is harder. With the new big machines things are becoming slightly easier, you can scale the computation up a little, but still, if you want to do a 1,000-year computation it is a major project. That was the point of choosing this strange-looking unit. Further questions? Okay, then let's thank Antoine again.