So I'll talk today about modern methods of factoring. Well, unfortunately, they are not very modern, because the main advances in this direction happened in the 80s and the beginning of the 90s, and the recent factorization records we have seen are just applications of these methods plus the increase in computing power in the hands of researchers, academia, and so on. I'll try to give an overview without going deep into the math, just deep enough that if you later want to learn a bit more, this material can serve as a basis. I will also publish some additional material on Google Drive with simple explanations of the number field sieve and similar methods, which can be helpful if we decide in the future to explore these things in more detail. OK, so feel free to ask questions in between.

So what is the problem of factoring, as it's formulated? Suppose we have a big number, capital N, which has n bits, small n. The problem of factoring is to find its prime decomposition, that is, its factorization into prime powers. For integers this factorization is known to be unique up to the order of factors: there cannot be two different decompositions of the same number. This uniqueness property is important because it doesn't hold everywhere. In fields, for example, it's of course not the case, because in a field every element is divisible by every other, so every field element has an infinite number of such representations. But uniqueness is a property of the ring of integers, of the ring of polynomials, and of some other so-called Dedekind domains, which include some funny algebraic number fields, basically extension fields where you adjoin roots of certain polynomials, or square roots of negative numbers, and so on. So there are some fancy sets where factorization is unique, and this is used to some extent in factoring methods. But not everywhere: the discrete logarithm problem, for example, is related, but it lives in sets where there is no unique factorization, so many of these methods do not really apply there.

OK, let's go on. If a number cannot be factored, it's called a prime number, and primality is relatively easy: there exist deterministic algorithms, polynomial in the number of bits, that determine whether N is prime. Factoring is, of course, an NP problem, because you can quickly check that a factorization is correct, but it's not known to be NP-hard, and I think, as many people do, that it's not NP-hard. Consistent with that, algorithms faster than pure exponential exist; on the other hand, no polynomial algorithm is known. So we don't really know where this factoring problem sits.

So how can we factor naively? We just go over all numbers up to the square root of N; for each prime we test whether it divides N, and when we exhaust all the numbers up to the square root, we have exhausted all possible factors. Whatever remains can be bigger than the square root of N, but then it must be a single prime. The complexity of this is about the square root of N. That's the trivial method.
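As a minimal sketch (my own illustration, not code from the talk), trial division looks like this:

```python
# Trial-division factoring as described above: try every candidate divisor
# up to sqrt(N); whatever cofactor remains must be a single prime.
def trial_division(N):
    factors = {}
    d = 2
    while d * d <= N:
        while N % d == 0:
            factors[d] = factors.get(d, 0) + 1
            N //= d
        d += 1
    if N > 1:
        # the remaining cofactor has no divisor up to its square root,
        # so it is prime
        factors[N] = factors.get(N, 0) + 1
    return factors

print(trial_division(143))   # {11: 1, 13: 1}
```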
A slightly more interesting method, related to the methods I will discuss, is the following. Suppose you generate two numbers, x and y, and you test whether x squared equals y squared modulo the number N you are trying to factor. Why is this congruence interesting? Because if it holds, then if we move y squared to the left side and apply the simple formula x² − y² = (x − y)(x + y), we see that there is a product of two numbers that equals 0 modulo N. And there is a certain chance, actually quite a high chance, that either of these factors shares with N not N itself but some proper factor of N. So if we compute gcd(x − y, N) or gcd(x + y, N), it is very likely that we actually get a factor of N. Otherwise, if the congruence doesn't hold, we just go on and generate new numbers. But in general, if it holds, then we almost win, usually with probability more than one half, if these numbers are small enough. And by properly arranging x and y, we can make the complexity about the square root of N again. Why is this important? Because I will show how we can generate x and y in a much more interesting way, and that's actually how factoring becomes easier than square root: in the advanced factoring methods, we generate x and y so that the congruence is more likely to hold.

Now there is some interesting stuff. Suppose that for some number x_i bigger than the square root of N, it happens that when we compute z_i = x_i² mod N, this z_i factors into powers of 2, 3, and 5 only: 2 to some power a_i, times 3 to some power b_i, times 5 to some power c_i. So we can actually factor z_i, and we try, for every z we compute this way, whether it factors into just 2, 3, and 5. And what's interesting: with three primes in the base, I need four such relations. Why is that? Because if I collect four relations like this, with sufficiently big x above the square root, I can compose a matrix whose rows are the exponent vectors (a_i, b_i, c_i), and then I look for a linear combination that is zero modulo 2. Why modulo 2? Because if some rows sum to 0 modulo 2 in every column, that means the exponents sum to even numbers; and that means that if I multiply the corresponding z_i, z_j together, the exponents add up to something even for 2, for 3, and for 5 simultaneously, so the product is a perfect square. With four rows and only three columns, a linear dependency modulo 2 is guaranteed to exist: there are coefficients alpha, beta, gamma, delta, each 0 or 1, such that the combination of rows gives (0, 0, 0) modulo 2. If this happens, we take x_i to the alpha, x_j to the beta, x_k to the gamma, x_l to the delta; the square of this product corresponds, on the other side, to an exponent vector (2u_2, 2u_3, 2u_5), all even. So instead of 2, 3, 5 to arbitrary powers, we now have a product of even prime powers, which is a square. Both sides are squares. We get our v and w this way, and we get exactly the congruence v² ≡ w² that we were looking for.
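Here is a small sketch of that linear-algebra step (my own illustration; the exponent vectors in the example happen to be those of the 143 example that comes next). Each relation is an exponent vector reduced mod 2 and packed into a Python int, and Gaussian elimination over GF(2) finds a subset of rows summing to zero:

```python
# Find a subset of relations whose exponent vectors sum to zero mod 2,
# i.e. whose product is a perfect square. Bit i of a row is the parity of
# the exponent of the i-th factor-base prime; each row also carries a
# bitmask recording which original relations were combined into it.
def find_dependency(rows):
    combos = [1 << i for i in range(len(rows))]
    rows = rows[:]
    pivots = {}                       # leading-bit position -> row index
    for i in range(len(rows)):
        while rows[i]:
            bit = rows[i].bit_length() - 1
            if bit not in pivots:
                pivots[bit] = i       # new pivot, row stays independent
                break
            j = pivots[bit]
            rows[i] ^= rows[j]        # eliminate the leading bit
            combos[i] ^= combos[j]
        if rows[i] == 0:
            return combos[i]          # XOR of these original rows is zero
    return None

# Parities over factor base (2, 3, 5), bits (0, 1, 2):
# 17^2 = 3, 19^2 = 3 * 5^2, 21^2 = 2^2 * 3   (all mod 143)
rows = [0b010, 0b010, 0b010]
print(bin(find_dependency(rows)))     # 0b11: relations 0 and 1 give a square
```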
So here, for example, suppose that we would like to factor the number 143. The square root of 143 is a bit smaller than 12. I go slightly above it and compute squares modulo 143: 14², 15², and 16² do not reduce to products of 2, 3, and 5. But 17² is just 3 modulo 143. 19² is 75, which is 3 · 5². 21² is 441, which is 12 modulo 143, and 12 = 2² · 3. And so on. Now, if we look at this, then if we multiply 17 by 19, their squares combine so that we get 3 · 3 and 5 squared: (17 · 19)² ≡ 3² · 5² (mod 143). Since 17 · 19 = 323 ≡ 37 (mod 143), this means 37² ≡ 15² (mod 143). We move everything to the left side, apply the simple formula, and see that (37 − 15)(37 + 15) is indeed 0 modulo 143. We compute the gcd and we get 11, and 11 is a factor of 143. Any questions so far?

Well, this is super cool, because it's super simple. One question I have: is it still the case that the v and w won't give you anything, that the gcd just doesn't give you any factor? Yes, there is a chance that the gcd gives you nothing, but in practice it's quite low. I think with probability about one half, if you get this congruence with random v and w that are both sufficiently small, you get an actual factor with very high probability. And are these provable things, or heuristics? Yeah, I think this is provable: for sufficiently small v and w, you can rigorously prove that the probability is high. And what do you mean by sufficiently small? I think smaller than N. Well, if they are close to the square root, or one of them is, then I think you can bound this; I've seen some estimates of how you can compute it. It's widely assumed that the probability is about one half that you get the correct thing, because you actually fail to get an answer only if x − y or x + y is a multiple of N: if it's N, or 2N, or 3N. And if x and y are both smaller than N, the probability of that is small, right? I see, okay.

So this method is very nice, but the problem is that the fraction of numbers divisible only by two, three and five is very small, and we have to adjust the method somehow to benefit from this. The matrix part, where we looked for linear dependencies, was very simple because there were only three primes; but look at how many numbers we had to test. I tested, for example, 20 numbers and only in four cases got a nice decomposition. So the smart methods of factoring introduced what's called smoothness. We call a number z B-smooth if all its prime factors are smaller than B (or smaller than or equal to B, depending on the convention). In our case, with the strict convention, the previous example where we considered only two, three and five was about 6-smoothness. But of course we can consider a bigger smoothness bound. So here is how this works: we take some x (I'll tell you later how we choose x), compute x² mod N, and test whether it is B-smooth for some chosen bound B. If yes, we add it to our table. This is called sieving.
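The 143 example above, verified end to end in a short script (a sketch of my own, following the numbers from the talk):

```python
from math import gcd

N = 143
for x in (14, 15, 16, 17, 19, 21):
    print(x, (x * x) % N)    # 53, 82, 113 are not {2,3,5}-smooth; 3, 75, 12 are

x = (17 * 19) % N            # 323 mod 143 = 37
y = 15                       # because 3^2 * 5^2 = 15^2
assert (x * x) % N == (y * y) % N
print(gcd(x - y, N))         # 11, a factor of 143
```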
If we find about B such x, well, B over the logarithm of B to be formal, then we have a matrix where every column corresponds to the exponent of some prime, and there are at most about B / ln B primes in this interval. So our matrix has about B / ln B columns, and if we find about that many such x, then after applying, for example, Gaussian elimination to the matrix, we can find a linear dependency. And basically, in pretty much the same way as before, some product of squares gives you another square, and when we aggregate these and reduce modulo N, we have x² ≡ y² and we are done.

So what's the complexity of this in principle? In the matrix step we work with a matrix of size about B times B, but interestingly, we don't need B³ operations to find a linear dependency, that is, to find an element of the kernel, because the matrix is very sparse. There are special algorithms called block Wiedemann and block Lanczos for sparse matrices that find kernel elements of a sparse matrix, and these algorithms work in about B² times the sparsity, so it's about B². This is one step that can certainly be improved in hardware: for the matrix step I can already tell you that there are several suggestions for how to make it much faster and much more efficient in hardware. So the matrix step can be considered independently of the sieving part, the really complex part of the number field sieve; this part is conceptually simple, and I think it can be explored quite easily how fast it can be on custom hardware, running these algorithms or modifications of them that are suitable for multiple cores and the like.

So what about the sieving step? The sieving step depends entirely on the number z, because the smoothness probability of a number depends on how big the number is. Clearly, if the number is smaller than B, the probability that it's B-smooth is one; but as it increases, there is a bigger and bigger chance that it has a large prime factor, and because of that, more and more numbers are not smooth. So the probability decreases significantly as z grows, and we have to take this into account. For sieving we need about B such numbers, and the cost to find one is one over the fraction of B-smooth numbers among our outputs for a given size of z, multiplied by the cost of the smoothness test: how expensive it is to test that z is smooth. What's really interesting is that the modern sieving methods are quite fast at the smoothness test: they exploit the fact that we created the z values in a very special way, and the amortized cost of the smoothness test is very low. So what's really, really important here is this probability, the probability that the number is smooth; and the smaller we can make the number, the bigger the probability that it's B-smooth. And of course, since the matrix step is about B², we would like the sieving step to also cost about B², so that both steps are balanced. If we make the test cost almost one, or close to one, the trade-off point is where the smoothness probability is about one over B. But that of course depends on how big z is. So the smaller z is, the bigger we can take B, and the bigger numbers we can basically factor. Questions so far?
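A sketch of one row of that relation table (my own illustration): factor z over the factor base by trial division, and return the exponent vector if z turns out to be smooth.

```python
# Try to factor z over the factor base; if z is smooth over it, return its
# exponent vector (one matrix row), otherwise None.
def exponent_vector(z, factor_base):
    exps = [0] * len(factor_base)
    for i, p in enumerate(factor_base):
        while z % p == 0:
            exps[i] += 1
            z //= p
    return exps if z == 1 else None

print(exponent_vector(75, [2, 3, 5]))   # [0, 1, 2], since 75 = 3 * 5^2
print(exponent_vector(53, [2, 3, 5]))   # None: 53 is not smooth over {2,3,5}
```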
Yeah, so actually I'd like to stress that this smoothness property is a very, very important thing. It's actually why factoring is so much faster than the brute-force square root of N: in the integers there exists this notion of smoothness, and there are numbers that factor this way. In other settings we don't have this notion, and because of that, problems similar to factoring, like discrete logarithm in generic groups, are much harder there. So this is the crucial property for factoring.

And what's the typical size of B that we're talking about, if we want to factor, like, an 800-bit number or something? You see, for the trade-off we would like the time of both steps to be about B². So if you can spend 2^60 time, then your B is 2^30, clearly. All right, okay. Now I understand why the matrix is sparse. Yes, so the primes are up to 2^30, so about 30 bits, and of course the numbers don't have too many prime factors: on average, I think in recent records, like 50 or 100 per row, not terribly many. So compared to the size of the matrix, the fraction of non-zeros is very small, like 100 over 2^30, very small. You mentioned that there were some suggestions to use ASICs for the matrix step; what about the sieving step? Yeah, so what about the sieving step: I'll go through how exactly sieving works in two algorithms, the quadratic sieve and the number field sieve, and then I will explain how hardware can help there.

So, about smoothness, here is how we actually get factoring that's faster than square root. If B is a constant, like the 6 we used, then the probability that a number is B-smooth is about one over N to some power depending on the logarithm of B. But if B is as big as the exponential of the square root of the logarithm (not the logarithm over two, but the square root of the logarithm), then the probability that the number is smooth is about one over the square root of B. And because of that, to find B such numbers we spend about B^(3/2) time, while the matrix step is B². By further tuning this bound, we can arrange that numbers are factored with about this complexity. I will be a bit more precise in the next algorithm, which is called the quadratic sieve.

What's nice about the quadratic sieve is that it can be explained relatively simply, compared to the number field sieve. So what are we doing in the quadratic sieve? The first nice step is how we select our x. We select x close to the square root of N: x is the integer part of the square root plus some epsilon. Then when we compute x² modulo N, the main term is cancelled by subtracting N, and what remains is epsilon squared plus 2 epsilon times the square root of N. This makes our number z of size about the square root of N, and because of that it's much more likely to be B-smooth than a random z modulo N, because a number with half the digits of N is much more likely to be B-smooth. So we want to find O(B / log B) such x; then we construct our congruence of squares and we are done. But how do we test smoothness? Notice that our x values form an arithmetic progression: to test B² candidates, we take the square root of N plus one, plus two, and so on, up to plus B². That's where the sieve comes in, and this is why it's called the quadratic sieve.
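Before the sieve itself, here is the naive version of this stage as a sketch (my own illustration): generate the small residues z = x² mod N for x just above the square root of N, and test each one by trial division. The sieve described next replaces the per-number trial division.

```python
from math import isqrt

def exponent_vector(z, factor_base):     # same helper as in the sketch above
    exps = [0] * len(factor_base)
    for i, p in enumerate(factor_base):
        while z % p == 0:
            exps[i] += 1
            z //= p
    return exps if z == 1 else None

# Collect smooth relations x^2 = z (mod N) with x just above sqrt(N),
# so that z stays around sqrt(N) in size.
def collect_relations(N, factor_base, count):
    relations = []
    x = isqrt(N) + 1
    while len(relations) < count:
        z = (x * x) % N
        vec = exponent_vector(z, factor_base)
        if vec is not None:
            relations.append((x, vec))
        x += 1
    return relations

# For N = 143 this finds x = 12 (the trivial z = 1), 17, 19, 21.
print(collect_relations(143, [2, 3, 5], 4))
```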
We test the smoothness as follows. Take some prime q smaller than B. We solve the equation x² − N ≡ 0 (mod q) for x somewhere close to the square root of N. We can solve it because, if q is prime, such equations are solved easily: there are polynomial algorithms to do that. And once we have solved it, then for any k, if we add q, or 2q, or 3q to x, the value modulo q stays the same. Meaning that if Q(x) = x² − N is divisible by q, then Q(x + kq) is also divisible by q. And coming back to the previous slide: remember we constructed x² − N for x in an arithmetic progression. So here is how to find smooth numbers. We take one prime and we find the root: we solve x² − N ≡ 0 (mod q) for some x near the beginning of our range. Suppose this holds for that number. Then, since the value there is divisible by q, we can add any multiple of q to the argument, and, in an arithmetic progression with equal gaps, the outputs of our polynomial will all be divisible by q, and we can divide them by q, like in the sieve of Eratosthenes. And eventually, if we repeat this many, many times for different q, some numbers boil down to one, because we have factored out all their prime factors. So the way this sieving works is: in a very long array, we find some number to start with, and then every q-th entry we divide by q, knowing it will be divisible. We repeat this for all the primes, and what remains at one in the end, there should be about B numbers out of the B², those are the smooth ones. Questions?

I wasn't really able to follow the last few steps. Okay, so do you want me to repeat, or do you want to take a look at it later? I think I get it. I think the idea here is basically that instead of just sequentially trying random numbers, you try numbers in a structured sequence, so that you know in advance which of them are divisible by each small or medium-sized prime, and that way you have a much cheaper way of finding the smooth ones. Yes, that's more or less the case. So to summarize it shortly: we find one number here that is divisible by a prime, by solving an equation, and then we know that... Just to clarify, we do that for each prime less than B? Yes. Oh, I see, for each prime less than B. Okay. And then... So we have the list of all these numbers, which are the squares of root N plus one, up to root N plus B squared, minus N. Yes. And among them, some are divisible by a given prime. We know we have found one such number: we solve this modulo q and find such an x, and it is easy to find such an x. This means that for any prime, it's easy to find which entries in this sequence are divisible by it. So this is basically like the sieve of Eratosthenes. Yes, it's the same sieve of Eratosthenes, but viewed in a slightly different way. In the sieve of Eratosthenes, you have the prime three and you cross out three, six, nine, and so on. But here, you cross out not three, six, nine; you have to start from somewhere in between, from like 47. For example, you figure out that Q(47) is divisible by seven, and then you divide Q(47), then Q(47 plus seven), so Q(54), Q(61), Q(68), and so on. So at equal gaps you find the numbers divisible by this prime, and you divide, you basically overwrite the array. And then you repeat with another prime, which gives you another arithmetic progression. So... And you divide until it's not divisible.
So if it's divisible by 49, you will divide by 49? Yeah, well, 49 is not a prime, so you... Right, right. But if it's divisible by seven, right? Yes. So you find all the entries divisible by seven, but some of them might be divisible by seven squared, seven cubed, and so on. Yes, I think the recommendation is to keep dividing further. Okay. So you indeed divide those out and... And basically then you do that for all the primes and find all the numbers that have been reduced to one. Yes, okay. Okay. And because of that, the amortized cost of testing one number for smoothness is not that high.

So how does this compare to the kind of trivial, constructive method where you just try random z? It's much better, because these numbers z are relatively small, about the square root of N. Right. And because of that, the probability of them being B-smooth is much higher. This basically means that with the same smoothness bound as before, we can attack much bigger numbers. Because if in the basic method you can spend, for example, 2^60 time, and for random z the probability of being smooth is 2^(-30), then you can break, I don't know, 200-bit numbers with this complexity. Since in the new method the z's are of size square root of N, that basically means you can attack roughly twice bigger numbers.

But there are kind of two tricks, right? One trick is to make sure that your z has half the bit size, and then there's the other trick, which is the sieving trick. Presumably you could keep the first one but not do the sieving? Yes. The sieving just kind of amortizes the test cost, but I think even trivially the test cost is not that high. Well, if you do it naively, then with smoothness bound B you have B² numbers, and you test every one for B-smoothness, which takes about B trial divisions each; so eventually it's like B cubed. So there's a time-memory trade-off here, right? Basically, if you do what Justin said and keep the small-z trick but skip the sieving, doing the test the trivial way, then you can probably do it with much less memory, and more locally. Yes, yes, that's certainly true. There are even faster methods; at the very end I have an extra slide on a method called elliptic curve factoring, which allows you to find small factors faster. If you know that your factor is small, then you can find it, whatever it is, faster than by trial division. So you can optimize using this trick as well.

So there are of course quite many optimizations here, and here is how you can exploit this in hardware. Recall what I said about arithmetic progressions: during sieving, we access memory in a very predictable way. We take a starting position, then step by the prime we divide by, then again and again. So all these steps are pretty predictable, and it's possible to share this task among several cores. This is the quadratic sieve; the modern methods use what's called lattice sieving, but they all share pretty much the same strategy, and the memory accesses are very predictable. And I think after you have generated these numbers, you can properly partition them among your smaller computers, so that everyone can do their own part of the task of finding the smooth outputs. That makes sense.
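Here is the sieve from the discussion above as a runnable sketch (my own illustration). Two simplifications: roots of x² ≡ N (mod q) are found by brute force rather than by the Tonelli-Shanks algorithm, and the values sieved are x² − N directly. The toy modulus 15347 = 103 · 149 is my own choice, picked so that smooth values actually occur in a short range.

```python
from math import isqrt

# Sieve Q(x) = x^2 - N for x = start, start+1, ..., dividing out each prime
# q < B along the arithmetic progressions where Q(x) is divisible by q.
def quadratic_sieve_pass(N, B, length):
    start = isqrt(N) + 1
    values = [(start + i) ** 2 - N for i in range(length)]
    residues = values[:]                    # working copies to divide down
    primes = [q for q in range(2, B + 1) if all(q % d for d in range(2, q))]
    for q in primes:
        # brute-force roots of x^2 = N (mod q); Tonelli-Shanks in practice
        roots = [r for r in range(q) if (r * r - N) % q == 0]
        for r in roots:
            i = (r - start) % q             # first index hitting the root
            while i < length:
                while residues[i] % q == 0: # also strip higher powers of q
                    residues[i] //= q
                i += q
    # entries sieved all the way down to 1 are B-smooth
    return [(start + i, values[i]) for i in range(length) if residues[i] == 1]

# Survivors include x = 124 (29), 126 (23^2), 127 (2 * 17 * 23), among others.
print(quadratic_sieve_pass(15347, 30, 100))
```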
This is the second crucial point I wanted to make: the sieving step is very predictable, even though it requires a certain amount of memory. It's very predictable, and you can even compute these values on the fly, divide the range into segments, and run the process independently for different segments. You lose a little in total complexity, but you save a lot of memory for every single core. Right.

Okay, so we proceed. And yes, interestingly, the optimal B is e to the square root of half the logarithm of N. I will not derive it right here; because of the square roots, all these complexity estimates are a bit odd and not trivial to state (I'm also omitting the usual log-log factors). But the point is, with B of this shape the total complexity is B², which basically means that when you square it, you have a two instead of the 0.5 under the square root. That's basically the complexity of the quadratic sieve.

And the next, more advanced method is called the number field sieve. The number field sieve uses very sophisticated algebraic tricks to make this number z even smaller. Almost everything else stays the same, but they manage to make z smaller than the square root of N: some root of N which is not the square root but something in between, like a 2.5-th or cubic root, it depends. If they make it smaller, the smoothness probability greatly increases, and because of that, you can attack bigger numbers. That's the core advantage of the number field sieve.

And how do they do it? This unfortunately involves quite a bit of algebra. Let me just state the main points here, and maybe later, if you would like to understand it a bit better, you can return to this. What's important here is that the algebraic part plays almost no role in computing this in hardware. The code doesn't change significantly from the quadratic sieve, and the properties of the architecture needed to break a number with the number field sieve are almost the same as for the quadratic sieve. So you can understand the quadratic sieve and build an ASIC for the quadratic sieve, and it will be quite efficient for the number field sieve as well.

So how does the number field sieve work? First, they take some polynomial f of small degree d, concretely about five or six, such that it has a root m that we know modulo N. This polynomial can actually be derived rather trivially: if d is five, then clearly m should be about the fifth root of N, and if we take the representation of N in base m, the m-ary representation of N, its digits give us such a polynomial, with f(m) = N, which is 0 modulo N. So such polynomials are literally trivial to define, and there are many of them (a sketch of this base-m construction follows below). Suppose also that, considered not modulo N but over the ordinary numbers, f has some root, which we denote by alpha. By default f probably has no rational roots, and actually if it does, then we can probably find a factor of N trivially, so we assume it doesn't. Then, for any polynomial expression built over alpha as the variable, substituting the number m for alpha gives us a homomorphism from the ring Z[alpha] to the integers modulo N. And this homomorphism has very nice properties: for a set of pairs of numbers a_i and b_i, we consider simultaneously a_i − b_i · alpha, where alpha is this root, and a_i − b_i · m.
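Pausing on the polynomial-selection step: a minimal sketch of the base-m construction (my own illustration; the example N = 2^64 + 1 is a number of the special Fermat form mentioned later in the talk, and real implementations search over many such polynomials).

```python
# Base-m polynomial selection: write N in base m, with m near the d-th root
# of N; the digits become the coefficients of a polynomial f with f(m) = N,
# i.e. f(m) = 0 (mod N).
def base_m_polynomial(N, d):
    m = int(N ** (1.0 / d))        # integer near the d-th root of N
    coeffs = []                    # coeffs[i] is the coefficient of x^i
    t = N
    while t:
        coeffs.append(t % m)
        t //= m
    return coeffs, m               # degree is d (or d-1, depending on rounding)

coeffs, m = base_m_polynomial(2 ** 64 + 1, 5)
assert sum(c * m ** i for i, c in enumerate(coeffs)) == 2 ** 64 + 1
print(coeffs, m)
```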
So we try to find pairs a, b such that a − b · m is smooth; then we can find some product of these that equals y². In parallel, we do the same in the algebraic number field: we try to find smooth numbers of the form a − b · alpha, and if they are smooth, then using the same linear algebra we can find a product that is equal to some beta squared. And if it is equal to some beta squared, which we know exists but haven't computed yet, then it's possible to compute it: there are sophisticated algorithms that compute the square root in this number field, not for every element, but only for the elements that are squares. So it's not like in a finite field, where you can always compute a square root; it's in some ring of algebraic integers, where you can compute the square root when it exists. And if these two things hold, then basically you apply our homomorphism to beta squared and you get some x squared; and on the other hand, if you apply the homomorphism to the product, then you get the product of the a_i − b_i · m, which we found so that they equal y squared. So eventually you have x² ≡ y² (mod N).

And what's nice here is that we search for a and b so that a − b · m is smooth and a − b · alpha is smooth as an algebraic number, and we can make both rather small. That's the main trick: if a and b are small enough, then, remembering that m is some root of N, a − b · m is small, and the algebraic side is also rather small. So even though we need both of them to be smooth, this joint probability is much higher than for the single z that we searched for before. And if both are smooth, and we find sufficiently many such a and b, then using the same linear algebra trick we find x² and y².

So that's basically the core of the number field sieve. To summarize, even if you don't understand everything, like I didn't the first ten times I read about it: the main property is that we work in two areas simultaneously, in the integers and in the algebraic integers, and we manage to have our numbers smooth in both places. And... Sorry, algebraic integers: should that be Q(alpha) or Z[alpha]? Well, these are rationals, but yes, I think we can actually view them as Z[alpha], because these elements are always integral in our case, I believe. The homomorphism works for Q(alpha), but I think all the further explanations are over Z[alpha]. Yeah, thank you for that.

So, basically, to find smooth elements of this set Z[alpha] is not trivial, because we cannot divide that easily in that ring. But what we can do is map each element to the integers using a simple formula. This is called the norm. And fortunately, the norm is multiplicative, so if the element is a square, then so is its norm. So what we basically do is find elements whose norm is B-smooth, and after combining sufficiently many of them, we find a product of elements whose norms multiply to a square; and because of that, there is a very high chance that the product of the algebraic elements is also a square. Oh yeah, I think I'll skip the further details. Yes, there are some explanations of why these numbers are small; basically, in current records they are about N to the 0.4.
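To make the two-sided test concrete, here is a sketch (my own illustration, assuming for simplicity that f is monic; otherwise a leading-coefficient factor appears). For a pair (a, b), the rational side a − b·m must be smooth over the integers, and the norm of a − b·alpha, which for monic f of degree d equals the sum of c_i · a^i · b^(d−i), must be smooth as well.

```python
# Ordinary trial-division smoothness test.
def is_smooth(z, B):
    z = abs(z)
    for p in range(2, B + 1):
        while z % p == 0:
            z //= p
    return z == 1

# Norm of a - b*alpha, where alpha is a root of the monic polynomial
# f(x) = sum_i coeffs[i] * x^i of degree d: norm = b^d * f(a/b).
def norm(coeffs, a, b):
    d = len(coeffs) - 1
    return sum(c * a ** i * b ** (d - i) for i, c in enumerate(coeffs))

# A pair is useful when both sides are B-smooth simultaneously.
def good_pair(coeffs, m, a, b, B):
    return is_smooth(a - b * m, B) and is_smooth(norm(coeffs, a, b), B)

# Sanity check with f(x) = x^2 - 2, alpha = sqrt(2):
# norm(3 - sqrt(2)) = (3 - sqrt(2)) * (3 + sqrt(2)) = 7.
print(norm([-2, 0, 1], 3, 1))   # 7
```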
And this size is actually sufficient to give us the increase in the length of the numbers we can factor. If we translate these ideas into concrete complexity, there is a, well, slightly weird formula. If n is the number of bits, then: we take the cubic root of the number of bits; we also take the logarithm of the number of bits, to the power of one over one point five, that is, to the two thirds; we multiply all this by 2.4 and subtract 16. This gives you the complexity in bits, in the sense that the square root of it is the smoothness bound. Practically, for recent factorizations, this factor is about 2^13, and the whole thing is about 2^60. 2^60 of what? That's of course a good question. But if we translate it into core-years, using the recent factorization records, then we see that each unit is about 50 CPU cycles: some kind of basic big-integer operation, which costs, I think, about 50 CPU cycles. So this is the complexity of the number field sieve.

But this is just the computational complexity, the number of operations; it doesn't yet tell us what the complexity of breaking it is in dollars, which is of course what really interests us. We know from recent factorization records how many core-years they spent. But these core-years are very weird core-years: these are not GPU-years, these are mainly cluster-years, on clusters of different sizes, in different countries, run by different people, and so on. So these numbers are just an aggregation of something, and not very precise. What's interesting is that the architecture here is, well, the ordinary PC to a large extent; just normal clusters. What we can do is try to translate the cost of factoring, these 900 core-years, into the electricity cost. How do we do this: we take the number of core-years, we calculate how many hours that is, how much energy one core consumes, and how much one kilowatt-hour costs. And eventually what I get is that the 795-bit number cost only, well, only $8,000 worth of electricity. $8,000 worth of electricity! So electricity is not the dominating cost here, but we know from mining that for very big computational efforts, electricity comes to dominate. And this actually means that from the electricity perspective, these numbers are very small: this one costs like $8,000 of electricity, and this one $30,000. From a practical point of view, that's not really big. And of course, we can expect that if a proper amount of funding were put into this, if someone decided to spend $1 million, then even on regular clusters they could easily factor 900 bits or something, and on proper hardware probably even more. Any questions so far?

Yeah, I mean, I think this is very interesting. Maybe we should go through the numbers one by one so that we can properly understand them. Yeah. Okay, so basically I take the number 900; this is 2^9.8. Then, how many kilowatt-hours does one watt give in a year? It's about eight, because we have 24 hours per day, 365 days, about 8,760 hours, so about 8.76 kWh per watt-year. And then I take a rough estimate that one core in a cluster consumes about 30 to 50 watts.
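Putting the talk's numbers together as a quick sanity check (900 core-years, 30 W per core, and the four-cents-per-kWh price mentioned just below):

```python
core_years = 900
watts_per_core = 30
hours_per_year = 24 * 365      # ~8,760 hours, i.e. ~8.76 kWh per watt-year
usd_per_kwh = 0.04             # cheap hydro electricity

kwh = core_years * watts_per_core * hours_per_year / 1000
print(kwh, kwh * usd_per_kwh)  # ~236,520 kWh, ~$9,500: the $8,000-ish figure
```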
This is of course not very precise, so it can go up and down a bit, like maybe a factor of two or something. And then I take that one kilowatt-hour, in places where electricity is cheap, for example near hydro plants (there are numbers published by some hydro plants in the US), can be sold for four cents each. And four cents is about 2^(-5), about one thirtieth of a dollar. Okay. If you multiply this together, you get this number. Yeah, that sounds very reasonable. Okay.

So what can we expect from the number field sieve in this regard? There are two main points here. One thing is about sieving: on one hand, we test all of these quadratic integers for smoothness, and there are methods that do this using O(B) memory and O(B²) time. O(B) memory because we need to store the resulting B numbers, but fortunately we don't have to store much more than that; using a bit of parallelism, and processing segment by segment, we can do this within that memory. And the memory access is not random: it can be parallelized, and memory is used predictably. That's why I think, and there are some theoretical designs for how circuits can be constructed for NFS, that this is feasible. The problem here is that the current algorithms, the current code for NFS, are rather sophisticated, because there are tons of improvements by factors of two or three in exactly how we iterate over these integers, because there are pairs of them, in what we store and what we do not store, things optimized for clusters and so on. So there are a ton of bells and whistles here that we might have to dig through to figure out how this can be properly implemented in hardware. But I'm pretty sure that with a certain amount of research it can be figured out how to do this in hardware.

An interesting comment here: I mean, we know that for the computation itself we can probably expect a factor of a thousand to a million reduction in electricity, right, from mining and so on. So, yes. So that means that what we should look at is actually not the power consumption of the CPU, because that will go almost to zero, but maybe just the power consumption of the memory and the memory controllers. Yes. And also another thing is that we have extrapolations from this number of core-years, but this is how much they spent, not the actual amount of computation: probably a big fraction of these core-years was spent on, I don't know, cache misses or memory accesses or whatever. So it does not mean that the number of useful operations is exactly this number; maybe, because of the x86 architecture, this number can be reduced significantly in terms of the number of operations actually being done. The complexity estimate here is based on the assumption that these core-years were spent entirely on computation. So if only a thousandth of them went to actual computation, then there should be a subtraction of another 10 bits here, and in the calculation of the electricity cost, and so on.

Yeah, and then the second very important thing is the linear algebra step: basically, we find a kernel element for a matrix with O(B) non-zero elements, and currently we spend O(B²) time on it,
and O(B) memory. But the thing is, there are suggestions for multi-core algorithms that, using the same amount of memory, can do this in much less time, maybe at the expense of more random memory access. Still, if we can spend less time on the linear step, this basically means that we can rebalance the two steps again. You see, if this step becomes much cheaper for some reason, then we can balance them by increasing the smoothness bound, and with a bigger smoothness bound we can break bigger integers even using the same algorithms, just by properly balancing things together. Do you see the point? Currently both steps take B², but if the matrix step takes not B² but B^1.5, then clearly we can rebalance them and use some different B, test more integers, with a different smoothness probability, so that the steps again spend the same time.

But where does the B^1.5 come from? B^1.5 comes from parallel algorithms that work with sparse matrices. There are suggestions by DJB, by Daniel Bernstein, who proposed parallel versions of the existing block Wiedemann and block Lanczos algorithms, which are themselves not very parallel. His suggestions are mostly theoretical ones, but I think they can be made practical. So it's very interesting. But how would parallelism help, though? Because if what we estimate is the amount of electricity, then parallelism by itself doesn't reduce that. Well, you see, currently we spend B² time and B memory, so the electricity spent on this is more like B³; but if we spend the same memory and only B^1.5 time, the electricity will not be B³ but B^2.5. So he basically suggests replacing some memory with cores, and because of that, reducing the overall computation time and thus the overall electricity consumption.

Wait, so you're assuming that the memory is always turned on? Is that the assumption? Because I think there are memories now where you basically only pay an electricity cost when you want to access them. Yes, yes, there are static and dynamic RAM, and it's of course a very good and very interesting question which one should be used here. I think it depends on the algorithm, because the kind that you can turn on and off is, I think, bigger, and by itself it requires more space on the chip and so on. I mean, SRAM is crazy expensive; I don't expect anyone would use it. Yeah, it's also expensive, but if you run it for a year, maybe it's worth it. I mean, my impression is you literally can't get gigabytes of SRAM at the moment; it's so crazy, it's several orders of magnitude in between. Several orders? I heard it differently, but I'm not an expert either. Okay. Well, I think that they're... But I mean, I don't think that's such a big deal, because even for DRAM, the cost when you don't use it is actually not that high. You have to refresh it, but that is very low: a good laptop doesn't really significantly drain its battery on standby, and it still refreshes its DRAM. Yeah, I remember it's like a fraction of a tenth or something of the active power you need. Oh, much higher than that, I think; I think we're talking much, much more than that. Okay. I'm totally not an expert here, but I would really love to talk with some experts about all this.
I mean, one thing to consider maybe is this memory from Intel called Optane Memory, and from what I understand it doesn't need any refreshing. So it's like persistent memory, but works a bit like RAM. Okay. It is, but it's still a lot slower than DRAM as well. Well, it's between SSDs and DRAM. Right, but it's not that much slower; it's like a small constant. More than 10, right? I thought it was less than 10, something like five, but I need to check, I guess. Okay, should we go on maybe? Sure.

And one question I had: you mentioned the extrapolation from the existing numbers. Yes. How much variance is there in the runtime? Like, can you get super lucky and just complete your algorithm really quickly, or not? I don't think so, because basically you have to collect quite many relations, quite many smooth integers here. Right. And so your only chance is to find some sub-matrix that is linearly dependent early, so that you don't actually need all the B relations, for all the B primes, to get a solution. It's like what I had in my example: there I needed only 3 and 5, so I didn't need 2. But the chance of this happening is not so high, I think, for real numbers. And actually, it's much more likely that you will get some parasitic solutions, pseudo-solutions, that you have to filter out; as far as I know, people even create matrices with many more rows than needed, because otherwise they get solutions that don't really work. Right.

Okay, so what else is interesting: I tried to make some estimates. In one case, I just extrapolated from the current CPU clusters and the factorization record, and in the other case, I tried to figure out what the maximum ASIC advantage could be. I got that if we're talking about cluster cores that spend like 30 watts, then the biggest advantage would be about a thousand, in electricity I mean. There are some more detailed calculations in my paper, in my report; I will send the link soon for the ones who haven't seen it. So, for a conservative scenario, I added some small advances, like a factor of 2^3 for algorithmic improvements, and something for Moore's law and so on; they don't differ significantly. And there was also a super-conservative scenario, where I analyzed the case where this B^1.5-time algorithm for matrices has been found.

And then I tried to estimate the security in bits. How I did that: I took an 80-bit AES key, and I tried to figure out, if you take the current Bitcoin hash power, and imagine the same ASICs running not SHA-256 but AES, how much it would cost in terms of electricity to break 80-bit AES. And I actually got that you would spend about $50,000 on this using current mining hardware, before the fall. 500,000? 500,000, yes, but that was at the Bitcoin price of a month ago. And then 128-bit security: it's reasonable to measure it as the cost of a 128-bit key recovery on ASICs, which is about 2^67 USD, which is of course beyond anyone's capabilities. And what is 256-bit security? I say that a system has 256-bit security if it remains 128-bit secure when you cut out half of the key, or, equivalently, when all algorithms get a quadratic speedup. So, in these circumstances, I analyzed the prospects of different modulus sizes, and here are the numbers.
So, you see that in this metric, if we stick to CPUs, to the clusters on which the most recent numbers were broken, then 80-bit security is a 950-bit modulus, and 128-bit security is 2850 bits. But if you consider the conservative scenario, where advances in ASICs bring a factor-1,000 reduction in the electricity cost, then 80 bits is not a 900-bit number but 1,500 bits, and for 128-bit security the modulus grows by another 1,000 bits. This is the RSA model, moduli that are products of two primes. And the super-conservative scenario, where the advanced hardware algorithm for matrices is found, basically has all the modulus sizes increased by another 20% or so. This table is also in the report.

I have two extra slides: one about the elliptic curve factoring method, and one about discrete logarithms. For discrete logarithms, the principle is quite similar. What you basically do, if you want to find the discrete logarithm of some element h of a prime field with respect to a base g: you generate many exponents e, and test whether g^e mod p is B-smooth, with the same sieving principle. Once you find sufficiently many such B-smooth numbers, you compose the same kind of linear system, and you basically get the discrete logarithm of every prime in the factor base, every prime smaller than B. And after you've done that, here is how to take the discrete logarithm of h itself: you take different... Wait, how do you do that? How do you compute the logarithms? Ah, well: if you find about B such relations, you have the same kind of matrix, but you compute your linear dependencies not modulo 2 but modulo p − 1, because if the prime field is mod p, all the exponents wrap around p − 1. If you have about B B-smooth powers g^e, for different e, then each relation says that e equals the sum of the exponents times the unknown logarithms of the factor-base primes, modulo p − 1; the unknowns are these logarithms, and you solve the linear system by Gaussian elimination. Okay, so you compute the logarithms of all your primes. Yeah, that's actually the last slide: right, you compute the logarithms of all the primes, and then you just keep randomizing h, multiplying in known powers of g, until you make it B-smooth, and then you read off the discrete logarithm (a small sketch of this relation collection appears below). Yes. Any questions on the entire talk?

I mean, as Danck had said, I imagine the electricity costs for ASICs will be somewhere between a factor of a thousand and a million, a thousand corresponding to the conservative scenario. Yeah, I mean, I think one interesting exercise, which we could do trivially now, would be to just look at the electricity consumption of memory. Yes, you can, but... I think that will give you a factor of 2^5 already, because I suspect it's about one watt or something like that, instead of 2^5 watts. For example, I've calculated myself that the proof-of-work hash for Zcash requires about 200 megabytes of RAM. And if you build ASICs for it, then... Sorry, 200 megabytes? 200 megabytes, yes. And basically what you do with the RAM is sorting: you just kind of radix-sort your RAM several times, that's how this works in general. And the advantage over a regular CPU, over a laptop, is about 1000. Okay, yeah.
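Going back to the discrete-log part for a moment, here is a sketch of the relation-collection phase of that index-calculus approach (my own illustration; the linear solve modulo p − 1 is omitted, and the toy field p = 107 with generator g = 2 is my own choice):

```python
from random import randrange

# Pick random exponents e and keep the ones where g^e mod p is B-smooth.
# Each relation says: e = sum_i exps[i] * log_g(p_i)  (mod p - 1),
# so with enough relations one can solve for the logs of the factor-base
# primes, as in the factoring case.
def collect_dlog_relations(g, p, factor_base, count):
    relations = []
    while len(relations) < count:
        e = randrange(1, p - 1)
        z = pow(g, e, p)
        exps = []
        for q in factor_base:
            k = 0
            while z % q == 0:
                z //= q
                k += 1
            exps.append(k)
        if z == 1:                      # g^e mod p was B-smooth
            relations.append((e, exps))
    return relations

print(collect_dlog_relations(2, 107, [2, 3, 5, 7], 5))
```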
I mean, another question is: does it all have to be in the same memory? Because the problem is, if you need high bandwidth, then you probably have quite a bit of power consumption; if you can lower your bandwidth requirements, that would be less. What if you could split it across 1000 different memories, each with much lower bandwidth requirements, and merge it all together? Like, for example, do all the sieving steps for different primes on different hardware and merge the results together at a later step. Is that possible? Yeah, that's possible. I think that's what's actually being done by all these factorization teams. Because in that case, you can probably lower your bandwidth requirements extremely and go with much lower power requirements for memory, and I think you will end up with something very low. Already in normal computers, memory isn't actively cooled, right? Which tells you it doesn't really consume that much power, and in a laptop much less. So I think, yeah, yes. I mean, in fact, a factor of 1000 seems easily possible there.

But even outside of the hardware question, there's the algorithmic question. It is very scary, you know, that maybe some unforeseen algorithm comes out with a slightly better asymptotic complexity, and that completely changes everything. It's a good point. I think, as far as I understand, in the factoring of integers not much progress has been made in the last 20 years. But in discrete logarithms there have been many different approaches, because there are different algorithms for prime fields, other types of algorithms for extensions of prime fields, which are important for pairing-based crypto, algorithms for fields of characteristic two, and so on. So there are several groups of these algorithms, and they all follow the number field sieve in one way or another. So I think that if they had found something really, really important, it would probably have translated to the number field sieve. But on the other hand, the algorithm itself is pretty complex; I think the reason why it works requires significant algebraic background. The code itself is not that difficult, but as far as I understand, the authors spent several years just making all the assumptions realistic. They basically had to add a few more components to the procedure before it started working for all integers. In the beginning it worked only for integers of very specific form, like 2^(2^n) + 1 or something like this, and to make it work for all integers, they had to overcome quite a few obstacles. So it's a really difficult problem, and I think no one can estimate whether there will be a breakthrough in the upcoming years regarding that.

When did this happen? When was this developed? From 1990 to 1993. And one of the prominent people in this research was Leonard Adleman. You may remember that in RSA there are Rivest, Shamir and Adleman, and some people say that Adleman did the minimal amount of work but is still included in the RSA list of authors. But I think, as far as I understand, this has been very well compensated by his contribution to factoring: he is one of the very few people who made this NFS really possible.

So I would also be really interested in your judgment as a cryptographer. We know that it seems like the progress has slowed there, and I guess there are two different competing theories.
One is: oh yeah, we are kind of at the edge, and there isn't that much more improvement possible. And the other theory is: the bar set by NFS is so high that it would be crazy for a young cryptographer now to go into factoring, because they would have to spend so much time just getting there, and it wouldn't be that likely that they find a new algorithm. I wonder what your judgment on that is. Yeah, I think it's actually not even a cryptographic problem, in fact. I think it's much more a mathematical problem, because there is almost no cryptography involved in it, and there are some deep mathematical things involved. Like, for example, when you try to read about these imaginary quadratic class groups, it's pretty much similar: you would have to have a really deep understanding of the algebra before you figure out how all this works. I think just to improve it is a really ambitious task, and we don't have that many mathematicians these days. As far as I understand, the number of working mathematicians is decreasing: many people go into more applied science, whereas this factoring work is actually quite theoretical. So my gut feeling is that we shouldn't expect real advances from the theoretical side, but we definitely should expect some advances on the practical side. When I see this sparse linear algebra and this sieving done on regular CPUs, I feel that it can be sped up much more. If some person who is fluent in the number field sieve talks to a person fluent in silicon design, I think within a few days they will quickly figure out how to make this much, much faster in hardware. That's my gut feeling. Okay, interesting.

But the other question is basically: you said you don't expect huge progress right now, but it sounds like that's not because you think there's nothing out there to be discovered, but because there's nobody who's going to work on it now. Yeah, yeah, something like this. I think people who are closer to the discrete logarithm research could give a better answer, because there have been some advances there, maybe not in this particular case but in some sister algorithms. Maybe they can tell a bit better about the unexplored areas and the potential to use them in NFS.

So one question I have is that you seem to have focused a lot on the cost of electricity. Is the claim that this totally dominates relative to the cost of hardware? Oh yeah, I think so. Well, consider the hardware itself: if we talk about numbers like, what was it, 3,000 core-years, then if we do it in a month, this means something like 30,000 cores, and 30,000 cores is what, three million dollars maybe, I think even cheaper. So right now, of course, that is more expensive than the cost of electricity. But when we talk about custom hardware: if we take a chip of some moderate size, I don't know, 10 by 10 centimeters or something, and imagine producing, I don't know, a million of these chips for factoring, I'm pretty sure that their cost will be smaller than the cost of the running time. If we talk about millions of dollars spent on this, then the design will be, I don't know, several million, and the chips themselves maybe also several million, but the amortized cost will be small, I'm pretty sure. And the electricity consumption can be really huge if we're talking about, I don't know, hundreds of millions of dollars for electricity. Mm, I don't know, I'm not totally convinced.
Because, I mean, the power consumption can really go down a very significant amount, let's say between a thousand and a million. But then the area, which is going to be your cost: it's less clear that it would go down significantly. Area will not go down, but if we can live with rather small chips, then we can put a significant number of them on the die, and then we can live with a reasonable failure probability. So if they're not like a meter by a meter, then I think their production cost can be amortized significantly. But I'm not an expert on that, of course, so that's just... I mean, I guess... I mean, I think asymptotically you are certainly right, if we assume that you're okay with factoring in, say, one year. But I would say, if you want to factor in one week or one month, and you only have one number to factor, then very likely the cost of hardware will actually dominate, because you're basically only running all your hardware for that amount of time. But... Also because you won't be able to design proper hardware within this timeframe. Right. Though if you know which number you want to factor, you can optimize your hardware for that specific number already. Significantly, I think. That's also a good point, yeah.

Wait, so that is our situation. So what is the speedup if you design your hardware specifically for this one number? Where do you get gains? That's a good question. I think, well, we have all these modular reductions, right? So I think the number is used in the modular reductions, and if the modular reduction circuits can be optimized for a particular modulus, then we can win there. In all these factorization records they of course knew which number they were breaking, but I think they couldn't really exploit it, because they just used regular CPUs. But on custom hardware, I think if you can hard-code the number into the modular reduction circuit, you should get some benefit. Okay, if it's only the modular multiplication, then I think you get roughly a 2x advantage for hard-coded versus programmable, both in terms of area and in terms of electricity consumption. Yeah, of course, a constant factor. Yeah. So wait, but I mean, the numbers you square are of a very specific form, like root N plus epsilon? For the quadratic sieve, yes. Oh, I see. But for the number field sieve? Yeah, I think there it's different; you're not just squaring there. I mean, anyway, I think that would not be my main worry.

I mean, another way to look at these security assumptions: we talk in terms of dollars, but maybe the bottleneck is actually just how much electricity the world can produce. So I'd be curious to know how much electricity the world is producing, and if 100% of the electricity were dedicated to this problem, how much time would it take to factor? And is that more than 10 years? Because if it's more than 10 years, then we're definitely safe. Unless there's some sort of... Well, but I mean, these are always estimates; you always need a margin of security. Yeah, yeah. A factor of 10, of course. Yeah, yeah, yeah. No, 10 is not enough for a margin of security, I mean, 10 years. A factor of 1000, I would say. And whatever factor you want, yes. Because the dollar amounts are always kind of difficult to reason about, I guess, but energy and time are more kind of physical. Okay, so 21 trillion kilowatt-hours.
Trillions, 10 to the 12th. So about 2 times 10^13 kilowatt-hours per year; that's the current worldwide electricity consumption. Anyway, we can do these estimates later as well, yeah. Okay, but anyway, all of this is pointing me towards the 3000 bits, and away from the 1000s. That's my gut feeling. Yeah. I mean, thank you, Dimitri, this was very, very well prepared and very understandable. Very good, at least for me; I think I understand it much, much better now. Yeah, same here. Yeah, thank you for your time, guys.