So, I'll begin by finishing something that I wanted to do last lecture but ran out of time for, which is to discuss the parity problem. Last time we were talking about how to use equidistribution estimates, mostly on primes, to prove results like bounded gaps between primes. So we now have a quite well understood sieve-theoretic mechanism, mostly based on the Selberg sieve, for converting estimates on primes in arithmetic progressions into bounded gaps between primes. By equidistribution estimates on primes I mean the following: you consider sums where you take all n between x and 2x, restrict to an arithmetic progression n ≡ a (mod d), and sum, for example, a shifted von Mangoldt function Λ(n + h_i), or the prime indicator function, or just the trivial function 1. For some of the more advanced sieve-theoretic arguments we also have to consider Dirichlet convolutions α * β of various functions α and β; just remember that the Dirichlet convolution is defined by summing α(a)β(b) over the factorizations ab = n. The key thing all these functions have in common is that they depend only on the prime factorization of n + h_i; they are arithmetic functions of the shifts. You sum them over an arithmetic progression in some range, say between x and 2x, and each of these sums is expected to have a main term, something like x times some function typically depending only on the modulus, plus an error term depending on a and d. By an equidistribution estimate for these functions we mean some bound on this error term, perhaps only on average. There are various types of bounds you could assume here, but the point is that you assume some upper bounds, possibly on average, and then you start taking linear combinations of these bounds. The whole art of sieve theory is to choose very cleverly chosen linear combinations of these sieve axioms. At some point you insert the completely trivial fact that a sum of non-negative numbers is non-negative. But basically, if you take linear combinations and keep using trivial non-negativity, then if you are lucky and chose your sieve weights optimally, you can eventually obtain results such as: there exists n between x and 2x such that at least two of the shifts n + h_i are prime. So you can use sieve theory to count all kinds of things, but one of the things you can do is prove bounded gaps: using the sieve axioms, some clever linear combination of them, and positivity, you show that there is at least one n in this range for which two of the shifts of n are prime, and this gives bounded gaps between primes. Okay, so I sketched how this argument worked last time. So one can hope that maybe you could use this to prove other things. We saw last time that if you assume the best possible equidistribution estimates, the generalized Elliott-Halberstam conjecture, then you can get the bound on gaps between primes down to 6, so you might hope to improve this even further.
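To fix notation, here is roughly the shape of the sums and axioms being described (a reconstruction of what was on the board; the placeholders g_f, E_f and the level x^θ are generic, and the exact normalizations are not essential):

```latex
% Generic shape of the equidistribution / sieve axioms (sketch)
\[
\sum_{\substack{x < n \le 2x \\ n \equiv a \ (d)}} f(n+h_i)
  \;=\; x\, g_f(d) \;+\; E_f(x; d, a),
\qquad f = \Lambda,\ 1,\ \text{or } \alpha * \beta,
\qquad (\alpha * \beta)(n) = \sum_{ab = n} \alpha(a)\,\beta(b).
\]
% An "equidistribution estimate" is then an upper bound on the error, typically on average in d, e.g.
\[
\sum_{d \le x^{\theta}} \sup_{(a,d)=1} |E_f(x; d, a)| \;\ll_A\; \frac{x}{(\log x)^A}.
\]
```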
So two is the optimum, but let's be slightly less ambitious and just shoot for four: can we use sieve theory to at least show that you can find a pair of primes at distance four apart or less? Unfortunately, the answer is no. A long time ago, I forget exactly when, the '80s or so, Selberg pointed out the parity obstruction, which certainly prevents proving the twin prime conjecture by sieve-theoretic means, but it also prevents proving a bound of four, and I want to explain why. Basically, in sieve theory nothing very interesting happens to the n parameter. You just take linear combinations of these axioms; you never mess with the n sum, and so everything here is linear in n. And because it is linear in n, you can replace the sum over n with a weighted sum over n, and everything still works. So if you had a sieve-theoretic argument that took equidistribution estimates and non-negativity and produced gaps of size at most four, then the same argument must also work with weights: given any non-negative weight function which has the same equidistribution estimates as the unweighted sum, that is, a weight for which you have the same main term and error terms obeying essentially the same upper bounds, maybe with a slightly worse constant, the argument goes through. Of course, weighted sums preserve trivial non-negativity: if a sequence is non-negative, then its weighted sum against a non-negative weight is also non-negative. So if you just inspect how sieve-theoretic arguments work, if you could prove this sort of estimate by sieve theory, then you could also prove the same estimate with a weight, which means in particular that you could find an n with a gap of 4 or less which also lies in the support of your weight. And this unfortunately leads to a conjectural contradiction, because I can cook up a weight which, we believe, satisfies all the same equidistribution estimates as the unweighted sum, but for which there is no n in the support of the weight satisfying this conclusion. So it is an indirect, proof-by-contradiction type argument: if you could prove H_1 ≤ 4 by sieve theory, then you could also prove a weighted version, but the weighted version is false, or at least we believe it is false, because we can construct the following counterexample weight.
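Schematically, the point is that any sieve argument using only axioms of the shape above, plus positivity, automatically also applies to a weighted version (this is a paraphrase of the argument, not a precise theorem):

```latex
% If a non-negative weight \nu obeys the same axioms as the unweighted sum
% (same main term, comparable error bounds),
\[
\sum_{\substack{x < n \le 2x \\ n \equiv a \ (d)}} \nu(n)\, f(n+h_i)
 \;=\; x\, g_f(d) \;+\; E^{\nu}_f(x; d, a),
\qquad \nu \ge 0,
\]
% then the same linear-combination-plus-positivity argument produces an n with the
% desired prime pattern lying in the support of \nu.
```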
(Hang on, I realize I didn't set this up quite the way I wanted when I prepared, so it will look a bit messier than it should; I want the signs the other way around, and there is a small obstruction mod 3 that I have to deal with.) What I need to do is pick a weight something like the following; this may not be exactly right, but it conveys the idea. The weight is built out of the Liouville function λ, where λ(n) is minus one to the number of prime factors of n counted with multiplicity. This is basically the Möbius function, except that it is still non-zero on non-squarefree numbers, and in particular it always squares to one. What this weight does, say when n ≡ 2 (mod 3) (it doesn't matter what it does when n ≡ 0 (mod 3)), is that it is non-zero only when n and n+2 have opposite parity of number of prime factors, and n+2 and n+6 also have the opposite parity. There is some choice of signs that makes this work, but the upshot is: you want n and n+2 to have opposite parity, and n+2 and n+6 to have opposite parity, which means that n and n+2 cannot both be prime, and n+2 and n+6 cannot both be prime. The pair n and n+6 can still both be prime, but those other pairs cannot, and similarly in the other residue class. So the way this weight function is designed, if you pick any n in its support, you cannot have two of the shifts at distance two or four both prime; you can only get a gap of six, not two or four, if I've set things up properly. But we believe this weight has the same equidistribution properties as the unweighted function 1, because if you expand it out, it equals 1 − λ(n)λ(n+2) − λ(n+2)λ(n+6) + λ(n)λ(n+6), using λ(n+2)² = 1, when n ≡ 2 (mod 3), and something similar when n ≡ 1 (mod 3). So the weight is 1 plus a linear combination of functions each involving two factors of λ at distinct shifts. When you plug that into one of these sums, say against the prime factorization of n+2, you get terms in which one factor involves the prime factorization of n+2 and the other involves the Liouville function of a different shift, say λ(n). And we have this heuristic, the Möbius randomness heuristic, or in this case I guess the Liouville randomness heuristic, that λ(n) fluctuates so randomly that it should exhibit a lot of cancellation against any expression that does not itself involve the prime factorization of n, but instead involves, say, the prime factorization of n+2. It is not a very precise heuristic, and certainly not a theorem, but we believe that any expression involving both λ(n) and something depending on the prime factorization of n+2, or n+4, or n+6, should have lots and lots of cancellation, so all of those cross terms should just get absorbed into the error term. In fact, if you are ambitious, we even believe in some sort of square-root cancellation, much better than the error terms we ever work with. So these extra terms should just be noise in the error term, and whatever equidistribution estimates we can hope to prove for the weight 1 should also hold, at least conjecturally, for this weight; therefore any sieve-theoretic argument giving a gap of four would also give a gap of four with this weight, but the weight is designed so that there are no gaps of four in its support.
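As a sanity check on the combinatorics (my own illustration, not part of the lecture, and ignoring the mod 3 bookkeeping), here is a small computation with the expanded weight w(n) = (1 − λ(n)λ(n+2))(1 − λ(n+2)λ(n+6)); it verifies on a range of n that whenever w(n) ≠ 0, the pairs (n, n+2) and (n+2, n+6) are never both prime, while (n, n+6) sometimes is.

```python
from sympy import factorint, isprime

def liouville(n):
    """Liouville function: (-1)^Omega(n), with Omega counting prime factors with multiplicity."""
    return (-1) ** sum(factorint(n).values())

def weight(n):
    """Expanded parity-obstruction weight (1 - l(n)l(n+2)) * (1 - l(n+2)l(n+6))."""
    l0, l2, l6 = liouville(n), liouville(n + 2), liouville(n + 6)
    return (1 - l0 * l2) * (1 - l2 * l6)

gap6_examples = []
for n in range(2, 20000):
    if weight(n) == 0:
        continue
    # In the support of the weight, n & n+2 can never both be prime, nor n+2 & n+6.
    assert not (isprime(n) and isprime(n + 2))
    assert not (isprime(n + 2) and isprime(n + 6))
    if isprime(n) and isprime(n + 6):
        gap6_examples.append(n)

print("no forbidden prime pairs found in the support")
print("examples with n, n+6 both prime in the support:", gap6_examples[:5])
```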
So, yeah, unfortunately I've somewhat butchered this argument; it is explained a lot better elsewhere than I just did. [Audience: maybe it should be called not the parity problem but the parity confusion.] Yeah, but that's my fault, okay. Anyway, you can make this argument quite a bit cleaner, and it does tell you that there is no way to purely use sieve theory to beat six. Now, you can do better if you use something beyond just sieve theory, in particular if you don't just have a sum over n but do something non-trivial with the n summation, like splitting it up into bilinear sums. Enrique, I don't know if you're going to talk about that? [Reply: it doesn't overlap with what I'm going to say next time.] Okay. So unfortunately none of these methods will get you the twin prime conjecture; something else is needed. All right, so that's the parity confusion, I guess. Okay, so now I want to talk about how we prove equidistribution estimates on primes. So, for example, never mind the shifts, let's just look at primes in arithmetic progressions and subtract off the main term. So we have the discrepancy of the primes in an arithmetic progression, the sum of the von Mangoldt function over the progression minus the expected average term, and we want to bound this expression. Let me just write this as a sum over n of Λ(n) f(n), where f is the truncation to n between x and 2x of the indicator of n ≡ a (mod d) minus 1/φ(d). So it is basically a weighted sum of this arithmetic function, a function that depends only on the prime factorization of n, against a very explicit non-arithmetic function, which depends only on the congruence class of n and on the size of n relative to x. So one can ask a much more general question: how can you control sums of this form, where one function is arithmetic, depending on the prime factorization of n, and the other is non-arithmetic, depending only on local properties of n, maybe an exponential phase, maybe a character, these sorts of things? How would you control sums like this in general? The standard answer is to read chapter 13 of Iwaniec and Kowalski. This is closely related to sums over primes: how to sum a non-arithmetic function over the primes is almost exactly the same sort of question. And so we have a lot of techniques for doing these things. Of course, we always have the crude triangle inequality that you can always apply: give up all the cancellation and take absolute values. This will always give you some bound, at least if f is finitely supported, but it is essentially never the bound we want; usually we want to save a large power of logarithm over the trivial bound, but it is the benchmark. In this case the trivial bound is usually something like x times a power of log x, and you want to gain an arbitrary power of logarithm over it. Okay, so there is a variety of techniques one can use. For example, you can decompose f into pieces, splitting f into good pieces.
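Before going on, here is the concrete sum being discussed, written out as transcribed (one can also restrict the subtracted term to (n, d) = 1; that choice only changes the sum by a negligible amount):

```latex
\[
\Delta(x; d, a) \;=\; \sum_{n} \Lambda(n)\, f(n),
\qquad
f(n) \;=\; \mathbf{1}_{x < n \le 2x}\left( \mathbf{1}_{n \equiv a \ (d)} \;-\; \frac{1}{\varphi(d)} \right),
\]
% i.e. the arithmetic factor \Lambda(n) depends only on the prime factorization of n,
% while f depends only on the congruence class of n mod d and on its size.
```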
Okay, and a very common technique is Fourier expansion. For this particular function f, you can expand either by multiplicative Fourier analysis, into a multiplicative Fourier expansion over the Dirichlet characters of modulus d, or you can do an additive Fourier expansion, which, if I am not mistaken, looks roughly like an average of additive characters e_d; it is a bit more complicated than that, with some correction terms if d is not prime, and one still needs to enforce the coprimality condition, but roughly speaking there is a Fourier expansion of this shape, where e_d of course denotes the additive character mod d. So you can split f into pieces, and hence split a sum like this into lots of pieces, either against multiplicative characters or against additive characters, and then you just use the triangle inequality and estimate each piece separately. Usually at that point you cannot exploit cancellation between the pieces any further, so you give that up with the triangle inequality and estimate these character sums instead. This step can of course be quite lossy: if your original expression is not very close to being a single character, you have a lot of terms in the expansion and you lose a lot here. But often you still have to do it, because there is a lot to be gained once you have genuine character structure to play with. Okay, so that is one thing you can do. You can also split α into pieces, and we will see how that works later. If you split things in such a way that everything is multiplicative, then you can try to use multiplicative number theory. So if you are staring, for example, at a sum of Λ(n) against a Dirichlet character, then multiplicative number theory steps in, and you can try to estimate this in terms of the Dirichlet series, that is, an L-function: the sum has an expression in terms of an L-function, and then you can use either classical complex-analytic methods or, nowadays, pretentious methods to control it in terms of what you know about that L-function. So this is one thing you can do if you have a multiplicative character. Unfortunately, because our understanding of the relevant L-functions is not as good as we would like, this tends to work well only when the character has very small conductor, polylogarithmic in x unconditionally; if you assume GRH or some other strong conjecture you can do better, but only then. So you have this tool, but it only works for multiplicative characters of rather small conductor. What else can you do? If you have not just one function f but lots of functions f, and you only care about what happens on average, then the large sieve inequality is your friend.
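For reference, here are the two standard expansions just alluded to, in the simplest setting where (a, d) = 1 (the correction terms for non-primitive characters and the coprimality condition are suppressed):

```latex
% Multiplicative (Dirichlet character) expansion of the progression indicator:
\[
\mathbf{1}_{n \equiv a \ (d)} \;=\; \frac{1}{\varphi(d)} \sum_{\chi \ (\mathrm{mod}\ d)} \overline{\chi}(a)\, \chi(n),
\qquad (n, d) = 1,
\]
% Additive (exponential) expansion, with e_d(t) = e^{2\pi i t / d}:
\[
\mathbf{1}_{n \equiv a \ (d)} \;=\; \frac{1}{d} \sum_{b \ (\mathrm{mod}\ d)} e_d\big(b(n-a)\big).
\]
```

And the classical multiplicative large sieve inequality invoked in the next step is (the starred sum is over primitive characters):

```latex
\[
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum^{*}_{\chi \ (\mathrm{mod}\ q)}
\Big| \sum_{M < n \le M+N} a_n\, \chi(n) \Big|^2
\;\le\; (N + Q^2) \sum_{M < n \le M+N} |a_n|^2 .
\]
```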
Okay, so if you are actually looking at a family of these sums, then for each single function f maybe all you can do is the trivial bound, but if you have a whole family of f_i which behave in some sense orthogonally to each other, then very often you can get a really good bound on the square sum; there will be some natural bound on the sum of squares, which means that you beat the trivial bound on average for most of the f_i, as long as you have good orthogonality. I'll just assert that you can do things like this. Of course, that doesn't help you if you only have one f; it only helps if you have many, so a lot of the game is somehow to disperse a bound for one f into a bound involving many f. Okay, so that's another thing you can do. What other tricks do we have available? Right, another one: if you can decompose your original arithmetic function, the one depending on the prime factorization, as a Dirichlet convolution of two other functions, or sometimes as a combination of finitely many Dirichlet convolutions, then that factorization exploits the arithmetic structure, and the linear sum becomes a bilinear sum. Funny how that works. So if you have a Dirichlet convolution structure, you can split a linear sum into a bilinear sum, and there are a lot more games you can play with a bilinear sum than with a linear sum. For instance, if you are summing against a multiplicative character, then things factor really nicely: you get a product structure, which is very nice. At the opposite extreme, if you instead have additive structure, for example if the non-arithmetic function happens to be a linear phase, then it is not multiplicative, but you can often exploit cancellation just in the bilinear kernel, even if you don't know very much about the weights. Okay, and then we have one final tool, which is actually trivial but incredibly important: the Cauchy-Schwarz inequality. Cauchy-Schwarz is a way in which cross-correlations, say correlations between β and something else, can be controlled by self-correlations; that is what the Cauchy-Schwarz inequality does, and this is great because self-correlations are often a lot easier to understand than cross-correlations. So if you have a sum like this bilinear sum, what you do is split it. In more advanced applications you may have a double or triple sum, and there are lots of ways in which you can split and apply Cauchy-Schwarz; knowing where to cut the sum before applying Cauchy-Schwarz is half the game. If you want really good equidistribution estimates, there is a whole art of where to cut, but let's not worry about that right now. If you have this sort of sum and you want to bound it, you can pull the β factor out. Everything I am saying here is completely standard; as I said, it is all in chapter 13 of Iwaniec and Kowalski. So if you have this sum, you split it by Cauchy-Schwarz.
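Schematically, the splitting just described is (a stylized version; in practice there are congruence conditions and specific ranges attached):

```latex
\[
\Big| \sum_{m \sim M} \sum_{l \sim L} \beta(m)\,\gamma(l)\, f(ml) \Big|
\;\le\;
\Big( \sum_{m \sim M} |\beta(m)|^2 \Big)^{1/2}
\Big( \sum_{m \sim M} \Big| \sum_{l \sim L} \gamma(l)\, f(ml) \Big|^2 \Big)^{1/2}.
\]
```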
And the point here is that β shows up in one factor and γ shows up in the other, but they have been separated: you no longer have a joint sum involving both β and γ. Now, the first factor is non-negative, so you just bound it by something trivial; once a sum is non-negative there is no cancellation to exploit, so trivial bounds tend to be pretty sharp. The other factor you have to expand out: typically you expand and get a sum over l and l' of γ(l) times the conjugate of γ(l'), and it is usually good to move the m sum inside, so you get something like a sum over m of f(ml) times the conjugate of f(ml'). So you have an expression like this, and then it is common to split into cases: there is the diagonal term, l = l', and there are the off-diagonal terms, and you usually treat these differently. In many applications there is actually a continuum, not just equal versus not equal; you can split according to how many prime factors l and l' have in common, or how many primes divide l − l', or something like that. But roughly speaking, as a first approximation, you have a diagonal term and an off-diagonal term. For the diagonal term there is no cancellation, because everything is now positive, so you are pretty much forced to use trivial estimates, which are not very good; but on the other hand the diagonal is usually pretty small. If l is being summed over some range, say L to 2L, the diagonal is only a proportion 1/L of the whole sum. So as long as l ranges over a non-trivial range, the diagonal contribution is still better than the trivial bound, and the bigger L is, the smaller the diagonal contribution. Conversely, for the off-diagonal terms, you can sometimes do further Cauchy-Schwarz steps, but often at this point you get a lot of mileage just from the triangle inequality: bound the γ factors trivially and estimate the inner sum over m. The point is that all the unknown arithmetic functions like β and γ are now gone; what remains is a function purely of the very explicit function f, maybe a character, or something else that you understand very well. So if you understand exponential sum estimates, possibly incomplete exponential sum estimates, in the relevant ranges, and you have good bounds exhibiting lots of cancellation, then you get a non-trivial gain here. Of course, you do lose something, because you are giving up potential cancellation by using the triangle inequality, so if L is very big this step may be inefficient. So you have these two cases: the diagonal case is better when L is big and worse when L is small, and the off-diagonal case is the reverse, worse when L is big and better when L is small. And so there is a whole art of picking where to apply Cauchy-Schwarz so that these two contributions are balanced to be about the same size. But I don't want to talk too much about that; it's rather technical. Okay, so this is the basic strategy, or at least the simplest model.
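Expanding the square in the second factor and swapping the order of summation gives the diagonal/off-diagonal decomposition just described:

```latex
\[
\sum_{m \sim M} \Big| \sum_{l \sim L} \gamma(l)\, f(ml) \Big|^2
 \;=\; \sum_{l, l' \sim L} \gamma(l)\, \overline{\gamma(l')} \sum_{m \sim M} f(ml)\, \overline{f(ml')}
 \;=\; \Big( \sum_{l = l'} + \sum_{l \ne l'} \Big)(\cdots).
\]
% The diagonal l = l' is estimated trivially (it contains only ~L of the ~L^2 pairs),
% while in the off-diagonal terms the weights \gamma are bounded trivially and the inner
% sum over m becomes an explicit (often incomplete) exponential sum in which the unknown
% arithmetic weights no longer appear.
```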
Often you have other parameters to average over, and so you often don't just want to control one exponential sum but rather an average of exponential sums, which often leads to multi-dimensional exponential sum estimates. To get the best results, to exploit the most cancellation, you often need to control exponential sums on average, and this often requires quite high-powered exponential sum tools. A typical object you might care about is something relatively simple like the Kloosterman sum, but then you might want to average it over the a and b parameters with some weights, or average over the q parameter, and hope to get some control on a more compound sum, maybe with an absolute value somewhere, and that is harder; you have to use non-trivial techniques to do things like this. If you are averaging over the a, b type parameters, then the typical techniques you need come from Deligne's proof of the Riemann hypothesis over function fields, and this is what Zhang, for example, ends up using, and what we do in Polymath. If you want to average over q, that is a whole different kettle of fish, with a whole different set of techniques using automorphic forms; I guess Philippe will talk about some of those. That we actually don't use in Zhang's work and its follow-ups. The automorphic techniques are deeper, but they have a curious restriction: they tend to only work when the residue class a is fixed, like 2 or something, whereas for our application a needs to be allowed to be very, very large, and then we can't seem to exploit any averaging in q by these automorphic methods. Instead we have to exploit averaging in the other parameters, using more of the Deligne-type methods. I guess that is one of the main differences between Zhang's arguments and the previous ones. Before Zhang there was lots of work, particularly by Fouvry, Bombieri, Friedlander and Iwaniec, on getting equidistribution estimates for sums like these with quite high moduli, but they were usually powered, if I am not mistaken, by automorphic-type estimates, which involved averaging and exploiting cancellation in q, and as such they were mostly restricted to fixed residue classes a. The new thing is that by using Deligne instead of Kloostermania, as it's called, you can now also handle varying a. All right. Okay, so those are the general techniques, and the whole art is how to put them together. There is also one other technique, which maybe I won't write down, which is just Fubini's theorem: you are always interchanging sums, always trying to tease out the cancellation. So these are the methods we have. It would be nice to have more methods, actually; these are great, but they can't do everything. In particular, with these methods we can't prove square-root cancellation in almost anything; you can usually prove some cancellation, beating the trivial bound by a little bit, but not by the amount you really want. Expanding f is often a bit lossy, and Cauchy-Schwarz is often a bit lossy; we don't have very many methods that are not lossy, which is a bit of a problem, but we can do something. Okay, so those are the basic methods. Now, some examples of how they work. The classical example is the Bombieri-Vinogradov theorem.
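For concreteness, the Kloosterman sum mentioned here, and the kind of pointwise bound that Weil/Deligne-type methods give (the Weil bound; what the more advanced arguments actually need are averaged versions over a, b or over q), are:

```latex
% Kloosterman sum, with \bar{x} the inverse of x mod q and e_q(t) = e^{2\pi i t/q}:
\[
S(a, b; q) \;=\; \sum_{\substack{x \ (\mathrm{mod}\ q) \\ (x, q) = 1}} e_q\big(a x + b \bar{x}\big),
\]
% Weil's bound (square-root cancellation in the complete sum), with \tau the divisor function:
\[
|S(a, b; q)| \;\le\; \tau(q)\, (a, b, q)^{1/2}\, q^{1/2}.
\]
```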
Okay, so the Bombieri-Vinogradov theorem. The way it works is that you look at the discrepancy, call it D, of, for example, the von Mangoldt function in the progression, for n between x and 2x. You take these discrepancies, take the worst residue class a for each d, and then sum over all d up to some level. The claim is that you beat the trivial bound, which is x times a power of log x, by any power of the logarithm, as long as the level exponent θ is less than one half. You can do slightly sharper than this, but this is one version of the Bombieri-Vinogradov theorem. So this is an equidistribution estimate up to level one half, and it is very general: you can replace the von Mangoldt function by essentially any Dirichlet convolution of two functions with reasonable bounds, where, say, each of the factors is at most logarithmic in size times some power of the divisor function; pretty much all the interesting functions we care about obey this sort of size bound. This particular generalization, by the way, is due to Motohashi. Okay, so how do you prove something like this? You just have to use the ingredients above. Rather than actually proving it, I'll just tell you the ingredients that go into it, and you have to check that the numerology works out. The strategy is that you first decompose into multiplicative characters. The great thing about using the triangle inequality here is that it kills the supremum over a: when you do the multiplicative expansion, a only enters through the coefficient χ̄(a), and once you take absolute values it just goes away. So this is a very efficient way of killing the sup. Roughly speaking, you end up with character sums, with some normalization that I won't get exactly right. There is a technical reduction here: these characters might not be primitive, so you do some standard manoeuvres to reduce to the primitive case; let me just ignore that step, it is fairly standard. Then what you do is factor the arithmetic function into a convolution; for example, the von Mangoldt function can be split up as the Möbius function convolved with the logarithm. And then you use multiplicativity, which I'm sure I wrote down somewhere, but it's gone from the board now, to split up this character sum. Oh, hang on, before you do this there are some cases, depending on where α and β live. You can chop things up by pigeonholing, giving up a logarithm or so, and assume that α lives in some dyadic range, say N to 2N, and β lives in some range M to 2M, where N and M basically need to multiply up to x, otherwise you don't get any contribution to the sum. Now there are some degenerate cases where the support of α or of β is very small, and in those cases the bilinear sum is degenerate and you don't get enough cancellation there. So there are some cases in which the support of α or β is really, really small.
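The statement of the theorem being paraphrased here, in one common formulation, is:

```latex
% Bombieri-Vinogradov: for any A > 0 there is B = B(A) such that
\[
\sum_{d \le x^{1/2} (\log x)^{-B}} \;\max_{(a, d) = 1}\;
\Big| \sum_{\substack{n \le x \\ n \equiv a \ (d)}} \Lambda(n) \;-\; \frac{x}{\varphi(d)} \Big|
\;\ll_A\; \frac{x}{(\log x)^A}.
\]
```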
Right, so in fact, for that reason, you don't do a completely naive splitting of this expression, otherwise you might get some degenerate convolutions. You can always write any function as its convolution with the Dirichlet delta function, and that is a useless factorization; it doesn't give you any cancellation. So you need a smarter convolution structure than that, and you have to split the sum in such a way that N and M are both a bit bigger than one. All right, so you split it into expressions like this. If your character has a very small conductor, you can use multiplicative number theory, in fact the Siegel-Walfisz theorem, and you get very good bounds there. So you can assume the conductor is big, and then you have many, many different characters, because there are many large conductors. And then what you do is something like Cauchy-Schwarz: you control the sum by an expression in the α's and a similar expression in the β's. And then you use the large sieve inequality, which I also believe I wrote down, but it seems to have gone as well; there is a large sieve inequality which allows you to control expressions like this by the square sum. And the key point is that as long as your characters have conductor less than x to the one half, any two of them are still pretty much orthogonal to each other on the interval from x to 2x, because the product of two such characters still has conductor less than x, and so its period is at most the width of your interval. So you have enough orthogonality to extract the cancellation, and if you crunch through all the calculations, you eventually get the Bombieri-Vinogradov theorem. But because the argument relies so heavily on this orthogonality, the moment you go above level x to the one half it completely breaks down. Okay, so that is the classical benchmark equidistribution result, which gets up to x to the one half. All the more recent ways to improve equidistribution basically do something quite different; they still use the same box of tools, but they don't apply them to a decomposition into multiplicative characters. They rely much more on additive Fourier expansion and on Cauchy-Schwarz. I mean, Cauchy-Schwarz is still present even here: the way you prove the large sieve inequality is ultimately Cauchy-Schwarz, though I won't prove it here. Okay, so what I'll talk about next is how you decompose. As I said, you have something like the von Mangoldt function and you want to split it into pieces. There is a standard splitting: the von Mangoldt function is the Möbius function convolved with the logarithm, where the logarithm factor is L(n) = log n. And of course, as I said, you can split any function trivially using the Dirichlet delta function, δ(n) = 1 when n = 1 and 0 otherwise. And just to complete the picture, the Möbius function inverts the constant function one: μ * 1 = δ, which is the Möbius inversion formula. Now this splitting Λ = μ * L is not quite the splitting you want to use, because it contains degenerate pieces: both of these factors can range all the way from 1 to x, and if one of them is supported too close to 1 you get a degenerate convolution with not enough cancellation to actually do anything. So you actually want to use splittings in which the summations are non-degenerate; you want to split up the von Mangoldt function into non-degenerate pieces, and these identities are the raw material for that.
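As a quick illustration of these identities (my own check, not part of the lecture), here is a short numerical verification of μ * 1 = δ and Λ = μ * log using sympy's factorint:

```python
from math import log, isclose
from sympy import factorint

def mobius(n):
    """Moebius function computed from the prime factorization."""
    f = factorint(n)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)

def von_mangoldt(n):
    """Lambda(n) = log p if n is a prime power p^k, else 0."""
    f = factorint(n)
    return log(list(f.keys())[0]) if len(f) == 1 else 0.0

def dirichlet_convolve(f, g, n):
    """(f * g)(n) = sum over divisors d | n of f(d) g(n/d)."""
    return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

for n in range(1, 200):
    # Moebius inversion: (mu * 1)(n) = delta(n), i.e. 1 if n == 1 and 0 otherwise.
    assert dirichlet_convolve(mobius, lambda m: 1, n) == (1 if n == 1 else 0)
    # Lambda = mu * log: (mu * log)(n) agrees with Lambda(n).
    assert isclose(dirichlet_convolve(mobius, lambda m: log(m), n), von_mangoldt(n), abs_tol=1e-9)

print("mu * 1 = delta and Lambda = mu * log verified for n < 200")
```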
What you want is maybe a convolution of two pieces, or of several pieces, which is somewhat non-degenerate, and there are two ways to be non-degenerate. One way is to have a convolution whose factors are balanced: both factors are supported on fairly large numbers, say one supported between N and 2N and the other between M and 2M, with N times M comparable to x, but with N bigger than some power of x; a typical threshold might be x to the two fifths or so, with M bounded above by the complementary power, x to the three fifths. So one good kind of term is a convolution in which both factors are of medium size; then you can get a lot of cancellation, and if your decomposition has many factors, you may simply group them into two factors, both of medium size. These you might call type II sums. Oh, okay, that is going to be really confusing; let me not call them type II sums just yet. Okay, so that is one kind of term that comes up. The other kind of good term is this: having an imbalance, one factor small and one factor big, is bad, unless the big factor is also very smooth. So if you have a convolution where one factor is supported on very small numbers and the other on large numbers, ordinarily that is bad, but if the large factor is very nice, like a smooth function of m/M, then it is fine. It could be the constant function 1 truncated in some nice way, or the logarithm, which is also a very smooth function, truncated in some nice way. If you have a very smooth function here, this is also a very good term to have in your expansion, because when you are trying to control the resulting sum, you can often just plug the smooth factor into your exponential sum and compute it really precisely, maybe by Poisson summation or something like that, and so you can control this term very well. If the large factor were rough, you couldn't do very much: you would be forced to take absolute values, which kills the cancellation. But if it is smooth, you have a good chance of understanding the sum, and often you can get a lot of mileage and really good bounds. So these sorts of terms are also pretty good. These are classically called type I sums, but for reasons which will become clear I don't want to use that notation. So you want to split up the von Mangoldt function into pieces which are either rough but balanced, or imbalanced but smooth. These are the kinds of decompositions you want, and this we understand how to do. One of the classical, cheapest ways to do it is Vaughan's identity, for example, and this already works for many applications, though for the strongest results it is often not quite good enough. The way it works is that you pick two thresholds, U and V, somewhere between 1 and x; typical values might be x to the one third for both, that is very typical. Then you split the von Mangoldt function into a big piece and a small piece: Λ(n) restricted to n bigger than V and Λ(n) restricted to n at most V. (Actually, since we work with n between x and 2x, the big piece agrees with Λ there and the small piece, as a standalone term, is just zero on that range.) And you also split the Möbius function into two pieces, at the threshold U.
Okay, and Vaughan's identity: if you just take these three facts about Λ, δ and μ and play with them a little, you find after a brief calculation that you can split the von Mangoldt function into two expressions plus a third expression. There is a certain amount of basic algebra in splitting it into these three pieces; I won't do it here, it is not all that difficult or, actually, all that illuminating. But if you believe this splitting, you can see that these terms are good. The first two are what are typically called type I terms, because if U and V are small, each is a small function convolved with a smooth one. And the third term is a good term of the other type: it is large convolved with large, once you group the last two factors into a single factor, and that is what is called a type II sum. So this is already a very useful splitting. One of the classical applications of this is to bound prime exponential sums, sums of Λ(n) against a linear phase. It is not quite the most efficient splitting out there, so nowadays we use other splittings when we want really sharp exponents. Let me first mention a decomposition that we don't use directly, but which we do morally use, which is Linnik's identity. The easiest way to describe Linnik's identity is to temporarily introduce the zeta function. If you expand log ζ as a power series, formally just the power series of the logarithm, you get an expansion which is valid, at the level of formal Dirichlet series, for Re(s) very large. If you start with this identity and equate coefficients, what you eventually find is that you can decompose the von Mangoldt function into an infinite series. The building block is the function 1' = 1 − δ, which equals one except at n = 1, where it is zero; it is the "large values" of the constant function one. The first term is essentially the logarithm; the next is roughly the divisor function times L, over two; then a triple convolution, which is like the second divisor function times L, over three; and so on and so forth. Recall the divisor function counts solutions of ab = n, the second divisor function counts solutions of abc = n, and so forth. So you have this splitting, and in principle all these terms are really good: the smooth function 1 appears everywhere, so each term is either of the smooth type or balanced, with no genuinely degenerate pieces. In some sense this is the optimal decomposition, and morally it reduces the study of the equidistribution of the von Mangoldt function to the equidistribution of 1 and τ and τ_2 and τ_3 and so forth. Unfortunately, the series is infinite, and because of that it is not convenient to use directly, so we never actually use Linnik's identity as it stands; I don't actually know of a direct use of this identity, other than conceptually. Well, the number of terms that actually matter is at most about log x, but typically we can only control boundedly many of these terms. So as far as I know, there is no direct use of this identity.
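For the record, here is one standard way of writing the two identities just discussed; the grouping on the board may differ in minor details (e.g. exactly where the thresholds sit), but this is the usual shape. Write μ_{≤U} = μ·1_{n≤U}, Λ_{≤V} = Λ·1_{n≤V}, and so on:

```latex
% Vaughan's identity (valid for all n):
\[
\Lambda(n) \;=\; \big(\mu_{\le U} * L\big)(n)
 \;-\; \big(\mu_{\le U} * \Lambda_{\le V} * 1\big)(n)
 \;+\; \big(\mu_{> U} * \Lambda_{> V} * 1\big)(n)
 \;+\; \Lambda_{\le V}(n),
\]
% and the last term vanishes for x < n <= 2x once V < x.
%
% Linnik's identity, in one standard form: with 1'(n) = 1 - \delta(n) (supported on n >= 2),
\[
\frac{\Lambda(n)}{\log n} \;=\; \sum_{k \ge 1} \frac{(-1)^{k-1}}{k}\, \big(1'\big)^{*k}(n),
\qquad n \ge 2,
\]
% where (1')^{*k}(n) counts ordered factorizations of n into k factors, each at least 2.
```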
[Audience: an interjection, partly inaudible, pointing out that Linnik's identity is in fact used, at least implicitly, in the proof of a theorem in the book.] Oh, okay, I'll take that back then. I know the book, but I've never actually read past the first chapter, actually. Maybe I should. All right, so there are uses; but for what we do, we don't actually use it. Okay, another question. [Audience: you mentioned the lack of efficiency of Vaughan's identity; are you referring to the fact that it wastes logs, or something else?] No, for our applications we can always afford to waste logs, because our objective is to gain a huge power of log. [Audience: Can I interrupt? Since I won't cover this at all, it's important for me to say something. The number of terms here, as you say, is about log x, and that's a killer; there are too many of them. What Linnik did was apply the identity only for n almost prime; then the number of terms is very small, and you pass from the almost primes to the primes by the sieve, and it worked. And then Heath-Brown made it much more practical and friendly for applications, based on the same ideas.] That's a great point; yes, that's exactly where I'm going next. You mean there was some sort of pre-sieving involved? Right, that's a good point: if you apply this, the length of the series is basically the number of prime factors of n, so if you sieve out all the small prime factors first, then I agree, this does become a useful identity. Okay, and to answer the earlier question: the inefficiency in Vaughan's identity is that, no matter how you choose the thresholds U and V, you have to create terms which are imbalanced, imbalanced in the sense that the scale of one factor can be as low as x to the one third while the other can be as large as x to the two thirds. The best type II sums you can have are those where both factors are around x to the one half, and you don't want to move too far away from one half: the closer you get to the degenerate case, one factor of size about one, which you can't do anything with, the less cancellation you get, so you want to minimize the distance from one half. And it turns out that no matter how you arrange Vaughan's identity, there is always at least one term which is at distance one sixth from balanced, and that limits what you can do. But if you use Heath-Brown's identity, which I'm going to cover next, then you can cut this down to one tenth, which is actually very important for Zhang's work. And you can cut it down further at the cost of adding some extra terms: instead of just type I and type II sums there are now also type III sums. Except that Zhang's type I and type II are not the same as Vaughan's type I and type II, which is extremely confusing; and type III, better not to even ask. This is why I have not written down what is type I and what is type II, because it is going to change very shortly. [Audience: we should give more descriptive names; "type I", "type II" doesn't convey any information if you don't already know it.]
[Audience: I think people should just say linear, bilinear, trilinear.] There you go. Okay, linear; that is actually a good suggestion. So type I sums are the linear sums, except that there is nothing particularly linear about the linear term; it is just that it is not bilinear or trilinear. More accurately: type I is linear forms with smooth coefficients, type II is bilinear forms with rough coefficients, and type III is trilinear forms with smooth coefficients. Okay, so maybe type I, type II is not such a bad naming after all. All right. Okay, so now the identity that everyone uses, and which in some sense is actually optimal with respect to this balance issue, you can't do much better, is Heath-Brown's identity. It involves a certain parameter k, which you can pick; it doesn't really matter much what k is as long as it is big enough, and k = 10 is usually good enough for applications. Let me derive it first and then state the identity. You split the Möbius function into two pieces: a small piece, where the argument is at most (2x) to the power one over k, and a big piece. When k is big, the small piece consists of very small factors, and the other piece is the big factors. And the reason you do this is the following observation. The von Mangoldt function is the Möbius function convolved with the logarithm, and since μ convolved with 1 is the delta function, you can keep inserting copies of μ * 1: you can expand Λ into a huge convolution involving k factors of the Möbius function, k minus 1 factors of the constant function 1, and one factor of the logarithm. This is a big convolution, and it is not in good form as it stands, because it contains all kinds of terms that are rough and imbalanced and so forth. But observe that if all the μ factors here were restricted to their big parts, then this expression would vanish identically on the interval we care about, x to 2x, because the big part of μ is supported only on numbers bigger than (2x) to the one over k, so convolving k copies gives something supported only on numbers bigger than 2x, and convolving with more factors does not change that. So that all-big term is zero, which is what we want. Now what you do is rewrite the big part of μ as μ minus the small part, stick this in, and expand it out. You get one term which is exactly the von Mangoldt function, and a lot of other terms, and after a short calculation you arrive at Heath-Brown's identity: the von Mangoldt function splits into a linear combination of terms built from the small part of μ convolved with copies of 1 and with L, with a few more factors at each stage, in total about k terms. The first term looks a lot like the small part of μ convolved with L. (Hang on, I now only have two boards; where did Linnik's identity go, did I erase it? Yes, already erased; that board work was not done properly again.) Well, Linnik's identity had terms looking something like L, then 1 convolved with L, then 1 convolved with 1 convolved with L, and so forth. So this is a similar identity; there are some extra factors, the small Möbius pieces, but they are small, so at least initially we almost don't notice them; they become more of a nuisance at the end. But basically, when you do this splitting, it is like Linnik's identity, but with only about k terms.
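The identity just derived can be written as follows (the standard statement, in the conventions above, with z = (2x)^{1/k} and μ_{≤z} = μ·1_{n≤z}):

```latex
% Heath-Brown's identity: for 1 <= n <= 2x and any k >= 1,
\[
\Lambda(n) \;=\; \sum_{j=1}^{k} (-1)^{j-1} \binom{k}{j}
  \big( \underbrace{\mu_{\le z} * \cdots * \mu_{\le z}}_{j\ \text{factors}}
   \,*\, \underbrace{1 * \cdots * 1}_{j-1\ \text{factors}} \,*\, L \big)(n),
\qquad z = (2x)^{1/k};
\]
% equivalently, expanding the convolutions,
\[
\Lambda(n) \;=\; \sum_{j=1}^{k} (-1)^{j-1} \binom{k}{j}
 \sum_{\substack{m_1 \cdots m_j\, n_1 \cdots n_j = n \\ m_1, \ldots, m_j \le z}}
 \mu(m_1) \cdots \mu(m_j)\, \log n_j .
\]
```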
And so the point is that you get lots of convolutions, but all the big factors are smooth, and so you have a really good splitting into lots of good terms. If you use this identity with k bigger than 5, I think, what happens is that you can split Λ into a finite number of pieces of three types; there is a certain combinatorial argument here that I will skip. All right, so these are terrible names, but these are the names we are stuck with. One type are the linear smooth sums, which I am going to call type 0, even though they are usually called type I: these are sums of the form α convolved with ψ, where ψ is smooth and supported at a big scale, bigger than x to the three fifths as it turns out. So you have a small rough function convolved with a really smooth one; these are really good terms, as it turns out, and the first few terms of the identity basically give you something of this shape. Then there are the bilinear sums: things like α convolved with β, where α and β are now rough but supported between x to the two fifths and x to the three fifths, so within one tenth of the square root of x, as opposed to Vaughan's identity, which only gets you within one sixth. And then there are the type III terms, which look like convolutions of three smooth functions, supported at scales N1, N2 and N3 which are reasonably big; the way I have set things up, somewhere between x to the one fifth and x to the three fifths, with some other technical conditions that I won't state exactly, and with their product bigger than x to the three fifths. So they are not too degenerate; they are somewhat small, but there are three of them. And it turns out, through some routine elementary combinatorics, that if you split all these terms up into dyadic pieces, every term you get is of one of these three types. And these exponents are essentially optimal for this setup: if you want to get even narrower than one tenth, you can actually get within one fourteenth of x to the one half, but then you have to introduce not just trilinear convolutions but also type IV and type V sums, convolutions of four or five functions, and we don't understand the fourth and fifth divisor functions at all. So this is the limit of the current technology. So it turns out that using Heath-Brown's identity, you can split the von Mangoldt function into these three types of pieces; finitely many, in fact polylogarithmically many, pieces, but because we are going to gain a huge power of log, it doesn't matter that we lose some logs here. And then in the last lecture I'll tell you how to prove equidistribution estimates for each of these individual types of sums; then you just use the triangle inequality, and this gives equidistribution estimates for the von Mangoldt function. Okay.