What I'm going to talk about today in this last lecture is applications of the material that I've been discussing to the primes. So I'm going to start by stating the results that I want to sketch some proofs of. And in stating any results about primes, it's convenient to introduce the von Mangoldt function. And that's defined by Lambda of n, which is going to be log p if n is a power of a prime p, and zero otherwise. And really, the reason you introduce this is to avoid having huge numbers of logs involved in your statements. So the prime number theorem is equivalent to the statement that the average value of the von Mangoldt function is 1, as n tends to infinity. So the applications to primes that I want to talk about are about counting how often linear forms take prime values. So I'm interested in finding prime values of linear forms. So let Psi equals psi 1 up to psi t be a collection of t affine linear forms. And they'll be in d variables, so they map Z to the d to Z. So for example, I could have t equals 4, d equals 2, and the forms defined as follows: psi 1 of n is n1, psi 2 of n is n1 plus n2, psi 3 of n is n1 plus 2 n2, and so on. And this is a familiar system, a system we've been talking about quite a lot: this is talking about four-term arithmetic progressions. So one thing I'm going to talk about is how to count the number of four-term arithmetic progressions of primes. But this is a lot more general. So to give another example of something that is covered by this, I could take t equals 3, and d equals 2, and the forms as follows. So psi 1 is n1, psi 2 is n2, and psi 3 is some constant, big N, minus n1 minus n2. And then if I can count how often those three forms are all prime, what I've done is represent N as the sum of three primes, N equals p1 plus p2 plus p3. So that's what's called the ternary Goldbach problem.
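To make the definition concrete, here is a minimal sketch (the code and names are my own, not from the lecture) of the von Mangoldt function, together with a numerical check that its average value is close to 1, which is the statement equivalent to the prime number theorem.

```python
# A minimal sketch of the von Mangoldt function Lambda(n): log p if n is a
# power of a prime p, and 0 otherwise. Averaging it up to N should give a
# value near 1, by the prime number theorem.
import math

def von_mangoldt(n):
    """Return log p if n is a power of a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    # find the smallest prime factor p of n
    p = 2
    while p * p <= n and n % p != 0:
        p += 1
    if p * p > n:
        p = n  # n itself is prime
    # n is a prime power exactly when dividing out p repeatedly leaves 1
    m = n
    while m % p == 0:
        m //= p
    return math.log(p) if m == 1 else 0.0

N = 100_000
avg = sum(von_mangoldt(n) for n in range(1, N + 1)) / N
print(avg)  # close to 1
```

The log weight is exactly what makes the average come out to a clean constant: the primes thin out like 1 over log, and the weight log p compensates.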
Actually I could state the binary Goldbach problem in the same formalism, of course, but that's not going to be covered by my main theorem, as you probably know. So what is a general theorem about this? So this is a theorem of myself and Tao, with a crucial input by myself, Tao, and Ziegler, that being the inverse theorem for the Gowers norms, which I've talked about. So the theorem is: take any collection of linear forms, and suppose that this collection Psi has finite complexity. I'll define the complexity precisely in just a moment, but what finite complexity means is essentially that no two of the homogeneous parts of the forms — I'll call those psi i dot; the homogeneous part is just what you get by ignoring the constant term — are multiples of one another. And then the conclusion is that you can count how often those forms are all prime. Well, I'm going to count over a certain region. So suppose that K is a subset of a box of width N. I'm going to say it's convex because that's what we did in the paper where we worked these things out, but really any nice set can be decomposed into convex sets. So just suppose K is a nice set that you're going to count over. Then if you sum over the lattice points of K, and you count how often those forms are all prime in the weighted manner using the von Mangoldt function, then, well, there's what we call a local-global principle. So asymptotically, this can be computed as a product of certain local factors, beta sub infinity times the product over primes p of beta sub p, plus an error that's expected to be small, so little o of N to the d. So I'll tell you what the local factors are in just a second. And the rate of decay of this error will certainly depend on d and on t. And it also depends on the size of the coefficients of the psi — the coefficients of the homogeneous parts.
Now I'm allowing affine linear forms, and I'm allowing the constant term actually to be rather big. So it also depends on one over N times the constant terms, but other than that, it's uniform. So, for example, the error does decay for both of the systems that I've written up on this board. So here, beta sub infinity, well, that's really just how many lattice points I'm counting over. So it's the volume of K — well, it's not quite the volume of K, because primes are always supposed to be positive numbers. So it's the volume of that part of K on which all of the psi i's are non-negative. So it's essentially just the number of points you're counting over. And the beta sub p's reflect local behavior of the primes. So beta sub p, well, it's kind of a local analogue at p of this same average. It's going to be the average over m now ranging over Z mod pZ to the d of the same expression, but with local variants of the von Mangoldt function. So what should a local variant of the von Mangoldt function be? It should be a function with average value one, which is somehow detecting the local behavior of the primes mod p. So what are the primes mod p? Well, apart from p itself, which is just one prime among infinitely many, the primes are equidistributed in the residue classes other than zero. And so that makes this definition quite sensible: the local von Mangoldt function of x is going to equal p over p minus one if x is not divisible by p, and zero if x is divisible by p. So you can think of this as a local-global principle for the primes in linear forms. So in any given case, with a little bit of work, you can go and compute exactly what this formula gives you. And I'm going to tell you the outcome in one case. So, exercise — this exercise will involve, well, substituting into the above, but then you have to remove the weights that I've given to the primes via the von Mangoldt function. So plus removing the log weight, which is a simple matter.
So it's a formula for the number of k-term progressions. So the number of k-term progressions of primes less than N: well, it's asymptotically the following constant. So it's 1 over 2 times k minus 1, times the product over p of beta sub p, times N squared over log to the k of N, where beta sub p is given by the following formula. So beta sub p is 1 over p times p over p minus 1, to the k minus 1, if p is less than or equal to k; and if p is greater than k, it's 1 minus k minus 1 over p, times p over p minus 1 to the k minus 1. So it's quite complicated, actually. Of course, if k is 4 or something, you could work out an explicit numerical value for this. So that's the theorem whose proof I want to talk about. Are there any questions? I imagine not. So mostly, so far in these lectures, I've talked about just some very specific examples, for example four-term progressions, and I haven't talked about general systems of linear forms at all. So maybe I'll just say a couple more words about general systems of linear forms. The theory of Gowers norms, et cetera, applies in the general context — well, in the general context that I've just described, so for finite-complexity systems of affine linear forms. And so in fact, there's a big generalization of what I call the generalized von Neumann theorem. So there is a generalized von Neumann theorem. So remember, the generalized von Neumann theorem — the example I gave — said that if you want to count four-term progressions weighted by some functions, then it's enough to control the Gowers norms of those functions. So let me introduce the obvious generalization. So introduce an operator T sub Psi — and I guess it also depends on the convex body K — of some functions f1 up to ft, which will just be basically the average value. So 1 over N to the d, let's say; it doesn't matter too much whether I take N to the d or 2N to the d.
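As a sanity check on the formula for the number of k-term progressions of primes, here is a quick numerical experiment (my own code, not from the lecture), specialised to k equals 3, with the product of local factors truncated at a large prime. Convergence to the asymptotic is slow, logarithmically slow, so at this height one should only expect the right order of magnitude.

```python
# Brute-force count of 3-term progressions of primes below N, compared with
# the asymptotic (1 / (2(k-1))) * prod_p beta_p * N^2 / (log N)^k from the
# stated theorem. All function names here are my own.
import math

def primes_up_to(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

k, N = 3, 10_000
primes = primes_up_to(N)
prime_set = set(primes)

# count progressions p, p + d, p + 2d of primes <= N with d > 0;
# the third term is 2q - p when the first two are p < q
count = sum(1 for i, p in enumerate(primes) for q in primes[i + 1:]
            if 2 * q - p <= N and (2 * q - p) in prime_set)

def beta(p, k):
    # the local factor from the formula in the lecture
    if p <= k:
        return (1 / p) * (p / (p - 1)) ** (k - 1)
    return (1 - (k - 1) / p) * (p / (p - 1)) ** (k - 1)

local_product = 1.0
for p in primes_up_to(10**5):  # the product converges: beta_p = 1 + O(1/p^2)
    local_product *= beta(p, k)

predicted = local_product / (2 * (k - 1)) * N**2 / math.log(N) ** k
print(count, round(predicted))  # same order of magnitude
```

For k equals 3 the product of the beta sub p's works out to twice the twin prime constant, which is the classical Hardy-Littlewood prediction for three-term progressions of primes.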
And then the sum over the lattice points of the convex body of those functions evaluated at the appropriate linear forms. So it's a generalization of the multilinear operators I considered before. So before, I talked about, I think, just 3-term progressions and 4-term progressions. And the generalized von Neumann theorem states that if the fi are bounded, then this is bounded by a Gowers norm: T sub Psi, K of f1 up to ft is bounded by some constant times the Gowers U s plus 1 norm of any of the fi, where s is something called the complexity of this particular system. And there's a linear algebra recipe for computing the complexity. I'll tell you what it is, and then make some remarks about it. So what it is: it's the smallest s such that for any i between 1 and t, I can partition the forms other than psi i — that is, psi 1 up to psi i minus 1, psi i plus 1 up to psi t — into s plus 1 classes, with the property that psi i is not in the affine linear span of any of them. So it's a strange sort of recipe. I can explain, though, for example, what the complexity of 4-term progressions is. So if capital Psi equals psi 1 up to psi 4 is the system defining 4-term progressions, then the complexity is 2. And the reason the complexity is 2 is, clearly, if I remove any one of those four forms, I can partition the remaining forms into three classes, just the singleton classes. And then the form I removed will certainly not be in the affine linear span of any of them, because no two of these forms are proportional. But I can't do it with two classes, because if I try to do that, I'd have to have two forms in a class, and any two of these forms span all forms in two variables, so the removed form would be in their span. It's an exercise to convince yourself of that. I wrote down some other ones. So the complexity of the Vinogradov three-primes system is 1. And somehow what this means — complexity 1 means that the Gowers U2 norm is all you need.
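The recipe can be carried out by brute force. Here is a sketch (my own code, under my reading of the definition; the forms below are homogeneous, so the affine linear span reduces to the ordinary linear span of the coefficient vectors) that recovers the complexities just stated.

```python
# The complexity of a system of forms: the smallest s such that, for every i,
# the remaining forms can be partitioned into s + 1 classes with psi_i outside
# the span of each class. Forms are given by their coefficient vectors.
from itertools import product
import numpy as np

def in_span(v, vectors):
    """Is the vector v in the linear span of the given vectors?"""
    if not vectors:
        return False
    A = np.array(vectors, dtype=float).T
    B = np.column_stack([A, np.array(v, dtype=float)])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B)

def complexity(forms):
    t = len(forms)
    for s in range(t):
        ok_for_all_i = True
        for i in range(t):
            others = [forms[j] for j in range(t) if j != i]
            # try every assignment of the other forms to s + 1 classes
            found = False
            for labels in product(range(s + 1), repeat=len(others)):
                classes = [[others[j] for j in range(len(others))
                            if labels[j] == c] for c in range(s + 1)]
                if all(not in_span(forms[i], cls) for cls in classes):
                    found = True
                    break
            if not found:
                ok_for_all_i = False
                break
        if ok_for_all_i:
            return s
    return None

# the four forms defining 4-term progressions: n1 + j * n2 for j = 0, 1, 2, 3
print(complexity([(1, 0), (1, 1), (1, 2), (1, 3)]))  # 2

# the Vinogradov three-primes system: n1, n2, N - n1 - n2 (homogeneous parts)
print(complexity([(1, 0), (0, 1), (-1, -1)]))  # 1
```

For four-term progressions you can see the two phenomena from the lecture directly: singleton classes never contain the removed form in their span, but any class with two of these forms already spans everything.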
And as we've seen, the Gowers U2 norm is something that's just classical Fourier analysis. And indeed, writing odd numbers as a sum of 3 primes is something that was done quite some time ago using Fourier analysis — traditional Fourier analysis, or more accurately, the Hardy-Littlewood circle method. So, à la Hardy-Littlewood — I probably should have an accent on that à; if I'm going to write two words of French in my six lectures, I should at least get them right. [Question: are the coefficients in Z or in Q?] In Z, yeah — I mean, actually, all that's really important is that the forms are Z-valued. [The questioner clarifies: in the affine linear span, could the coefficients be, say, 2 and 1, or 3 and 1?] Yes — the span is taken over Q. So the three-primes problem could be handled by Fourier analysis, by Hardy-Littlewood and Vinogradov, and that was done in the 1930s. And then I wrote down one more example, which is the example of a cube. So cubes, which would be of the following form: I take all of the forms n1 plus the sum over i in A of ni, where A ranges over all subsets of 2 up to d. So this is 2 to the d minus 1 forms, and the complexity is d minus 1. So there are various different types of systems of forms that you can handle here. Now, as I said, I've just stated what the complexity is. Why is that the complexity? Well, that's a bit difficult to explain. Basically, the proof of this generalized von Neumann theorem is, again, some applications of the Cauchy-Schwarz inequality. And it turns out that for the number of Cauchy-Schwarz inequalities you need to prove this bound, you need to arrange the forms psi i by various changes of variables so that those Cauchy-Schwarz inequalities give you the right thing. And once you've decided upon that idea, working out the correct number of Cauchy-Schwarz inequalities you need boils down to linear algebra.
It comes down to just a linear algebra problem, to which this is the solution. I don't think I can add any more intuition to that. So that's the statement of the theorem, and, well, I'm going to tell you some of the ideas that go into the proof. And I think we'll lose nothing by specializing to the case of four-term progressions. So, for definiteness rather than simplicity, let's specialize to the system that we've been studying: Psi equals psi 1 up to psi 4, the system of four-term progressions. And let's again write T for the operator there, so T of f1 up to f4 for the corresponding operator. So I talked about how you use this operator and generalized von Neumann theorems to count arithmetic progressions. And the basic idea is that you split the characteristic function of the set you're interested in into a structured and a random part. So, following the way I did it when looking at Szemerédi's theorem, the first idea would be just to split the von Mangoldt function. So to estimate T of lambda, lambda, lambda, lambda, which is what we want to do, you might try subtracting off the average value: try writing lambda as one, which is its average, plus lambda minus one, and then decompose just as we did before: T of lambda, lambda, lambda, lambda as T of 1, 1, 1, 1 plus 15 error terms, terms involving lambda minus one. And then maybe this is equal to T of 1, 1, 1, 1 plus order of the Gowers U3 norm of lambda minus one. And then perhaps one could show, just as we did when looking at Szemerédi's theorem, that this norm is small, and hence the number of four-term progressions of primes is dominated by the main term here. Now it turns out there are a very large number of issues with that plan. So here are some problems — I'm going to list several problems with it. One is: if it worked, it would give the wrong answer. So we know it's not going to work; the answer it would give is just wrong.
And the way it's wrong is that we've not seen any of this local behavior of the von Mangoldt function at primes p. So where are the beta p? We've not taken any account of irregularities of the von Mangoldt function modulo two, three, and five. The second problem is that in making this assertion here, I applied the generalized von Neumann theorem. But it wasn't actually valid to do that, because the function I applied it to is not bounded. So the application of the generalized von Neumann theorem was invalid, and that's because lambda minus one is not bounded — it's not bounded by one, it's not bounded uniformly. So I'm just going to see if I can find some chalk of appropriate length — maybe that will do. So that's not so great. And then third, even if that had all been valid, how am I going to show that this U3 norm of lambda minus one is indeed small? So how do I propose to show that that's little o of one? The tool I have at my disposal is the inverse theorem. But the inverse theorem for the U3 norm was again only valid for bounded functions. So I only stated it — and I also only proved it, well, I didn't prove it, but the proofs that I discussed were only valid — for bounded functions. So there are three very serious problems with the plan. However, the plan can be made to work by addressing all of those issues. The first one is actually the easiest to address. So point one is addressed using something called the W-trick. And the idea here is that you take w to be just the product of the first few primes. You need to take a number of primes that tends to infinity — so there's nothing really special about log log n; just any slowly growing function would work, but you want to make sure that it doesn't grow too quickly. And consider — well, write lambda as an average of functions which I'll call lambda sub w,b. And these are defined as follows.
Lambda sub w,b of n is phi of w over w — Euler phi of w over w — times lambda of wn plus b. So that's basically saying: let's foliate the integers into progressions modulo w. So here, the highest common factor of b and w is one. And this has the effect of smoothing out the irregularities of lambda modulo small primes. So these new functions lambda sub w,b do not have bad irregularities — irregularities modulo primes p less than the threshold I took. So to explain why that's so, and where this trick comes from, let me just show you the idea that led to it. So if I take the primes — three, five, seven, and so on; I know two is a prime, but it's always best to ignore it — let's list all of the primes. Of course, those are all odd. But if I instead consider the n for which two n plus one is prime, well, then I get the sequence one, two, three, five, six, eight, nine, eleven, fourteen. And you'll see that that no longer consists entirely of odd numbers. In fact, it's half even numbers and half odd numbers — or at least it would be, asymptotically, if I continued it: half even, half odd. And the reason for that is that prime numbers are equally likely to be one mod four as they are three mod four. So primes are equally likely to be one or three mod four. And that's a result of de la Vallée Poussin — and, appropriately enough, Hadamard; I think one of these guys proved it. They proved the prime number theorem, and with what was available at the time, they could easily have proved the version of the prime number theorem in the progressions one mod four and three mod four. So that's all I'll say about that. I don't want to continue working with lambda sub w,b, so I will go back to working with lambda, but one should pretend that I've done this trick. In other words, we're going to pretend from now on that lambda itself is nicely distributed modulo small primes.
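The example with the primes of the form 2n + 1 is easy to check numerically. Here is a small illustration (my own code): the primes themselves are all odd, but after passing to the progression 2n + 1 the resulting n are split essentially evenly between even and odd, because primes split evenly between one and three mod four.

```python
# The n for which 2n + 1 is prime: the sequence begins 1, 2, 3, 5, 6, 8, 9,
# 11, 14, ..., and asymptotically half of these n are even, since n even
# corresponds to 2n + 1 being 1 mod 4 and n odd to 3 mod 4.
import math

def is_prime(m):
    if m < 2:
        return False
    return all(m % d for d in range(2, math.isqrt(m) + 1))

ns = [n for n in range(1, 5001) if is_prime(2 * n + 1)]
print(ns[:9])  # [1, 2, 3, 5, 6, 8, 9, 11, 14]

frac_even = sum(1 for n in ns if n % 2 == 0) / len(ns)
print(frac_even)  # close to 1/2
```

This is precisely the smoothing the W-trick performs in general: passing from lambda to lambda sub w,b removes the bias modulo every prime dividing w at once.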
So we'll pretend that this trick has been applied and that lambda itself is nicely distributed to small moduli. In reality, you do have to include the w and the b, and really the only effect of that is that throughout the paper there's a w and a b, which is a bit of a pain. So two and three are more interesting. I'll deal with them a little bit together. So the point about two and three is that although the von Mangoldt function is not bounded, it is bounded by something that's much easier to understand than the function itself. So for two and three, the key observation is that although lambda is not bounded, it is pointwise bounded above by a constant multiple of another function, which we call nu, and this function nu is much better understood than lambda itself. And this function nu comes from the theory of the sieve, and in particular what's called the Selberg sieve. So, Selberg's sieve. I'll just tell you very briefly the key idea, because I feel that everybody should see this idea at some point if they haven't seen it before. The amazing observation is: just take any weights lambda d. So let lambda d, for d equals one up to some threshold R — and R is going to be a small power of n, R is n to some small power — be any system of weights, and consider the following function, which I'll call f: f of n is the sum, over d dividing n with d less than or equal to R, of lambda d, all squared. So it's just an arbitrary system of weights, but I do want to have that lambda one is one. So consider a function like that. Then I claim — I mean, obviously this is a non-negative function, but furthermore — that if n is prime, provided it's not tiny, so provided it's between R and n (or bigger than n is also fine), then f of n is equal to one. And that's clear: a prime number has of course only two divisors, and only one of them is in the range that I'm allowing myself.
So manifestly f of n is always non-negative, and so f majorizes the characteristic function of the primes. Now, it may not be completely obvious to you — it shouldn't be — but f is an easier thing to compute with, at least if the weights lambda d are chosen in some sensible way. So for example, if you wanted to figure out the average of f, you just sum over n less than or equal to N, and then you expand out the square, and you get a sum — but the point is, if R is relatively small, it's a fairly short sum. So you can estimate it, generally without too much difficulty, provided R is not too big. So f is often quite easy to compute with in the cases that arise, if R is small. But who said that there's a choice of the lambda d for which f is comparable to the primes? I mean, for a typical choice of lambda d — if I just choose them randomly — f is going to be much bigger than the primes. So although it majorizes the primes, it will also be supported on many other numbers, and be big. So the remarkable fact is that you can choose the lambda d so that this is not too much bigger than the primes. So the lambda d can be chosen so that, for example, the sum over n less than or equal to N of f of n is bounded above by a constant — depending only on the exponent in R — times the number of primes less than N. And that's what Selberg did. So one way to make a sensible choice is just to look at the expression you're interested in: we want to make the total weight of f small, and that total weight is essentially a quadratic form in the lambda d's. So you just minimize that quadratic form, and that's how you choose your weights lambda d. So that's a ridiculously short course in the Selberg sieve, but this is basically the idea. And the one further thing I should say is that if you want to majorize the von Mangoldt function, you can take nu of n to be simply log of big N times f of n.
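Here is a toy version of this majorant (my own code, with the simple choice lambda d equals mu of d rather than Selberg's optimised weights, which would make the total mass smaller still). It exhibits both properties: f of p equals one for every prime p bigger than R, and the total mass of f on the interval up to N is only a bounded constant times the number of primes there.

```python
# Selberg-style majorant: f(n) = (sum of lambda_d over d | n, d <= R)^2
# with lambda_1 = 1. Here lambda_d = mu(d), a simple but suboptimal choice.
import math

def mobius(n):
    """mu(n): (-1)^(number of prime factors) if n is squarefree, else 0."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

N, R = 100_000, 100
lam = {d: mobius(d) for d in range(1, R + 1)}

def f(n):
    s = sum(lam[d] for d in range(1, min(n, R) + 1) if n % d == 0)
    return s * s

# f(p) = lambda_1^2 = 1 for every prime p > R, since 1 is then the only
# divisor of p that is at most R; and f >= 0 everywhere, so f majorises
# the characteristic function of the primes in (R, N].

# total mass, computed exactly by expanding the square:
# sum_{n <= N} f(n) = sum_{d, e <= R} lambda_d * lambda_e * floor(N / lcm(d, e))
total = sum(lam[d] * lam[e] * (N // math.lcm(d, e))
            for d in range(1, R + 1) for e in range(1, R + 1))

num_primes = sum(1 for n in range(2, N + 1)
                 if all(n % q for q in range(2, math.isqrt(n) + 1)))
print(total, num_primes)  # total is only a bounded multiple of pi(N)
```

The expanded-square computation of the total mass is exactly the "fairly short sum" from the lecture: it has only R squared terms, however large N is.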
So then — I mean, it's also usually a good idea to divide through and renormalize so that the total mass is one. So put in a constant C, chosen so that the average value of nu is just precisely one. [Audience question, partly inaudible, about which of the quantities in the argument of nu should be small.] Very good, yes — actually, they could probably both be small, but certainly at least one of them should be small, and it's the argument of f. So: choose C so that the average value of nu is one, and then this will majorize the von Mangoldt function up to a constant. So lambda of n will be bounded by a constant times nu of n, pointwise. So the existence of this function nu allows you to address point two: it allows you to prove a generalized von Neumann theorem for functions like the von Mangoldt function — not bounded, but bounded by something that you can compute with. And I didn't show you the proof of the generalized von Neumann theorem, but I did remark that it uses the Cauchy-Schwarz inequality. And basically, you do the same Cauchy-Schwarz inequalities, but whenever you were tempted to just throw away a function because it's bounded by one, you must resist that temptation and instead make sure you include the weight nu. And then essentially the same computation works, and you find yourself needing to evaluate a large number of expressions involving nu — but that can always be done, because nu has been chosen specifically so that you can compute with it. So I'll write a few remarks on that. Well, actually, I'm not going to write down what I just said, because I want to say a few more things — I don't want to run out of time. So what about point three, the inverse theorem? So it turns out that you can also prove the inverse theorem as I stated it — namely, that functions with a large U3 norm correlate with a class two nilsequence. That's also true without the assumption that the functions are bounded, provided they're bounded by, say, nu.
So it turns out that we can prove a variant of the inverse theorem for the Gowers norms for functions bounded by nu — in fact, for functions bounded by a constant multiple of one plus nu, which is another function with average value one. And an example of such a function would be lambda minus one, which is the function that we actually care about. So the theory here does not use specific properties of the exact construction of this nu coming from the Selberg sieve, but rather a host of properties about linear forms involving nu, which can be established in that particular case but are also valid for more general functions. Now, we don't do this by going through the proof of the inverse theorem step by step and making sure that we only used the fact that we're bounded by nu; rather, this is actually a consequence of the inverse theorem as a black box. So it follows from the inverse theorem for bounded functions together with a decomposition result. And this decomposition theorem was actually the key idea in my first joint work with Terry Tao, where we proved that there are arbitrarily long progressions of primes. So this is from 2004. And the decomposition theorem is that lambda minus one, say — but actually any function bounded by a constant multiple of nu, or of one plus nu, or what have you, so a wide range of functions — can be decomposed as a bounded function, bounded by one, plus an error that's extremely small in the Gowers norm. So plus a function that's tiny in the U3 norm. So why would this then imply the inverse theorem for lambda minus one? Well, if the U3 norm of the left-hand side is large, then, because the error is tiny, the U3 norm of the bounded function is large. So I can then apply the inverse theorem to the bounded function, which then correlates with a class two nilsequence. The error doesn't correlate with a class two nilsequence, because it's tiny in the U3 norm. And hence lambda minus one correlates with a class two nilsequence.
So that's the mode of argument. I should write down how that goes. So let me give these functions names: let's call the bounded function little f, and the error little g. So: the U3 norm of lambda minus one is greater than delta, let's say. That implies that the U3 norm of f is greater than delta over two, if I chose parameters correctly. And that implies, by the inverse theorem for bounded functions, that f correlates with a nilsequence: the inner product of f with chi is large for some class two nilsequence chi. And then that implies that lambda minus one correlates with a class two nilsequence, because g does not correlate with a class two nilsequence — the inner product of g with chi is tiny. And that's by the converse of the inverse theorem. So remember, I said that the inverse theorem for the Gowers norms comes with a converse. The theorem itself says that if you have a function with a large U3 norm, then it correlates with a class two nilsequence; conversely, if you correlate with a class two nilsequence, then you have a large U3 norm — but g does not have a large U3 norm. So we've managed to prove the inverse theorem for a function that's not bounded. And that has actually now addressed all of the problems that I foresaw with my plan. Questions at all? [Question: don't you still need to reach a contradiction?] Yes, exactly — I haven't. Right, so this is precisely it. There's no contradiction yet, because I've not ruled out the possibility that lambda minus one does correlate with a class two nilsequence. It would be strange if it did, because lambda minus one is to do with the primes, and nilsequences have nothing to do with the primes — but that needs to be proven. So the remaining task is to show precisely what Emmanuel just suggested: that the inner product of lambda minus one with chi is small. And maybe I should just clarify exactly what I mean here by the inner product.
This is the average value, over n less than big N, of lambda of n minus one, times chi of n. I would like to show that this is small — little o of one — for all class two nilsequences of fixed complexity; and the error is allowed to depend on that fixed complexity. So that's the task. And this involves techniques that come from classical additive prime number theory. But there's an initial step, which is that it reduces to a similar question — I'm oversimplifying quite a bit with what I'm about to say — about the Möbius function. It's a fact, something that comes up quite often, that questions about primes are related to questions about the Möbius function. So this reduces, using fairly standard techniques, based on the fact that the von Mangoldt function is the Dirichlet convolution of the Möbius function with the nice smooth function log. So you can always take a sum involving the von Mangoldt function and write it as a sum of things involving the Möbius function, and this log factor is usually very benign. So it reduces to establishing that the inner product of Möbius with chi — I will just remind you what Möbius is in one moment — which is the average value of mu of n times chi of n, is little o of one. The notation here is probably a bit unfortunate, because in analytic number theory, chi is always a Dirichlet character. So I think probably I'll rewrite these notes at some point and call it something else. So if anyone happens to watch this video at precisely this point: this is not a Dirichlet character, although it could be. [Inaudible question.] No, because it's an average. Yeah. So Möbius, mu of n, is just minus one to the power of the number of prime factors of n — the parity of the number of prime factors. And what we're trying to prove here is an instance of what's known as — I think this terminology was introduced by Friedlander and Iwaniec, but it's something that's all-pervasive in analytic number theory —
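The identity that drives this reduction is easy to verify numerically. Here is a quick check (my own code) that the von Mangoldt function really is the Dirichlet convolution of Möbius with log.

```python
# Check that Lambda = mu * log as a Dirichlet convolution, i.e.
# Lambda(n) = sum over d | n of mu(d) * log(n / d), for small n.
import math

def mobius(n):
    """mu(n): (-1)^(number of prime factors) if n is squarefree, else 0."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def von_mangoldt(n):
    """log p if n is a power of the prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n and n % p != 0:
        p += 1
    if p * p > n:
        p = n
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0

discrepancy = max(
    abs(sum(mobius(d) * math.log(n // d) for d in range(1, n + 1) if n % d == 0)
        - von_mangoldt(n))
    for n in range(1, 301))
print(discrepancy)  # numerically zero
```

The identity is just the Euler product in disguise: log equals one convolved with Lambda, so convolving both sides with Möbius inverts the one.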
It's an example of what's called the Möbius randomness principle, which is the idea that the Möbius function should just be orthogonal to everything. [Question: isn't the Möbius function zero if n is not squarefree?] Oh, yes. Well, actually, everything that I'm saying would be true for the Liouville function — which is what you get if you don't include that squarefree condition, and which is in some ways more natural; it depends, in other ways it's less natural. So the Möbius randomness principle is that Möbius is orthogonal to everything, unless there's some obvious reason why it shouldn't be. So it's clearly not orthogonal to itself, and nor is it orthogonal to the von Mangoldt function. But if you take a function coming from somewhere else — coming from an algebraic construction, or a construction like these nil characters — there's simply no reason that it should correlate with the Möbius function. So that's a heuristic; proving it in any given case is not always so easy. There are basically two ways that I know of of proving a Möbius randomness principle. So there are two methods for proving rigorously that the inner product of Möbius with a particular function f is little o of one. And those methods correspond, roughly speaking, to two cases. If f is kind of multiplicative — so if it is a Dirichlet character, or if it's just the constant function one — then the techniques that you would use would be techniques involving L-functions, and the zeta function if f is just one. So if f is multiplicative in some vague sense, if it somehow has multiplicative structure, then we use L-function and contour-integration-type techniques, Perron's formula, et cetera. And if f is not multiplicative — if f looks to be far from multiplicative — then we use a different method, called the method of bilinear forms.
And it's the second method that I'm going to tell you about — well, actually, to show that Möbius is orthogonal to class two nilsequences, one has to use both methods, depending on what the nilsequence is, because the constant function one is itself a nilsequence. But if the nilsequence doesn't look like a constant function, then it's this method of bilinear forms that's going to be used. And this is associated with various names, such as Vinogradov, Linnik, Heath-Brown, Vaughan, and others — a well-travelled method in analytic number theory. For those of you who have read, or are interested in reading, Zhang's proof of bounded gaps between primes: this method is also important there, though it doesn't come up in the Maynard-Tao approach to that theorem. So the idea is to decompose the Möbius function into Dirichlet convolutions. So it's going to be written as a sum of a few functions of the form f star g, where star is Dirichlet convolution: f star g of n is the sum over d dividing n of f of d times g of n over d. And to get the method to work, you've got to be careful about exactly how you do that, and in particular about the ranges that those f and g are supported on. So there's a lot of flexibility — a lot of flexibility in how this is done. And somehow there's one basic identity that governs this entire endeavour, which is called Linnik's identity. And that's the identity that Möbius, mu of n, is the sum over all integers k of minus one to the k times the kth divisor function of n — well, in fact, what's called the kth proper divisor function of n. So tau sub k prime of n is the number of ways of writing n equals n1 times up to nk with all of the ni strictly bigger than one. And tau sub k itself is in fact a k-fold Dirichlet convolution. And so this gives a lot of flexibility in decomposing mu into a sum of Dirichlet convolutions.
Linnik's identity is a fun exercise, and I can give you a hint that will let you do it in about half a line. It relies on the fact that 1/zeta, which is the Dirichlet series of Möbius, can be expanded as 1/(1 + (zeta - 1)), obviously, and that's the sum over j of (-1)^j (zeta - 1)^j by the geometric series formula. If you compare coefficients on both sides, you come out with Linnik's formula. So it's a really natural thing. Now, for actually running the method, Linnik's formula is not so useful, and the reason is that the sum over k can be quite large. What one would actually use is a formula of Heath-Brown, which is a truncation of Linnik's formula. Or, depending on your application (and for this application we don't need too much sophistication), there's an identity of Vaughan. That's spelt V-a-u-g-h-a-n, by the way; I know English pronunciation is pretty irregular, but that's about as irregular as it gets. Vaughan's identity is popular, it's easy enough to prove, you'll find it in books, and it would also suffice for what I'm talking about. So you've decomposed your Möbius function into Dirichlet convolutions; now what do you do? Once we have a decomposition of Möbius as a sum of Dirichlet convolutions f * g, let's try to evaluate the inner product of one of those pieces with chi, that is, the average of f * g(n) chi(n), where chi is now a class-two nilsequence, say. The inner product of Möbius with chi is then a sum of these things. And let's suppose the decomposition has been done in such a way that I've got nice control over the supports of f and g. For simplicity, suppose f and g are supported near the square root of N: f(x) and g(x) are supported where x is roughly sqrt(N). That's just one case, but it's a case that could come up.
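Spelling out that half-line hint in a display (my own write-up of the argument sketched in the lecture):

```latex
% The hint from the lecture, written out:
\[
  \frac{1}{\zeta(s)} \;=\; \frac{1}{1+(\zeta(s)-1)}
  \;=\; \sum_{j \ge 0} (-1)^j \bigl(\zeta(s)-1\bigr)^j .
\]
% Since \zeta(s)-1 = \sum_{n \ge 2} n^{-s}, the coefficient of n^{-s} in
% (\zeta(s)-1)^j is exactly \tau'_j(n), the number of ordered factorizations
% n = n_1 \cdots n_j with every n_i > 1. Comparing coefficients of n^{-s}
% on both sides gives Linnik's identity:
\[
  \mu(n) \;=\; \sum_{j \ge 0} (-1)^j \,\tau'_j(n).
\]
```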
So the idea is that you take that inner product, let's call it (*). Then (*) is roughly the average over a and b of size about sqrt(N) of f(a) g(b) chi(ab). It won't be quite that, there will be some weights, but roughly that. Then by one application of Cauchy-Schwarz, using also the fact that f and g are bounded by 1, say, that's at most the average over b and b' of the absolute value of the average over a, all of these of size about sqrt(N), of chi(ab) times the conjugate of chi(ab'). I'm simplifying a vast amount here, but I'm doing that because I want to illustrate the key point, which is the evaluation of this sum. To see what this is, I prefer to do a change of variables, just to have a look at it. This equals the average over n up to about sqrt(N) of chi(dn) times the conjugate of chi(d'n); here I've just substituted a = n, b = d and b' = d', because I feel that's more suggestive. So what is this? Well, chi is a nilsequence, so chi(dn) is Phi(p(dn)), and the other factor should carry a bar: the conjugate Phi(p(d'n)) bar. It looks like a correlation of two nilsequences, but you can interpret it as a single nilsequence: just write it as the average over n of (Phi tensor Phi bar)(p(dn), p(d'n)). For fixed d and d', this is an average of a nilsequence on G x G, that is, on (G x G) mod (Gamma x Gamma), with automorphic function Phi tensor Phi bar and polynomial sequence n mapping to the pair (p(dn), p(d'n)).
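The Cauchy-Schwarz step above can be written out roughly as follows (my own sketch, suppressing the weights and ranges mentioned in the lecture, and assuming |f|, |g| bounded by 1):

```latex
% Rough shape of the Cauchy--Schwarz step; a sketch, not the precise statement.
\begin{align*}
  \Bigl|\mathop{\mathbb{E}}_{a,b \approx \sqrt{N}} f(a)\,g(b)\,\chi(ab)\Bigr|^2
  &\le \mathop{\mathbb{E}}_{a} |f(a)|^2 \cdot
       \mathop{\mathbb{E}}_{a} \Bigl|\mathop{\mathbb{E}}_{b} g(b)\,\chi(ab)\Bigr|^2 \\
  &\le \mathop{\mathbb{E}}_{b,b'} \Bigl|\mathop{\mathbb{E}}_{a}
       \chi(ab)\,\overline{\chi(ab')}\Bigr|.
\end{align*}
% Renaming a -> n, b -> d, b' -> d' recovers the average
% \mathbb{E}_n\, \chi(dn)\,\overline{\chi(d'n)} from the change of
% variables above, whose evaluation is the key point.
```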
So what it boils down to, at the end of the day, is understanding how the distributions of these polynomial sequences on G x G, for the different d and d', are related to the distribution of p itself on G. More precisely, one reduces to the case where p is equidistributed on G, and it then turns out that most of these product sequences are equidistributed as well. For that, you need the whole theory of the distribution of nilsequences that I discussed in lecture four. A similar sort of technique, based not on Linnik's identity but rather on a criterion of Daboussi and, I believe, Kátai, appears in some very interesting work of Bourgain, Sarnak, and Ziegler, which establishes this Möbius randomness principle for things that are quite a bit more general than nilsequences. Nilsequences come from discrete-time flows on nilpotent homogeneous spaces G mod Gamma, but they deal with unipotent flows on more general spaces G mod Gamma, in the same generality as Ratner's theorem. [Question from the audience.] I think so; I think it's written up completely. The key point is that they prove a weaker result than we do: they just get o(1) of cancellation. What I didn't say is that we actually need to beat the o(1) by a big power of log. But for just o(1), I think their result does give this; you only need it for fixed d and d', I'm pretty sure. Right, well, I think that's about all I have time to sketch, so I'm going to stop there. [Question:] How much of all of this is needed if you just want to prove that there are infinitely many four-term arithmetic progressions of primes? Essentially none of it. All of these lectures have been about work that was subsequent to my work with Tao on progressions of primes, and that work is quite a bit softer than this: you don't need nilsequences at all. That work was used here on the top board, in fact. To understand that paper you do need the Gowers norms.
That's an important part of it, but you don't need anything about nilsequences. [Question:] Suppose you wanted to stick with the way you stated it, with a general system psi_1, ..., psi_t, but you only wanted lower bounds? Yes, so for that, I believe you do need all of this theory in general. Unless... well, for certain homogeneous systems of linear forms you would need there to be an analogue of Szemerédi's theorem, and already for the system of forms x, y, x + y there's no analogue of Szemerédi's theorem: you can have a positive-density set with no solutions to that equation, namely the odd numbers. So as soon as you hit that issue, you're going to need to use this more elaborate theory. [Question:] So basically you're saying that getting the exact asymptotic at the end is not really more costly than proving a lower bound. Right. I even wonder whether it may be the case that having lower bounds for every system of forms would automatically imply an asymptotic somehow. Possibly; they'd have to be compatible. [Question:] Some technical points: in the decomposition behind (*), the convolutions coming from Linnik's identity are k-fold, not two-fold. Don't you have to deal with that? Actually, no, because you can decompose a k-fold convolution: you can always arrange it as a two-fold convolution. And in general f and g are not going to be bounded by 1; again, that was just for simplicity. They'll be bounded by some power of the divisor function. But yes, we only ever deal with the two-fold convolutions. I think some parts of Zhang's work do use the triple convolution structure more carefully, and so on. [Question:] My last question: can you envision further arithmetic implications of this program? That's a good question. I don't know.
One thing: maybe when you were referring to some work of theirs that was not completed, I think what they'd like to do is understand orbits in G mod Gamma at prime return times, for example. And I think even for SL_2 that's open. So I don't know, but I certainly don't know of any arithmetic applications that are parallel to the ones I've been discussing in these lectures. [Question:] What if you replace your nilsequence with the horocycle flow? Yes, you could evaluate an automorphic function on G mod Gamma along a horocycle flow. I don't know what the significance of this would be in general; it's an interesting question, for sure. [Moderator:] Unless there is one more question... I suggest we thank Ben for his wonderful set of lectures.