And today I want to talk about some things related to Szemerédi's theorem, mainly. But before I get started on that, I would like to say a few words about the inverse theorem for the Gowers norms, which is something I've mentioned several times in these lectures. Let me just remind you of the statement. So suppose that f is a bounded function, bounded pointwise by 1, and suppose that its Gowers U^k norm is at least δ. Then the conclusion is that there is a nilsequence χ(n), which is Φ, an automorphic function, evaluated along a polynomial sequence of class k − 1, which correlates with f: the inner product of f with χ is at least some positive constant depending on δ. And I spent, in fact, a whole hour going over the detailed formulation of this, which requires you also to put bounds on the complexity and on the smoothness of Φ. So I've mentioned this many times. And it is a very difficult theorem, very long to prove, but I can at least give some ideas of how the proof goes. And there's one thing in particular that I have neglected to mention so far, which is that in the case k = 2 this is actually very straightforward. So I'm going to make a few remarks on the proof. The case k = 2 is classical Fourier analysis. And when I did some classical Fourier analysis in my first lecture, I got the normalizations very wrong. And I notice Sophie's left, so if I get them wrong again there's nobody to correct me. The point here, anyway, is that there's a formula: the Gowers U^2 norm is given by a sum of f over parallelograms,

||f||_{U^2[N]}^4 = (c/N^3) Σ_{x,h_1,h_2} f(x) \overline{f(x+h_1)} \overline{f(x+h_2)} f(x+h_1+h_2),

and that's how I normalized it — there's also an additional constant coming from the fact that I'm not working in a group, but it's basically this. And this can be expanded:

||f||_{U^2[N]}^4 = (c/N^3) ∫_0^1 |Σ_{n ≤ N} f(n) e(nθ)|^4 dθ.

This is a formula very similar to the one I showed you for counting copies of x, y, x + y weighted by f, and you can establish it by orthogonality relations. And we also have, to combine with that, the Parseval identity:

Σ_x |f(x)|^2 = ∫_0^1 |Σ_{n ≤ N} f(n) e(nθ)|^2 dθ.

So if you compare these two facts, the result follows quite easily. Under the assumption that ||f||_{U^2[N]} ≥ δ, the first formula bounds the L^4 norm of the exponential sum below: ∫_0^1 |Σ_n f(n) e(nθ)|^4 dθ ≥ c δ^4 N^3. But on the other hand, that integral is at most sup_θ |Σ_n f(n) e(nθ)|^2 times ∫_0^1 |Σ_n f(n) e(nθ)|^2 dθ, and by Parseval the latter is Σ_x |f(x)|^2 ≤ N. If you compare those two sides, you get sup_θ |Σ_n f(n) e(nθ)| ≥ c δ^2 N, which is the statement I claimed: using Parseval gives the claim.
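To make the normalizations concrete, here is a small numerical sketch — mine, not from the lecture — in the cleaner model setting of Z/NZ, where the parallelogram average, the Fourier formula, and the Parseval step can all be checked exactly and the constants c disappear. The parameter choices are purely illustrative.

```python
import numpy as np

N = 97
rng = np.random.default_rng(0)
f = np.exp(2j * np.pi * rng.random(N))        # |f| <= 1 pointwise

# ||f||_{U^2}^4 as an average over parallelograms, computed through the
# identity ||f||_{U^2}^4 = E_h |E_x f(x) conj(f(x+h))|^2
corr = np.array([np.mean(f * np.conj(np.roll(f, -h))) for h in range(N)])
U2_fourth = np.mean(np.abs(corr) ** 2)

# The same quantity in Fourier: sum_r |fhat(r)|^4, where
# fhat(r) = E_x f(x) e(-xr/N)
fhat = np.fft.fft(f) / N
assert np.isclose(U2_fourth, np.sum(np.abs(fhat) ** 4))

# Parseval: sum_r |fhat(r)|^2 = E_x |f(x)|^2 <= 1, and hence
# sup_r |fhat(r)|^2 >= ||f||_{U^2}^4: the k = 2 inverse theorem
assert np.max(np.abs(fhat)) ** 2 >= U2_fourth - 1e-12
```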
And in fact — and this is a slight quirk of the k = 2 theory — the argument gives a very specific type of nilsequence, χ(n) = e(nθ). This is a very specific type of class 1 nilsequence: what's happened here is that this is a nilsequence with a vertical frequency, a notion I introduced before. And while the inverse theorem is true with the ostensibly stronger statement that χ has a vertical frequency, you just get that for free from the inverse theorem as I've stated it, plus a decomposition of χ into nilsequences with a vertical frequency. So somehow those two steps get concatenated together inside this proof. Anyway, the point is that the case k = 2 is a relatively straightforward fact. The general case is done by induction on k, using the crucial relation, which I'm pretty sure I mentioned before:

||f||_{U^{k+1}}^{2^{k+1}} = E_h ||Δ_h f||_{U^k}^{2^k},

so the U^{k+1} norm to the power 2^{k+1} is an average over h of U^k norms of multiplicative derivatives, where Δ_h f(x) = f(x) \overline{f(x+h)}. There are slight technical inaccuracies in what I've written here, to do with the fact that the interval {1, ..., N} is not quite a group; the statement would be literally true if I were working in the cyclic group in which I've embedded this interval. So if you have that fact, then the first step follows fairly easily. Hence, if the Gowers U^{k+1} norm of f is at least δ, then many of the derivatives of f have a large U^k norm: you get a statement of the form ||Δ_h f||_{U^k} ≥ δ^C for many values of h. I didn't work out what the constant C is — no worse than 2^k, but probably better than that. And then, working by induction, you can conclude that each such derivative correlates with a nilsequence of class k − 1:

E_x f(x) \overline{f(x+h)} \overline{χ_h(x)} ≥ c

for these same values of h. So all of that's relatively straightforward. The difficult bit is then integrating this assertion about derivatives to get an assertion about the function itself. To get some kind of feel for what that task might involve — we need to "integrate" this statement — let's consider a very simple example. Suppose the function f I started with was just a quadratic phase, f(x) = e^{2πiθx^2}. Then what could I take χ to be inside this statement? I believe I can take χ_h(x) = e(2θhx), up to a sign convention and a constant phase e(θh^2); and then the left-hand side actually has absolute value 1, so it's just a constant depending on h. So you can see that in this example the way χ_h varies with h is by no means arbitrary: it varies linearly in h. And one would expect something like the same phenomenon to be true generally, at least for the top-order terms, whatever that means, of this nilsequence χ_h. So this suggests the idea of trying to establish some kind of linear behaviour of χ_h in h; we're going to have to do that. Well, actually, this example has another property, which is that it's symmetric in h and x, and it turns out that you also have to establish some kind of version of that property as well — that comes a bit later down the line, the symmetry in h and x. In other words, you've got to gather evidence that this χ_h(x) is behaving as it should if it really is the derivative of something of class one greater.
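To see the derivative structure concretely, here is a small sketch of the quadratic-phase example — again in Z/NZ, where e(ax^2/N) is well defined. Each multiplicative derivative of a quadratic phase is a single linear phase whose frequency varies linearly in h; the parameters are illustrative.

```python
import numpy as np

N, a = 101, 7
x = np.arange(N)
f = np.exp(2j * np.pi * a * x ** 2 / N)       # quadratic phase on Z/NZ

for h in (1, 5, 12):
    # multiplicative derivative: Delta_h f(x) = f(x) conj(f(x+h))
    dhf = f * np.conj(np.roll(f, -h))
    spec = np.abs(np.fft.fft(dhf) / N)
    # Delta_h f is exactly the linear phase e(-a(2hx + h^2)/N): a single
    # Fourier spike, at a frequency depending *linearly* on h
    assert spec.argmax() == (-2 * a * h) % N
    assert np.isclose(spec.max(), 1.0)
```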
So how is this done? Well, the argument is a beautiful argument due to Gowers that gives you a handle on this. And actually, what I'm going to show you is a simplification of Gowers's argument, which reduces some of the magic of it, I think. So let's take that assertion; let me call it (∗). Strictly there should be absolute values around it, but if you twist the whole thing by an appropriate phase you can ignore them, so I'm going to ignore them. From (∗), I just get that

E_{h,x} f(x) \overline{f(x+h)} \overline{χ_h(x)}

is bounded below. And now make a change of variables — let me just be doubly sure which one — I put x = n and x + h = m (I suppose I didn't really need the first change of variables). And then what do you get? That implies that

E_{n,m} f(n) \overline{f(m)} \overline{χ_{m−n}(n)}

is bounded below. And again, I've been a little bit vague about the ranges these variables go over, but that's a purely technical matter. So now we can apply the Cauchy–Schwarz inequality. In fact, the way in which Cauchy–Schwarz is applied is very similar to the way it's applied in many other parts of the theory, but I'm not sure I ever gave any details in any particular case, and maybe I'm not going to give any details now, in fact — I think it's sort of clear what you do. You apply Cauchy–Schwarz first in the n variable, which allows you to delete f(n) because it's bounded, and then square what remains; and then I apply it in the m variable plus a new dummy variable m′ that came out of the squaring, which allows me to delete the f(m)'s as well. So by two applications of Cauchy–Schwarz, that implies that the average over m, m′, n, n′ of χ_{m−n}(n) times all the appropriate primed expressions with appropriate bars is bounded below. And what you can see about this is that it's a statement only involving χ: f has disappeared. So note that f has disappeared — and that's good, because I want to prove some statements about how χ_h varies with h. Unfortunately, it's not the most pleasant looking statement to have to deal with, but I can make it look a bit more pleasant with another substitution. So let me remember what that is: setting h_1 = m − n, h_2 = m′ − n′, h_3 = m′ − n and h_4 = m − n′, this expression becomes

E_{h_1,h_2,h_3,h_4} E_n \overline{χ_{h_1}(n)} χ_{h_3}(n) \overline{χ_{h_2}(n + h_1 − h_4)} χ_{h_4}(n + h_1 − h_4) ≥ c.

And actually, there's a constraint on these h_1, h_2, h_3, h_4: they must satisfy the additive relation h_1 + h_2 = h_3 + h_4. So it may not actually look any better than what I had before. But let me show you now what happens in a specific case. Suppose that we were in the case of the U^3 norm, so suppose that k = 2. When we did the induction, the derivatives would all correlate with a linear phase, because that's the U^2 case, and so χ_h(n) = e(θ_h n). And if you substitute this in here, well, you get a geometric series. Substituting in and summing the geometric series, the phase that you are averaging over is θ_{h_1} + θ_{h_2} − θ_{h_3} − θ_{h_4}, with the bars accounted for. And when you sum that geometric progression, it's negligible unless that phase is very, very close to 0.
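Here is the geometric-series step as a numerical sketch of my own; the sign convention is the one that makes the genuinely linear case θ_h = θh work out exactly, so that the phase θ_{h_1} + θ_{h_2} − θ_{h_3} − θ_{h_4} vanishes precisely on quadruples with h_1 + h_2 = h_3 + h_4. All parameter values are illustrative.

```python
import numpy as np

N = 10 ** 4
n = np.arange(N)

def avg_phase(phi):
    # |E_{n<N} e(phi n)|: a geometric series, negligible unless phi is
    # within O(1/N) of an integer
    return abs(np.mean(np.exp(2j * np.pi * phi * n)))

theta = np.sqrt(2)
theta_h = lambda h: theta * h            # the genuinely linear case

h1, h2, h3, h4 = 10, 17, 13, 14          # additive quadruple: 10+17 = 13+14
print(avg_phase(theta_h(h1) + theta_h(h2) - theta_h(h3) - theta_h(h4)))  # 1.0
h4 = 15                                  # no longer an additive quadruple
print(avg_phase(theta_h(h1) + theta_h(h2) - theta_h(h3) - theta_h(h4)))  # tiny
```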
So if you do this calculation, what you get is many — more than a constant times N^3 — quadruples (h_1, h_2, h_3, h_4) which are what we call additive quadruples, so with h_1 + h_2 = h_3 + h_4 (that's called an additive quadruple), and with a very similar approximate relation for θ: the quantity θ_{h_1} + θ_{h_2} − θ_{h_3} − θ_{h_4} is within a constant times 1/N of an integer. The implied constants here contain dependence on δ. And actually, I'm not entirely sure I've got my signs right — let me stick with that for now. So this now looks a little bit more suggestive, I hope you agree. It's sort of suggesting that θ really does look a little bit like a homomorphism: a genuine homomorphism would take a relation like h_1 + h_2 − h_3 − h_4 = 0, preserve it, and send it to 0. So it suggests that the map h ↦ θ_h is behaving like a homomorphism. Well, that's good, because I wanted to show precisely that the variation of θ_h in h is linear. But unfortunately — well, actually fortunately, because otherwise the whole theory would be wrong — this does not imply that it's a genuine homomorphism. An example of a map which satisfies these two properties but which is not a homomorphism would be a map of the form θ_h = α{βh}, where this bracket { } is the fractional part, as usual. So why would that satisfy the property? Well, restricted to the set of h for which this fractional part is less than one tenth, let's say, this will be a homomorphism, because the fractional part is additive for such small fractional parts. But it's not a globally additive function, so this is not a genuine homomorphism. Now, just having examples is not good enough. But it turns out that there's a converse to that example: actually, more or less, these and related examples are the only ones. And that's the key — that's the heart of the proof, actually — a relatively deep fact that these are the only examples. To state a precise theorem: θ_h = Σ_{i ≤ d} α_i {β_i h}, up to an error tolerance — you have to allow one, because the assumption has an error tolerance — where d here is bounded, and this holds just for a large fraction of h. This fact, well, morally, is in the work of Tim Gowers on this topic. And it uses tools from additive combinatorics which I don't want to go into in these lectures: the proof uses what's now called the Balog–Szemerédi–Gowers theorem, and Freiman's theorem, and some facts from the geometry of numbers — well, I guess those are inside the proof of Freiman's theorem. So it's quite a nontrivial fact. I have a sort of sketch proof of a more direct deduction of this statement from the previous one, which I haven't fully worked out the details of yet; it seems as if this is such a natural statement that one shouldn't have to go through all of that machinery to get there. But anyway, this is quite deep. And well, something that may strike you, if you've remembered the first couple of lectures, is that this is not the first time we've seen these bracket objects. The brackets should remind us of something we've seen before: the Heisenberg group. And indeed, if χ_h(n) varies in this bracket way, then it is, in fact, the derivative of something coming from the Heisenberg group — if you also establish a certain symmetry property.
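Going back to the example θ_h = α{βh}: here is a numerical sketch of why the bracket map is additive only a positive proportion of the time. On a quadruple with h_1 + h_2 = h_3 + h_4, the quantity θ_{h_1} + θ_{h_2} − θ_{h_3} − θ_{h_4} is always α times one of −1, 0, +1, the ±1 coming from a carry in the fractional parts. The specific α, β are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = np.sqrt(2), np.sqrt(3)
frac = lambda t: t - np.floor(t)
theta_h = lambda h: alpha * frac(beta * h)     # the bracket example

additive = 0
trials = 20000
for _ in range(trials):
    h1, h2, h3 = rng.integers(1, 10 ** 4, size=3)
    h4 = h1 + h2 - h3                          # force h1 + h2 = h3 + h4
    d = theta_h(h1) + theta_h(h2) - theta_h(h3) - theta_h(h4)
    # beta*h1 + beta*h2 = beta*h3 + beta*h4 exactly, so the fractional
    # parts on each side differ by an integer carry: d is alpha*(0 or +-1),
    # and d = 0 exactly when the two carries agree
    additive += abs(d) < 1e-6
print(additive / trials)    # a positive fraction, strictly less than 1
```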
So it turns out we can construct nilsequences — coming from, in fact, you need products of, Heisenberg nilsequences — let me call the result χ, whose derivatives recover χ_h(n) = e(θ_h n) with θ_h this bracket expression. But as I say, you can't always do that directly: you need to establish a sort of symmetry statement as well. So we need a certain symmetry of χ_h(x) in h and x, and it's actually not too easy even to state what that should mean — even the statement of it is tricky. But that's a sketch of how the proof of the inverse theorem for the U^3 norm would go. And while it's complicated, the other thing I want to emphasize about it — and this is one reason I'm giving these lectures and writing the associated notes — is that this proof is just the wrong proof, in that the Heisenberg group, which is this very natural nilpotent group associated with the problem, is constructed in a totally ad hoc manner: first of all you find some bracket expressions like these — and as I said earlier in these lectures, somehow one feels one should never really be working with bracket expressions if you can help it; it's the nilpotent objects that are more natural — and then you just construct by inspection the nilsequence on the Heisenberg group which gave rise to those brackets. So it doesn't seem natural, and well, I would like somebody to find a more natural argument. There's a different approach to all of this by Szegedy, which I'll come back to. Question: so it's the 2-by-2 Heisenberg group here? This one is the 3-by-3. Yes — so you can make each one of these bracket terms from a copy of the 3-by-3 Heisenberg group. You can prove, actually — and in a sense that's no surprise — just by doing some simple computations with class 2 Lie algebras, that you can make every example from Heisenberg examples, up to some small errors. So the Heisenbergs: you know in advance that they're going to be enough. Question: don't you need the abelianization to have large dimension, bigger than 2? Well, the abelianization of a product of d Heisenbergs has dimension 2d. Question: so you're happy taking products? You need to take products sometimes, yes. You could also realize this as coming from a larger-dimensional class 2 group that is not a product of Heisenbergs; there's a certain amount of flexibility — it's not a unique representation in each case. Question: so the dimension here is related to the d, the number of Heisenbergs? Yes. So this d — we have to tolerate this d. Roughly what happens is this: here's a one-dimensional bracket homomorphism, and it will be a homomorphism maybe half the time, whenever the fractional part is between minus a quarter and a quarter; and if you have d brackets, it should be a homomorphism a fraction 2^{−d} of the time. So you certainly need to let d grow like log(1/δ); you have to let it grow. This is a big difference between the class 2 and the class 1 theory: in the class 1 theory, you could always just use characters on the circle R/Z, and here you need to let the dimension go much higher. So I was going to say that there is another approach to these theorems, by Szegedy and by Camarena–Szegedy, which is quite different. And in a sense it's more natural, because you see a nilpotent group arise in a more conceptual way; but it's somehow equally difficult in the details. I think both arguments are not the right argument for the proof of the inverse theorem.
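To connect the brackets to the Heisenberg group concretely, here is a sketch computation of my own: reducing the orbit of a linear sequence in the 3-by-3 Heisenberg group to a fundamental domain for the integer lattice Γ — one common convention for that domain, chosen by me — produces exactly a bracket expression in the vertical coordinate.

```python
import numpy as np

frac = lambda t: t - np.floor(t)
alpha, beta = np.sqrt(2), np.sqrt(3)

def reduce_mod_gamma(a, b, c):
    # Bring the Heisenberg element [[1,a,c],[0,1,b],[0,0,1]], taken modulo
    # the integer subgroup Gamma acting on the right, into the fundamental
    # domain [0,1)^3 (one common convention)
    l = -np.floor(b)
    b, c = b + l, c + a * l
    return frac(a), b, frac(c)

for n in range(1, 8):
    # the linear sequence g(n) with horizontal coordinates (alpha n, beta n)
    a, b, c = reduce_mod_gamma(alpha * n, beta * n, 0.0)
    # the vertical coordinate is a bracket object:
    # {-alpha n floor(beta n)} = {alpha n {beta n} - alpha beta n^2}
    assert np.isclose(c, frac(alpha * n * frac(beta * n) - alpha * beta * n ** 2))
```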
Question: in the degree 1 case there is a Fourier expansion, a decomposition into a natural sum — is there a way that something similar might take place here, so that you can also decompose? Well, this is a kind of dream of mine, that one could find an appropriate Fourier analysis of these class 2 objects. But it would have to be very exotic, because your G/Γ is not given to you to begin with, so you can't just decompose L^2 of it. And in fact, you know that you are going to need to consider many of those G/Γ's — you have to allow arbitrarily large dimension. So I have no real ideas of what such a theory could look like, but I do believe there's something to be said that's different to this inductive approach. Now, this was just the U^3 norm. One respect in which the analysis here was very straightforward is that I got to just sum a geometric progression in that part of the argument. For the U^4 norm, I won't get to do that: I'll have an average over objects which are themselves of class 2, and to evaluate those, you need to use the theory that I mentioned last time about equidistribution of nilsequences. And it's much more difficult; and then it gets still harder for the U^5 norm and higher. So, Tim Gowers did prove a statement for all of the U^k norms, and I want to state what he proved: Gowers' local inverse theorem. This dates from about 1998. And this is also very difficult, but it is, I think, easier, at least in the U^3 case, and it also gives better bounds than the full inverse theorem. The local inverse theorem assumes the same thing: suppose that ||f||_{U^k} ≥ δ. Then, well, he doesn't get f correlating with an object on all of the interval {1, ..., N}, but rather some local correlations; and in fact, if you're formulating things that way, the functions that you correlate with locally can just be constant functions. So: we can partition {1, ..., N} into progressions of roughly equal lengths — I mean, all of their lengths within a factor of two of each other or something — and the lengths are all at least N to some power, a small power depending on δ, N^{δ^{C_k}} say, with the constant depending on k. (The dependence here is also quite reasonable, but I won't mention that. And this exponent is definitely not the same d as before, by the way.) So the progressions are much smaller than the original interval you started with, but they're still of size tending to infinity with N. And the conclusion is that the average of f over many of those progressions is bounded away from 0. So it's much less information: just very local correlations. Now, it seems as if this is a stronger theorem, in that you get local correlations with just a constant function. But actually, this can be relatively easily deduced — it's a relatively easy consequence of the global inverse theorem, which I've just been calling the inverse theorem. And the reason for that is that any nilsequence is locally constant: any nilsequence Φ(g(n)Γ) can be shown to be approximately constant on progressions of that length. Although I should say, I'm not claiming that Gowers' result is a trivial consequence of the result that has been established by Tao, Ziegler and myself. Rather, our result requires all of the techniques, essentially, that Gowers used, and then some more; and it also gives weaker bounds — so this deduction is only of interest if you don't care about bounds.
So, well, for anybody who's done a little bit of analytic number theory, I'll leave it as an exercise for you to understand why, say, e(θn^2) is locally constant on progressions. Seeing as I'm setting it as an exercise, let me be precise about what I mean by that: i.e., {1, ..., N} can be partitioned as a union of progressions P, each of length at least N^c for some small constant c, such that on each of them the variation — the diameter over P of e(θn^2) — is less than N^{−c'}. That's the kind of statement that's true more generally for nilsequences; but in this case, you could establish it by standard techniques involving Weyl's inequality and so on. It's not easy, but it's classical. OK, so are there any questions on what I've said so far? Question: Gowers has two papers? Yes, Gowers has two papers. There's the paper which I feel that everybody should read, a masterful paper on four-term progressions, where he proves, basically, this theorem for the U^3 norm. Actually, he doesn't state it quite like this, although it is in the paper — he has some additional quadratic phase here; but if you apply this local-constancy observation, you just get the constant correlation on some smaller progression. And then he has a much longer paper. That first paper is only 23 pages, a truly remarkable piece of work; the longer paper, of 129 pages, does the general case, and there he runs into difficulties that are somewhat parallel to the difficulties that we have with the U^4 norm and higher. His motivation — I'm going to tell you that now — was Szemerédi's theorem. So this is following Gowers. I stated Szemerédi's theorem in the first lecture; let me just remind you. Szemerédi's theorem states that if A is a subset of {1, ..., N} of size αN, then A contains a k-term arithmetic progression, provided N is large enough, N ≥ N_0(k, α). Now, Gowers drew inspiration from an argument of Klaus Roth in the case k = 3, and this is called the density increment strategy. The idea of the density increment strategy is to prove a dichotomy. So you establish a dichotomy, of the following form — one of two things happens. Either A contains many k-term progressions, or you have a density increment: you can pass to a subprogression on which the density of A is increased. That is, there is a subprogression P of {1, ..., N}, of size at least some small power N^{c(α,k)}, such that the density of A on P exceeds α by a substantial amount — it's at least α + α^C, where the constant C depends on k. So if you establish such a dichotomy, then you can do an iterative argument. You start with A; if it contains many progressions, then you're finished. Otherwise, you pass to a subprogression, and then you apply the same argument to that: either that contains many progressions, or there's a further subprogression, and then you repeat. But you can only repeat a bounded number of times in terms of α before you end up with something ridiculous, because the density is always at most 1. So an iterative application of this — I won't write it all out — gives the conclusion; and if you figure out what the dependence is, it works so long as N is bigger than a double exponential of α^{−C}.
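Returning to the exercise about e(θn^2) from the start of this discussion: here is a numerical illustration — mine, and only of the easy, major-arc case, where θ is very close to a rational p/q with small q. There, progressions of common difference q work, because the p/q part of θ contributes only integers to the phase increments; general θ needs the Weyl-type arguments mentioned above. The parameters are illustrative.

```python
import numpy as np

# theta = p/q + eps with q small and eps tiny: on a progression of common
# difference q, the phase theta*n^2 moves slowly, since the p/q part of
# theta contributes only integers to the increments theta*((n0+jq)^2-n0^2)
p, q, eps = 1, 3, 1e-9
theta = p / q + eps
N, L = 10 ** 4, 100
e = lambda n: np.exp(2j * np.pi * theta * n ** 2)

worst = 0.0
for start in range(0, N - q * L, q * L):
    for r in range(q):
        cell = start + r + q * np.arange(L)   # one progression of the partition
        vals = e(cell)
        worst = max(worst, np.abs(vals[:, None] - vals[None, :]).max())
print(worst)    # small: e(theta n^2) is nearly constant on every cell
```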
You may ask where, in that iterative argument, a dependence on N could come in at all. And the answer is that there's actually a third, sort of slightly trivial option — you can't really have a third element of a dichotomy, so let's call it a trichotomy — which is that N was too small: N is tiny. So that's how the loop might fail. So that's the strategy. And to establish this dichotomy, what you show is that if the first option fails — if A does not contain many k-term progressions — then essentially the characteristic function of A has a large Gowers norm. I'm dealing with k-term progressions, so I should care about the U^{k−1} norm, and large means at least α^C; I will mention why that's true in just a second. And then that implies that you have a density increment, by the local inverse theorem. So that's how the deduction goes. This bit is actually quite easy — you can almost see it: the statement that comes out of the local inverse theorem, namely that the average of the balanced function 1_A − α over some progression is large, is precisely the same thing as saying that the density of A on that progression is large. (There's a trick to get rid of the absolute value signs, which you need to do.) Any questions on that? So how does the first step that I mentioned there go? It's a good job I'm only giving six lectures — I've more or less destroyed all of IHÉS's chalk. That step goes via what I call the generalized von Neumann theorem. So recall the generalized von Neumann theorem: if I write T_k(f_1, ..., f_k) for what I get by counting k-term progressions weighted by these functions, then, as I mentioned, this is bounded in absolute value by the Gowers U^{k−1} norm of any chosen one of the functions f_i, provided the others are bounded by 1. As I mentioned before, you can't use this immediately to count the number of progressions in a set. But if you're just slightly clever about it: we'll apply this after splitting the characteristic function of A as 1_A = α + f, with f, of course, being what we call the balanced function of A. Then, just by multilinearity, the number of progressions, T_k(1_A, ..., 1_A), is going to be α^k plus a bunch of terms, the final one of which has all f's in it, but all of which have at least one f inside them; and each of those terms is bounded by the Gowers norm of f. So

T_k(1_A, ..., 1_A) = α^k + O(||f||_{U^{k−1}}).

And at this point, the dichotomy is clear, because the left-hand side is counting the number of progressions in A. If it's small, then this main term α^k — which is not small, not 0 — would have to cancel with the error term. So the only way you can have no progressions in A is if the Gowers norm of the balanced function is large, cancelling the main term. Any questions on that? So this easily implies the dichotomy. And that's all I'm going to say about the Gowers proof of Szemerédi's theorem.
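Here is a numerical sketch of this counting step for k = 3 in Z/NZ, where both the progression count T and the U^2 norm can be computed exactly by Fourier analysis. The identities used are the standard orthogonality ones; the random set is just for illustration.

```python
import numpy as np

N = 1009
rng = np.random.default_rng(2)
A = (rng.random(N) < 0.3).astype(float)    # a random set in Z/NZ

def t3(f, g, h):
    # T(f,g,h) = E_{n,d} f(n) g(n+d) h(n+2d), via the Fourier identity
    # T = sum_r fhat(r) ghat(-2r) hhat(r)
    fh, gh, hh = (np.fft.fft(v) / N for v in (f, g, h))
    r = np.arange(N)
    return (fh[r] * gh[(-2 * r) % N] * hh[r]).sum().real

alpha = A.mean()
f = A - alpha                              # balanced function of A
U2 = np.sum(np.abs(np.fft.fft(f) / N) ** 4) ** 0.25
# T(1_A,1_A,1_A) = alpha^3 + O(||f||_{U^2}); for a random set the balanced
# function is very uniform, so the count is close to alpha^3
print(t3(A, A, A), alpha ** 3, U2)
```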
So what I want to do to finish today is talk about a more exotic variant of Szemerédi's theorem, one that I mentioned in the first lecture, and which really uses much more comprehensively all the theory of nilpotent groups. Let me state the theorem again. This is a theorem of myself and Tao from 2010. It's an analogue of a theorem that had previously been established in ergodic theory by Bergelson, Host and Kra — the conclusion there is quantitatively weaker, but it's analogous. It's a sort of strengthening of Szemerédi's theorem for three- and four-term progressions. So suppose that A is a subset of {1, ..., N} of size αN. Then there is a common difference d ≠ 0 such that A contains at least (α^k − o(1))N k-term progressions of common difference d, the o(1) being as N tends to infinity. And this holds only if k is 3 or 4 — well, it's also true if k is 1 or 2, but I think I can leave that as an exercise — and it's false if k is 5 or higher: an example of Ruzsa shows that. So it's a strange kind of theorem, and it really uses, as I say, some structure of nilpotent groups. I can only really sketch how this goes, just to give you some idea. The first key ingredient is something that I've not had time to tell you anything about, something we have given a slightly strange name: it's called the arithmetic regularity lemma. I'm not going to state it precisely, because that's not particularly straightforward to do; but what it states is that an arbitrary bounded function f from {1, ..., N} to the complex numbers can be decomposed — we tend to write this as

f = f_nil + f_sml + f_unf

— where the basic idea is that f_nil is a nilsequence of class s. (I tell you what, I should be more disciplined about whether I use k or s. It's just a dummy variable, but k should be the length of the progression, in which case I'm interested in the Gowers U^{k−1} norm; so fix s ≥ 1 and let's say f_nil is a nilsequence of class s.) So I can decompose an arbitrary function into a nilsequence of class s, plus some things that really don't concern me too much in a typical application, although they can be a bit annoying: f_sml is small in ℓ^2 of {1, ..., N}, which for practical purposes means you can always ignore it, and f_unf is really small in the Gowers U^{s+1} norm. Now, that certainly doesn't mean f_unf is small in L^∞: a random ±1 function could be this piece — in fact, if f is a random ±1 function, this decomposition would look like 0 + 0 + f. But this piece can be ignored if you're doing things like counting progressions: if I'm counting k-term progressions, I want the Gowers U^{k−1} norm, so if s = k − 2, this means I can ignore f_unf when counting k-term progressions. So one way to think of this — there is, of course, a precise statement — is that if all I care about is counting arithmetic progressions, then I can assume that every function is a nilsequence of class k − 2. Now, the proof of the regularity lemma involves an iterated application of the inverse theorem inside what's called an energy increment argument: you just keep applying the inverse theorem, and eventually you spit out a decomposition like this, because something called the energy has gone up every time you do that, and, much like the density increment argument, that can only happen a bounded number of times. Another way to think of this is that in the class 1 world of Fourier analysis, this is a bit like decomposing a function into its big Fourier coefficients plus the rest of the Fourier coefficients.
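Here is that class 1 analogue as a small sketch: thresholding the Fourier coefficients at an illustrative level τ gives a structured part plus a remainder whose U^2 norm is at most τ^{1/2}, and for a random ±1 function the structured part is essentially empty, exactly as just described. Names and parameters are my own.

```python
import numpy as np

N, tau = 512, 0.1
rng = np.random.default_rng(3)
f = rng.choice([-1.0, 1.0], size=N)        # a random +-1 function

fhat = np.fft.fft(f) / N
big = np.abs(fhat) >= tau                  # the few "large" frequencies
f_str = np.fft.ifft(np.where(big, fhat, 0.0) * N).real   # structured part
f_unf = f - f_str                          # the rest
U2 = np.sum(np.abs(np.fft.fft(f_unf) / N) ** 4) ** 0.25
# ||f_unf||_{U^2}^4 <= tau^2 * E|f|^2, so the remainder is uniform; for
# random f there are no large frequencies, so f_str is essentially 0
print(int(big.sum()), U2, bool(U2 <= np.sqrt(tau)))
```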
So suppose I've applied that regularity lemma to my given set A — to 1_A, rather. And actually, I'm going to simplify quite a bit more than that: let's just assume for the discussion that

1_A(n) = 1_S(p(n)Γ),

the characteristic function of some nice set S — just an open ball, say, not a very complicated open set — in the Heisenberg nilmanifold, evaluated along a polynomial sequence p(n). Maybe I should say: I'm going to prove the theorem that I stated in the case k = 4, and in the case k = 4, I apply the regularity lemma with s = 2, because then the error term f_unf will be small in the Gowers U^3 norm, and it's terms that are small in the U^3 norm that can be ignored when counting four-term progressions. So this is an oversimplification of the truth: whilst I will have a class 2 nilpotent group, it may not be the Heisenberg group, and nobody told me that the function that I take on the Heisenberg nilmanifold is just the characteristic function of a set — in fact, that's not even a bona fide nilsequence, because I defined those using smooth functions on G/Γ. But for the discussion, let me consider that. And let's suppose that p(n) is just a sequence with horizontal frequencies α and β — I don't really mind what the top term is, just something — and suppose that 1, α and β are highly independent over Q, which means, by what I said last time, that p(n) is very equidistributed in G/Γ. (Highly — I mean, it's not completely equidistributed, because it's a finite orbit, but it's pretty equidistributed.) Then, as I stated last time — well, there are two steps. The number of 4-term progressions in A (a weighted version of that quantity, but still) is, by my assumption, equal to the number of 4-term progressions involving S, that is, of n and d for which p(n), p(n+d), p(n+2d), p(n+3d) all land in S. And I said last time that the tuple (p(n), p(n+d), p(n+2d), p(n+3d)) does not equidistribute over (G/Γ)^4: it equidistributes in a subgroup of G^4, modulo Γ^4, called the Hall–Petresco group, HP^4(G). I didn't give the general definition of what that group is — I gave one example in the abelian case — but in this case it is going to be the set of (x_0, x_1, x_2, x_3) in G^4, so in Heisenberg^4, which satisfy the following. The abelian parts π(x_0), π(x_1), π(x_2), π(x_3) are in 4-term arithmetic progression; here π is the natural projection to the abelianization, a two-dimensional R^2. And the vertical components z(x_0), z(x_1), z(x_2), z(x_3) — let me be precise about this: if g is the upper triangular matrix with rows (1, u, w), (0, 1, v), (0, 0, 1), then I'm writing π(g) = (u, v) and z(g) = w — the central parts are not in arithmetic progression, but they do satisfy a constraint, namely

z(x_0) − 3z(x_1) + 3z(x_2) − z(x_3) = 0.

The example I gave of a Hall–Petresco group last time, which was for the reals R with a funny filtration, only saw this last relation; but now, in the Heisenberg group, there's both a kind of abelian bit and a central bit — an abelian projection and a central subgroup.
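Here is a sketch verifying the easy half of this claim: the orbit of a 4-term progression under a polynomial sequence in the Heisenberg group really does land in the Hall–Petresco set — abelian parts in genuine arithmetic progression, vertical parts satisfying the alternating constraint. That the orbit equidistributes there is the deeper fact from last time; the α, β, n, d below are illustrative.

```python
import numpy as np

def heis(a, b, c):
    # [[1, a, c], [0, 1, b], [0, 0, 1]]: pi(g) = (a, b), z(g) = c
    return np.array([[1.0, a, c], [0.0, 1.0, b], [0.0, 0.0, 1.0]])

alpha, beta = np.sqrt(2), np.sqrt(3)
g = lambda n: np.linalg.matrix_power(heis(alpha, beta, 0.0), n)  # p(n) = u^n

n, d = 5, 7
pts = [g(n + j * d) for j in range(4)]          # a 4-term progression upstairs
u = np.array([P[0, 1] for P in pts])            # abelian parts...
v = np.array([P[1, 2] for P in pts])
assert np.allclose(np.diff(u, 2), 0) and np.allclose(np.diff(v, 2), 0)  # ...a 4-AP
z = np.array([P[0, 2] for P in pts])            # vertical parts: not an AP,
assert np.isclose(z[0] - 3 * z[1] + 3 * z[2] - z[3], 0)  # but HP-constrained
```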
Now I think I'm going to draw — well, what did I have for R? You can kind of depict the situation like this; it's grossly inaccurate in some ways. Here's the abelian part of the Heisenberg nilmanifold G/Γ: this is supposed to be G/Γ modulo the commutator subgroup, which is isomorphic to (R/Z)^2. And G/Γ itself sits above there — I certainly do not have the ability to draw this accurately; it's a three-dimensional object, circle-fibered above, so these are supposed to be, somehow, circle fibers above here. Now, the image of a 4-term progression under the polynomial map will look like a 4-term progression down here, and in the vertical fibers it will satisfy this additional relation — it won't look like a 4-term progression in the fibers (that doesn't even quite make sense), but it will satisfy that constraint. Now, what I haven't done yet is pick my common difference: the theorem that I'm trying to prove is about a specific d. So I'm going to choose my d — and in fact, to make the whole argument work, I need to choose many d with this property, and average over all such d — which don't move too much in the horizontal direction, so that αd and βd are pretty much zero mod 1. (Remember, α and β are what occur in the polynomial sequence p.) So if I take a 4-term progression n, n + d, n + 2d, n + 3d with such a d as common difference, then it doesn't do anything horizontally, down in this torus: because αd and βd are very, very small, the four points down here will be very, very close together, and their images will be four points up here that are essentially in the same circle fiber. So p(n), p(n+d), p(n+2d), p(n+3d) are essentially constant horizontally: they're all essentially the same under π. So if I take a point downstairs here — let me call it t — there'll be a circle above it. And remember, I'm working with a set S in the Heisenberg nilmanifold, so S will intersect that fiber in a slice: S_t = S ∩ (the circle fiber above t). So the number of 4-term progressions in A for which the common difference is essentially constant downstairs is essentially the number of solutions to this equation in the vertical fibers. Let me try to write that down: the number of 4-term progressions, averaged only over those special d, is roughly

E_t ∫_{z_0 − 3z_1 + 3z_2 − z_3 = 0} 1_{S_t}(z_0) 1_{S_t}(z_1) 1_{S_t}(z_2) 1_{S_t}(z_3).

That's just another way of saying — a sort of formalization of — what I've been saying: if you have a progression that's essentially constant in the base, that's the same thing as a solution to a certain linear equation in the vertical fibers; and I've just said, take all the progressions which are roughly constant in the base. So the point is that this has a special positivity property. Pairing z_0 with z_2 and z_1 with z_3, you can actually write this as the average over t of a certain convolution square,

E_t ∫ c_t(x)^2 dx, where c_t(x) = ∫∫_{u + 3v = x} 1_{S_t}(u) 1_{S_t}(v)

is a convolution of 1_{S_t} with a dilated copy of itself. And by Cauchy–Schwarz, that's at least E_t (∫ c_t(x) dx)^2, which is precisely E_t μ(S_t)^4: the average over t of the fourth power of the fibre measure of S_t. And by Hölder's inequality, that's at least μ(S)^4: these S_t are just slices of the set S, and so if I take their fourth powers and average them, I get at least the fourth power of the average measure. So that's Hölder.
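Here is a discrete model of that positivity, written by me with the circle replaced by Z/NZ so everything is a finite count; the convolution-square identity and the Cauchy–Schwarz step are exactly as on the board, and the random set is illustrative.

```python
import numpy as np

# In Z/NZ, the number of solutions to z0 - 3 z1 + 3 z2 - z3 = 0 with all
# zi in S is sum_x c(x)^2, where c(x) = #{(u, v) in S^2 : u + 3v = x};
# Cauchy-Schwarz gives at least |S|^4 / N solutions, i.e. a solution
# density of at least delta^4
N = 401
rng = np.random.default_rng(4)
S = np.flatnonzero(rng.random(N) < 0.2)

c = np.zeros(N)
for u in S:
    np.add.at(c, (u + 3 * S) % N, 1)
count = np.sum(c ** 2)
delta = len(S) / N
print(count / N ** 3 >= delta ** 4)    # True
```

For the degree five pattern z_0 − 4z_1 + 6z_2 − 4z_3 + z_4 = 0, the analogous pairing into a single square is unavailable, which is the point of the next remark.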
But A is just the set of n for which p(n) lies in S, and p(n) is equidistributed, so μ(S) is essentially the density α — and that's at least α^4, as claimed. It's not an easy argument, but that's an explanation of why the theorem that I wrote there is true. And the key thing that I want to emphasize is that there is a certain positivity property associated to this equation: if you have a set of measure δ, the measure of solutions to z_0 − 3z_1 + 3z_2 − z_3 = 0 within it is at least δ^4. And if you try to do this for k = 5, everything looks similar, except downstairs you would now have a two-step nilmanifold, and the equation in the vertical fibers is no longer this one, but the next row of Pascal's triangle:

z_0 − 4z_1 + 6z_2 − 4z_3 + z_4 = 0.

And that equation does not have the same positivity property — you certainly can't use this trick of writing it as a convolution square. And indeed, there are examples of subsets of the circle of measure δ for which the measure of solutions to this equation is far smaller than δ^5. And it's really that which underlies Ruzsa's construction: he first constructs a set with too few solutions to that equation, and then defines his A as the set of return times of a cubic to that set. OK, so that will be it for today; tomorrow I'm going to talk about how to apply these results to the prime numbers. Question: how badly does it fail for k ≥ 5 — is the count still some power of α? It actually fails very dramatically, I think. I don't think it's even α^C for any C. It's a little bit like the Behrend construction for three-term progressions, essentially: constructing sets with too few solutions to this equation is quite a similar task to that. It's very related to the phenomenon that you can have sets of density α with fewer than α^100 · N^2 three-term progressions. Question: can you say some more about this decomposition — is it an ingredient for what you will use tomorrow? I will not use this tomorrow. This theorem — I don't know if there are any people here who know about graph theory and that sort of combinatorics — is quite analogous to the Szemerédi regularity lemma, which can be thought of as a way of decomposing an arbitrary graph into structured pieces plus a pseudorandom part. Actually, more accurately, this is analogous to the hypergraph version of that statement. It comes with very, very bad bounds — that's the problem. And the reason is that for it to be useful, you need to have a growth function in the statement: the statement would be, choose a growth function, which would typically be the exponential function or something faster, and then you can make the uniform piece smaller than 1 over that growth function evaluated at the complexity of the structured piece. If you don't do that, then you don't get to ignore that term when you're counting progressions. So that's why it's kind of complicated even to state it properly. The traditional regularity lemma for graphs is about what are called epsilon-regular pairs; but there's something called the strong regularity lemma, which this is equivalent to in spirit, and which is best seen as a statement more like this one. You can actually find, in various notes of Terence Tao, the regularity lemma for graphs stated precisely in this form, with a structured piece, a small piece, and a uniform piece. Question: so you're saying this is proved using the regularity lemma? It's not — it's proved by a similar argument.
And both arguments can be seen as an effectivization of projection in Hilbert space. So what you really want to do is decompose: you've got L^2({1, ..., N}), and sitting inside there you have something like L^2(G/Γ) — or rather, a quantitative version of that — and you want to project onto it. Well, you don't literally have L^2(G/Γ); you have the space of nilsequences, and if you're working with nilsequences of fixed complexity, that's not an algebra. But the decomposition is somehow a proxy for projection onto that factor. And actually, the proof is closely analogous to the proof of the projection lemma in Hilbert space, where you just keep decreasing the distance to the subspace. That's the same proof, essentially.