Okay. So today we begin by completing our study of the space of invariant and ergodic measures. Let me recall the setting: $M$ is a compact metric space with its Borel $\sigma$-algebra, and $f\colon M \to M$ is continuous. We have the space $\mathcal{M}$ of Borel probability measures on $M$ with the weak* topology, which is compact. We let $\mathcal{M}_f \subseteq \mathcal{M}$ be the set of $f$-invariant probability measures, which we proved in the last lecture is non-empty, convex and compact when $f$ is continuous. And we have the set $\mathcal{E}_f$ of ergodic measures: the $\mu \in \mathcal{M}$ that are $f$-invariant and ergodic. Today we will prove that $\mu \in \mathcal{M}_f$ is ergodic if and only if $\mu$ is an extremal point of $\mathcal{M}_f$. Since a non-empty compact convex set like $\mathcal{M}_f$ always has extremal points (by the Krein-Milman theorem), this also shows that $\mathcal{E}_f$ is non-empty. This is curious: if you remember, the definition of ergodicity is purely dynamical, there is nothing functional analytic about it. But it turns out that this dynamical definition has a corresponding functional analytic meaning in the space of invariant measures. Recall that an extremal point of a convex set is a point that cannot be written as a convex combination $t\mu_1 + (1-t)\mu_2$, with $0 < t < 1$, of two different points of the set. Both directions are non-trivial. Suppose first that $\mu$ is not ergodic. We will show that it is not an extremal point, in other words that we can write it as a convex combination of two other invariant measures. What is the definition of not ergodic? It means that there exists a measurable set $A \subseteq M$ such that $f^{-1}(A) = A$ and $f^{-1}(A^c) = A^c$. The second condition is actually always true.
If a set satisfies $f^{-1}(A) = A$, a little exercise shows that its complement satisfies the same identity. And the measure of $A$ is strictly between 0 and 1. So we are going to use this set. We have our space $M$ and two sets, $A$ and $A^c$, which are fully invariant under the dynamics: all preimages of $A$ stay in $A$, and in particular all images of points of $A$ stay in $A$ (you can check this too). This is a stronger notion than being forward invariant: $A$ stays in $A$ in forward and backward time, $A^c$ stays in $A^c$ in forward and backward time, and both have positive measure. Now we define two probability measures $\mu_1$ and $\mu_2$ by $\mu_1(B) = \mu(B \cap A)/\mu(A)$ and $\mu_2(B) = \mu(B \cap A^c)/\mu(A^c)$. Is the definition of these measures clear? $\mu_1$ is the measure $\mu$ concentrated on $A$ and normalized, so $\mu_1(A) = 1$; similarly $\mu_2(A^c) = 1$. So $\mu_1$ lives completely on $A$ and $\mu_2$ lives completely on the complement. It is easy to see that $\mu_1$ and $\mu_2$ are probability measures. Notice that we can write $\mu$ as a convex combination: $\mu = \mu(A)\,\mu_1 + \mu(A^c)\,\mu_2$, which is the same as $\mu = t\mu_1 + (1-t)\mu_2$ with $t = \mu(A)$ strictly between 0 and 1. What does this mean? It means that for every measurable set $B$, $\mu(A)\,\mu(B \cap A)/\mu(A) + \mu(A^c)\,\mu(B \cap A^c)/\mu(A^c) = \mu(B \cap A) + \mu(B \cap A^c)$, because the factors $\mu(A)$ and $\mu(A^c)$ cancel, and this is clearly equal to $\mu(B)$. A simple calculation.
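To see the construction in the smallest possible setting, here is a minimal sketch on a hypothetical five-point system (not from the lecture): $f$ is the permutation $0 \to 1 \to 2 \to 0$, $3 \to 4 \to 3$, every subset is measurable, and a measure is just a vector of point masses, stored as exact fractions. The set $A = \{0, 1, 2\}$ plays the role of the invariant set, and the code checks $\mu = t\mu_1 + (1-t)\mu_2$ with $t = \mu(A)$.

```python
from fractions import Fraction

# Hypothetical toy system on X = {0, 1, 2, 3, 4}: f permutes 0 -> 1 -> 2 -> 0 and 3 -> 4 -> 3.
f = [1, 2, 0, 4, 3]
mu = [Fraction(1, 5)] * 5           # uniform measure; invariant because f is a permutation
A = {0, 1, 2}
Ac = set(range(len(f))) - A

def measure(m, S):
    return sum((m[x] for x in S), Fraction(0))

def preimage(S):
    return {x for x in range(len(f)) if f[x] in S}

def conditional(m, S):
    """B -> m(B ∩ S) / m(S), stored as a vector of point masses."""
    mS = measure(m, S)
    return [m[x] / mS if x in S else Fraction(0) for x in range(len(m))]

# A is fully invariant and 0 < mu(A) < 1, so mu is not ergodic.
assert preimage(A) == A and preimage(Ac) == Ac
t = measure(mu, A)                  # t = mu(A) = 3/5

mu1 = conditional(mu, A)            # lives on A
mu2 = conditional(mu, Ac)           # lives on the complement of A
recombined = [t * p + (1 - t) * q for p, q in zip(mu1, mu2)]
print(t, recombined == mu)          # 3/5 True
```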
So I've shown that $\mu$ is a convex combination of two probability measures. Am I done? Is there anything else I need to check? I've assumed that $\mu$ is not ergodic and I want to show that $\mu$ is not an extremal point of $\mathcal{M}_f$. In other words, I'm proving that extremal implies ergodic, and to do that I assume it's not ergodic and show it's not extremal. So I assume it's not ergodic and I write it as a convex combination of two probability measures. The value of $t$ is not the issue: to show that $\mu$ is not extremal it is enough to exhibit one decomposition with $t$ strictly between 0 and 1, because extremal means that any decomposition into two different points forces $t = 0$ or $t = 1$. So what else am I missing? What else do I need to prove for this part? I need to show that $\mu$ is a convex combination of two probability measures in $\mathcal{M}_f$. I constructed these measures and I showed that $\mu$ is a combination of them. Are $\mu_1$ and $\mu_2$ the right measures? They are probability measures, but are they the special probability measures we need? That is what is missing: I need to show that $\mu_1$ and $\mu_2$ are invariant. It's not enough to write $\mu$ as a convex combination of two measures, because that is easy. Remember that $\mathcal{E}_f \subseteq \mathcal{M}_f \subseteq \mathcal{M}$, where $\mathcal{M}_f$ consists only of the invariant measures. So it could be much easier to write $\mu$ as a convex combination of two measures in $\mathcal{M}$, even if $\mu$ is an extremal point of $\mathcal{M}_f$, right?
So we need to show that $\mu$ is a convex combination of two distinct measures in $\mathcal{M}_f$. They are distinct, since $\mu_1(A) = 1$ while $\mu_2(A) = 0$, so what remains is to show that $\mu_1$ and $\mu_2$ are invariant. That's not difficult, but we need to show it. Let $B \subseteq M$ be a Borel measurable set. What does invariant mean? We want to show that $\mu_1(f^{-1}(B)) = \mu_1(B)$. How do you check something like this? Let's start by using the definition of $\mu_1$: $\mu_1(f^{-1}(B)) = \mu(f^{-1}(B) \cap A)/\mu(A)$. Now we can write this as $\mu(f^{-1}(B) \cap f^{-1}(A))/\mu(A)$, because $f^{-1}(A) = A$ by the definition of the set $A$. We needed to use some property of $A$, otherwise we couldn't do this. And now what? $f^{-1}(B) \cap f^{-1}(A)$ is the same as $f^{-1}(B \cap A)$, so we get $\mu(f^{-1}(B \cap A))/\mu(A)$. And $\mu$ is invariant, exactly: $\mu$ might not be ergodic, but it is invariant by assumption. So this equals $\mu(B \cap A)/\mu(A)$, which is by definition $\mu_1(B)$. Similarly for $\mu_2$. So $\mu_1, \mu_2 \in \mathcal{M}_f$, and $\mu$ is not an extremal point. Equivalently, if $\mu$ is an extremal point, then it is ergodic. So we've proved one direction of the equivalence. Any questions? There are a couple of non-trivial steps here. The first is where we used the definition of the set $A$, which goes into the definition of $\mu_1$. The second is where we used the fact that $\mu$ is invariant, even though it's not ergodic. (Yes, thank you: that expression is the definition of $\mu_1(B)$.) Now we want the other direction: we assume that $\mu$ is ergodic, and we show that we cannot write it as a convex combination of two different invariant measures.
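Before turning to that direction, the role of $f^{-1}(A) = A$ in the first one can be seen in a minimal finite sketch, again on a hypothetical five-point permutation. On a finite set every subset is a disjoint union of points, so invariance reduces to checking $m(f^{-1}\{y\}) = m(\{y\})$ for each point $y$. Conditioning on an invariant set gives invariant measures; conditioning on a set that is not invariant does not.

```python
from fractions import Fraction

f = [1, 2, 0, 4, 3]                 # hypothetical permutation with cycles (0 1 2)(3 4)
mu = [Fraction(1, 5)] * 5

def is_invariant(f, m):
    """Check m(f^{-1}{y}) == m({y}) for every point y.

    On a finite set every B is a disjoint union of points, so this is
    equivalent to m(f^{-1}B) == m(B) for every B.
    """
    pushed = [Fraction(0)] * len(m)
    for x, y in enumerate(f):
        pushed[y] += m[x]           # accumulates (f_* m)({y}) = m(f^{-1}{y})
    return pushed == m

def conditional(m, S):
    """B -> m(B ∩ S) / m(S)."""
    mS = sum((m[x] for x in S), Fraction(0))
    return [m[x] / mS if x in S else Fraction(0) for x in range(len(m))]

mu1 = conditional(mu, {0, 1, 2})    # {0, 1, 2} satisfies f^{-1}A = A
mu2 = conditional(mu, {3, 4})       # so does its complement
bad = conditional(mu, {0, 1})       # f^{-1}{0, 1} = {0, 2}, not {0, 1}

print(is_invariant(f, mu1), is_invariant(f, mu2), is_invariant(f, bad))   # True True False
```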
Suppose, by contradiction, that the ergodic measure $\mu$ is not extremal: there exist $\mu_1, \mu_2 \in \mathcal{M}_f$ such that $\mu = t\mu_1 + (1-t)\mu_2$ for some $t \in (0,1)$. Actually, it's not exactly by contradiction, sorry. We will show that if $\mu$ is of this form, then $\mu_1 = \mu_2 = \mu$. So this is the trivial situation: the only way this can happen is the identity $\mu = t\mu + (1-t)\mu$, and we cannot do it with $\mu_1$ different from $\mu_2$. How do we do this? This is a not completely trivial argument. We will show that $\mu_1 = \mu$. The proof that $\mu_2 = \mu$ is the same, and in any case it is an immediate consequence, since $\mu_2 = (\mu - t\mu_1)/(1-t)$. Notice first that $\mu_1$ is absolutely continuous with respect to $\mu$. Remember what absolutely continuous means? It means that $\mu(B) = 0$ implies $\mu_1(B) = 0$. This follows immediately from the decomposition, which is just a convex combination: if $\mu(B) = 0$, then both $t\mu_1(B)$ and $(1-t)\mu_2(B)$ must be 0, and since $t > 0$ we get $\mu_1(B) = 0$. Remember what we said about absolutely continuous measures: they always have a density. This is a theorem in measure theory called the Radon-Nikodym theorem. So by the Radon-Nikodym theorem, $\mu_1$ has a density with respect to $\mu$; let's write it as $h_1$, sometimes written $d\mu_1/d\mu$. It is a non-negative measurable function such that, for all measurable sets $B$, $\mu_1(B) = \int_B h_1\, d\mu$. This is the Radon-Nikodym theorem, and it holds for every set $B$.
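On a finite set the Radon-Nikodym density is just a ratio of point masses. The minimal sketch below (hypothetical data, reusing the non-ergodic five-point example with $t = 3/5$) checks the absolute continuity bound $\mu \geq t\mu_1$, computes $h_1 = d\mu_1/d\mu$ assuming $\mu$ charges every point, and verifies $\mu_1(B) = \int_B h_1\,d\mu$ for every subset $B$. Note that here $h_1$ is not identically 1, which is consistent with this $\mu$ not being ergodic.

```python
from fractions import Fraction
from itertools import combinations

mu  = [Fraction(1, 5)] * 5                          # hypothetical: uniform on {0,...,4}
mu1 = [Fraction(1, 3)] * 3 + [Fraction(0)] * 2      # mu conditioned on A = {0, 1, 2}
t = Fraction(3, 5)                                  # mu = t*mu1 + (1 - t)*mu2 in this example

# Absolute continuity: mu(B) >= t*mu1(B) pointwise, so mu(B) = 0 forces mu1(B) = 0.
assert all(p >= t * p1 for p, p1 in zip(mu, mu1))

def density(m1, m):
    """dm1/dm on a finite set, assuming m gives positive mass to every point."""
    return [p1 / p for p1, p in zip(m1, m)]

h1 = density(mu1, mu)                               # 5/3 on A and 0 off A: not identically 1

# Radon-Nikodym identity mu1(B) = integral over B of h1 dmu, checked on every subset B.
for k in range(len(mu) + 1):
    for B in combinations(range(len(mu)), k):
        lhs = sum((mu1[x] for x in B), Fraction(0))
        rhs = sum((h1[x] * mu[x] for x in B), Fraction(0))
        assert lhs == rhs
print(h1)
```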
So $\mu_1 = \mu$ if and only if the density $h_1$ is equal to 1 $\mu$-almost everywhere. Because remember, this integral is just the area under the graph: if this is our space $M$ and this is the graph of the density $h_1$, and $h_1 = 1$ almost everywhere, then for any set, the area under the graph over that set is exactly the $\mu$-measure of the set. So this is what we want to show: $h_1 = 1$ $\mu$-almost everywhere, which means that the points where it is not equal to 1 form a set of $\mu$-measure 0, and such a set does not affect the integral. So define $B = \{x : h_1(x) < 1\}$ and $C = \{x : h_1(x) > 1\}$. How are we going to show that $h_1 = 1$ $\mu$-almost everywhere? We will show that $\mu(B) = 0$ and $\mu(C) = 0$, which implies what we want. I will do it for $B$, because the proof for $C$ is exactly the same. So how do we show it for $B$? First of all, notice that by definition $\mu_1(B) = \int_B h_1\, d\mu$; this is just the expression we had before. Now I'm going to split the set $B$ in two and write this as $\int_{B \cap f^{-1}(B)} h_1\, d\mu + \int_{B \setminus f^{-1}(B)} h_1\, d\mu$. It's not immediately obvious where we're going, but check that all I've done is split $B$ into two disjoint parts.
One is the intersection with $f^{-1}(B)$ and the other is the part outside $f^{-1}(B)$. A priori we don't know anything about the structure of $B$ or $f^{-1}(B)$. But these sets are well defined and disjoint by definition, and therefore the integral is the sum of the two integrals. We do the same thing for $\mu_1(f^{-1}(B))$ (yes, $\mu_1$, thank you): $\mu_1(f^{-1}(B)) = \int_{f^{-1}(B)} h_1\, d\mu = \int_{f^{-1}(B) \cap B} h_1\, d\mu + \int_{f^{-1}(B) \setminus B} h_1\, d\mu$. The first term is the same as before, just written slightly differently. What do we know about these two measures? $\mu_1$ is invariant, so they are the same: $\mu_1(B) = \mu_1(f^{-1}(B))$. So the two sums are equal. But the first terms are integrals over exactly the same set, $B \cap f^{-1}(B) = f^{-1}(B) \cap B$, while the second terms are over different sets. Therefore $\int_{B \setminus f^{-1}(B)} h_1\, d\mu = \int_{f^{-1}(B) \setminus B} h_1\, d\mu$. Now notice that these two sets have the same $\mu$-measure. Splitting up the sets in a slightly different way, $\mu(f^{-1}(B) \setminus B) = \mu(f^{-1}(B)) - \mu(f^{-1}(B) \cap B)$. Now I use the invariance of $\mu$, so this is equal to $\mu(B) - \mu(f^{-1}(B) \cap B)$, which is $\mu(B \setminus f^{-1}(B))$.
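The identity $\mu(B \setminus f^{-1}(B)) = \mu(f^{-1}(B) \setminus B)$ uses nothing but the invariance of $\mu$. A minimal sketch checks it exhaustively over all 32 subsets of a hypothetical five-point permutation with a non-uniform invariant measure (constant on each cycle), and shows that it can fail for a non-invariant measure such as the Dirac mass at 0.

```python
from fractions import Fraction
from itertools import combinations

f = [1, 2, 0, 4, 3]                                 # hypothetical permutation (0 1 2)(3 4)
mu = [Fraction(1, 6)] * 3 + [Fraction(1, 4)] * 2    # constant on each cycle, hence f-invariant
delta0 = [Fraction(1)] + [Fraction(0)] * 4          # Dirac mass at 0: NOT f-invariant

def preimage(S):
    return {x for x in range(len(f)) if f[x] in S}

def measure(m, S):
    return sum((m[x] for x in S), Fraction(0))

def same_measure(m, B):
    """Compare m(B minus f^{-1}B) with m(f^{-1}B minus B)."""
    FB = preimage(B)
    return measure(m, B - FB) == measure(m, FB - B)

subsets = [set(c) for k in range(len(f) + 1) for c in combinations(range(len(f)), k)]
print(len(subsets), all(same_measure(mu, B) for B in subsets))   # 32 True
print(same_measure(delta0, {0}))                                   # False
```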
So these two sets have the same measure. We are integrating the same function over two sets that in principle are different, but have the same measure. What do I know about the value of $h_1$ on $B \setminus f^{-1}(B)$? It's less than 1, because this set is inside $B$. And on $f^{-1}(B) \setminus B$? It's at least 1, because this set is outside $B$. So on the first set I'm integrating a function that is everywhere less than 1, and the integral will be less than the measure of that set (strictly less, if the set has positive measure). On the second set the integral will be at least the measure of the set. But the measures of the two sets are the same. Let me write this down. By the definition of $B$, if $\mu(B \setminus f^{-1}(B)) > 0$, then $\mu(f^{-1}(B) \setminus B) \leq \int_{f^{-1}(B) \setminus B} h_1\, d\mu = \int_{B \setminus f^{-1}(B)} h_1\, d\mu < \mu(B \setminus f^{-1}(B)) = \mu(f^{-1}(B) \setminus B)$. This is a contradiction. What does it contradict? Exactly: it would be a contradiction if these sets had positive measure, therefore they have zero measure. So $\mu(B \setminus f^{-1}(B)) = \mu(f^{-1}(B) \setminus B) = 0$. What does this mean? It does not mean by itself that $B$ has zero measure. It just means that the difference between $B$ and $f^{-1}(B)$ has zero measure. So $f^{-1}(B) = B$ up to a set of zero measure.
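It may help to see what the argument produces when $\mu$ is not ergodic. In the hypothetical five-point example with $\mu$ uniform and $\mu_1$ the conditional measure on $\{0, 1, 2\}$, the sets $B = \{h_1 < 1\}$ and $C = \{h_1 > 1\}$ come out exactly invariant, with $0 < \mu(B) < 1$: precisely the kind of nontrivial invariant set that ergodicity rules out. A minimal sketch:

```python
from fractions import Fraction

f   = [1, 2, 0, 4, 3]                               # hypothetical permutation (0 1 2)(3 4)
mu  = [Fraction(1, 5)] * 5                          # invariant, but NOT ergodic
mu1 = [Fraction(1, 3)] * 3 + [Fraction(0)] * 2      # invariant and absolutely continuous w.r.t. mu

def preimage(S):
    return {x for x in range(len(f)) if f[x] in S}

h1 = [p1 / p for p1, p in zip(mu1, mu)]             # density dmu1/dmu (mu charges every point)
B = {x for x in range(len(f)) if h1[x] < 1}         # here B = {3, 4}
C = {x for x in range(len(f)) if h1[x] > 1}         # here C = {0, 1, 2}
mu_B = sum((mu[x] for x in B), Fraction(0))

print(preimage(B) == B, preimage(C) == C, mu_B)     # True True 2/5
```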
So $f^{-1}(B)$ need not be exactly equal to $B$, but the difference between the two has zero measure, which is what we just showed. When we defined ergodicity, we said that $\mu$ is ergodic if every set satisfying $f^{-1}(B) = B$ has measure 0 or 1. It is simple to see that this remains true if $f^{-1}(B) = B$ holds only up to a set of zero measure, because sets of zero measure don't really contribute anything. So by ergodicity of $\mu$, either $\mu(B) = 1$ or $\mu(B) = 0$: from this calculation, $B$ has either zero or full measure. So assume by contradiction that $\mu(B) = 1$. Then $1 = \mu_1(M) = \int_M h_1\, d\mu = \int_B h_1\, d\mu < \mu(B) = 1$. Let's make sure we understand all the terms in this chain. The first equality holds because $\mu_1$ is a probability measure, so the measure of the whole space is 1, and the second is the definition of $h_1$. Why can we integrate over $B$ instead of $M$? Because we are assuming that $\mu(B) = 1$, and integrating over a set of full measure is the same as integrating over the whole space. Why the strict inequality? Because $h_1 < 1$ on $B$ and $\mu(B) > 0$. This is a contradiction, so $\mu(B) = 0$. Similarly $\mu(C) = 0$: if $\mu(C) = 1$ we would get $1 = \int_C h_1\, d\mu > \mu(C) = 1$. So $h_1 = 1$ almost everywhere, which is what we needed to prove. Okay, I think this is a good place to take a couple of minutes break. We've completed the part on the space of invariant and ergodic measures.
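For a permutation of a finite set, the whole theorem can be checked by brute force. A minimal sketch, relying on two standard facts about permutations rather than anything proved in the lecture: the invariant sets are exactly the unions of cycles, and the invariant probability measures are exactly those constant on each cycle, so the extremal points of $\mathcal{M}_f$ are the uniform measures on single cycles. The example system and measures are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def cycles(f):
    """Split a permutation f of {0, ..., n-1} (given as a list) into its cycles."""
    seen, result = set(), []
    for start in range(len(f)):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = f[x]
        result.append(cycle)
    return result

def is_ergodic(f, m):
    """For an invariant m: every invariant set (a union of cycles) has measure 0 or 1."""
    cs = cycles(f)
    for k in range(len(cs) + 1):
        for chosen in combinations(cs, k):
            mass = sum((m[x] for c in chosen for x in c), Fraction(0))
            if mass not in (0, 1):
                return False
    return True

def extremal_points(f):
    """Uniform measures on single cycles: the extremal points of M_f for a permutation."""
    n = len(f)
    return [[Fraction(1, len(c)) if x in c else Fraction(0) for x in range(n)]
            for c in cycles(f)]

f = [1, 2, 0, 4, 3]                                  # hypothetical permutation (0 1 2)(3 4)
candidates = {                                       # all of these are f-invariant
    "uniform on all points": [Fraction(1, 5)] * 5,
    "uniform on (0 1 2)":    [Fraction(1, 3)] * 3 + [Fraction(0)] * 2,
    "uniform on (3 4)":      [Fraction(0)] * 3 + [Fraction(1, 2)] * 2,
    "weights 1/6 and 1/4":   [Fraction(1, 6)] * 3 + [Fraction(1, 4)] * 2,
}
vertices = extremal_points(f)
for name, m in candidates.items():
    print(f"{name:24} ergodic={is_ergodic(f, m)}  extremal={m in vertices}")
```

In every row the two flags agree, as the theorem predicts.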
So let's just take a couple of minutes. Okay. This completes the first results about the structure of the invariant and ergodic measures within the space of probability measures. Of course, this is not the reason we're studying them. We're studying invariant and ergodic measures because they have dynamical implications, and this is what we start now: what are the dynamical implications, and why are we interested in invariant and ergodic measures? Probably the first interesting and significant result using the notion of invariant measure is the Poincaré recurrence theorem, which was proved back in the 1890s. It's a really classical theorem, but it is still used a lot in dynamical systems, and it is quite amazing in what it says about the importance of invariant measures. As usual, it is very general: we don't even need continuity. Theorem (Poincaré recurrence). Let $(M, \mathcal{B}, \mu)$ be a probability space, that is, a space with a $\sigma$-algebra and a probability measure, let $f\colon M \to M$ be measurable, and suppose $\mu$ is $f$-invariant. Then for every measurable $A$ with $\mu(A) > 0$, $\mu$-almost every $x \in A$ returns to $A$, that is, $f^n(x) \in A$ for some $n \geq 1$. (To the question: yes, absolutely, $n$ depends on $x$ and can be arbitrarily large for different $x$.) So if this is our space $M$ and this is our set $A$, the theorem says that almost every point of $A$ comes back to $A$ after some time. Apart from the fact that the proof is very simple, just a few lines, this theorem is remarkable because it seems to have almost no assumptions, and although the conclusion is fairly abstract, it is still quite strong, because, as you know, there are many systems that have no recurrence, right?
Can you give me an example of a dynamical system, with no invariant probability measure in mind, in which points never come back? Yes: on the real line take $x \mapsto x + 1$, and everything moves off to infinity; or take a system where everything converges to an attracting fixed point. There are many cases in which things don't come back. But in those cases you cannot have an invariant probability measure sitting on the region of points that never come back; this is what the theorem says. So invariant measures are intrinsically related to recurrence. And recurrence is very interesting because that's where the interesting dynamics happens: when things come back all the time, as in irrational rotations or $x \mapsto 2x \bmod 1$. If everything just goes somewhere and never comes back, nothing interesting is happening in that region, right? So, with essentially no assumptions, the mere existence of an invariant measure means that things come back. One point might come back after 5 iterates, another after 10, another after a million: the return time can be unbounded. Let me first remark that a slightly stronger version of this theorem holds: almost every point comes back infinitely often. The proof is a little bit more complicated and I will skip it. It is not an immediate consequence of the version above, because knowing that almost every point comes back once does not let you simply apply the theorem again: the points of $A$ that come back might return into a set of zero measure, and those might never come back a second time. So returning infinitely often is not a direct consequence of this.
So you need a few more lines of proof and a slight generalization of the argument to get infinitely many returns. If you want, you can try this as an exercise; I just wanted to remark that it's not an immediate consequence, but it's true. We will just show that almost every point comes back once. Given our set $A$, let $A_0 = \{x \in A : f^n(x) \notin A \text{ for all } n \geq 1\}$. What do we need to show about $A_0$? This is the set of points that never come back, and its complement in $A$ is the set of points that do come back. So if we show that $A_0$ has measure 0, its complement will have full measure in $A$, and almost every point of $A$ comes back. So it is sufficient to show that $\mu(A_0) = 0$. (To the question: yes, $\mu$-almost every $x \in A$, meaning a set of full measure inside $A$, whatever the measure of $A$ is, as long as $\mu(A) > 0$. And we look at some positive iterate, so at the future.) For all $n \geq 0$, let $A_n = f^{-n}(A_0)$. These are just preimages: the points that land in $A_0$ after $n$ iterates. What are we going to do with them? We claim that they are pairwise disjoint: $A_n \cap A_m = \emptyset$ for all $n, m \geq 0$ with $n \neq m$. How do we show this? Suppose by contradiction that there exists some $x \in A_n \cap A_m$, and without loss of generality $m > n$. Taking the $n$th iterate, $f^n(x)$ belongs to $f^n(A_n \cap A_m)$.
By definition of $A_n$ and $A_m$, this set is $f^n(f^{-n}(A_0) \cap f^{-m}(A_0))$, which is contained in $f^n(f^{-n}(A_0)) \cap f^n(f^{-m}(A_0)) \subseteq A_0 \cap f^{-(m-n)}(A_0)$. So $A_0 \cap f^{-(m-n)}(A_0)$ contains a point, namely $y = f^n(x)$: $y$ lies in $A_0 \subseteq A$, and its $(m-n)$th iterate, with $m - n \geq 1$, lies in $A_0 \subseteq A$ again. And what's the problem with this? Exactly: this contradicts the definition of $A_0$, because $y$ would be a point of $A_0$ that returns to $A$. So what have we just proved with this contradiction? Disjointness. Now, what do you want to do as soon as you have a family of pairwise disjoint sets? Take the measure of their union. Because the sets $A_n$, $n \geq 0$, are pairwise disjoint, $\mu(\bigcup_{n \geq 0} A_n) = \sum_{n \geq 0} \mu(A_n) = \sum_{n \geq 0} \mu(f^{-n}(A_0))$. And what is this equal to? Exactly: $\mu$ is invariant, so this is $\sum_{n \geq 0} \mu(A_0)$. And how much is this? It is either 0 or infinity: if $\mu(A_0) = 0$ the sum is 0, and if $\mu(A_0) > 0$ it is infinite. What's the problem with it being infinite? It cannot be, because the sum equals the measure of the union, which is some set in our space, so it is at most 1. Therefore $\mu(A_0) = 0$, and this completes the proof of the Poincaré recurrence theorem.
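A quick numerical illustration of recurrence, assuming the standard fact that the irrational rotation $f(x) = x + \alpha \bmod 1$ preserves Lebesgue measure on $[0,1)$. The choices $\alpha = (\sqrt{5}-1)/2$, $A = [0, 0.1)$, the sample size and the iteration cap are all illustrative, and floating-point orbits are only an approximation of the true dynamics, so this is a sanity check, not a proof. Every sampled point of $A$ should come back to $A$.

```python
import math
import random

ALPHA = (math.sqrt(5) - 1) / 2      # illustrative irrational rotation number
A_LOW, A_HIGH = 0.0, 0.1            # A = [0, 0.1), Lebesgue measure 0.1

def rotate(x):
    """f(x) = x + ALPHA mod 1; preserves Lebesgue measure on [0, 1)."""
    return (x + ALPHA) % 1.0

def first_return_time(x, max_iter=10_000):
    """Smallest n >= 1 with f^n(x) in A, or None if there is none up to max_iter."""
    y = x
    for n in range(1, max_iter + 1):
        y = rotate(y)
        if A_LOW <= y < A_HIGH:
            return n
    return None

random.seed(0)
starts = [random.uniform(A_LOW, A_HIGH) for _ in range(1_000)]   # Lebesgue-random points of A
times = [first_return_time(x) for x in starts]
returned = [t for t in times if t is not None]
print(len(returned), "of", len(starts), "points returned; longest wait:", max(returned))
```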
So this is the first result about invariant measures, and although it is a basic result, it is still used a lot. There are many situations in mathematics where the Poincaré recurrence theorem is used to show that points come back close to where they started: you can take a very small set, and as long as it has positive measure, you know that almost every point in it must come back to it. Okay, this is invariance. What about ergodicity? That's enough proofs for today, but before we finish, let me state a fundamental theorem which we will not actually prove, but which is very important. About forty years later... yes? (Question about the indices in the disjointness claim: why $n \geq 1$?) You're right, I'm sorry: everything also works with $A_0$ included, so the sets $A_n$ are pairwise disjoint for all $n \geq 0$. I shouldn't have erased the board so quickly, but it makes no difference. Okay, so there is a theorem that strengthens Poincaré recurrence in a dramatic way: it is much, much stronger and much more interesting, and this is where the notion of ergodicity comes in. We want a much more precise description of the dynamics in terms of the invariant measure. We can still work very generally: $f\colon M \to M$ measurable and $\mu \in \mathcal{M}$ any probability measure, not even necessarily invariant. Remember that we defined the basin of attraction of $\mu$ as the set $B(\mu)$ of points $x$ such that $\frac{1}{n}\sum_{i=0}^{n-1} \varphi(f^i(x))$ converges to $\int \varphi\, d\mu$ for all continuous $\varphi\colon M \to \mathbb{R}$. We discussed in the first couple of lectures that we are often interested in whether this sequence converges or not. And convergence for all continuous $\varphi$ is the same as saying that the averages of Dirac deltas $\frac{1}{n}\sum_{i=0}^{n-1} \delta_{f^i(x)}$ converge.
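For a measure that is not invariant, the basin can be empty. A minimal sketch, with an illustrative map and test function not taken from the lecture: for the contraction $f(x) = x/2$ on $[0,1]$ and Lebesgue measure, the time averages of the continuous function $\varphi(x) = \cos(\pi x)$ tend to $\varphi(0) = 1$ for every starting point (the orbit averages converge to the Dirac mass at the fixed point 0), while the space average $\int_0^1 \cos(\pi x)\,dx$ is 0. So no point lies in the basin of Lebesgue measure.

```python
import math

def birkhoff_average(f, phi, x, n):
    """Time average (1/n) * sum_{i=0}^{n-1} phi(f^i(x))."""
    total = 0.0
    for _ in range(n):
        total += phi(x)
        x = f(x)
    return total / n

def halve(x):
    """f(x) = x / 2 on [0, 1]: every orbit converges to the fixed point 0."""
    return x / 2

def phi(x):
    """A continuous test function; its Lebesgue space average on [0, 1] is 0."""
    return math.cos(math.pi * x)

averages = {x0: birkhoff_average(halve, phi, x0, 100_000) for x0 in (0.1, 0.5, 0.9)}
for x0, avg in averages.items():
    print(f"x0 = {x0}: time average {avg:.5f}, Lebesgue average 0")
```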
In other words, we ask whether these averages of Dirac deltas converge to $\mu$ in the weak* topology. So we are interested in whether they converge, and to which measure. Here we are given a measure, and we want to know for which points the time average of $\varphi$ converges to the space average of $\varphi$, because this says that, in some sense, the measure $\mu$ describes the distribution in space of the orbit of $x$. In principle there is no particular reason for $B(\mu)$ to be non-empty. To state the theorem the way I want, I will assume that $M$ is a metric space (compact, as at the start of the lecture), which is a little bit stronger; there is a technical reason for this, since the basin is defined using continuous functions. Let me state a short version of the theorem. Theorem (Birkhoff's ergodic theorem, 1931). Let $\mu$ be an $f$-invariant ergodic probability measure. Then $\mu(B(\mu)) = 1$. I tried to emphasize at the beginning that this convergence is not something you can take for granted: in general there is no reason why these time averages should converge to some measure. What this theorem says is that if you have an ergodic measure, then almost every point with respect to that measure satisfies this convergence. So once you have an $f$-invariant ergodic measure, there exists a set of full measure whose dynamics is precisely described by that measure. In some sense, as we shall see, these points all have the same dynamics, statistically the same dynamics, because this is a limit saying that asymptotically the distribution of each orbit corresponds to this measure. But it doesn't mean that these orbits stay together or do the same thing at the same time.
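A numerical illustration of the statement, assuming the standard fact (not proved here) that the rotation $f(x) = x + \alpha \bmod 1$ with $\alpha$ irrational preserves Lebesgue measure and is ergodic. Birkhoff's theorem then predicts that for Lebesgue-almost every starting point the time averages of a continuous $\varphi$ approach $\int_0^1 \varphi\,dx$. The choices $\alpha = (\sqrt{5}-1)/2$, $\varphi(x) = \sin^2(\pi x)$ (space average $1/2$), five random starting points and $n = 100{,}000$ are illustrative; a finite run is only a sanity check.

```python
import math
import random

ALPHA = (math.sqrt(5) - 1) / 2      # illustrative irrational rotation number

def phi(x):
    """Continuous on the circle; the integral of sin^2(pi x) over [0, 1] is 1/2."""
    return math.sin(math.pi * x) ** 2

def time_average(phi, x0, n):
    """(1/n) * sum_{i=0}^{n-1} phi(f^i(x0)) for f(x) = x + ALPHA mod 1.

    f^i(x0) is computed directly as (x0 + i*ALPHA) mod 1 to avoid
    accumulating rounding error along the orbit.
    """
    return sum(phi((x0 + i * ALPHA) % 1.0) for i in range(n)) / n

random.seed(1)
starts = [random.random() for _ in range(5)]
averages = [time_average(phi, x0, 100_000) for x0 in starts]
for x0, avg in zip(starts, averages):
    print(f"x0 = {x0:.4f}: time average {avg:.6f}  (space average 0.5)")
```

The starting points differ, but the time averages agree, which is the "same statistics, different orbits" picture described next.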
All these orbits might do completely different things and be in different places at the same time, but on average they all do the same thing. To use again the example of tossing a coin: if I toss a coin and you toss a coin, heads and tails will come up at completely different moments for the two of us, but statistically, when we both toss a million times, about half will be heads and about half will be tails. It's exactly the same thing: the dynamics of the orbits could be completely different, but the asymptotic distribution in space is the same for $\mu$-almost every point. To get a better feeling, let me state a direct corollary. Corollary. Under the same assumptions, for every measurable $A \subseteq M$, $\lim_{N \to \infty} \frac{1}{N} \#\{1 \leq j \leq N : f^j(x) \in A\} = \mu(A)$. What does this mean? You take a set $A$ and a point $x$, and you fix $N$. The set inside the cardinality is the set of indices $j$ for which $f^j(x) \in A$, and its cardinality simply counts how many times the orbit falls in $A$.
If I'm tossing a coin, I count how many times I get heads, for example. Since I'm taking $N$ iterates, the cardinality is at most $N$ and at least 0, and then I divide by $N$, so I always get a number between 0 and 1. As $N$ increases, the orbit sometimes falls in $A$, then comes out, then falls in again, and so on, and this quantity measures the asymptotic proportion of time that the orbit spends inside $A$. The corollary says that for every measurable $A$ this limit equals $\mu(A)$, and I need to add one more thing: this holds for $\mu$-almost every $x$, not for every $x$, of course. The proof is an immediate consequence of Birkhoff's theorem. How do I prove it? I know that for almost every point I have the convergence in Birkhoff's ergodic theorem, so how does that convergence relate to this one?
it's very easy change phi to the characteristic function exactly so let phi equals the characteristic function of A then by Birkhoff's ergodic theorem for almost every X we have that one we have that this 1 over N cardinality in 1 j that fj x belongs to A is exactly equal to 1 over N sum from so I indexed them slightly differently here so I'm sorry maybe I should have just written 0 to N minus 1 to make it more consistent but it's exactly the same thing of course 0 to N minus 1 is equal to the characteristic function of A sum from I equals 0 to N minus 1 composed with f i of X so this is just exactly a different way of writing this so here I'm just counting I look at f i of X this characteristic function is either 0 or 1 depending on whether f i of X is in A so if it's in A this is 1 so all I'm doing here is exactly counting the number of it as where I belong to A and dividing by N and by Birkhoff's ergodic theorem this converges exactly to the integral of the characteristic function of A d mu and this characteristic function of A is equal so in fact most of the time this is really the way you think about Birkhoff's ergodic theorem it's more general because this convergence happens for all continuous functions in fact so this is what the price I pay for not stating Birkhoff's ergodic theorem properly is that of course this characteristic function is not a continuous function but Birkhoff's ergodic theorem says that actually this convergence happens for all measurable functions so I'm sorry I wanted to state a kind of more compact version of Birkhoff's ergodic theorem so let me just leave it at that so let me tell you that Birkhoff's ergodic theorem actually says that for almost every point this convergence happens for every measurable function so it's more general than what I wrote okay I thought that that would be sufficient for what we want to do but of course I forgot that I want to use this application and this is not a continuous function so it works it 
doesn't work. Okay, so for any measurable function? Yes. It's a completely measure-theoretic theorem: it works in measure spaces, for measurable maps with invariant measures, and that's why it actually works for L¹ functions, not necessarily continuous ones. There was some reason I wanted to state it like that, because it was cleaner to state. So the key point here is that, since mu is ergodic, this limit does not depend on the point x. This is one of the key features of this result. I'm just going to repeat what I was saying before: in general, if you start with a point x or a point y, they might have very different dynamics; they might go to very different places at very different times. But if you look at the statistics of the orbit, in other words you only look at something like this, the frequency of times at which you are in A, then almost every point has the same frequency of visits to A. You can in some sense think of this as the probability of A, and this is exactly where the connection with probability comes in; that's why we call it a probability measure. If you wait a long time and then pick one iterate, the probability that your point is in A is exactly mu(A). So it's a probabilistic description of the dynamics: maybe we cannot describe exactly where you are at every time, but we know that if you wait for some time, the measure mu describes the probability of your point being in that particular region of space. Okay, so I think we will finish here for today. In principle I could prove Birkhoff's ergodic theorem, but because we don't have that many lectures I think I will skip it. It's not even that difficult; I will give you some notes with the proof. It would probably take one lecture or maybe one and a half lectures to prove it, but I would prefer to go and look at
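The independence from the starting point can be seen in a small numerical sketch, again using circle rotations with A = [0, 0.3) chosen purely for illustration (not from the lecture). For the irrational angle α = (√5 − 1)/2, Lebesgue measure is ergodic and every starting point gives a frequency close to μ(A) = 0.3. For α = 1/2, Lebesgue measure is still invariant but not ergodic (for example, [0, 1/4) ∪ [1/2, 3/4) is invariant and has measure 1/2), and the frequency visibly depends on x. The helper birkhoff_frequency is a hypothetical name.

```python
import math


def birkhoff_frequency(alpha, x, N, a=0.0, b=0.3):
    """(1/N) * #{0 <= i < N : f^i(x) in [a, b)} for f(y) = y + alpha mod 1.

    Uses the closed form f^i(x) = x + i*alpha mod 1 so that round-off
    does not accumulate over many iterations.
    """
    hits = sum(1 for i in range(N) if a <= (x + i * alpha) % 1.0 < b)
    return hits / N


N = 100_000
starts = [0.0, 0.1, 0.4, 0.77]

golden = (math.sqrt(5) - 1) / 2  # irrational: Lebesgue measure is ergodic
for x in starts:
    print("alpha = golden", x, birkhoff_frequency(golden, x, N))  # all near 0.3

for x in starts:
    print("alpha = 1/2   ", x, birkhoff_frequency(0.5, x, N))  # depends on x
```

For α = 1/2 each orbit just alternates between x and x + 1/2 mod 1, so the frequency is 1/2, 1/2, 0, 1/2 for the four starting points, none of which is μ(A) = 0.3. The closed-form shortcut is specific to rotations; for a general map one would iterate f.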
some examples. What we will do in the next lectures of the course is look at some examples in which we use the ergodic theorem to describe the dynamics. Once you have the ergodic theorem, what you want is to take a dynamical system and find an invariant measure; if that invariant measure is also ergodic, then it tells you something about the dynamics, because it tells you the statistics of almost all points. So the problem is twofold: take a dynamical system, find an invariant measure, and show that this measure is ergodic; in other words, find an invariant and ergodic probability measure. Okay, so we will start with some systems in which Lebesgue measure is invariant, and there the problem will be to prove ergodicity, since we already have the invariant measure. Then we will look at situations in which you don't even know what the invariant measure is, and then show that it's ergodic.