As I said before the midterm, now that we've passed it, the gears are about to shift. We're going to change gears from what we've been doing so far. The course has so far been structured as effectively a review of material you were supposed to have known from your 370 course, but with more detail, a little more rigor, and more difficult problems. Now we're going to start to introduce material that should be, for most of you, new. And for the next maybe three lectures, it's going to be a little more on the theoretical side. I'm going to try to throw in as many practical things as I can, but it is a little bit on the theoretical side for the next few lectures, just to warn you ahead of time. So what are we going to be talking about? For today, Brownian motion. This will be our topic for today, and we're going to study what Brownian motions are, how we can think of them as arising from models that we've studied already, and then study them as objects of mathematical interest in their own right. To connect it to what we've done already, let's go back and look at the very simplest form of an interest rate model. You might recall that in one of our simplest interest rate models, R increased or decreased by sigma square root delta t. At every point in time you have this increase or decrease, and if we do this over many time steps, we can always write this recursively in the form here, in terms of little Bernoulli random variables, the x's. But if you look at the structure here, there's really one underlying embedded structure that's invariant, and we can build everything out of it. That underlying embedded structure is an object which simply goes up and down by an amount square root delta t.
So what I want to do is somehow separate the pieces which stay the same as we change delta t from the pieces which actually change as delta t gets smaller and smaller. Think of this little branching tree here; I don't want to call it little x, so I'll call it capital X. Capital X at step n can be written basically in terms of these little Bernoullis with the square root delta t there, and you can see the relationship between these two things: if I scale X by sigma, I get R. X is really the basic object, and R is just a scaled version of it. Whenever X increases by square root delta t, R increases by the same amount times sigma, and if X decreases, R decreases by the same amount. So really the underlying building block is this little tree, this little object here, and we're going to talk some more about it. Let's look at the basic model that we had for the CRR model, the standard one, not the modified version or anything of that nature, just the simple standard CRR. In the CRR model we have this object here, and we could build this object out of X as well. Suppose I took S0 times the exponential of sigma X at time n, big X at time n. Are you convinced that would give me the same result as this tree? Oh, sorry, there's a sigma missing there, isn't there? There we go. And in terms of the R, we could write R0 plus sigma Xn. So this is one way of thinking of the R process as built out of X, and of the S process as built out of this same little X tree. They're just simple transformations of X. So what I'd like to do is study this tree and study its limiting behavior. And then we're going to realize that, in some sense, we've already studied the limiting behavior of that tree. We've actually done it.
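To make the scaling concrete, here is a minimal sketch of the basic walk X and the two trees built from it. The parameter values (dt, sigma, R0, S0) are hypothetical choices for illustration, not values from the lecture:

```python
import math
import random

def sample_X_path(n_steps, dt, seed=0):
    """One sample path of the basic object X: at each step it moves
    up or down by sqrt(dt), each with probability one-half."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n_steps):
        step = math.sqrt(dt) if rng.random() < 0.5 else -math.sqrt(dt)
        x.append(x[-1] + step)
    return x

# The rate and asset processes are simple transformations of X:
#   R_n = R0 + sigma * X_n        (additive interest rate tree)
#   S_n = S0 * exp(sigma * X_n)   (multiplicative, CRR-style tree)
dt, sigma, R0, S0 = 1 / 252, 0.2, 0.03, 100.0   # hypothetical parameters
X = sample_X_path(252, dt)
R = [R0 + sigma * x for x in X]
S = [S0 * math.exp(sigma * x) for x in X]
```

Both transformed trees inherit all their randomness from the single underlying X path.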
But we'll redo it again, a different version of it, and then we'll talk about its connection back to the asset price tree. Now, when we have the interest rate tree and the asset price tree, there are probabilities associated with going up and down. For the interest rate tree, one standard assumption is to make them one-half. It doesn't have to be, but one standard choice is one-half. For the asset price tree, if we were looking for real-world probabilities, we'd figure them out by looking at real-world historical behavior: match the mean, match the variance, and that gives you your probabilities. If we're under risk-neutral pricing, then we have to figure out the probabilities using the risk-neutrality criteria. What I want to do is separate that whole issue of probabilities from the behavior of the underlying dynamics all on its own. So we're going to take that simple tree there, and we're going to assume that it goes up and down with probabilities one-half, and we'll use that base model to build everything else afterwards. So let's start by repeating the basic recursive relationship that we have for our tree. These little x's are Bernoullis, okay? And they satisfy the usual Bernoulli criteria: they are independent and identically distributed. And as I said, we're going to choose probabilities of one-half; this is the particular model we want to study. Okay, so what do we already know about this model? What can you tell me about this random variable X? In particular, let's take some point in time, capital T, and break the interval from zero up into capital N pieces, so that N delta t is the total time; in other words, N delta t equals capital T. As I take delta t down to zero, what can you tell me about X at capital T? We've already studied this, right?
Well, you should know, of course, that it's normally distributed, because it's made up of a sum of independent identically distributed random variables with finite variance; the central limit theorem tells me that this converges in distribution to something normal, with some mean and some variance. And we've got to figure out those properties. We can figure out the mean fairly easily. The mean of each of the little x's is zero, because the little x's are Bernoulli, plus or minus one with probability one-half, so the means are zero, and any sum of them also has mean zero. So the mean of X at capital T is zero; that's easy, we can fill that in now. And the variance? Since these again are sums of independent things, we can write it as sigma squared delta t times N times the variance of any one of those little Bernoullis. And since these are Bernoullis with probability one-half of being plus one and one-half of being minus one, the mean is zero and the variance is one. So that factor is just one, and the variance is sigma squared delta t times N, which is sigma squared times capital T, nice and finite. So we have this property: X at some finite time, as we take the time steps to zero, is normally distributed. And this is a distribution under P. There's only one measure here; we're not mixing up the risk-neutral world and the real world and so on. We're just going to work under one single measure for the next little while, so you don't have to think about two different types of probabilities here, only one type of probability. Okay, now, suppose I ask you to tell me about the distribution of the increments of X between two points in time. Say X at capital T plus little t minus X at capital T. So I want to put another point in time here, that's capital T plus little t.
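This claim is easy to sanity-check by Monte Carlo. The following sketch (sigma, T, and the sample sizes are arbitrary choices, not from the lecture) builds X at T from the Bernoullis and confirms the sample mean is near zero and the sample variance near sigma squared T:

```python
import math
import random

def X_at_T(N, dt, sigma, rng):
    """X(T) = sigma * sqrt(dt) * (sum of N iid +/-1 Bernoullis)."""
    total = sum(1 if rng.random() < 0.5 else -1 for _ in range(N))
    return sigma * math.sqrt(dt) * total

rng = random.Random(42)
T, N, sigma = 1.0, 400, 0.2
dt = T / N
samples = [X_at_T(N, dt, sigma, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum(v * v for v in samples) / len(samples)
# Central limit theorem: mean -> 0 and var -> sigma^2 * T as dt -> 0
```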
Let's suppose that delta t is such that it evenly divides both capital T and little t, so that there's some number of steps, big M, in the second interval, with delta t times M equal to little t, because that's the amount of time in between those two points. And we want to take the limit, of course, as both N and M go to infinity. What can you tell me about the distribution of this difference? Let me slide up the timeline there again. You should all be able to work that out, right? It's normal, its mean is zero, and what would the variance be? Sigma squared times little t, okay? Only sigma squared times little t. So how would you actually prove that? The way you'd prove it is you'd say: okay, the increment of X is given only by the Bernoullis in the second interval, the sum of the Bernoullis in the second interval. So this is an increment, okay? That's the increment of X over that second time interval. In terms of the little Bernoullis, for a finite delta t, it's simply sigma square root delta t times the sum from little n equals big N plus one up to big N plus big M of the little x's. Remember, this is similar to a question I actually asked you to do, because I hinted it was coming on the test, and I think there was only one student that actually did it, maybe two. So you look at the increment, and this increment contains only the Bernoullis that appear in the second interval there, from N plus one up to N plus M. Let's call this thing delta X, just so we don't have to keep writing it out every single time. It should be clear that its expected value is zero, again because expectation is a linear operator and the little x's have mean zero. And the variance, again by independence, is just sigma squared delta t times M, because there are M of these terms remaining, right?
In that sum there are only M terms in total, times the variance of any one of those x's; you could take x at N plus one as an example, or even x1, because they're equal in distribution. And that variance is one, as we already know. So that's sigma squared delta t times M, and what is M times delta t? That is little t, okay? So we've established that the increment is also normal, with mean zero and variance equal to sigma squared times the width of that second window, the size of the increment, basically. So that's a general rule now that we can actually use. Then there's another question we could ask. What if I looked at increments over a few different times: t1, t2, t3, t4. So these are four different times, and assume that the intervals don't overlap, okay? They're ordered and they don't overlap: t1 is less than t2, which is less than or equal to t3, which is less than t4. In principle I could have t2 equal to t3, so the intervals could touch in the middle, but they're not going to overlap. And let's say we wanted to investigate the increments of X over those two windows: delta X here from 1 to 2 and delta X here from 3 to 4. Okay, I hope my notation makes sense. You understand what I mean by that? Just to be explicit, why don't I write out one of them? That's X at t2 minus X at t1, okay? And similarly the other. So I'm looking at that increment. What can you say about the joint distribution of delta X 1 2 and delta X 3 4? Individually, delta X 1 2 we already have the answer for, right? It's given by what we just did: it's normal, its mean is zero, and its variance is sigma squared times t2 minus t1. Agreed? We've effectively just proven that result.
Delta X 3 4 would also be normal with mean zero and variance sigma squared times t4 minus t3. What about the co-dependence between these two things? What can you say about delta X 1 2 and delta X 3 4? They're independent random variables, in fact. It's not just that they are multivariate Gaussian with zero correlation. It's more than that: they are, in fact, completely independent of one another. Of course, for Gaussians, if you know the correlation, you know the total dependence structure. And the underlying reason is easy. We know that delta X 1 2 is equal to sigma square root delta t times the sum, using some notation that I haven't introduced but which should be obvious, from n equals n1 plus 1 up to n2 of x little n, where n1 is the number of steps I need to take to get to t1. And delta X 3 4 would obviously be the same sum from n equals n3 plus 1 up to n4 of little xn. So between the little x's that appear in the one sum and the little x's that appear in the other, there's no overlap. Even if t2 equals t3, you can see there is no overlap: the little x's that appear in one sum do not appear in the other. And since the little x's are independent, these sums must be independent as well. Okay, so let's summarize the set of results that we've got so far. There are roughly three main points, and those three main points turn out to be the defining properties of a type of stochastic process, what's going to be called a Brownian motion. So what we've seen is that X as a process has the following ingredients. First of all, it starts at zero. Secondly, X at any fixed little t is normal with mean zero and variance... oh shoot, sorry, no one stopped me on one mistake that I've made throughout this entire thing. I included sigma here. I don't want sigma here.
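The disjoint-sums argument is easy to check numerically. This sketch (the times and step counts are arbitrary choices, with the intervals touching at t2 = t3) estimates the covariance of the two increments, which should be near zero, and the variance of the first, which should be near t2 minus t1:

```python
import math
import random

rng = random.Random(7)
dt = 0.01
n1, n2, n3, n4 = 10, 30, 30, 60   # t1=0.1, t2=t3=0.3, t4=0.6: touching, not overlapping

def increment_pair():
    steps = [math.sqrt(dt) * (1 if rng.random() < 0.5 else -1) for _ in range(n4)]
    d12 = sum(steps[n1:n2])   # uses Bernoullis n1+1 .. n2 only
    d34 = sum(steps[n3:n4])   # uses Bernoullis n3+1 .. n4 only: a disjoint set
    return d12, d34

pairs = [increment_pair() for _ in range(20000)]
cov = sum(a * b for a, b in pairs) / len(pairs)      # both means are zero
var12 = sum(a * a for a, _ in pairs) / len(pairs)    # should be near t2 - t1 = 0.2
```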
Remember when I defined X, I had no sigma in it? I said we wanted to strip away the sigma? There should be no sigma in this at all. So just set sigma equal to one here, to avoid having to erase it from everywhere. Okay, I don't want it because I want the basic properties of what's going to be called a Brownian motion. So I will correct it here, but you don't have to go through and correct all of your notes; just note that all of these sigmas are one. Okay, good. So, with that little correction, we have that the X's are all normal with mean zero and variance t. We also have this independence property. This is called the independent increments property. In other words, X at t2 minus X at t1 is independent of X at t4 minus X at t3 as long as we have this relationship, as long as the intervals do not overlap; that's basically what it means, right? As long as the intervals do not overlap, we have this independence of increments. We also have another property. We did the computation, but we really didn't emphasize it. Let me go back and emphasize it now. Let me slide up. When we looked at this difference, the increment of X at the very top here, X at capital T plus little t minus X at capital T, the difference between X at two points in time, that distribution does not depend on where you start. It does not depend on capital T in this particular calculation. This is what's called a stationary increment: it only depends on the width of the increment. It's not always true; I can construct a lot of processes whose increments depend not just on the width of the increment but also on where you start. This one doesn't depend on where I start. In this increment we started at capital T, but its distribution doesn't depend on capital T. So X has stationary increments.
So I'll write that a little bit differently here. We'll say X at little t plus s, minus X at little t, equals in distribution X at s. This is a statement that we had earlier. We said that if we look at an increment of X, its distribution depends only on the size of the increment: it's normal, mean zero, and variance equal to the size of the window. That's what we found. The left-hand side is normal with mean zero and variance equal to t plus s minus t, which is equal to s. But that's also the distribution of the right-hand side: the distribution of X at time s is normal with mean zero and variance s. That's property 2 there. So you can see I can put an equality in distribution here. It's basically as if, when you look at the increments of the process at some point in the future, I can pretend that this point is zero for distributional purposes. Even though zero was way back here, and the process has evolved along and ended up at some point in time, when I look at its increment in distribution, it's as if the process just started. So you can slide the window back to zero. Let me draw that in a picture for you. The process has actually started at zero. If I'm here at some time little t, and then I look at time t plus s, the process has gotten to there. This increment, the increment of the process from that point onwards, has the same distribution, in distribution, as if I thought of the process starting at time little t. It's identical in distribution. Now remember, equality in distribution does not mean equality. And we can see a very good example of this. Look at the difference. What's the increment in this particular scenario? The increment in this scenario is that height, right? And it's a positive number in this case. That would be the increment. Now, according to this distributional property, it equals in distribution X at time s.
So what is time s here? Well, basically it's this window here, slid back to zero. So that would be about there, right? That's time s approximately. And you can see that the value of the process at time s is clearly not the same thing; only by coincidence might it be. And I'm not drawing this very well. Let me... okay, there we go. So we want a line that is that long, and I need to pull it back to time zero. So in fact, that's where s is, okay? I've done it a little bit more accurately now, and we can see that in fact my guess was wrong. So this is X at time s. And it's clearly not equal to the difference between these two things. Okay, you see that? Does that make sense? If you look at X at time s, it's not actually equal to the increment that X takes on, but in distribution they are equal. It's basically the same thing as saying: you take out a quarter from your pocket, and you take out a quarter from your pocket; those quarters clearly have the same distribution, correct? You flip one, you flip the other. Are you going to have the same outcome? No, not necessarily. So just because two things are equal in distribution does not mean that they have to be equal, okay? Make sure that is clear in your head. It's a very important distinction. Any questions about that concept? Okay, great. There's one more property that I need to state in order for me to claim that this thing here is what's called a Brownian motion. That one more property is that the process X has to have continuous paths. If we slide back up for a second, let me draw this tree for the first few steps of X, and imagine a sample path in this tree; you go along some path like that. If we connect all of the points, so we don't just observe X at the time intervals but actually put a straight-line connection in between those time intervals, then the tree actually represents sample paths.
These sample paths are clearly continuous. You agree? They're clearly continuous if I just draw straight lines in between all of the points at which we observe X. So I can make continuous sample paths for X. Now, you might be wondering why this is an important property. Doesn't it seem to be enough to have the others? Can anyone give me an example of something which has all those other properties, but not this last one? A basic process, yes. Okay, so you're thinking of taking a process exactly like X and at one point in time making it jump by a dividend amount. That's not quite going to work. And the reason is that if the jump happens at, say, half a year, and I look at the increment from zero to one year, the distribution will not be normal with mean zero and the right variance; the mean will be shifted downward by your dividend. Okay, so that won't quite work. But what other basic fundamental stochastic processes do you know of? Poisson. Okay, let's take a look. A Poisson process doesn't have the normal-zero-t distribution, but it satisfies the independence and the stationarity of increments, does it not? Its variance is also t, if the activity rate is one. It doesn't have the correct mean, but I can correct it so that it does: I can subtract something from it so that the mean is zero. Take N t minus lambda times t, lambda being the activity rate of the Poisson. That compensated process has mean zero and variance t, but it won't be normal, and it won't have continuous paths. Now, you can do something called a compound Poisson, with which you can eventually get close to a normal distribution in any case. But that's just to point out that this last point is not just an extra add-on; it's actually an important point. Okay?
So these properties, let's review them again, because they're very important. The process starts at zero, and to be really strictly correct I should say zero almost surely, but you can just say it starts at zero. The distribution is normal with mean zero and variance t. It has independent increments, and they're stationary. Sometimes you combine those two statements, number three and four, and say X has independent and stationary increments. That's enough, and you don't need to fill in the details there unless I ask you what that means. Okay? And then the last is that the process has continuous paths. Okay? Those are the important properties. Any process which satisfies all of those things is called a Brownian motion. So this is our first look at Brownian motions. And we're going to see that these objects are actually very wacky objects. They're different from the regular objects that you're accustomed to, and what we have to develop is basically a calculus of Brownian motions. Why do you develop calculus in the first place? You develop calculus for looking at, firstly, derivatives of functions: you look at local changes in functions, because you need to understand how functional transformations affect functions. And then the next thing you look at is integrals of functions. So we need to analyze all of those and develop that calculus, but for Brownian motions. And we'll eventually see why it's so different from the calculus of other processes. But the underlying fundamental reason that Brownian motions are so different, you can see heuristically from this tree. Imagine taking these steps smaller and smaller and smaller. For every step size, let me ask you the following question. Is a path differentiable? Just that path there, the highlighted yellow path: is that a differentiable path?
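As a sketch of how the tree limits to a process with these properties (the step counts, horizon, and sample sizes here are arbitrary choices), a piecewise-linear random-walk path starts at zero and is continuous by construction, and its endpoint variance approaches t:

```python
import math
import random

def brownian_approx(T, n, rng):
    """Random-walk approximation to a standard Brownian motion on [0, T]:
    n scaled Bernoulli steps of size sqrt(T/n), linearly interpolated."""
    dt = T / n
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + math.sqrt(dt) * (1 if rng.random() < 0.5 else -1))
    return path

rng = random.Random(3)
T, n, trials = 2.0, 200, 10000
endpoints = [brownian_approx(T, n, rng)[-1] for _ in range(trials)]
# Property 1: every path starts at zero (path[0] == 0 by construction).
# Property 2: X(T) is approximately normal with mean 0 and variance T.
var_T = sum(x * x for x in endpoints) / trials
```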
It's differentiable everywhere except, in this particular sample path, at one point. Do you agree? Where it kinks, right? Where it goes up and then changes direction. So right at that point, it's not actually differentiable. Now, as you take the spacing smaller and smaller, the number of points at which it will change direction gets larger and larger, and in fact becomes infinite. So it turns out that Brownian motion's paths, although they are continuous, are nowhere differentiable. And that seems like, okay, wow, an interesting mathematical oddity, right? The thing wiggles around very, very much. But that mathematical oddity has very important consequences for the behavior of stochastic processes built out of Brownian motions. And it changes the entire calculus, and it changes it in a way that's actually meaningful. It's not just a mathematical oddity; it has something meaningful behind it. And that's, I would say, a little bit unusual compared to when you study analysis or calculus for the first time and you see functions that aren't differentiable. You always think: why the heck would I ever use functions that aren't differentiable anywhere? Functions not differentiable at a fixed number of points, okay, you can imagine: there's a kink, it changes curvature or something, fine. But not differentiable anywhere? Why do I care about that? Well, here's a clear example of why you do. These processes are building blocks for our models of asset prices, yet they are not differentiable anywhere. And so we need to understand what the consequences of that are. Okay? All right, so let's just go for maybe another ten minutes and then we'll take a break.
Let's do our first set of calculations with Brownian motions. This won't highlight the non-differentiability aspects, but it'll show you how to do some simple calculations for now. So suppose I have a Brownian motion and I want to calculate a product. I have a random variable, let's call it Y, and that random variable is the product of X at time t and X at time s, where t is less than s. I can imagine such a random variable, right? t could be one year, s could be two years, and that random variable depends on what happens at one year and what happens at two years. You take the product, and that gives me a random variable Y. How would I go about computing the mean and the variance of that object? Well, first of all, what's its distribution? Do you know? You know that X at little t and X at s are each normally distributed, but jointly they're bivariate normal, right? You know that this is a bivariate normal in general. And their product... you've heard of chi-squared? You must have heard of chi-squared. A chi-squared is basically the square of a normal random variable. So this is going to be something related to chi-squared, but I'm not going to ask you about its distribution. I'm just going to ask you about the mean and the variance, the first two moments. And let's go ahead and see if we can figure out what they are. As I said before, we're working under one probability measure, so I'm not going to bother to keep writing the expectation with a superscript P. There's only one probability measure from now until I say there isn't, okay? So let's compute the expected value of this random variable Y. And we're going to try to do this computation without resorting to writing capital X, the Brownian motion, in terms of the little Bernoullis. You could do that, and I could in principle ask you to try it and then take the limit to get the result.
But let's try to work with only the properties of Brownian motion that we listed before. Okay? In other words, use only the distributional property, normal with mean zero and variance t, and the independence and stationarity of increments. Continuous paths, we'll find, are not that useful for any of these computations. They become important later, but not here. Okay, so try to use just those. If we compute this expectation, what are you going to try to do? You know that X at time s is not independent of X at time t, but you do know that the increment from t to s is independent of X at time t. So you can use independence, but you have to use it cleverly. That's zero, that's t, and that's s. What I'm saying is, we know that X at t can be thought of as the increment from zero to t, in fact, because X at zero is zero. And X at s would be the increment from zero to s. These are overlapping intervals, so we cannot use the independence of increments directly. But is there a little trick you could introduce to allow you to use independence? Exactly. Think of X at s as built up of those two increments: from zero to t, and then from t to s. It clearly is equal to their sum. And now I can use the independence of the second increment from the first, and use the distributional property of X at little t itself. So this is equal to the expectation of X t times the quantity X t plus X s minus X t. I've basically written a tautology there, right? I've added and subtracted X at little t in the second factor. But that rewriting has the interpretation I drew in the picture: you now have non-overlapping intervals. And so this is the expected value of X t squared, and, since expectation is a linear operator, plus the expected value of X t times the quantity X s minus X t. And now we can use distributional properties. What's the expected value of X t squared? Remember, X at any time t is normal with mean zero and variance t.
So the first term there is t. What can we do with the second term? We now know that X t and this increment are independent. So you can just separate them: you can write it as a product of two expectations. I can only do that when I know the two things are independent. By the way, that's a mistake that a few people made on the term test: they took a random variable times a function of that same random variable and wrote the expectation as the expected value of the one times the expected value of the function. Clearly wrong; those are not independent quantities. But here I can use independence. X t is independent of X s minus X t, so this and this, these are independent. And what's the expected value of X t? Zero. So in fact, the second term is just zero. So you can now conclude: if I didn't have the restriction t less than s, suppose I had it the other way around, s less than t, what would my answer be? It would be s, right? And if s were equal to t, well, it would just be t. So in fact, the general answer for the expected value of X t times X s, regardless of whether s and t are ordered, is the minimum of t and s. Isn't it? Whichever is smaller, that's the one you take. And if they're equal, well, they're equal. So these are simple basic calculations with Brownian motions. Something that's going to be important when we come back from the break is the fourth moment of the Brownian motion. So let's do another simple example and then take a break. Actually, first of all, how about the expected value of the Brownian motion raised to an odd power, 2n minus 1 say, where n is a positive integer? What's that? Isn't it zero? X is normal. So any odd moment of a normal is zero, right? Of a normal with zero mean, that is. So zero. And really it's because X is symmetric: the distribution of X, normal with mean zero, is symmetric about zero. Okay.
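The min(t, s) formula is easy to spot-check by simulation. A sketch, with t = 0.5 and s = 1.2 as hypothetical choices:

```python
import math
import random

rng = random.Random(11)
dt, n_t, n_s = 0.01, 50, 120      # t = 0.5, s = 1.2 (hypothetical choices)
trials = 20000
acc = 0.0
for _ in range(trials):
    steps = [math.sqrt(dt) * (1 if rng.random() < 0.5 else -1) for _ in range(n_s)]
    x_t = sum(steps[:n_t])        # X(t): the first n_t steps
    x_s = sum(steps)              # X(s): all steps, sharing the first n_t with X(t)
    acc += x_t * x_s
estimate = acc / trials
# E[X_t * X_s] = min(t, s) = 0.5 here
```

The overlap of the first n_t steps is exactly what produces the min(t, s): the non-overlapping remainder averages away to zero.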
What about the expected value of the Brownian motion to the fourth power? That's our first non-trivial calculation for the Brownian motion itself. Anyone happen to know what it equals? Okay, I'll tell you the answer: three T squared. Now, why is that? Let's remind ourselves: X at T equals, in distribution, square root of T times Z, where Z is normal zero one. (Careful not to drop the square root there: the variance is T, so the scale factor is root T.) I'm going to use that decoupling. We know that X at T is in fact normal with mean zero and variance T, but I can always introduce a standard normal so that we work only with standard normals. So this tells me that the expected value of X T to the fourth is T squared times the expected value of Z to the fourth. And you might remember that the fourth moment of a standard normal is three. If you've forgotten that, let's do the quick calculation to demonstrate how you find it. There are a number of ways to do it. One is to integrate by parts, create a recursion relationship, and solve the recursion. Another is to use the moment-generating function, differentiate enough times, and take the limit. Another is to explicitly take integration by parts all the way down. Then there are a couple of other tricks with ways of rewriting integrals. Let's do the moment-generating function one, which is probably the most familiar for everybody. Let G of A be the expected value of e to the A Z. Since Z is standard normal, this is e to the one-half A squared. And what is the fourth moment equal to in terms of this function G of A? It's equal to four derivatives with respect to A of G of A, evaluated in the limit as A goes to zero. Every time I take a derivative, I can put the derivative operator under the expectation, because things converge and I can interchange limits and integrals. One derivative pulls down a Z, another derivative pulls down another factor of Z; four of them give me Z to the fourth.
Put a to zero, so the exponential becomes one, and I just get the expected value of Z to the fourth. So I have that result. Now all I need to do is compute these four derivatives and take the limit. At first it seems like a long calculation; it's not that long. The partial with respect to a of G is just a times G itself, right? The derivative of one-half a squared is a, and you get G back. Two derivatives: the derivative operator hits the a, which gives me a G, and then it hits the G, which brings down another factor of a, giving a squared G. So that's one plus a squared, times G. Three derivatives: the derivative operator hits the one plus a squared, which gives me two a times G, plus it hits the G, which gives me one plus a squared, times a, times G. Simplify, and we get three a plus a cubed, times G. And finally, four derivatives: the derivative operator hits the polynomial out front, giving three plus three a squared, times G, and then it hits the G, giving three a plus a cubed, times a, times G. As a goes to zero, G goes to one, the three a squared goes to zero, every term carrying a factor of a goes to zero, and you're left with three. So that's our fourth moment: a quick and dirty two-minute calculation. And with that result we have the final answer of three t squared for the fourth moment. Okay, so let's take a little break. We'll come back, do a few more small computations, and then look at something interesting. Okay, let's get started again. Just to remind you once more of the basic properties we've shown for this object that eventually becomes what we call a Brownian motion: it starts at zero, it has a normal distribution, it has stationary and independent increments, and it has continuous paths. Those are our key properties.
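The two facts just derived, E[Z^4] = 3 and E[X_t^4] = 3t², are easy to check by simulation. A minimal sketch (my own NumPy code, not from the lecture), taking t = 2 so that 3t² = 12:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(1_000_000)

# Fourth moment of a standard normal: should be 3.
m4_z = np.mean(z**4)

# E[X_t^4] = E[(sqrt(t) Z)^4] = t^2 * E[Z^4] = 3 t^2; here t = 2, so 12.
t = 2.0
m4_x = t**2 * m4_z

print(m4_z, m4_x)
```

Monte Carlo converges slowly for fourth moments (the variance of Z^4 is 96), so a million samples still leaves noise of order 0.01, but the answer clearly hugs 3 rather than, say, 1 or 2.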
Now we've done a few calculations, and if you look at the website, the problem sets have a whole bunch of other calculations you can try. What I'm going to do now is look at another fundamental property of the Brownian motion itself, and I'm going to tell you something that's extremely surprising, well, surprising to me at least, about Brownian motion, and then we're going to try to prove that fact. So here's the question: take a path, just a single path of the Brownian motion, and ask what the total length of that path is. Now what do I mean by the total length? This is what's called the total variation of the process. Total variation, and that's what I mean by the little equals sign with a triangle, it's a definition, is defined in the following way. I'm going to introduce a few symbols here that don't quite make sense yet, but they will in a second. The intuition is that I'm trying to calculate the length of the path. And what the length of the path means is: look at increments of the path, take the absolute values of those increments, and sum them up. So I want to take the sum over k of the absolute values of the increments, k running from one to, say, big n, and then take the limit in which this object, the norm of the partition, goes to zero. Now what is that object? The work that we're going to start doing now requires us to partition time. So pi is called a partition of the time axis. What that means is we break the interval up into a bunch of pieces. They don't have to be equally spaced; they simply have to be ordered. So the partition is a set of points: t zero, which is less than t one, which is less than t two, and so on, all the way up to t n, and t n equals the endpoint.
And t zero is zero. So this will be a partition of the interval from zero to t: I break the interval up into a bunch of subintervals, but they're not necessarily equal. This is one thing I've already started to change from what we did in discrete time. In discrete time we always took the interval sizes to be equal, because it was easy and intuitive to build a model that way. But to show things rigorously, or quasi-rigorously, I'm not going to do the truly rigorous versions of things here, we need to break the interval up into partitions of arbitrary size. And this symbol here is what's called the L-infinity norm. So what's a norm? A norm is a measure of size. In this case, in simple terms, it's defined to be the maximum, really it should be a supremum, but we'll take it to be a maximum, the maximum interval size. So it's a very simple concept. In the picture I just drew, if that's my partition, the L-infinity norm would be this distance: the largest increment. I've drawn one, two, three, four, five intervals with the little tick marks, and the widest of them is the norm of pi. And taking the norm of pi to zero is the analog of what we did in the simple version of things, where we took delta t to zero. Wherever we took delta t to zero before, we now take this L-infinity norm to zero. We're shrinking the maximum interval size to zero, and if the maximum is shrinking to zero, clearly all the others are as well. So the total variation is defined in this way.
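As a tiny concrete illustration of a partition and its L-infinity norm (the partition points here are hypothetical, chosen just for the example):

```python
import numpy as np

# An unequally spaced partition of [0, 1]: 0 = t0 < t1 < ... < tn = 1.
partition = np.array([0.0, 0.1, 0.35, 0.5, 0.9, 1.0])

# The L-infinity norm of the partition is the largest subinterval width.
norm_pi = np.max(np.diff(partition))
print(norm_pi)  # 0.4, the widest gap (from 0.5 to 0.9)
```

Refining the partition means driving this single number to zero, which forces every subinterval to shrink.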
Take an arbitrary partition, and then take the limit of this sum as the L-infinity norm goes to zero; strictly, you should do this over all possible partitions. And here's the surprising thing. Before I tell you the surprising result, let's just guess the behavior of something. Suppose I had a straight line that went from zero to t with slope one. What should the total variation be? What's the intuition? It should be, more or less, the length of that line, right? Strictly speaking, our definition only sums the vertical increments, so for slope one we get exactly t rather than the arc length of root two times t, but either way it's finite: something nice and finite. And if the path meandered a bit, the total variation would be a little larger, because of the meandering, but still finite. Now think about the Brownian motion in that light. You know that the Brownian motion is wiggling around all over the place; it has all of these small little kinks, these small little wiggles. Those little wiggles are very tiny, and you might think that if you add them up, they add up to something finite. But it turns out that's not the case. The Brownian motion has infinite total variation. The length of the path of a Brownian motion is infinite. For any differentiable function, on the other hand, so if x were actually the path of something differentiable, the total variation is finite. In order to understand that contrast, we're going to first do this analysis with differentiable paths. We'll take a path that's differentiable, I'll show you how to compute its total variation, we'll get a nice simple formula, and then we can see what goes wrong for Brownian motion. Okay, so for now, let's call this the differentiable case.
When I say differentiable, I mean the derivative exists for every single t. So if I want to compute this total variation, let's draw a little picture and zoom in on one of the intervals we're looking at. We have a partition, time broken up into a bunch of points; let's look at one little interval, t k minus one to t k. The function is differentiable, so I'll draw some smooth behavior over that interval. If I want to calculate the total variation, what I eventually need to do is shrink the maximum interval down to zero, but for every partition I need to compute the difference between x at one endpoint and x at the other. So what I'm after is this height: that's my increment in x. Let's call it delta x sub k, to simplify our notation. So delta x sub k is, by definition, this increment, and we need to say something about the sum of the delta x k's. Here's the wonderful little first-year calculus trick we're going to use to say something about that sum. Suppose I drew a straight line connecting my start point to my end point: a secant. Do you know a fundamental theorem of calculus that tells you something about that secant? Yes, the mean value theorem. The mean value theorem tells you that there exists some point between the endpoints at which the slope of the function is the same as the slope of the line I just drew. That's one of your fundamental theorems of calculus. Let's see if we can find such a point. The theorem doesn't say there's only one; it says there's at least one, and here we can probably see a few of them.
There's certainly a point right around there, and if I drew the tangent, well, it may take me a few tries to draw it correctly, but more or less you can see there's a point where the tangent is parallel to the secant. There's also another one somewhere over here, maybe about there, something like that. Convinced? I challenge you: draw almost any smooth shape, meaning differentiable, draw the secant, and you will find some point in the interval where the tangent to your curve is parallel to that secant. So let me remind you what the mean value theorem says. It says there exists t k star in the interval from t k minus one to t k such that the derivative at that point equals the slope of the secant. The derivative at the point is the slope of the tangent, and the two lines being parallel means the slopes are equal. So: x prime at t k star equals, and what's the slope of the secant? It's delta x k divided by delta t k, isn't it? Just the height divided by the length. And delta t k here, let me write it in as well, is by definition the increment t k minus t k minus one. So that's our mean value theorem: there is some point at which this is true. There may be more than one, but all we need is that there exists one in the interval. Okay, like I said, we found a couple of examples here; let's mark them in orange. That could be a t k star, and that could be a t k star. Either of them will work, okay? All right, is the concept clear? You probably remember this vaguely: you remember the name, but maybe not what it actually said, and it seems like one of those theorems where you wonder why it's useful.
In first-year calculus you always think: why do I need the mean value theorem? What's it for? Well, now it starts to come into play. Why is it useful? It's useful because it gives us an estimate of exactly the thing we're after: the increment. In fact, not even an estimate; it gives us the exact value of the increment in terms of delta t k and the derivative of the function, which we've assumed exists, this being the differentiable case. So we can write delta x k equals x prime at t k star times delta t k. That's a very useful thing to do, because if we go back to the definition of total variation, I can now replace the sum of the absolute values of the increments by the sum of the absolute values of the derivative times delta t k. The delta t k's are always positive, so that part is easy. So let's fix a partition pi: TV superscript pi means the sum for one particular partition, any partition in fact. That's the sum over k of the absolute value of x prime at t k star, times delta t k. Now, what does that converge to as the norm of pi goes to zero? Let me draw another little diagram, and you'll recognize it immediately. We've got some partition, not necessarily equally spaced, and the heights of the little rectangles are the values of x prime at the t k stars. That might be the t k star for this interval; for that interval it might be toward the left; for this one, right in the middle. These are my little t k stars, and the heights are x prime at those points. That's what the sum is for a fixed partition. Now take the partition norm to zero: what does it converge to? Not zero. It's a Riemann integral, right? It converges to the Riemann integral of x prime.
Actually, I should put an absolute value there: the Riemann integral of the absolute value of x prime, since we don't know that the derivative is positive, only that it exists. So this converges to the integral from zero to t of the absolute value of x prime of s, ds. The derivative exists, so the integrand exists, and then the question is whether the derivative is well-behaved enough that this integral is finite. We simply restrict to functions for which the integral is finite, and then we're done. That's it. If the absolute value of the derivative is Riemann integrable, the total variation is finite. So the total variation is finite for most nice functions. Okay? You see how the proof goes? We're going to use basically the same trick now for the Brownian motion. But for Brownian motion, here's where things will fail: we cannot use the mean value theorem, because the Brownian motion is not differentiable anywhere. So let's try the exercise and see what happens. For Brownian motion, all we can say is that the total variation for a partition pi equals the sum over k of the absolute value of the increment delta x k, and delta x k in distribution is delta t k to the one half times a standard normal. Okay? You do know that. And effectively, we're going to use this quasi-relationship. It tells me that my total variation is, loosely, the sum over k of delta t k to the one half times, and here's a subtlety: I have to use a different standard normal for every term. These are not the same standard normal. Each increment has its own, in distribution, and the increments are all independent, so you have a whole sequence z one, z two, and so on: IID normal zero one. Okay?
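To see the formula TV = ∫₀ᵗ |x'(s)| ds on a concrete differentiable path, here is a small numerical sketch. The test function x(t) = sin(t) on [0, 2π] is my own choice, not from the lecture; its total variation is ∫|cos(s)| ds = 4 (up one, down two, up one):

```python
import numpy as np

# Total variation of the differentiable path x(t) = sin(t) on [0, 2*pi].
# The mean-value-theorem argument gives TV = integral of |x'(s)| ds = 4.
n = 1_000_000
t = np.linspace(0.0, 2 * np.pi, n + 1)
x = np.sin(t)

# Sum of |increments| over a fine (equally spaced) partition.
tv = np.sum(np.abs(np.diff(x)))
print(tv)  # converges to 4 as the partition is refined
```

Within each monotone stretch the absolute increments telescope exactly, so the only discretization error comes from the two interior turning points; with a million steps the sum agrees with 4 to several decimal places.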
Then what I want to do is multiply and divide by delta t k to the one half: write delta t k to the one half as delta t k divided by delta t k to the one half. This is equal in distribution for a fixed pi; we don't even need a squiggly approximately-equals, it's actual equality in distribution. And now I want to bound this from below. Look at the denominators: if I replace each denominator by the largest interval size, the norm of pi to the one half, then every term gets smaller, so I have a lower bound for the sum. Think of one over two plus one over five plus one over six: if I replace them all by one over six, I get a lower bound for that sum. So that's exactly what I'm going to do: replace the piece I've circled by its maximum, which gives a lower bound for the whole sum. So this is greater than or equal to one over the norm of pi to the one half, times the sum over k of delta t k times the absolute value of z k. Okay? And here's where we get to the tricky issue: what does that sum converge to as the norm of pi goes to zero? Something finite, something not finite? That's the question. It's a random variable, clearly, not a number, because the z's are random variables; we have a sum of independent normals, each multiplied by a delta t k, and I need to figure out how that thing behaves. We're going to cheat a little, and I'm simply going to tell you that this sum can be shown to converge to something finite and positive. That's the part I don't want to get into. And then it's the term out front that blows up as pi goes to zero.
One over the norm of pi goes to infinity as the norm of pi goes to zero, so the whole lower bound diverges to plus infinity, and the total variation is infinite. If you don't like this argument, go back and think of how the Brownian motion was built: it's something that goes up or down by the square root of delta t, right? So what is the absolute value of z k when you think of it in terms of the tree? It's one. The normal random variable is just the limit of going up or down by plus or minus one with probability one half. So this z is nice and simple and bounded, and when you go through the argument, think of its absolute value as being one: then the sum is just the sum of the delta t k's, which is t, clearly finite. When you do this properly and look at what random variable the sum converges to, you can show that it's bounded, with finite mean and finite variance, and then the norm going to zero gives us plus infinity. Okay? Now, this argument I will never ask you to reproduce. Not this one; in fact I have never yet asked, and will not ask, about total variation. What I will ask about is the next thing we do, which is called quadratic variation, because that one we can go through rigorously all the way. So what's our conclusion? For Brownian motion, the paths have infinite total variation. For differentiable functions, the paths have finite total variation. So we can conclude that Brownian paths are not differentiable: if a path were differentiable, its total variation would be finite. In fact, Brownian motion is not differentiable anywhere.
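A quick simulation makes the divergence visible. For an equally spaced partition with n steps (the simplest case, which is all this sketch covers), each absolute increment is about sqrt(T/n) times E|Z|, so the total variation grows like sqrt(n); the code below is my own illustration, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1.0

def total_variation(n):
    """Sum of |increments| along a simulated Brownian path, n equal steps on [0, T]."""
    dx = np.sqrt(T / n) * rng.standard_normal(n)
    return np.sum(np.abs(dx))

# Refining the partition makes the path "longer", roughly like sqrt(2*n*T/pi).
tvs = [total_variation(n) for n in (100, 10_000, 1_000_000)]
print(tvs)  # roughly 8, 80, 800: grows by 10x per 100x refinement
```

No matter how far you refine, the sum keeps growing by a factor of sqrt(100) = 10 for each hundredfold refinement; in the limit it blows up, which is the infinite total variation.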
Because if it were, we would have a finite total variation. And this is true for every single little t: over any interval from zero to t, no matter how small, the total variation is infinite. All right. Sorry, I told you it was going to get a little abstract; this is part of that. We'll come back to something useful in a bit. Okay, let's look at quadratic variation now, the next and last topic for the day, basically. The quadratic variation of a process is always written as square brackets, x comma x, subscript t. It is defined exactly the same way as total variation, except that instead of summing the absolute values of the increments, you sum their squares; that's why it's called quadratic. Same idea: put down a partition, take the limit in which the norm of the partition goes to zero, and sum over the squared increments. I'm shorthanding the notation now: delta x sub k means x at t k minus x at t k minus one, and we square it. Let's see what the results are going to be. For continuously differentiable functions, what do you think the quadratic variation is? Now you're summing the squares of the increments and taking the partition norm to zero. It turns out to be zero, exactly zero, for continuously differentiable functions. Let's see why. Once again, we use the mean value theorem, and we can replace delta x k by the result we had before for differentiable functions: delta x k is x prime at t k star times delta t k. So each squared increment is x prime at t k star, times delta t k, all squared. That's what we had before via the mean value theorem. And now here's a trick that's going to keep recurring. Take the sum, call it S: the sum over k of x prime at t k star squared, times delta t k squared.
What I'd like to do is show that this sum is bounded by something, and that the bound goes to zero. The trick is to take delta t k squared and break it into two pieces: delta t k times delta t k, a triviality. Then replace one of the two factors by the norm of the partition. If I make that replacement, I've found an upper bound for the sum. Do you agree? I've replaced one factor of delta t k by the maximum it could possibly be, so the sum has to be smaller than the result. So S is less than or equal to the norm of pi, times the sum over k of x prime at t k star squared, times delta t k. Agreed? Okay, great. Now, what does this remaining sum converge to? Again, it has the form of a Riemann sum: some function evaluated at a point of each subinterval, times the width of the subinterval. It's exactly the same picture as before, except that the heights are now x prime squared. So it converges to the integral from zero to t of x prime of s squared, ds, and we have to make an assumption on the integrability of the square of the derivative. If the integral of the squared derivative is finite, then the whole bound goes to zero. Why? Because of the norm of pi out front: the sum converges to something finite, and the norm of pi goes to zero, it doesn't matter whether fast or slow, so the product goes to zero. So that gives us an upper bound. And is there a natural lower bound? Well, the quadratic variation is a sum of squares, so it has to be greater than or equal to zero. So we know it's greater than or equal to zero, and we've just shown the upper bound: it's less than or equal to zero.
That's the upper bound: less than or equal to zero. Everyone okay with that? We have an upper bound, which gives us the less-than-or-equal-to-zero, and by the definition of quadratic variation it's a sum of non-negative things, so it has to be zero or positive. And so you can conclude it has to be exactly zero. Okay, so let's go through this same analysis now for the Brownian motion case. Differentiable functions have zero quadratic variation; Brownian motion is going to give a surprising result. Once again, take a fixed partition, indicated by the little superscript, and the quadratic variation for that partition is the sum over k of delta x k, all squared. We cannot use the mean value theorem here. So let's see if we can get some intuition about what the result could possibly be. Recall once again that in discrete time we go up and down by the square root of delta t. So suppose I could simply replace delta x k by plus or minus the square root of delta t k. When I square that, I always get delta t k, whether it was the plus or the minus. Agreed? This is all very hand-waving; I'm not doing the analysis yet, just motivating what the answer should be. If I could do this replacement, then when you square each increment you get delta t k, and what do those sum up to? t. That's a very bizarre thing, right? You think: okay, so is the answer going to be t?
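The differentiable-case conclusion is easy to check numerically before moving on. Reusing x(t) = sin(t) on [0, 2π] as a test function (my own choice for illustration): the bound says the quadratic variation is at most the partition norm times ∫ x'(s)² ds, so it should shrink in proportion to the step size:

```python
import numpy as np

def quadratic_variation(n):
    """Sum of squared increments of x(t) = sin(t) over n equal steps on [0, 2*pi]."""
    t = np.linspace(0.0, 2 * np.pi, n + 1)
    x = np.sin(t)
    return np.sum(np.diff(x) ** 2)

# Bounded by ||pi|| * integral of cos^2 = (2*pi/n) * pi, so refining by 100x
# shrinks the quadratic variation by roughly 100x.
print(quadratic_variation(100), quadratic_variation(10_000))
```

Each hundredfold refinement divides the result by about a hundred, heading to zero, exactly as the argument predicts for a differentiable path.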
Then you look at that and you realize: delta x k is actually a normal random variable in distribution, and we have a sum of the squares of a whole bunch of normal random variables. If this result is true, then somehow the sum of the squares of all those normal random variables adds up to always give you t. A whole bunch of random numbers adding up to something deterministic, something not random. That's bizarre, I think, very bizarre. And it turns out to be correct: it really does converge to t. So that's our goal: to show that this, in fact, becomes equal to t. And the way we're going to show it is a tool that we will use again and again. We'll show that the quadratic variation for the partition, minus t, which is a random variable, because I'm summing up random variables, converges to zero almost surely, via the law of large numbers. And how are we going to implement the law of large numbers here? Call this random variable r superscript pi. To show that it converges to zero almost surely, we need to show that the mean of r pi is zero, or at least that the mean converges to zero in the limit as the norm of the partition goes to zero, and that the variance of r pi converges to zero. If I have a sequence of random variables whose means go to zero and whose variances go to zero, the sequence has to have a limit of zero almost surely; there's no choice. The variance is shrinking, the mean is shrinking, and the distribution is becoming peaked at a single point. So that's how we will implement the law of large numbers here.
Does that goal seem reasonable? Does the motivation for what the answer should be seem reasonable? Now we have to check that it's actually true. Okay, so let's go ahead. What is r for a particular partition? Well, we have the quadratic variation written up there: r pi is that sum minus t, and here's our trick. Write t as the sum of the increments of t. We have a partition of time, right? So I can simply replace t by the sum of its increments. That sum has to equal t, because that's the definition of the partition: it starts at zero and ends at t, so if I sum up all the increments I get t. And why is that useful? Hopefully you start to recognize that I can collect the two sums together, because they're over the same partition. Okay? Now, what can you say about the mean of r pi? Delta x k is normal with mean zero and variance delta t k, isn't it? And mean zero means the variance equals the expected value of the square, so the expected value of delta x k squared is in fact delta t k. So the mean is zero. It is identically zero, not just zero in the limit: every partition, any partition, has a zero-mean error. That's kind of cool. Let's write it out in a few steps: first using linearity of expectation, and then, since delta x k is normal with mean zero and variance delta t k, each expected value of delta x k squared is delta t k, so every single term in that sum is zero, and the whole thing equals zero. Are we good? Okay, now the last bit of the calculation: the variance of the remainder. The remainder is a sum of terms that are independent of one another, since each delta x is independent of the others by the independence property of the Brownian motion. Agreed? So the variance of the sum is the sum of the variances, by independence.
Now, before, when we were doing calculations of this kind, the Bernoullis building everything were all identical. But here the partition intervals are not necessarily identical, so I cannot say this equals n times the variance of one term; that's not true. We have to work out the variance term by term for every partition. First of all, each term is the variance of a random variable minus a deterministic quantity, delta t k, which is not random, so it's the same as the variance of delta x k squared. So let's work out the variance of each term individually. That variance is the expected value of delta x k squared, squared, which is the fourth moment, minus the square of the expected value of delta x k squared. Remember earlier I said we were going to use the fourth moment: you can see it coming in now. We've already worked out that the first term is the fourth moment of the standard normal, which is three, times the square root of delta t k to the fourth, which is delta t k squared: so three delta t k squared. And the second term: the expected value of delta x k squared is delta t k, and when we square it we get delta t k squared. So each term contributes three delta t k squared minus delta t k squared, which is two delta t k squared. Continuing: the variance of r pi is the sum over k, and we can pull out the factor of two, of delta t k squared. What do you think the trick is now to argue that this goes to zero? Yes, we need an upper bound, exactly, and it's the same trick we used earlier: think of delta t k squared as delta t k times another delta t k, replace the second factor by its maximum, the norm of pi, and that gives an upper bound. And why am I allowed to do that? Because the norm of pi doesn't depend on k, it factors out of the sum.
The sum in the round brackets is then exactly t, which is finite and doesn't change as the partition changes. And as the norm of pi goes to zero, the bound goes to zero, so the variance goes to zero. Therefore, since the variance goes to zero and the mean is zero, we can conclude, by the law of large numbers, that r pi goes to zero almost surely. And that implies that the quadratic variation equals little t almost surely, because r pi was the difference between the quadratic variation for a fixed partition and little t. I think that's a miraculous result. What it means is this, and there's a connection with the variance of a Brownian motion. Let's look at sample paths. Suppose we take a path of a Brownian motion, any fixed path. If I compute the sum of the squares of the increments along that fixed path, I get t. And if I run another path, again, for that fixed path, I get t. But there's another place where this linearity in t shows up. If I generate many sample paths, let me draw another little diagram here, and look at the terminal values, that is, the Brownian motion at time t over several scenarios, we know that this is normal with mean zero and variance t. So the linearity in t shows up both in the variance of the Brownian motion and in the sum of the squares of its increments. There's a tie-in between the two. But they're fundamentally different: the quadratic variation is satisfied on a path-by-path basis, while the variance is something you observe only by generating many paths. They're dual effects that happen to have the exact same outcome: the variance is linear in t and the quadratic variation is linear in t. It's quite surprising.
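The path-by-path nature of the result is worth checking directly. A minimal sketch (my own code; equally spaced partition, T = 1): each individually simulated path gives a sum of squared increments very close to T, not just on average:

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 1.0, 100_000

# Quadratic variation along three separate Brownian paths.
# Each path, on its own, gives sum of (increments)^2 close to T = 1.
qvs = [np.sum((np.sqrt(T / n) * rng.standard_normal(n)) ** 2)
       for _ in range(3)]
print(qvs)
```

The three numbers agree with 1.0 to within a fraction of a percent: random inputs, nearly deterministic output, which is exactly the "bizarre" phenomenon the law-of-large-numbers argument explains (the standard deviation of each sum is sqrt(2T²/n), here about 0.0045).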
So why don't we take a short break there, and we'll come back and finish up the lecture. What I'd like to do for the last little bit is to convince you of these results numerically. We'll generate sample paths, and along a single sample path compute the quadratic variation and the total variation, and we'll see how the total variation blows up while the quadratic variation becomes t almost surely. So let's generate a Brownian sample path. First of all, we're going to take little random variables; these are our normal(0,1)s. We'll start off with, say, just ten steps, and let's make time be one year. So dt is t over n, and our X is going to be the cumulative sum of these guys multiplied by the square root of dt, and we'll prepend a zero so the path starts at zero. Let's just plot X. Okay. Hopefully I don't get an error. I do. What did I do? The dimensions aren't consistent; I probably need to take a transpose. There we go. Okay, good. So this is our very, very poorly sampled X process, from zero to t in steps of dt. Let's run it again. Okay, that's our very poorly sampled X process, or Brownian motion. And if we increase the number of steps, of course, we'll get better and better. Let's call that figure one. Okay, that's with a hundred steps. Then a thousand steps, ten thousand steps, a hundred thousand. So we can see we're getting this sort of Brownian path, and these very small fluctuations add up to make the paths longer and longer. The way you see that is, when you zoom in, the path basically looks the same as it did when you zoomed out, right? It has all these little wiggles, and no matter how much you zoom in you'll always see those little wiggles, and those little wiggles are what add up to make the total variation infinite. So let's just compute the total variation for the process.
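The in-class demo is done live in MATLAB; here is a minimal Python equivalent of the path construction just described (the seed and the names T, n, Z are my own):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1.0                # horizon: one year
n = 100_000            # number of time steps
dt = T / n

Z = rng.normal(size=n)                  # i.i.d. N(0,1) draws
# Brownian path: start at zero, then cumulatively sum sqrt(dt)*Z increments
X = np.concatenate(([0.0], np.sqrt(dt) * np.cumsum(Z)))
t = np.linspace(0.0, T, n + 1)
# plotting t against X reproduces the sample-path figure from the lecture
```

Re-running with n = 10, 100, 1000, ... shows the same progression from the crude ten-step path to the finely wiggled one described here.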
So dX, this is our increment in the X process; that's how I compute a vector of increments. Then we sum the absolute values of them and display that. That'll be the total variation for that path: 1.5, or whatever it is. Run it again for ten steps: 2.5. Run it again. I can do this many, many times. Okay, so it's some number; that's not the interesting thing. The interesting thing is taking increasing n. Does it converge? Now it's 8.9. A thousand: now it's 25. Ten thousand: now it's 80. A hundred thousand: now it's 252. It just keeps increasing; you can clearly see an increasing trend here. Eight hundred. It doesn't grow very quickly, but it grows, and in the limit it'll blow up. Okay, so now we look at the quadratic variation. What I'll do is actually compute the quadratic variation for every t. So I'll sum up the squared increments from zero to the first step, zero to the second, to the third, to the fourth, all the way along, and we'll plot that. And what that should become is a straight line: it should become t. So the quadratic variation will be the cumulative sum of the increments squared. Okay? Let's call that figure 2, and we'll plot again from 0, in steps of dt, up to t. I think I might have an off-by-one here in qv. Okay, that's figure 2. So this is the path, and this is the quadratic variation along that path: the sum of the squares up to the first step, to the second step, to the third step, et cetera. It's kind of a straight line. It doesn't look that great. Let's run again. Now it certainly doesn't look like a straight line in this scenario, but that's because we only have 10 steps. Let's go to 100. This is our sample path; here's the quadratic variation. Starting to look better, right? In fact, for comparison purposes, why don't we put the straight line on the plot? It should be right along that line. Another scenario. So it gets close, even with 100 steps. Take a thousand.
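Both computations just demonstrated can be sketched in Python as follows (seed and variable names are mine; the total-variation values will differ run to run but show the same growth):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1.0

# Total variation: sum of |dX| over finer and finer partitions.
# It keeps growing, on the order of sqrt(2*n/pi), consistent with the
# increasing values seen in class.
for n in (100, 1_000, 10_000, 100_000):
    dX = np.sqrt(T / n) * rng.normal(size=n)
    print(n, np.abs(dX).sum())

# Running quadratic variation along a single fine path: the cumulative sum
# of dX^2, which should hug the straight line qv(t) = t.
n = 100_000
dX = np.sqrt(T / n) * rng.normal(size=n)
qv = np.concatenate(([0.0], np.cumsum(dX**2)))
print(qv[-1])   # close to T
```

Plotting qv against the time grid, together with the line y = t, reproduces the comparison made on screen.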
This is our sample path, and that's our quadratic variation. We can clearly see it's approaching that exact line more and more. And if I do 10,000, then 100,000, it's pretty much right on top. So hopefully you're convinced, numerically even, that these limits actually do work out; they really are giving us the result we wanted. All right. And as I pointed out, this is path-by-path behavior. It's not the same thing as just looking at X at the end point and squaring it; you actually have to sum the squared increments as you move along. All right. There's not really much I can quiz you on today, because I think you need time to absorb this, so I'm probably not going to give you a quiz today. We'll just continue with the lecture. But that doesn't mean the class is over; sorry, there's still subject material to cover. What I do want to do is start to cover transformations of Brownian motions. I want to answer the question: how can we use Brownian motions to build other processes? That's what we want to answer. And thinking about differential equations, since the connection with differential equations is where I want to go, I'm going to start by using Brownian motions to build other processes by specifying how the increment of the new process is related to the increment of the Brownian motion. So I want to give a sense for the following type of equation. Suppose we have some new process Y, and I want to write the relationship dY equals, for example, X_t dX_t. This is going to give us our first insight into something called Ito's lemma and Ito processes. So suppose I wanted to make actual sense of this. Now, why do I say make sense of it? Well, does this object even make sense, the d of X_t?
X was a Brownian motion, so let me just be clear here: X_t is a Brownian motion. We already know that it's not differentiable anywhere, so even writing this equation down is, strictly speaking, nonsensical. So we need a formalized way of saying what we really mean by it. And what we really mean by it is an integral. Just as with ordinary differential equations, we can always reinterpret these as integral equations. In other words, we want to look at the difference of the process Y over a time interval, and we want to build that difference out of some notion of a stochastic integral. So let's write down: the integral from 0 to t of X_s dX_s. I need to define properly what that integral on the right-hand side is. And given what we've already done, what would you naturally write down as a reasonable definition of this integral? Is there something natural? Take a partition, and write the integral as the limit, as the norm of the partition goes to zero, of the sum of X times the increment of X. That seems like a reasonable definition. I'm not defining everything in full generality yet, we're doing it through an example, but more or less, the general version will be this. And here's the tricky point. The increment of X here is X at t_k minus X at t_{k-1}, and it's very important that the factor multiplying it is evaluated at the left endpoint: that's X at t_{k-1}. When you define things this way, it's also possible to evaluate at some other point in the interval from t_{k-1} to t_k; I can put X at some other point in there, and it's perfectly well-defined. One version is to take the midpoint, and that leads to something called the Stratonovich integral. This one here, with the left endpoint, is what's called an Ito integral. And these Ito integrals are what are used in finance. The reason they're used in finance is that, later on, we'll imagine that the term in red is basically going to be something like a position in a portfolio.
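In symbols, the left-endpoint definition just described can be written as follows (a sketch in the lecture's notation, with pi denoting the partition):

```latex
% Ito integral: evaluate the integrand at the LEFT endpoint of each interval
\int_0^t X_s \, dX_s
  \;:=\; \lim_{\|\pi\| \to 0} \sum_{k} X_{t_{k-1}} \bigl( X_{t_k} - X_{t_{k-1}} \bigr).
% Evaluating instead at the midpoint (t_{k-1} + t_k)/2 gives the
% Stratonovich integral, which is generally a different object.
```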
And when you take a position in a portfolio, you hold it constant over the next interval. What you gain is the position you hold times the increment in the asset price: it's the asset price that changes over the interval, not your position. So that's why the left-endpoint version of the integral is what's useful for finance. You could, in principle, define it to be something else, but that's the way we're going to define it. And what do you suspect the answer to be? Suppose X were differentiable; what would the answer be? Isn't it just one half of X squared at t minus X squared at zero? Because if X were differentiable, I know that the derivative of X squared is twice X times the derivative of X, right? I know that. So if I integrate this, I integrate that, and the fundamental theorem of calculus says the integral of a derivative, going from zero to t, is just X squared at t minus X squared at zero. So it seems as if the integral of X dX should be one half of X squared at t minus X squared at zero. Right? That's what it seems like. If things were differentiable, I could do that. But X is not differentiable; we know that. So this answer could, in principle, be something different from that. Let's investigate numerically what it is, rather than going through the actual derivation, because it leads to some interesting guesses. Now, those of you who've seen Ito's lemma probably know what the result is, but you may not have actually seen it derived, or even simulated. So in this plot, what I'm going to do is compute that integral for an equally spaced partition, and we'll plot that sum together with our guess for the answer, one half X squared. We'll plot both of them side by side and see how they look.
So we need to compute X dX; this is the thing we're computing our increments for. We need the left endpoint, which is this, and we're multiplying it by the increment of X, which we've already computed before. The actual integral we want is then the cumulative sum of that; we just sum it all up. Okay? And our guess for the answer, which is basically X squared, I'll call X2; that just equals X squared. And we plot those two things on the same figure, from dt up to t, keeping the integral in blue and our guess in red. Okay, there we go. Let me put a legend here: this is the actual integral, and this one is, oh, sorry, we want one half X squared, don't we? That's what our guess actually was, so let me put a factor of 0.5 here. Okay. So they kind of move together. They do look like they move together in this particular sample path, but they're definitely different. Do you agree? They're definitely different. Let's run another sample just out of curiosity. You can see they are still quite different from one another. So let's take a look at what the error is: take the difference between these two curves. And let's not do 10,000; let's just do 100 steps. So we take the integral, subtract half X squared, and plot that. This is our error, here. It looks pretty noisy, but it's trending downwards. Let's increase the number of points. Okay? So here were the two sample paths, and here's the difference. It still looks like it's trending downwards, and also still kind of noisy. Take 10,000. You see what it's becoming? That's the error. What do you think the limiting case is? A straight line with slope minus one half. Right? So the integral minus our guess equals a straight line, and it goes through the origin as well: minus t over two. So that's what we're going to use as the basis for our analysis. We're going to see: is it actually true?
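The left-endpoint (Ito) sum and its comparison against the calculus guess can be sketched in Python like this (seed and names are my own; the identity err = -(1/2) * running quadratic variation makes the result easy to check):

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 1.0, 100_000
dt = T / n

dX = np.sqrt(dt) * rng.normal(size=n)          # Brownian increments
X = np.concatenate(([0.0], np.cumsum(dX)))     # the path itself

# Ito sum: integrand evaluated at the LEFT endpoint, X[t_{k-1}], times dX_k
ito = np.concatenate(([0.0], np.cumsum(X[:-1] * dX)))
guess = 0.5 * X**2                             # the "ordinary calculus" guess

err = ito - guess                              # should approach the line -t/2
print(err[-1])                                 # close to -T/2
```

Plotting err against the time grid reproduces the nearly straight line of slope minus one half seen on screen; algebraically, each partial Ito sum equals one half X squared minus one half the running sum of squared increments, which is exactly why the error converges to minus t over two.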
So it seems like the integral from 0 to t of X_s dX_s is equal to one half X_t squared minus one half t.