Ok, so for today our goal is going to be continuing our discussion from last week, where we were talking about Brownian motion and looking at certain of its properties. Ok, can we settle down please? Great, thank you. So as I mentioned, we're going to continue our discussion of Brownian motion from last week. I'm going to review some of the main points that we talked about, then we'll go on to some connections with finance a little bit, and then I'm going to come back to this idea of how to define a stochastic integral. At the end of last class we talked about a particular stochastic integral, and we looked at a computer experiment to show that that stochastic integral isn't what you would expect it to be from standard calculus. So by the end of today, hopefully we'll be able to talk about a concept which comes from a little lemma (which probably shouldn't be called a lemma, but that's the name that's been used) called Itô's lemma, and we'll see where that comes from by the end of the class. Ok, so quick review: Brownian motion. What is Brownian motion? Let's recall. I'm going to start using the symbol W for the Brownian motion because it's a little more standard than X. We started using X simply because historically we've always used little x for the Bernoulli random variables that built the tree, and when we took the limit we saw some properties of that process, so we kept X as the symbol for the continuous-time limit. But now we're starting directly in continuous time, so I can throw away that notation and start afresh. Brownian motions are also known as Wiener processes, and that's where the notation W sub t is often used; sometimes you'll see it written as B sub t. By the way, does anyone happen to know where Brownian motion was first observed and described? Was it in a physics context, a finance context, a biology context? Yeah? Particle physics.
It's actually well before particle physics. It was actually a botanist, somebody who was studying plants, who noticed Brownian motion. They were looking at pollen suspended in water, and they noticed that the pollen was moving around erratically even though the liquid was not moving at all. Brown is the name of the individual who noticed this, and that's why it's called Brownian motion. He did not have an explanation for it; Einstein later came up with an explanation for it as bombardment of the pollen by the molecules of the surrounding liquid, knocking it around. So that's a little bit of historical reference there, and Wiener is the mathematician, also a physicist, who formalized the theory. So the Brownian motion we're going to denote as a stochastic process by W sub t, and there are a few main properties, as we discussed last time. First, it starts from zero. Of course you can always start it from a point that isn't zero, but that's the standardized version of the Brownian motion: it starts from zero. Second, its distribution at every point in time t is normal with mean zero and variance t. Third, the increments are stationary and independent. Now remember, intuitively what the stationarity means is that if you look at an increment of the process at some point in the future, you can slide that window back to time zero and it has the same distribution. The distributional properties depend only on the size of the window. That's what the stationarity part means. So I'll put a little more detail here: for example, W at t plus s minus W at t is equal in distribution to W at s. It only depends on the size of the window and not on the location of where you start observing the process.
And the independence property, so this is our stationarity property, and the independence property says that if you look at the increments of the Brownian motion over two distinct intervals of time, then this one is independent of that one whenever the t's are all ordered, so that the interval t3 to t4 does not overlap with the interval t1 to t2. If there's no overlap, then you have independence between the increments of the Brownian motion over those two intervals. That's one very important property. And then there was a final one. Does anyone remember what that final property happened to be? Continuous paths. Paths are continuous, or the process has continuous paths, however you'd like to state it. So these are the workhorses of Brownian motion. If you have these pieces of knowledge about the process, you can do pretty much anything with it. And we went through a couple of calculations, a couple of examples last time on computing moments of Brownian motion, the moment of the square and so on. Before going on to doing some more moments, one thing that I didn't do last time is show you what many sample paths look like. I kind of sketched it, and maybe I'll sketch it again and then I'll implement it and show you, because it's an important feature. So if you just generate one sample path from the Brownian motion, we all kind of know what it looks like: some sort of wiggly process. And if you generate many of these sample paths, you'll see, because of the property that the Brownian motion is normally distributed with mean zero and variance t, that most of the paths lie in an envelope. And this envelope here is the pair of curves minus square root of t and plus square root of t.
And you can of course look at a couple of other multiples of the standard deviation. Plus or minus square root of t is the one-standard-deviation cone, so about 68% of the paths lie within it. If you did plus or minus two square root of t, you'd find that about 95% of the paths are inside that cone, and if you did plus or minus three, about 99.7% of the paths are in that cone. So I'll show you what that looks like numerically. But before going there, I still want to remind you of the other main features. The other feature that we observed is that if you just take a single path and compute the total variation of a Brownian motion, do you remember what the result was? So you basically compute the total length of the path. The idea was that the Brownian motion is wiggling around far too quickly, and those increments don't actually converge when you add them up: the sum becomes infinite. So the total variation along any path is infinite. And there was another important property that we demonstrated: we looked at this idea of the quadratic variation of the Brownian motion, and we demonstrated that this was what? It was t, almost surely. Now, what we actually proved in class is convergence in L2, but for those who want, I can show you how there's a subsequence along which it does also converge almost surely. So we showed these results along an individual path, and it didn't matter which path you took. But if you take a sample over many, many, many paths and look at the distribution at one fixed time t, then because of property number two, the variance of the end points of those paths is t as well.
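A quick numerical sketch of these two variation results (a Python sketch, not the MATLAB used later in the lecture; the grid sizes here are arbitrary choices of mine): as the partition of [0, T] is refined, the sum of squared increments settles near T, while the sum of absolute increments grows without bound.

```python
# Sketch: quadratic variation of a Brownian path converges to T,
# while total variation diverges, as the time grid is refined.
import numpy as np

rng = np.random.default_rng(0)

T = 1.0
for n in [100, 10_000, 1_000_000]:
    dt = T / n
    dW = np.sqrt(dt) * rng.standard_normal(n)   # increments ~ N(0, dt)
    qv = np.sum(dW**2)                          # quadratic variation estimate
    tv = np.sum(np.abs(dW))                     # total variation estimate
    print(n, round(qv, 3), round(tv, 1))        # qv -> 1.0, tv keeps growing
```

Each increment has |dW| of order sqrt(dt), so n of them sum to order sqrt(n), which explains the divergence of the total variation.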
But the variance of the Brownian motion at a fixed point in time being equal to t is of quite a different nature than the quadratic variation being equal to t. Remember, the quadratic variation result is true for any single path of the Brownian motion, while to look at the variance you actually have to simulate many paths and look at the end points. So these are quite different objects, although the numerical result is the same. Okay, so I just wanted to do a quick little computer experiment to ground this, to demonstrate this envelope that I mentioned. So what I'm going to do is create another piece of code that's going to generate many sample paths. We'll have a bunch of little random variables. Let's do this over one year; say the number of steps that we take is, I don't know, 500, and the number of paths that we'll generate, let's take 1,000 for now. So I'm going to create some random numbers where the number of rows is the number of simulations I've got and the columns are the time steps; in fact, I only need N dt of them. And then we'll store that in this little variable called W. Okay, and I'm going to do this not in the most efficient way, but in a way that is easy to understand. So we'll step through time: i equals 1 is the first step, i equals 2 is the second step, i equals 3 is the third step, et cetera. So I'll build the Brownian motion for all scenarios starting from time step 2, because these are indices starting from 1, so t equals 0 is actually i equals 1, and t equals delta t is i equals 2. So I'm going to start from i equals 2.
So the Brownian motion is equal to what it was before, plus an increment: we take a normal random variable, and you know what, to make it really simple, let me remove it from there and just put it in here. We take a normal random sample, with N simulations of those things, and this is supposed to be scaled by square root of dt. So what I've highlighted here is the increment of the Brownian motion from time step i minus 1 to time step i. And we know that the increment of the Brownian motion is normal with mean 0 and variance delta t, so what I've highlighted generates exactly that. The function randn generates standard normals; this is like the Z that we often use for a standard normal. Okay, so this will generate many, many sample paths, and then at the end I'll simply plot these things. And I think I need to put a dt here: T divided by N dt. Okay, so we'll plot them with time on one axis and the Brownian motion on the other, and hopefully that runs. Simulate the Brownian paths. Okay, good, there we go. So that's a thousand sample paths, and you can see visually this idea that I was telling you about, this little envelope that the paths are contained within. To make that envelope even more obvious, I can put a little formula in here. In fact, why don't I call that vector t, and put in something which is going to give me the lines plus and minus square root of t, our standard deviation envelope. Hmm, made an error; I don't know, it's already run. Okay, the black line doesn't show up that well there, does it? Maybe red will work. That's worse. Okay, I know one potential way to do this a little bit better. Okay, I think you can see it more clearly there, and then if I did plus or minus two standard deviations, we'll get most of the paths in that interval. There we go.
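For reference, here is a rough Python equivalent of the MATLAB experiment just described (the variable names and the seed are my own): build the paths by cumulative sums of N(0, dt) increments, then count how many endpoints fall inside the one- and two-standard-deviation envelopes.

```python
# Sketch of the lecture's path-simulation experiment in Python.
import numpy as np

rng = np.random.default_rng(42)

T, n_steps, n_paths = 1.0, 500, 1_000
dt = T / n_steps

# Each row is one path; column 0 is W_0 = 0 and each later column adds
# an independent N(0, dt) increment.
dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
W = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)])

# Fraction of endpoints W_T inside the one- and two-standard-deviation cones
inside1 = np.mean(np.abs(W[:, -1]) <= np.sqrt(T))
inside2 = np.mean(np.abs(W[:, -1]) <= 2 * np.sqrt(T))
print(inside1, inside2)   # theory: about 0.68 and about 0.95
```

Plotting the rows of `W` against the time grid, together with the curves plus and minus sqrt(t), reproduces the envelope picture from the lecture.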
So you can see that's about 95% of the paths in there, with about 5% coming outside. Okay, and like I said, let's take the variance at the time slice t equals 0.8 for the sample paths. So I take the Brownian motion, all scenarios, at 0.8. I've taken how many steps here? 500. So 0.8 would be at step number 400, right? Except with the index offset it should really be 401. So this gives me all of the results at the time slice t equals 0.8, and if I compute the variance of this, what should I find? It should be about 0.8, right, up to simulation error. It's about 0.76, which is not too bad. If I ran it again... that's a thousand scenarios. You can actually compute what your error should be. You're all statisticians, or you've done a lot of stats: you've got a thousand scenarios from something that's normally distributed, and you should be able to compute what the standard error is and whether this result is in the confidence interval. Do you reject the hypothesis that this variance actually is 0.8? It's a test you can all do; I'm not going to ask you to do it here, but it's something you should all have the technology to do. Let's check at a half. So we have, again, 500 steps, so that's index 251. And that's about a half; that one's even closer. Let's take 0.9, so 0.9 times 500 plus 1. It's 0.88, so pretty close. So we can see the scenarios are in fact giving us results reasonably close to what the theory says. The more sample paths we produce, the more accurate the answer will be. So let's say 100,000 paths. Now, I think MATLAB should be able to handle plotting 100,000; it's the plotting that actually takes the longest time, the scenario generation was very quick. Okay. Caught unexpected exception. Yeah, so it's got too many paths. It can't plot it.
And I've crashed my MATLAB. Okay. Oh, it had it. Did you see that? It flashed on the screen and then disappeared. Let's do 10,000; it should have no problem with 10,000. Okay. There we go. So now, of course, it's much more densely packed, as we can see. And let's just double check a couple of these results. Say at t equals 0.9 again. Now we're at 0.883. I'm surprised that it didn't improve a huge amount. Well, okay, how should the error go, by the way? How should it scale as I increase the number of paths? If I wanted my confidence interval to shrink by a factor of 10, how many more samples would I actually have to generate? A hundred times more, right. The standard error scales like one over the square root of the number of samples: variances of independent samples add linearly, so the standard deviation of the average shrinks like the square root. So here I've increased the number of paths by 10 times, and I'm only going to get an increase in accuracy of square root of 10. That's why it's not a huge change. In any case, I think this is enough to serve the purpose of illustration. Any questions about this little sample generation? Nope. Okay, good. So let's go back to the board here. One of the other things I wanted to do is talk about how to connect this Brownian motion back to some of the financial models that we were looking at before. We did discuss one connection: we said that if we looked at the CRR model and took its continuous-time limit, we get a log-normal distribution for the terminal asset price. And in fact the increments were independent, stationary, and normally distributed if we look at the log of the stock price. So the thing that drives the asset price dynamics in the continuous-time limit of the CRR model is a Brownian motion. We saw that development in the last class.
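The square-root error scaling can be checked directly. A small Python sketch (illustrative values, my own seed): since we only care about one time slice, we can sample W_t directly from N(0, t) instead of simulating whole paths.

```python
# Sketch: the sample variance of W_t converges to t, with standard error
# shrinking like 1/sqrt(n_paths); 10x more paths buys only sqrt(10) accuracy.
import numpy as np

rng = np.random.default_rng(1)
t = 0.9   # time slice; Var(W_t) should be t = 0.9

for n_paths in (1_000, 100_000):
    # At a single time slice W_t ~ N(0, t), so sample it directly.
    endpoints = np.sqrt(t) * rng.standard_normal(n_paths)
    print(n_paths, round(np.var(endpoints), 4))   # theory: 0.9
```

The standard error of the sample variance here is roughly t times sqrt(2 / n_paths), which is about 0.04 for a thousand paths, matching the size of the deviations seen in the lecture.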
So what I'd like to do now is say: suppose we simply started off, and I told you, let's model directly in continuous time. Forget about starting in discrete time and going to some sort of limit. I could simply write down a model to say: let the asset price be equal to this, e to the (mu minus a half sigma squared) t plus sigma W_t, where W is a Brownian motion. Now, we all know from earlier discussions that when we talk about processes and probabilities we have to be careful about what probability space we're using, and what probability measure in particular. So when we're thinking about asset dynamics, I should be a little bit careful and say that this is a P-Brownian motion: it's a Brownian motion when we use real-world probabilities. When we discussed Brownian motion last class, I said forget about the measure, because we were working under only one probability measure; but when we do asset pricing and you want to look at derivative valuation, there are always two, or at least two, measures around: the real-world one and at least one risk-neutral one. So here, when I write down this model, I should specify that this is in fact a P-Brownian motion, and my asset price can be written in this form. And you can ask yourself, well, does this correspond to the CRR model that we had before? How would you even answer the question? Well, you'd look at the exponential. In the CRR model, the distributional property is what you mainly had: you said that S at T equals... let's put a little note here, from CRR as N increases up to infinity, we had S at any particular point in time. We usually used capital T because we were always interested in the maturity of an option, but here we can pick any fixed point in time, little t.
We showed that that was equal in distribution to this expression here, and we can then ask: does the model above have the same property as that model? Oh yeah, there's a subscript zero missing everywhere here; yeah, thank you, there's our initial asset price, it's not just normalized to one. So we can ask whether those things have the same property. It's impossible to go from the second line to the first one, because the second line, the one where the arrow is, is only a statement about a distribution. You cannot go from just a distribution to an entire stochastic process unless you make additional assumptions. So what we can at least do is see whether the Brownian motion driving the asset price in the box gives the same distribution as the CRR limit. And hopefully you can all answer that in a split second, right? You look at it and you say: in order to check, all I really need to do is check this part and this part, because the rest is just an exponential transformation. So if the exponents of my exponentials match, I'll match. And then I also realize that clearly these two deterministic terms are the same, so there's nothing to check there. Therefore, all I need to do is check whether those two random terms have the same distribution. By definition, W is a Brownian motion, so we know W_t is normal with mean zero and variance t. And what I've underlined in the CRR model is normal with mean zero and variance sigma squared t. Since I've multiplied by sigma in the continuous-time model, those things match: the distribution of sigma W_t is normal, mean zero, variance sigma squared t, and so is the expression that we got from the CRR. So these are definitely the same thing, and we can start directly in continuous time.
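A minimal sketch of the boxed continuous-time model (the parameter values mu, sigma, S0 below are illustrative, not from the lecture): simulate S_t and confirm that log(S_t / S0) has mean (mu minus a half sigma squared) times t and variance sigma squared times t.

```python
# Sketch: S_t = S0 * exp((mu - sigma^2/2) t + sigma W_t), with W_t ~ N(0, t).
# Check the log-normal distributional property of the boxed model.
import numpy as np

rng = np.random.default_rng(2)
S0, mu, sigma, t = 100.0, 0.08, 0.2, 1.0   # illustrative values

W_t = np.sqrt(t) * rng.standard_normal(500_000)          # W_t ~ N(0, t)
S_t = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W_t)

logret = np.log(S_t / S0)
print(round(logret.mean(), 4))   # theory: (mu - sigma^2/2) t = 0.06
print(round(logret.var(), 4))    # theory: sigma^2 t = 0.04
```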
Now, there was a question on your term test, the very last question, where I asked you to show that in the CRR model, when you took the limit, the joint distribution of the asset viewed at two points in time was something, right? Do you remember that question? Not a whole lot of you attempted it; I'd say about half of you attempted it, and fewer than that achieved the goal. But we can ask that question again now in the continuous-time setting, and you'll probably all be able to get the answer much more easily. So let me pose the question: what is the joint distribution of S at t1 and S at t2? We know what the distribution is at a single point in time; what is it at two points in time? One way to answer this question is to simply write out S at t1 and S at t2 in terms of that Brownian motion, and then we'll see the question boils down to a question about the features of the Brownian motion itself. So S at t1 is simply that, and for S at t2 I just replace little t by t2. Now, before going further, I hope you notice the main difference between the CRR model and what we've written in the box. In the CRR model, when we took this limit, what we demonstrated were distributional properties of the asset price. What we have in the box is a path-wise property of the asset price. Notice that the equality in the box is not equality in distribution: it's a definition of S in terms of the entire path of W, and I hope you know the difference between the two, right? One is telling you how S at a fixed point in time is distributed; that's the CRR model with N going to infinity: log-normal with those properties there. The other equation, in terms of the Brownian motion, actually tells me how S evolves and gets to that point, the entire evolution. Those are quite different objects. Just be aware of that.
That's why in these two equations for t1 and t2 I'm allowed to simply write equality; I don't have to put equality in distribution. They are simply equal: it's the same Brownian motion showing up in both, just one single Brownian motion, because we look at the entire path. Again, if I sketch that out for you: here's t1, here's t2, there's some sample path of the Brownian motion, and I'm asking how those two points are jointly distributed. That's my Brownian motion driving everything. First I multiply it by sigma, I add mu minus a half sigma squared times t, I exponentiate it, I multiply by S0, and that gives me my asset price path. That's what the formula at the top of the screen is telling me. I'm asking about the joint behavior of those two things, and I think right away you realize that to describe the joint behavior of this asset at these two points in time, all I need to know is how these two expressions up in the exponential behave jointly. In other words, I already know that S at t1 and S at t2 are jointly log-something; we just need to ask, what is the something? If we look at the two exponents, we realize that certainly each has to be normal, because each term in the exponent is just a normal random variable in terms of distribution. I'm asking about a distributional property, so I can talk about distribution and not path-wise behavior anymore. I know that this one is normal and the other one is normal, and in fact they're jointly normal because they're driven by the same underlying Brownian motion. So if I take the log of the asset price relative to S0, I already know this pair is going to be jointly normal with some means and some covariance structure. I know that for free, basically; I've done no work.
All I do is simply observe that each exponent is the Brownian motion scaled by sigma plus some deterministic thing, so the exponents have to be jointly normal. Then the only remaining task is to figure out the means and the covariances. The means are trivial here: the mean of the log of the first term is just (mu minus a half sigma squared) times t1, and the mean for the second is (mu minus a half sigma squared) times t2. That's pretty straightforward. To figure out the covariance structure, I simply have to compute all the pair-wise covariances. What's Sigma 11? That's the variance of sigma W at t1: capital Sigma 11 is identical to the variance of sigma W at t1, because as far as the variance is concerned we just add a constant up in the exponent. That's sigma squared t1, by definition of Brownian motion. Sigma 22 similarly is just the variance of little sigma W at t2, which again by definition of Brownian motion is sigma squared t2. The only real work, and it's not actually very difficult at all, is the covariance term, and of course covariances are symmetric so I just need to compute one of them: the covariance of sigma W at t1 and sigma W at t2. A quick little calculation: that's sigma squared times the expected value of W at t1 times W at t2, minus the product of the individual expectations. Actually, if you remember the result from last class we could have gone straight to the answer, but it's worthwhile deriving it again. The individual expectations are zero because it's a Brownian motion. And how do I compute the expectation of the product? We have the Brownian motion at two different points in time, with t2 bigger than t1; I didn't actually specify that here, but let's say t2 is bigger than t1, the obvious choice. How do you compute that expectation? Yeah, exactly: use the increment property, the independence of increments.
So write W at t2 as W at t1 plus the increment (W at t2 minus W at t1), so the product is W at t1 times that sum. That equals, well, the expected value of the first term, W at t1 squared, which is t1, isn't it? And then we have the expected value of W at t1 times the increment from t1 to t2; those are independent, so we can write that as the product of the expectations, and that's zero. I'll just write in that extra step for you, because remember, we're only allowed to say that this is the product of the expected values because those two terms are independent. I'll mention again: I've seen many times in exam situations students making the mistake of saying the expected value of a product of two things is always equal to the product of their expectations. That's not true; it's only true if the two objects are independent of one another, and that is the case here, so we can say independence, because of independence. So our final result is quite easy: Sigma 12 is sigma squared t1. So now we've got everything. I'll just summarize it again here: the pair (log of S at t1 over S0, log of S at t2 over S0) is normal with means ((mu minus a half sigma squared) t1, (mu minus a half sigma squared) t2), and I'll leave the covariance matrix like this: sigma squared t1, sigma squared t1, sigma squared t1, sigma squared t2. So if you were to compute the correlation, what would it be? It would be the covariance, which we just computed, divided by the square root of the product of the variances, sigma squared t1 times sigma squared t2, which gives the square root of t1 over t2. And that was basically the result you had to show on the term test, except I asked it under the risk-neutral measure, so instead of mu minus a half sigma squared we had r minus a half sigma squared. But here we can do the calculation and the derivation directly in continuous time; on the term test you had to go back to discrete time and take limits, because at that point that was the technology you had. Now you have a little bit more. Any questions about this little derivation?
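The covariance result can be sanity-checked by simulation. A sketch (with assumed values of t1 and t2): build W at t2 from W at t1 plus an independent increment, exactly as in the derivation, and compare the sample covariance and correlation to t1 and sqrt(t1/t2).

```python
# Sketch: Cov(W_{t1}, W_{t2}) = min(t1, t2) and Corr = sqrt(t1/t2) for t1 < t2.
import numpy as np

rng = np.random.default_rng(3)
t1, t2, n = 0.4, 0.9, 1_000_000   # illustrative times

W1 = np.sqrt(t1) * rng.standard_normal(n)            # W_{t1} ~ N(0, t1)
W2 = W1 + np.sqrt(t2 - t1) * rng.standard_normal(n)  # add an independent increment

cov = np.mean(W1 * W2)   # both means are zero, so this is the covariance
corr = cov / np.sqrt(np.var(W1) * np.var(W2))
print(round(cov, 3), round(corr, 3))   # theory: 0.4 and sqrt(0.4/0.9) ~ 0.667
```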
Okay, I want to go back to this calculation that we just did of the expected value of W at t1 times W at t2. We just did this calculation using the independence of increments. Does anyone have another idea of how you could efficiently compute that expectation? There's another property, not just a property of Brownian motion but a property of expectation, and then you use a property of Brownian motion: conditioning, iterated conditional expectation. I could in principle also write this as the expected value of the expected value of W at t1 times W at t2, conditional on knowing what W at t1 is. I'm always allowed to do that, an iterated expectation where I give myself more information in the inner expectation. We've used this kind of trick before in the context of finance, with those forward-starting options; you might remember those examples with the asset prices. Here we can also use it for Brownian motions, and the advantage becomes clear quite quickly. First of all, W at t1: well, you're conditioning on it, so in that inner expectation it's not even random, it can come out of that expectation. So you have this iterated expectation, and the problem reduces to: what is the inner conditional expectation of W at t2? Now, you could use the independence of increments if you like, but you don't even have to. Use stationarity: if I know where I am at W at t1, then from there onwards there are many possible outcomes for W at time t2, but what is the expected value of W at t2 given W at t1? It's where it started: this conditional expectation is in fact just W at t1. That's the stationarity idea: if I start the process at its current point and only look at its future, it looks exactly like a Brownian motion that just happens to start at its current location, and so its conditional mean stays at that location.
So that inner expectation is in fact W at t1, and so you end up with the expected value of W at t1 squared, which is t1. So you get the same answer as before; otherwise we'd have made a mistake somewhere. I just wanted to point out that you can also do the calculation that way. Sometimes iterated expectations are easier than looking at the increments; it really depends on a case-by-case basis, and you'll get the right answer using either method. Okay, to give a little bit more exercise in Brownian motion, let's do a couple of other calculations. How would you compute the variance of W at t times W at t plus s? These are just computations; it's useful to do them once, and once you've seen them you can do a lot of these types of things. What would be the approach? As always with variances, write it as an expectation of the objects you need: you know that this is going to be the expected value of the product squared, minus the expected value of the product, all squared. We've actually just computed that second piece, with t1 equal to little t and t2 equal to t plus s (s and t are positive here). So what's the answer to that piece? That's the nice thing: the expected value of the product of a Brownian motion at two different points in time is just equal to the shorter of the two times, always. So that's t. The hard part is computing the expectation of the square, but we can use exactly the same tricks we've already got in our toolbox. So let's do that as a side calculation, and I might as well just use the increment version of things; you can try using iterated expectations as your own little exercise. So I write W at t plus s as W at t plus the increment (W at t plus s minus W at t), and this whole thing is squared; hope I have enough brackets there. Then multiply out: we have W at t squared from the term out front, times the square of the sum of those two terms, which is W squared plus twice W at t times the increment plus the square of that increment. So all I've done there
is simply expand. At first you might have looked at it and thought, oh jeez, that looks kind of scary, but actually it's not too bad once you work through the details. So what's the first term? You have W squared times W squared, that is, W at t to the fourth power. What's that expected value? Remember? Yeah, good memory: it's 3 t squared, the fourth moment of the Brownian motion at time t. What about the middle term there? We have the expected value of W at t cubed times the increment of W from t forward. It's 0. Why is it 0? Because those are independent, so you can write it as a product of the expectations, and the expected value of the increment is 0; you're done. In fact, the expected value of the cube of the Brownian motion is also 0, so both factors are 0. Let me just write that extra step in here, a few lines, just for the purpose of clarity when you look back at the notes: that particular term is the same as the expected value of W at t cubed times the expected value of (W at t plus s minus W at t), and each of these is 0. It so happens that both factors are 0 here; it wouldn't always be the case. Okay, and what about the last term?
That's right, it's t times s. So once again we have this independence property that we're using. Let's put in a little more detail again: this is independent of that, right? And this independence of increments, by the way, doesn't apply just to the increments of W alone: any function of the increment of W from t1 to t2 and any other function of the increment of W from t3 to t4 will also be independent. They inherit those properties: once you have independence of the underlying randomness, any functions of that underlying randomness will also be independent. That's what I'm using here; it's a basic property of probability. So the expected value of W at t squared is t, and the expected value of the squared increment is s, giving t times s. It's kind of interesting that before, we found the expected value of the product doesn't depend on s, but this does depend on s. So the expectation of the square is 3 t squared plus t times s, and subtracting the square of the mean, which was t squared, the variance itself is 2 t squared plus t s. Okay, any questions about this?
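A quick numerical check of this moment calculation (a Python sketch with illustrative t and s): the sample second moment of W_t times W_{t+s} should sit near 3 t squared plus t s, and the variance near 2 t squared plus t s.

```python
# Sketch: E[W_t W_{t+s}] = t, E[(W_t W_{t+s})^2] = 3t^2 + ts,
# hence Var(W_t W_{t+s}) = 2t^2 + ts.
import numpy as np

rng = np.random.default_rng(4)
t, s, n = 1.0, 0.5, 2_000_000   # illustrative times

W_t = np.sqrt(t) * rng.standard_normal(n)
W_ts = W_t + np.sqrt(s) * rng.standard_normal(n)   # W_{t+s}, independent increment

prod = W_t * W_ts
print(round(np.mean(prod), 3))      # theory: t = 1.0
print(round(np.mean(prod**2), 3))   # theory: 3t^2 + ts = 3.5
print(round(np.var(prod), 3))       # theory: 2t^2 + ts = 2.5
```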
So before we go for our first break I want to talk a little bit, again coming back to the finance motivation, about correlated Brownian motion. What is the motivation to discuss correlated Brownian motion? In the financial context, look at two assets, say IBM and Amazon, completely different industry sectors; they could perhaps be correlated in some way. IBM and Microsoft would probably be more strongly correlated. Those asset dynamics, the returns, are going to be correlated in some way, and we want to translate that concept of correlation of the asset prices back in terms of the underlying fundamental uncertainty, which is the Brownian motion. The easiest construction starts from two independent Brownian motions. What do I mean by two independent Brownian motions? They are Brownian motions that have no relationship to one another: one can go up or down independently of the other, the obvious, simple understanding of independence. In terms of distributional properties we can define them jointly, in exactly the same way and with the same procedure as before. So W_0 and W_0 perp are zero, and the pair (W_t, W_t perp) is bivariate normal: instead of just being normal with mean zero and variance T, it has mean vector (0, 0), each component has variance T, and the covariances are zero. For normal random variables, all we have to specify are the means and covariances, so that's our basic distributional property. The little sign up here is a perp sign, perpendicular: orthogonal, independent. But you still want the other properties, that the increments of W_t and W_t perp are independent and stationary, and in particular we also have to describe something about the joint increments. So I'm going to draw two timelines for you here, and let's say we look at some sequence of intervals. This is time, both lines are time; I'm thinking of this as one Brownian motion and this as the timeline for the other, of course running on the same time, but I've separated them because I want to talk about this kind of correlation structure. If we focus on one Brownian motion and I ask about the dependence between this interval and that interval, what is it? They're independent, so the correlation is zero. Similarly here, the correlation between these two is also going to be zero, basically by definition; that's what we mean by independence of increments. But what we also mean, by the fact that the pair is bivariate normal with zero covariance, is that this correlation is also zero even when the time interval is the same: the increment of W over T1 to T2 is also independent of the increment of W perp over T1 to T2. You want independence across there as well. And because W perp has independent increments, if I take T1 to T2 and then T3 to T4, certainly for W perp those increments are independent; but also the T1-to-T2 increment of W is independent of the T3-to-T4 increment of W perp. You have these cross diagonals that are also zero correlations, so you have zeros everywhere in this little diagram. This is how we build up our bivariate Brownian motions that are independent, and the paths, again, are continuous. Now you might be asking: how can I create this from a tree?
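As an aside, the all-zeros correlation structure in that diagram can be checked numerically. Here is a minimal Python sketch, not from the lecture, that simulates increments of two independent Brownian motions and estimates a same-interval correlation and a cross-diagonal correlation; the interval lengths and sample count are illustrative choices:

```python
import random

# Estimate two correlations from the diagram: increments of W and W-perp over
# the SAME interval [t1, t2], and increments over non-overlapping intervals
# [t1, t2] and [t3, t4]. For independent Brownian motions both should be ~0.
def estimate_corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)
n_paths = 50_000
s12, s34 = 0.5, 0.7              # interval lengths t2 - t1 and t4 - t3
dW_12, dWp_12, dW_34 = [], [], []
for _ in range(n_paths):
    dW_12.append(rng.gauss(0.0, s12 ** 0.5))    # W(t2) - W(t1)
    dWp_12.append(rng.gauss(0.0, s12 ** 0.5))   # W-perp(t2) - W-perp(t1)
    dW_34.append(rng.gauss(0.0, s34 ** 0.5))    # W(t4) - W(t3)

same_interval_corr = estimate_corr(dW_12, dWp_12)   # near 0
cross_corr = estimate_corr(dWp_12, dW_34)           # near 0
```

Both estimates come out near zero, matching the zeros everywhere in the diagram.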
Remember, when we started the Brownian motion we had a nice little model where we took steps of plus or minus square root delta T, we put probabilities of one half on each, and we saw that in the limit as delta T went down to zero this type of stepping produced our Brownian motion. That's what we did in our last lecture. So you could pose the same problem: what would be the fundamental tree that produces this bivariate independent Brownian motion? Anyone have an idea what you might do? I'll start drawing it out, because it's not entirely obvious. What you need is two dimensions; you can't just use one. There's always time horizontally, but you can't just use one space dimension, you need two to keep track of both processes. So the idea is you start off at a point here, and this is the point (0, 0) for W_t and W_t perp. After one time step, and it's going to be a little difficult to draw this so bear with me, I'll erase several times; I suggest you don't copy until I've got it right, and perhaps even then don't bother, just get it from the web. From there we're going to draw four outcomes: you go up by square root delta T in both processes; you go down by square root delta T in both processes; or you go up by square root delta T in one process and down in the other, either way around. Those are the four outcomes: from there to there, from there to there, actually let me extend that line a little bit more, and from there to there. That's a plane I've drawn there; those are the four points you can move to. If I put some little labels on: here, in the usual notation, that would be minus, because as you go into the board you go negative in that direction, so it's (square root delta T, minus square root delta T); here you'd have (square root delta T, square root delta T), both processes went up; here you'd have (minus square root delta T, square root delta T), that process went down; and over here it would be (minus square root delta T, minus square root delta T). And these branches all occur with what probability? They want to be independent, so they should all be equally likely. Exactly: all with probability one quarter. Now, where it gets interesting is what happens in the next step; that's where the mess starts to happen, so bear with me for a second. If you think about it, in each direction you should have a recombining tree: if I take a slice in this vertical direction, we should see a tree that looks just like our good old binomial tree, going up and down by square root delta T, and I need this to occur both in the direction of process one and in the direction of process two. So at the next time slice you actually have nine points: one two three, one two three, one two three. Let's focus on what happens with this particular node: it's going to go there, there, there, and there. You can see that from each point you emanate four points; you go up or down in all possible pairs. The next node, say this one, is going to go here, here, here, and here. Then green, I guess, is my next choice of color: this one connects there and there, and it's already kind of hard to look at, isn't it? Should I bother drawing the last one? Okay, like this, and you know where the last one connects; maybe I'll just circle them in yellow. That point connects to this one, this one, and this one, those four. So you're able to span all possible combinations, and hopefully you can convince yourself that if you restrict yourself to any plane, you see a little recombining tree in there. I have a piece of code which demonstrated this for me and
I was looking for it earlier. Oh, plot grid? It might be this one. No, it's not that one. Okay, if I find it I'll post it, I'll post a little movie so you can see it: what I have is a piece of code which generates this for several steps, and you can look at it as it moves around and see the shape; there's always a recombining tree in every slice. Okay, so this is our underlying model, and when you take the limit as delta T goes to zero, it's going to produce a pair of processes that have exactly these properties. We'll use these uncorrelated Brownian motions to build correlated Brownian motions, which will then be used to drive asset prices that are correlated. That's what our goal will be. Questions? Okay, let's take a little break then. Okay, so we'll continue on here. Let me remind you once again: we're talking about this idea of correlated Brownian motions, and in order for us to generate correlated Brownian motions, which are going to drive asset prices that are correlated, we first needed to talk about how to define independent Brownian motions. That was the construction we went through; here's a tree that allows us to create a process with those properties. Now I'd like to tell you how you can use those processes to build correlated Brownian motions. So, to answer the question of how to get correlated Brownian motions, suppose I gave you, first of all, two independent standard normal random variables, Z and Z perp. How would I create two random variables X and Y that are jointly normal, still mean zero, variances one because we standardize, but with a correlation of rho? How would I do that? Any takers? This has nothing to do with Brownian motions; it's purely two random variables that I want to make correlated, and in fact not just make correlated but jointly normal. Yeah, exactly. Let's solve it generally: suppose I took X to be Z, so X is standard normal by definition, and Y to be a linear combination aZ plus bZ perp. Since X and Y are both built out of these normal random variables, they're also going to be jointly normal, and the only issue is their means, their variances, and their covariance; those fully specify the distribution. We already know X is normal (0, 1). What about Y? Well, the mean of Y is also zero, because it's a linear combination of standard normals. The variance of Y, on the other hand, is what? a squared plus b squared, because you've got a squared times the variance of Z, plus b squared times the variance of Z perp, plus 2ab times the covariance; and the covariance is zero because Z and Z perp are independent. We want that variance to be one, according to the question I've posed, so that's one constraint. Our second constraint comes from making the covariance of X and Y equal to rho. So what is the covariance here? It's just a. To be really pedantic about it: the covariance of Z with aZ plus bZ perp is a times the covariance of Z with Z, because covariance is linear in each argument, plus b times the covariance of Z and Z perp, and that's just a. We want that to equal rho, so a has to be rho, and therefore b has to be the square root of one minus rho squared. Put both together, and this achieves the goal. So we now know how to take independent standard normal random variables and create correlated, jointly normally distributed, standardized random variables. That's the simple way to do it; it's not the only way. You could actually also make X a linear combination of Z and Z perp as well, and there's some
extra degree of freedom in that, and there's some advantage to doing it. Okay, with that little bit of a backstory, how do you think we can create correlated Brownian motions, let's call them X and Y, out of the uncorrelated ones? It seems natural to define them in exactly the same way as the linear combination for the random variables. So we simply say X is one of those Brownian motions, W, and Y is the same linear combination: rho W plus square root of one minus rho squared times W perp. We know the distributional properties are going to be preserved in this manner. With this definition we can see that X_t is certainly normal; actually, we can look at the joint behavior: (X_t, Y_t) is jointly normal with means zero, variances T and T, and for the covariance calculation, it's just rho times the variance of W, so rho times T. If that's a little quick for you, go through the steps and double-check that it is in fact the case. So X_t and Y_t are jointly normal with this covariance structure. With this definition we can also easily check the increment properties, but let me be careful about that statement: X_t and Y_t individually have stationary and independent increments, but bivariately they are not independent. So I'm going to write this as two separate statements. X has stationary and independent increments; this property is inherited from the fact that W does. And Y also has stationary and independent increments, which again is inherited from the properties of W and W perp: both have stationary and independent increments, so their linear combination must as well; there's no way to introduce a non-stationarity by taking that linear combination. But X and Y are not independent as processes. So I'm going to draw the same correlation diagram that I had before, with two times. These
are times here, this will be for the increments of the X process, this for the increments of the Y process, and we've got two intervals. If we look at the increments of X in this interval and that interval, we do have the property that this correlation is zero and that correlation is zero. We also still have that this correlation is zero and this correlation is zero, because these are two non-overlapping intervals in time. You can actually prove that result: use the definition in terms of W and just prove that the increment of X over this interval is independent of the increment of Y over that interval. Do you think you can do that proof on your own? No? Come on, you should be able to: X and Y are linear combinations of W and W perp, and you know the properties that W and W perp have, so you get independence of increments for the X's and Y's as well. I'll leave it open for you to try on your own. What about this correlation, the increments over the same interval? That correlation will be rho, and the correlation of these two will also be rho. How would I check that? I simply go back to the definition of X and Y in terms of the W's, look at the increment of X, look at the increment of Y, and compute their covariance. Their variances are each T; the variance of X is T, the variance of Y is T, straightforward. The covariance, in fact, we have in front of us: for any fixed T it's rho times T. So their correlation will be rho times T divided by the square root of T times T, which is rho. That's how we get our correlation. This correlation structure is a little different from what we had for the underlying Brownian motions, where we had zeros everywhere; we take those uncorrelated Brownian motions and create this type of correlation structure. And this is exactly what we want for a financial model as an underlying dynamic, because if you think about what the increments of these processes over this time and that time will represent, they'll somehow represent the
returns of one asset and the returns of another asset. If you look at the returns over the same time span, those assets may be correlated; but if you look at returns over different time spans, they should be uncorrelated. This is basically the efficient market hypothesis, the fact that you cannot predict future returns based on past returns. So that's why you want this type of structure: once you look at increments over non-overlapping intervals in time, those correlations have to be zero, but correlations within the same interval may not be zero. Okay, that's a big, important point, and it's what you need to take away from that picture: correlations over non-overlapping intervals in time are always zero, no matter what, even if the underlying Brownian motions are correlated; but correlations over the same interval in time can be non-zero, and this is an example of how that happens. Okay, so let's do a couple of little calculations on these basic processes. Suppose X and Y are correlated Brownian motions with correlation rho; that's what this notation means, and there's not much more to say about that, it just is. I want to look at the covariance between X at T plus S minus X at T, and Y at T plus U minus Y at T, and I want the following ordering: T smaller than T plus S, smaller than T plus U. So on a timeline: here's T, here's T plus S, and there's T plus U, and I'm asking about the covariance of the increment of X over this interval and the increment of Y over the larger interval. These are not independent pieces; they can't be independent, because the times are overlapping. How would you approach that? Right, always the same idea: break
it up in terms of the increments. That's one way, and the other way is through iterated expectations, writing out the expectation form. I can always think of the increment of Y as the increment from T to T plus S plus the increment from T plus S to T plus U, and then I can use independence for the second part: the increment of X over here is certainly independent of the increment of Y over there. Those are independent. And from the diagram above, I also know that the correlation over here is just rho, because it's exactly the same time frame. So what's the covariance, then? Without even doing any mathematics we can tell the answer from the simple diagram, and then we can work it out with the math. What's the answer for the covariance? It'll be rho times S: the size of that window is S, the correlation is rho, so it's rho times S. Let's work it out explicitly. Call this thing C. So C equals the covariance of X at T plus S minus X at T with Y at T plus U minus Y at T, and I write the second argument as Y at T plus U minus Y at T plus S, plus Y at T plus S minus Y at T; I've done nothing, I've simply added and subtracted Y at T plus S. Then this is the covariance of X at T plus S minus X at T with Y at T plus U minus Y at T plus S, keeping the order the same, plus the covariance of X at T plus S minus X at T with Y at T plus S minus Y at T. Independence of increments tells me the first covariance is zero: those are non-overlapping intervals. Stationarity of increments tells me the second is the same as the covariance of X at time S and Y at time S. And then, basically by the covariance property we derived, that's rho times S. Alternatively, let's write it back in terms of the underlying Brownian motions: it's the covariance of W at S with rho W at S plus square root of one minus rho squared times W perp at S, right?
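The claim that this covariance is rho times S can also be checked by Monte Carlo, simulating the increments directly from the construction out of W and W perp. A sketch in Python, not from the lecture; all numeric choices (rho, the times, the sample count) are illustrative:

```python
import random

# Monte Carlo check of Cov(X_{t+s} - X_t, Y_{t+u} - Y_t) = rho * s
# for t < t+s < t+u, where X = W and Y = rho*W + sqrt(1-rho^2)*W_perp.
rho = 0.6
t, s, u = 1.0, 0.5, 0.8          # the increments overlap on [t, t+s]
rng = random.Random(2)
n = 200_000
acc = 0.0
for _ in range(n):
    # Independent pieces of W and W_perp over [t, t+s] and [t+s, t+u].
    dw_ts = rng.gauss(0.0, s ** 0.5)           # W_{t+s} - W_t
    dw_su = rng.gauss(0.0, (u - s) ** 0.5)     # W_{t+u} - W_{t+s}
    dwp_ts = rng.gauss(0.0, s ** 0.5)          # W_perp increment on [t, t+s]
    dwp_su = rng.gauss(0.0, (u - s) ** 0.5)    # W_perp increment on [t+s, t+u]
    dx = dw_ts                                              # X_{t+s} - X_t
    dy = rho * (dw_ts + dw_su) + (1 - rho ** 2) ** 0.5 * (dwp_ts + dwp_su)
    acc += dx * dy                  # increments have mean zero, so E[dx*dy] = Cov
cov_estimate = acc / n
exact = rho * s                     # 0.3 for these choices
```

The estimate matches rho times S, and notice it does not depend on U at all, exactly as the diagram argument predicted.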
In terms of the underlying Brownian motions, which are uncorrelated; this is far more detail than you will need later on. For now you might need it, but later you'll realize going through all these steps is a waste of time. That's it; that's our result, rho times S. Questions about the calculation? Okay, so let's do something a little harder: let's look at the variance of X at time T times Y at time T. It's a little bit harder, though not a whole lot. How would I do this computation? Right, from the definition: this is the expected value of the square of that thing, minus the square of its expected value. So we need a couple of side calculations. First, the expected value of X_t times Y_t; what's that equal to? Not quite, almost: it's rho times T. If you still have trouble seeing the answers immediately, which will take a little while, you can always fall back on writing X and Y in terms of the uncorrelated Brownian motions; then things are always easy. It's longer, because you have more arithmetic to do, but it's always straightforward, because you understand the uncorrelated Brownian motion case well. So you take X equal to W and Y equal to rho W plus square root of one minus rho squared times W_t perp, and the expectation of the product equals rho times the expected value of W_t squared, plus square root of one minus rho squared times the expected value of W_t times W_t perp. W and W perp are independent by definition, therefore I can write that last expectation as a product of the expected values, and only because they are independent; and then individually these are both zero, in fact, okay?
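The expectation just computed, rho times E[W_t squared], which equals rho times T, can be spot-checked the same way; a short Python sketch, with parameter values that are mine rather than the lecture's:

```python
import random

# Spot check of E[X_t * Y_t] = rho * E[W_t^2] = rho * t,
# with X = W and Y = rho*W + sqrt(1-rho^2)*W_perp.
rho, t = 0.4, 2.0
rng = random.Random(3)
n = 200_000
acc = 0.0
for _ in range(n):
    w = rng.gauss(0.0, t ** 0.5)        # W_t
    wp = rng.gauss(0.0, t ** 0.5)       # W_perp_t, independent of W_t
    x = w
    y = rho * w + (1 - rho ** 2) ** 0.5 * wp
    acc += x * y
xy_mean = acc / n                        # close to rho * t = 0.8 here
```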
And so my final result for that is rho times T, since the expected value of W squared is T. Okay, what about the expected value of X_t squared times Y_t squared? Again, go back to the basic definition in terms of the W's: it's the expected value of W_t squared times the square of rho W_t plus square root of one minus rho squared times W_t perp, and you have three terms now. If we square the first term there and at the same time multiply it by W_t squared, we get rho squared times the expected value of W_t to the fourth. The cross term: twice rho times square root of one minus rho squared times W_t times W_t perp, multiplied by the whole thing by W_t squared, gives 2 rho square root of one minus rho squared times the expected value of W_t cubed times W_t perp. And the last term is one minus rho squared times the expected value of W_t squared times W_t perp squared. So now, what's the first term? 3 T squared, right; remember that fact or derive it. What's the middle term? It's zero, because again W_t cubed is independent of W_t perp, so it equals the product of the expectations of W_t cubed and of W_t perp, and both of those happen to be zero; so the middle term is just zero. For the last term I can use independence once again, because W_t squared is independent of W_t perp squared: that's the expected value of W_t squared times the expected value of W_t perp squared, and each of those is T, so I get a factor of T squared. I've left out a few of the details, just saying them in words, but I think by now it's becoming old hat. So you add those together and of course you get one plus two rho squared, times T squared. And then you go back to the actual calculation: the variance is that term minus the square of the expectation. So what's our final answer for the variance? It's one plus rho squared, times T squared: one of the two rho
squared T squared terms gets killed by the square of the expectation, rho squared T squared, because our answer for the expectation was rho times T. Questions? Alright, great. So now it's time for your quiz; you probably saw that coming, right? Okay, so the last thing that I'd like to cover today is that stochastic integral that we talked about before. But before actually going through that detail, I'm going to suggest something for you to work on. We talked about quadratic variation before; there's something called covariation, and you can imagine what that is. Basically, instead of taking the sum of the squares of the increments of one process over a partition, you take the increments of two different processes, multiply them together, and take the limiting case of the partition going to zero. So what do you expect this result to be? What's your guess? You know for Brownian motion it's T. What? Rho times T, in fact, is what it turns out to be; and not the absolute value. It's a covariation because you want to know whether, when one moves in one direction, the other moves in the same direction or the opposite direction; that's why you don't put an absolute value there. So this is called the covariation of the processes, and you can demonstrate this result using more or less the same technique that we used to show that the quadratic variation is T almost surely. Try this out; it's a good exercise for you to do. Okay, sorry, there's a question. That's more or less right: for quadratic variation, what we did is look at the difference between what the answer is and what that sum is for a given partition, and then we showed that the error, the sum minus what we expect the answer to be, had zero mean, and its variance was bounded and goes to zero as the partition
went to zero. Same method here, except now these are the correlated Brownian motions, the X and the Y processes. Okay, so I want to get back to a question that we left hanging last lecture, and that question was: first of all, how do we even define a stochastic integral, the integral of W dW? Let me not put X here; let's put W, since we're talking about the standard Brownian motion. We wanted to know how to define this thing, and one definition I gave you was: take a partition of the time interval from zero to T, and take the limit, as the mesh goes down to zero, of the sum over all of those intervals of the integrand, which in this case is W, evaluated at the left-hand point, multiplied by the increment of the Brownian motion over that interval. And it's very important that this evaluation of the integrand, corresponding to that term there, is at the left-hand point, the left-hand side of the interval. It's not the right-hand side, it's not the middle, it's not some other point in between. Those would be valid definitions; I could put the average of W at t_k and t_{k+1} there if I wanted to, and that would give a different definition with different properties. But this one has a very nice financial interpretation: you choose your position at one point in time and hold it over a short interval, a few seconds at least, even at the highest frequencies, or millisecond scales, which is probably as fine as you'll go; more usual would be daily or weekly. You hold that position constant, so it really is the left-hand side of the interval that should show up there. The question we wanted to answer is: how do I compute this limit, and what should the answer be? From standard calculus results we guessed that the answer would have been one half W squared, but we did this nice little computer implementation and, oh dear, I'm not sure which file this is. That's fine, we'll just do it ourselves again.
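The lecture's demonstration used MATLAB; here is a sketch of the same experiment in Python, with the step count, horizon, and seed as illustrative choices. It compares the left-endpoint sum with both the naive calculus guess, one half W_T squared, and the corrected value, one half W_T squared minus one half T:

```python
import random

# Left-endpoint Riemann-type sum for the stochastic integral of W dW,
# compared with (1/2) W_T^2 and with (1/2) W_T^2 - (1/2) T.
T = 1.0
n_steps = 100_000
dt = T / n_steps
rng = random.Random(4)

# Brownian increments: normal with mean 0 and variance dt.
dW = [rng.gauss(0.0, dt ** 0.5) for _ in range(n_steps)]

integral = 0.0   # sum_k  W_{t_k} * (W_{t_{k+1}} - W_{t_k})
w = 0.0          # running value of W at the left endpoint
for inc in dW:
    integral += w * inc
    w += inc

w_T = w
naive_guess = 0.5 * w_T ** 2             # what ordinary calculus suggests
ito_answer = 0.5 * w_T ** 2 - 0.5 * T    # the correct stochastic-calculus value
error = integral - ito_answer            # small for a fine partition
```

On a typical run the left-endpoint sum tracks one half W_T squared minus one half T closely, while it misses one half W_T squared by roughly one half T, matching the trend seen in the plotted error path.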
We did this little implementation, which numerically demonstrated that the result should be not one half W squared, but rather one half W squared minus one half T. I'm going to remind you of how that worked, because it's one of the really key fundamental examples to work with. So we're going to generate some random numbers here to give us our path. We'll do this for one year, and for the number of steps let's take, say, a thousand. Those generate our underlying noise; our Brownian motion is going to be the cumulative sum of all of that noise, multiplied by the square root of dT so that each increment is normal with mean zero and variance dT, and we start at zero. This should give us our path for the Brownian motion. Oops, well, it doesn't matter; the title of the file is irrelevant. So this is our sample path for the Brownian motion itself, and we'd like to investigate whether, when we compute this sum of W dW, it approaches one half W squared. This is our left-hand point, so we take W at the left-hand point and multiply it by the increment of W; that should be the integral of W dW. It looks a little cryptic, I know, it's just MATLAB code for you. And I need to sum up the W dW terms. That's one outcome; run it again, that's another outcome; run it again, another; run it again. This is clearly not one half of that thing squared: first of all, if you squared this you'd always get something positive, and this clearly has negative undertones in it. So what we can do is plot side by side the integral of W dW and one half W squared, again for purposes of comparison. The blue path will be the integral of W dW, the red is one half W squared; clearly a difference. And if we look at the difference between these two paths, let's make another plot: okay, this is our error between the two, this is
the red path minus the blue path. On a path-by-path basis, and this is just one specific sample path, we already see that it has a fairly clear trend; it looks like a slope of one half. And I'll keep running a bunch of scenarios here so you can see that no matter which scenario I run, I get a slope of almost one half. If I increase the number of steps here, say to a hundred thousand, still a one-year time frame, I'm just taking finer and finer meshes; my partition is getting finer, and the line becomes a better and better approximation. So we know what the correct answer for this result should probably be: one half W squared minus one half T. That's what we would like to show, and that's our task for this last part of the lecture today; it's going to be a lead-in for something called Ito's Lemma. Okay, so how do we go ahead and show it? One obvious approach: exactly like what we did for the quadratic variation, take the difference between what we have for the finite partition and what we believe the answer to be, call that an error, and then show that the error has zero mean and that its variance goes to zero in the limit as the norm of the partition pi goes to zero. That's the goal; it's exactly the same technique. So let's define R_pi; this is our remainder for a particular partition. It's kind of reminiscent of what you do in first-year calculus when you're trying to show that a Riemann sum converges, but here you have stochastic processes thrown into the mix. So we're subtracting one half of W squared minus T at the final time. The idea here is to put that term inside the sum itself, and since we have a fixed partition, what I can do is write it as the sum over the partition of the increments of one half
W_T squared minus T; if I think of this quantity, one half of (W_t squared minus t), as just some new process X, what I will do is replace its final value by the sum of its increments. Those are equal, right, because this gives me the last point minus the first point; it's a collapsing, telescoping sum. That's exactly what I'll do here. So now I'll put everything underneath the sum, minus t_k, and I've run out of space, so let me convince you that this last term I've put under the sum is identical to that term. We're convinced of that: it's a collapsing sum. I'm taking a-one minus a-zero, plus a-two minus a-one, plus a-three minus a-two, and all of the intermediate things cancel; I'm left with only the last point minus the first point, and the last point gives the first term while the first point is zero. Once you've done that, that's almost the entire trick; there's almost nothing left to do except algebra. So let's collect a bunch of terms together, and we'll keep the t's by themselves; I'll write on a separate line here, minus t_k minus t_{k minus 1}. Now, there's a factor of one half out front, and I want to convince you that when you collect all of those W terms, what you end up with is simply the squared increment, W_{t_k} minus W_{t_{k minus 1}}, all squared. It looks surprising, but it is true. So let's expand that square: W_{t_k} minus W_{t_{k minus 1}}, all squared, is W_{t_k} squared, minus twice W_{t_k} times W_{t_{k minus 1}}, plus W_{t_{k minus 1}} squared. We simply have to identify those terms; we already know the t terms are fine, and the only question is whether the W's sum up in the appropriate way. Let's look at the W_{t_k} squared term: where does it show up? That's the only place, agreed, and there it has a factor of, ah, sorry, minus one half; I think I might have a sign error here, one second. I think I need an overall minus sign. Okay, we'll double-
check it in a second. Yeah, we need an overall minus sign; that's correct. So that term identifies: we have minus one half W_{t_k} squared, and inside of that sum we do have minus one half W_{t_k} squared. What about W_{t_{k minus 1}} squared? Where does it show up? There's one term here, which is minus W_{t_{k minus 1}} squared, and over here you get plus one half W_{t_{k minus 1}} squared; when you combine those two, you get minus one half W_{t_{k minus 1}} squared, so they match up. So there's only one term remaining, and that's the cross term: this together with that, and those two together are that term. Notice there's an overall minus one half out front; when I multiply that by the two, I get plus one, which is exactly what I have there. Sorry for all the scribbling around, but it's just algebra; there's no mystery going on. Okay, so once you've done that, most of the rest of the work is also done. I'll fill those in; oh good god, I'm erasing everything as I go by. There we go, I've restored everything, I think; it looks right. Alright, so this is our remainder term for any finite partition, and I think it should be clear that if I take the expectation of that remainder term, I get zero: I can interchange the expectation with the sum, and the expected value of the square of the increment of W is the increment in time, so each term cancels to give zero. So the only real work, the only hard part, is computing the variance. And what's the variance? Each term in that sum is independent of every other term, so it's one quarter times the sum of the variances of the individual terms. The constant t_k minus t_{k minus 1} is just a constant, so it doesn't contribute to the variance, but this does, and we simply need to compute this variance. I'm not sure if you remember, but the result will be 2 delta t_k squared: 3 delta t_k squared, minus 2 delta t_k squared, plus delta t_k squared. The 3 delta t_k squared comes from the expectation of the
fourth moment let me put in another line skipping too much too many steps and we've shown before that this is just 3 delta tk and that's the Brownian motion so we simply get delta tk for the expected value of the increment squared and there's a square there and we square it again so we have 2 delta tk squared same argument as before now so this is equal to 1 half sum over delta tk squared which we can bound less than or equal to 1 half the sum of delta tk times the norm of my partition and that is constant so I can pull it out and this is a little t by definition this is all finite so this limit is going to go to 0 as pi goes down to 0 and the conclusion is we have a remainder which has a 0 mean and its variance goes to 0 therefore r converges to 0 almost surely and like I mentioned before someone was asking whether this is convergence in probability or is it just convergence in distribution convergence probability versus convergence almost surely there is a subtle difference but you can prove that you can in fact find a subsequence in which you do get convergence almost surely so what we've then shown is what we've then shown is in fact that this sum of w tk minus 1 I might as well just write it like this the increment of w is equal to a half w squared minus t which is and this is by definition the left hand side there is by definition the stochastic integral of w s d w s questions in terms of sort of Brownian in terms of stochastic calculus this is perhaps one of the most fundamental results oh man I don't know why it's doing that other than the quadratic variation becoming t almost surely this is the next fundamental result for Brownian motions and in some ways it's connected to that it's connected to the fact that the square of the sum of the increment surely in fact this error if you look at it that error is almost the sum of the square of the it's almost exactly the quadratic variation isn't it so we get that nice result okay there's a couple of things a 
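The limit we just derived can be checked numerically, much like the classroom demo described earlier. Below is a minimal sketch, not the actual classroom code: it assumes NumPy, a uniform partition of [0, 1], and variable names of my own choosing. It builds one Brownian path, forms the left-endpoint sum, and compares it to one half W_T squared minus one half T as the mesh is refined.

```python
import numpy as np

rng = np.random.default_rng(0)

def left_point_sum(n_steps, T=1.0):
    """One Brownian path on a uniform partition of [0, T]; returns the
    left-endpoint sum  sum_k W_{t_{k-1}} (W_{t_k} - W_{t_{k-1}})
    together with the claimed limit  (1/2) W_T^2 - (1/2) T."""
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), n_steps)   # increments ~ N(0, dt)
    W = np.concatenate(([0.0], np.cumsum(dW)))   # the path, with W_0 = 0
    ito_sum = np.dot(W[:-1], dW)                 # evaluate at left endpoints
    limit = 0.5 * W[-1] ** 2 - 0.5 * T
    return ito_sum, limit

# The remainder R_pi shrinks path by path as the partition gets finer:
for n in (100, 10_000, 1_000_000):
    s, lim = left_point_sum(n)
    print(f"{n:>9} steps: |sum - (W_T^2/2 - T/2)| = {abs(s - lim):.2e}")
```

Consistent with the variance bound in the derivation, the standard deviation of the remainder on a uniform mesh scales like the square root of T squared over 2n, so each tenfold refinement shrinks the typical error by roughly a factor of three.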
A couple more things you can learn about stochastic calculus just by looking at this result. Suppose you took the d of both sides of this equation, whatever that means; in a very hand-waving sense, whenever you take the d of an integral you just remove the integral, so you would end up with w dw. We know that doesn't actually mean anything on its own, because Brownian motions are not differentiable, so without the integral sign w dw doesn't make much sense, but it's a sort of mnemonic tool. According to this rule, w dw equals one half d(w squared) minus one half dt, if you just apply the d operator to everything, and I'll put all of this in quotation marks. If I put the d(w squared) on one side of the equation and everything else on the other, you end up with d(w squared) equals 2 w dw plus dt. If you were doing standard calculus, that's where you would have stopped: you would say that the differential of the square of something is just twice the something times the differential of the something, so you'd get just 2 w dw. The dt here is the Itô correction term, and it only shows up because Brownian motion has infinite total variation and finite, nonzero quadratic variation. It doesn't show up for any differentiable process, even one with jumps; it's really quite special to Brownian motion that you get that correction. So when you look at that, you might ask yourself: what is the more general rule? This is fine for the quadratic function, we worked it out, but what if you take a Brownian motion and map it through a function g to get a new process, g of the Brownian motion? The quadratic function is a specific example, but in general, what's the answer? It turns out that you need g to be differentiable enough for an answer to exist in the first place: g must be twice differentiable. If g is twice differentiable, then the first version of Itô's lemma that I'm going to give you says that, in this same weak sense,

d(g(w)) equals g prime of w dw plus one half g double prime of w dt.

The first term is what you'd get from standard calculus, still in quotation marks, and the second term is your Itô correction. It's not too difficult to do a reasonably rigorous, though still somewhat loose, proof of that statement, and it would follow along the lines of what we've just done. What does the statement really mean? If you integrate both sides, it means

g(w_t) minus g(w_0) equals the integral from 0 to t of g prime (w_s) dw_s, plus one half the integral from 0 to t of g double prime (w_s) ds.

That's what the statement actually means in terms of the underlying objects, though we still have to make a few definitions. The first integral is the stochastic integral, the object we haven't fully defined yet, and the second can be interpreted as either a Lebesgue or a Riemann integral; it really depends on what the function is, but Riemann is enough for our purposes. So let's define the stochastic integral exactly as we defined it for the quadratic function: you put down a partition, and for any function h, it doesn't have to be g prime itself, you take the Brownian motion at the left-hand point, multiply by the increment of the Brownian motion, sum over the partition, and then take the limiting case of the partition going down to zero. For integrals with respect to the Lebesgue measure you can define things in the same way, and there's actually nothing too complicated about it. In the stochastic case we had to make some sense of what integrating against dw even means, because w is not differentiable, so that notation is already a little loose; we defined the entire stochastic integral in one piece, and we defined it in such a way that the sums converge, in an L2 sense. For the ds integral we don't run into that problem at all: because ds makes sense as an ordinary measure on time, we don't have to worry about those details, and we can use any point w at t_k star in the interval, together with delta t_k. For a Riemann integral you don't have to use the left-hand point or the right-hand point; you can use any point you like, though most typically you'd still take t_{k-1}, simply so that you can make easy comparisons between this sum for a finite partition and the stochastic sum for the same finite partition. So this is what we precisely, or almost precisely, mean by those terms. Now what I'd like to do is use this result rather than try to prove it. If you're interested in seeing the proof of this statement, come see me; next class I'll give you a hand-waving proof which uses Taylor's theorem in a very loose way, and it's not rigorous, but people do it anyway; the rigorous proof requires a more delicate touch. For the last ten minutes today I'll show you how you can use this result.
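Itô's lemma in the form just stated can itself be checked on a simulated path: replace the two integrals by their finite-partition sums, with the stochastic one evaluated at left endpoints as defined above, and compare against g(W_T) minus g(W_0). The sketch below is my own illustration, not anything from the lecture; the choice g = sin is arbitrary, just some twice-differentiable function, and NumPy with a uniform partition of [0, 1] is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)

# One Brownian path on a uniform partition of [0, T].
T, n = 1.0, 200_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate(([0.0], np.cumsum(dW)))   # W_0 = 0

# g = sin is an arbitrary twice-differentiable test function.
g, g1, g2 = np.sin, np.cos, lambda x: -np.sin(x)

lhs = g(W[-1]) - g(W[0])                     # g(W_T) - g(W_0)
stoch = np.dot(g1(W[:-1]), dW)               # left-endpoint stochastic sum
correction = 0.5 * np.sum(g2(W[:-1])) * dt   # (1/2) int_0^T g''(W_s) ds

# Ito's lemma says lhs ~ stoch + correction, up to mesh error:
print(abs(lhs - (stoch + correction)))
# Size of the Ito correction term on this particular path:
print(abs(correction))
```

The mismatch in the first printed quantity comes only from the finite mesh, so it shrinks as n grows, while dropping the correction term leaves an error that does not go away as the partition is refined.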
Okay, so let's take an example; we already have the example of w squared, so we might as well ask about w cubed. Suppose you were being asked: what is the integral of w squared dw? How would you find that integral? Your guess, of course, is one third w cubed, and you know that's wrong, because this is stochastic calculus and you don't get the usual thing. But since your guess is one third w cubed, what you can do is consider the function g(x) equals x cubed; the factor of one third is inconsequential, we can put it back afterwards. Itô's lemma then implies, and let me write it down again, that d(g(w)) equals g prime of w dw plus one half g double prime of w dt; that's just the statement of Itô's lemma. What's g prime? 3 x squared, so we get 3 w_t squared dw. What's g double prime? 3 times 2 times x, so that term is one half times 3 times 2 times w dt, which is 3 w dt. Now we've already made some progress: I want to compute the integral of w squared dw, and that's showing up there, so let's isolate that term and put everything else on the other side. We have w squared dw equals one third d(g(w)) minus w dt, because the factor of 3 and the factor of 2 cancel against the one third and the one half. Now I integrate both sides of the equation from 0 to t: the integral from 0 to t of w_s squared dw_s equals one third the integral from 0 to t of d(g(w)), minus the integral from 0 to t of w_s ds. And what is the integral of the d of anything? It's just the thing evaluated at the endpoints: if you imagine this in terms of the partition, you're just summing up the increments over the entire interval, so you get the last point minus the first point. So our answer is one third g(w_t) minus one third g(w_0); note that whenever you integrate, you can't use the same symbol for the integration variable and the endpoint, so I need a dummy variable s inside the integrals. What is g(w_t)? It's simply w_t cubed, and since w_0 equals 0 that second term is zero; and notice the one third is not in front of everything, it multiplies only the w cubed term. So we end up with our result:

the integral from 0 to t of w_s squared dw_s equals one third w_t cubed minus the integral from 0 to t of w_s ds.

In some sense this is really an integration by parts formula: you're changing the d operator so that it's applied to the w cubed, and then you have your correction term. You know the integral of u dv is uv minus the integral of v du; that's really what this formula tells you. And what's nice about this formula is that on the left-hand side you have a stochastic integral, while on the right-hand side you do not have any stochastic integrals: you have stochastic processes, yes, but that integral of w_s ds is simply a Riemann integral, and the w cubed is just the process at time t. So in fact you've actually performed the integral: you now know that that stochastic integral is this object. This object is not deterministic in any way, but you have an explicit form for it. You will not always be lucky enough to completely solve the problem, but generically, if you want to compute stochastic integrals of that kind, an integration by parts formula will always follow by applying Itô's lemma to an appropriate function, and usually the appropriate function to use is whatever your guess is from standard calculus. Okay, so I guess it's five minutes, so I think I'll stop there; that's as much as I'd like to tell you today.
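The integration-by-parts result just derived can be verified the same way as the w squared case: compare the left-endpoint sum for the stochastic integral of w squared against one third w_t cubed minus an ordinary Riemann sum. This is a sketch of my own, assuming NumPy and a uniform partition of [0, 1]; it is not code from the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)

# One Brownian path on a uniform partition of [0, T].
T, n = 1.0, 200_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate(([0.0], np.cumsum(dW)))   # W_0 = 0

# Left-hand side: the stochastic integral int_0^T W_s^2 dW_s,
# as a left-endpoint sum per the Ito definition.
stoch_integral = np.dot(W[:-1] ** 2, dW)

# Right-hand side: (1/3) W_T^3 minus the Riemann integral int_0^T W_s ds.
riemann_integral = np.sum(W[:-1]) * dt
rhs = W[-1] ** 3 / 3 - riemann_integral

print(abs(stoch_integral - rhs))   # small, shrinking as the mesh is refined
```

As the lecture notes, the right-hand side contains no stochastic integrals, only the process at time T and a Riemann integral, which is why this counts as actually having performed the stochastic integral.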