All right. So, today I want to spend the entire lecture talking about Gaussian statistics and Gaussian random fields. And towards the end of the lecture, I'll say something about length scales for the cosmological context, which has also been touched upon by Marco. There'll be a little bit of repetition, which I think is good, and then I will set it up in a language which I can then use tomorrow. All right. So, you've already seen enough motivation for why understanding Gaussian random fields is important, because you've seen that simple models of inflation predict that the primordial density perturbations, or the curvature perturbations in this quantity zeta, are approximately Gaussian distributed, because the fields that we are talking about are approximately free, and they have weak interactions. So, the level of non-Gaussianity is small. So, we would like to understand: if I have a realization of a Gaussian random field, what does this mean? That's what this lecture is about. And just to remind you why we are doing all of this, this is a movie of a simulation which has been performed with such Gaussian initial conditions, but then evolved under the influence of gravity. So, there is no other physics here. It's just Newtonian gravity starting from initial conditions where the density fluctuations of the matter were taken to be a Gaussian random field, evolved with standard stuff which we will study in the next lectures. So, I want you to look at this picture already to see what the goal is. This is the thing we want to understand from this and the remaining lectures. And we will not be discussing the physics of this simulation today. That will be reserved for the next two lectures. We are interested in understanding how to figure out what the statistics of the initial conditions are, because then that will help us understand what the statistics of the evolved field also look like eventually. So, this is just to set the stage. These simulations are now easy enough to perform that even I can do them. So, that already tells you something. This was done at IUCAA with a publicly available code written by Volker Springel. It's called Gadget. Okay. So, now I will just move to the blackboard for a while. So, let me just stop sharing the screen. Okay. So, let's start with something very basic, because I find that several times, if you have not come across these topics before, some of the jargon can get very confusing. So, consider a variable X, which is drawn from a Gaussian distribution with zero mean and a variance of one. If you do not understand what this means, I would suggest you go to the Wikipedia page for the Gaussian distribution, and, for statistics and probability in general, read about probability distribution functions and random variables. Okay. So, this is the language that we will use throughout this topic. So, if I have a variable like this, I can write down the probability density function, which I will call little p of X. This will be 1 over root 2 pi, e to the minus x squared divided by 2. So, this is the probability density function. And as the name suggests, this gives you the probability density of X, meaning that p of X dx is equal to the probability that the variable takes values in the infinitesimal range X to X plus dx. Now, I'm not a mathematician, so if there are mathematically inclined people here, I will sometimes be very sloppy from your point of view.
I will not worry about whether there's a square bracket or round bracket and things like this. I will also not worry about distinguishing between the variable itself and the value that it takes. So, I'm calling X as the random variable. I'm also referring to the value that it takes as X. Statisticians will be very annoyed. So, the Wikipedia article that I'm talking about, for example, will be careful about such things. Okay. They may call the variable capital X and the value that it takes as little X and so on. So, we will not do such detailed stuff. So, okay, so this is the PDF, probability density function, PDF. And it's normalized because if I do this integral, this turns out to be one. And if you don't know how to do this, please look it up in some standard maths textbook where it will show you how this integral can be done. Okay. So, now several interesting things can be constructed out of just this simple distribution. For example, you can start taking moments. The first interesting moment is called the mean of the distribution. So, here is some notation which I will use throughout. Average, the angular brackets refer to an average. So, this is defined as the integral weighted by the PDF of the quantity X. Okay. So, the angle brackets refer to this integral weighted by P of X. And whatever is inside the angle brackets is the quantity which I have to put here. So, now for this, what should this value be? Can someone tell me? Zero. Okay. And if you don't know why, please do this integral or just stare at the integrand and notice the integration range. And this should be clear. Okay. So, the mean is zero for the way I have defined this field. The variance is defined like this. It is the expectation value of X squared minus the expectation value of X, the whole squared in general. Here we don't have an expectation value of X. So, I only have to calculate the expectation value of X squared. What will this be for this particular distribution? It has to be one. That's in fact how I have picked the distribution. And again, you can do this exercise and see how this comes out. All right. And you can go beyond and define the nth moment. And what you can prove for this field is that all odd moments are actually zero. Just look at the integrand and argue based on symmetry that these have to be zero. And all the even moments have a very nice numerical form. They can just be represented by this quantity 2n minus 1 double factorial, okay, for this particular distribution. It's also useful if you've not done these calculations before to ask how should I prove this, you know, in a reasonably quick way. Turns out that Fourier techniques are very useful. Okay. So, and Fourier techniques we will use in a very heavy manner in the rest of the lecture. So, let me just quickly show you what happens in Fourier space. So, if I consider the Fourier transform, let me call this distribution p with a subscript g, g for Gaussian. Okay, this is my notation. Nobody else uses this. But I like to keep track of the name of a distribution by putting a little subscript there. So, the Fourier transform of this pg, this will be a function of the Fourier variable k, which I'm assuming is the conjugate to x. And my convention is that this is defined as an integral over x of e to the minus i k x times pg of x. All right? If you notice the way this integral is written, it is also clear that this is the expectation value of e to the minus i k x. All right? Just from the definition of expectation value. 
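An aside for the notes: if you want to check these moment formulas numerically before proving them, a minimal sketch in Python could look like this (the sample size is an arbitrary choice; the double factorial comes from scipy):

```python
import numpy as np
from scipy.special import factorial2  # (2n-1)!! double factorial

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000_000)  # draws of X with zero mean, unit variance

for n in range(1, 7):
    empirical = np.mean(x**n)
    # odd moments vanish; even moments are (n-1)!!
    exact = 0.0 if n % 2 else float(factorial2(n - 1))
    print(f"<x^{n}>: empirical {empirical:+.4f}, exact {exact}")
```

Now, back to that Fourier transform integral.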
Calculating this integral is a nice exercise in contour integration, which, if you have not done it before, I would really encourage you to do. What I have found is that many people remember the answer to this integral, but if you ask them how to prove it, you usually get into trouble. Because this integrand does not have any poles, any discrete poles, anywhere in the complex plane. However, it has essential singularities at plus i infinity and minus i infinity. So, how do you deal with this kind of integral which is on the real line? You have to be a little bit careful with the contours that you choose and the arguments that you use to apply the Cauchy theorem. Do that and you will find that the answer is simply e to the minus k squared over 2. Have you seen this before? So, I see people nodding. Maybe other people have not seen it. So, if you have not seen this before, please try proving it. Using this, if I calculate the expectation value of x to the power n, then this is simply the integral over all of x. Sometimes I will not write these integration ranges when it is obvious what I am talking about; whenever I write an integral sign without an explicit range, it means I am integrating over all allowed values of that variable. So, in this case, I have minus infinity to infinity, and now I have this times Pg. But now for the Pg, I can write down the inverse Fourier transform. And I also know that, in terms of Fourier conjugate variables, if I multiply by one power of the variable, then in Fourier space I am taking one derivative with respect to the Fourier conjugate. Usually, you think of this when you study, for example, fluid dynamics or differential equations: you will have d by dx appearing, and then you will say, let me go to Fourier space, where I will multiply by k. Here, I have x appearing. So, if I go to Fourier space, I will differentiate with respect to k. So, you can then show that this will just be i times del by del k, to the power n, of the Fourier transform of Pg of k, evaluated at k equal to 0. Because the Fourier transform has e to the i k x, but in this integral, when you look at the way it is constructed, you will end up evaluating the derivatives at k equal to 0. And then you can do this for each n. So, for n equal to 0, you can show that the expectation value of 1 is just 1. For n equals 1, you will find that the expectation value of x is minus i k times e to the minus k squared over 2 at k equal to 0, which is equal to 0 because there is one power of k, and so on and so forth. So, for example, n equals 2 is the first non-trivial one. So, I will have x squared. This is equal to minus i squared del del k of k times e to the minus k squared over 2, the whole thing evaluated at k equal to 0. And now you can see that there will be a non-zero term, because although there is a k here, when I differentiate, there will be one term in which it is just the exponential being evaluated at k equal to 0, and this will turn out to be 1, and so on. So, you can use this very easily to prove this general identity as well. Okay. Another thing I want you to remember regarding Fourier transforms is that if I just integrate dx e to the minus or plus i kx over the entire range, I get 2 pi times a Dirac delta in k. So, this is my notation for a Dirac delta.
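The derivative trick from a moment ago is also easy to check symbolically. A minimal sketch using sympy, which should reproduce 1, 0, 1, 0, 3, 0, 15 for n from 0 to 6:

```python
import sympy as sp

k = sp.symbols('k', real=True)
pg_hat = sp.exp(-k**2 / 2)  # Fourier transform of the standard Gaussian PDF

# <x^n> = (i d/dk)^n of pg_hat, evaluated at k = 0
for n in range(7):
    moment = sp.I**n * sp.diff(pg_hat, k, n)
    print(n, sp.simplify(moment.subs(k, 0)))
```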
When I go to multiple variables, I will use vector notation, and I will still write the Dirac delta as delta d, and then the quantity inside will be a vector. Okay. So, suppose x has dimensions of length. Can someone tell me what are the dimensions of Dirac delta of k? So, who says 1 over length? Who says length? And who says dimensionless? Okay. The ones who are saying length, can you argue it out for me? Come again? Very nice. Okay. So, he's very quick with just looking at things, but suppose I ask you for the long-winded way of doing it. Okay. So, let me give you the long-winded answer. This is a very nice answer. He just saw that there's an integral here. This is dimensionless. So, the right-hand side must have dimensions of length, not 1 over length. Okay. So, if I think of it in a longer way, kx is dimensionless. So, if x has length, then k has 1 over length, and the Dirac delta is a density in k. So, it has 1 over 1 over length, which is length. Okay. And then it matches what he said before, that it has units of length on this side. All right. So, these dimensional reasonings will also be important as we go along. Okay. All right. I think I will not need Hermite polynomials in this lecture. So, we will skip this. Okay. So, the next interesting thing is to modify this distribution a little bit and introduce some parameters. So, let's introduce the variance as a parameter. Let's call it sigma squared, and we'll assume that this is not equal to 1 in general. So, let me call what we had before Pg of x semicolon 1, and define a Pg of x semicolon sigma squared, where sigma squared was 1 initially. Just by dimensional reasoning, if I say that sigma squared is the variance of the variable, and if x has dimensions of length, what should sigma squared have? Dimensions of length squared. Okay. So, now P of x is a density in x, right, because P of x dx is a probability, which is therefore dimensionless. So, P has units of 1 over x, or 1 over length in this case. So, now, if I want to replace this quantity with something which is dimensionally correct and has a variance sigma squared, I can apply dimensional reasoning, and you can convince yourself that the answer has to be this. It cannot be anything else, okay. The simplest way to check it is to just see what happens if I calculate the variance of this distribution now, meaning that I take the integral of x squared P of x dx; yeah, you should get sigma squared, okay. In this case, again, the odd moments are going to be 0, because I did not do anything to the x in the numerator here, but now the even moments, again almost by dimensional reasoning, will just turn out to be sigma to the power 2n times 2n minus 1 double factorial. Similarly, the Fourier transform of this new Pg in k space will be e to the minus k squared sigma squared by 2, okay, and again you can follow this through pretty much all by dimensional reasoning. Remember, this is just the expectation value again of e to the minus i kx. This is the definition, okay, good. We can also introduce a mean for this variable in the problem. Let's call the mean mu, and this basically means that I want the expectation value of x to be equal to mu. This is what I mean by introducing a mean. Now, what you can check is that wherever I see x in the original distribution, if I simply replace x with x minus mu, this will do the job, okay. So, this will guarantee that the expectation value of x is mu.
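As another quick numerical aside, you can verify both the sigma squared scaling and the effect of the shift x to x minus mu in one go; the numbers mu, sigma and k below are arbitrary test values, and the last two lines anticipate the phase factor we discuss next:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, k = 1.7, 0.6, 0.8             # arbitrary test values
x = mu + sigma * rng.standard_normal(2_000_000)

print(x.mean(), x.var())                  # ~ mu and sigma**2
# characteristic function <e^{-ikx}> picks up a phase e^{-ik mu}
print(np.mean(np.exp(-1j * k * x)))
print(np.exp(-1j * k * mu - k**2 * sigma**2 / 2))
```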
So, what does this mean in Fourier space? Recall that the Fourier transform is the integral dx e to the minus i kx times Pg of x. So, if I make this replacement in Pg, then effectively in Fourier space I will pick up a phase e to the minus i k mu, okay. So, just do the calculation yourself and check how this happens. I'm going through this quickly because I'm assuming that at least half of you have already seen all of this before. This is basic Gaussianology, okay. All right. So, basically, all Fourier transforms will pick up these phase factors e to the minus i k mu, and nothing else will change about the way one does calculations. The next interesting thing about the Gaussian distribution relates to what are called generating functions. So, let's look at the generating function of the Gaussian distribution. A generating function, in general, for any kind of distribution, is that function whose Taylor coefficients in an auxiliary variable give you the quantities of interest. So, it is a function of two variables: one is the variable of interest, and the second is an auxiliary variable. In terms of the auxiliary variable, if I do a Taylor expansion, the Taylor coefficients have to be the moments of the distribution that you're talking about, okay. And the moments of the distribution are these quantities: x to the power 2n, x to the power 2n plus 1; the 2nth moment, the 2n plus 1th moment. So, this is a function whose Taylor coefficients are the quantities you want. So, this would be a generating function for this quantity of interest. So, in this case, let's call the moment generating function m of k. This is equal to the expectation value of e to the kx. You can easily see why, because if I write down this expectation value and I Taylor expand in powers of k, then the nth Taylor coefficient will exactly involve the expectation value of x to the power n, okay. That's how the Taylor series will be ordered. So, this thing in our Fourier language becomes simply the Fourier transform of Pg, evaluated not at k, but at i times k, okay. Because of how I had defined it: when you write down the Fourier transform of Pg, you find the expectation value of e to the minus i kx. So, if I replace k with i times k, there will be a minus i squared, which will become 1, and then I will get exactly this, the expectation value of e to the kx, okay. This is a nice integral to do. It turns out to be exactly this: e to the mu k plus k squared sigma squared over 2. So, again, this is worth proving, because one way of doing it is to write down each x to the power n in terms of the nth k derivative of the Fourier transform, right, and then you can go from there, yeah. And you'll have to recognize a particular Taylor series, which is the Taylor series corresponding to the exponential of this quantity, okay. So, try doing that. This is the moment generating function. And the most interesting part of this generating function business for Gaussians is the so-called cumulant generating function. Maybe I should not write so low. Let me write it here. The cumulant generating function, we'll call it c of k, is defined as the natural logarithm of the moment generating function, okay. This is the definition.
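Jumping slightly ahead, here is a symbolic sketch of what this definition gives for the Gaussian; sympy expands the log of the moment generating function and finds only two Taylor coefficients:

```python
import sympy as sp

k, mu, sigma = sp.symbols('k mu sigma', real=True)
M = sp.exp(mu * k + sigma**2 * k**2 / 2)  # Gaussian moment generating function
C = sp.expand(sp.log(M))                  # cumulant generating function

print(C)                    # mu*k + k**2*sigma**2/2, and nothing else
print(sp.diff(C, k, 3))     # all cumulants beyond the second vanish: 0
```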
So, this is defined for any distribution, assuming that the moment generating function is well defined and its natural log can be taken. That is the case here. So, let's take the log of this. And for the Gaussian, this turns out to be just mu k plus sigma squared by 2 k squared. Now, any generating function is a function whose Taylor series coefficients give you the quantities of interest. So, the cumulant generating function is the generating function of cumulants of the Gaussian distribution, by this logic, okay. What do you see here? What you see is not a complicated function. It is just a quadratic polynomial. So, it has a linear term and it has a quadratic term, and it has nothing else. So, I can think of this as a Taylor series in which there are exactly two coefficients: the linear coefficient and the quadratic coefficient, right. There is nothing beyond. There is no k cubed term. There is no k to the 157th term, and so on. So, this has exactly two cumulants. The Gaussian distribution has exactly two cumulants, which are the mean and the variance, okay. This is the defining property of the Gaussian. So, these are the mean and variance, and this statement is the defining property of a Gaussian. This is a Gaussian in one variable, okay, the Gaussian distribution. So, that means a non-Gaussian distribution for this one variable will be anything that is not this, meaning anything that has any higher order cumulant, okay. And in fact, there is a theorem which says that if a distribution has one higher order cumulant beyond the second order, it will in fact have an infinite number of them. You cannot construct a distribution which has exactly one higher order cumulant. You cannot stop at three, okay. You will, in general, somewhere or the other, have higher order cumulants continuing infinitely, okay. So, this is a nice thing to remember. This is the definition of a Gaussian variable, okay. All right. This is all for one variable. Now, we want to eventually describe a field, okay. A field is composed of an infinite number of variables. If I think of a field as something that lives on a grid in space, then a field would be a set of Gaussian variables defined at every location on the grid. So, I cannot just think about one variable. I have to think about many variables. So, before we go to fields, let's just set up some notation for multivariate Gaussian distributions. This was a univariate Gaussian distribution. Now, we'll write down a few things for multivariate Gaussian distributions. So, the simplest way of understanding a multivariate Gaussian distribution is to think about a set of n independent Gaussian variables, okay. So, suppose I have y1, y2 up to yn, and they are such that each yj is drawn from a Gaussian with mean muj. Let's set the muj's to 0 for simplicity, because wherever you see non-zero muj's, you can introduce phase factors e to the i k mu in Fourier space appropriately and deal with them. So, we won't worry about the means. And let's say that their variances are some lambda j squared, okay. And let's also assume that these are independent, meaning that I don't have to worry about the other variables when I want to draw one of them. It's always useful, at least for me, to think about random variables in terms of drawing a value from the distribution. What does drawing a value mean? So, I can do a coin toss and get heads or tails. So, now, the value that I get as a result of this experiment is a draw from the distribution.
The distribution itself has nothing to do with the experiment, okay. The distribution exists in an abstract space of distributions, and the distribution says that the probability for drawing a head is half and the probability for drawing a tail is also half, and they sum up to 1. When I do the experiment, I realize a value, and that is a draw from this distribution. So, in this case, what one is saying is that when I draw a value of y1, I don't have to worry about what the values of y2, y3, etc. are. And similarly for any yj, okay. So, these are independent draws, which means that the joint probability density function for y1, y2 up to yn, how will I write it? It is a product over single Gaussians: j equals 1 to n of Pg. In this case, because I have chosen mean 0 and variance lambda j squared, these are all Pg of yj with lambda j squared, okay. So, this is the starting point: a multivariate Gaussian where each Gaussian is independent of the others. But now, at least, I have a multivariate distribution here, okay. I have a joint probability of finding values y1 and y2 and yn for all these n variables. A generic multivariate Gaussian will not have all its n components being independent of each other. In general, they will be correlated with each other. But if I want this distribution to remain Gaussian, then the defining property, for all practical purposes, is to simply say that any multivariate Gaussian can be thought of as a rotation of a set of independent Gaussians, okay. So, I think of my y1 to yn as a vector, and this is composed of components which are independently drawn Gaussians. A general multivariate Gaussian can be obtained by an orthogonal transformation of this y, okay. So, it is a straightforward linear algebra exercise. So, consider p of x, which is obtained from p of y by a rotation, okay, let me just invent some notation for this. So, let us call the rotation that relates x and y as r, okay. The rotation is r such that I can think of y as r acting on x, or x as r transpose y, all right. So, now think about what this will do to the distribution, okay. What I want to say is that the probability to be in an infinitesimal volume in y space should remain preserved under this rotation, okay, because I am just doing a rotation. So, I am not really changing anything about the distribution function itself. So, if I think of the probability as this dn y p of y and I apply a rotation, it is a completely straightforward change of variables now, okay. And I know that the infinitesimal volume element in y space will remain unchanged under an orthogonal transformation. So, this will just convert itself into dn x times p of x, where now p of x is just p of y evaluated appropriately, okay, where y has to be evaluated as r times x. So, what you will then see is that I need to think a little bit carefully about how to represent this quantity that I get on the right hand side. So, an easy way of doing this is to convert this product into a single function, because I just have to deal with exponentials. So, instead of thinking of this as a product, I can think of the exponent, apart from the normalizing 2 pi to the half etcetera factors, as just a summation over all of these n variables of yj squared divided by lambda j squared, okay. But I can think of a summation in terms of matrix algebra, right.
So, I can think of this as a vector transpose times a diagonal matrix times the vector itself. So, this is also equal to e to the minus one half of y transpose times some matrix lambda inverse times y, okay, yeah. (In response to a question:) It is not that there are never any correlations, but it is true that you can always find variables between which there are no correlations. But those variables are linear combinations; they are rotations of the original variables, yes. That's right, exactly. It is a general result, yes, for Gaussian distributions, that's right. Okay. So, I have just written this set of independent Gaussian variables in this language. The reason this is nice is because now I can say that my P of x I will just get by replacing y with r times x. So, y transpose will be x transpose r transpose, then I will have my lambda inverse, and then I will have r times x here. So, I can think of this as a new Gaussian: e to the minus half, then x transpose, r transpose, lambda inverse, r, x. Let me write it on this line and make it bigger, so it's also visible. So, this is just the same thing as before, and I have applied the rotation here. Okay. And the volume element does not change. So, what I can see here is that if I think of this quantity in the middle, what is it? It is a square matrix, right? It's a similarity transformation, the rotation of the matrix lambda inverse. If I call this c inverse, then I have something that looks exactly like the previous Gaussian, which was y transpose lambda inverse y. Now, I will have x transpose c inverse x, but c is not a diagonal matrix anymore. This lambda inverse is a diagonal matrix with 1 by lambda 1 squared, 1 by lambda 2 squared, up to 1 by lambda n squared on the diagonal. And therefore lambda, which is the inverse of this, will just have the lambda squareds on the diagonal. This c, or its inverse, is not a diagonal matrix anymore. But that's all that has happened. So, a multivariate Gaussian distribution, at least for me, this is the easiest way to think about it: it is a distribution of a set of variables x1, x2 up to xn, which is defined by saying that its probability density function has this particular form, e to the minus x transpose times the inverse of some matrix times x, where that matrix is real and symmetric. Not only is it real and symmetric: because I have chosen to define this starting from these independent Gaussians, I have naturally introduced the variances of each of these variables into my calculation. The variance of any Gaussian is a positive quantity. At least, it cannot be negative. It could tend to 0, but it cannot be negative. So, all of these numbers have to be non-negative, which means that the rotation that I get of this diagonal matrix is a positive semi-definite matrix. So, just to summarize the Gaussian distribution, let me show you what the thing looks like with the mean thrown in: Pg of x minus mu with a covariance matrix C. This C is called a covariance matrix. This is 1 over 2 pi to the n-halves, times the square root of the determinant of C, times e to the minus one-half x minus mu transpose C inverse x minus mu, where mu is a real vector of dimension n. This is the mean vector of the Gaussian, which I had set to 0 initially, but you can easily put it back.
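To make this construction concrete, here is a small numerical sketch of exactly this logic: draw independent Gaussians, rotate them, and check that the covariance of the result is the rotated diagonal matrix. The dimension and the variances are arbitrary choices, and I use scipy to draw a random orthogonal matrix:

```python
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(2)
n = 4
lam2 = rng.uniform(0.5, 2.0, size=n)       # variances lambda_j^2 (arbitrary)
R = ortho_group.rvs(dim=n, random_state=rng)

# independent draws y_j ~ N(0, lambda_j^2), then the rotation x = R^T y
y = rng.standard_normal((500_000, n)) * np.sqrt(lam2)
x = y @ R                                   # each row is x^T = y^T R, i.e. x = R^T y

C = R.T @ np.diag(lam2) @ R                 # predicted covariance: rotation of Lambda
print(np.round(np.cov(x, rowvar=False) - C, 3))   # ~ zero matrix
```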
The most interesting thing is this C inverse, where C I can think of as this expectation value. So, it is the expectation value of x x transpose, minus the expectation value of x times the expectation value of x transpose; or, in terms of components, C i j is the expectation of x i x j minus the expectation of x i times the expectation of x j. And this C is a real, symmetric, positive semi-definite matrix of dimension n by n. And you can use this set of tools to analyze almost everything about Gaussians. For example, you could work in Fourier space by first doing the rotation. Suppose I give you a Gaussian, and I tell you: here is its mean, here is its covariance matrix, the covariance matrix has off-diagonal elements, and it is very complicated. I want to calculate the nth moment of the fifth variable. How will I go about doing it? So, one way of approaching the problem could be: you start with your distribution, you find the rotation which diagonalizes the matrix C, which goes backwards in this calculation and reaches this stage. So, now I have to ask, for the variable x 5 to the power n, what does it mean in terms of the linear combination which is implied by the rotation? But I can do this in Fourier space in reasonably straightforward ways, and you can calculate various things. So, I will give you a couple of homeworks here on the board to try at home. These are just some useful exercises in Gaussianology, which turn out to be interesting in many, many contexts, including, by the way, in the statistics part of the course that you will be dealing with. So, there also, if you deal with Gaussian distributions for errors, all of these things will apply. So, I will not spend any time on this, I just want to mention it. Marginalization: if I take an n dimensional Gaussian, let us call it P n of x 1, x 2 up to x n, I can marginalize over the nth variable. Marginalize is the English word meaning to render irrelevant; a population of people is marginalized by oppressors. So, I want to marginalize the nth variable. This means that I do not want to care about what value this nth variable takes, but I do care about all the other variables. So, I will integrate over the probability density of the nth variable. So, let us construct the integral d x n, over all allowed values, of this P n, which is a function of x 1, x 2 etcetera, x n minus 1 and x n. So, this is, let us say, a Gaussian with mean mu and covariance matrix C (and I will be a bit sloppy about vector notation; sometimes I will write it one way, sometimes another, but they all mean vectors), where the covariance matrix has off-diagonal elements in general. Prove that this integral is actually equal to P n minus 1 of x 1, x 2 up to x n minus 1, which is also a Gaussian, but with the following mean and covariance: if the mean vector was mu 1 up to mu n, then this Gaussian will have a mean which is mu 1 up to mu n minus 1. So, I just drop the last element of the mean. And the covariance, if the original was C n, will be C n minus 1, which is just C n with the last row and column, the nth row and column, dropped. So, if I have a matrix with C 1 1, C 2 2, up to C n n on the diagonal, and then C 1 n and C n 1 and so on off the diagonal, this is C n; after marginalizing, I get C n minus 1, in which I just get this thing without the last row and the last column. So, this has to be proved, and again Fourier techniques are very useful for proving this. Mirdad, how am I doing for time?
Half an hour more. I will have to think a little bit; I will get back to you on this. If I start thinking now, it will take up time for the rest of the lecture, all right. And while doing this calculation, it is also useful to keep in mind the Fourier representation of this multivariate Gaussian. So, let me just write it here. This can be written as an integral in Fourier space over an n dimensional k vector. This Fourier representation is actually extremely useful. So, let me just write it down once: e to the i k transpose x minus mu (these are all vector operations) times e to the minus k transpose C k by 2. In the Gaussian probability density function itself, you see that there is a C inverse appearing, almost for dimensional reasons, right, as we argued before. In Fourier space, on the other hand, the exponential contains C itself. This is almost on its own the reason why you should always prefer to work in Fourier space when dealing with Gaussian distributions, because you work with the covariance matrix itself rather than its inverse. And whenever you have to do any calculations with the inverse of the matrix, even numerically, etc., it becomes much more challenging, okay. So, the fact that C appears here is a very useful thing, okay. So, you can start with this and try to marginalize, all right. Another interesting aspect of these Gaussian distribution functions is the idea of a conditional distribution. So, I will again not spend much time, but imagine I have a bivariate Gaussian p of x 1, x 2, which I can think of as the integral d 2 k e to the i k transpose x minus mu, etcetera. It's just this, right, here, with mu equals mu 1, mu 2. And C, because it's a 2 by 2 matrix, let me write it in this way: sigma 1 squared and sigma 2 squared on the diagonal, and C 1 2, C 1 2 off the diagonal. It's real symmetric, and I'm assuming that it is positive semi-definite. So, C 1 2 has to be chosen appropriately, okay. It cannot be arbitrarily chosen. So, now, what I'm interested in for this part of the calculation is understanding: if I know the value of x 1, what is the probability density of the value of x 2? So, this is the conditional probability p of x 2 given x 1. You can apply Bayes' theorem, okay, and this will be a ratio of two probability densities: p of x 1, x 2 divided by p of x 1. And you can argue that this is actually another Gaussian. So, please do this, figure out why this is the case, either by brute force or by some clever trick. This is a Gaussian in x 2. It's a univariate Gaussian. We started from a bivariate Gaussian and we conditioned on one variable. Earlier, we had marginalized over one variable by integrating out all its values. Here, we are conditioning on a variable by saying we know exactly what value that variable x 1 takes, okay. And this turns out to be a Gaussian in x 2. If it is a Gaussian, it has to be defined by what? Two numbers, right, which are what? The mean and variance of this thing. So, we invent notation, I mean, this is standard notation. What we want is the mean, but it's not the complete mean. It is the conditional mean, okay, the conditional mean and the conditional variance. So, let me call them the mean and variance of x 2 given x 1. And you can prove (let me just not screw up the calculation) that the conditional mean will be equal to the mean which x 2 had if I didn't care about x 1, which is mu 2, plus a correction which depends on x 1 and on the off-diagonal term in the covariance matrix: C 1 2 over sigma 1 squared. And the rest almost follows from dimensional analysis.
I'll get an x 1 minus mu 1 here. So, this is a useful thing to prove. And similarly, the variance of x 2 given x 1 will be the variance that x 2 had if I didn't know anything, which is sigma 2 squared, but now it will be reduced because of the presence of the off-diagonal term, because I will subtract a C 1 2 squared divided by sigma 1 squared. It is reduced because there is a square here and sigma 1 squared is also positive, all right. And this is nice to interpret, because what you're saying is that, thinking in terms of errors, I had some uncertainty in x 2 if I didn't worry about x 1. But x 1 and x 2 are correlated. So, if I put some information into the system, saying that I know the value of x 1, then the uncertainty in x 2 has decreased. It must decrease, okay, because I have put in some information and therefore reduced uncertainty. And the amount by which the uncertainty has decreased is exactly given by the correlation between x 1 and x 2, okay. I see. Yeah. (Question: It's not normalized anymore, right? This conditional p won't be normalized.) No, no, no, it is. Because, no, I'm saying this is a Gaussian with this mean and this variance. So, the calculation, if you go through the correct algebra, will lead to a correctly normalized Gaussian which has the mean and variance given by this, okay. So, in the exponential, I'll have x 2 minus this quantity, the whole squared, divided by 2 times this quantity, and the pre-factor will be 1 over square root of 2 pi times the square root of this quantity. This will work out from the algebra, yes, okay. And again, Fourier techniques are very useful in proving this. Okay. So, with this setup, let us move to the object of interest, which is now a set of Gaussians defined on, to begin with, a discrete grid in space. So, here is a point, here is a point, here is a point. At every point, I declare that there exists a variable which is Gaussian distributed, okay. And the collection of these variables at each of these points forms a multivariate Gaussian. But I am talking about a spatial grid. So, instead of calling it a multivariate Gaussian, I will call it a Gaussian field, okay, because it is defined on a spatial grid. And that's the only difference between a multivariate Gaussian and a Gaussian random field. It's just nomenclature. There is another difference if I want to take the continuum limit of this spatial grid, okay, if I want to make the grid cells very, very small, for example. Then I have to do things a little bit carefully, okay. And this is what we will study here. All right. So, now, as I have argued repeatedly, it is always useful to work in Fourier space. And in fact, in the inflationary context, this makes a lot of sense, because you have already seen in the previous lecture that the Fourier modes, the momentum space modes of your density fluctuation field, are the ones that are approximately uncorrelated with each other, okay. There is one subtlety about correlations versus independence which is worth emphasizing. If I tell you that p of x1 and x2 is equal to p of x1 times p of x2, then x1 and x2 are independent variables. On the other hand, if I simply tell you that the expectation value of x1 x2, minus the expectation of x1 times the expectation of x2, is 0, this is usually what people mean when they say that x1 and x2 are uncorrelated. So, in general, I can have a bivariate distribution of x1 and x2 where this is satisfied.
So, the variables are uncorrelated, but I cannot write the distribution function in this separable form. So, the lack of correlation does not imply independence, okay. Independence will imply that there is a lack of correlation. For a Gaussian set of variables, however, the lack of correlation does imply independence. Only for the Gaussian distribution is this true, okay. So, this is just something worth remembering. Since we are now going to talk about Gaussian variables, I will switch between independence and not being correlated as being equivalent to each other, but it is worth remembering this difference, okay. So, now let us set up this multivariate problem directly in Fourier space. So, instead of working with the values of delta at vector positions x, I will instead work with the values of the Fourier transform of delta. So, my Fourier conventions are like this. Delta k is the integral over the volume, d3 x, e to the minus i k dot x, times delta of x. And I am imagining that I am living in a cube of side l, okay. So, this volume is l cubed here, just for simplicity. And correspondingly, because this is a finite box, my delta x will be written as a sum over discrete k-modes of e to the i k dot x times some variable delta k. So, this variable delta k is the Fourier transform of delta x in this discrete calculation, okay. So, it is a partially discrete Fourier transform. It is discrete in k because there is a finite volume. Eventually, we will take a limit in which the volume goes to infinity and the discrete sum in k becomes an integral over d3 k, okay. So, we will take the continuum limit later, but it is useful to work with discrete calculations just now, okay. So, now delta k is the Fourier transform, and by definition, the Fourier transform is a complex number, okay. So, I can write delta k as some a k plus i times b k. The labeling is by k, the Fourier mode, and the a's and b's and the delta's are the variables of interest, okay. These are the random quantities. So, a k and b k are both real numbers, and therefore delta k is a complex number. So, now, if I look at this, the definition of a Gaussian random field, one way of defining it, is to say that a k and b k are Gaussian distributed. But now, if I look at this expression, if a k and b k are Gaussian distributed, this delta k looks like it is a sum of Gaussians, except for the fact that there is a square root of minus 1 sitting here. So, this I can think of as a weighted sum of Gaussians. And one nice property about Gaussians is that if I have x and y both Gaussian distributed and independent, then the quantity z, which is x plus y, is also Gaussian distributed, but with a mean and variance which are different from the individual means and variances of x and y, okay. So, if these are Gaussian distributed with mu 1 and sigma 1 squared, and mu 2 and sigma 2 squared, then this z will be Gaussian distributed with a mean mu 1 plus mu 2, and a variance which is sigma 1 squared plus sigma 2 squared. Again, a very simple exercise if you work in Fourier space and use this generating function, okay. So, if I apply this logic here, then a k and b k being Gaussian means that delta k is also going to be Gaussian, except for this complex nature here. And the Fourier transform that gives back delta x is also exactly a weighted sum of delta k's.
So, I am just doing weighted sums of Gaussians over and over, with the weights being imaginary or complex numbers, okay. So, this delta x, if I work at a fixed value of x, is also going to be a Gaussian variable. It may be correlated with the delta at other points x prime, okay. So, the delta at a particular x may be correlated with the delta at x plus some vector r, for example. But in general, this is a Gaussian distributed variable. So, let's see some of the mathematics of this set of assumptions. Yeah? (A question about the Fourier transform delta k.) Let me repeat the question. So, the question is, what are a k and b k? So, delta k is the Fourier transform of delta x. The Fourier transform is a complex quantity in general. So, I have just represented it by two real variables by writing a k plus i times b k. So, it's just the things that construct delta k, that's all, okay. (I also have a question. Yes. Delta x, is it related to the Dirac delta?) No, no, no, sorry, I should have clarified. Delta of x is the name I will give to my field. You can best think of this as what Marco was calling delta rho by rho. Yeah? So, it is the equivalent of the overdensity field in inflation. That is exactly what we're doing here. Yes, sorry about that. I should have probably used a different notation, but everything is in this language now. Okay. So, let's impose some conditions on this Gaussian random field. Okay. So, there was a discussion about homogeneity and isotropy yesterday. And we argued that the actual universe is not homogeneous and isotropic, because there are galaxies here and there and everywhere. Even the CMB is not perfectly isotropic. What you can argue, however, is that the statistics of the fields that we care about, such as the CMB and large scale structure at late times, are consistent with being translation and rotation invariant. By statistics, I mean any expectation values that you take of the fields of interest should not care about where you sit in order to take the expectation value. Okay. So, your location of sitting at a point and taking a correlation between this point and the neighboring point should not depend on the choice of where you sat particularly. Okay. So, let me be clearer about this. So, take the two-point function: delta x and delta x plus r, multiplied with each other, and I take an expectation value. Translation invariance will say that this cannot depend on x. It can only depend on r. Okay. Or, if you do not like this, think of this as delta x1 delta x2. This quantity cannot depend on x1 or x2 separately. It can only depend on the vector difference between the two, x2 minus x1 or x1 minus x2. So, this is called translation invariance. I mean, it is translation invariance. It is also called statistical homogeneity. So, the universe is not homogeneous, but you can try to assume that it is statistically homogeneous and then ask whether this is a reasonable assumption. Okay. So, I am still taking ensemble averages here. So, I have not yet done spatial averaging. I have not invoked an ergodic hypothesis or something, because I am saying that I know the stochastic process which is generating this randomness, and I am given a box in which every location has a Gaussian random variable, and I am capable of generating thousands and thousands of such boxes by resampling the stochastic process. (Sorry, I cannot remove what?) Let us work with this. This will become clear.
The question was whether we can go to other variables, rotate, etcetera. Let us keep working with this for a second. Okay. So, this thing is independent of x. So, now, look at what this says in the Fourier domain. So, this will be a summation over k and a summation over k prime, one for each delta, of e to the i k dot x plus r, times e to the minus i k prime dot x. I am just using complex conjugation at convenient locations: because delta x is real, I can write delta x as the complex conjugate of itself, and that complex conjugation gives me the minus sign here. And now I will have an expectation value of delta k times delta k prime complex conjugate. Okay. So, this is just mathematically equal to the two-point function, but I am demanding translation invariance of this quantity. So, that means that the x dependence that is contained here should vanish, right, because otherwise this quantity would depend on x. And I hope you understand why the expectation value has gone and sat only on the deltas: because nothing else is a stochastic variable. The exponentials are just functions of k and x. This is the stochastic quantity. The x dependence is not sitting in the stochastic quantity in k space. The x dependence is sitting in these functions, and I want to kill it. How can I kill it? I have to make sure that k and k prime are the same, okay, somehow. What is the only thing that allows me to control whether k and k prime are the same or not? It is this expectation value. This is the only other thing in the problem that depends on k and k prime. I cannot just arbitrarily set k prime equal to k, because there is a k prime and there is a k. But there is a dependence here which has not been specified yet. So, that freedom exists. And if I want to demand translation invariance, then I have to make sure that this expectation value is proportional to a Kronecker delta in k and k prime, which will fix k equal to k prime. And then one of the summations I can do, and I will be left with a single summation. Okay. So, this is very interesting. Why? I should have mentioned this before: I will also assume that each delta k, or rather each a k and b k, has zero mean. So, that means each delta k has zero mean, and delta x also has zero mean. But then, if delta k has zero mean, this expectation value is just the covariance matrix of the delta k's. So, what this is saying is that the covariance matrix of the delta k's is diagonal, because it must be proportional to a Kronecker delta. So, my delta k's must be uncorrelated, because the covariance is diagonal. But these are Gaussians. So, my delta k's must be independent. Okay. So, this is a very beautiful conclusion of this demand of translation invariance for the two point function of a Gaussian random field. It tells you that in Fourier space your delta k's must be independent Gaussians. So, this answers your question, I think: because of this assumption, you are working in Fourier space with variables that are naturally independent Gaussians. So, you don't have to worry about rotations. In fact, what you are doing is a complex rotation to go to the real space quantity. So, the real space quantities will not be an uncorrelated set of Gaussians, but the Fourier space quantities are. Okay.
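Here is a rough numerical sketch of this statement, run in the other direction: draw independent Fourier modes with some variance P(k) of my choosing, transform to real space, and check that the two-point function does not care about the position x, only the separation. The grid size, the number of realizations, and the toy power spectrum are all arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, nreal = 64, 20_000
P = 1.0 / (1.0 + np.arange(1, n // 2 + 1))   # toy discrete power spectrum

# independent complex Gaussian modes; irfft then builds a real field
# from this half-grid of modes (more on that constraint in a moment)
dk = np.zeros((nreal, n // 2 + 1), dtype=complex)
dk[:, 1:] = (rng.standard_normal((nreal, n // 2))
             + 1j * rng.standard_normal((nreal, n // 2))) * np.sqrt(P / 2)
dx = np.fft.irfft(dk, n=n, axis=1)

# <delta(x) delta(x+r)> at fixed r = 5, for several x: the same within noise
for x0 in (0, 17, 40):
    print(x0, np.mean(dx[:, x0] * dx[:, (x0 + 5) % n]))
```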
There's an important exception here, which leads to the next topic of discussion, which is that delta of minus k has to be equal to the complex conjugate of delta k. And this is simply the constraint that delta x is a real variable. Okay. So, go back to your Fourier algebra and figure out why this must be the case. Okay. So, that means, let's count where the stochastic degrees of freedom of this random field sit. (Yes. I am not talking about different times here. Right. I am not talking about evolution so far. Yes. These are my initial conditions. Okay.) All right. So, where are the stochastic degrees of freedom of this field? Because of this constraint, if I know the value of a particular delta k, the complex value of that delta k, I also know the value of delta minus k through this equality. That means, if I split my grid in k space into half on one side and half on the other side, only one half of the grid is relevant. Right. So, if I think of a grid in k space, and let's say this is the k z direction, and then there is the k x direction, and then there is a k y direction as well, only one half of this grid is relevant for me. It's an infinite grid, okay, but it's discrete. So, it extends all the way out to plus minus this discrete infinity, but only one half is relevant. So, without loss of generality, I can say that only k z bigger than zero is relevant. Okay. So, in this sense, only half of the modes are actually stochastic quantities. But now you may think: okay, I have a random field delta x, so there are some n number of grid points, and now I'm saying that only half of the modes are relevant. So, what happens in real space? Is it that somehow half of real space as well is irrelevant? No, because remember that the delta k's are complex numbers, and each complex number has two real numbers that define it, and each of them is a Gaussian random variable. So, there is a factor of two multiplicity in the definition of delta k itself, which is countered by the factor of one half from the reality constraint. Okay, so the number of modes is correctly in place. There are n number of modes which are stochastically allowed to vary. So, this is worth discussing a little bit. So, let me talk about how to count degrees of freedom in a Gaussian random field. It will also show us a different representation of a Gaussian random field. Can you, Mehta, give me a ten minute warning? Or am I already there? Have I crossed it by a lot? No, it goes on to 12, 13. Okay, okay, all right. Okay, let me then just see how I want to do this. Okay, so I will not talk about length scales. What I will show you is this counting, and then I will show you some interesting consequences of what follows from here. Okay, all right. So, now let's say that I am considering these k bigger than zero modes, by which I mean, without loss of generality, k z bigger than zero, with the other components extending in both directions. And I have delta k is a k plus i b k. Then the joint distribution of a k and b k, let me call it little g of a k and b k. To be quick, I am not going to write vector signs on the k's, okay, but they are of course vectors. So, this is a Gaussian distribution by assumption, and the other assumption is that a k and b k are independent Gaussians. Okay, so this is part of my defining quantity. A k and b k are independent Gaussians. Let's put it down here.
So, then this thing just becomes 1 over 2 pi times some lambda k squared, times e to the minus 1 over 2 lambda k squared of a k squared plus b k squared, times d a k d b k. Okay, this is a joint distribution of independent Gaussians. So, the pdf of all the modes, if I call it some p of the collection of delta k's, is just going to be a product over all of these k bigger than 0 modes of g k of a k and b k. Okay, because I am saying that my modes are independent, because of that translation invariance argument. And I also have that delta minus k is delta star of k, which means, in terms of a k and b k, that a minus k is equal to a k, and b minus k is equal to minus b k. Okay, this is a simple consequence. That also means that a minus k squared is equal to a k squared, and b minus k squared is equal to b k squared; the minus sign goes away. So, that means I can write this distribution as being proportional to e to the minus summation over k bigger than 0 of a k squared plus b k squared, divided by two lambda k squared. Okay, and because of this condition, I might as well have summed over k less than 0 here. Right, I would get exactly the same expression, because I would get an a minus k squared, a b minus k squared, and some lambda minus k squared, but a minus k squared and b minus k squared are the same variables as a k squared and b k squared, okay. So, this could just as well have been a sum over the other half. So, the sum over the other half is equal to the sum over this half. So, that means I could just sum over the whole thing and divide by 2, and nothing changes. Okay, so I can write this as being proportional to e to the minus sum over all k now, but with a half of what I have here, which is a k squared plus b k squared over two times two lambda k squared. All right, and now this thing, a k squared plus b k squared, is nothing but the magnitude of delta k, the whole squared. It is delta k times delta k star. Right, that's a k squared plus b k squared. So, this is also equal to e to the minus sum over all k of mod delta k squared divided by two times two lambda k squared. Okay, so this is the distribution, the density function of the modes. (Now, where is my eraser?) And this finally brings us to the concept of the power spectrum. So, let's define p of k as two times lambda k squared. Okay, so now this finally starts looking like a Gaussian distribution with a variance which looks like p of k. Okay, so the pdf of the modes is proportional to e to the minus one half sum over all k of mod delta k squared over p k, in this language. All right, so this is where the power spectrum enters. There's another way of thinking of this, where I write delta k, instead of writing it as a k plus i b k, as an amplitude r k times a phase e to the i phi k. It's an equivalent way of writing complex numbers. Okay, r k and phi k are real. I can ask: if delta k is constructed from a k and b k, and has this distribution, where d a k d b k times g is given by this, what is the corresponding distribution of r k and phi k? I can change variables from a k, b k to r k, phi k. And again, conservation of probability will tell me that this g k, in terms of r k and phi k times d r k d phi k (I will not prove this, but please check), can be written in the following way: d phi k over 2 pi, times 2 r k d r k over p k, times e to the minus r k squared over p k. So the amplitude r k of the Gaussian field is Rayleigh distributed.
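A minimal check of this amplitude and phase statement, with P(k) set to an arbitrary value and scipy's Kolmogorov-Smirnov test used as a quick diagnostic (large p-values mean the distributions are consistent):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
Pk = 2.0                                         # P(k) = 2 lambda_k^2, arbitrary
a = rng.normal(0.0, np.sqrt(Pk / 2), 300_000)    # a_k
b = rng.normal(0.0, np.sqrt(Pk / 2), 300_000)    # b_k
r, phi = np.hypot(a, b), np.arctan2(b, a)

# amplitude: Rayleigh with scale sqrt(P/2); phase: uniform on (-pi, pi]
print(stats.kstest(r, 'rayleigh', args=(0, np.sqrt(Pk / 2))).pvalue)
print(stats.kstest(phi, 'uniform', args=(-np.pi, 2 * np.pi)).pvalue)
```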
Equivalently, r k squared is exponentially distributed. Okay, and the phase is uniformly distributed, and the phase and the amplitude are uncorrelated. In fact, they are independent, because I can think of this as a product of two PDFs. And because the g's for individual k's are just multiplying each other, it also means that this entire thing is independent of the amplitudes and phases of all the other modes as well. So, what this is saying is worth emphasizing: amplitudes and phases are independent. They're more than uncorrelated; they are independent. So now, non-Gaussianity can show up in any form that violates this. I can think of this as the defining property of a Gaussian random field, right? Because everything that I started with has led to this. These are all implications of the original definition. So I might as well use this as the definition. So, if for a Gaussian field the amplitudes and phases are independent, then in a non-Gaussian field I could have the amplitudes being correlated, I could have the phases being correlated, or I could have the phases being non-uniform. Okay, all of these would give me a non-Gaussian field in general. So let me kind of stop here. Okay, maybe just one last mathematical thing, to emphasize the fact that in real space, things may be correlated. Consider the two-point function. We actually saw this. Okay, so let me write down an answer here, and you can think about this statement. So let me define the two-point correlation function as this quantity that we saw before. You can calculate this quantity, but you have to do a couple of other things which I did not do here. You have to take the continuum limit. You can try this as a homework exercise. The continuum limit is defined by saying a couple of things. Wherever you see summations over k together with a 1 over v, you will replace them in this way with integrals over d3 k. Okay, so dimensions are consistent. Kronecker deltas in k and k prime you will replace with appropriate Dirac deltas in k minus k prime; but to maintain dimensions, I need to divide by a v, because the Dirac delta in k has dimensions of volume. Okay, so these are the two main things. And finally, this p of k that we wrote down earlier, which was the discrete p of k, I will replace with v times a continuum p of k. So I will make all these replacements, and then I will say v goes to infinity. Okay, and what you will then find is that the power spectrum expression becomes this. So, earlier, in the discrete case, I had: delta k delta k prime star equals a Kronecker delta in k and k prime times the discrete p of k. This will become: delta k delta k prime star equals 2 pi cubed times the Dirac delta of k minus k prime times the continuum p of k. Okay, this is the usual way in which one writes the power spectrum. And finally, you can use this to prove that xi of r is actually just the Fourier transform of p of k. Okay, so this will turn out to be just the integral d3 k over 2 pi cubed e to the i k dot r of p of k. Okay, I have not talked about rotational invariance here. In the interest of time, maybe we can use the discussion session for this. But I just wanted to show you a few images and then stop. So, here is a Gaussian random field in one dimension, in which one is demonstrating that modes of a Gaussian random field are independent. So, what one has done is to define the power spectrum as being k to the power n. Okay, so p of k is proportional to k to the power n. And this first panel is doing n equals 0. So the power spectrum is just a constant.
And now, the first 25 Fourier modes are shown here. I just sum over the first 25, and then I sum over the next 25, okay, additively, so the first fifty in total. And what you can see is that there is some behavior: okay, the thing is going up and down. And then, when I add another 25 modes, I'm adding larger k values, the modes corresponding to larger k. So these are smaller wavelengths. So I start seeing more small scale fluctuations. But the amplitudes at these smaller scales are still similar to the amplitudes at the larger scales. Okay, so there is an equal amount of variation at the smallest scales as there is at the longer wavelengths. Contrast this with a similar thing with n equals minus 2, so k to the minus 2 here. So now, at small k, I have more power, and at large k, I have less power. Okay, small k is longer wavelengths; large k is shorter wavelengths. So, when I do the first 25 Fourier modes, you can see that at long wavelengths I have a lot of variation. There is this large, long wavelength fluctuation. And even here, you can see that the shorter wavelengths are not fluctuating that much. Okay, there are tiny fluctuations at smaller wavelengths. When I add the next 25 Fourier modes here, you don't see much difference, actually, right? Because the power at these large Fourier modes is not very much. So this is the interplay between the power spectrum and realizations of the Gaussian field. This is something to think about. Let's now also look at what it means to not be Gaussian. Okay, so here are a couple of examples. In both of these figures, the left-hand panels are realizations of an N-body simulation, which creates a density field which is not Gaussian at late times. Why it is not Gaussian at late times we will study in the next two lectures. So this is beyond what we have discussed so far. But this is just to emphasize this last bit, that non-Gaussianity can arise in several ways. So, in this left set of panels, the left panel is the actual density contrast of a simulation. In the right panel, there is something which has almost the same power spectrum as the left panel, but the correlations between the phases are different. Okay, so just by changing the nature of the phase correlations, I have constructed something that visually looks very different, but the power spectra are almost the same. They are constructed to be almost the same. So, if you had calculated p of k for each of these somehow, you would find almost the same values. So this shows you the visual implications of phase correlations. This is the reason why the cosmic web appears the way it does. It is because of very specific kinds of phase correlations. This is further emphasized in these panels, where I take an N-body simulation and I just randomize the phases. Okay, I just scramble the phases. Now, depending on how you do the scrambling, you will get slightly different answers. But this is what you get with one kind of scrambling. So, I think this was done by just assigning uniform random numbers to each of the phases. And you can see what looks like almost a Gaussian random field. It is not actually exactly a Gaussian random field, because in the simulation, in addition to phase correlations, there are also amplitude correlations. Okay, and randomizing the phases does not kill the amplitude correlations. But you can see that it looks more or less like a Gaussian random field. So the amplitude correlations must be weak.
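For the curious, phase scrambling of this kind can be sketched in a few lines; this is a rough illustration under my own assumptions, not the exact procedure used for those figures:

```python
import numpy as np

def scramble_phases(field, rng):
    """Keep the Fourier amplitudes of a real field, randomize the phases."""
    fk = np.fft.rfftn(field)
    new_phases = rng.uniform(0.0, 2.0 * np.pi, size=fk.shape)
    # irfftn enforces the reality constraint delta(-k) = delta(k)* for us
    return np.fft.irfftn(np.abs(fk) * np.exp(1j * new_phases), s=field.shape)

rng = np.random.default_rng(5)
slice_demo = rng.standard_normal((128, 128))   # stand-in for a simulated density slice
print(scramble_phases(slice_demo, rng).shape)  # same grid, same power, scrambled phases
```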
Okay, so these are the kind of things that one can think about a little bit. I think I'm well over time, so I will not talk about this and stop here. Thank you.