[The opening minutes of the recording are garbled in transcription. The recoverable fragments are housekeeping: remarks about the chalk, probability distributions as the day's topic, and the lecture schedule, with lecture 1 from 9:00 to 10:30 am and lecture 2 from 11:00 am to 12:30 pm.]

...especially if the phenomena that we are considering are time-varying. And if the fluctuations arise in a human-made device, then we call it noise, because typically we want some robustness built into the device, to have very high signal fidelity or strength. And here we will use the term fluctuations to include noise. Thank you. Hi, Sanas, you have a question? So, we will use the term fluctuations to include noise. But understanding noise is central to understanding the design and performance of almost any physical device: noise sets essential limits on how small a bit can reliably be stored, or how fast it can be sent, and effective designs must recognize these limits that physics imposes on us.

So, our first step will be an introduction to random variables and some of their important probability distributions. Then we will turn to noise-generation mechanisms and close with some more general thermodynamic insights into noise. Although noise can be surprisingly interesting in its own right, we will focus on concepts that lay the foundation for the topics we will explore. But along the way, I also want to share some examples of open problems that have not been sorted out yet, so that you get a flavor of the questions that remain unanswered; if you find them interesting, you may go and tackle them. Eventually, we will move towards a more principled exploration of fluctuations in general physical systems, by using these concepts to study the linear-response regime of systems that are nominally removed from thermal equilibrium, and then asking what happens when physical systems are driven far away from thermal equilibrium, where we get very strongly nonlinear responses. For those of you who are experts in what I am about to discuss, or are already well versed in equilibrium statistical mechanics and are students of non-equilibrium statistical mechanics like myself, you might have heard of terms like fluctuation relations, the Jarzynski equality, et cetera.

So, one point I want to make: throughout, we will keep an eye out for what we can learn about a physical system, or its underlying process, from the fluctuations it exhibits. And finally, I am an experimental physicist by training, not a theorist, so my approach will not aim at theoretical rigor but rather at building physical intuition; I will be using several examples from the real world and will try to show how experimental data are analyzed. So you will be seeing plenty of experimental examples once we have passed the conceptual stage.
So, without further ado, we will get started with random variables. Once again, if you have any questions, please stop me and ask for clarification, because we are working under suboptimal conditions here. We will begin with expectation values.

Let us consider a fluctuating quantity, which I will call $x(t)$. It could be the output of some noisy amplifier, it could be some time-varying natural phenomenon, what have you; we are not imposing any conditions on the nature of this fluctuating quantity just yet. If $x$ is a random variable, then it is drawn from some probability distribution, which we will call $p(x)$. At this stage I am just setting up the nomenclature and terminology. This means that it is not possible to predict the value of $x$ at any instant, but knowledge of the distribution does let us make precise statements about the average value of quantities that depend on $x$.

So, the expected value of a function $f(x)$ can be defined by an integral either over time or over the distribution:
$$\langle f(x) \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} f(x(t))\, dt = \int f(x)\, p(x)\, dx$$
(yes, thank you, I have already made my first mistake; now I feel comfortable), or by a sum if the distribution is discrete; here I have assumed that it is continuous. Now, taking $f(x) = 1$ shows that a probability distribution must be normalized: $\int p(x)\, dx = 1$.

If a probability distribution $p(x)$ does exist and is independent of time, then the distribution is said to be stationary. More will be said on this a little later, but for those of you who already have some preparation: what we mean by a stationary distribution, in theoretically proper terms, is that as we collect more and more data to construct our probability distribution, it will fluctuate a lot early on before settling down to some asymptotic form that no longer depends on the number of data points that went into constructing it. Another way of stating the same thing is that the moments of the distribution become independent of the total number of data points used to construct it. But in practical terms, as an experimentalist, I can tell you that we always deal with a finite amount of data, so a weak form of the stationarity requirement is that if the first two, or at best the first four, moments converge, then we say, OK, it is stationary. That is the hand-waving part that comes in on the experimental side of things.

Since I have already used the term moments, let us go ahead and define it. (Is the board still visible? Maybe I will use this part of the board separately.) The moments of a distribution are the expectation values of the powers of the observable:
$$\langle x^n \rangle = \int x^n\, p(x)\, dx.$$
We all know the first moment, which is the average, or the mean: $\langle x \rangle = \int x\, p(x)\, dx$. And the mean-square deviation from this is the variance.
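A minimal numerical sketch of the two averages just defined, assuming an i.i.d. Gaussian signal and the choice $f(x) = x^2$; both choices are illustrative, not from the lecture. For a stationary, ergodic signal the time average and the distribution average should agree:

```python
import numpy as np

# Hypothetical "signal": i.i.d. Gaussian samples standing in for x(t).
rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(loc=0.0, scale=sigma, size=1_000_000)

# Time average of f(x(t)) = x(t)**2, i.e. the discretized (1/T) integral.
time_average = np.mean(x**2)

# Distribution average: for a zero-mean Gaussian, <x^2> = sigma^2 exactly.
distribution_average = sigma**2

print(time_average, distribution_average)  # ~4.0 vs 4.0
```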
So, the mean-square deviation from the mean is the variance, which we typically designate $\sigma^2$. Doing the simple algebra,
$$\sigma^2 = \langle (x - \langle x \rangle)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2$$
(sorry: this average is inside, this one is outside), and the square root of the variance is what we know as the standard deviation $\sigma$.

Now, the probability distribution contains no information about the time-varying, temporal properties of the observed quantity. A useful probe of those is the autocovariance function. This is the basic set of information I will stop with about distributions, random variables and moments; we will come back to it, because I want to cover the time-varying properties a little bit. The autocovariance function is defined as
$$\langle x(t)\, x(t-\tau) \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x(t)\, x(t-\tau)\, dt.$$
If we normalize the autocovariance function by the variance $\sigma^2$, we call it the autocorrelation function. Its range is $+1$ for perfect correlation, $-1$ for perfect anticorrelation, and $0$ for no correlation at all.

A physical interpretation of the autocorrelation function: suppose I give you some signal $x(t)$ which, for simplicity, has zero mean and fluctuates about zero in time. If I break it up into windows of duration $\tau$ and make a measurement at time $t$, I can ask how much the signal at time $t + \tau$ remembers of its past. So the autocorrelation function is, in some ways, quantifying how much memory there is in the signal you are measuring. Another way of thinking about it: how much information do I have in a given signal to anticipate the current value, given the past history? A simple way of looking at it, from the old days of slide projectors, is to make two exact copies of the signal, make them coincident, and slide one copy over the other; you can then see how fast the signal changes its values. If you have a completely random process, where the current value has nothing to do with its previous value, then the correlation is completely zero; but suppose you have a sine wave, where you can predict what value it will take at any given instant, then you can say something about the signal at least half the time.

All right. So the rate at which the autocorrelation function decays as a function of the chosen time lag $\tau$ provides one way to determine how quickly a function is varying; another way to look at it, as I said, is that the autocorrelation function provides a quantitative estimate of the memory in a given signal, on average. Very soon, not in tomorrow's lecture but in a later one, we will introduce the concept of mutual information, a much more general way to measure the relationships between or among variables.
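A minimal sketch of estimating the autocorrelation function from data; the AR(1) process below is an assumed toy signal with tunable memory, not an example from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
white = rng.normal(size=n)            # memoryless "white" signal

# AR(1) process x[t] = a*x[t-1] + noise: correlations decay as a**tau.
ar1 = np.empty(n)
ar1[0], a = 0.0, 0.9
for t in range(1, n):
    ar1[t] = a * ar1[t - 1] + white[t]

def autocorr(x, max_lag):
    """C(tau) = <x(t) x(t-tau)> / sigma^2 for a zero-mean signal."""
    x = x - x.mean()
    var = np.var(x)
    return np.array([np.mean(x[lag:] * x[:x.size - lag]) / var
                     for lag in range(max_lag)])

print(autocorr(white, 5))  # ~[1, 0, 0, 0, 0]: no memory
print(autocorr(ar1, 5))    # ~[1, 0.9, 0.81, ...]: decays as a**tau
```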
Now that we have the autocorrelation function, you will notice that it is a time-domain quantity. Let us get into spectral theorems and compare its reciprocal-space equivalent, which some of you may know well as the power spectrum. We can just erase this part and continue. If I give you a fluctuating quantity $x(t)$, then the Fourier transform of this... excuse me? Yes?

I have two questions about the correlation. If I plot the autocorrelation as a function of $\tau$ and the behavior is a power law, what is the meaning of this power-law behavior of the autocorrelation in $\tau$?

OK, you are running ahead of the lecture, so if you are patient for a little bit, we will touch upon it. OK, thank you. But to quickly give you a hand-waving answer: if the correlation function decays as a power law, is there a general statement I can make? I don't know. Mateo, Sydney, is there a general remark we can make?

Is it about criticality? Criticality of time series?

Hello. So, I think you can think of a power-law decay as a superposition of many exponential decays with different time scales, so in some sense a power law would be like an exponential decay with many, many time scales, that is, with time-scale invariance if you want; in this sense you could think of it as critical. Another issue is whether the autocorrelation can be integrated, that is, whether the integral of the autocorrelation function is finite. In that case you don't have a system with memory; in the other case, if the integral is infinite, the system has infinite memory. But I think Mahesh will come to that.

Excuse me, another question. What about oscillating and decreasing behavior, an oscillation whose magnitude is decreasing?

Do you mean a signal that is oscillating but whose oscillation magnitude is decreasing? Yes. I can think of this more naturally in the spectral domain, which I haven't come to yet, but it would mean that you have a peak at a particular frequency, with a window that is broadening. The autocorrelation function of such a signal will show the oscillatory behavior at short time scales, but eventually it will go to zero. If I have a perfect sine wave, for instance, one that never decays, stationary in the sense that there is no decaying process in it, then the autocorrelation function can be worked out analytically, and it will oscillate forever over the $2\pi$ cycle. But if I include a decay component on top of the sine wave, then at some point the autocorrelation must decay to zero, though there will be an oscillatory component to the correlation function. Does that answer your question? OK, thank you.

So let us move to spectral theorems now. If we have the quantity $x(t)$ and we want to define the Fourier transform of this fluctuating quantity, I will use the notation $X(f)$, with $f$ for frequency:
$$X(f) = \lim_{T \to \infty} \int_{-T/2}^{T/2} e^{i 2\pi f t}\, x(t)\, dt.$$
And since we have the Fourier transform, let me also go ahead and state the inverse Fourier transform, which would be
$$x(t) = \lim_{F \to \infty} \int_{-F/2}^{F/2} e^{-i 2\pi f t}\, X(f)\, df,$$
where $F$ is the bandwidth window. So the Fourier transform is also a random variable, and the power spectral density, which I will define now (usually we just call it the power spectrum, or PSD), is defined in terms of the Fourier transform by taking the average value of the square magnitude of the transform. Any questions? So the power spectral density is
$$S(f) = \langle |X(f)|^2 \rangle = \langle X(f)\, X^*(f) \rangle,$$
where $X^*$ is the complex conjugate of $X(f)$. I don't know if you can see this. Yeah, it is visible; I will just go down, it is easier that way.
Writing the average out explicitly,
$$S(f) = \left\langle \int_{-T/2}^{T/2} e^{i 2\pi f t}\, x(t)\, dt \int_{-T/2}^{T/2} e^{-i 2\pi f t'}\, x(t')\, dt' \right\rangle$$
in the limit that $T$ goes to infinity. So, once again, $X^*$ here is the complex conjugate of $X$ (replacing $i$ with $-i$), and we shall assume that the signal $x(t)$ is real.

Now, a few words about the power spectrum itself. The power spectrum might not have a well-defined limit for a non-stationary process. You may have heard of wavelets and Wigner functions, which are examples of time-frequency transforms; they retain both temporal and spectral information for non-stationary signals. One example where we come across this is in the biological sciences. One I know because I worked on it a few years ago is birdsong: to analyze birdsong you take a spectrogram, where you have to look at both the time-domain and the frequency-domain signals. So there are areas where Fourier transforms are not very useful, but in most cases, in the physical sciences at least, they provide a lot of useful information for us to work with.

OK, continuing on. I see the advantage Sydney enjoys, because I get to keep the right half of the board while I am on the left part, and tomorrow we will be able to do better.

All right. The Fourier transform, you will note, is always defined for negative as well as positive frequencies. If the sign of the frequency is changed, the imaginary (sine) component of the complex exponential changes sign, while the real (cosine) part does not. For a real-valued signal this means that the transform at negative frequencies is equal to the complex conjugate of the transform at positive frequencies. Since the power spectrum is used to measure energy as a function of frequency, it is usually reported as a single-sided power spectral density, found by adding the square magnitudes of the negative- and positive-frequency components; for a real signal these are identical, so the single-sided density differs from the two-sided density by an occasionally omitted factor of two.

The Fourier transform can also be defined with the $2\pi$ in front. Then we do not use the symbol $f$; instead we use the angular frequency $\omega$:
$$X(\omega) = \lim_{T \to \infty} \int_{-T/2}^{T/2} e^{i \omega t}\, x(t)\, dt,$$
and the inverse transform is
$$x(t) = \lim_{\Omega \to \infty} \frac{1}{2\pi} \int_{-\Omega/2}^{\Omega/2} e^{-i \omega t}\, X(\omega)\, d\omega.$$
As you can see, I am mostly covering the nomenclature and terminology, taking care of the definitions. The symbol $\nu$ measures frequency in cycles per second; $\omega$ measures frequency in radians per second, since $2\pi$ radians is one cycle. Defining the transform in terms of $\nu$ eliminates errors that arise from forgetting to include the $2\pi$ in the inverse transform, or in converting from radians to cycles per second. I have a habit of working with $\nu$ because I am a very forgetful fellow; in fact, the joke when I landed a professor's job was that he is already absent-minded, and now he is also a professor. Anyway, as I said, it is just a matter of keeping track of the $2\pi$, and we will use whichever is convenient for a given problem.
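A sketch of a single-sided PSD estimate in the cycles-per-second convention, with the factor of two from folding in the negative frequencies made explicit; the sampling rate and the test signal are assumptions for illustration:

```python
import numpy as np

fs = 1000.0                                  # assumed sampling rate, Hz
t = np.arange(0, 10, 1 / fs)
# Assumed test signal: a 50 Hz tone buried in Gaussian noise.
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.random.default_rng(2).normal(size=t.size)

X = np.fft.rfft(x)                           # positive-frequency half only
freqs = np.fft.rfftfreq(x.size, d=1 / fs)
psd = np.abs(X) ** 2 / (fs * x.size)         # two-sided density
psd[1:-1] *= 2                               # fold negative frequencies in
                                             # (DC and Nyquist bins excluded)

print(freqs[np.argmax(psd)])                 # ~50.0 Hz: the tone stands out
# Parseval-style sanity check: integral of PSD ~ mean square of the signal
print(np.sum(psd) * fs / x.size, np.mean(x ** 2))
```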
The power spectrum is simply related to the autocorrelation function we saw in the time domain a bit earlier, through the Wiener-Khinchin theorem; the relation is found by taking the inverse transform of the power spectrum. So I will just show you the quick calculation. This is one of my favorite theorems in all of physics, because very often I find myself not knowing what to do: I reach a certain point in a problem and I don't know what to do, so I just take the Fourier transform, and sometimes it works. Actually, it has worked more often than I can believe. So let me, because this will take some space, OK.

The theorem I am referring to, the Wiener-Khinchin theorem, is named after Norbert Wiener, the famous professor from MIT and the father of cybernetics (if you ever get a chance, read a beautiful book by him called God and Golem, Inc.; it is a very beautiful book), and of course Khinchin, Aleksandr Khinchin, who did a lot of work in probability theory together with Kolmogorov.

So, as I said, the power spectrum is simply related to the autocorrelation function through this theorem; let's see. We start by taking the inverse Fourier transform of the power spectrum $S(f)$:
$$\int_{-\infty}^{\infty} S(f)\, e^{-i 2\pi f \tau}\, df = \int_{-\infty}^{\infty} \langle X(f)\, X^*(f) \rangle\, e^{-i 2\pi f \tau}\, df.$$
I am just expanding the power spectrum, and now we will start working with each term. This gives us
$$\lim_{T \to \infty} \frac{1}{T} \int_{-\infty}^{\infty} df \int_{-T/2}^{T/2} dt \int_{-T/2}^{T/2} dt'\; e^{i 2\pi f t}\, e^{-i 2\pi f t'}\, e^{-i 2\pi f \tau}\, \langle x(t)\, x(t') \rangle.$$
You can see I have expanded the two transforms out, and I have added the term I forgot; and yes, I keep forgetting the average throughout, so I have now put the angular brackets in. So we have three integrals: $dt$, $dt'$, $df$.

Sorry, professor, I don't understand the $1/T$ in front of the limit. Why is there a $1/T$ now, when before there wasn't?

There is; you are saying there is no $1/T$ over here, but now I have added a $1/T$. In the previous definition of the Fourier transform... let's see... that is the right answer. OK, sorry, thanks for catching it. As you can see, I am an experimental theorist, not a theoretical experimentalist; I am still experimenting with theory. OK, let's proceed. So where is the $1/T$? It is in the definition of $X(f)$? Yes, just in the Fourier transform and not in the inverse Fourier transform. Yes, thank you.

Now we club the terms in the exponentials together; I am going to write the exponent as $t - t' - \tau$:
$$\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} dt \int_{-T/2}^{T/2} dt' \left[ \int_{-\infty}^{\infty} e^{i 2\pi f (t - t' - \tau)}\, df \right] \langle x(t)\, x(t') \rangle;$$
this time I won't forget the angular brackets. Proceeding through and doing the $df$ integral, the bracketed term is a delta function. This morning Sydney introduced the Kronecker delta; I am introducing its continuous counterpart. You can think of a Dirac delta function as a Gaussian distribution in the limit that the standard deviation goes to zero; that is one definition, not the only one, but I will tell you what it means shortly. So we get
$$\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} dt \int_{-T/2}^{T/2} dt'\; \delta(t - t' - \tau)\, \langle x(t)\, x(t') \rangle.$$
Now, when we integrate against the Dirac delta function... one moment, let me first write this down and then I will explain; it is difficult for me to do both at the same time. What I have done here is solve one integral using the Dirac delta function. It has the property
$$\int \delta(x - x_0)\, g(x)\, dx = g(x_0),$$
that is, it picks out the value of the integrand at the given point. So here I can replace $t'$ by $t - \tau$, because I am integrating against $dt'$. That leaves me with only one integral:
$$\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} \langle x(t)\, x(t - \tau) \rangle\, dt = \langle x(t)\, x(t - \tau) \rangle,$$
which is the autocovariance function we saw; if I apply the normalization, I get the autocorrelation function.

So now I can explain what I meant about the Dirac delta function: what I used is the Fourier transform of the delta function,
$$\int_{-\infty}^{\infty} e^{i 2\pi f t}\, df = \delta(t).$$
As I said, one way to derive these relations is by taking the delta function to be the limit of a Gaussian with unit norm as its variance goes to zero.

So the Wiener-Khinchin theorem shows that the Fourier transform of the autocovariance function gives the power spectrum, and knowledge of one is deemed equivalent to knowledge of the other. A second example of this is white noise: a memoryless process with a delta-function autocorrelation will have a flat spectrum, regardless of the probability distribution of the signal. And as the autocorrelation function decays more slowly, the power spectrum decays more quickly.

I have included a figure in my lecture notes, so it is difficult for me to show here, but let me try; maybe I will bring it closer. Can you folks see that? Unfortunately I cannot see what I am showing you, so let me come closer. No, we'll just do it this way. What I am trying to show you here is a time series, its spectrum, and its autocorrelation function; this is basically an illustration of the Wiener-Khinchin theorem, plotting different time series and the kinds of spectra and correlation functions you get from them. Here you have a completely random time series, for which the spectrum is completely flat and the autocorrelation function decays quickly to zero. Whereas if I have a much more complicated signal, the spectrum starts decaying quite slowly, and if I plot the autocorrelation function it takes a long time to decay, which means this signal has much longer memory. And this is an even stronger case (yes, thank you): here the autocorrelation function decays even more slowly, it takes a much longer time to decay, whereas if you look at the spectrum, it decays faster.

So you get the same information whether you look at the power spectrum or the autocorrelation function. From an experimentalist's or an engineer's point of view there is not much difference between the two; but from a theoretical point of view, many a time it is easier to work with the correlation function, and in other cases it is much easier to work in the spectral domain. If you download the scan of the lecture notes you can see this figure more prominently; it will give you a better idea.
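A numerical illustration of the Wiener-Khinchin theorem, under assumed discrete, circular conventions rather than the continuous limits used on the board: the FFT of the directly computed autocovariance reproduces the periodogram to machine precision:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=4096)          # assumed test signal
x -= x.mean()
n = x.size

X = np.fft.fft(x)
psd = np.abs(X) ** 2 / n           # periodogram, <|X(f)|^2>/T in spirit

# Circular autocovariance computed directly in the time domain:
# acov[k] = (1/n) * sum_t x[t] * x[(t+k) mod n]
acov = np.array([np.mean(x * np.roll(x, -k)) for k in range(n)])

# Wiener-Khinchin: Fourier transform of the autocovariance = power spectrum
print(np.allclose(np.fft.fft(acov).real, psd, atol=1e-8))  # True
```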
OK, so now one aside that I want to point out (and I think I can erase this part) is about how the Wiener-Khinchin theorem is related to another mathematical result we know. If I take the time lag $\tau = 0$, that means instantaneous correlation, and the autocorrelation will always be $1$, because the signal is completely self-correlated at that very instant. But if I take $\tau = 0$ in the Wiener-Khinchin theorem, it yields what we know as Parseval's theorem. So let's do that quickly. Working in the spectral domain,
$$\langle x(t)\, x(t - \tau) \rangle = \int_{-\infty}^{\infty} S(f)\, e^{-i 2\pi f \tau}\, df,$$
and setting $\tau = 0$ gives
$$\langle x^2(t) \rangle = \int_{-\infty}^{\infty} S(f)\, df.$$
That is basically Parseval's theorem coming out of the Wiener-Khinchin theorem when I set the time lag to zero. So the average value of the square of the signal, which is equal to the variance if the signal has zero mean, equals the integral of the power spectral density.

This means that true white noise has an infinite variance in the time domain, although the finite bandwidth of any real system will roll off the frequency response and hence determine the variance of the measured signal. So I have said something that needs qualifying, because in the real world we never have true white noise. If you work with a function generator and it says it is feeding you white noise, the function generator always has a finite bandwidth, maybe 15 kHz, 30 kHz, what have you, so we are always approximating white noise within some frequency range. What I have just said is tantamount to saying that infinite variance never arises in the real world: because of the finite bandwidth of any physical system we work with, we always see a finite frequency response, and then there is a roll-off of the frequency response, and that sets the limit on the variance of the signal.

If the division by the time $T$ is left off in the limiting process, as in the averages I defined earlier on both sides of Parseval's theorem, then the theorem reads that the total energy in the signal equals the total energy in the spectrum, i.e., the integral of the square of the magnitude. That is what it means.

So that is what I wanted to share with you about spectral theorems. We saw the definition of the autocovariance function; normalizing it, we got the autocorrelation function; and we saw that the autocovariance function is equivalent to the power spectral density in the information it represents, the relation coming to us through the Wiener-Khinchin theorem. So now we will go back to probability distributions: we started with random variables, we looked a little at the time variation of signals, and now we go into the probability distributions. This is mostly the pedagogical, conceptual part, building the basic fundamentals.

Excuse me, I have a doubt before we go on. In each step of the Wiener-Khinchin derivation you kept the expectation-value symbol, but shouldn't you drop it when you write the limit of $1/T$ as $T$ goes to infinity? Because I think that is the definition of the expectation value as you gave it to us.

Yes, and that is where the $1/T$ comes from, not from the Fourier transform. Agreed. Thank you.

All right, so let's proceed to probability distributions. How are we doing on time? Are we okay?
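Before the distributions, a minimal sketch of the band-limiting point just made: filtering "white" noise to a finite bandwidth $B$ leaves a variance proportional to $B$, so the measured variance is set by the system bandwidth, not by the noise alone. The brick-wall filter and the rates below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
fs, n = 10_000.0, 2**20
x = rng.normal(size=n)                       # discrete white noise, var = 1

X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(n, d=1 / fs)
for B in (500.0, 1000.0, 2000.0):            # assumed brick-wall cutoffs, Hz
    Y = np.where(freqs <= B, X, 0.0)         # keep only the band below B
    y = np.fft.irfft(Y, n)
    # Variance of the filtered signal scales linearly with the bandwidth B.
    print(B, np.var(y), np.var(x) * B / (fs / 2))
```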
Alright, so we start with probability distributions. I think we can zip through this a bit quickly, because it is quite likely most of you have seen the elementary distributions, but I am covering it so that we are all on the same page with respect to terminology.

So far we have taken the probability distribution $p(x)$ to be arbitrary; I haven't said anything about its functional form. In practice, three probability distributions recur so frequently that they receive most of the attention: the binomial, the Poisson, and the Gaussian. Their popularity is due in equal parts to the common conditions that give rise to them and to the convenience of working with them. The latter reason sometimes outweighs the former: many of us tend to use them because they are convenient, not because they are relevant to the problem, though they are a good starting point most often. And for that reason these distributions are employed far from where they actually apply. For example, many physical systems, particularly those driven strongly far from thermal equilibrium, where very nonlinear responses abide, have long-tailed distributions that fall off much more slowly than these three distributions do. We will look at a class of these long-tailed distributions, namely power-law-tailed distributions, later on.

So let's start with the binomial distribution. Consider many trials of an event that can have one outcome with probability $p$, such as flipping a coin and seeing a head, and an alternative with probability $1 - p$, such as seeing a tail. In $n$ trials, the probability $p_n(x)$ of seeing $x$ heads and $n - x$ tails, independent of the particular order in which they were seen, is found by adding up the probability for each outcome times the number of equivalent arrangements:
$$p_n(x) = \binom{n}{x} p^x (1-p)^{n-x}.$$
Is this visible?
Here
$$\binom{n}{x} = \frac{n!}{(n-x)!\, x!},$$
a notation usually read as "$n$ choose $x$". So this is the binomial distribution. The combinatorial factor follows by dividing the total number of distinct arrangements of $n$ objects, $n!$, by the number of indistinguishable rearrangements of the heads, $x!$, and of the tails, $(n-x)!$. The easiest way to convince yourself that this is correct is to exhaustively count the possibilities for a small case; a small case, because you will see a combinatorial blow-up if you start increasing $n$.

Then we go to the Poisson distribution. Now let us consider events, such as radioactive decays, that occur randomly in time. We divide time into $n$ very small intervals, so that there is either no decay or one decay in any given interval (we have chosen a time window such that there is at most one decay in it), and we let $p$ be the probability of seeing a decay in an interval. If the total number of events that occur in a given time is recorded, and this is repeated many times to form an ensemble of measurements, then the distribution of the total number of events recorded will be given by the binomial distribution. If the number of intervals $n$ is large and the probability $p$ is small, the binomial distribution can be approximated using Stirling's approximation for large $n$:
$$n! \approx \sqrt{2\pi}\; n^{n + 1/2}\, e^{-n}, \qquad \text{i.e.}\quad \ln n! \approx n \ln n - n.$$
This gives us the Poisson distribution. I am not plugging the approximation into the binomial here to show you; you can work it out as an exercise, and it is quite likely you have already worked it out during your earlier preparation for elementary stat mech. The result is
$$p(x) = \frac{e^{-N} N^x}{x!},$$
where I have now introduced $N = np$, the average number of events. The Poisson distribution is very common for measurements that require counting independent occurrences of a given event. Naturally, it is normalized:
$$\sum_{x=0}^{\infty} \frac{e^{-N} N^x}{x!} = e^{-N} \sum_{x=0}^{\infty} \frac{N^x}{x!} = e^{-N} e^{+N} = 1.$$
If $x$ is drawn from a Poisson distribution, then its factorial moments, defined by the following equation, have a simple form:
$$\langle x (x-1) \cdots (x - m + 1) \rangle = N^m.$$
This relationship is in fact one of the benefits of using the Poisson approximation: with it, it is easy to show that the expectation of the variable $x$ is $\langle x \rangle = N$ and its standard deviation is $\sigma = \sqrt{N}$, which in turn implies that the relative standard deviation of a Poisson random variable is $1/\sqrt{N}$. So the fractional error in an estimate of the average value decreases as the square root of the number of counts. This is something we routinely work with when doing experiments: we ask what the error in our measurement is, and if we can assume that the measurements are drawn independently, then we can say it goes as $1/\sqrt{N}$.
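A quick numerical check of the counting statistics above, with assumed values of $n$ and $p$: for large $n$ and small $p$ the binomial counts approach a Poisson with mean $N = np$, and the relative spread falls as $1/\sqrt{N}$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100_000, 5e-5                  # assumed: many intervals, rare events
counts = rng.binomial(n, p, size=200_000)

N = n * p                             # expected number of events, N = 5
print(counts.mean(), N)               # sample mean  -> N
print(counts.std(), np.sqrt(N))       # sample sigma -> sqrt(N)
print(counts.std() / counts.mean())   # relative error ~ 1/sqrt(N) ~ 0.447
```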
So let me give you one example where people make this assumption and it is erroneous. A few years ago I was working on fluctuations in wind power, and there is an influential researcher who wrote a Scientific American article explaining the following. The problem with wind power is that you have atmospheric turbulence, and if the turbine is generating power from the wind blowing past it, the turbine equation says that the power generated goes as the wind speed cubed. So if the wind speed is fluctuating, then the power is also fluctuating, because the cube of the wind speed will also vary; in fact, the fluctuations get magnified. The electrical grid was not designed to deal with this: the grid we have today was designed for the constant power generation that comes from, say, coal-fired or nuclear power plants, which are high-inertia systems. So the question in the renewable-energy community is how to deal with fluctuating input into the power grid, and what robustness parameters we must design the future smart grid for; and in order to do so, you have to understand the character of these fluctuations.

Now, this influential scholar wrote in the Scientific American article: the wind is always blowing somewhere, so let's just interconnect all of the wind turbines and wind farms and we are fine; at some large number of wind turbines we will get a perfectly DC signal. The problem is that there are correlations in the atmospheric flows, and they correlate the wind turbines, and hence the fluctuations in the power output, so we cannot make such a simple assumption. His assumption, as he started summing the outputs from several wind turbines, was that they are independent producers of power, that they are not correlated, and so he used $1/\sqrt{N}$ as his yardstick and went and did his calculations. This is a common pitfall. (Sorry, was there any question?) So this is one example where we tend to take $1/\sqrt{N}$ for granted: if the underlying process is not drawn from independent measurements, it does not work for us.

So let's see. I told you the fractional error in an estimate of the average value decreases as the square root of the number of samples. This important result provides a good way to make a quick estimate of the expected error in a counting experiment, and I gave you one example where it fails miserably. In fact, it is possible to show, by scaling techniques, that if you combine the output of wind power from several turbines and several geographically distributed wind farms, there is a correlation length, set by the atmospheric turbulence, of around hundreds of kilometers; if you average over all of them, there is a limit to how much you can smooth these fluctuations. The fluctuations smooth out up to a limit and then the smoothing halts, and you have to come up with other engineered mechanisms to deal with them. So that is one example where a Poisson-type assumption does not help us.

So we'll go to the third distribution now; we are doing fine on time. I think this is a distribution you are all quite familiar with, the famous Gaussian, or normal, distribution.

Excuse me, just out of curiosity: when you talk about wind turbines, what is the quantity that is modeled as the random variable, the one distributed as a Poisson? What is the quantity?
The total power. So let us stop for a moment: all of you know about the Gaussian, but I don't know how many of you know about wind turbines. If you have a wind turbine, and wind blowing past it turns the rotor and generates power, there is a relation, a dimensional argument that goes all the way back to Rankine, for the power generated by the turbine:
$$P \le \frac{16}{27} \cdot \frac{1}{2}\, \rho A v^3,$$
where $16/27$ is the theoretical upper limit on the fraction of the kinetic energy available in the wind that can be extracted by the turbine, $\rho$ is the density of air, $A$ is the cross-sectional area of the rotor, and $v$ is the wind velocity. Folding all the constants into one factor $k$, I can write a time-dependent form of this relation:
$$P(t) \sim k\, v(t)^3.$$
So basically the power generated by a wind turbine goes as the wind speed cubed; if the velocity fluctuates due to the unsteady flows of atmospheric turbulence, then the power will also fluctuate, and it will fluctuate as a cubic function.

Now suppose I have a geographical area spanning some hundreds of kilometers in radius, with several wind farms within it, each with several turbines. I am going to sum the power from all turbines within each wind farm, and then sum the power outputs of all these wind farms at the grid level. Each turbine outputs a fluctuating quantity; I am summing all those fluctuating signals at the farm level, taking many such geographically distributed farms, and summing all their power outputs at the grid. What are the fluctuations I see at the grid level? The relation above is for a single turbine, but I can always write
$$P(t) \sim \sum_{i=1}^{n} \sum_{j=1}^{m} P_{ij}(t), \qquad P_{ij}(t) \sim v_{ij}(t)^3,$$
where $i$ labels the wind farm, of which there are $n$ in total, and $j$ labels the turbines within the $i$-th farm, ranging from $1$ to $m$. The fluctuations of this combined quantity are different from the fluctuations of an individual turbine, and the question is how the smoothing takes place. If I assume that each of these turbines is an independent producer of power, that the fluctuations of one turbine have absolutely nothing to do with the fluctuations of another, then the fluctuations smooth out as $1/\sqrt{nm}$ (I can't use capital $N$ anymore). But you never see it that way. So that was the example I was giving you where this fails completely. Did that long answer explain your short question? Yeah, more or less, I think.
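A toy simulation of this aggregation argument (my own construction for illustration, not the model from the paper mentioned): with independent turbines the relative fluctuation of the summed output falls as $1/\sqrt{m}$, while a common correlated component, standing in for the shared atmospheric driver, sets a floor that aggregation cannot remove:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 100, 20_000                       # assumed: m turbines, n time samples
common = rng.normal(size=n)              # shared "atmospheric" driver

for rho in (0.0, 0.3):                   # assumed fraction of common variance
    # Each turbine output has unit variance; pairwise correlation is rho.
    turbines = (np.sqrt(rho) * common[None, :]
                + np.sqrt(1 - rho) * rng.normal(size=(m, n)))
    total = turbines.sum(axis=0)
    rel = total.std() / m                # fluctuation per unit mean output
    # Theory: rel = sqrt(rho + (1 - rho)/m); only rho = 0 gives 1/sqrt(m).
    print(rho, rel, 1 / np.sqrt(m))
```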
If you're interested, please drop me a mail and I'll send you the paper; just for your reference, you can drop me a line and I can send you the material. OK, there are lots of open questions here, actually; you will be surprised. Every form of renewable energy fluctuates because of the natural variability in its energy source, so there is a statistical-physics framework waiting to be formulated for fluctuations in renewable energy, and that has not happened yet. Why? Because the engineers and the policy makers who work on renewable-energy systems have not studied statistical physics; but when it comes to fluctuations, statistical physics is the natural home for any principled study. So I think there is an opportunity for the stat-mech community to make a big dent over there, and the dynamical-systems community too.

OK, let's get back to the Gaussian. I can write down the Gaussian distribution for you first, and then we will look at its properties. The functional form is
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-(x-\mu)^2 / 2\sigma^2}.$$
(I am almost out of this chalk; remind me to tell you a story about the chalk I am using today at the very end of this lecture. I will even send you a link to a very nice video involving mathematicians and this chalk.)

OK, so the Gaussian distribution with this functional form has mean $\mu$ and standard deviation $\sigma$, and its integral from $-\infty$ to $+\infty$ is $1$, so it is all properly normalized. The partial integral of a Gaussian is an error function:
$$\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{y} e^{-x^2/2\sigma^2}\, dx = \frac{1}{2} + \frac{1}{2}\, \mathrm{erf}\!\left(\frac{y}{\sqrt{2\sigma^2}}\right),$$
and since the Gaussian is normalized, we have $\mathrm{erf}(\infty) = 1$.

The Gaussian distribution is common for many reasons, and it is quite likely most of you already know this. One way to derive it is from an expansion around the peak of the binomial distribution for very large $n$. If I take
$$p(x) = \frac{n!}{(n-x)!\, x!}\; p^x (1-p)^{n-x},$$
then
$$\ln p(x) = \ln n! - \ln (n-x)! - \ln x! + x \ln p + (n-x) \ln (1-p)$$
(I think I forgot a term; I will just put it here). Finding the peak by treating these large integers as continuous variables and setting the first derivative to zero shows that this has a maximum at $x \approx np$, and then expanding in a power series around the maximum gives the coefficient of the quadratic term as $-1/[2np(1-p)]$; because the lowest non-zero term dominates the higher orders for large $n$, the result is a Gaussian with mean $np$ and variance $np(1-p)$.

In the next section we will also see that the Gaussian distribution emerges through the central limit theorem, the more conventional and better-known route, as the limiting form for an ensemble of variables with almost any distribution. So, for these reasons, it is often safe, and certainly common, to assume that an unknown distribution is Gaussian. Just as I take a Fourier transform when I don't know what to do, I think it is safe, when you don't know what functional form a distribution has, to start with a Gaussian distribution.
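A numerical check of the expansion just described, with assumed values of $n$ and $p$: near the peak $x \approx np$, the binomial probabilities match a Gaussian of mean $np$ and variance $np(1-p)$:

```python
import numpy as np
from math import lgamma, log, exp, pi, sqrt

# Log of the binomial pmf via log-gamma, to avoid overflowing factorials.
def log_binom_pmf(x, n, p):
    return (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + x * log(p) + (n - x) * log(1 - p))

n, p = 10_000, 0.3                       # assumed illustrative values
mu, var = n * p, n * p * (1 - p)
for x in (int(mu) - 100, int(mu), int(mu) + 100):
    gauss = exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)
    print(x, exp(log_binom_pmf(x, n, p)), gauss)  # nearly identical pairs
```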
Now let me give you a real-world example here. I am not an expert in control theory, but within the community of mechanical engineering, among those who study applied dynamical systems, control theory is a very well-studied subject, and it is of great interest in many automation problems, especially these days with quadcopters and drones and robots and what not. In fact, it is quite likely that most of you in the audience have purchased things from Amazon, in which case it is very likely that your package passed, in some way or another, through an automated control system of the kind originally designed by Kiva Systems, which was later bought out by Amazon Robotics.

The subclass of control theory known as stochastic control theory, or robust control theory, deals with problems where you have perturbations that you cannot describe, that you cannot anticipate, and typically what they do is assume that the distribution is Gaussian. Now let me point out a problem that is unsolved; I am actually designing an experiment to point this out. You have a drone, a quadcopter, and it is going about in an unsteady environment. The requirement for an engineer, either due to policy imperatives or in the interest of self-preservation, is to go for fuel efficiency, or for safety, or both; it is a question of what you are optimizing for. So you have to design a control system for your drone or quadcopter, which may be delivering pizza, or maybe medicines in the middle of the Sahara desert where normal transport does not work. In these mission-critical processes there are two ways to deal with a drone going through an unsteady flow medium: either you oversample at very high frequencies, basically making the problem piecewise linear, and do the normal control problem; or you get some information about the unsteady flow your drone is being subjected to and try to anticipate what is coming. Typically, the kind of unsteadiness they put into their mechanism is a Gaussian fluctuation, but I can show you that it is impossible for this distribution to be Gaussian. Hopefully we will be able to reach lecture 9, where I will actually show you the functional form; let me just trace out the kind of distribution you will see.

Most of you have seen the Gaussian distribution: plotting $p(x)$ against $x$, it looks like the famous bell curve. Now ask yourself the following question. Let's say I have a random variable $Z$ constructed from the product of two other random variables, $Z = XY$, and let's say that $X$ and $Y$ are normally distributed. What is the functional form of the distribution of $Z$? The reason I am asking this general question is that when you are dealing with power, power of any form, it is always a product of quantities. If I am talking about electrical power, $P = I^2 R$ or $V^2/R$, where $V$ is the voltage and $I$ is the current; if the current fluctuates with a Gaussian distribution, the power is a product of Gaussian variables, and likewise if the voltage has a Gaussian fluctuating form. Or take the case of wind power that we saw, where the power goes as $v^3$: we know from turbulence theory that the velocity fluctuations are almost Gaussian distributed, and differences in velocity are studied extensively in turbulence, but what is the distribution of $v^3$? These distributions are not Gaussian. They usually have fat tails and a very sharp peak; nowhere near Gaussian. So the question then is: can I come up with a general form for the kind of distribution I expect to see for a random variable $Z$ constructed from the product of normally distributed random variables? That is a question to be answered with statistics and statistical mechanics. But when you come to robust control theory, they are looking at power, which could not possibly have a Gaussian distribution; yet because they can solve the Gaussian problem, they assume a Gaussian and proceed with it. And it does not work.
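A minimal simulation of the simplest version of this question, the product of two independent standard normals; the sample size is an assumption. This only illustrates the non-Gaussian character (a sharp peak and heavy tails); it does not address the general problem posed in the lecture:

```python
import numpy as np

rng = np.random.default_rng(7)
x, y = rng.normal(size=(2, 1_000_000))   # two independent standard normals
z = x * y                                 # the product variable Z = X*Y

# Excess kurtosis is 0 for a Gaussian; the product is strongly leptokurtic
# (for this textbook case the exact density is K0(|z|)/pi, a Bessel function).
kurt = np.mean((z - z.mean()) ** 4) / np.var(z) ** 2 - 3.0
print(np.var(z), kurt)   # variance ~ 1, excess kurtosis ~ 6
```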
So what do they do at the end of the day? They oversample their system and try to work it out through piecewise-linear control fits; the battery bleeds very quickly and you lose your gains. So these are real-world problems where all these concepts that we learn kick in, and these are problems waiting to be solved. Anybody who cracks this problem is going to be a multimillionaire. So, there I was, before I broke into robust control, heading for the central limit theorem and saying that for these reasons it is often safe, and certainly common, to assume a Gaussian; this was to give you a counter-example, and as I said, if you solve this you are going to make a lot of money. The Fourier transform of a Gaussian distribution has a particularly simple form, namely a Gaussian with the inverse of the variance; that is one of the nice properties of the Gaussian which make it so attractive.

Sorry, just one really small question: could you repeat exactly what is the problem to be solved, to become a multimillionaire?

So suddenly everybody woke up. OK. I mean, I followed along, but I didn't really understand the precise problem. First, go and study control theory; step one. Then go and specialize in robust control theory, or stochastic control theory within robust control. Robust control theory deals with how to control a system that is subjected to external perturbations or fluctuations. We may know some information about those fluctuations, but in the general case we may not know much; in the real world, a drone or any other system has to work in any number of hostile environments, and it is impossible to predict all the possible environments and fluctuations it will be subjected to, so a certain robustness criterion goes into designing the system. In robust control theory, when they want to assume the kind of perturbations the system will be subjected to, they assume the fluctuations are Gaussian, so the inputs that go in are the mean and the standard deviation. But most of these systems deal with a fluctuating quantity which is a power, and power, I can guarantee you, could never possibly have a Gaussian distribution in most cases, because it is always constructed from a product of random variables. As I said, electrical power is $V^2/R$ or $I^2 R$; mechanical power, in the case of a quadcopter, would be the torque times the rpm (revolutions per minute) of the propellers; and the power coming from the unsteady medium goes as velocity cubed. You have three different definitions of power, all three definitely related to each other, but what you notice is that power is always a product of fluctuating quantities. If instead of $V$ I write the voltage as a time-varying quantity $V(t)$, because the voltage will always have fluctuations, and I assume the fluctuations in the voltage are normally distributed, then the product of a normally distributed quantity with itself could not possibly give you a Gaussian distribution again. That was my point. So: what is the functional form of some random variable $Z$ that is constructed from the product of normally distributed variables? You solve that problem, you are doing well. Did that answer your question?

Yes, but for two identical Gaussian distributions, do we get a chi distribution?

I haven't said this problem is answered, so I can't answer your question.
OK, if I had answered this, I would be a multimillionaire already; but I am just a lowly academic. OK, so we will stop in a bit. Again, drop me a line and I can send you details about the problem. A bunch of these problems are written in my notebook, and sometimes I get an idea when I am sitting in a talk and ask how it bears upon a question I had noted down, because I don't know the answers to all the questions I have written down.

OK, so as I said, the Fourier transform of a Gaussian has a simple form, which is basically a Gaussian with the inverse of the variance:
$$\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{-x^2/2\sigma^2}\, e^{ikx}\, dx = e^{-k^2 \sigma^2 / 2}.$$
Remember this: you should never need to look up the transform of a Gaussian; just invert the variance and you are done. Because of this relationship, the product of the variance of a Gaussian and the variance of its Fourier transform will be a constant. (Was there any question?) So, the product of the variance of a Gaussian and the variance of its Fourier transform will be a constant: this is the origin of many classical and quantum uncertainty relationships.

So let me stop there today, because the next topic I wanted to touch upon is the central limit theorem, in very simple terms, and, yeah, it is 3:59, so maybe we should stop here. We will continue with the central limit theorem tomorrow. Any questions, please?

Of course, we want to know the story of the chalk.

Ah, yes. So this chalk is a Japanese chalk called Hagoromo, and I came across it through a YouTube video. I'll post the link; I'll send the link of the video to Erika and she will share it with all of you. Or just run a search, we don't need Erika to send you a link: look for Hagoromo and you will find the video, and you know the rest of the story. So, I come from an institute that does not have blackboards anymore, so I bought this chalk and I have traveled all the way from Japan to Italy to try it out, and I can tell you it is a really good chalk. Actually, I have one more person to confirm it for me: Sydney used it this morning and he was very happy with it. So if you want to find out what is good about it, he can tell you; I don't know if ICTP rules will permit everyone to come here and try it out one after the other, but we'll think of some way.

All right, with that let's stop; but that was one question I answered, about the chalk. Any other questions? You will have noticed that the conceptual material I am covering is very basic, and I might be making mistakes there, because I really don't care much for that kind of rigor; but along the way I really want to share these points, like the wind-power fluctuations or robust control, because all of these are real-world problems in industry that are somehow coming back to statistical physics and dynamical systems, and they are all waiting to be solved by us. So I want to leave you with a bunch of these open questions; you can go away with them, think about them, and if you think you have made headway on them, go write it up, or patent it, and do us all proud. All right, thank you. So, very good; we resume the Zoom connection tomorrow.
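A final numerical aside on the Gaussian transform pair quoted above; the grid spacing and width below are assumptions. The FFT of a sampled Gaussian of width $\sigma$ is, up to normalization, a Gaussian of width $1/\sigma$, which is the "invert the variance" rule:

```python
import numpy as np

sigma, dx = 2.0, 0.01                     # assumed width and grid spacing
x = np.arange(-50, 50, dx)
g = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Approximate the continuous transform: shift so x = 0 sits at index 0,
# FFT, multiply by dx, and shift back; k = 2*pi*f is the angular variable.
G = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(g))) * dx
k = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(x.size, d=dx))

# Compare with the closed form e^{-k^2 sigma^2 / 2} quoted in the lecture.
print(np.allclose(G.real, np.exp(-k**2 * sigma**2 / 2), atol=1e-6))  # True
```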