So far we have talked about random variables, both discrete and continuous, but we said nothing about time dependence, nothing about how these random variables could change with time. From now on we will look at random variables which evolve with time, and then you have what is called a random process, or stochastic process. So this is going to be our next topic: this subject is concerned with the study of random variables together with some rule for the evolution of the associated probability distributions as a function of time. Now, the first thing we have to appreciate is that if you sample a random process at discrete instants of time, you get a time series, with values of the random variable drawn from its sample space. If, for instance, this random variable can take the values x1, x2, ..., and you sample the process at various instants of time t1, t2, and so on, these are the sampling instants; the general technical term for such an instant is an epoch. The values obtained are elements of the sample space of the random variable X. One then asks for the probability that any of these values is attained at any given instant of time. Now, this is very cumbersome notation, so what I will do is simply label each value by its index, and for the index I will use symbols such as j, k, and so on; when I have too many of them I will call them j1, j2, etc.
So the question is: for a discrete random variable, what is the probability that at some instant of time t1 the value happens to be x sub j1? I will call that the one-time probability, p1(j1, t1). If it is a continuous random variable I will use the same notation interchangeably, but I will be careful to indicate the fact that the variable is continuous; for the moment let us stick to the discrete case. I could also ask: what is the probability that you have the value x sub j2 at time t2 and the value x sub j1 at time t1? That is a joint probability, a different function from the first one, with two time arguments; to keep track of that, let me call it p2(j2, t2; j1, t1). And clearly this can go on: the three-time probability, the four-time probability, and so on. To specify this random process completely, I would need to give you all these joint probabilities. So the first thing we learn is that a stochastic process is described by an infinite hierarchy of probabilities, or, in the case of continuous variables, probability densities. Faced with this formidable problem, there is not much one can do unless you start making certain simplifying assumptions. But there is one thing we can do which is not even an assumption, and it is the following: the n-time probability can always be written as the probability that you have jn at tn, given that all the earlier events happened, times the probability of those earlier events.
I am assuming here, of course, that t1 < t2 < ... < tn, and I write the earliest times to the right and the latest to the left; that is the standard notation. A vertical bar will denote conditioning: the probability of whatever is on the left of the bar, given whatever is on the right of the bar. So the n-time probability can be written as

pn(jn, tn; ...; j1, t1) = pn(jn, tn | jn-1, tn-1; ...; j1, t1) x p(n-1)(jn-1, tn-1; ...; j1, t1).

The first factor still has n time arguments, but it is now a conditional probability, whereas the original is a joint probability; the second factor, the probability that all the earlier events occurred, is a function of n-1 sets of arguments, hence p(n-1). In turn, you can take p(n-1) and write it as a conditional probability of its last event, given all the earlier ones, and so on. Finally, you can write the joint probability as a product of an n-time conditional probability, an (n-1)-time one, an (n-2)-time one, right down to the single-time probability p1(j1, t1). So that is one simplification one can make right away. But even that is not very helpful, because you still have the formidable task of specifying all these conditional probabilities. The general theory proceeds from there, but we are going to restrict ourselves to a very special kind of random process, one where the memory is short-term in a very specific sense.
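This chain-rule factorization can be checked numerically. Here is a minimal sketch, in Python; the three-time joint distribution below is a randomly generated assumption, purely for illustration, and no Markov property is used anywhere:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint probability p3(j3, j2, j1) over a 2-valued sample space,
# normalized so that all entries sum to 1.
p3 = rng.random((2, 2, 2))
p3 /= p3.sum()

# Marginals: p2(j2, j1) and p1(j1).
p2 = p3.sum(axis=0)
p1 = p2.sum(axis=0)

# Conditionals: p(j3, t3 | j2, t2; j1, t1) and p(j2, t2 | j1, t1).
cond3 = p3 / p2          # broadcasting over the j3 axis
cond2 = p2 / p1          # broadcasting over the j2 axis

# Chain rule: p3 = p(j3 | j2, j1) * p(j2 | j1) * p1(j1).
reconstructed = cond3 * cond2 * p1
assert np.allclose(reconstructed, p3)
```

The same peeling-off works for any n: divide by the marginal over the earlier arguments to get the conditional, and repeat down to p1.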
Now, in general the conditional probability at time tn depends on everything that happened at all earlier instants of time; that is like saying the process has memory. But experience with random processes of various kinds tells us that in nature, very often, if you use the right number of variables, a complete set of variables in a very specific sense, then it is short-term memory that occurs, never long-term memory, no history dependence. To give a somewhat trivial example: Newton's equation for a particle moving in space is a second-order differential equation in time. To plot the trajectory of the particle you have to know not only its position at a certain instant of time, but also the slope of the trajectory at that instant. The fact that the force specifies the acceleration, rather than the velocity, tells you that you need both the initial position and the initial velocity. This means the dynamics is really happening in a phase space comprising the configuration-space coordinates as well as the velocity (or momentum) components, and once you use that full set of variables, the equations of motion are first-order differential equations: the state at any given instant, once you solve the equations of motion, determines the future state of the system. So that is an example where the dynamics is really first order in time; the future is determined by the present, and not by how you reached that present. It is exactly the same in quantum mechanics, where Schrodinger's equation is a first-order differential equation in time for the state vector: if you tell me the state vector at an initial instant of time, and the Hamiltonian, which gives you the rule of evolution, you can in principle predict the future state of the system.

This experience tells us that it may be worthwhile to look at those random or stochastic processes where the conditional n-time probability depends on no earlier variables other than the immediately preceding one:

pn(jn, tn | jn-1, tn-1; ...; j1, t1) = p2(jn, tn | jn-1, tn-1),

for all n, so that p3, p4, p5, etc. all get truncated to just this two-time conditional probability. If that happens, the process is called a Markov process. To repeat: the Markov property says nothing about the form of the probability distributions, nothing about whether they are Gaussian or anything else; those things come later. It says something about the level of memory in the process. There are cases where you would like the dependence to extend to the preceding two instants of time, and that is called a two-step Markov process, and so on, but I am not going to get into that; this is our straightforward definition of a Markov process. It does not always have to happen, but it turns out that if you model physical systems appropriately, with the right number of variables, you almost always end up with a Markov process. There are notable exceptions, and we will talk about a few of them, but in most cases experience tells you how to model a random process, and the most common model is a Markov process. Exactly as in the example of the particle moving in space, it may happen that you have not a single random variable but a set of coupled random variables; then you have a vector process of some kind, and it can still be a Markov process in terms of memory, except that instead of a single index you need several labels, one for each variable. That is a possibility we keep in mind, and a matter of notation which we can sort out if the occasion arises. A similar thing holds for continuous processes: instead of probabilities, the same statements are true for densities, and I will then speak of a two-time conditional density.

As soon as you have the Markov property, the joint probability simplifies enormously:

pn(jn, tn; ...; j1, t1) = [ product over r = 1 to n-1 of p2(j(r+1), t(r+1) | jr, tr) ] x p1(j1, t1),

a product of two-time conditional probabilities multiplied by a one-time probability. The problem now reduces to specifying just these two quantities, p2 and p1, and once you do that, you have all the information in the infinite hierarchy. So the Markov assumption is a great simplification; it immediately changes the complexion of the whole problem and makes it much more tractable. As you will see, it still contains enormous amounts of complexity, but the problem becomes tractable, and we will focus on such cases and look at many examples of Markov processes.

There is a further simplification that can happen, and it has to do with the possibility that the process does not change, statistically speaking, as time progresses: no statistical property changes as a function of time, the randomness is not aging in some sense, there is no systematic drift or anything like that. That would be the analog of an autonomous dynamical system, where the dynamical variables satisfy differential equations that do not explicitly involve the time. The analog here is a process where the origin of time does not matter, so that the two-time conditional probability is a function only of the elapsed time t(r+1) - tr. Such a process is called a stationary random process. Stationarity implies that

p2(k, t | j, t') is a function of t - t' alone, not of t and t' separately,

so you can write it as p2(k, t - t' | j, 0): I can shift the origin of time and nothing happens, the probabilities do not change. Very often I will make life easier and write this as p2(k, t - t' | j), dropping the 0; it is understood that it is a function of the difference of the time arguments, and k and j are state labels, standing for sample-space elements. What does stationarity imply for the one-time probability p1(j, t)? All time dependence disappears: p1(j, t) = p1(j), with no t dependence at all. Putting this together with the Markov assumption, for a stationary Markov process

pn(jn, tn; ...; j1, t1) = [ product over r = 1 to n-1 of p2(j(r+1), t(r+1) - tr | jr) ] x p1(j1).

So we now have just a two-time conditional probability, depending on a time difference, and an absolute one-time probability to handle: a stationary Markov process is completely defined once you tell me p2 as a function of t - t' and p1. All the models we will talk about specify these two quantities, and once there is no risk of confusion I will often drop the subscripts 1 and 2: the moment there is a time argument and a bar, I know I am talking about a conditional probability or density, and without the bar, an absolute probability.

You could put in one more bit of physical input, although it is not absolutely essential in general, and it is the following. Ask what happens to the conditional probability p(k, t | j) as t tends to infinity (I have dropped the subscript 2 here for convenience). Intuitively, you might expect that this quantity tends to something which depends on k but not on the initial state j: as t becomes very long, memory is lost completely, in the same way that I expect autocorrelations to die down. So I would expect it to tend to something which depends only on k, and is therefore just the probability p1(k). But this needs to be established; we need to make sure it really happens. On the other hand, for a system in thermodynamic equilibrium, for example, where the statistical properties are not changing, I would expect exactly this relation to hold. Suppose the variable were the velocity of a molecule: I start with a particular molecule whose velocity is some given number I specify, let it loose among all the other molecules, and ask for the probability density that it has a certain given velocity a long, long time after I started. I would expect it to attain the equilibrium density all over again: it should tend to the Maxwellian distribution, independent of what initial velocity I started with. That is a physical expectation: if the system has enough going on in it, enough influences which are completely independent of each other, randomizing the whole process, then I would expect this to happen. In technical terms, one says that if the dynamical system has a sufficient degree of what is called mixing, this will be true. We will look at examples where this happens; but remember that we have already assumed the process is stationary. If it is not stationary, this need not be true: there is a time argument sitting there, and it could well be that the initial state is remembered.

This property provides an incredible amount of simplification. The moment you have it, the entire process is determined completely by the one-time conditional probability p(k, t | j): from it you get the t -> infinity distribution p1(k), and through the factorization formula you get all the joint probabilities as well. So a stationary Markov process with this mixing property is determined completely by this single conditional probability, and the problem reduces to writing down equations for it. A large number of the processes we will look at fall into this category, and we will write down specific equations for this quantity. If you think about it a little, you realize that any modeling you do of physical systems in terms of probabilities is always a matter of writing down equations for conditional probabilities or conditional densities: you need to know, given something, what is the probability of something else happening, and so on; you never say anything about absolute probabilities by themselves. So it is convenient for us that the joint probabilities reduce to conditional probabilities; all we need to do is model these conditional probabilities appropriately, and then we are done.

It is important to distinguish between the several assumptions made here. First, the Markov assumption reduces things to one-step memory, if you like; then the stationarity assumption reduces the time arguments to time differences. Remember that the Markov property holds for arbitrary n: no matter how many time arguments there are, the conditional probability depends only on the immediately preceding instant of time. That instant is not specified, it is arbitrary, and if the property is true for every such earlier instant, you have a Markov process. So in a sense the process keeps renewing itself: at any instant of time the past is forgotten, and the process looks only at what it does next. It is not surprising, then, that there are going to be renewal equations and so on associated with this sort of process. For instance, you could ask: can I write down an equation for this conditional probability? Let us now use symbols like j, k, l, etc., since we are not going to deal with n-time probabilities any more, just one-step memory, and simplify the notation: what is p(k, t | j, 0) likely to be? Consider a case where j, k, etc. can take the values 1, 2, ..., N; in other words, the sample space is discrete and there are N possible values. We could of course subsequently look at cases where N tends to infinity or the variable becomes continuous.

Since the process is Markov, with this property of renewing itself all the time, the probability can be decomposed as follows. On the time axis, here is 0, here is an intermediate time t', and here is t. The probability of going from the initial state j to the final state k in time t is the probability of going from j to some intermediate state l by the intermediate time t', multiplied by the probability of going from l to the desired state k in the remaining time t - t'; and since you could have done so through a variety of paths, all kinds of intermediate states l being allowed, you must sum over all of them:

p(k, t | j) = sum over l = 1 to N of p(k, t - t' | l) p(l, t' | j).

This is a chain equation, and it has a technical name: it is called the Chapman-Kolmogorov equation. It should really be called the Chapman-Kolmogorov-Bachelier-Smoluchowski equation, since several people were associated with it, but it is popularly called the Chapman-Kolmogorov equation.
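For a process observed at discrete time steps, the chain equation can be verified directly with matrix multiplication. Here is a minimal sketch (the 3-state transition matrix is a made-up assumption): since p(k, t | j) is naturally a matrix indexed by (k, j), the chain equation is just P(t) = P(t - t') P(t') for any intermediate t'.

```python
import numpy as np

# Hypothetical stationary Markov chain on N = 3 states, observed at unit
# time steps.  M[k, j] = p(k, 1 | j, 0); each column sums to 1.
M = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])

def P(t):
    """Two-time conditional probability matrix P(k, t | j, 0) = (M^t)[k, j]."""
    return np.linalg.matrix_power(M, t)

# Chapman-Kolmogorov: p(k, t | j) = sum_l p(k, t - t' | l) p(l, t' | j),
# i.e. P(t) = P(t - t') @ P(t'), for ANY intermediate time t'.
t = 7
for t_prime in range(1, t):
    assert np.allclose(P(t), P(t - t_prime) @ P(t_prime))
```

Notice that the right-hand side is quadratic in P, which is the nonlinearity that makes the equation less tractable than it looks.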
Now if these were continuous random variables then you would have to integrate over this state the intermediate state L rather than sum over it but that's just a matter of notation in this case what do you what's the first thing that strikes you about this equation well first let me say that this is not restricted to Markov processes there are other processes which also obey the chain equation but Markov processes obey it so it's not uniquely the property of Markov processes yeah but fixing T prime here we're not fixing T prime so this is true for any T prime any T prime in zero T as in for each L the T prime that we choose is the same right yes yes of course yes certainly you must sum over all intermediate states at some intermediate instant of time so if you draw a picture here's the initial state here's the final state here are all the possible intermediate states they're going from propagating from here to there the here to here in this fashion and there's a time slice here at this point at time T prime so you're summing over all those possibilities and adding the probability probabilities appropriately to get this so what is it that strikes you about this equation immediately as a mathematical equation this is not so tractable as it looks because it's a non-linear equation this equation here is not linear in this P okay and therefore it's a fairly complicated equation it's not immediately obvious what the solution will be yes is it T plus T prime or T minus T prime well the time interval left here is T minus T prime so that's all the time available for the system to go from the intermediate state to the final state so it's this interval multiplied by that interval processes but they need not be Markov is that right well this chain equation yes they're stationary processes but there's a wider class of processes called renewal processes for which this equation would also hold good it's called it's an example of what's called a renewal equation right but we are 
concerned here with Markov processes so I'm not going to get into the technicality of looking at processes other than that if time permits we will talk about such renewal processes when we do Poisson processes and so on then I'll mention what happens if you look at a more general case so this non-linearity makes it intractable in some sense and if it's a continuous variable then for the probability densities you have an integral equation because there's an integral on the right-hand side which is non-linear and therefore fairly hard to solve it would be convenient to write this in terms of a linear equation for this P for this purpose one introduces the following idea doesn't always work but when it does this is what happens so one introduces the idea of a transition rate and the idea is the following consider this probability here for extremely small values of t very close to 0 or this probability for extremely small values of t minus t prime close to 0 so if you look at P of k delta t j this is state j at time 0 and the state k at an infinitesimal time delta t what would you expect this to be proportional to if delta t goes to 0 I'd expect that it's going to remain at the initial state I'd expect a delta function there right but if delta t is infinitesimal then I would expect that this quantity for all k not equal to j for all k not equal to j this must be of the form some delta t multiplied by w k j where this quantity is a transition probability per unit time that the system jumps from the state j to the jump state k I'd expect the answer to be proportional to delta t and the constant of proportionality is a per unit time this is a probability so this must have dimensions one over time and this is a transition probability or rate no guarantee that this exists no guarantee at all this exists okay but if it does then it has the physical connotation of a transition rate because when you multiply it by the time interval delta t you get the actual probability 
conditional probability okay the same thing could well be true for even a non-stationary process what would happen in that case if I had a t plus delta t here so I have a non-stationary process of the form k t plus delta t j at time t you could still assume that if delta t is efficiently small and k is not equal to j this should be proportional to delta t multiplied by a transition probability but that transition rate would depend on time right so the generalization of this idea of a transition rate to a non-stationary process is fairly straightforward this would again become equal to w of k delta t k t well k j and then a t here to show that the transition rate itself could change as a function of time because the statistical properties are changing with time so the great advantage of having made the stationarity assumption is that the transition rates are independent of time okay so this is a very physical thing that we are talking about if I make that assumption then what's the next step what's going to happen here well the obvious thing to do is to say let's make t minus t prime delta t and then for this quantity put that in put that expression in and there'd be answers there'd be things proportional to delta t the obvious thing to do is to subtract from this k of t minus delta t at time j from both sides and then divide out through delta t and convert it to a differential equation so this is what one would do immediately right so I leave that to you as an exercise and it's not hard to show that with this assumption this equation translates to d over dt of p of k t j becomes equal to summation l equal to 1 to n and now we got to be a little careful of l t j w of k l l not equal to k the sign minus because you subtracted this quantity you end up with a minus sign here and now let's look at this equation carefully so the trick is to subtract from this both sides of this equation subtract the following quantity minus p first set set t minus t prime equal to delta 
t and subtract p of k t minus delta t which is t prime by the way j from both sides and put that in and maneuver okay yeah considering the product of probability yes Chapman's one more equation yes we have been not considering all possible times ah it's not necessary any time will be true okay so look at it physically like the picture I drew you want to start at t equal to 0 at this point and at time t you want to reach this point at time t you're starting in this state ending in this state and you have many routes to go through with different probabilities and now the statement is the probability to go from here to there the total probability is a sum of all these individual probabilities such that you go from here to here at some time t prime and then you traverse the rest of the way and it doesn't matter where you take the time slice these quantities are mutually exclusive they're different intermediate states which is the reason you sum over it okay so when you sum over this prop these probabilities what's the meaning of the word and it means if you have several possibilities you sum over their probabilities right this and this and this and this if you have or then of course it's a different story sorry if it's and you multiply the probabilities which is what I've done but if you have or you sum over them and that's what I've done because they're mutually exclusive this is different from this is different from this so it's only at the same instant of time that these are all mutually exclusive possibilities okay so it's worth pointing this out it's not an equation in time it's not an integral in time there are such renewal equations we'll talk about them subsequently but this is a summation over intermediate states here at any given instant of time in between okay and therefore I can choose that interval as I mean intermediate time as I please I choose this to be infinitesimal no no that be over counting that would be over counting this is the physical way to 
look at it this is over counting because these parts could intersect and so on so you definitely have to do this at one instant of time so you add over mutually excluded events okay now let's look at this equation a little bit so this derivation is something I'm going to leave to you straight forward enough but what's the interpretation of this equation it says the conditional probability from to go from J to K the rate of change of this probability has two contributions one is a gain term out here where you go from J to L an intermediate state multiplied by the probability per unit time that you go from L to the final state desired K that's the gain term and this is the loss term exactly like in a rate equation because you've gone from J to K the state that you want to but then you jump out of it with this transition rate with this probability here okay so the input into this is first you do this then you subtract this and then you use conservation of probability you use the fact that if you start with P K T J and you sum over all K out here all K now from 1 to N what should you get you should get one because you start with a state and the system is not disappeared it's in one of the states available to it including the initial state itself so when you include that the sum should be equal to 1 for all T okay this is equal to 1 for all T greater than equal to 0 that's input that's put in you need to put that one in and that's how you get this minus term appropriately okay so the interpretation is quite clear the rate of change of this probability this increases when you have a gain and it depletes when you have a loss and this is the precise equation for it okay this is called the master equation this word is used in many many contexts but this is the most common content what's the great advantage of this master equation it's a linear equation the price you pay for it of course is that it becomes a differential equation in time a first order differential equation 
but it's a linear equation the matter is not so simple even now because in general if it's a continuous variable then these would be probability conditional probability densities and this would be an integral so then you have an integral differential equation linear but an integral differential equation and that's not so simple to solve either okay in fact we are going to look at that what will happen in that case is that this side will get converted there been integral here we can get rid of that integral but we get get it converted to a partial differential equation in the variable itself but it will unfortunately be an infinite order partial differential equation in general okay at least formally and then we look at further cases sub cases etc but at the moment we are talking about discrete variables we are discrete sample spaces then this is what you have as the master equation okay now when you do chemical reactions you write down rate equations for the concentrations of various species you have precisely the same sort of equation set of equations you have things which are gain terms and loss terms of this kind so this is often called a rate equation or something like that but in this context these are equations for the conditional probability itself okay so the next task is to solve this by the way what's the initial condition it's a first order differential equation in time so we need an initial condition to solve it and of course p of k 0 j given that you are starting in the state j at t equal to 0 of course at t equal to 0 this becomes delta k j okay now in a slightly more general context you could look at this j sitting here as a dummy variable as a sort of spectator throughout you could write such an equation for the probabilities themselves without putting this j in and then specify an initial distribution of j's then the initial condition would not be a delta function but some appropriate distribution we look at those cases as well but this is the task 
Now, let's see what we can do about this quantity. The first thing to notice is that if these indices j and k run from 1 to n, the equation has the following structure. Since the j after the bar is a spectator sitting out there, and such a master equation holds for each and every j, let me suppress it for a moment and write P(t) for the column vector with components P(1, t), P(2, t), ..., P(n, t), the conditioning on j being understood. If I define a column vector of that kind, then the master equation takes the form dP(t)/dt = W P(t), where W is some matrix. And what are the elements of W? For j ≠ k, the off-diagonal element W_jk is just the transition probability w_jk. As for the diagonal elements, remember that it is l that gets summed over in the loss term, so P(k, t) comes out of the summation and multiplies the sum over l; it is then immediately clear, on a moment's thought, that W_kk equals minus the sum of all the other elements in that column: W_kk = −Σ_{l≠k} w_lk, with l running from 1 to n. So you can rewrite this set of linear equations as a matrix equation: a column vector P, which comprises all the conditional probabilities you want, multiplied on the right-hand side by an n × n matrix whose off-diagonal elements are the transition probabilities and whose diagonal elements are minus the sums of the remaining elements of their columns. That is a very special kind of matrix, because it says the sum of the elements of every column of the matrix is 0. Now, what does that tell us immediately about the eigenvalues of this matrix?
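As a quick numerical illustration, and not anything from the lecture itself, here is a hypothetical three-state W with made-up transition rates; the only structural facts used are the ones just stated (non-negative off-diagonal entries, diagonal chosen so that each column sums to zero):

```python
import numpy as np

# Hypothetical transition rates: W[k, l] is the rate for the jump l -> k
# (made-up numbers, purely for illustration).
W = np.array([[0.0, 2.0, 1.0],
              [3.0, 0.0, 4.0],
              [1.0, 0.5, 0.0]])

# Diagonal elements: minus the sum of the remaining entries of each column,
# so that every column of W sums to zero (probability conservation).
np.fill_diagonal(W, -W.sum(axis=0))

print(W.sum(axis=0))  # -> [0. 0. 0.]
```

The column-sum property is exactly what makes the total probability Σ_k P(k, t) constant in time.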
The determinant is 0: because each column sums to 0, adding up all the rows of W gives the zero row, so the rows are linearly dependent and the determinant vanishes. The moment the determinant is 0, you know that 0 is an eigenvalue of this matrix. This means that, in general, you expect a non-trivial eigenvector P such that W P = 0, which would imply dP/dt = 0, which would mean that this P is a stationary distribution: it does not depend on time at all. So all of that is buried in this equation, and we will see what happens; of course, there are other eigenvalues as well. What is the formal solution to this equation? It depends on the initial condition. Now, what would the initial condition be? We know that at t = 0 this quantity is δ_kj; I have written the equation with k as the running index, the j index being suppressed. So at t = 0, P(0) has 0s everywhere except in the j-th entry, where there is a 1: you have to solve the equation with the initial condition P(0) = (0, 0, ..., 1, ..., 0)^T, the 1 sitting in the j-th row. Given that initial condition, the formal solution is the exponential, P(t) = e^{Wt} P(0), because W is independent of time. And what is the physical assumption that made W independent of time? Stationarity: we assumed the process is stationary, otherwise this is not true. You still have the formidable task of exponentiating this matrix, but we know in principle what is going to happen: if the matrix has eigenvalues λ1, λ2, ..., λn, then generically, barring repeated eigenvalues and so on, the solution will contain terms which go like e^{λ1 t}, e^{λ2 t}, and so on, exponentials of the eigenvalues multiplied by time. But what if some eigenvalue were real and positive?
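A sketch of the formal solution for the same kind of hypothetical three-state W (made-up rates): in the generic, diagonalizable case, e^{Wt} can be computed directly from the eigenvalues and eigenvectors:

```python
import numpy as np

# Hypothetical three-state relaxation matrix (made-up rates), with the
# diagonal fixed so that every column sums to zero.
W = np.array([[0.0, 2.0, 1.0],
              [3.0, 0.0, 4.0],
              [1.0, 0.5, 0.0]])
np.fill_diagonal(W, -W.sum(axis=0))

# Initial condition: start in state j = 0, i.e. P(0) = (1, 0, 0)^T.
P0 = np.array([1.0, 0.0, 0.0])

# Generic (diagonalizable) case: exp(W t) = V diag(e^{lambda_i t}) V^{-1}.
lam, V = np.linalg.eig(W)

def P(t):
    """Formal solution P(t) = exp(W t) P(0)."""
    return (V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V) @ P0).real

# Columns of W sum to zero, so the total probability stays 1 for all t.
print(P(2.0).sum())
# One eigenvalue is (numerically) zero; as t grows, P(t) tends to the
# corresponding stationary distribution.
```

For large t the terms with negative real parts die out and P(t) settles onto the eigenvector of the zero eigenvalue, normalized to total probability 1.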
Then the probabilities would keep on growing. Yes, absolutely. Now, we know nothing about this matrix at the moment, except for the following: the off-diagonal elements w_lk are all positive or possibly 0, because there could be pairs of states between which no direct transition is possible, so an element could be 0, but it is certainly not negative. So we have a real matrix all of whose off-diagonal elements are positive or 0, with no negative elements off the diagonal, and all of whose diagonal elements are negative, since each is minus a sum of positive numbers. The matrix is real but not necessarily symmetric, because nothing says w_jk must equal w_kj, nothing at all. Given that, we would be very surprised, physically, to find an eigenvalue with a positive real part, because it would immediately imply that a probability grows unboundedly with time; so we need to be sure that the eigenvalues cannot have positive real parts. They could be complex, but then they occur in complex conjugate pairs, and if you have a pair of eigenvalues of the form λ ± iμ, the corresponding terms go like e^{λt} cos μt and e^{λt} sin μt. That is what the solution would look like, but we must be sure that this λ is in fact negative: we expect behaviour like e^{−λt} with λ positive, possibly with oscillations on top. This is what we should make sure of. Now, as t becomes infinite, I would expect the time dependence to disappear, and things to go where? Well, once all the eigenvalues with negative real parts have their terms decay to 0, we are left with the contribution of the eigenvalue 0, which we know the matrix must have: some constant sits there, and P(t) tends to that constant, which is the stationary probability. Now, this is sort of formalized by a little theorem in
matrix analysis called Gershgorin's theorem. I am not sure whether you have heard of it, but let me explain what it says, because it is simple enough. Suppose you have an n × n matrix with elements a_11, a_12, ..., a_nn. The theorem says that the eigenvalues of this matrix, whatever it may be, are located in the complex plane (eigenvalues are in general complex) within certain circles, or discs, which are found as follows. Take a_11 and mark it on the complex plane; the matrix could in general have complex entries, we do not care, so it sits somewhere in the plane. Then take the moduli of the rest of the elements in that row and add them together; that gives you a non-negative number, and you draw a circle of that radius about the point a_11. Whether you choose rows or columns does not matter, because the eigenvalues of a matrix are unchanged when you pass to its transpose; so the radius could equally well be the sum of the moduli of the off-diagonal elements of the corresponding column. Similarly, take a_22, which sits somewhere else, and draw a similar circle about it, and so on. These are called Gershgorin discs, and the statement is that all the eigenvalues lie in or on these circles. That is all, and it is a very simple theorem: you can prove it by elementary means. The discs could be disjoint, or they could overlap; we do not care. There is an extra theorem which says that if one of the discs is disjoint from the others, you are guaranteed to find at least one eigenvalue in it. And this is a completely general statement: it assumes nothing about the nature of the matrix, real elements or complex elements, we do not care, it is still true. Now, if you apply this to our W, what happens? We know that all the diagonal elements of W are negative real numbers, so they are
all sitting on the negative real axis: here, here, here, and so on. And in each case the moduli of the remaining elements of the column add up to minus the diagonal element, so the radius of each disc is exactly the distance from its centre to the origin: every Gershgorin disc touches the origin and lies entirely in the closed left half-plane. All the eigenvalues lie in the union of these discs, which means no eigenvalue can have a positive real part, and all the eigenvalues other than 0 have strictly negative real parts. Therefore the probabilities relax towards the equilibrium distribution, and W is accordingly called the relaxation matrix in the physics literature. So I stop here; we take it up from this point next time.
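The disc argument can be checked numerically on the same sort of hypothetical three-state W (made-up rates; the column version of the discs is used, which is legitimate since W and its transpose have the same eigenvalues):

```python
import numpy as np

# Hypothetical three-state relaxation matrix again (made-up rates),
# with every column summing to zero.
W = np.array([[0.0, 2.0, 1.0],
              [3.0, 0.0, 4.0],
              [1.0, 0.5, 0.0]])
np.fill_diagonal(W, -W.sum(axis=0))

# Column-version Gershgorin discs: centre W_kk (a negative real number),
# radius = sum of the moduli of the off-diagonal entries of column k.
# Here the radius equals -W_kk, so each disc touches the origin and
# lies in the closed left half-plane.
centres = np.diag(W)
radii = np.abs(W).sum(axis=0) - np.abs(centres)

eigvals = np.linalg.eigvals(W)
for z in eigvals:
    # every eigenvalue lies in the union of the discs
    assert any(abs(z - c) <= r + 1e-9 for c, r in zip(centres, radii))

print(max(eigvals.real))  # close to 0: the zero eigenvalue; the rest are negative
```

The assertion inside the loop is exactly the Gershgorin statement; the final line confirms that the largest real part belongs to the zero eigenvalue, so every other mode decays.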