So if the total free energy of the chain is ΔF for the rest of the chain plus Δε for the one residue, and I ask myself what the likelihood is that this forms a protein, then for that to happen the sum must be smaller than zero; otherwise the free energy would be positive. Alternatively, what I'm asking is: what is the probability that the rest of the chain has a free energy smaller than −Δε? I've just moved Δε over to the right-hand side. We can calculate that by looking at all the possible values ΔF might have, because again there are going to be tons of different ways we can fold the sequence, and then seeing how likely it is for this to happen. In principle I don't know that probability distribution, but the second you hear "many" and "probabilities", the way we're going to solve it is the central limit theorem, which says that if you add up enough independent random contributions, the probability distribution of the sum turns into a Gaussian. I also know that the average here is going to be very positive: most chains will not just fold.

So I'll draw the y-axis there and the x-axis there. The y-axis is just probability density, and on the x-axis I have the free energy of the rest of the chain. Again, I'm not really assuming anything special here. Any Gaussian will have an average, and let's use angle brackets, ⟨ΔF⟩, for that average, and it will also have some width σ. The only reason for introducing those is that we typically specify Gaussians with them, and if I specify those two I can say that the probability distribution P(ΔF) equals (leave some space here) e raised to −(ΔF − ⟨ΔF⟩)²/(2σ²). But if it's a probability distribution, I also need its integral to be one, and I get that by putting a factor in front of it. If you're lazy you can just put a C there, because
it's not going to matter what it is, but it's 1/√(2πσ²) if you want it to be fully correct. We're almost done, but not quite. If I want to start calculating the integral, it's going to help me to have a simpler expression, so I will simplify this in a second. Leave me a little space there so that I can add one more term to that equation, and then I'm going to continue up here. What I really wanted to ask myself was: what is the probability of this happening, meaning that ΔF falls below −Δε? So here I'm going to have a value −Δε. I don't know whether it's positive or negative right now, but in general it's going to be much smaller in magnitude than ⟨ΔF⟩, because this is just one residue and that is the rest of the chain. The probability of this happening is just the integral from minus infinity up to that value. So the probability of ΔF being smaller than −Δε equals the integral from minus infinity to −Δε of P(ΔF) d(ΔF). Maybe it's not such a good idea to use ΔF everywhere here, but this is not going to be as complicated as you think. I bet you're tearing your hair out at the thought of integrating that function. Actually, you can't even integrate it in closed form: the integral of e^(−x²) is the error function. Don't worry, we're not going to go deep into math here. Over the whole range I care about, ⟨ΔF⟩ is going to be a large value, and the values of ΔF I'm looking at here are typically going to be much smaller than that average. So if I assume that ⟨ΔF⟩ is much larger than ΔF, the squared term splits into three parts. If I expand it, first I get ΔF², and that can be either a small or a large number,
but in general pretty small, at least close to zero. Then I have ⟨ΔF⟩², which will always be a large number. And then I have a cross term, −2ΔF⟨ΔF⟩, which is going to be an intermediate-size number. Since ⟨ΔF⟩² is always larger than the other terms and ΔF² is the smallest, I can drop ΔF² and simplify. This is where you need the extra space. I'm going to write this out, but you can start calling these C′ and C″; the constants won't matter. First there's 1/√(2πσ²). Using the first term in the exponent, that gives e^(−⟨ΔF⟩²/(2σ²)), multiplied by the next term. That's minus minus, so it becomes plus 2, and that 2 cancels the 2 in the denominator, so there's going to be e^(ΔF⟨ΔF⟩/σ²). I'm going to write this in a slightly different way, as ΔF divided by σ²/⟨ΔF⟩. This is not as complicated as it looks, because this is where the radical me enters: I don't really care about the proportionality constants here. The prefactor is just some arbitrary constant. Note how ΔF itself doesn't enter that constant, just the average and the spread. So forget about that part for now; we're going to need to keep it just a little while. The only function I'm interested in is this one, and it's a beautiful function: a plain, simple exponential, just e raised to ΔF divided by a constant. Yes, there are some constants, but I don't care about constants. This, my friends, you know how to integrate. It's an exponential with a constant in the exponent, so you're going to get something in front of it when you integrate, and what do we do with constants? I don't care about constants. So this is proportional to... well, first I need to evaluate this expression where ΔF is −Δε, so
there's going to be some sort of constant in front. I've grouped everything there into one constant, and then I'm going to have e^(−Δε/(σ²/⟨ΔF⟩)), that is, e raised to −Δε divided by that entire expression σ²/⟨ΔF⟩, minus the corresponding expression at minus infinity. But e raised to minus infinity is zero, so I don't care about that. Do you see how I approach this? I radically simplify and I throw away the constants. Generations of my math teachers would cry at what I do here, but this is why physics and biophysics are different from math.

Do you see this expression? The probability of this being stable, of this forming a small protein, is something that looks like a Boltzmann distribution, but it's not the Boltzmann distribution: we don't have temperature here. It is proportional to the exponential of minus that small factor Δε, which has units of energy, divided by σ²/⟨ΔF⟩, so that entire denominator must also have units of energy (σ² has units of energy squared and ⟨ΔF⟩ has units of energy, so it does). We're not quite sure what that energy means yet, but let's look at that in a second. What this means is that if this defect Δε is pretty much zero, then this term is not going to be too bad. On the other hand, if the defect is a very large number, then I end up in exactly the same situation I had with the Boltzmann distribution: if I introduce a very large energy, it's going to be very costly to introduce this defect, in the sense that there is a very low probability that I can introduce it there. So this works remarkably well. What is the denominator here? It's not kT, but in a way you have something that has units of energy, and you could almost derive a temperature, because if we divide it by Boltzmann's constant we get something that has units of temperature. σ here was related to how broad the distribution is, and the average here,
⟨ΔF⟩, is really just the average free energy of the rest of the chain. So this whole denominator is some characteristic energy for this particular sequence; if you convert it to a temperature, it might be maybe 350 kelvin or something. Don't worry too much about it. It's a constant whose value is determined by the amino acid composition. But the point here is that nothing here depends on the size of the chain, because σ² is proportional to the size of the protein and the average energy is also proportional to the size of the protein (for a sum of many independent contributions, both the mean and the variance grow linearly with their number), so they cancel out. The only part remaining, then, is Δε, so the probability of a particular chain being stable, or the probability of this defect not being too bad, is determined only by the small defect and not by the rest of the chain. That's why we get the property that breaking one or two hydrogen bonds can be so disastrous that it makes an entire protein not fold. It works like a Boltzmann distribution, it quacks like a Boltzmann distribution, but it's in fact not a Boltzmann distribution.
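For reference, here is the derivation above written out as equations. The symbols follow the lecture:

- ΔF is the free energy of the rest of the chain.
- ⟨ΔF⟩ and σ are the mean and width of its Gaussian.
- Δε is the single-residue defect.

Two labels are added here only for compactness: E* for the characteristic energy σ²/⟨ΔF⟩, and N for the number of residues. This is a sketch of the lecture's approximation. It relies on dropping ΔF² from the exponent, which is reasonable only where |ΔF| ≪ ⟨ΔF⟩.

```latex
\begin{aligned}
P(\Delta F) &= \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left[-\frac{(\Delta F-\langle\Delta F\rangle)^2}{2\sigma^2}\right] \\
(\Delta F-\langle\Delta F\rangle)^2 &= \Delta F^2 - 2\,\Delta F\,\langle\Delta F\rangle + \langle\Delta F\rangle^2
  \;\approx\; \langle\Delta F\rangle^2 - 2\,\Delta F\,\langle\Delta F\rangle
  \qquad (|\Delta F| \ll \langle\Delta F\rangle) \\
P(\Delta F) &\approx C'\exp\!\left[\frac{\Delta F}{\sigma^2/\langle\Delta F\rangle}\right],
  \qquad C' = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\langle\Delta F\rangle^2/(2\sigma^2)} \\
P(\Delta F < -\Delta\varepsilon) &= \int_{-\infty}^{-\Delta\varepsilon} P(\Delta F)\, d(\Delta F)
  \;\approx\; C'\,\frac{\sigma^2}{\langle\Delta F\rangle}
  \exp\!\left[-\frac{\Delta\varepsilon}{\sigma^2/\langle\Delta F\rangle}\right]
  \;\propto\; e^{-\Delta\varepsilon/E^{*}} \\
E^{*} &\equiv \frac{\sigma^2}{\langle\Delta F\rangle},
  \qquad \sigma^2 \propto N,\ \ \langle\Delta F\rangle \propto N
  \;\Rightarrow\; E^{*} \text{ independent of } N
\end{aligned}
```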
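As a rough numerical sanity check of the claim that the characteristic energy σ²/⟨ΔF⟩ does not depend on chain length, the sketch below works in three steps:

- It evaluates the exact Gaussian tail probability P(ΔF < −Δε) with Python's `math.erfc`, without the lecture's approximation.
- It reads the energy scale off the slope of ln P between two defect sizes.
- It repeats this for several chain lengths N.

The model is purely an illustrative assumption: each of N residues contributes an independent amount with mean 1 and standard deviation 2 in arbitrary energy units. That gives ⟨ΔF⟩ = N, σ² = 4N, and a predicted scale of 4 for every N. The function names and numbers are hypothetical.

```python
import math


def gaussian_lower_tail(mean: float, sd: float, threshold: float) -> float:
    """Return P(X < threshold) for X ~ Normal(mean, sd**2), computed with erfc."""
    return 0.5 * math.erfc((mean - threshold) / (sd * math.sqrt(2.0)))


def effective_energy_scale(n_residues: int, mean_per_residue: float, sd_per_residue: float,
                           d_eps_lo: float = 0.0, d_eps_hi: float = 2.0) -> float:
    """Fit E* in P(dF < -d_eps) ~ exp(-d_eps / E*) from two values of the defect d_eps.

    Hypothetical model: dF is a sum of n_residues independent contributions, so by the
    central limit theorem it is Gaussian with mean and variance both proportional to N.
    """
    mean_dF = n_residues * mean_per_residue          # <dF> grows like N
    sd_dF = math.sqrt(n_residues) * sd_per_residue   # sigma grows like sqrt(N), sigma^2 like N
    p_lo = gaussian_lower_tail(mean_dF, sd_dF, -d_eps_lo)
    p_hi = gaussian_lower_tail(mean_dF, sd_dF, -d_eps_hi)
    # If P = C * exp(-d_eps / E*), then ln P drops by (d_eps_hi - d_eps_lo) / E*.
    return (d_eps_hi - d_eps_lo) / (math.log(p_lo) - math.log(p_hi))


if __name__ == "__main__":
    mean_per_residue, sd_per_residue = 1.0, 2.0        # assumed, arbitrary energy units
    predicted = sd_per_residue**2 / mean_per_residue   # sigma^2 / <dF>, the same for every N
    for n in (100, 400, 1600):
        fitted = effective_energy_scale(n, mean_per_residue, sd_per_residue)
        print(f"N = {n:5d}: fitted E* = {fitted:.3f}, predicted sigma^2/<dF> = {predicted:.3f}")
```

With these assumed numbers, the fitted scale comes out slightly below 4 for the shortest chain and approaches 4 as N grows. This is consistent with the lecture's argument, which only becomes accurate once ⟨ΔF⟩ is much larger than both σ and Δε.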