What I have for you today is roughly 40 slides. You should have copies of the slides; there you go. I'm going to spend just under 40 slides talking about free energy calculations, then I figure we'll have a coffee break, and after that I'm going to spend about an hour or so, depending on how many questions you have and how fast I go, talking a little bit about our research. This is not just to push ourselves (scientists are pretty good at that anyway), but there are lots of things that we've gone through in the course, very fundamental things, that surprisingly relate to research. I was about to say research performed in the last 10 years, but it's really research performed today. There was one thing before I start with the free energy calculations. Today I didn't make a separate slide about the study questions, but last week we talked about drug discovery in general, and I said that I would have just one question for you. I would like you to discuss and take me through the drug discovery process. How do you do that in practice? So let's take one step back. Assume that you get hired at, say, Pfizer or Merck or AstraZeneca a year from now, and you're part of the drug discovery team. And let's assume that, if it's AstraZeneca, they have a strategic interest in pain. So where do we even start in the company? How do most companies work? What is it that they start to go after? Yes, so you have targets, and if this is related to pain, this might be what type of receptors? Yes, there are special receptors, usually voltage-gated ion channels, and this is an entire family of them. There is a common type of receptor sensitive to capsaicin, for instance; that's the hot compound you have in chilies. So let's assume that we have a receptor. What would you do, what happens in this drug discovery team, and how do you find that out?
No, so be careful. I'm not asking you exactly about things that are on the slides; I said "in general". When you have this job, the point is not going to be to repeat certain slides from a course. How would you do it? Remember one thing: in industry, unless you're at a very small company, the drug discovery team is going to consist of more than one person, and you'll likely have weekly meetings with other teams such as experimentalists. So how do you typically do that? Yes, but again, you're way ahead of me. The whole point is that most of the early studies are done experimentally. That doesn't necessarily mean that they're very useful: the likelihood that you're randomly going to find a chemical in the lab is basically nil. But to understand what the channel does in the first place, you typically need experimental studies. It's frequently going to be electrophysiology, which I will talk about later today. So there is a starting point, and remember that the very first starting point is usually experimental. Then you get to the point where you feel that you kind of understand your receptor, and this might be based on, say, doing mutations. If you're really lucky, you're at a large, rich pharmaceutical company and you have a structural biology division. What would they do for you? What would be one of the first things to do? Again, you're still ahead of me. You would try to get a structure of the target, and that will likely fail. If it is successful, it might be 150 billion dollars of revenue.
There is no such thing as too much work. No, but seriously, there are companies that have invested two billion dollars in structures of G protein-coupled receptors and failed; they didn't get it. And hey, that's a risk you take in the market, right? Not every investment pays off. So if you can't get a structure, what would you typically do then? Because that's when they start looking at you. No, we failed, we didn't get a structure, because that's what happens in 99% of the cases. Well, we had already done that a tiny bit, right? So there is something else, something related to what you've done in other courses. Yes, you would try to create a homology model of it, right? And think about that too: the further you get in this program, the more you should think in terms of integration. By necessity, when we bring things up in courses, we have to have one course on bioinformatics, one course on modeling, one course on biophysics, one course on genomics. But in practice, when you're working, you should of course use all this knowledge. Okay, so the point is you would obviously build a homology model. And if you now have this homology model, in the best case you're going to have some rough idea where the binding site is, and you can start doing all these experimental studies that you talked about: if you change, say, residue lysine 49, that appears to influence the binding. So you have a rough idea where the binding site is and roughly what you would like to do with the protein, say inhibit it or potentiate it. That's really when you enter the actual drug discovery process. The only reason for bringing this up is that, both at your job and possibly at an exam, you would never ever get the question "What type of free energy parameters should I use?"
Because that's why they're looking at you; that's why you're around the table. You're going to have somebody saying, "I have a structure and I have no idea how to find something to bind to it." So, and this is important, don't jump too far ahead and assume that it's a matter of what free energy calculation you want to do. Stay at the planning stage for a while and think hard about it: how can I really approach this? It's very easy to end up in the situation that when you've got the world's largest hammer, suddenly everything around you starts to look like nails, and that's frequently a very bad idea. But okay, we have a potential binding site, and at this point the experimentalists have pretty much given up and said that the computational chemistry team will have to look at this. What would you do, and how could you find that? Yeah, well, there might be one natural ligand. If it's a pain receptor, you probably know the natural ligand that causes it to create pain, right? But if you want to develop a drug, you'd better find something new that binds to it. Well, that depends. I was at a fun meeting on Thursday and Friday, when I couldn't be here with you, a meeting for a project in Amsterdam. It wasn't that fun to travel, but I had some great discussions with Mark Forster at Syngenta. Syngenta is a company that specializes in pesticides and insecticides, and they do lots of things related to crops. And there are now apparently new types of screens. Typically you have one of two types of screens. Either you have a database of compounds, one called ZINC in particular, that contains a bunch of compounds, I don't know how many, but probably a couple of million or so, that you can order off the shelf. They're super cheap, ten dollars. But then you could argue: if those are already known, haven't they been tested? Do you think they have? Well, let's start with the compounds in general.
They haven't been. How many receptors are there? Well, if we forget about membrane proteins specifically, we were saying there are about 20,000 genes in your body. And if there are now 100 million compounds or something, then this is a tiny part of chemistry space, but it's still an astronomically large testing space. One thing I did not mention last week, I think, is something called repurposing. Have you heard of that? Repurposing is the idea that rather than going out for something new, you go vintage, secondhand: find a compound that is already used as a drug for something and see if this drug can be used for anything else. Now this might sound really stupid, but it turns out many of these drugs are small and hydrophobic; they bind in lots of places. So what's the advantage of using a drug that's already known? Why would it be cheaper? It's something more important. There was a very important concept that I brought up on one or two slides last week: ADME-Tox, right? You know how to administer it. You know that it doesn't cause any bad side effects. It's not too toxic; it's not going to kill 20% of the people taking it. That's where 90% of drugs fail. If you're repurposing, you already know that that's fine. But assume that you couldn't; you might have found some small hits in this database. What Mark Forster told me about: the other alternative is of course to go all the way and synthesize completely new chemicals, which is astronomically expensive. So there are apparently new companies in the US, and I haven't had time to look this up yet, but they have a library of, what is it?
I think it's roughly 25,000 small chemical compound parts. That's not a whole lot; or it might be 50,000. But the whole point is that they can join these parts together as pairs, any part with any other part, and suddenly you have 50,000 squared. That's a fairly large number. It costs you roughly 30 to 50 dollars to order a compound and it takes four weeks. And eventually, when you run out, I bet you could find ways to join three parts together. The whole point is that they have these parts available, and rather than finding an organic chemist to synthesize a completely arbitrary compound, you now just need one reaction to join these parts together. So that is very much a question of chemistry space. Then you start screening this, and you find something that's reasonably good. How would you find something that's reasonably good in the screening? What methods would you use? Well, one way is experimental high-throughput screening, right? That's not really what this course is about. You can do it, but it's very expensive. If you don't do it experimentally, what's the other alternative? Docking, yes. If there is one thing you should remember about docking, it is that it's fast but sloppy. That might sound bad, but think about the sizes of these chemistry spaces: your success is going to be proportional to how many compounds you screen, not how accurately you do it. Yes, you might miss something.
That's fine; if you find something, that's great. And that's why docking has really been optimized to be fast, fast, fast. If there's anything people have done, it is to go to larger and larger compound libraries, not to make the docking more accurate; that's a much later step. The drawback is that most of the things you find with docking are going to be false hits. It can be very frustrating: you find 1,000 hits, you go into the lab, and you might have one or two that show something. 998 compounds out of a thousand failed. Is that bad? Why not? This is awesome: you have two leads, or two hits. There's something you can work on. And it's insanely cheap; you can run it on a laptop overnight. Now the challenge here: I don't know, but I would guess that as we're moving more and more into this modern world of big data and machine learning, we're going to see changes. Google is very interested in this, for instance; I know that they're hiring tons of people from the pharmaceutical industry with docking backgrounds. Can you use other information? Can you use anything related to ADME-Tox from the literature, or find ways to bring more information into this step? I would guess that although docking is computationally very cheap now, wait 10 years and I think they're going to find ways to use insane amounts of computational power for it, and in particular insane amounts of data. So there you stand and you have your compound. Why is this usually pretty much useless? Think about the binding. What is usually the case with this binding? Well, yes. So how would you normally measure binding? There are kind of three answers here, and they're all right from different points of view. Oh, sorry.
Now you're thinking about how; I'm thinking more in terms of units. Sorry, there are lots of ways to measure experimentally whether two compounds are bound together. In the interest of time I'll skip that for now, but fluorescence would certainly be one of them. But when I say binding, why do two molecules bind? So what is the unit of binding? What does that have to be? Yes, energy, right? Kilojoules or kcal per mole. That's how a computational chemist might measure it. How would a physicist measure it? They would probably use kT, which is essentially the same thing; physicists love using kT for everything. But there is another way to measure it if you're a wet-lab chemist. Yes: when you have the equilibrium between bound and unbound states, you calculate the binding constant of this reaction, the concentration at which 50% of the molecules are bound. That's a concentration, so the lower this concentration is, the stronger the binding is. A really good compound, a finished drug, would have at least nanomolar if not picomolar affinity, meaning you need an insanely low concentration for it to bind. At the hit stage, what concentrations are you talking about?
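The three ways of expressing binding strength mentioned here (energy per mole, kT, and a concentration) are tied together by the standard relation ΔG° = RT ln(Kd / c°), where c° = 1 M is the standard concentration. The following is a minimal sketch of that conversion, not something shown in the lecture; the temperature of 298.15 K and the function names are assumptions made for illustration.

```python
import math

R = 8.314462618e-3          # gas constant in kJ/(mol K)
KCAL_PER_KJ = 1.0 / 4.184   # unit conversion factor

def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy in kJ/mol from a dissociation constant in mol/L.

    Uses delta_G = R*T*ln(Kd / c0) with c0 = 1 M, so a smaller Kd
    (stronger binding) gives a more negative free energy.
    """
    return R * temperature * math.log(kd_molar / 1.0)

def in_units_of_kt(delta_g_kj, temperature=298.15):
    """Express a molar free energy in units of kT (equivalently RT per mole)."""
    return delta_g_kj / (R * temperature)

# Assumed temperature: 298.15 K (room temperature), not stated in the lecture.
for label, kd in [("millimolar", 1e-3), ("micromolar", 1e-6),
                  ("nanomolar", 1e-9), ("picomolar", 1e-12)]:
    dg = binding_free_energy(kd)
    print(f"{label:>10}: {dg:7.1f} kJ/mol  "
          f"{dg * KCAL_PER_KJ:6.1f} kcal/mol  {in_units_of_kt(dg):6.1f} kT")
```

At this temperature every tenfold improvement in affinity corresponds to roughly 5.7 kJ/mol (about 1.4 kcal/mol), which gives a feeling for how small the energy differences are that separate a weak binder from a strong one.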
Maybe millimolar. That's very bad binding: you would have to administer way too much compound, and the more compound you administer, the more side effects you're going to have, so it's pretty much useless as a drug at this stage. So on the one hand you're happy, but the problem is that you're going to need to improve this; you need to get it better and better. The way you do this is that you're gradually moving to fewer and fewer compounds, and then you use more and more advanced methods, some of the ones I'm going to talk about today, to calculate exactly what the binding is and to come up with a way to modify the molecule. Say, add an ethyl or methyl group and see whether this improves binding or not. You can do this either with docking or with the free energy methods I will speak about today. Then you come back four weeks later, which is the usual iteration time in big pharma, and you tell the experimental team: "Based on the results you showed us four weeks ago, here's a new list of compounds that we think you should now synthesize and try; we think these are going to be better." Does this usually work? It often does. I forgot this, but I can try to dig up a paper for you. There is a group led by Bill Jorgensen at Yale, and they're awesome in the way they use free energy calculations. There's a whole bunch of publications where they show how they start with a millimolar compound, and then eight weeks and a couple of experimental iterations later they end up with picomolar binders, and this is through very, very subtle changes. Remember the optimizations that I showed for the HIV-1 protease inhibitor. You really keep adding and moving things, and this is where you frequently use all this information about pharmacophores and everything. In principle, divine inspiration is fine, right?
But the smarter you are, the more you look at the structure. You spend a lot of time sitting and just looking at molecular models: it would be good to have a hydrophobic ring here, or it would be good to have a hydrogen bond donor here. So it's 50% black magic, 50% chemistry, and then another 10% luck. This is what you call lead optimization, and usually you would end up with an entire series, because at this stage you don't really know which compound in the series is going to be best. But ideally you can now deliver a series of 10 very promising compounds to your experimentalists, and somewhere here the computational part of the project gradually winds down. What would happen later? Not only in the computer, but in drug design in general: what would happen in big pharma? At this stage you're still in the preclinical phase. Eventually you would go into the lab and do more and more experimental tests, first on the chemistry, then in single cells, and then eventually on animals. Once you're done with animals and you have filed a huge amount of paperwork, you might get permission to move to what we call phase one trials, and that would be to test it in healthy subjects. So what's the point of a phase one trial? Yes, but what do you mean by "it can be applied"? There's one specific thing you're testing in phase one. Exactly: you test that they don't die. Which of course doesn't always go right; occasionally they do die, or get very bad side effects. The point is that this has nothing to do with how efficient the drug is. It's just to show that it doesn't kill people. And if you're lucky and it doesn't kill people, after phase one, well, I'm not going to ask you what happens after phase one, because that's phase two. So what's the point of phase two?
Right, so it's basically to show that it works. The difference is that when you move from phase one to phase two, you're going to need a much, much wider range of test subjects. This is something that's frequently criticized. So what do we typically test on? Well, no, that would be preclinical; what do you typically test on in phase one and phase two in particular? Why? Actually, they're not always white anymore. But there are two reasons why they're healthy men. First, if you're ill, we have no idea how the drug might start to interplay with that illness, right? Even worse, the illness might disguise something that you would see in a healthy person: if you already have high blood pressure, we might not see that this drug causes high blood pressure. So it's very important that people are healthy, so you can see all deviations. But this is something that also comes up, that it's so unfair: why do we do most clinical trials on men? No, it's much simpler: men don't get pregnant, and you don't want to kill kids. Pregnancy is one of the most sensitive things in biology. It's extremely complicated: you have two individuals, and these chemicals will cross from the bloodstream to the fetus. So it's not just big pharma not caring about women; it's because there are some things that are very, very difficult to control. But that comes much later, right? The early phases are where all the mistakes happen. What can actually happen is that you get a drug that goes to the market after phase three. And phase three, since we're talking about it, what would that be? Yes, so in phase three it has to be better, and there is in principle a requirement for putting a new drug on the market:
It has to be better than existing alternatives. Then it depends a little bit on how many lawyers the pharmaceutical company has; I bet they can always argue that there's one particular case where it's better. But frequently, when the FDA for instance rejects a drug, it is because the company hasn't shown that it is significantly better than the alternatives. And what frequently happens is that if you're lucky and the drug gets permission, you can put it on the market, but initially it can only be prescribed to men. That's quite frequent. Then eventually you expand this more and more as you've done more testing, so that it can cover more individuals, more types of diseases, or having a disease in combination with something else. Sadly, sometimes it's in the pharmaceutical industry's best interest not to push this too much. Take omeprazole, Losec, the blockbuster from AstraZeneca. It's a proton pump inhibitor, and this is how it can treat acid reflux. There was something very important related to acid reflux and ulcers that got people the Nobel Prize: two Australians got the Nobel Prize for discovering something. For a very long time, ulcers were believed to be caused by stress or by drinking too much coffee, in which case I would have one, and it was also frequently a lifelong disease. You couldn't really do anything about it. Exactly: they showed that it is caused by Helicobacter pylori, which means that you can treat it with what? Antibiotics. Which was both good and bad, because prior to this, if you're a pharmaceutical company, the best possible drug you can imagine is what?
A lifestyle drug. It is no coincidence that Lipitor is so popular, because it's a drug you take against high cholesterol: the patient will continue to take it for 30 years and you just rake in the profits. The worst possible drug is a cheap drug that the patient takes for five to ten days, and then the patient is cured and no longer needs your drug. So omeprazole was of course great, right? This is a Western disease; people have acid reflux a lot, and the acid reflux part is certainly something that most of us can have at some point in life. It's usually not an ulcer, but the really chronic cases, that's awesome, because you would sell the drug to people for 40 years. So what happened? Well, that's more related to the antibiotics; that's not related to Losec. But it actually took them 10 years, and this was partly because the discoveries happened in parallel. It was at least 10 years after they put Losec on the market until they actually proved and certified it for use together with antibiotics to actually cure the disease. Which of course is bad for the company. I think they will survive, because by then they had already made most of the profits and the patents were about to expire. So this is what's so complicated. The way I tell this can be interpreted as me saying that they deliberately withheld it. It's not that easy, because it's like an airplane: you can't just take a drug as it is and say, "Hi, I tested this on 10 people. It seems to work with antibiotics."
It requires a huge amount of testing to get it approved. That's in brief roughly how drug design works, but there was one thing where I cheated: I didn't tell you how this lead optimization actually happens. Black magic and trial and error have certainly been part of it, but what's increasingly happening today is that you're basing this on more or less accurate free energy calculations. You also had a lab on free energy calculations on Thursday, right? So today I'm going to spend the time talking about free energy calculations and a little bit about how you would do them in practice in computer simulations. I'm not going to go through things that are too specific to GROMACS, but I'm going to try to bring this up as a concept for how we can use it to optimize things. So why hasn't this been used instead of docking? You did the lab, right? How long did it take you to do that lab? Yes, and the simulations were probably tens of minutes at least, for a completely trivial system. If you have systems with hundreds of thousands of atoms, these simulations will take days. Compare that to how quick a docking calculation is: a second or so. So historically this has been way too costly, but in principle this has the physics correct. The idea of free energy is really that free energy determines the relative population of states. If you know the difference in free energy between two states, the Boltzmann distribution tells you how probable one is relative to the other, and that of course determines whether a molecule is bound, and it determines in what direction a reaction will go. If you can get the free energy of opening an ion channel, you can determine whether it will be open or closed. Say there is a difference of five kilojoules per mole between the open and closed states of an ion channel; let's say that the closed state is by default five kilojoules per mole more
stable. But if you find some compound that stabilizes the open state by six kilojoules per mole, the open state is now going to be the favored one. So the reason you want to go after free energies is that if we could calculate these things really quickly in a computer, we could instantly say that, for this really complicated molecule, adding say an ethyl group here and a hydrogen bond donor there would give a better delta G, that is, it would bind more strongly to the receptor. Why would you do that? If it takes a couple of weeks and docking is so fast, why would you want to calculate free energies? Yes, but we would know it's right if we could measure it in the lab. So first, why can't we do it with docking? Well, the problem in general is that docking is fast but sloppy. If you have one molecule and you want to ask "does this get better if I add an ethyl group?", docking might or might not get that right; the errors in docking are too large. And if it's just one ethyl group or something, then it's fine. But if you now want to test 1,000 random ways of modifying your molecule, you can't afford to be wrong 99% of the time anymore. You can if you're starting from a billion molecules, but if you only have a sample of a thousand modifications, you need better than a 1% hit rate. So why can't you do it in the lab? Well, I said it was ten dollars. You're right, of course, but why is it not ten dollars? It depends on what type of molecules you have. You're right that if you are testing molecules that have already been synthesized and can be ordered, then it's cheap. What can happen is that they say, "Sorry, molecule 14 that you wanted, we're out of stock on that and we don't expect to produce it right now."
Sorry. At the docking stage you would likely accept not including that molecule. If the molecules do exist, you can get them for ten dollars. If you want something that does not exist, molecule 14 with a methyl group added, say, then it can be fifty thousand dollars to synthesize it. Ten dollars times ten thousand, that's fine. Even a hundred times a hundred thousand is still fine. $50,000 times 100,000 is not fine. It would also take too long, because in many cases it could be a matter of months to synthesize an arbitrary molecule. So the neat thing with free energies is that if you could get at least a 75 percent hit rate or so on all these small modifications, then we might be able to just synthesize 10 molecules. We can't do that if I start from a thousand and synthesize just 10 with a 1% chance of being right. But if I'm 75 percent right, I would expect seven or so of those molecules to be good. So the reason we're going after free energies is that docking is not accurate enough and the lab work is too expensive. Then of course in some cases you want to do this just to understand what happens, but from a drug design point of view it's because docking is fast but sloppy, and experiments are slow and expensive. So ideally we would like to calculate these free energies from simulations, just as you did in the lab, and in principle we can do this with molecular simulations and statistical mechanics. We spent quite a few lectures going through this. There is one caveat here, though. If we forget about all the equations for a second, you could of course just run a simulation and see whether your molecule binds. What's the problem with that? That's point one, right?
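The hit-rate argument above is simple arithmetic, and a short sketch may make it concrete. It uses the numbers quoted in the lecture (10 synthesized compounds, 1% versus 75% hit rate, roughly $50,000 per custom synthesis). Treating each compound as an independent trial with the same success probability is an assumption made here for illustration, and the function names are hypothetical.

```python
from math import comb

COST_PER_CUSTOM_SYNTHESIS = 50_000  # dollars, the rough figure quoted in the lecture

def expected_good(n_synthesized, hit_rate):
    """Expected number of synthesized compounds that really are improvements."""
    return n_synthesized * hit_rate

def prob_at_least(k, n, p):
    """Binomial probability of at least k successes in n independent trials.

    Independence between compounds is an assumption of this sketch.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n = 10  # number of compounds we can afford to synthesize
print(f"synthesis budget: ${n * COST_PER_CUSTOM_SYNTHESIS:,}")
for label, rate in [("1% hit rate", 0.01), ("75% hit rate", 0.75)]:
    print(f"{label}: expect {expected_good(n, rate):.1f} good compounds, "
          f"P(at least one) = {prob_at_least(1, n, rate):.3f}")
```

With a 1% hit rate the expected number of good compounds among 10 is 0.1, so the half-million-dollar batch most likely yields nothing; with a 75% hit rate the expectation is 7.5, which is the "seven or so" mentioned above.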
If a molecule binds once, it says hardly anything. You might have been lucky or unlucky, depending on what you were after. If you see it ten times, you start to get better statistics. The only problem is that if you have a really large, complicated receptor that has to open or close, it could take a microsecond just to see one binding event, and now we're talking about tens to hundreds of microseconds of simulation. This gets insanely expensive. But assume that we could solve that part and really get accurate free energy calculations. If we get that, we can quantitatively predict exactly what would happen in macroscopic experiments. Do you see the difference? Most of the other things we've been looking at in simulations have to do with measuring an enthalpy or a diffusion coefficient. Those are very direct properties, something that you can calculate more or less directly from your simulation. The beautiful thing with free energies is that, I would argue, almost anything you can measure in the lab can be translated to a free energy. And of course the reason for calculating free energies is to go in the other direction: if I can calculate free energies, I can predict experimental outcomes. By now you should know this equation. We have Gibbs and Helmholtz free energies. The Gibbs free energy is the enthalpy minus temperature times the entropy, and the enthalpy corresponds to the potential energy plus this pressure-volume term. So this was an old slide that I reused, and I actually didn't make a typo here. There was also this other free energy. Do you see what letter I used for the other free energy here? The reason for not changing this is that it is quite common among physicists. Standards are a good thing, so everybody should develop their own, right? Sadly, there are no good standards. Take delta G.
That's virtually always Gibbs free energy. Occasionally you'll see a delta F; a delta F can be anything, F for free. If you see a delta A, it's usually Helmholtz, in physics. So what's the difference between Gibbs and Helmholtz? A bit of repetition from earlier in the course. Helmholtz assumes that we do this at constant number of particles, constant volume and constant temperature (NVT), while Gibbs would be NPT: constant number of particles, constant pressure and constant temperature. Normally in the lab you can forget about the volume term, but you should feel a little bit bad about it. I always do. The point here is that I want to get you to understand free energies for proteins, but the problem with starting with something binding is that there are going to be too many steps at once. So let's start with something much, much simpler, one of the simplest processes we can imagine. Take a small compound, say cyclohexane. I want to calculate the cost of taking it from the gas phase, or from the liquid if you have pure cyclohexane, and moving it to a solvent, say water. That is what we had in pretty much chapters one and two of the book: the simplest possible solvation energy. It's going to turn out that there is a reason why people are obsessed with this, but it's a very simple process and one we can study without involving large protein chains. Why do people frequently use cyclohexane, by the way? You've probably seen that molecule. Do you know what cyclohexane looks like? You can probably guess. It's a cyclic compound, yes. So what's the difference between cyclohexane and...? How many carbons do you think are in it? Six. But you know another cyclic compound with six carbons; what's that one called? Benzene. So benzene is C6, and then how many hydrogens? C6H6, and that's what you call an aromatic compound.
It's completely flat. But aromatic compounds are a bit complicated, and they tend to interact in strange ways. So cyclohexane is the equivalent aliphatic compound. It has a ring that's going to look either like a small boat or like a small chair, but it's just six CH2 groups, so six methylene groups. It's a very simple compound. There are only two types of atoms, carbon and hydrogen, and there is nothing like aromaticity or anything. That's why it's very common as a solvent. It's kind of the simplest organic solvent you can imagine, and it's not too toxic.
So in principle we might want to study how expensive it is to insert something in water or a membrane or whatever. This solvent here I've drawn as water, but in principle it could be anything. This is also important for protein folding. Why? We talked about that a lot in the course, but perhaps not from the concept of free energy. Well, actually, I think we did. This is the free energy of solvation. If something is cheap to solvate, it is going to love to be on the surface of a protein. If it's expensive to solvate, it's going to be buried on the inside. That's why it's related, and that's actually why the book brought this up: it was important for understanding protein folding. Here we're going to think more about how we actually calculate these properties. The book called it a partition free energy. You can think of the partition as how buried or exposed the side chain is, or whether a particular side chain likes to be in the water or in a membrane, if it's going to be a membrane protein. And this looks fairly simple, right?
The only problem is that these partition free energies can be pretty complicated. A student worked on this a couple of years ago; I'll show some slides of that later. You can have multiple different charged side chains and see how they behave. In some cases these side chains will even turn the helix a bit. This is research done less than 10 years ago, understanding how individual charged or polar amino acids behave in membranes. So it can get pretty complicated, but in principle the helical backbone here is pretty much neutral. It's plus minus zero when it comes to inserting it. So the cost of inserting a helix in a membrane is going to depend entirely on these 20 side chains. If we can calculate the cost of those 20 side chains, we will understand how expensive it is to turn something into a membrane protein. That's still a partition, but a slightly more complicated partition.
So how would you get a free energy? You did this a bit in the lab, but don't think too much about the lab. You're quite right, but you're jumping a bit too far forward. Let's keep it simpler. What is a free energy? You're too advanced. What is a free energy? Why is it called free energy? Work. The cost of moving from one state to another with a different free energy corresponds to the amount of work you have to do on the system. Now, that work might be negative. What would negative work mean? Positive means that you do work on the system; negative means that the system does work on you. Either way, you pay or gain energy from changing the system, and in principle we could measure that. What is the free energy of taking water from, say, 20 degrees to 80 degrees? If you have water, as a simple example, what is the specific heat of water?
The specific heat of water is the amount of energy required to increase the temperature of a kilo of water by one degree centigrade. It's 4.184 kilojoules. Or is it approximately 4.18? How many more decimals do you know? No, that's a trick question. How many more decimals are there? There are no more decimals. Why are there no more decimals? It's the definition. So how many kcal is that? I bet you do remember it. The round number is one kcal. That's funny; why is it one kcal? That's how the kcal was defined: one kcal is how much it takes to increase the temperature of one kilo of water by one degree. But then of course in the SI system we have gone back and defined that to be exactly 4.184 kilojoules. The joule has to do with one watt during one second. But the point is that we've defined this; the conversion is exactly 4.184.
So in principle, if I can just measure how much work I need to do to change the state of a system, I can calculate the free energy, and I would argue that conceptually this is the easiest way. How would you do this with the water? You go down to a hardware store and buy a wattmeter for roughly 15 dollars, plug that into the wall, and then you measure how much energy you need to heat the water when you're boiling it. That would likely not be a very good result, because you're gonna have all these losses in the setup and everything. You can do it more accurately in the lab. But in principle you just measure how much energy you need, and that's the work, right? The electrical work. How much energy did I need to use to increase the temperature of the water by one degree centigrade?
You could do this for something really complicated too. If I have a small protein here and a ligand, or any two molecules that are bound together, I can calculate how strong this binding is by slowly pulling them apart, slowly, slowly, slowly, and measuring. I can add a small spring here, right?
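Going back to the water example, the number a wattmeter should ideally show for heating one kilo from 20 to 80 degrees is simple arithmetic. This is only a sketch: it assumes the specific heat is constant over the whole range and that nothing is lost to the surroundings, and the function and variable names are made up for illustration.

```python
# Specific heat of liquid water, assumed constant over the temperature range.
SPECIFIC_HEAT_KJ_PER_KG_K = 4.184
# Conversion factor between kJ and kcal (1 kcal = 4.184 kJ).
KJ_PER_KCAL = 4.184


def heating_energy_kj(mass_kg, t_start_c, t_end_c):
    """Energy in kJ needed to heat mass_kg of water from t_start_c to t_end_c."""
    return mass_kg * SPECIFIC_HEAT_KJ_PER_KG_K * (t_end_c - t_start_c)


energy_kj = heating_energy_kj(1.0, 20.0, 80.0)
energy_kcal = energy_kj / KJ_PER_KCAL
print(f"{energy_kj:.2f} kJ = {energy_kcal:.1f} kcal")
```

With one kilo and a 60 degree increase this comes out at about 251 kJ, or 60 kcal, which is just the definition of the kcal applied 60 times.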
And if I now start to move this spring, I know how much force I'm applying, because if I move this at a constant rate, the force is determined by the force constant of the spring and how much the spring has been extended. So I know exactly what the force here is as a function of time. How is force related to work? The unit of force is the newton, and what is a newton meter? A joule, right? So I take my force and multiply it by the length I'm pulling. I integrate it, actually. If I know the force as a function of the position here, I can calculate how much energy I'm changing. Will that work? By definition it will work. It's physics.
There is only one problem. I would actually have to pull infinitely slowly, so that the system is always at equilibrium. What would happen otherwise? Assume that it's Friday evening and you're in the bar and you want to get to the counter and order a glass of wine. Tons of people on the dance floor. How do you get to the bar? You're gonna need to move slowly, right, and try to squeeze your way between people. What happens if you just run to the bar? You're gonna bump into everything, right? So if you do this very quickly, you're gonna get a beautiful energy here. It's even gonna be a free energy. But the free energy you will get is the free energy of heating my system a lot, because you're causing lots of friction. You will get a free energy; it's just not the free energy you thought you would get. That one we would only get if I do this slowly enough that you don't cause any friction or anything, because friction leads to heat, right?
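The "integrate force over position" step can be sketched in a few lines. This is a toy illustration with a made-up ideal spring, not data from any experiment, and the function name is hypothetical. As said above, the number only equals a free energy difference if the pulling is slow enough to stay at equilibrium.

```python
def work_from_force(positions_nm, forces_pn):
    """Trapezoidal integral of force over position.

    Positions in nm and forces in pN give work in pN*nm (1 pN*nm = 1e-21 J).
    """
    work = 0.0
    for i in range(1, len(positions_nm)):
        dx = positions_nm[i] - positions_nm[i - 1]
        work += 0.5 * (forces_pn[i] + forces_pn[i - 1]) * dx
    return work


# Hypothetical ideal spring, F = k * x, stretched from 0 to 1 nm.
spring_constant = 100.0  # pN/nm, an arbitrary illustrative value
positions = [0.01 * i for i in range(101)]
forces = [spring_constant * x for x in positions]

work = work_from_force(positions, forces)
print(f"work = {work:.2f} pN*nm")
```

For the ideal spring the result should match the textbook value of one half times k times x squared, here 50 pN*nm.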
So I need to do this slowly, the equivalent of moving very gently between all these people in the bar, so that you're not upsetting the system at all. In principle this has to take infinitely long. Now, infinity is a very long time, so in practice you try to make do with a microsecond or a hundred nanoseconds. So why is a microsecond enough? No, it's not enough. But what people always do in simulations, and this is equally true in the lab, of course, is this: if the longest simulation you can hope to run is a microsecond, you run a microsecond and you hope that is enough. That is a very, very bad justification for it, but hey, guilty as charged. That's what we do, right? It would be better if you could do it a hundred times longer. I will show you that there are some tricks to get away with less, but it's very common that people just try it.
Is there something else you could do? No, because if you measure heat you're just measuring friction, right? I was rather thinking that, rather than just pushing ahead and directly simulating one microsecond (and this has become so much easier today because computers are faster), you sit down with paper and pen and do a back-of-the-envelope calculation. Is this realistic? Can you hope to be able to see this free energy?
You would be surprised at how many cases the answer to that is no. There is no theoretical way you could do this with the type of simulation you're doing, and people nevertheless spend six months or half a student's career trying to do it. So think before you simulate: is it even possible? But in theory, if this was a very simple protein, and if this was a small molecule that was relatively rigid and the protein was relatively rigid, so that we don't disturb them too much, then this could work. Then you could in principle see that a very high force would mean that it's a strong binder, and if the force was lower, you would have a weaker binder. So you could see the difference, if we could translate that to a free energy.
The cool thing is that this is something you do in the lab, but you don't think of it as a free energy of binding. Have you heard about AFM, or atomic force microscopy? It's a really cool technique. An atomic force microscope relies on the forces you have between atoms, and you have seen those in the lab. What are those forces? Lennard-Jones, right? Very weak van der Waals interactions, at least, so all atoms will interact with each other. What you do is create a very, very fine tip here, so fine that at the tip you should have just a couple of atoms of width. That's possible to do. Then, when you sweep this very, very close to a sample here, there will be small forces between that tip and the atoms, but these are of course going to be so small that you won't see anything at all happening. But then this is connected to a small cantilever. Basically, the whole point is that this whole arm can move up here, right?
So it can rotate around this part, and then you just have a small mirror here, and you're shining a laser on it. When you have tiny, tiny differences here, you can detect how the laser beam is being deflected up here. This is a super cheap experimental setup compared to most other things; you can almost build it in the lab yourself if you're a graduate student. Then you can actually directly trace patterns of individual atoms. You don't usually do this to determine the shape of a protein or something, but you can if you want to see what's on the surface. And what you in particular can do is that if you can somehow take this AFM tip and attach it to a linker or something, then you can measure, because this motion here is directly correlated to how much force there is between the tip and your sample, right? So if you take a protein, you can use an AFM tip to literally pull on part of the protein. Really cool. People have done this to measure, for instance, the free energy of unfolding of beta sheets and everything. So this is something that actually works. But you can imagine that the motions here are going to be tiny, right? You're talking about something like micrometers or nanometers that they move in a second or so. But it works.
So let's then try to mimic this in a computer experiment. This is a small dinitrophenyl hapten; it doesn't matter what it is. There's a small compound bound to a very large protein, and then you connect a virtual spring here.
This spring corresponds roughly to your Lennard-Jones interaction. We can measure how much energy it will take to pull this out, and then you start pulling here with some constant velocity. In a typical simulation that might be one nanosecond, just to make the units easier. If in one nanosecond you would like this to move out, say, five nanometers away, what is the speed with which you're pulling? Yes, or in more conventional units that would be five meters per second. It's insanely fast, right? So the problem is that what looks very slow in your simulation, if you compare that setup to the experimental one, is insane at five meters per second, and this is a problem. What will you primarily create during your five-meter-per-second pulling? Heat, heat, heat. So you're gonna need to pull way slower, and that means this will require very long and careful simulations.
But people could do this even 15 years ago. Here's the force in piconewtons as a function of the position of this cantilever. I think they go up to roughly 450 piconewtons, which is fairly typical, and then they claim that this is the unbinding force they get, which is quite true. This is the unbinding force they get for that particular experiment. But then you can plot this as a function of the speed. And the funny thing, well, this is kind of obvious to you, right, is that here you're at roughly 10 meters per second and the force of unbinding would be roughly a thousand, but the slower you pull, the lower this force gets, because you're creating less and less heat. You can create a model for how this would work, and then you can extrapolate, if you start to see a pattern here. In this case it is not a random pattern, but I won't have time to go into
an explanation of this particular shape of the curve. But the point here is that if you can pull slower and slower, so let's say you start with a one-nanosecond simulation, then a ten-nanosecond simulation, a hundred-nanosecond simulation, a one-microsecond simulation, then if you're lucky you can start to extrapolate. If you now start to see that all your values are falling along a curve that would go to a constant value for very long times, then even though you haven't really reached that value yet, you can extrapolate and say that, based on the fit of my curve, I expect the value at very long times to be, in their case, I think it was 60 piconewtons. About 15 years ago this was a very nice study; today you could do better. But the point is that even if you can't really simulate slowly enough to correspond to the experiment (the experiment would be at something like 10 to the minus 6 meters per second, 1 micrometer per second, and you can't get to that in a simulation), we can extrapolate and get a value that's reasonably close.
So why don't you do this? It's awesome, right? We could do this for all our compounds. It's worse than it looks: each and every black dot here has an error bar, so each and every black dot has to be at least three different calculations, and some of these are really slow, so you're probably talking about a hundred simulations here. This is an entire scientific paper just for one compound, and they get 60 plus minus 30. So the error is astronomically large. The reason they did this is not because they were stupid. They're very close friends of mine who are smart scientists. In this particular case they actually wanted to understand what happens to the protein when you unfold it. The problem is that it works in theory.
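To get a feeling for how far apart the simulated and experimental pulling speeds are, the unit conversion is worth doing once. This is a sketch with assumed round numbers: five nanometers in one nanosecond for the simulation, and the roughly one micrometer per second quoted for the experiment. The function name is made up.

```python
def pulling_speed_m_per_s(distance_nm, time_ns):
    """Pulling speed in m/s; 1 nm/ns = 1e-9 m / 1e-9 s = 1 m/s."""
    return distance_nm / time_ns


simulation_speed = pulling_speed_m_per_s(5.0, 1.0)  # assumed example values
afm_speed = 1e-6  # m/s, order of magnitude quoted for the experiment
speed_ratio = simulation_speed / afm_speed

print(f"simulation: {simulation_speed:.1f} m/s")
print(f"about {speed_ratio:.0e} times faster than the experiment")
```

Under these assumptions the simulation pulls several million times faster than the AFM, which is why the extrapolation to slow speeds is needed at all.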
It's not going to work as a replacement for docking, because it's way too slow. So the question is whether there is some other, smarter way to obtain the free energies, and for that we're going to need to go back a little bit and think about what we really mean by a free energy again. A free energy is always a difference between two states. What does that mean? I've used this protein before: FKBP, which stands for FK506-binding protein. Poor protein doesn't even have a real name. This is one state, where you have the protein with this compound FK506 bound. We want to compare the free energy of this state with the state where you have the protein without anything and the FK506 molecule in water. There are two clear states, state A and state B, and I want to compare those.
What characterizes a free energy? What can we say about a free energy? How does it depend on the path between these two states? It does not depend on it. Then why doesn't it depend on it? But why is that? Can you prove it? What would happen otherwise? Right. Yeah, it's a very simple proof. In mathematics you would call this reductio ad absurdum: you assume the opposite, and if we show that the opposite leads to something absurd, then the statement must be true, right? The obvious way of showing this is to assume that there are two paths, one where the free energy difference is higher than along the other. If I use energy to move the system along the cheap pathway, and then I let the system move itself back the other way, then I'm back where I started but I have gained energy, and that would violate conservation of energy. It would be a perpetuum mobile of the first kind. But this leads to something else that's important. What paths are you allowed to take between state A and state B when you calculate it?
That would be the path nature takes. What path are you allowed to take if you just want to calculate this? You can take absolutely any path you want. And this is something that comes back: it turns out that some things are very easy to measure in the lab but hard to calculate, while other things are hard or impossible to get in the lab but much easier to calculate. So when it comes to calculating a free energy, which path should I choose? The easiest, but the easiest for what? The one that's easiest to simulate. If you do it in the lab, you're obviously going to get the one that nature prefers, right? If you're going to do this in the computer, sorry, there are no brownie points for having a path that corresponds as closely as possible to the lab. And that's essentially what you're getting if you're trying to slowly pull it. But the free energy only depends on the final and starting states. So let's see if we can find paths that are unphysical but smarter. As long as the beginning state and the end state are physical and correspond to my states, if I can determine a free energy between them, I'm good.
That leads us to something that looks complicated, but it's not. All right, it might be, but it's not that complicated. Rather than taking a small compound and trying to pull it away from a protein, what if we start with four states here? In general you can think about four things. The yellow square here, or diamond, will be my ligand. Here in this state I have the ligand in water. There's no protein here. Up here I have my ligand bound to my protein.
I can also take my protein, but without the ligand. The red hexagon here means that I've removed the ligand. Well, the atoms might be there, but I've removed the ligand so that it does not interact with anything. Think of that as the system without the ligand: the ligand doesn't exist there, but had it existed, it would have been in that position. Or you can think of a pure water box where you don't have the ligand either. And for all these things you can imagine how expensive it is to move between two states, right? Path A here would mean bringing my ligand close to the protein. That's an extremely difficult thing to calculate, because then we would be pulling and everything, so I want to avoid doing that. I'm sorry, I should have rotated this a bit, because I realize now that these four states don't correspond to each other. That should be transposed.
So one possible state here is that I have my protein and the ligand is bound to my protein. Now the thing is that I take my ligand and just gradually make all the atoms disappear. I know this is horrible. This is alchemy, right? You can't do that in chemistry. But you can do it in a computer. You have charges and you have Lennard-Jones parameters, so let's just gradually tune all these parameters down to zero. When they're all zero, there are no interactions left. It's a valid calculation. While we're doing it we're violating everything you can imagine in physics, but once we've done it, you just have your protein without the ligand. That is a well-defined state. If I can just calculate the free energy between these two states, that free energy will be a valid description of the change between them. I can also take just the ligand in water. That's also a well-defined state, and this is of course Delta G binding, right? How much does it cost to take the ligand and move it into the protein?
But that was very expensive to calculate. I can also take the ligand in water and gradually make this ligand disappear. I scale down the parameters the same way I did when it was in the protein. A dummy in water, that would mean just having water, right? But if we start with the protein-ligand complex and I move along these four arrows so that eventually I'm back with the protein-ligand complex, what is the sum of all these changes if I move one lap here? Zero. Why? Yes, and why does that imply that it has to be zero? In particular, free energy is a state variable, right? It only depends on the state, and if I am in the same state, by definition the free energy is the same. But that in principle means that the difference between Delta G3 and Delta G1 must be the same as the difference between Delta G bind and Delta G2, right? That's just simple mathematics. And Delta G2 is zero, because here I just have water and here I just have a protein. I don't need to do anything to move between these two. Well, this protein is in water too, but the ligand doesn't do anything when I move between those, so that is by definition zero. So the free energy of binding actually corresponds to the cost of gradually growing the ligand (not the dummy) in the binding site, minus the cost of gradually growing this ligand in water.
This looks really complicated. Why do we do it this way? What would be the advantage of doing this rather than pulling? No, if the simulation was infinitely long the result would be the same. So the only reason for doing this is if it is somehow more efficient. The problem with the pulling is that I was moving the entire system, right? When I was pulling the ligand out, I was shuffling everything around.
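Written out, the cycle argument above is just two lines of algebra. The slide itself is not part of this transcript, so the assignment of the labels to the arrows is an assumption based on the spoken description: Delta G1 is growing the ligand from a dummy in water, Delta G3 is growing it from a dummy in the binding site, and Delta G2 is moving the non-interacting dummy from water to the protein.

```latex
% Label assignment assumed from the spoken description, not taken from the slide.
% \Delta G_1: grow the ligand from a dummy in water
% \Delta G_3: grow the ligand from a dummy in the binding site
% \Delta G_2: move the non-interacting dummy from water to the protein (zero)
\begin{align}
\Delta G_1 + \Delta G_\text{bind} &= \Delta G_2 + \Delta G_3,
  \qquad \Delta G_2 = 0 \\
\Delta G_\text{bind} &= \Delta G_3 - \Delta G_1
\end{align}
```

Both sides of the first line go from the dummy in water to the bound complex, only along different routes, which is why they have to be equal for a state function.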
I was also pushing the water out, and I was creating heat. Typically this ligand is a tiny molecule; it might have 10 atoms or so. So here I put these 10 atoms in a small cavity and just gradually grow 10 atoms. It's a very small and very local change to the system, so it's not really going to perturb the system a whole lot. Same thing if I just grow these 10 atoms in water. That's going to be closely related to the solvation free energy, and it's also very cheap. In some cases people even just estimate this with the solvation energy. So the point is that both these simulations are actually quite cheap, while that simulation would be outrageously expensive. It's called a free energy cycle, and the whole point is that rather than doing the one expensive leg, we can do two cheap legs instead.
But there are some way cooler things you can do. In most cases you're not interested in the binding free energy itself, for a couple of reasons. Assume that you could do a simulation where your error was in the ballpark of 1%. That would be a good simulation. Assume that Delta G3 is a thousand and Delta G1 is 995. So this would be a thousand plus minus 10, and this would be 995 plus minus 10. So what is your free energy? It's roughly 5 plus minus 14; the error is going to be the square root of 2 multiplied by 10. So 5 plus minus 14. You don't even know what sign it is. So the problem, when you try to calculate absolute binding free energies and compare different compounds, is that you end up taking differences between two huge numbers. But that's bad, because you really want to know what the binding free energy is, right? Or what is it that you want to know? Take a step back. Why were we doing free energy calculations in the first place?
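The "5 plus minus 14" above is standard error propagation for a difference. The sketch below assumes the two errors are independent, so they add in quadrature; the function name is made up for illustration.

```python
import math


def difference_with_error(value_a, error_a, value_b, error_b):
    """Return (a - b) and its uncertainty, assuming independent errors."""
    difference = value_a - value_b
    error = math.sqrt(error_a ** 2 + error_b ** 2)
    return difference, error


# Numbers from the lecture: 1000 +/- 10 and 995 +/- 10 (1% errors).
delta_g_bind, delta_g_error = difference_with_error(1000.0, 10.0, 995.0, 10.0)
print(f"{delta_g_bind:.0f} +/- {delta_g_error:.0f}")
```

The difference is 5 while the uncertainty is the square root of 2 times 10, about 14, so not even the sign is determined.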
Well, yes and no, but there was even a step before that. If you want to measure how well it actually binds, then you need to know what the binding free energy is. But compare is the keyword. So what is it that we were trying to compare? Right: if I add, say, an ethyl group here, was this better or worse? You don't care whether this is 900 or 500. All you care about is whether adding the ethyl group improves or deteriorates binding. And by using the same kind of cycle you can do pretty much the same thing here. Rather than trying to calculate the raw binding, imagine that I have a receptor with a small mutation in it. Then I can look at the ligand bound to the normal receptor and the ligand bound to the mutant receptor, and down here I just have the receptor with and without the mutation. Here I don't care about the ligand in the water. What this cycle would give me is how much this mutation influences the binding. Or the same thing for this inhibitor: assume that you add a small ethyl or methyl group to it. That would be I prime. Then we would see how much better I prime binds compared to I. Now this starts to get complicated. The binding free energy is a Delta G, right?
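For the inhibitor case, the same cycle logic gives the change in binding free energy when I is turned into I prime: the cost of making the change in the binding site minus the cost of making the same change in water. The sketch below uses entirely made-up numbers and hypothetical modification names, only to show the bookkeeping.

```python
def relative_binding_free_energy(dg_change_bound, dg_change_water):
    """Change in binding free energy when I becomes I'.

    dg_change_bound: cost of changing I into I' inside the binding site.
    dg_change_water: cost of making the same change free in water.
    A negative result means I' is predicted to bind better than I.
    """
    return dg_change_bound - dg_change_water


# Hypothetical modifications: (cost when bound, cost in water), arbitrary units.
modifications = {
    "add_methyl": (-3.0, -1.0),
    "add_ethyl": (2.5, 0.5),
    "add_hydroxyl": (-6.0, -5.5),
}

changes = {
    name: relative_binding_free_energy(bound, water)
    for name, (bound, water) in modifications.items()
}
ranked = sorted(changes, key=changes.get)  # most improved binding first

for name in ranked:
    verdict = "improves" if changes[name] < 0 else "deteriorates"
    print(f"{name}: {changes[name]:+.1f} ({verdict} binding)")
```

Note that neither leg needs the absolute binding free energy, so the problem of subtracting two huge numbers does not appear.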
And that's a difference, but here we are looking at differences of differences, which is Delta Delta G. In 99% of the cases that's what you're interested in: how does the binding free energy change? And the binding free energy is itself a difference, so it is the difference in the difference, either when you make mutations in the protein or when you change your molecule. This is typically what, for instance, Bill Jorgensen and others do. They take one small molecule, or a couple of molecules, that have been slightly promising in the lab, at least the best ones we could find in the first iteration round. Then you could even have a small computer program design a thousand small random changes. Not really mutations; these are chemical alterations to your compound. It's not the protein, right? So let's change this small compound in a thousand different ways, and then let's calculate, for each of these 1000 different ways, based on the best possible pose we have in the receptor: does this improve or deteriorate binding? These simulations you typically run in much less than a day on one computer, so they're much, much cheaper than absolute free energies. One week later you're going to have a hit list that says, for each of these 1000 changes, whether you predict it to improve or deteriorate binding, and you pick the top 10 and synthesize those.
What usually happens then is that there are some rounds where you're really lucky: you correctly guessed how it bound and how it was placed, and all ten are gonna be better. Other rounds you might be unlucky, because the differences you introduced caused the molecule to bind in a slightly different way, or you don't know what, but something happened so that none of your changes were successful. Then you need to go back to the drawing board and see if you can do it better. And typically what you would then do is that you would go through a couple
of dozen iterations like this. As you're gradually getting better and better, you would hopefully eventually have a picomolar binder or something. I would still argue that this is the cutting edge in industry. The reason why we've started to do this more is that computers are fast enough that you can afford to do it. It's better than docking; the problem is that it's still something like 10,000 times more expensive. So what you're trying to do is create a funnel. At the start the funnel is very broad and you just screen as many things as you can, but as you gradually decrease the width of this tube, as we have fewer and fewer compounds, can we compensate for this by spending more computational time on each compound? That's where you could use the free energies. I'll skip this part about the distribution, because it's not that important.
So how do we calculate this in practice in a simulation? Well, that depends. One way is of course doing this pulling, but that's what we decided not to do here. So let's look at a very simple case: you have a state A and a state B. If these states are super similar, so similar that the compound is going to be bound in exactly the same way, you can just take one state, change the parameters to the other state and re-evaluate your trajectory or something. If the coordinates would essentially be the same, then the entropy is not going to change, and you don't need to rerun the simulation. Just take a simulation and ask: if I had these other parameters, these other charges or something, where I've added my ethyl group here, what would it look like instead?
This never works, but in theory you could do it. It would require you to have the same number of atoms and everything, so it's a very, very special case. But in theory, if there were no differences in phase space and no differences in entropy, you would not have to do a separate simulation. You could just recalculate with different parameters. The point is that only the enthalpy would be different; there is no change in the entropy. Then you're good. It's awesome, but it's a theoretical case.
The other thing that you could have, and I think you mentioned this, Lourines, is that you have a very small barrier or something, and there are frequent transitions between these states. Then you could just simulate it and count the populations, say if it binds in two different ways, or if it's some very floppy loop or something. If you're going to see a thousand transitions in a short simulation, just do the short simulation, calculate how much is in A and how much is in B, and then use the Boltzmann distribution, RT ln K, right? That works in some cases.
But the problem is that in general we're going to be in the third case here, where you have a large change in free energy and infrequent transitions. If the difference in free energy is large, say 10 or 15 kT, the difference is going to be so large that you never see the upper state, even if there is no barrier between your two states. Actually, in particular if there is no barrier between the two states. What would happen if it's 20 kT? Would it move back and forth?
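Both the population-counting recipe and the reason it fails for large differences follow from the Boltzmann distribution. This is a minimal sketch: the trajectory is a made-up list of state labels rather than real simulation output, and the function names are hypothetical.

```python
import math

GAS_CONSTANT_KJ_PER_MOL_K = 0.0083145


def free_energy_from_populations(states, temperature_k=300.0):
    """Delta G (A -> B) in kJ/mol from counts of frames labeled 'a' and 'b'."""
    n_a = states.count("a")
    n_b = states.count("b")
    if n_a == 0 or n_b == 0:
        raise ValueError("both states must be sampled to estimate a free energy")
    return -GAS_CONSTANT_KJ_PER_MOL_K * temperature_k * math.log(n_b / n_a)


def upper_state_fraction(delta_g_in_kt):
    """Equilibrium fraction of time in the upper of two states, Delta G in kT."""
    return 1.0 / (1.0 + math.exp(delta_g_in_kt))


# Made-up trajectory: 900 frames in state A, 100 frames in state B.
trajectory = ["a"] * 900 + ["b"] * 100
delta_g = free_energy_from_populations(trajectory)
print(f"Delta G (A -> B) = {delta_g:.2f} kJ/mol")

# With a 20 kT difference the upper state is essentially never visited.
print(f"fraction in upper state at 20 kT: {upper_state_fraction(20.0):.1e}")
```

A nine-to-one population split corresponds to only a couple of kT, which is easy to sample. At 20 kT the upper state would be occupied roughly two billionths of the time, so a plain simulation gives no usable counts.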
Right, so you would see after one nanosecond that it's 100 percent in the lower state. After one microsecond it's 100 percent in the lower state. After one millisecond it's 100 percent in the lower state. If you could simulate one second, you might start to see some noise, that it's occasionally in the higher state. So this only works if you can simulate long enough. What you call this in physics is ergodicity (I'm not going to ask about that on the test): that the time average is equivalent to the ensemble average. So we're going to need a way to fake it and calculate this explicitly, just as you did in the lab. Remember that mountain: if you never cross the peak, it doesn't help that you can determine free energies locally. We're going to need to find a way to force the system to cross the peak.
There is a beautiful word in physics, the Hamiltonian, that I haven't introduced. Forget about the word Hamiltonian here; this is really just our potential energy. I modify the potential energy in some way so that I can force the system across the peak of the mountain, and then I calculate. Well, now I'm calculating something fake, right? I calculated the thing on my forced system. But that's fine, because I don't care about the free energy along the path. I just care about the free energy in the start and end states, so what I do along the path is up to me. Any black magic or divine inspiration is fine here, as long as I can eventually calculate a good free energy difference.
I don't expect you to know these equations, but I have to have them to explain what I'm doing. So the Hamiltonian is the potential energy. This is all your bonds, angles, torsions, electrostatics and van der Waals interactions, and to make things simple, let's forget about the bonds and torsions. Let's just think about electrostatics and Lennard-Jones parameters. So this is just the charge and the Lennard-Jones parameters on each atom, right?
And then we have one set of charges in state A and another set of charges in state B. In state B we can say that we have removed all the charges on the ligand, so that the ligand is a dummy. It doesn't exist. In that case, if there is some parameter lambda that describes how I move between A and B, the free energy difference is going to be the integral from zero to one of dH/dlambda with respect to lambda. I'm not going to derive this for now. You can choose absolutely any coupling to lambda; it could be a sine or whatever function. What we almost always choose is to make it linear, so that when lambda is zero I'm in state A and when lambda is one I'm in state B: a completely linear change, which means that I gradually scale my charges in the simulation. There are some nicer ways to do this, but we're not really going to care about them.
So what you do in a simulation is slightly more complicated than what you did in the lab, but not a whole lot. You could gradually change lambda during the simulation: if I have a million steps, I would change lambda by one millionth at each step. But that leads to the same type of problem as when you were pulling, because you're always changing the system. It turns out that it's much better to just pick 10 points; it's usually sufficient with 10 points. The reason this works is the following. If lambda zero is one side of the mountain and lambda one is the other side, the simulation that runs at one end will never sample the state at the other end, and vice versa, because they are on different sides of the mountain. But when we gradually force them over, each simulation will sample a little bit of its neighbors. That means that I can calculate the relative free energies. I will skip that part.
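Written out, the relation described above looks roughly as follows. This is a sketch of the standard thermodynamic integration formula with linear coupling; the symbols $H_A$, $H_B$, $\lambda_i$ and the trapezoid sum over $N$ lambda points are notation chosen here for illustration, and the angle brackets denote an average over the simulation run at a fixed $\lambda$.

```latex
\begin{align}
H(\lambda) &= (1-\lambda)\,H_A + \lambda\,H_B,
\qquad
\frac{\partial H}{\partial \lambda} = H_B - H_A \\
\Delta G_{A \to B}
  &= \int_0^1 \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda} \mathrm{d}\lambda
  \;\approx\; \sum_{i=0}^{N-2} \frac{\lambda_{i+1}-\lambda_i}{2}
    \left(
      \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda_i}
      + \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda_{i+1}}
    \right)
\end{align}
```

With roughly 10 lambda points, each term in the sum comes from one independent simulation, which is why neighboring points need to overlap a little in what they sample.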
There are some tricks that we need to use to avoid atoms bumping into each other, but I'm not going to go into those. What you would do in a simulation here (and again, there are automatic tools to create this for you, so you typically don't do it manually) is describe the simulation with two topologies. Remember those topologies you had in the labs, right? One describes the A state, where I have all my charges, and then I have another B state where I've set all my charges and Lennard-Jones parameters to zero. Then I run these simulations. This dH/dlambda is something you get in the log file, and then I just integrate these numbers and, boom, I get the free energy difference.
This probably looks a bit complicated, right? It is complicated, because the mathematics is complicated. The beautiful thing is that in a simulation this is dirt simple. You use either tools or go to a web server and say: I would like to calculate the binding free energy, or the free energy of solvation for, say, cyclohexane. You will get all the topologies automatically, and you run this once in water and once in vacuum. The derivatives will already be in the energy files. Then you just run one GROMACS command and you get all these dH/dlambda values as a function of your lambdas, and you integrate these curves to get the differences. This is not just GROMACS; all simulation programs can do this automatically for you.
The neat thing for a free energy calculation like this is that I think we're talking about less than one hour of CPU time. So this is something we can do not just for one molecule, but for thousands of molecules. We can even use Amazon or something; you don't need a supercomputer for this. This works really well.
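The integration step described above can be sketched in a few lines. This is a minimal illustration, assuming you already have the average dH/dlambda from each lambda window; the numbers below are made up for the example and are not simulation output, and `integrate_dhdl` is a hypothetical helper, not a GROMACS function.

```python
def integrate_dhdl(lambdas, mean_dhdl):
    """Trapezoid-rule integral of <dH/dlambda> from the first to the last lambda."""
    if len(lambdas) != len(mean_dhdl) or len(lambdas) < 2:
        raise ValueError("need matching lists with at least two lambda points")
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * width * (mean_dhdl[i] + mean_dhdl[i + 1])
    return total


# Hypothetical window averages of dH/dlambda in kJ/mol (NOT real data).
# lambda = 0 is the fully interacting molecule, lambda = 1 the decoupled dummy.
lambdas = [i / 9 for i in range(10)]  # 10 evenly spaced points from 0 to 1
dhdl_water = [62.0, 48.5, 37.1, 28.0, 20.6, 14.9, 10.2, 6.4, 3.1, 0.8]
dhdl_vacuum = [41.0, 31.8, 24.0, 17.9, 13.0, 9.2, 6.1, 3.7, 1.8, 0.4]

dg_decouple_water = integrate_dhdl(lambdas, dhdl_water)
dg_decouple_vacuum = integrate_dhdl(lambdas, dhdl_vacuum)

# Thermodynamic cycle: moving the molecule from vacuum to water equals
# decoupling it in vacuum minus decoupling it in water.
dg_solvation = dg_decouple_vacuum - dg_decouple_water

print(f"decoupling in water:  {dg_decouple_water:.2f} kJ/mol")
print(f"decoupling in vacuum: {dg_decouple_vacuum:.2f} kJ/mol")
print(f"solvation estimate:   {dg_solvation:.2f} kJ/mol")
```

Running the calculation once in water and once in vacuum, as in the lecture, is what makes the subtraction at the end possible: the decoupled dummy is the same in both environments, so the two legs close the cycle.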
There are slightly more advanced techniques to do this, but this is really how you calculate free energies in modern simulation codes. The point here is not that I expect you to be able to read or write these equations, but you should know that the mathematics behind it is hard, while doing this in practice is easy. Very easy.
Hmm, that's a good question. That's the cheating, essentially, in what I'm doing here. The reason why you generate heat when you're just pulling the molecule out is that there are very large barriers, and in that case the barriers correspond to bumping into the water and everything. So what I'm doing here is that I'm not literally crossing the peak of the mountain; I'm digging a tunnel under the mountain. If I'm going to pull the molecule out from the protein, I might need to force the entire protein to open up or something, and that's a very expensive process. But by making the molecule gradually disappear inside the protein, and then making it gradually disappear outside the protein, I never have to cross that part where I'm pulling the molecule out through the narrowest funnel. That corresponds exactly to digging a tunnel under that mountain you had in the lab. Which is of course cheating, but if all you're interested in knowing is how high it is on the left side versus how high it is on the right side, that's enough. It's not going to be enough to measure the peak of the mountain.
The reason I'm saying this is that there is a surprising number of simulations in the literature where people show that they can use large-scale simulations to capture the binding itself. Now suppose you're in a pharmaceutical company.
You're not interested in that. Well, it might be fun to know a little bit about how high the barriers are and everything, but to a first approximation you just want to know what the binding energy is. The free energy calculations are much better than brute force, trying to simulate the entire binding and unbinding.
This works great if you have small compounds that are binding, but there's another very simple way to estimate free energies, and that's something called the potential of mean force. I'm not sure whether you introduced this in the lab. You might have, but you probably didn't think about it in this way. So what is a force? The definition of a force is just that it's a derivative of a potential. Well, it should be the negative derivative, but that doesn't matter here. So force is the change in the potential per unit length. But averaged over an entire long simulation (this is actually a fairly fun proof, but it's hard), the average force in a state, where the instantaneous force is sometimes positive and sometimes negative even within the same state, is actually the derivative of the free energy with respect to the coordinates. The reason this gives the free energy is that on average in the simulation we also get the entropy effects in here, right? If you just pick frame 14, that force is going to be the derivative of the potential energy in exactly those positions. But if you look at the molecule in the state where it's bound, for instance, then on average, if it doesn't move, the average force is going to be zero, because it's happy there. If it would on average like to move slightly to the left, well, in that case the force would be pointing slightly to the left, right?
This is a very deep statement, that averages of forces in simulations correspond to free energies, which I'm not going to prove. So in theory, I run a set of simulations, say 10 different simulations at different values of some coordinate, and I calculate the average force in each of them. If I integrate that average force, I get the free energy along that coordinate, and that's what you used in the umbrella sampling. We didn't call it potential of mean force, did we? There's one minor challenge: we're going to need to know what to apply the forces between, because a force can never act on just one particle. Forces are always pairwise, but I'll forget about that for now.
So this is called a PMF, or potential of mean force. If I calculate the average force (and each simulation might need 5 or 10 nanoseconds) and then integrate that force, I get a free energy that really corresponds to the average of the force. That's why people call it the potential of mean force. It's not really more advanced or complicated than that. Remember that free energy is intimately related to doing work on the system, right, and doing work corresponds to forces. So here I'm using that concept again: if I integrate the average force, that is the energy. But the simulations have to be long enough that this really is the average force in that state over an entire simulation.
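To make the "integrate the average force" step concrete, here is a minimal sketch. It assumes you have one average force per restrained position and uses the sign convention that the force is the negative derivative of the free energy; the depths and force values are invented for illustration, and `pmf_from_mean_force` is a hypothetical helper name.

```python
def pmf_from_mean_force(positions, mean_forces):
    """Cumulative trapezoid integration of G(x) = -integral of <F> dx.

    The profile is defined relative to the first position, where G = 0.
    """
    if len(positions) != len(mean_forces) or len(positions) < 2:
        raise ValueError("need matching lists with at least two positions")
    profile = [0.0]
    for i in range(len(positions) - 1):
        step = positions[i + 1] - positions[i]
        average_force = 0.5 * (mean_forces[i] + mean_forces[i + 1])
        profile.append(profile[-1] - average_force * step)
    return profile


# Hypothetical mean forces (kJ/mol/nm) on a hydrophilic molecule held at
# different depths z (nm), going from bulk water (z = 3) towards the
# membrane center (z = 0). Values are made up, not simulation results.
z = [3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0]
mean_force = [0.0, 1.0, 8.0, 20.0, 25.0, 12.0, 0.0]

for depth, g in zip(z, pmf_from_mean_force(z, mean_force)):
    print(f"z = {depth:3.1f} nm   G = {g:6.2f} kJ/mol")
```

Because only differences matter, the choice of zero (here, bulk water) is arbitrary. Each mean force must itself be a converged average, which is why every window may need several nanoseconds.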
I think this is going to be easier when I show an example. There are a bunch of different ways to do this, and I'm not going to go into too much detail. But in principle you can have one molecule here and one molecule there, and then some sort of spring between them. Say these are two different ions. If I force them to be at different distances from each other and calculate the average force at each distance, I get a curve that describes the average potential as a function of the distance between these two ions, from when they are infinitely far from each other to when they get closer and closer. So this is relatively similar to umbrella sampling.
By far the easiest way to look at this is to think about a membrane. Let's say I have a small ion or an amino acid side chain. I take this side chain and force it to be either out in the water or at different depths in this protein, sorry, in this membrane. Out in the water the force is going to be roughly zero. If this is a very hydrophilic compound, then when I start pushing it into the membrane, the membrane is going to try to push the molecule out, right? But I can still force it to stay in there with an umbrella potential, and I just measure how large the force on it is. What I then get back is the free energy: out in the water it would be roughly zero, somewhere in here it would be very positive because it's bad, and then it would go down again. So here I get a curve along one dimension that shows how expensive it is to insert something in a membrane. This works really beautifully when you have a simple reaction coordinate. The reaction coordinate in this case would be the z coordinate, right?
It's not as obvious if you had something like a ligand bound to a protein. This would correspond to pulling the ligand out of the protein, which would be bad. We're not going to do that. Let's see, I thought I had some examples here. Oh, yes, sorry. You know what? I'll jump back to the other slides after the break.
I actually had a talented student do this a couple of years ago. Remember that problem I showed you with the apparent hydrophobicity scales? It was a bit strange how things inserted in membranes. We could measure that biologically: we measured the probability for different helices to insert through the translocon or not, and the free energies were much lower than we expected from physics. So Anna, who was a PhD student here some 10 years ago, set up a bunch of very simple systems, and then she forced amino acid analogs (that is, just the side chain part of an amino acid) to be at different depths, at some hundred points or so across these very small systems. And then she measured the force required to keep the analog at each position.
Sorry, this is a small plot. All these black and white points correspond to measured forces, positive or negative, and when you integrate those forces you get these blue and red curves. The only difference is that one of them has been symmetrized, because the membrane should be symmetric. So you see that arginine is very expensive to insert in the middle of the membrane, alanine is cheap, glutamine is somewhat in between, and histidine is relatively expensive too. So she could measure, with physics, the cost of inserting these in pure bilayers with just lipids. At that point you can start comparing these two scales. What is the in vivo hydrophobicity scale?
Versus what is the physics-based hydrophobicity scale? And, sadly or interestingly depending on your point of view, physics is still true when you do it in a computer simulation, which is good for physics, but it meant we couldn't really explain the experimental results. So what Anna then went on to show is that what really explains this is that in real membranes you have lots of helices and everything. In particular, right next to this translocon, if you put an amino acid (sorry, the analog) in the membrane very close to the translocon, all these helices will stabilize it so much that you get beautiful free energy curves that are much lower, and they agree roughly with the in vivo hydrophobicity.
And this is later work, which I don't think we ever published, actually. It's just an example that you can even calculate how expensive it is to pull different types of amino acids through the inside of the translocon, and show what happens on the very inside of the small funnel, or pore, here. This is just to give you a taste of what you can do with potentials of mean force.
I will spend another five minutes going through the errors, because then we will go back to the research after the break. Sorry, these are slides that I should have had in a different order. So the problem with all these simulations is that, as beautiful as they are, computers can only give you numbers. There's this famous quote by Pablo Picasso that computers are useless: they can only give you answers. And that is sadly the problem here. So you're sitting and doing this computationally, and your result is minus 23 kilojoules per mole. Is that good? What is the quality of that simulation? Minus 23 is just a number. We have absolutely no idea how good or bad that simulation was. This might have been a simulation that's way too short.
It might have been a simulation where the protein unfolded. At some point you're going to need to assess this. Basically, you're going to have a major presentation to the CEO of the company next week, and based on your presentation they're going to decide whether to go ahead and synthesize compounds for 10 million dollars. You're basically putting your job on the line here, right? If you're going to recommend doing this, you should have some pretty darn good confidence that this is a good binder that is worth taking to market. "It might or might not be" is not good enough. And this is sadly one thing that's not really specific to free energies, but that people miss a lot: we are really bad at assessing accuracy, not just in simulations. It's just as bad in experiments.
So there are a couple of things you can do in simulations, or actually in anything that has to do with free energy. If it takes five kilojoules to go from state A to state B, how much does it take to go from state B to state A? Physics is reversible. Here we're removing the molecule and here we're adding it; by definition, these two should give the same number with opposite signs. No matter how you do your simulation, you should always be able to do the opposite simulation, and you should just have a difference in sign. But when you do this, you're not going to get the same result with just a difference in sign. Let's say I have two states A and B, and B really should be five kilojoules per mole lower. When I move from state A to state B, I only get a difference of three kilojoules per mole. It's the same thing I brought up earlier today. Let's see if we have a pen here. No. Oh, I'll do hand waving.
I'm pretty good at that. So if I have a state A here at zero and a state B that's at minus five, then when I move there, although I would hope to get minus five, in practice I'm not going to get that. I'm always going to get something that's higher, so minus three. Where did the two extra kilojoules per mole go? Heat, yes. Because there is some friction, any finite simulation will generate heat. But that's fine. So now I've measured minus three, and then I go back to A, where I would expect a difference of plus five. What will frequently happen is that I get heat in that direction too, right? The true difference is five, but I lose two in each direction, so I get minus three in one direction and plus seven in the other. This is called hysteresis. In theory I should be back at the starting point, but when I go back and forth, since I've generated heat in both directions, I do not get back to the starting point.
Can you use this in some smart way? How could you factor it in? No, that's hard, because you don't know how much was heat or friction from just moving the system around versus anything else. But when do you expect to generate the most friction, when you're moving forward or backward? Is there something physically special about forward? The laws of physics don't have a sign, right? So it's not more likely for something to happen in one direction than in the other. Well, for a very specific system there might be a difference, but to a first approximation we don't know. And if we don't know, let's just assume that we generate as much heat going backward as going forward. If half the error was in the forward direction and half in the backward direction, then I can just average these two, right?
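The forward and backward numbers from this example can be combined as in the sketch below. It rests on the same assumption the lecture makes, namely that equal amounts of heat are generated in each direction, which is a guess and not something the data can confirm; the function name is hypothetical.

```python
def combine_forward_backward(dg_forward, dg_backward):
    """Estimate dG(A -> B) and the hysteresis from a forward and a backward run.

    dg_forward  : measured free energy change going A -> B
    dg_backward : measured free energy change going B -> A
    Assumes the dissipated heat is the same in both directions.
    """
    hysteresis = dg_forward + dg_backward        # would be zero if reversible
    estimate = 0.5 * (dg_forward - dg_backward)  # heat cancels in the average
    return estimate, hysteresis


# Numbers from the lecture example, in kJ/mol: minus three forward, plus seven back.
estimate, hysteresis = combine_forward_backward(-3.0, 7.0)
print(f"estimated dG(A->B) = {estimate} kJ/mol, hysteresis = {hysteresis} kJ/mol")
```

With minus three forward and plus seven backward, the round trip ends four kilojoules per mole above where it started, and splitting that evenly recovers the true difference of minus five.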
The reasoning is this: if I started at A, went to B, and then went back to A, and it's now four kilojoules per mole higher, let's just assume that it was two kilojoules per mole of heat in each direction. Then I would have roughly the right answer. That's cheating a bit, of course, because we don't know that that's the case. But you can estimate the hysteresis this way and hopefully get rid of a bit of the heat. You can also see that the slower you do your simulation, the lower the hysteresis should get, so if you're lucky you should be able to extrapolate it. And this really comes back to the fact that any simulation or experiment will always contain errors.
So what is an error? Do you know the difference between standard errors and standard deviations? This is where I would need a pen. I would suggest that I take a break here. What time is it now? It's 10:39. Should we meet here at 11? Then I'll go and get a pen and bring up this definition. This is something you're going to have lots of use for, not just in simulations but in experiments too. And after that I'll have time to talk a little bit about our research.
All right, let's get started again. I talked before the break about estimating errors. This is something I'm going to need to do on the whiteboard, and let's see whether we're really lucky and the recording gets the whiteboard; otherwise not. Do you know what a standard error is, or sorry, a standard deviation? So first, what is an expectation? Do you know what an expectation value is? Normally in mathematics, or mathematical statistics, you talk about a random process, and a random process is determined by something called the distribution function, which is just a mathematically fancy way of describing, for instance, what happens if you're drawing random numbers. This axis is some sort of frequency, that is, how frequently I get a number, and this axis is just which number I'm talking about.
For argument's sake, let's say this is 10. So the most common number would be 10 here, but I can also get numbers that are smaller or numbers that are larger. The reason why I draw this as a Gaussian is that any time you add up a lot of random numbers, the sum will eventually approach a Gaussian. The expectation value is the true value that actually describes this distribution function. That's a mathematical quantity; if you write a program or something, you can define it. If this is, say, the binding energy, this is the true binding energy. You don't know what the true binding energy is. For the expectation value you occasionally use a capital E, or occasionally an x with a bar over it. This is the true value: 10 exactly. Now, no matter how many experiments or simulations you do, you will never get exactly 10, right? So how could you try to estimate the expectation value? That's easy. Yes. A die is a bad example if this is a normal distribution, but you just take lots of random samples and calculate the average, and the average you typically denote in angle brackets like that.
So if I just draw one number, I might get eight. With one number, the average is eight. It's not an awesome estimate of the expectation value. If I have a thousand numbers, the average might be 10.00892: a pretty good approximation. So the difference is that the expectation value is the true value, and the average is my current estimate of the expectation value. But remember that this estimate can be really good or really bad. How good it is depends on how many samples you take, and this you probably all know instinctively, right? If you take more samples, you're going to get a better estimate. But what decides which values you can get? On the one hand, the expectation value is 10, right?
But for one type of process you might only get values between 9.99 and 10.01, so a very narrow range, while for another process you might frequently see values of 5 or 15. This is what you describe with the width of the distribution, that is, how wide it is here, or actually how wide half of it is. You typically describe this with a parameter called sigma, which is the standard deviation. Sorry, just to be clear, I'll erase the second arrow there, because sigma is half of that. So sigma tells you, if you pick one random sample, what the expected spread is.
It is very common to see in the medical literature a value reported as 10 plus minus 1. The one should really be the width here, sigma, and that's true, but 10 plus minus 1 does not mean that all your values will be between 9 and 11. If you draw plus minus one sigma here, and look at the area here, do you know roughly what the probability is of falling within one standard deviation of the expectation value? 68 percent. This makes it somewhat scary how many people, MDs in particular, think it means you will never ever get a value outside 9 to 11. Yes, you will, in about one third of the cases. 95 percent is also a good answer, but 95 percent measures something else: it corresponds to two sigma. So if you go two sigma out, that entire range would be 95 percent. And that pen is wearing out. So the probability of falling within two sigma for a single sample is 95 percent, but that still means that one value in 20 falls outside.
There have been some remarkable failures in the scientific literature when people forget this. There's a famous xkcd about this too, where people test something, whatever, a genetic trait or something. There was a paper at KI about this recently. You test something and then you find the result is negative.
Okay, that's bad. Then you take the next thing, and you find that the result is negative: no correlation. Okay. You test the third thing, and decide that there's no correlation. By the time you've tested roughly 20 things, you will have found one correlation. Or rather, you will have found a false correlation, right? Because if this is 95 percent, roughly one time out of 20 you will see a result that falls outside this range, and that's completely normal. So there was a paper from Karolinska Institute about a year ago where they argued that in some cases acquired properties could be inherited genetically. Very fun, but it did not hold up. Was it from the grandmother to the son that it worked? I think that was the only correlation they found; all the other correlations were not statistically significant. The problem is that they had done exactly this: they just kept looking for correlations. If you look at enough properties, sooner or later you're going to find a spurious correlation, and that does not mean that it's a result. It was very fun, because Professor Olle Häggström, who is a professor of mathematical statistics in Gothenburg, completely debunked the paper and asked that it be pulled. They still haven't pulled it, but it's a completely bogus result that had nothing to do with reality.
So why do I keep talking about these standard deviations? Well, the problem is that you never know what x is, the expectation value. This is the real binding energy, right? So what you want to know is: how close is my current estimate to x? And this is what you want to report. You want to say that I think my value is 10.842 plus minus 2. That means that, based on my measurements, I currently think the expectation value should be 10.8 plus minus 2.12, or 2.1 perhaps. Somewhere in that range I expect it to be. Now this is a problem, because this is not the same as the standard deviation.
The standard deviation is about a single sample: I take one sample, and it might be roughly between 9 and 11. The second sample I take is going to have what distribution? Exactly the same distribution, right? The third is going to have exactly the same distribution. The 9,946,843rd value I draw is still going to have exactly the same distribution. So the average value and the standard deviation I get for a single value will always be the same; that's the property of one value I draw. But this is different, because here I want to say how well I estimate the expectation value. And this value is called something else: it is typically labeled s, for the standard error. The difference is that the more samples I draw, the better I should be at estimating the expectation value, right? So by the time you've taken an infinite number of samples, s should go to zero. Sigma is constant; sigma will never change. What you can show is that if all your samples are independent, s is going to be sigma divided by the square root of the number of samples. This is not entirely trivial to show, and I'm not going to bother you with doing it. The reason we're showing this is to get you to understand the difference between sigma, which is a constant, a property of the distribution, and the standard error. Whenever you're reporting anything, a binding energy or something, you want to report it as accurately as you can, and therefore you want to get the standard error as small as possible. So sigma is the standard deviation and s is the standard error. And sigma is very easy to calculate if I just keep drawing samples, right?
I can count. The standard deviation I can simply get from the normal statistical formulas: I basically take the square of the distance to the mean, sum that up, divide by the number of samples, and take the square root. The only reason we're bringing this up is for you to understand the difference between standard error and standard deviation. So then we'll go back to the slides here.
So what I really want to do is minimize the standard error. Or actually, no, you don't want to minimize the standard error at all. What is the first thing you want to do? You want to know what the standard error is, right? You could argue that it's not the end of the world if your simulation was bad, as long as you know that the binding energy you estimated is bad. If it's minus 5 plus minus 2000, you're not going to suggest to the CEO that you spend a billion dollars on this project, right? But if you know that your binding energy is minus 50 plus minus 0.2, that's a different story. The whole point is that being good, but not knowing that you're good, doesn't help. It's frequently much more useful to be decent and know that you're decent. So know your errors. This has nothing to do with simulations as such; it is just as true in experiments.
The only problem here is that these numbers describe the beautiful theoretical case where every sample is independent of every other sample. If you have a simulation, there are two femtoseconds between the frames. Of course they're not going to be independent, right? And it's the same in an experiment. If you're measuring something twice, are those independent measurements? Probably not, because you're measuring on the same system, and it depends on what the relaxation time is. You might need to wait 30 minutes for the next measurement for it to be really independent.
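The difference between standard deviation and standard error drawn on the whiteboard can be illustrated with a short numerical sketch. It draws independent Gaussian samples with a true value of 10 and a true sigma of 1 (both chosen here for illustration) and uses the "divide by the number of samples" formula described above; the function name and the random seed are arbitrary.

```python
import math
import random
import statistics


def mean_sigma_and_standard_error(samples):
    """Return (mean, standard deviation, standard error) for independent samples."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)  # sqrt(sum((x - mean)^2) / n)
    return mean, sigma, sigma / math.sqrt(n)


rng = random.Random(2024)  # fixed seed so the run is repeatable
for n in (10, 1000, 100000):
    samples = [rng.gauss(10.0, 1.0) for _ in range(n)]
    mean, sigma, s = mean_sigma_and_standard_error(samples)
    print(f"N = {n:6d}   mean = {mean:8.4f}   sigma = {sigma:.3f}   s = {s:.4f}")
```

Across the three sample sizes sigma should stay close to 1 while s keeps shrinking, which is the whole distinction: sigma is a property of the distribution, s is a property of how well the mean has been estimated. The formula only holds for independent samples, which is the caveat raised above.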
It depends. There are two ways to handle this, which you do in simulations, but they really have nothing to do with simulations as such. First, you can split your data into blocks. Say you have a simulation with 1 million frames, and let's assume that it covers a millisecond. From one nanosecond to the next nanosecond we might have correlation. But if I group the frames into blocks of 1000, each block corresponds to one microsecond of simulation time. To a first approximation, I might say that microsecond two is independent of microsecond one. So I don't have a million independent samples, but at least I have 1000 independent samples. This is something you might have done in bioinformatics too. It's typically called jackknifing or something. And then you can use that as the number of samples here, so that would be the square root of 1000.
Does that tell you something? Forget about the formulas for now. What's the problem with that square root? Assume that your simulation is not good enough. You might have a binding energy of minus 10 plus minus 10. So it's not hopeless, but the standard error is so large that you can't really be sure about the sign, and you would like to reduce it by, say, a factor of two. How much more data would you need to reduce the standard error by a factor of two? Four times. And if you would like to reduce it by a factor of 10, you're going to need a factor of 100 more data. This has nothing to do with simulations. This is statistics; it could be measurements, anything.
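The blocking idea can be sketched as follows. Since no real trajectory is available here, the example generates an artificial correlated time series in which each value remembers 90 percent of the previous one; that model, the seed, and all function names are assumptions made for illustration only.

```python
import math
import random
import statistics


def naive_standard_error(data):
    """Standard error if every frame is (wrongly) treated as independent."""
    return statistics.pstdev(data) / math.sqrt(len(data))


def block_standard_error(data, n_blocks):
    """Standard error of the mean estimated from the scatter of block averages."""
    block_size = len(data) // n_blocks
    if n_blocks < 2 or block_size < 1:
        raise ValueError("need at least two non-empty blocks")
    block_means = [
        statistics.fmean(data[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    return statistics.stdev(block_means) / math.sqrt(n_blocks)


def correlated_series(n, memory=0.9, seed=7):
    """Artificial time series around 10 where each frame depends on the previous one."""
    rng = random.Random(seed)
    value = 0.0
    series = []
    for _ in range(n):
        value = memory * value + rng.gauss(0.0, 1.0)
        series.append(10.0 + value)
    return series


data = correlated_series(100_000)
print(f"naive standard error: {naive_standard_error(data):.4f}")
print(f"block standard error: {block_standard_error(data, 100):.4f}")
```

For correlated data the naive estimate is too optimistic, because it counts frames that carry almost no new information. The blocks only help if each block is much longer than the time the system needs to lose its memory, so it is worth repeating the estimate with a few different block sizes.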
This is the curse of all measurements: the amount of data that you need goes up as the square of the improvement you want to achieve. I have a friend, Seb Doniach, who's a professor at Stanford, and he used to say that he doesn't believe in all the statistics and everything. I'm probably paraphrasing him a bit here, but when he looks at data he wants to see right away that the result just shines out, that it's obvious. If he doesn't see it right away, he's not going to believe it just because you go through a lot of statistical processing and everything. Now this might sound horrible, in particular from Seb, who's a professor of statistics, or statistical physics actually. But the point here is that if your data is so weak that it's just on the boundary of whether you can see the signal or not, then for this result to actually be useful later you will likely need a factor 10 lower error, and then you're going to need a factor 100 more data. But if you have enough data that it's a really strong signal, you will likely see it right away. That doesn't mean that you should forget about statistical processing; statistical processing is very important to actually prove you're right, right? But the point is: don't forget your gut feeling. Being able to prove you're right is important in a paper, but to decide what to do, do that back-of-the-envelope calculation. Let's see, is this realistic? And if you realize after two minutes of pen-and-paper calculation that you're going to need a billion times more data, you don't go out and start simulating. That's the whole point.
You realize that this is never going to work, that you need to think of something different, and you've just saved yourself 10 years of work. The other thing that you can do: this blocking is a bit stupid, because I might actually have more than 1000 independent units of data here. That depends on how quickly these curves fall off, on the correlation. There are lots of mathematical ways to calculate this. You can calculate autocorrelation functions that describe how quickly whatever number we have loses its memory, and so how independent different samples are. The reason for showing you these equations is that you have a choice here. Either you can start going through these equations, doing curve fitting in Mathematica (and there is some bracket that I forgot there), making sure that you use at least two different fitting functions, because there are lots of free parameters. Or you just run one of these programs: both GROMACS and most other programs have a command that can do this automatically for you. So if you just have a time series of data, there are usually programs that can tell you what the expectation value and the standard error are. No, sorry: what the mean is, which is our approximation of the expectation value, and the standard error. Excel has this too. So don't worry too much about it in practice. But the one thing you need to understand is the difference between standard deviation and standard error, because that will help you a lot in life. And who knows, it might even save you from publishing some of the embarrassing papers that professors publish. No, but the reason I'm saying this: these are not assholes, right? This is hard, and it's so easy to make a mistake. Twenty years ago you used to have a statistician in every department who could help you. But the problem with computers now, in particular in this age of big data, is that suddenly you can have access to billions of genome sequences.
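As a rough illustration of the autocorrelation route mentioned above, the sketch below estimates how many frames it takes for a time series to lose its memory and corrects the standard error accordingly. It is not the algorithm of any particular program: the truncation rule (stop summing at the first non-positive autocorrelation) is one simple choice among several, the test data are synthetic, and all names are hypothetical.

```python
import math
import random


def statistical_inefficiency(series, max_lag=1000):
    """Estimate g = 1 + 2 * (sum of autocorrelations), stopping at the first non-positive value."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) / n
    g = 1.0
    for lag in range(1, max_lag + 1):
        # Normalized autocorrelation at this lag.
        cov = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)
        rho = cov / var
        if rho <= 0.0:
            break  # beyond this point we are mostly summing noise
        g += 2.0 * rho
    return g


# Synthetic correlated data: each value keeps 90% of the previous one.
rng = random.Random(1)
x, series = 0.0, []
for _ in range(20_000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    series.append(x)

n = len(series)
mean = sum(series) / n
std = math.sqrt(sum((v - mean) ** 2 for v in series) / (n - 1))

g = statistical_inefficiency(series)
n_eff = n / g                          # effective number of independent samples
std_error = std / math.sqrt(n_eff)     # corrected standard error of the mean
print(f"g = {g:.1f}, effective samples = {n_eff:.0f}, standard error = {std_error:.3f}")
```

For this synthetic process the exact answer is known (g = 19), so it doubles as a small positive control for the estimator.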
It's very tempting to just start doing your statistical analysis on it, but just be aware that it's so easy to make a mistake. So what should you do? Be serious. This is not a course in mathematics; this is not a 12-week course in mathematical statistics. I don't expect you to be experts on this. So what are you going to do when you work out in that company, or when you're PhD students here? Yes, but there are other things you can do. In real life, it's occasionally allowed to cheat. You ask somebody. You ask somebody who actually is a professor of mathematical statistics. Of course, you can't keep running to the professor of mathematical statistics every week, right? But when you have this awesome result that you're thinking of submitting to Nature, see if you know anybody who's really good at statistics, send the manuscript to that colleague, and ask them to have a look at it. Say: I'm not an expert, do you think this is right? And then explain what you did. And at that point they might facepalm.
No, you can't do that! And yes, it's a bit embarrassing, but trust me, it's a hell of a lot less embarrassing than having this exposed to the world and having to retract the paper. So the key thing with the statistics is to understand this first reality check: you need to understand when it's difficult enough that it's actually time to ask somebody who knows it better. Even I do that, and I have a PhD in statistical mechanics, but there are things in statistics where I don't trust my own judgment. There are lots of ways of doing this. The only point of doing all these curves and everything is that this is really the difference between saying that something is minus 15 kilojoules per mole, which is useless, and saying the same thing with a standard error, which is useful. Actually, minus 15 plus minus 10 is still fine, because minus 15 plus minus 10 still says that there's a fairly good probability that this is a better binder. Just minus 15 is pointless; you have no idea what the spread is. And to sum it up, you could also argue that this is kind of the whole point of doing free energy calculations, right? In docking it's virtually impossible to get these standard errors, but here you can actually start to say how accurate the binding energy is. There are tons of caveats to that too. And I think that pretty much sums up what I had to say about free energy.
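One rough way to put a number on "a fairly good probability" is sketched below. It rests on an assumption that is not from the lecture: that the error on the estimate is normally distributed with the standard error as its width. The function name `prob_negative` is hypothetical.

```python
import math


def prob_negative(estimate, std_error):
    """Probability that the true value is below zero, assuming a normal error model."""
    z = (0.0 - estimate) / std_error
    # Cumulative distribution function of the standard normal, via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# The two cases mentioned in the lecture, in kJ/mol.
p_15 = prob_negative(-15.0, 10.0)
p_10 = prob_negative(-10.0, 10.0)
print(f"-15 +/- 10: P(negative) = {p_15:.2f}")
print(f"-10 +/- 10: P(negative) = {p_10:.2f}")
```

Without the standard error there is nothing to feed into the second argument, which is exactly why a bare "minus 15" cannot be interpreted.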
I have a bunch of study questions for you there too. Since this is your last lecture, I'm not going to go through them the normal way. Tomorrow morning I have a meeting at KTH, but I have scheduled time on both, let's see, Wednesday and Thursday morning. I actually just scheduled an hour, but I will be here for three hours if we need it. My reason for scheduling one hour is that you know I will be around at 9 a.m. I forgot what room it is, but I'll let you know about that. And the idea then is that for as long as you have questions I'll be around, and you can keep asking. I will respond to absolutely anything you ask me, but I won't go through anything unprompted. I do have a set of summary slides that I use at this point of the course. Considering that I recorded all the lectures, and the summary slides are just slides taken from all the other lectures that I go through fairly quickly, I don't think they're particularly useful for you. But I can make a link to an old version of that if you want. The other thing that I figured I could talk about is how we're going to do the exam, and then if we have time I'll talk a little bit about our research. My idea with the exam is that I'm going to have two parts. The first part will be a bunch of very simple questions. Not multiple choice, but questions that you should be able to answer with at most one sentence, frequently just one word. And those questions are pretty much just going to be taken from all these study questions: the simple stuff that checks, do you know the course?
And you will pass the exam if you just have those right. You won't even need to have all of those right to pass the exam. Then the second part of the exam is going to be slightly longer questions, where you actually have to reason a bit and think a bit. Not things that are obviously just taken from the slides, but like some of the questions I've been asking you here in the mornings: say you work in a pharmaceutical company and you're tasked with doing something, how would you do it? Those are not going to be quite as obviously right or wrong; they test a bit how you can integrate things. And if you want to go for the highest grades, then you need to do those too. Possibly not in the easy questions, but in the more advanced questions, we will include some of the stuff from the labs too, so I'll make sure that Björn and Dari are around, probably both Wednesday and Thursday, to answer questions. Rather than you asking me questions now, I think I'm going to take a little bit of time to go very quickly through this. There are 20 slides, but I won't go through all of them in detail. This is not so much to make ads for our research, but to show how we and others use this type of thing in modern research projects. So this is just one rather arbitrary research project where we use free energy calculations a lot, and it has to do with these ligand-gated ion channels that I've shown you. I won't ask any questions specifically about our research on the exam, but again, the concepts all go back to the other stuff we have done in the course. So for me this is a great old slide by Sidney Harris, who used to do cartoons in Scientific American, and this is kind of the story of our research life, I feel. It's really a pain when you're in this no-man's-land between physics and chemistry, but it's also very fun, because if you want to do research, in the long run it's more fun to do research on the white spots on the map. So this started as an
old project some 15 years ago when I was a postdoc at Stanford, and I had this crazy, literally crazy anesthesiologist show up, and he wanted to model this ligand-gated ion channel. Yeah, apart from the fact that there was absolutely nothing to start from. There wasn't even a structure available; one wasn't published until five years later. We had some rough ideas about the alignments, and he had been auditing a course in bioinformatics where I was a TA. So it was one of those things where you think it's crazy, it's never going to work, but he was insistent that we keep working together, and this developed into a great research collaboration and friendship. These channels are fun because they're so intimately related to lots of human culture. You know what this is? This is ethanol, from Babylonia or something, I guess. Ethanol is the oldest drug known to mankind; we've used and abused it for 4,000 years. Nicotine is another, slightly more modern drug, but it's also fairly ancient. Benzodiazepines are very new drugs. They all hit the same receptors. At one point I gave this lecture in a monastery on the topic of sex and drugs and rock and roll. Propofol is one of the most widely used anesthetics in the world. It's cheap and efficient, and you can recognize this milk-white compound because it's an emulsion: the drug is lipid-soluble, so it has to be dissolved in lipids. And that goes back to these first instances of anesthesia, which was obviously not propofol but ether, and Edward Abbott. So both our interest and Ed's in this case was in understanding what happens in these ligand-gated channels. And again, we knew quite a lot about them from a functional point of view, but there were really no structures available. And the other thing, and I'm not sure whether I have used this slide before, the other strange thing that was also known experimentally is that they are so promiscuous: they can either hyperpolarize or depolarize the membrane. They
can conduct pretty much any type of ion. There are just two amino acids you change down here, and one amino acid you remove, and then you can turn an anionic channel into a cationic one. In particular, we were very interested in these anesthetics that I spoke about before. That has to do with the Meyer-Overton hypothesis, and with the idea that they're not just going into the membrane but that there actually are specific binding sites in these channels. This is very much related to drug design, but it's not the plain type of drug design that we mostly spoke about today. Last week was the plain one, right? We have a drug that binds to the primary binding site, like the ligand. This is rather a drug that acts allosterically: if this drug is bound, it amplifies the effect when you bind the real ligand, or agonist. Kind of like a transistor. When we first started out, we knew almost nothing about these channels, and we spent a lot of time building models and everything. The cool thing is that over the last few years there is now a whole range of these different structures. This was August last year: in one issue of Nature there were three new structures, and they've kept coming since then. Even since then, there is now a cryo-EM structure, cryo-EM like the facility, of the glycine receptor, which is really cool. And we're going to see the heteromeric GABA receptor in a couple of years. We're trying to get it too; we'll see who succeeds first. But what you can do with simulations, even with very simple and plain homology models based on the bacterial channels that we started out with, is that over the course of roughly a microsecond or so you get beautiful results. Here I'm just showing one sphere, but this is actually an entire ethanol molecule, and we start from red and go to blue. It moves in here through the membrane, it reaches the binding site here, where it spends several hundred nanoseconds, and then eventually it can diffuse out again. So this is just one
example. We were very happy with this binding site, because it was right next to some of the functionally very important residues. So we published this, and we were super happy that we had identified these blue binding sites in particular. We showed that this interacted perfectly with the serine 267 residues in particular, which are so important: if you knock out that residue, it's no longer going to bind ethanol. This was a fairly simple project, but it's also typical in the sense that a neat way of using simulations is to understand something conceptually. We knew roughly what it did functionally, but you have no idea exactly how this works. And in particular, if you want to start mutating this binding site, there are probably at least 20 residues around it, and if you just start mutating them randomly, well, it's going to be pretty hard to predict whether that's going to increase or decrease the strength of binding, or if it's rather going to have an effect on the structure of the protein. So it's very common, both for us and for others, to use relatively short simulations simply to understand binding. It's again kind of the other thing I mentioned in a drug design company: just sitting and looking at the screen. How would this bind? And if you would like to create a better drug here, could you get something that also interacts with this residue? Create a small extension of the drug that's slightly larger, basically try to fill out the binding site, and do this three-year-old kid's puzzle to get something that binds as well as possible. From our point of view, we were very happy with this, because this blue site sits right between the subunits. And for this particular channel, well, not the prokaryotic one but the glycine receptor, which this is, we predicted that if you bind the molecule here, it's going to push the subunits apart, which would open the channel a bit. Which is what you see
experimentally: it opens more easily when you have these ligands bound. We were super happy. But sometimes your closest friends... keep your friends close, but your enemies closer. Now, Pierre-Jean and Marc are very close colleagues, and I worked with Marc for several years actually. This is a co-crystal, published roughly a year, or half a year or so, after our work, where they had managed to take the prokaryotic protein GLIC and co-crystallize it with, let's see, I'm fairly sure that's propofol. Yes, that must be propofol. Propofol here was bound entirely inside each subunit. And this would have been awesome, had it not been that my team had just bet our entire bank, and a significant part of our careers, on having binding up there. I've been wrong lots of times. Being wrong is not a problem. But what really troubled me with this is that it's one thing to be wrong, and another thing to be wrong when you don't expect to be wrong. Because this was one of these cases where we had beautiful expectation values and small standard errors. It's not just that you have one simulation where you see that it might be here; that doesn't tell you anything. We had very strong quantitative data, so we really expected it to bind there, and we just couldn't explain what happened. Yes, I still have that joke: at some point I hear that at this point it's time to stop simulating and rather do experiments. But to quote another great scientist, Donald Rumsfeld: there are known unknowns and unknown unknowns, right? And in hindsight there were a bunch of mistakes we made in this study. The first one is that there were too many variables that we could not control. This is a homology model, and we have no idea how good that homology model is. The other problem is that GLIC is a channel that is inhibited by ethanol, while the glycine receptor is one that's potentiated by ethanol. So the problem is that this is the structure of GLIC, but we were not really modeling GLIC. We were
modeling the glycine receptor. So at this conference, he gave his talk after mine, and he was very clear. He said: well, in theory there could even be two binding sites. And at that point I felt, well, yes, he's just trying to be nice. We had a glass of wine together afterwards anyway. But the take-home message here is that when there are too many unknowns in the equation, you have no idea where you went wrong. So the first thing we decided was: you know what, let's stop doing this. All the functional data that we and everybody else had was primarily on the glycine receptor. So we teamed up with a great postdoc from Austin who was studying GLIC directly, and Reba is actually coming here; she's going to start working here in June. So at that point we decided to start doing electrophysiology on the bacterial channels, which had pretty much been neglected until then. I'm not sure how much I've talked to you about this, but it is very simple. You order frog eggs from Germany; they cost roughly one euro cent apiece. We get them in a package, and then you just inject some DNA into them. You might see these very narrow glass pipettes here, right there, maybe 10 to 20 micrometers at the very front. Then you inject, say, two nanoliters of DNA, and then you put the egg in an incubator, which is just a fridge, for a couple of days and let it incubate at 12 degrees. After those three days, 99.9 percent of all the membrane proteins on the surface of this small frog egg cell are going to be my membrane protein. And when you've done this, we can measure it, again with a similar setup. We have these two pipettes: we use one of them to adjust the voltage, and with the other one we measure the current. And then you get very simple curves like this. This is a pH-gated channel, so at pH 5.5 it opens. Then we try it again: at pH 5.5 it opens. And this is pH 5.5 where I also have methanol added, and then it opens more strongly. And then I go back to not having the methanol, and then it
opens more weakly. So here I can really show that methanol potentiates the channel. And the funny thing that we can show is that methanol potentiates it strongly, ethanol potentiates it just a little bit, but the second you are at propanol or longer, you inhibit the channel: you turn it off instead. What is the difference between ethanol and propanol? It's just one extra CH2 group. They are all aliphatic alcohols; it's exactly the same series of compounds. You hardly ever see something like this. It's also beautiful that the longer the alcohol gets, the more it inhibits: from propanol there and all the way down the series, a stronger and stronger effect. So there is something strange happening that depends on just the length of the chain. It's not just that the size of the effect changes; you literally switch the sign of the effect. And we can make some mutations in this channel, in particular at one residue, called the number 14 residue, in one of the small helices facing the pore, where we can increase this effect. This is 400%; I think we even have one that's 800%. So it appears to be super sensitive when I replace the phenylalanine. Let's see, yes, there's a phenylalanine far down in one of these binding sites. What we were actually able to show, both in experiments and simulations, is that there really are two binding sites. So the cool thing, and this is based on a computer model: in the wild type you have this large red binding site. This is a bacterial channel, and by default that would be the only binding site you have in a bacterial channel. But when I do these mutations... these were not random mutations, of course. I take the bacterial channel and mutate it to look more like the glycine receptor, more like the human channel. And when you do this, you open up a second large cavity, because an alanine or cysteine is much smaller than a phenylalanine. So now we open a second binding site. So what
likely happens is that the very short alcohols can fit in this blue cavity, and binding in that cavity opens the channel. But as you make the alcohols larger and larger, they can't fit here, but they will fit here, and they will start to bind here instead. But in this case, you can fit much larger alcohols in the blue cavity. And you can actually show this with simulations. I'm not going to go through the details, but you can show that if we simulate the wild-type GLIC, we get exactly the same binding as we do in the experiment. Which is a bit of the curse of simulation: once you already know what the experimental result is, people in simulations are really good at proving that that is what they get in simulations too. Sorry. But what we can then show is that if we do this mutation, again still in the bacterial channel, then we get binding in both sites. And this is exactly what we see: with these mutations, we make the bacterial channels behave like the human channels, and then they are actually potentiated by alcohols all the way up to hexanol. And just as much as I was muttering about the experimentalists in December 2010 or so, six months later we decided that we love experimentalists, because there were Ryan Hibbs and Eric Gouaux, who determined the structure of a much, much more advanced channel, a eukaryotic channel. It's essentially human. Well, okay, worm, C. elegans, but it is a eukaryote. This channel has a similar type of pharmacology; it behaves similarly to the human channels. And the cool thing is that in this X-ray structure you actually see an antiparasitic agent, ivermectin, bound right between two subunits. Ivermectin, did you hear anything about that?
Last year's Nobel Prize, because it's one of the agents that have been used against parasitic diseases. That's not related to these ion channels, but it's a bit fun when you start to see this molecule show up in the Nobel Prizes. But on the other hand: well, we got a bunch of nice publications out of this, but we still had this nagging feeling that we're really good at predicting things once we know what the result is. We wanted to show that we really understand this binding. If there are multiple binding sites, we should be able to really predict binding based on simulations first: design something in the simulation, get the channel to behave in a different way, and only after that show in the experiment that it actually behaves that way. So what we want to do is to try to design allosteric modulation, to change the way the channel is modulated. But rather than doing this with ethanol, we want to do it with real molecules. This is another anesthetic, desflurane, if I recall correctly. It's very common that there are lots of fluorines and everything on these molecules; it causes them to bind better. So we decided to try this very simple molecule. This is the wild-type protein. I'm not going to show the entire protein here, but this white part is one subunit and the gray is the next subunit. In this case we had a structure, so here we started out not by doing simulations but by doing docking. We took all the compounds we wanted to test, and in particular we also took a bunch of experimental compounds. The reason for doing that is that we already have an X-ray structure of how propofol, for instance, should bind. So the question is now: if we do this docking without giving the computer program that information, will the computer program be able to find the docked pose?
So that's a positive control. That's also something that, sadly, people are increasingly starting to forget about doing in simulations. Remember to have positive and negative controls in your experiments. For this small desflurane molecule we found lots of poses. And the point is that docking is fast and sloppy, so I don't really care about which pose is the highest ranked or anything. But among the top-ranking ones was the pose that we had in the crystal structure, and we found some that were very similar to it. So that made us reasonably happy: if I just try to dock things in this large cavity that we have defined, we appear to reproduce what had also been found experimentally. And here, in the gray part, we have this big phenylalanine. I can certainly try to force things to dock in that other cavity that's really too small, and you can probably see already here that we're going to bump right into the phenylalanine. This is going to be fairly bad. You can do docking with flexible side chains, and in that case you can force the phenylalanine to move out so that there's slightly more room for us. But even here you can almost guess that this pose is not going to be very good. But here we're just hand-waving; we have no idea how good or bad it really is. And then we can do exactly the same thing, but we take that phenylalanine and mutate it to an alanine, so now there is hardly any side chain there. We do docking first inside each subunit and then between the subunits, and already here you can imagine that there's a little bit more space. This we did purely with docking. So what we get from docking is a collection of, I think it was, 10 or 20 potential poses. Because there's going to be one pose when the ligand sits there, another one when it's rotated 30 degrees, maybe one where it sits 1
nanometer higher up. So you get a whole forest of slightly different variations there. They look roughly the same, but they're not going to be identical. But we really want to be able to design this. We want to be able to say: how good is it to bind here, versus how good is it to bind here? Can I use this mutation to control where it wants to bind? It doesn't matter that it can bind here if it's still better to bind here. So the question is, is this binding site best or is that binding site best? That's going to determine the behavior, right? It turns out that there are slightly more modern ways of doing free energy calculations nowadays. Forget about the details of this slide; it's taken from a recent talk. The whole point is that there are really beautiful automated ways to do this. If you ever find yourself in a situation where you want to do this, look up the tutorial for one of the programs, and it will tell you exactly how to do it. Same thing here: lots of mathematics in the background, but in practice it's all hidden in the program. So what type of simulations did we want to do when it came to the free energies? We wanted to use these free energy cycles, right? In particular, here we wanted to take the protein and gradually make the desflurane disappear in the protein. And this is a protein in a membrane, surrounded by water and everything, so these are very large systems. Then I have to do the same thing with desflurane in water: gradually remove the desflurane in water. It can, yes; that's one of the reasons. But of course that depends on the models. If you have very small differences in free energy, the water model can be very important. And then we want to take the difference between those two, and then we get the effective binding free energy, the affinity. Now there are lots of bars here. Let's look at the black bars first. The black bars here correspond to the wild type. And
here we measure the free energy of binding in kilojoules per mole. In the wild type, and this was the bacterial protein, the ligand loves to be bound inside each subunit, at roughly minus 22 kilojoules per mole. But it can also bind between two subunits, with minus 14 kilojoules per mole. And do you see the standard errors there? They're tiny. You can argue about exactly how large those standard errors should be; there might be some things we don't see here, for instance if the protein starts to move and everything. The whole point is that the difference between those two bars is significantly larger than the standard error, so the difference appears to be statistically significant. So this says a couple of things. First, it says that it's best to bind inside each subunit, which is good, because that's also what you see experimentally, right? In the co-crystal we saw it bind inside the subunit, which is what I thought was wrong in our homology models. So what we see here is that simulations reproduce that beautifully: not only that it can bind there, but that between these two binding sites, that is the one that is best in terms of physics and free energy. Do you see the difference from docking? This is not an estimate; this is a calculation, with water and everything. But on the other hand, we also have minus 14 or so between the subunits, so it can bind in both places. Well, let's assume that the difference is 10 kilojoules per mole. So what should you compare 10 kilojoules per mole to? Yes, that's a good idea. There's one thing, of course, if you ever have a physicist out there: so how many kT is this?
2.5? Get your units right: kT is 2.5 kilojoules per mole. The units are really important. So 10 divided by 2.5, this is 4 kT. So if you look at the relative population of that state versus that state, it's going to be e to the power of 4, which is roughly 55, somewhere between 50 and 100 or so. So virtually all molecules are going to be here. Once in a blue moon you're going to see a molecule up here, but virtually everything is going to be down here. The point is that these differences do not appear to be huge, right? But because kT is so small, once the difference is 4 kT, which is starting to be significantly larger than 1, you're going to have almost the entire population bound here. The funny thing is that these channels are known for having very strange Hill coefficients. To make a long story short, this has to do with the following. In a normal, simple system, as you increase the concentration you just expect to see more and more binding. Actually, I'll come back to that in a second. But this will mean that as you keep adding more and more, eventually you will have saturated all these binding sites. What will happen when you've saturated all these binding sites? Then it will start to bind there, right?
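The kT arithmetic above is easy to check with a short Python sketch. The temperature of 300 K is an assumption made here (it is what gives kT of about 2.5 kJ/mol), the function names are hypothetical, and the two-state picture ignores saturation and anything else beyond a single ligand choosing between two sites.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def kT(temperature_k=300.0):
    """Thermal energy per mole in kJ/mol (assumes 300 K unless told otherwise)."""
    return R_KJ_PER_MOL_K * temperature_k


def population_ratio(delta_g, temperature_k=300.0):
    """Boltzmann ratio favoured:unfavoured for a free energy difference in kJ/mol."""
    return math.exp(delta_g / kT(temperature_k))


def fraction_in_favoured_site(delta_g, temperature_k=300.0):
    """Fraction of a two-state population sitting in the lower free energy site."""
    ratio = population_ratio(delta_g, temperature_k)
    return ratio / (1.0 + ratio)


ratio = population_ratio(10.0)
fraction = fraction_in_favoured_site(10.0)
print(f"kT at 300 K      = {kT():.2f} kJ/mol")
print(f"10 kJ/mol        = {10.0 / kT():.1f} kT")
print(f"population ratio = {ratio:.0f} : 1")
print(f"favoured site    = {fraction:.1%} of the population")
```

The same function also runs in reverse as a sanity check: a three-to-one split between two sites corresponds to a difference of only kT times ln 3, which is under 3 kJ/mol.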
Hey, this is first class, but if first class is full, you will have to accept second class, because at least it's negative.
The black curve for chloroform here: a much smaller difference, but it's the same relative sign, and for now all I care about is the relative sign. Here you will probably get something like three quarters binding here and maybe one quarter there.
When we take this mutation, and we've just mutated a single amino acid, they trade places. So suddenly, with this mutation, we're now saying that it would be better to bind between the subunits than inside each subunit. The difference here is not gigantic; the difference here is starting to be gigantic. So what the simulations are predicting here is that suddenly it would be better to start binding here, between subunits, and it's only when this binding site is full that it will start to bind inside each subunit.
So what we had postulated, and I might not have been entirely clear with that: my guess, and this was a large, bold guess we had at the time, was this. For all these channels there was a difference between eukaryotes and prokaryotes, in that they were either potentiating or inhibitory, right? Could it really be so that we'd have one of these binding sites, this one in particular, inhibiting the channel, but this one would potentiate it? This would really be like having an on button and an off button in the channel, which would be pretty cool. And the cool thing is that that is the case. So if you take desflurane on the wild type, desflurane inhibits it. And then we take desflurane on our designed protein, with just one amino acid changed, and desflurane now potentiates the channel instead.
So this is really starting to be some very strong evidence. And again, I would still say that this is strong circumstantial evidence. We haven't really proved it yet, that's what we're working on now, that there are really two different binding sites. So, sorry, by the way: this works for an entire series of
different compounds, which is important, because had it been one compound it could have been a mistake, right? But it really appears to be the case that we have one binding site between the subunits that potentiates the channel, while the binding site inside each subunit inhibits it. So what Reba and we are trying to do right now is to systematically mutate this binding site: can I selectively knock out the binding site, so that it's a channel that can't be inhibited, for instance? There are a bunch of complications here that both we and others are struggling with, but I would argue that this actually still holds: there are separate potentiating and inhibitory binding sites for these channels.
This is potentially super cool, because this might enable us to design better anesthetics. A problem with anesthetics, and I might have mentioned this before, is that it's not a problem to sedate you. You're young, healthy and everything; you're going to be fine. But the problem is that the more elderly a patient gets, and if they are sick, maybe overweight, high blood pressure and everything, it's simply hard to keep them alive. And then during recovery you have all these anesthetics wearing off at different rates. So what this might enable us to do is to literally have a pair of compounds. We use one compound to sedate you deeper, while if we then want to bring you up, to make you slightly more awake, we could give the other compound. So in theory this would make it possible to fine-tune anesthesia way better than what we can do today. And then you can have fewer of the other drugs that try to compensate for these drugs, which again, in theory at least, would make it possible to operate on patients that are more elderly or more ill. Ed's dream is to administer an anesthetic he has designed himself at some point, which I'm not sure will happen. Well, if we do succeed, I hope that I had a hand in designing it.
The other thing that you can do with these channels: remember how I said that you have an
agonist up here, you have the allosteric site down here, and all this interacts with whether the channel is open or not. We've had a number of students in the group show that you can actually reproduce this in simulations. So this is a glutamate-binding channel: if you do not have glutamate bound, you have a very high free energy barrier. But then when you add glutamate, and in particular if you add glutamate to all subunits and run longer simulations, the free energy for the ions to pass actually goes down and down and down. Here we're down to 10 kT, which is still a bit too high; I think this should be 3 or 4 kT. But we can actually show how the allosteric modulation works. If you have both these molecules bind, we drop the free energy. Not only do you see the channel opening geometrically, but we can actually see that it opens in the sense that it reduces the free energy for ion passage.
This is beautiful in a simple simulation setting. The only problem is that this is not how it works in reality. All we can do in simulations thus far in these channels is to pick a beautiful, simple channel, which is an alpha subunit, that's what it's called, of GlyR, a glycine receptor. The alpha subunit: 1, 2, 3, 4, 5 alpha subunits. That's how we see them in the X-ray structures, and that's how we normally do simulations. That channel pretty much doesn't exist in your bodies.
If we forget about the glycine receptor for a second and look at a receptor called GABA, which is possibly, well, one of the most important receptors in your brain: there is an alpha subunit of the GABA receptor too. There is no GABA channel in your body that only consists of alpha subunits. There is a beta subunit too, that looks almost the same. There are a few differences in the sequence, very mild. So it's a different gene: there's an alpha subunit gene and there's a beta subunit gene. So now I'm going completely crazy. Why am I bothering you with this?
Well, this is partly to tie the course back to bioinformatics. Do you think this is important? If you knock out the beta subunit, you can't sedate rats. They're no longer sensitive to anesthetics. And others, not we, have been able to show that the anesthetics bind in the interface from the beta to the next alpha subunit. We have no idea why; it's just that that appears to be the case. We don't even know exactly how it binds there, because this is a channel for which we do not have the exact X-ray structure.
That was the beta subunit. There is also a gamma subunit of this channel, and for the gamma-aminobutyric acid receptor this is the most common form you have in the body. There are other special differences between these subunits, such as whether it's sensitive to benzodiazepines. These receptors won't even work without cholesterol in the membrane. So in some way the cholesterol is going to have to bind to this receptor. We have no idea how it does that. All we know is that if you put it in a membrane without cholesterol, it doesn't work, which is one of the key differences between eukaryotes and prokaryotes. There is actually a delta subunit too. We haven't worked on that; it's not as common, since that other one is the most common form.
So there are four different subunit types. And if you think that is difficult, that is a horrible lie, because there are at least 17 different subunit types of this expressed in your brains. The differences we're talking about are a handful, 10 to 20 residues out of 500, that are different. So tiny variations. If you just look at bioinformatics, any of these subunits are going to have 90% sequence identity to one another. So they're beautiful if you want to build a homology model or something, right? But still, there have to be some key structural differences here that give them different properties. We know almost nothing about this. And what people have
done, and now we're heading out wildly into speculation territory. But nature rarely does something without having a need for it, right? And of course it costs you something to have 17 different genes that we need to control. So it's likely that the specific expression of all these different genes and subtypes is related to different cells having different properties. The cells we have in different parts of the brain are different. The cells I have in the brain are different from the ones I have in my spine, which are different from the peripheral nervous system, and everything. So these channels exist everywhere, but depending on what channels we express, they will likely give the cells slightly different properties. Not just that: depending on how much of the channel I have expressed, right, you're going to have a cell that responds better or worse to the particular GABA receptor. So this is also something, just as we spoke about with fetal hemoglobin and everything: gene expression levels will alter biological behavior.
Now that leads to some very interesting things that unfortunately are not this simple. I'll give you the simple story first, if you remember that this is not true. These channels are also related to addiction. What do you think happens if, let's say, you have a normal healthy nervous system and then you start drinking lots of ethanol? Ethanol will on average potentiate most of these channels; it will make them easier to open. What could you imagine that the body would do? Exactly. You don't need to express so many channels, right?
So what now happens if you stop drinking ethanol? All these withdrawal behaviors. And the sad thing is that this is not true; it's much more complicated in practice. The point I want to make is that what's amazing is that many of these diseases that we've historically thought of as psychiatric diseases or behavioral diseases are very much biochemical. The way all withdrawal symptoms work, well, if anybody has had a hangover on a Saturday morning, right, you feel that it's definitely physical. It's not just in your mind.
Sadly, there are people that have done lots of investigations where they've tried to study expression levels in alcoholics and everything. I don't think that people have really found any patterns. Nowadays you can do all this temporal genomics and spatial genomics, where you study expression as a function of hundreds of different nerve cells. They do that a lot here at SciLifeLab. This is very much cutting edge. So instead of sequencing one human cell, you're now going to try to do full genome sequencing of, say, 600 cells close to each other in different parts of the brain or something. As we collect more data, I bet we're going to start to find some of these patterns. But for now, we know almost nothing about what the differences between all the subunits are. And this is just one of these channels. There are hundreds of them in the body.
And I think I will stop there. There are a bunch of people. Oh, this is research I'm not behind myself. It's all the people in the lab, and in particular Tachyla and Reba at UT Austin.
But if there is one recommendation I have for you when it comes to research and everything: go into the brain. Well, first, it's sexy stuff. Second, it's interesting. Third, I think this is where the future is. The problem when it comes to choosing research topics is that it's so easy to pick one of these fields that either I or some other lecturer have spent three lectures talking about, right?
And you know all the details. But why could I spend three lectures talking about it? Because it's an area we know very well. Those are not the best research topics. Some of this I can't really spend more than a slide on, because, well, we don't really know. "There have been some papers published, but the results weren't really that strong." "It's unclear." "We don't know yet." "Uncertain." "Doubtful." Those are the keywords to look for when you want to choose PhD projects. Where people haven't done the research, it's easier to make discoveries that people haven't already made.
I think I'll stop there. We have 10 minutes or so, right? No, we don't. We have three minutes. Do you have any questions for me?