biophysical chemistry. Good. You know what? I'm gonna have a recording of this live series. I can actually keep it right here, if you don't mind. Otherwise, we're gonna fall asleep in this small room. From tomorrow, we have a better room. So what I'm gonna talk about today is really a little bit about what biophysics is, and the reason why we have biophysics, in particular in this modern world, where we're focusing more and more on high throughput and sequencing. This course is very much the connection between what you see in large-scale life science, a disease, and the original cause of that disease. In many ways, you probably studied that last semester, right, that in many cases we have mutations that are the cause of a disease. But why does a mutation cause a disease? The reason mutations cause diseases is typically that they change something mechanically, physically. You have some sort of cavity where an ion or any type of molecule or another protein would bind. And that mutation causes this cavity to change shape, so that it will no longer bind something. And the way we normally try to treat this is by designing a drug or something that hopefully either restores the cavity or somehow restores this function another way. And ultimately, to understand this, we're going to need to drill down into these interactions, understanding what's happening on the atomic level. It turns out that this is complicated. It's more complicated than we think. And the main reason for this is that, unlike a simple molecule like sodium chloride, which is literally simple, proteins are complicated because they contain thousands, tens of thousands, or even hundreds of thousands of atoms. And the really big problem there is also that we're going to need to start looking more and more into statistics. And then you might say, well, but that's more physics rather than life science, right? Well, the problem is that this statistics comes back.
Virtually every single new method being developed has to do with single cells or single molecules. Anything you can imagine, it's a pretty good bet that it has a "single" prefix today. And the problem is, once you're down and looking at single molecules, even the experiments enter this complicated world of statistics and worrying whether things are correct or not. And it's a bit embarrassing if you see just the sheer number of papers being published, even from our universities, that contain fundamental, big statistical errors. So throughout the course, we're going to try to combine these approaches: one part mathematics and statistics, one part physics, and one part life science. And it turns out that in the end, there are some pretty amazing things we can do with this. Just to whet your appetite a bit, I'm going to start by showing you a couple of examples here. So these are two big proteins. They actually turn out to be membrane proteins, both of them, but their exact function doesn't matter. So what you have here on the left is a small part of the protein, just four helices. But all these blue parts, they're charged. And it turns out that this is the reason why you even exist. In many ways, with every single heartbeat, when the voltage changes across nerves, that causes that blue helix to move. And that's why you even have heartbeats or nerve signals. And the day when you were a sperm fertilizing an egg, it was actually voltage-gated channels like that at work too. But the cool thing is that these are charges. And the reason they move is simply that we have a potential that's changing. Very simple physics in a way. But of course, it's encoded biologically. What you have on your right is a ligand-gated ion channel. And again, don't worry about the details. These are just examples. We're going to come back to this much later. So this is a channel that sits in the membrane. And it's part of your synaptic transmission of nerve signals.
So you have a molecule, a neurotransmitter, binding out here. The binding of this neurotransmitter causes this entire domain to somehow change shape. It's like an earthquake inside the protein. And this conformational change pushes down on the transmembrane domain. And then the transmembrane domain here will magically open. And when this opens, you get a flux of ions through the channel. And that's why you get a new nerve signal in the next cell. This is actually something we're working on quite a lot on the research side of things, because these are the key receptors for things like all types of addiction, anesthesia and everything, so that you can actually develop new anesthetics by trying to match things perfectly with these cavities here. It's pretty cool. A more direct example: there are a bunch of channels and transporters. Channels, in a way, are simple. All these things are in the membrane. Channels are just some sort of holes, windows in the membrane. So if you open this hole, it might be selective, in this case for a potassium ion, in this case for a sodium ion. But they literally just open a hole that selectively lets through some things. But if you only had that, not much would happen in your body. At equilibrium, you're all dead, right? No processes can exist at perfect equilibrium. So the reason you exist is that the body is somehow a machine doing something. And one of the simplest parts of this machine is this purple thing in the middle. This is a so-called pump, which is literally a small machine that pumps sodium against its gradient and potassium against its gradient too. So how do you think that works? This does something that nature would not like to do naturally. How can the body achieve that? Well, these are the channels. The channels move things towards equilibrium.
So normally you have an excess of potassium ions on the inside and an excess of sodium ions on the outside. The channels just open up and let the ions move in the direction they would like to move. But the pumps do the opposite. They move potassium in the direction where we already have more potassium ions. Yeah, it's against the gradient. So then it needs energy to work. So we need to get energy somewhere. And where do you get the energy in your body? ATP. Yes. So this is a sodium-potassium ATPase. It uses ATP that binds, and magic happens. Poul Nissen in Aarhus, a close colleague of ours: they've actually determined some 14 different states of this. It's literally like a machine piston going through lots of motions. It turns ATP into ADP. And while doing so, it's using that energy to pump the ions. We'll come back to exactly how these machines work. But this is super complicated. Ten years ago, we knew virtually nothing about these apart from the fact that they worked and that they used ATP; we had that picture. Today, we know the structure of this. There is a small ATP molecule and there is one of the structures of these proteins. You can start to see that these are getting pretty complicated. And this is not me screwing up. The reason why I have this upside down is just that here is the cytoplasmic side and there's water, and I have this in the right direction. The membrane would be on the opposite side. So it's using ATP on the inside here and somehow, as we're binding different ions or something, we're changing the conformations. And the end result is that you pump both sodium and potassium, two potassium inwards and three sodium outwards. How frequently do you think this happens? So how much ATP do you use in a day? Close to 100 kilograms. Well, maybe somewhat less, but roughly your body weight. So when you think about these processes, and again, that's on the order of 10 to the power of 25 molecules.
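As a rough sanity check on that order of magnitude, the mass-to-molecule conversion can be sketched in a few lines. This is a back-of-the-envelope estimate, not a measurement: the molar mass of about 507 g/mol for free ATP and the 50 and 100 kg daily turnover figures are assumptions made for illustration, and the function name `atp_molecules` is made up for this sketch.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
ATP_MOLAR_MASS_G = 507.18     # g/mol, free ATP (assumed; salt forms differ)


def atp_molecules(mass_kg: float) -> float:
    """Convert a mass of ATP in kilograms into a number of molecules."""
    moles = mass_kg * 1000.0 / ATP_MOLAR_MASS_G
    return moles * AVOGADRO


if __name__ == "__main__":
    # Two illustrative daily turnover figures (assumptions, not data)
    for mass in (50.0, 100.0):
        print(f"{mass:.0f} kg ATP per day ~ {atp_molecules(mass):.1e} molecules")
```

With these assumed numbers the count lands between 10^25 and 10^26 molecules per day, the same ballpark as the figure quoted in the lecture.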
It's completely insane. Because for every single molecule, of course, you have to restore it from ADP to ATP, right? In another place in the cell. So the whole energetics of the cell is super complicated. And the reason why we do this is that you can think of this like a small condenser, or almost a battery, so that by pumping these ions to different sides, we can simply use them later. This is a slow process. But once we have these different concentrations of ions, we can use these ions to very quickly open things. So this is slow, but the channels have to react for every single nerve cell. And that's why it takes like 0.1 seconds for you to move a finger or something. That's a cascade of these channels opening all the way from the brain out to your finger. So we'll come back a little bit to that when we talk about membranes and how membranes work. But this is true physics. This is another close friend. You know what this is? You all have this in your bodies. This is hemagglutinin, which is the protein that acts like the drill when a virus infects your cells. So what this protein does: it has some fusion peptides out here. And these fusion peptides drill down into the membrane of the cell. And when you drill down into the membrane of the cell, this causes the cellular membrane to fuse with the viral membrane. And then the virus will let out its genetic information in the nucleus of the cell. Of course, it's a stupid, simple molecule, but this is the entire reason why viruses are so amazingly efficient. The problem with these proteins is that they undergo extremely rapid evolution. Ideally, of course, it would be simple to get something to bind to this protein, right? The only problem is that when they mutate, whatever new drug you have, or a vaccine, is no longer going to be able to bind. Well, if you have a vaccine, you will basically teach your body's immune defense to recognize a part of this protein.
The only problem is that the virus will change its form of the protein so quickly that, say, a year later, it's not really going to have an effect anymore. On the other hand, it's not magic. So what people are doing, colleagues of ours in Virginia (you're going to realize that I tend to steal pictures from my own research; it's so much more efficient): what they're trying to do is to find the parts of this protein that are most important for the protein structure, because you can't change every residue. The reason why proteins have the structure they have is because they have the amino acids there, right? So there have to be some places here that are more important for the protein. And the question is, can we target those and find those sites? Then you might be able to develop new antiviral drugs. And there are a couple of drugs on the market that try to achieve this. But that's nature. This is nature in a way, but it's man-made. So this is a protein for artificial photosynthesis. It's much, much simpler and more efficient than natural photosynthesis. But I think, where we are right now, you would still like to get the efficiency up by an order of magnitude or so. Now remember what I said about energy, right? If you could get this, this could essentially turn into a super efficient solar cell, where you would use a biological molecule to convert energy, sorry, convert sunlight into energy, rather than doing it with the classical silicon cells. And then again, these things are on the market today. People use it. How do you think you create something like this? Well, so these are the cytochrome domains. So this is something that doesn't exist in nature. So you can't really copy nature directly. But in some way, you're going to need to engineer a protein fold that binds the right cofactors, in this case the cytochrome, and somehow provides exactly the right surroundings so that we get the synthetic light process going. It's hard.
It's very rare that we achieve it. But again, it's getting more common. Today we see papers about this in Nature and Science every year, where we manage to, ab initio, from scratch, engineer a function biologically rather than do it with silicon or something. Hugely based on computation. So, as some of you probably know, there are a bunch of different proteins. Enzymes, which are basically small biological catalysts. Things like gene expression: it's always a protein and its structure regulating it. Skin, hair and everything is structural proteins. Myoglobin and hemoglobin, which bind oxygen in your muscles and blood, respectively. Transport proteins, receptors like these ion channels. We're not going to go through and cover every single one of these classes, but the idea throughout this course is that I want to get you thinking about what these proteins do and how they do what they do. An enzyme, for instance. What's the definition of a catalyst? Most of you have taken some undergraduate chemistry. Yes, but you don't consume the catalyst itself, right? So what that turns out to be, as we're going to talk about later: there's a concept called free energy that basically describes how expensive it is, how much energy molecules need to get, during a reaction. What limits the speed of a reaction is that you need to get through some very unfavorable high-energy state in the middle of the reaction. So what catalysts in general, and protein catalysts in particular, do is that if you have two molecules that you're somehow going to fuse or something, this protein needs to bind these molecules in a way that makes this really unfavorable state slightly more favorable. So then you reduce this maximum free energy, and then you would magically get these two molecules to bind, and they can release again, and you would have an efficient catalysis. This occurs all the time in the body. And we're going to get back to that in two or three slides.
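To put a number on "slightly more favorable", here is a minimal sketch of how much faster a reaction gets when a catalyst lowers the free energy barrier. It assumes the standard exponential (Arrhenius or transition-state-like) dependence of the rate on the barrier with an unchanged prefactor, which the lecture has not introduced yet; the function name, the body temperature of 310 K, and the example barrier reductions are all illustrative assumptions.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol*K)


def rate_speedup(barrier_reduction_kj_mol: float, temperature_k: float = 310.0) -> float:
    """Factor by which the rate increases when the barrier drops.

    Assumes rate ~ exp(-barrier / (R*T)) with the same prefactor
    for the catalyzed and the uncatalyzed reaction.
    """
    return math.exp(barrier_reduction_kj_mol / (GAS_CONSTANT * temperature_k))


if __name__ == "__main__":
    # Hypothetical barrier reductions in kJ/mol
    for reduction in (5.0, 10.0, 30.0):
        print(f"barrier lowered by {reduction:4.1f} kJ/mol -> "
              f"about {rate_speedup(reduction):.3g} times faster")
```

Because the dependence is exponential, a modest stabilization of the unfavorable state already changes the rate by orders of magnitude, which is the point of the passage above.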
So what this course is about: we're going to try to understand nature from simple principles. That relates to something that I think I say further down. Yes, Wolfgang Pauli had a famous quote there: you can always simplify as much as possible, but never more. This is the hard part here, and that's kind of what we want to teach you during the course. Simplify, simplify immensely. It actually turns out that, imagine anything like high-throughput sequencing. How do you detect those sequences? Well, today it has to do with corrections; in the future it might be graphene or something. But the way we achieve this is ultimately sitting down with paper and pen, possibly a computer. These molecules are so complex that there is no way you can handle this myriad of proteins and everything. So what you do is that you create some very simple, ideal situation, and then you try a first-order model, something very simple where you throw away almost everything, and ask what you think would happen. And the idea is, you don't rely on that being right, but that model can be so simple that you can imagine what we would need to do. And then you do that, and then you can go in the lab and test it. Did it work or not? So the idea with most of the things I'm going to bring up is that this is not going to be about making predictions that are accurate to the level that they can reproduce experiments. Occasionally they can, and in particular in things like docking, early stages of drug design, we frequently do that today. But I think that the real power here is really about thinking about the why question. Explaining complex processes from simple interactions, you can argue. A bunch of this is going to be about macromolecular structure. I'm a protein person, and this building is pretty much about proteins. So I think we're going to focus way more on proteins and extra measurements. We're going to be talking quite a bit about measurements of fluctuations, which I might not have said here.
But in fact, we're going to relate this to physical concepts. Why do things happen? And can we tell something about how fast they happen? So why? The why question is probably the center or focus point of the course. And then we're actually going to have you sit down and do a bit of programming with super simple models. And eventually you'll also do computer simulations of proteins in some pre-existing programs. This is still very much cutting edge, but 10 years ago when we started, nobody was interested in these things. And today, any Nature or Science paper you see about a new protein structure will also run some computer models on it to see what happens. We think an ion would bind here, but does the ion really bind there? And that we're testing in computers today. So computers are taking over the lab in this space, just as they're taking over pretty much everything else in society. When it comes to proteins, this is pretty much going to be about three things: protein structure; protein folding, or understanding why proteins have the structure they have and understanding their stability; and in particular understanding the relation, how do we get from sequence to structure to function? This is a really famous triad in protein science. We're going to be talking a little bit about predicting protein structure, but not so much from the bioinformatics point of view. It turns out that if you just have a sequence and want to predict what it's like, which is what a bunch of you just did in this course, designing a predictor using just a sequence, there's nothing that comes close. But that only works, of course, if nature already has something. What if you were going to design a new protein to do something? Something that doesn't exist in nature, an enzyme that for whatever reason nature has never constructed. You're not going to be able to construct that by copying nature, because nature didn't come up with it in the first place.
Then you're going to have to start thinking about the structure from scratch. It's engineering. It's hard, but occasionally it works. Before I really start: we're going to run this at 100% pace. I know that you have a lot of other things, but at four lectures per week, the one thing I can recommend is to try not to fall behind. I am frequently busy, but when I teach, I pretty much reserve the next five weeks for you. I'm in my office on Alpha 6 pretty much all the time. I can basically respond to e-mail at least until midnight. Use me. Use the TAs that you're going to meet this afternoon. And if there is something you don't understand, interrupt me. There is absolutely no point in thinking, oh well, everybody in the class probably understands this, I'll try to read it in the book tonight. If you don't understand it, I can promise you that there is at least one more person in the class who doesn't. We're going to run computer practicals, not all afternoons, but say two or three per week. The idea with this is to give you a bit of hands-on experience. We don't expect you to be programming wizards. Some of you know Python really well. The other ones, from the biophysics program, have you programmed before? In bioinformatics. Good. I think that's going to be enough. In particular, we're not going to grade the practicals. As part of the practicals, we want you to hand in a super brief report, max one page. Have you reflected about what you learned? Did you learn something here? Did you understand something? Is it something that worked or did not work? And we want you to hand those in within 72 hours or something, just for you to be done with it. The actual grade is going to be based on a written exam, and we'll come back to that later. Tentatively, the only date that's important for you (well, there are lots of dates that are important) is probably Friday, April 29. Does that work for you for a written exam?
I know that there are a couple of you that have planned travels and everything, but that's the good part here. Since I'm so busy myself and I hate it when people want me to be in a specific place, we're going to give you the same freedom under responsibility. Things will hopefully be recorded online. These practicals, you can do yourself. You will even, hopefully, get logins here so you can log in to our computers remotely. As long as you learn what you should learn and do what you should do. I like having you here, but there are other things in life too. So if you need to go away a week or so for your own sake, tell me, because I might have something important coming up, and then I can try to accelerate things so we give you the stuff, the reading material, ahead of time. But for the written exam, it would be good if everybody could do that at the same time. Otherwise, we're going to try to be super flexible with it. Eric, we saw that there's a bioinformatics conference in Copenhagen that week, and some of us were thinking of going. It's on the 26th and 27th. Okay. Does it work with the exam on the 29th? For me, it would work, I think. But I don't know if it does for everyone. So in principle, I could certainly do an exam on the weekend too. That's fine for me, but for your sake I try to avoid planning anything on the weekends. So here's what I suggest. You all think through this tonight, and then we'll decide the date tomorrow. We can certainly do it on the Saturday or the Sunday too, if you prefer that. And in general, the last week here, I've deliberately, even that Monday, the 25th, kept that as my reserve time. So I expect the last three or four days of the course to be, not free time, but study time for you, when I'm not going to bring up new things. There will be some Q&A sessions if you want to ask me stuff. But we're not going to run at full pace until the very last day. But think about whether that date works or if I should move it to the Saturday.
The course literature I already talked a bit about. The main reason why I'm going to push you to read is this. You might not think about this now, but you're getting very close to your degrees. So within a year or so, you're going to have a degree. And at that point, you're likely not going to attend any lectures in life anymore. That's fun from one point of view. The tough part is that at that point, the only way you're going to learn stuff is by reading. Reading books if you're lucky; otherwise reading scientific papers, which are far worse written in many cases. And that's why it's important to get used to this point of reading to learn. You're going to stumble when you read. And my ideal scenario is that we spend half these mornings just discussing things from the last day, or ideally the current day, and then I'll just skip through some slides. Overall, you're going to need to read the book to pass the course. But when it comes to the topics and everything, the topics I cover here are the topics we're going to focus on. If there's something we haven't even mentioned here, don't worry too much about it. But in many cases the book covers things in a deeper way than I have time to do in the lectures. And that brings us to the real stuff that I'm going to go through today. So my setup the first week here: we're going to start from the top and then drill down. So we're going to start talking about proteins. We're gradually going to see how amazing proteins are, but also get from the large structure down to the very simple atomic properties and interactions, and see what proteins really do. We're going to start looking at the architectures and the elementary interactions of proteins. Sorry, that last point I'm not going to cover today at all. Tomorrow I'm going to start to connect this a little bit more to statistics and physics.
So tomorrow, and I would guess that this week, if you have a biology background, Tuesday and Wednesday are likely going to be the toughest days, and that's also why I'd like you to read that material ahead of time. The cool thing is that all the high-level stuff here is really governed by physics. And there are some very simple rules that you're going to learn that will help us a lot in understanding why things happen later. So there is something called the central dogma of proteins, and that's the thing that I mentioned before: sequence leads to structure, which leads to function. And you might think that this is obvious, but I would argue that it's not as obvious as we occasionally think. In particular, the converse is not true. For a protein that has a specific function, you might guess that some structures are more common, but function does not determine structure, and structure does not determine sequence. So in this case the sequence of this protein is the amino acids. This is a very small protein called the villin headpiece, and this is a computer simulation of it. The headpiece here doesn't really do much all by itself, but the point of having this movie is to show you that this is relatively flexible for a protein. Proteins are in general hard, but they're not hard as rock; they do move a bit. And the motion, and the structure that this protein has folded into, is really what's going to determine what it can bind, how it works, how it moves, and thus its function. For instance, on the right, this membrane protein that I talked about: the reason why this one binds something up here is that you have a bunch of amino acids around the binding site that have been perfectly designed to bind, say, acetylcholine. The acetylcholine receptor is a very common ligand-gated ion channel. And then you have a bunch of amino acids that help conduct the signal, and then, deep inside here, the pore that conducts ions. Most ion channels only conduct one type of ion; they're selective.
Say only chloride or only potassium or something. And the reason they only conduct one type of ion is again that we have a very specific set of amino acids that have evolved to be a channel for that specific type of ion. So that brings us to amino acids, which you probably all know more about than I do, coming from life science, but I'm going to go through it anyway. How many amino acids are there? Sorry? Yep, there are way more. There are essentially 20 amino acids, or alpha amino acids. Normally we're going to talk about these 20, but it's actually important to know that there are others; one of the most important in the nervous system is gamma-aminobutyric acid, which is not one that you would typically have in your genome. Occasionally we can encode special amino acids in the genome, which can be remarkably useful; I'll let you know why in a second. The special thing with the amino acids is that these are the basic molecules of life, and the entire reason we divide chemistry into organic and inorganic chemistry is that for a long time researchers were convinced that life science, or organic chemistry as they defined it, had some magic life substance in it. It's not just atoms; there's something that gives it life. This of course we know now is completely wrong, but it was a good idea at the time. Amino acids, as you probably know, consist of an N-terminus, which is simply a nitrogen and three hydrogens, typically positively charged. You have a carboxyl group, a COO-, negatively charged. You have a hydrogen, and then you have a side group that can be pretty much anything. And again, for the 20 normal amino acids, we know what these side groups are, but there are others, and they will have other side groups too. There are two more special properties. Amino acids are so-called zwitterionic, in that you really have an ion here: you have a plus one charge here and a minus one charge here.
I don't think that we're going to go through the details, but this can actually be important when you try to model these or if you use them in electrophoresis or anything. Amino acids are charged, but their charge depends on what pH we're doing the experiment at. In many cases there's a side group that will change its charge, but you can actually get both the amino and the carboxyl group to change their charge too, depending on pH. That's what zwitterionic is. Zwitterionic? Zwitter. Zwitter comes from German. It's a double ion. Good question. The other special thing with amino acids is that they are so-called stereoisomers, or they're chiral. By that I mean that if you take most amino acids and take a mirror image of one, you can't take the molecule on the right and rotate it to become the one on the left. For this to happen in chemistry (this is a property that's not specific to amino acids) you need some sort of atom in the middle with at least four bonds, and then you need to have four different groups. By far the easiest thing is probably this: I have a molecular building block kit that I have upstairs, I think, unless I've misplaced it. I can see if I can find that and bring it down. But the point is that no matter what we do, this is one amino acid and that's a different one. They have exactly the same chemical formula, but they have different physical properties. Normally, if you took undergraduate chemistry, what people will tell you is that stereoisomers have exactly the same chemical properties; it's only the physical properties that differ. For instance, they turn polarized light in different directions. That's wrong, because we're not in simple chemistry anymore; we're in life science. All these enzymes and everything, they also consist of amino acids. So for an enzyme that does something, you can't replace a left-handed amino acid with the right-handed amino acid in a protein. There's nothing that will work. No enzymes.
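The pH dependence of the backbone charges described at the start of this passage can be sketched with a small calculation. This is a minimal illustration, not something from the lecture: it assumes the Henderson-Hasselbalch relation for each group, treats the two groups as independent, ignores any side chain, and uses textbook pKa values for free glycine (about 2.34 for the carboxyl group and 9.60 for the amino group). The function names are made up for the sketch.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a titratable group carrying its proton (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def backbone_net_charge(ph: float, pka_carboxyl: float = 2.34, pka_amino: float = 9.60) -> float:
    """Average net charge of an amino acid without a charged side chain.

    Default pKa values are textbook numbers for glycine (assumed here).
    """
    amino_charge = fraction_protonated(ph, pka_amino)                  # NH3+ is +1 when protonated
    carboxyl_charge = -(1.0 - fraction_protonated(ph, pka_carboxyl))   # COO- is -1 when deprotonated
    return amino_charge + carboxyl_charge


if __name__ == "__main__":
    for ph in (1.0, 4.0, 7.0, 10.0, 12.0):
        print(f"pH {ph:4.1f}: net charge {backbone_net_charge(ph):+.2f}")
```

Around neutral pH both groups are charged and the net charge is close to zero, which is the zwitterion; at low pH the molecule is net positive and at high pH net negative.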
We can't incorporate them. So in life science, even the chemistry depends on this chirality. All the amino acids that are encoded in your genes, unless you're somebody very special (no, I've never heard of that): all genes code for L amino acids. The exact definition of L and D you can look up on the internet if you want to, but it's not important from this point of view. All of them, or rather almost all of them, 19 out of 20, are chiral. So which one is the exception? Yes. Yes. Why is glycine the exception? Because it has two hydrogens. Yes. Glycine has a hydrogen and a hydrogen, right, and that fails the condition that you need four different groups. And in theory, in chemistry, it can certainly happen that one stereoisomer turns into another one. This is called racemization. But here I said that it never happens. They will not convert spontaneously. Technically that's wrong, but it's true in practice. This is something we're going to come back to a lot in the course: energies. In principle this will happen, like every billion years or so, maybe every thousand years. The point is that this will happen. But the energy for it to happen is so high that in practice it will not happen on a biological time scale, because you're going to be dead before that happens. Actually, it might happen to one amino acid in your body, but nature will take care of that. You're not going to see anything of that. And that's a problem, right, because normally in physics we only talk about rules. Well, we talk about rules that are absolute in a way. Either something does happen or something does not happen. The problem in biology is that we always have a time scale entering too. Does something happen so quickly that it's relevant biologically? Technically, I might tunnel through that wall. It's not an infinite energy. It's a finite energy.
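Going back to the glycine question above, the four-different-groups condition can be written down as a tiny check. This is a deliberately simplified sketch: it compares the four substituents on the alpha carbon as plain text labels, whereas a real stereochemistry analysis would have to compare the full molecular graph of each group. The labels, the dictionary, and the function name are illustrative assumptions.

```python
def is_stereocenter(substituents) -> bool:
    """True if an atom has exactly four substituents that are all different.

    Simplification: groups are compared by their text label only.
    """
    groups = tuple(substituents)
    return len(groups) == 4 and len(set(groups)) == 4


# Alpha-carbon substituents for a few amino acids (side chain last)
ALPHA_CARBON = {
    "glycine": ("NH3+", "COO-", "H", "H"),
    "alanine": ("NH3+", "COO-", "H", "CH3"),
    "serine": ("NH3+", "COO-", "H", "CH2OH"),
}

if __name__ == "__main__":
    for name, groups in ALPHA_CARBON.items():
        verdict = "chiral" if is_stereocenter(groups) else "not chiral"
        print(f"{name}: {verdict}")
```

Glycine fails the check because its side chain is a second hydrogen, which matches the answer given in the lecture.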
But I know that in practice it's not going to happen no matter how long we wait, right? So biologically we say we can't do it. So many of these things that I said will not happen, that's because they're not going to happen on a biological time scale. So why do I bring this up? Why could it ever be useful to use, say, either a D amino acid or a non-natural amino acid, picking one that is not one of those 20? Can you imagine any point where this would be useful? What was the question again? So, now I'm talking about this from a physics point of view, and you could of course argue, yeah, this is a curious minor fact that there are some amino acids that are different from the normal ones. Say you pick D amino acids, or you pick amino acids that are not among those 20. Why would you ever want to use one of those? You could make interesting structures, but there's something else that would be interesting. So imagine that you design a new protein, an anti-cancer drug or something, a blockbuster. It would potentially be a hundred-billion industry. There's only one problem. This is a protein. How are the patients going to take it? What happens? It gets digested. It would be digested. So you now have a protein that you have to inject? Sorry, your hundred-billion company just dissolved. This is sad. It's not science, but no company wants drugs that have to be injected, because it's complicated. It requires a doctor's visit and everything. It's far better with a drug that can be taken orally, or just as a patch on your skin or something. But the digestion, this is enzymes. They're proteins. And the whole key thing with these is keys that don't fit in the normal locks. So by using non-natural or D-amino acids, we might be able to create a protein that is not really compatible with the normal proteins. Now normally that would be bad, right?
But in this case, not being compatible with the protein that tries to digest you is a really good idea. We would then have a protein that you could take orally but that would not be digested. The problem is, of course, you would still need to make that protein compatible with the other protein you wanted it to interact with, but this is the reason. It's actually used quite a lot, both in experiments and in protein design, to see whether we can do other things. The problem is that it gets kind of expensive because you can't do it with normal gene technology. You need to use tricks. Since you know a bit about proteins, I'm not going to go through amino acids in too much detail. There are a bunch of different ways to think about amino acids. You're probably aware, well, you can't read this. You can think of an amino acid as a collection of atoms. That's certainly right. You can think of some sort of space-filling model. Those of you taking the proteomics course will possibly be aware of this way of thinking when you build a model or something. Say you have a small cavity and a large amino acid sticking into that cavity. If we take this large amino acid and replace it with something much smaller, you have effectively created a larger cavity. That's going to influence what ions or other small molecules can bind in the cavity. You can also think of an amino acid in terms of some sort of electrostatic potential. In this case it's an arginine, so it's going to be very blue up here, which means that we have lots of charge, electrostatics. There is not one way that's right or wrong here, but I think it's a very good exercise to start thinking of amino acids in terms of classification. Some amino acids are hydrophobic in the sense that they're not water soluble. Exactly why they're hydrophobic, you don't know yet. At this point it's just a matter of fact.
Tomorrow or possibly later in the week we are actually going to be able to understand why some things are hydrophobic and some things are hydrophilic. And it's not as easy as you think. Many amino acids are certainly charged. There are some that are positively charged and some that are negatively charged. Basic and acidic. There are some that don't have charges but are quite polar and like water a lot: serine, threonine, asparagine, glutamine. There are a bunch of amino acids that are hydrophobic, and then there are some special ones, like cysteine, which can form bonds between the two sulfurs here, between different parts of the chain. Glycine, which is really small, the one with two hydrogens. And then proline, which is technically not an amino acid but an imino acid, but nobody cares. Everybody calls it an amino acid. And the reason is that proline has this ring that connects back to the nitrogen, which means that we don't have the normal hydrogen on the nitrogen when this is in a chain. We'll see why that is special. I don't expect you to know these by heart. You should know a couple of the important ones. You should at least be able to think in terms of classifications, so that for instance if you would like a cavity to be more hydrophobic, you should at least be able to look up a couple of suitable amino acids based on their hydrophobicity. Nowadays, actually, do you know these by heart? I'm old enough that when I was your age we still used the three-letter abbreviations of all the amino acids all the time. Nobody does it anymore. So today it is a single letter, because we handle an enormous amount of amino acids. But be aware of a couple of different ways to classify these, and I think that's a good exercise for you to start reading in the book. Where do the amino acids come from? How do you determine what amino acids we have? Yes. Good. You're life science people. I don't expect you to know this by heart either.
I certainly don't know this by heart. I think I've used this twice in my career. Here is one trick question. How many different codons do you have? Four by four by four, right? Sixty-four. Last time I checked we do not have 64 amino acids. We have 20 amino acids. Why? Yes. But why do you have this redundancy? To prevent errors? No. The funny thing is that some amino acids, tryptophan for instance, only have a single codon. And you're not going to prevent errors, because the second it says C-A-G in your genome, it's not like we have a backup elsewhere that we double-check with. Yes. First, we don't know this, right? This is only what we can observe. But if you look at the relative abundance of amino acids in proteins, their abundance appears to correspond well to the number of codons that code for them. So something like arginine, which is a positively charged amino acid used in a ton of places, there are at least four different codons for it. Tryptophan, which is this complicated amino acid with two rings: there are a couple of cases where we need it, but it's not very frequent. Nature has decided we don't need that a whole lot. Leucine, a small, simple, hydrophobic amino acid, is used a lot; proline, a lot too. For whatever reason nature has decided that this is likely the relative frequency of the need. This is the official story. And I'm not going to have some sort of crackpot conspiracy here. But to be honest, we don't quite know. I was visiting a colleague in Tel Aviv a couple of years ago, and it turns out that some of the hydrophobic amino acids here are going to be very common in membrane proteins, and we'll get back to that later, how membrane proteins are formed and everything. The overall idea, which I too subscribe to, is that membrane proteins are determined by the translocon, et cetera: if they're hydrophobic enough, each helix will be inserted in the membrane.
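As a side note on the codon counting above, here is a minimal Python sketch (not part of the lecture) that builds the standard genetic code and counts how many codons code for each amino acid. The compact table string follows the usual T, C, A, G ordering of the standard code; the variable names are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, written compactly. Codons are enumerated in the
# order TTT, TTC, TTA, TTG, TCT, ... with the bases ordered T, C, A, G.
# '*' marks a stop codon. (DNA alphabet; replace T with U for RNA.)
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

# Map every three-letter codon to its one-letter amino acid code.
codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: how many codons code for each amino acid (or stop).
degeneracy = Counter(codon_table.values())

n_codons = len(codon_table)                # 4 * 4 * 4
n_amino_acids = len(set(AMINO_ACIDS)) - 1  # do not count the stop symbol

print(f"{n_codons} codons, {n_amino_acids} amino acids")
for aa, count in sorted(degeneracy.items(), key=lambda item: -item[1]):
    print(aa, count)
```

In this table tryptophan (W) and methionine (M) have a single codon each, while arginine (R), leucine (L) and serine (S) have six.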
But his argument was that if you look at some of the hydrophobic amino acids in particular, there are differences in the third base here, and he argued that some of these key differences in the third base would actually cause them to bind to some cofactors that in turn would bind to ribosomes that are bound to membranes. I have no idea whether this is right. I'm not really sure whether it's published. But the truth is we don't know. So this could very well be a way for nature to say, well, all of these are arginines, but while we're constructing these arginines, in theory we could use this difference in the RNA to target them to slightly different protein factories. We'll come back to what those protein factories are. This could potentially be a great bioinformatics project for somebody in the future. The only danger is that it could of course be completely wrong and that there are no patterns whatsoever. Overall that's something that I think you should learn. One of the bad things with most textbooks is that they give a picture of biology, physics, and life science in particular as being finished. It's not. So for most of these questions, when you think that's a good question, we don't know what the answer is. Just start digging. There are more things we don't know than things we do know. So what happens with these amino acids, and this is the key part of life, is that if you have two amino acids, you have the OH part, the hydroxyl, there, and a hydrogen from the next amino acid. They can go through a polymerization process. So it's H2O. They release water, and then you're going to have a polymer where the two amino acids are bound together. This is not a spontaneous reaction that happens very quickly. You typically need an enzyme for this to happen. So this is what goes on in these protein factories, where we have RNA, messenger RNA, and transfer RNA coming in carrying one amino acid at a time.
And then with an enzyme around them that causes these peptide bonds to be formed. It was first discovered by Emil Fischer in 1906 that proteins are a polymer of something. But a polymer of something: you, of course, having studied bioinformatics and everything, are going to think that it's completely obvious that a protein has a specific sequence. So this is the bad thing with having lecture notes, that it's not going to be a surprise, but it was actually not until 1952 that Fred Sanger proved that a protein, in this case insulin, has a unique sequence. With minor, well, at that point we weren't really even aware of these minor variations between species and everything; that came later. But that we really understand that the sequence of a protein uniquely determines its function: 1952, Fred Sanger, as in Sanger sequencing. This is a pretty remarkable paper, and I have actually put some of these papers up online. If you're interested, there are at least three papers that I put up today; all three of them are great in different ways. If you have time I would encourage you to at least glance through one of those papers, and we can talk a little bit about them tomorrow. Today, with all the high-throughput technology and everything, I think we occasionally forget how amazing these early results were. If you see just the work they went through to sequence a single protein, and they used the three-letter codes for amino acids everywhere, and then you imagine the pace at which we can sequence things just across the corridor, in the facility here. Here we're talking about hundreds of billions of bases per month. These peptide bonds, the yellow ones here, are very special. If you're a chemist and you know all about quantum chemistry, there are lots of reasons why these bonds have special properties. This course is not about details of organic chemistry, so I'm going to skip that a little bit.
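Going back to the water-releasing reaction described earlier, a small worked example may help: every peptide bond that forms releases one water molecule, so a chain of n amino acids weighs the sum of the free amino acids minus (n - 1) waters. The sketch below is an illustration only, not something from the lecture; the function names are made up, and the molar masses are approximate average values for just three amino acids.

```python
# Approximate average molar masses in g/mol of the free amino acids.
# Only three are listed, as an illustration.
FREE_AMINO_ACID_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
}
WATER_MASS = 18.015  # g/mol, one water is released per peptide bond


def peptide_bonds(sequence: str) -> int:
    """Number of peptide bonds in a linear chain of len(sequence) residues."""
    return max(len(sequence) - 1, 0)


def peptide_mass(sequence: str) -> float:
    """Molar mass of a linear peptide: free amino acids minus released water."""
    total = sum(FREE_AMINO_ACID_MASS[aa] for aa in sequence)
    return total - peptide_bonds(sequence) * WATER_MASS


# Two glycines joined by one peptide bond (glycylglycine).
print(round(peptide_mass("GG"), 2))
print(peptide_bonds("GAS"), round(peptide_mass("GAS"), 2))
```

For a protein with thousands of residues, the released water adds up to a noticeable fraction of the total mass, which is why residue masses rather than free amino acid masses are normally used.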
The book might actually talk a lot about different types of hybridization and everything. That relates to one of the previous slides I had. The reason why bonds are formed is that electrons want to be in a favorable state. Sometimes these favorable states correspond to electrons delocalizing, so rather than just circling around one atom they're going to circle around two, and that's when we form a bond. However, depending on what these atoms are and how many of them are involved, occasionally they form pairs or triplets, and that's the reason why you sometimes have these tetrahedral shapes, around an alpha carbon, and otherwise you have these triangular shapes that typically have a double bond involved. And it turns out that the peptide group is an example like that. So this peptide bond is going to be very rigid and planar. There are lots of bonds in these amino acids that we can rotate, but the peptide bond we typically can't rotate. And it's a bit complicated, because the peptide bond itself we usually don't draw as a double bond, but it's this whole quartet of hydrogen, nitrogen, carbon, oxygen that will cause that bond to be very, very stiff. Now what you see here is pretty much the same as the previous slide, that they form a pair and they release water. In practice, what you need to know: peptide bonds are very, very stiff; they hardly ever rotate. We'll come back to that. So how many amino acids do you ever assemble in a protein? What's the size of a typical protein? Yeah, 50 would be very small. Physicists or physical chemists, people who love to tinker, they kind of argue that they can create proteins of like 20 residues or something. Most chemists would probably call that a polypeptide. There is no sharp limit; anything with more than one amino acid is called a polypeptide. Typically we draw the limit somewhere when you start having some amino acids, or residues as we call them, that are buried.
That is, some that are not exposed to water, and then we have a structure that's large enough to call a protein, and that would start somewhere like 25, but 25 is the extreme minimum. Even 50 is a very small protein. And as you see on the next slide, this just keeps going up. There are some insanely large proteins with 30,000 residues, and that is part of your muscles actually. I'm going to show you an example of that in a couple of slides. The reason why most pictures of proteins you've seen have been the small ones is simply that their structures have historically been easy to determine. Almost all the really large ones here correspond to Nobel Prizes, well, not all of them yet, but RNA polymerase, certainly a Nobel Prize; the ribosomes that we're going to look at later have been a Nobel Prize for the work, and everything. So it's insane how large proteins you can have, with these molecular weights of half a million daltons or something. Typically we divide these in three classes. I'm not going to talk so much about these classes today, well, maybe a bit, but since the book does it I figured I should cover it at least, and then we're going to come back to this after the Easter break. The first class is the most boring one, which we're not really going to talk about. Fibrous proteins: nail, hair, skin, occasionally bones too, but the most important part of bones is not protein. These are very boring structures; they're long, repeating and everything, usually with lots of bonds in them, and we're going to come back and actually look at the structure of how your hair is made. You typically can't dissolve them, and by definition they're very strong. They're not particularly interesting for us; I'm usually going to skip them. By far the most common proteins are globular, or water soluble, proteins. Think of these as small balls that are dissolved in water.
They can be super complex, in the sense that there are no simple rules for their structures. You saw that in the bioinformatics course. They can have helices, they can have beta sheets, they can have combinations of them, they can be pretty much anything, and they typically have lots of small groups bound. This makes them very interesting, and for that reason I think we're going to spend most of the time looking at globular proteins. Although this is another problem: when I was your age we pretty much only cared about globular proteins. Why? Well, it's worse than that. So when I was, well, about your age, in London in the 1990s, I remember Stuart Forsy, my lecturer in physical chemistry, had us strike out a line in the compendium, because in the compendium it was written that we now know 50 protein structures, and that should be 300 instead of 50. So how many protein structures do we know today? No, more. I would say close to 50,000. It's so many that I should know exactly, but I didn't even bother looking it up. It's a number that the Protein Data Bank keeps track of. But in my days virtually all these protein structures were globular proteins, because it was super hard to determine the structures of membrane proteins. The first membrane protein structures were Nobel Prizes too. So that's something that has changed in the last 10 years. Now we are actually starting to have a bunch of structures of membrane proteins, and the difference here is that membrane proteins don't exist in water; they are in this lipid environment in the cellular membrane. That's a problem, because something that likes to live in oil is not going to be very easy to crystallize, and that's why it has been so hard to determine their structures. Exactly how you determine the structures of them?
That's a long story; we can get back to that later. But typically you try to attach an antibody or something to this membrane protein, and then you essentially crystallize the antibody. You're not really interested in the antibody, but as part of this crystal you also get a small appendix, which is the protein you're interested in. That can fail in quite a few remarkable ways. The important thing with membrane proteins is that, because they are the doors and windows of our cells, they're super physical and super functional. They correspond to nerve signaling, they correspond to energy production; they always do something, and it's very clear what they do. Globular proteins are much more subtle, for better or worse. It's 10 am right now. It's probably easier if we run another 20 minutes, and then I give you a slightly longer break, and then we continue about an hour until lunch, because if we take two breaks both of them are going to be really short. So if you look at assembled proteins: an assembled membrane protein, and here you have a hemoglobin subunit. Hemoglobin actually consists of four of these. Here you also see something small in red, which is a protoporphyrin actually, but when you have a protoporphyrin with iron in it, it's called the heme group, and that is the group responsible for binding oxygen in your blood. It's actually the reason why your blood is red too, because you have the iron here, and when light hits this, well, you have tons of these proteins in blood. You have a slightly different protein in your muscles called myoglobin, very closely related, and that's something you should think about. So if you have hemoglobin binding oxygen in your lungs, how will hemoglobin release the oxygen to the myoglobin? It would be very slow. Enzymes? In theory an enzyme could work. It doesn't work that way. Myoglobin has more affinity for oxygen.
Yes, but that's not easy, right, because it would be easy to have myoglobin have higher affinity, to bind oxygen stronger, if hemoglobin was bad at binding oxygen. But if hemoglobin was bad at binding oxygen it would not really bind a whole lot of oxygen in your lungs, and that would not be remarkably beneficial. So the paradox: hemoglobin is excellent at binding oxygen. Myoglobin is also excellent at binding oxygen. But somehow hemoglobin has to be a Dr. Jekyll and Mr. Hyde. When hemoglobin is in the lungs it has to be insanely good at binding oxygen, but when it's suddenly out in the muscles it has to switch character and suddenly be bad at binding oxygen, so that it gives it up to myoglobin instead. We'll cover that later in the course. But these are small proteins. One of the large examples is the titin I mentioned. So this is part of your muscles. You might have covered this a bit in the course at the Karolinska, but if you look at a muscle fiber or muscle bundle, deep in the fibers you have something called a myofibril, and now I'm skipping over a couple of levels here; anybody at the Karolinska Institute would kill me, but that's fine. And far down in these fibers you have something called a sarcomere. And again, this course is not about muscles, but if you keep digging in here there are a bunch of different parts of these sarcomeres, and in particular there are a bunch of different proteins. One of them is called myosin, which has to do with part of this, and then there are other parts very close to it where you have actin, but also some strange chains of a bunch of small domains here, and it turns out that this chain is 244 different domains tied together. So most proteins would somehow coil up and form a ball or something, right?
But this one is pretty much extended, so you have one domain next to the other, and a bunch of these domains look something like this: you have a chain coming in here and then a bunch of beta strands, in this case, going back and forth (we'll cover what the beta sheets do later), and then it continues with another domain going back and forth, and so on. So what these do is that every time you move a muscle, by force here, each of these domains will actually unfold, and then when we no longer have a force applied they will refold again. So this one gives muscles their elasticity. It's a protein, and it folds and unfolds in like a microsecond or something. This is the entire reason why muscles work, and that's why it's so gigantic, right? It's these 244 different domains that give us the 30,000 amino acids in total. Nobody has determined the structure of the entire protein, but we have structures of the individual domains. And that brings us to something else: how hard are proteins? Well, the book says something I would not necessarily agree with 100%, but overall, if you look at a single domain, that is kind of like a small football or something, but not quite. Proteins do move. They're hard in the sense that it's not like a squishy ball, but they do deform. Don't think of a protein like a crystal. The book might even have said that, sorry, a solid protein behaves like a crystal. That's not quite true. That's the way we see protein structures, but there's a problem with that. How do we determine protein structures? The crystals?
Yes, I'll get to that on the next slide. The protein structures are typically determined at around 100 kelvin. And I'm not sure about you, but think of a cup of coffee, which is definitely not like a crystal. But if I now pour liquid nitrogen on it and bring it to 100 kelvin, suddenly it's going to be like a crystal. That doesn't mean that coffee behaves like a crystal. So the scary thing here is that we know relatively little about it. This was one of the first strong results that people achieved with simulations in the 1970s, that we actually realized that proteins move, and years later we can see that experimentally too: these are the so-called B factors, which describe how mobile different parts of the crystal are. And for larger proteins this is even more pronounced. The multi-domain protein that I showed you, that's entirely flexible, and in general the more domains and the larger your protein is, the more it has to move. That's the entire reason why you need those multiple domains. It costs time, energy and everything for nature to build something larger, so if you can achieve something with a small protein it's much better. And that means that when you see a really gigantic protein, we somehow need it, and that's either because, in the case of nails and skin, we need to cover a large area, or, in the case of these machines, we need to achieve something very complicated, and the complicated part usually has to do with motion. And that leads to some key elements of folding. In particular, when we say a normal protein, a native protein or something, it is well defined in the sense that it might be a bit flexible, but the neighbors with which an amino acid interacts are usually constant. It's the same; that doesn't change. And that means that a protein is kind of stable. You keep heating a protein, it will work, it will work, it will work, until it no longer works, until it denatures, and that is usually a very abrupt process. So it appears that the stability of a protein is like all or
nothing, like a light bulb. It's either broken or not; it's not kind of broken, like the projector here that's wearing out. And that brings us to how we determine these structures. X-ray crystallography: this is of course not really a course in X-ray crystallography, but it's important to know about this, because there are some amazing things here but there are also some large shortcomings. What this is, is a tiny crystal, and they have some sort of holder. Typically, to be able to determine the structure of a protein we need a couple of milligrams or something, and that can be a gigantic challenge, because we need to express and purify that amount of protein. We're getting much better at that now. Some 15, 20 years ago you could pretty much never achieve that for a membrane protein, but today we can. The only problem is that it's not enough to have a lot of protein; you also need this protein to form a crystal. I'll tell you why on the next slide. No, actually, it's going to be on the next slide. But the problem is that many things won't crystallize, in particular membrane proteins; they are in oil. But assuming that you were really lucky and you could create a small perfect crystal, where you have billions and billions of perfect copies of your protein sitting next to each other in some sort of crystal, then you can use a synchrotron. This is a brand new synchrotron down in Lund called MAX IV. It's actually not in use yet, but they turned on the electron beam this fall, so it's just a couple of months old. This is going to be one of the brightest synchrotrons in the world, so if you're interested in structural work, you'll likely have a bunch of opportunities down there. So what you do with X-rays is that we have our small sample and then we shine an X-ray beam on it, and that, in principle, is a topic for an entire course. But if you've ever been a kid and played in water, you might realize that if you poke your fingers or something in water, or just play around with it, you create
waves, right? And depending on what you do and how quickly, you can get some of these waves to either cancel or amplify each other. Sometimes you can cause some waves to be larger than others. And you can think of this: if I keep poking here, there are some circular waves going out, and I have another wave going out here, and at some points these will start to intersect. At the points where they intersect in phase, where both of these are going up, I'm actually going to get a signal that's twice as high, but where they're out of phase they will cancel. The difference is that it's not just a single molecule we're shining at but billions and billions and billions of molecules, and that corresponds to having billions of fingers poking in the water at one time. So depending on the relative orientation of, not the atoms actually, technically it's the electrons, the electrons will scatter our photons here, and then you're going to get a very characteristic pattern that corresponds to the frequencies and orientations that were favorable, that were amplified, and you get something that looks roughly like the picture here in the middle. This is not dirt; the small black points here are the so-called constructive interference. And if I were an X-ray crystallographer, which I'm not, a good X-ray crystallographer can actually start to say something about the general shape of the sample just by looking at the pattern, at the general shape of the points you see here. Typically today we no longer use film, as it says up there; today you would have a camera. What you then do is that we put this in computers and throw tons of computing time at it, actually not so much, and from that we can deduce the blue shape you see here. The blue shape here is really the electron density, because again, it's the electrons that scatter. But based on where the electrons are we can then let the computers try to fit this and
ideally, in this case, we have traced a backbone in here, so you have been able to guess what residues are where and how the protein backbone fits in this electron density. The reason why I'm showing you this is that we frequently think of an experiment as somehow the opposite of a theory or a calculation, right? But the point is that the only experimental result here is this one. The structure, technically it's an experimental structure, but there's quite a lot of modeling that has gone into this step. And in particular, I'm not sure how good your eyes are, but where are the hydrogens? You don't see the hydrogens, because the hydrogens hardly have any electrons, so they're not going to scatter. So all the hydrogens you ever see in a structure, usually we have had to place them there with a computer. Now in the case of hydrogens it's kind of easy, because we know they're pretty much one ångström away from the carbon, so that's trivial. But if you have a low-resolution structure it's possible to make errors here, so there is more modeling involved in an experiment than you think, and we're going to come back to that many times in the course. Models are not limited to theoretical chemistry. So what did we do before we had computers?
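As a brief aside on the fingers-in-water picture of scattering described above, here is a toy Python sketch, not from the lecture, of why many identical scatterers in a regular row give sharp spots. Each scatterer contributes a wave with the same amplitude but a phase shifted by a fixed step relative to its neighbor; the function name and the choice of 1000 scatterers are illustrative assumptions, and a real crystal is three-dimensional with far more copies.

```python
import cmath
import math


def scattered_amplitude(n_scatterers: int, phase_step: float) -> float:
    """Amplitude of the sum of n unit waves, each shifted by phase_step
    (in radians) relative to the previous one."""
    total = sum(cmath.exp(1j * phase_step * k) for k in range(n_scatterers))
    return abs(total)


# Two waves: in phase they add up to twice the amplitude,
# half a period out of phase they cancel.
print(scattered_amplitude(2, 0.0))
print(scattered_amplitude(2, math.pi))

# Many scatterers: only when the phase step is a whole number of periods
# do all contributions add up; slightly off, they nearly cancel.
n = 1000
on_peak = scattered_amplitude(n, 2 * math.pi)
off_peak = scattered_amplitude(n, 2 * math.pi * 1.01)
print(on_peak, off_peak)
```

With more scatterers the peaks get both taller and narrower, which is one way to see why a crystal with billions of ordered copies is needed to get a measurable signal.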
So this is one of the first structures: hemoglobin, Max Perutz, an Austrian-born biophysicist. It took him 22 years to get the structure. Can you imagine, just starting where you are now? It's one thing to start and do a 22-year project if you know you will get the structure, but nobody had ever done this before. At that point people weren't even sure whether proteins were rigid or not. They must have started their work before Fred Sanger published the result that a protein has a unique amino acid sequence, and they just kept going for 20-plus years, believing that this must be possible to solve. Every single model was built by hand: they had all these rods where you place atoms, and then you use a ruler and you measure things, you measure angles, and then you sit down with paper and pen: based on the relative positions of these atoms, I predict that I would get an X-ray scattering plot that would look roughly like this. And then you go in and measure the X-ray scattering plot and see if your model is right. And it's not quite right; we're going to need to move that back and calculate again. There's a reason why it took 20 years. So the tour de force behind this is simply insane, and he also got the Nobel Prize for this, together with John Kendrew, who determined myoglobin in parallel. This is the other thing that's so amazing. I mentioned hemoglobin and myoglobin; it turns out that they look almost the same. Myoglobin is one subunit, hemoglobin is four subunits. They didn't know. They just happened to pick proteins that were almost the same, two slightly different forms, and both of them did it at the MRC in Cambridge. If you ever visit the Laboratory of Molecular Biology, they still have a bunch of these models in the lobby, and it's simply astonishing to just look at them. This is hemoglobin today, in a molecular model and everything that we do with computers, but this is how it all started. I'm going to come back to this because there's some other work related
to that. More modern stuff: this is actually not the first membrane protein, but it's the first ion channel that a structure was published for. It's a small bacterial ion channel. That's very common, because bacterial proteins, not just channels, well, bacteria are simpler, or you could argue more efficient, than humans. They don't carry around a whole lot of extra stuff and stupid complicated domains, because they need to have a very, very efficient energy turnover, and that typically means that the bacterial structure is simpler and more compact, and that makes it easier to determine the structure too. That's why for most structures we determine, we usually start out with a bacterial model and determine the structure of the bacterial protein, and much later you can determine the structure of the corresponding human protein. So this turns out to be a small pH-regulated channel, so depending on pH it opens or closes, and the central part here, we now know, is almost identical to the one that you use for voltage-gated sensing. The difference there is that you need a voltage-gated channel because you have a nervous system. Bacteria don't bother with a nervous system. Well, their intelligence isn't quite near yours, but when it comes to energy efficiency you're not even close. The bacterium is a much more beautiful creature than we are. This is the human one. Let's see, we'll get this right, yes: we have the central part with the helices there, so that part corresponds roughly to the bacterial part here. So you can see all the extra stuff that we're carrying around, and of course this costs energy to build, but it enables us to do some pretty amazing things with our nervous system. There are some advantages to being human. Another of the very early channels: aquaporin. This is a similarly small channel that lets water in and out of your cells, which is good, because depending on pressure and everything, salinity, if you didn't have anything with which you could
control the volume of your cells, all your cells would explode at a sudden change in temperature or salt level or something, and most of us usually don't explode, because these channels let water in or out. So Peter Agre and Rod MacKinnon shared the 2003 Nobel Prize in Chemistry for these molecules. And this is actually pretty fun, because some of these slides are old, and the reason why I keep these slides is that they have actually been able to, I wouldn't necessarily say predict, a lot of these Nobel Prizes, but when you go through this Hall of Fame of important molecules, they tend to get Nobel Prizes sooner or later. So this is another molecule, the gene factory, the RNA polymerase. This is a molecule that reads DNA: it opens up DNA and then transcribes the DNA into RNA molecules. It's a gigantic enzyme. If this were to happen spontaneously, it would be so slow that life would not be possible. So this enzyme converts DNA to RNA, and then we can send the RNA on to something else. This is actually Roger Kornberg, who is in the Department of Structural Biology at Stanford, where I did my postdoc. I didn't do it with Roger, though, but with Mike. The Nobel Prize, I should know this, I think that was 2006 or so, also for chemistry. The other cool thing is that he got the Nobel Prize for one single paper, the paper that determined the structure of RNA polymerase II. You might think that people get a Nobel Prize for their career, but it's not true; it's for one paper. The second part of the gene factory is the ribosome, which, when I was your age, we drew something like this. We knew that there were two parts of it, but we knew nothing of the details. So what the ribosome does is that it takes this messenger RNA, and then inside here you actually bind some amino acids, and then the ribosome stitches these amino acids together, exactly the reaction that I spoke about, where we get one amino acid binding to the next one, and then the ribosome puts out what you call the nascent
chain, and this chain is where proteins probably start to form, already in the nascent chain, sorry, in the exit tunnel, but we don't know the details. Then, depending on whether it's a globular protein, it should just go out in the water; if it's a membrane protein, it should somehow go out in the membrane. I'll come back to that too, later. Three famous people: Tom Steitz and Peter Moore, both at Yale University, and Venki Ramakrishnan at LMB Cambridge. Nobel Prize 2009. And this is fun because I had this slide before 2009. I should start putting these up online or something, so that you can show that it's predicted. G protein-coupled receptors. Again, I don't expect you to know these proteins; it's just an example for understanding how important protein structure is. For a long time we said that nobody would ever determine the structure of a G protein-coupled receptor. Companies spent billions of dollars trying to determine one, because pretty much everything that has to do with signaling in your cells is controlled by G protein-coupled receptors. Everybody had given up, until suddenly Brian Kobilka at Stanford and Ray Stevens, almost at the same time, but Brian Kobilka was earlier, showed that they had managed to overexpress, purify and crystallize a GPCR and determine the structure of a beta-2 adrenergic receptor. And that was of course a Nobel Prize too, but not to Ray Stevens, just to Brian and Brian's former advisor. Today we have more than 25 structures of these, and there are even a bunch of structures that are not publicly available but that pharmaceutical companies have crystallized internally, because there is so much money involved in this. It's insane how many drugs target these, because this has to do with a gigantic part of everything about how cells communicate with each other. Super important. So you can even imagine doing some sort of Nobel-prize-omics: if you want to go after a Nobel Prize, you
should likely head into structural biology. There is no other area that has gotten more Nobel Prizes than structural biology, and I think that's partly because we've become fascinated with these structures. But it's so obvious that before we have a structure we don't know how something works, and when we have the structure we can suddenly start to point and understand how things work. If I had given this course a year or two ago, I would have stopped roughly there. X-ray crystallography had been the revolution; X-ray crystallography is really the tool that allows us to go from the macroscopic world and go down and see the atoms. But things change, and this is the amazing thing with science. Just in the last two or three years, there has been another method to determine structure, and that's electron microscopy. A couple of years ago there was a Nobel Prize for super-resolution microscopy. What resolution can you see with super-resolution microscopy, ballpark? Or what would be the historical limit of a microscope? How far down can you see with a microscope, if we forget about the Nobel Prize? The problem with a microscope is that you're limited by the wavelength of light, and that's in the ballpark of 500 nm. And these proteins are like 2 nm. The resolution of an X-ray structure might be 2 angstrom or something, 0.2 nm. So at 400 or 500 nm there is no way you can see the protein. If the protein fluoresces or something, with light you can see where in the cell a protein is, but you can't see the structure of the protein. Super-resolution microscopy brings that down to roughly 20 nm through a bunch of tricks. Really cool technique, but 20 nm versus 0.2 nm, it's not going to work. What you can do, though: it has to do with the wavelength of light, and light would normally be photons, right? But there's nothing to say that I have to use photons. So what people started doing, way before the life sciences, is that you can use electrons. And an electron is
technically a particle. You might not think of it as a wave, but in physics, in quantum mechanics, you have wave-particle duality for every particle, so that if you just have a particle with very high energy, it's going to start acting like a wave, and the wavelength will depend on the energy of that electron. So if you accelerate an electron to a couple of hundred thousand electron volts, it's going to behave like a wave with a wavelength of, say, 0.1 angstrom or something. So you can actually use electrons for imaging. The only problem is that you can't use normal lenses. You can use magnetic lenses; it's complicated. The other problem is that at the end you're going to need to detect these electrons, and to make a long story short, shortcomings in these detectors meant that the best thing we could hope for with an electron microscope was to see something that was in the ballpark of, say, 5 angstrom or so. So you could see a shady blob corresponding to a protein, but it was virtually impossible to see side chains or anything like that. And that was the state of things until two years ago. This was an opinion piece in Nature. There was an old song, and I forget by whom, that said the revolution will not be televised; it was a big theme in the Black Power movement in the US in the 1960s. So the title of this piece is "The revolution will not be crystallized". Suddenly there was a new generation of semiconductor detectors that can detect electrons well enough to reach a resolution of 1.7 angstrom, and this was sudden, from one month to another. And suddenly you get a bunch of the biggest groups switching. Take Venki Ramakrishnan, for instance: he got a Nobel Prize for his work on X-ray crystallography, and the entire lab has switched to cryo-electron microscopy. So this is very much led by the Laboratory of Molecular Biology in Cambridge. The cool thing with cryo-EM is that you can suddenly determine a structure directly from individual particles. Forget about over
expression, forget about purification, forget about crystallization. Those were kind of the three hard steps for a protein. Suddenly it's a matter of getting protein structures in days or weeks instead of months or years. The difference compared with X-ray, as I explained before: with X-ray you have a fairly large sample, at least a milligram, and you use it to get diffraction. This diffraction pattern you then let the computer interpret; the interference is built in, and the computer can interpret the pattern and try to get the structure back. With a cryo-electron microscope you have a very tiny protein sample, and it's frozen, which is the cryo part. Then we have a lens, which is not an optical lens but a magnetic lens. With a lens we actually create a real image, so it's not just a diffraction pattern. But you can imagine how noisy it's going to be, and you can see particles. So the way we then do this is that you take thousands or even millions of images, and then we let computers sort this out, because there is one small complication. Imagine that this is a car, but you're getting two-dimensional images of the car. It's like taking a car and slicing through it with a very sharp knife, and then you get a million random slices through the car, and from these one million random slices you want the three-dimensional structure of every single component in the car. It works, but it takes weeks of computer time, so this was not even possible before we had the latest generation of computers. And here I think there are two potential Nobel Prizes. First Richard Henderson, for his method development work here, and it could happen very soon. The other one is Yifan Cheng, who is in San Francisco, one of those who determined some of the first structures. This is a typical old cryo-EM model of this TRPV1 protein. This is a receptor that is sensitive to pain and heat. It's a channel, so when you have certain things like
capsaicin in chili peppers, they will bind to this receptor, and that will generate nerve signals that you interpret as heat. This would be traditional cryo-EM, just the gray blob here. You could see the shape, possibly, but there is no way you could see any atomic detail. These are the images with the new detectors. You might still think that this is noisy, and it is, but this is suddenly enough resolution that we can start to trace individual side chains and everything, and then you get these beautiful structures of the entire protein. This could very well be a Nobel Prize in the future too, just because it's the first high-resolution one. I think this is a really good place to take a break, and then I'm going to get back and talk a little bit about the diversity and everything after the break. So let's get started again. I'm going to summarize with this TRPV1 channel that's a potential Nobel Prize. This capsaicin molecule, exactly what it looks like isn't so important, binds out here in a subunit that's actually very similar to the voltage-gated channels. But in this case, instead of being activated by a voltage, it is activated by the binding of the molecule, and when these molecules bind, the central pore opens and conducts ions, which leads to a nerve signal. So nature tends to use simple building blocks, because there simply are not that many ways if you need something to move or influence another domain. That's something that comes back over and over again when you look at structural biology, and it is also intimately related to intelligent design. If you have ever heard the intelligent design argument, it goes: how could you somehow create something like an eye? There's no way to create an eye by pure evolution, because it's not until the eye works that the body has any use for it. The way this really works is that complicated things are built from many small building blocks. Nature tends to reuse the building blocks. The building
blocks already exist somewhere else in the genome, but the cell reuses them for something else, for instance a voltage-gated channel being turned into a pain receptor. So the reason for this diversity is two-fold. One part is, as we mentioned, that we have a pretty amazing sequence of amino acids here. But at face value it's just a long string of amino acids, and just having many different amino acids doesn't really cause anything special. There are other examples of polymers, like the plastic bags you buy, and a plastic bag isn't really characterized by being able to perform all these functions. So the special thing with proteins is that they're not homopolymers, they're heteropolymers. You have a varied composition of amino acids, and it's also very specific, and that in turn means that we get a lot of possible different ways in which these can interact. But also, with the exception of the peptide bonds here, these chains are very, very flexible. They can exist in lots of different conformations. I'm not sure if you brought this up in the bioinformatics course, but there's a classical example where you can toy a little bit with this: how many conformations are there for a typical protein? We're going to get back to what these two angles are, but if there are, say, two angles per amino acid that can move, and we sample those at, let's say, 10-degree intervals, that would mean there are 36 different values for each angle. If you then have a 100-residue protein, that would be something like 36 squared to the power of 100, which is something like 10 to the power of 311. And this is actually fun, because if you try to put this in a computer you're going to get an overflow. That number is so large that you can't even represent it in double precision on a computer. So it's an insanely large number, and only one of those is the native structure. How true is that last statement, that there's only one native structure? We'll come back to that. Good question. So what happens is
that if you look at these bonds, they can go through what you call an isomerization: you can rotate bonds in different ways, and when you rotate bonds you can either have two groups on different sides, which is called trans, or you can have these two groups on the same side, which is cis. And I'm just realizing this is probably going to... I'll skip that slide. If you think just in general about how a large molecule can move, the simplest thing would of course be that you could imagine these bonds vibrating, being stretched or compressed. The book goes through this in some detail; I'm going to need to gloss over this a bit. We won't be able to prove this until tomorrow, but it turns out that if you use infrared spectroscopy, we can see these vibration peaks very well, and they're at roughly seven times 10 to the power of 13 hertz or so. You can actually show that a frequency corresponds directly to an energy in physics, and why, I will show you tomorrow. It turns out that at room temperature, roughly 300 kelvin, there is simply so little thermal energy that hardly any of these bonds are going to be excited. Bonds are not going to be vibrating; they're more like stiff rods. They're just in the quantum ground state. So with a little bit of hand waving, you can trust me that bond vibrations are irrelevant for protein motions. The angles can vibrate a bit, say up to five degrees or so, but it's not really going to be important. It makes the protein a little bit softer, but it's not going to change the motions at all. And for this reason, the only thing you really have is the torsions. If you start to look at a chain like this, if we can't move the bonds and we can't move the angles, the only thing left is that you can rotate around some bonds. So what happens if you rotate? This is a side chain in a protein. What happens if you rotate that bond, that small one? You get these hydrogens rotating. Sure, we might have had an arginine side chain here, and then you would have an arginine
side chain moving around there, but it's not really going to change anything. If you rotate the peptide bond, what's going to happen? Remember what I said about the peptide bond. This was a trick question: you can't rotate it. It's one of those bonds that you could rotate in principle, but in practice it won't happen. Is it just when it's produced? Exactly, it can be produced in two different forms. But there are two bonds that we can rotate. There's an atom in the center of the amino acid that's called the alpha carbon. That's just chemical nomenclature; it's the first carbon that we start the entire amino acid from. So there is the bond just before the alpha carbon, from the nitrogen to the alpha carbon, and then the bond just after it, from the alpha carbon to the carbonyl carbon. Both of those are free; they're normal bonds, so you can rotate around them. If this is a molecule where you have the preceding amino acid, then, because this is a peptide bond, right, so that's stiff, you can draw a plane here, and after the alpha carbon we can also draw a plane for the next peptide bond. Both of these will be almost entirely rigid and planar, but you can rotate that plane, and you can rotate that plane, and that corresponds to the rotation around the first bond and the second bond here. That's pretty much all the important degrees of freedom you have in a protein. It's a bit oversimplified, but it's the first approximation. And because these are so important, they have special names, and these names you need to learn. They're called phi and psi. This is one of those things where, in hindsight, had we been smart, we would have picked names that are slightly more different, but sorry, you're going to need to live with it. And you need to know which is which here. In principle you can try to remember exactly, but what I would recommend you do is just remember the order: first phi and then psi, when you're going from the N
terminal to the C terminal, in the direction of the sequence. The one before each alpha carbon is phi, and the one after the alpha carbon is psi. So what do you do then if I suddenly ask you, well, which atoms are involved in this torsion? This is where you need to start thinking. One of the defining characteristics of physicists is that we are remarkably lazy. Physicists do not like to learn things by heart. So when somebody asks you to define this, you start to draw your protein. You don't have to care about the side chain. There is a nitrogen, the alpha carbon and the carbon, and then you have a nitrogen, an alpha carbon and a carbon. If you know that, and if you know that one of these bonds is just before the alpha carbon and the other one just after the alpha carbon, you can derive this with paper and pen, so you don't have to know it by heart. That's one of the reasons why I like physics: don't try to remember things, think, and rederive it if you have to. So there are two of these per residue, phi and psi, and that's of course where the argument I had before came from, when I said imagine that there are two degrees of freedom per residue. There are lots of other things you can rotate, but the thing is that if you rotate these bonds it's going to have a global effect. Rotating this bond will change the orientation of the rest of the chain, while anything I rotate in the side chain will just have a local effect. These are dihedral angles. The way we define this is really that three atoms always define a plane, right? So if you have four atoms here, atoms i, j and k define one plane, the blue one, and then j, k and l define a second plane, the red one, and this dihedral or torsion angle is literally just the angle between these planes. But of course there are two angles: there is an angle here, and there is also an angle there. Which one should it be? Should it be the larger angle here or the smaller angle there? This is the bad thing in science: if there are two ways to do it, polymer
people have decided on one standard and biochemistry people have decided on another standard. I'm not going to ask you to remember which; it doesn't really matter, since you can look it up and you will be able to derive it later, so don't worry too much about that for now. What you should rather think in terms of is this: trans is when you have two important or large groups, in this case the entire chain here and the entire chain there, on opposite sides, and cis is when they are on the same side. That you need to know. The exact angle definitions you can always look up if you need to. But the cool thing is that if we then use this definition, we can start to plot this. There are a bunch of protein structures available in the Protein Data Bank. For every amino acid, we put one angle, phi, here on the x-axis and the other angle, psi, on the y-axis. Every black dot here is one example from a protein. The details here don't really matter so much; you should see that these cluster. They are not randomly distributed. There is one big area here where it's common to find them, there is one area here, and then there are some smaller areas. You hardly find anything here. Why is that? Exactly: if you have a long chain, and we can actually have a look at that with the molecular tool here, although I didn't get to that in the break, there are some orientations where the chain will simply bump into itself, and then you can't put it there no matter how much you would like to. It's physically impossible; it would be too expensive. So it turns out that in practice there are not at all 36 squared states, but you could argue that this is one conformation and that is likely another one. So there is less freedom in these than you think, with a couple of exceptions. Remember that some amino acids were special. Glycine has lots of regions where it can be. Why is that the case?
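As a side note on the torsion definition above (the angle between the plane through atoms i, j, k and the plane through j, k, l), here is a minimal Python sketch. It assumes the sign convention in which cis is 0 degrees and trans is 180 degrees; the function name and the toy coordinates are made up for illustration and do not come from the lecture.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p_i, p_j, p_k, p_l):
    """Signed angle between plane (i, j, k) and plane (j, k, l).

    Assumed convention: 0 degrees is cis, +/-180 degrees is trans.
    """
    b1 = _sub(p_j, p_i)
    b2 = _sub(p_k, p_j)          # the bond we rotate around
    b3 = _sub(p_l, p_k)
    n1 = _cross(b1, b2)          # normal of the plane through i, j, k
    n2 = _cross(b2, b3)          # normal of the plane through j, k, l
    b2_length = math.sqrt(_dot(b2, b2))
    # atan2 keeps the sign, so we do not have to choose between
    # "the larger angle" and "the smaller angle" by hand.
    y = _dot(_cross(n1, n2), b2) / b2_length
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (hypothetical): atoms j and k sit on the x axis.
j, k = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
i = (0.0, 1.0, 0.0)
print(dihedral_degrees(i, j, k, (1.0, 1.0, 0.0)))    # i and l on the same side
print(dihedral_degrees(i, j, k, (1.0, -1.0, 0.0)))   # i and l on opposite sides
```

For phi, the four atoms would be the carbonyl carbon of the previous residue, then N, C-alpha and C of the current residue; for psi they would be N, C-alpha, C and the N of the next residue.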
It's so small that it has fewer things that bump into others. And then proline, on the other hand, that poor guy can pretty much only be in one conformation, and that's because proline has this ring where it bends back on itself, and because proline bends back on itself it has hardly any freedom at all. As you can see here, there's only one value for phi that's even possible, and then there are two different values for psi, while general residues have slightly more freedom. The proline even screws things up for the previous residue: the residue just before a proline doesn't have a whole lot of freedom either. This is going to be important when we get back to proteins. And now I can show you that slide again, because in general, with this peptide group, it turns out most amino acids like to have the peptide group in the trans shape, so that they have the oxygen of the CO group and the hydrogen of the NH group on different sides, so that the chain is stretched out. But proline is an exception. For proline you can actually also have the cis form, and that means that proline tends to kink the chain a bit. But it's more complicated than that, because you can have both cis and trans: it's usually trans but occasionally cis for proline. So whenever it comes to protein structure prediction or something, proline is complicated. Imagine you have a sequence in bioinformatics, a thousand residues, and I just swap one residue from a glycine to a proline. Now bioinformatics in general would tell you that it's just one residue out of a thousand, it's essentially the same sequence, so that's not going to change anything. But if it is a proline, it likely will, because if you put this proline right in the middle of a helix you're going to get a kink in the alpha helix at the proline. That's the reason why prolines are known as helix breakers, if you have tried to predict secondary structure in the bioinformatics course, and the reason for that is a very physical property: proline destroys the local secondary structure. And, let's jump over this, I should have
had that slide in the previous part, sorry. So these diagrams are called Ramachandran diagrams, when you put the phi angle on one axis and psi on the other one, and they tell you which structures are possible or not. This brings us to two very famous people in this field. First, Christian Anfinsen. This relates to your question. In the 1950s he postulated that proteins always adopt the structure that corresponds to the global minimum in free energy. You don't need to understand what the "free" part is now; we'll talk about that tomorrow. This is an amazing result. Remember how many degrees of freedom there were, something like 10 to the power of 300. And Anfinsen postulated that this is always determined by physics. There is no magic living substance; it's not the body that uses energy to fold the protein into a specific shape. It's entirely governed by physics. How do you think he argued for that, and what do you think he used to prove it? It's an even more beautiful result given the magnitude of it. He could not do this experiment on every protein, so he took a small protein and showed that you can put this protein in a test tube. There is no cell, there is nothing special, it's just protein in a test tube. We can heat it or, as they did, use urea: with a very high concentration of urea we can denature the protein and destroy the structure. But then he removed the urea again, and he showed that the protein recovered its function. There is no cell involved. So you can show that this peptide sequence that we had destroyed, so that it's unfolded, spontaneously refolded in a test tube without any cellular environment. He got the Nobel Prize for this in 1972, I think; I might be a year or two off. It's an astonishing result. Do you think it's true in general? Does it always hold? The amazing thing is that it's actually almost always true, for almost every single protein. They showed this for one relatively small protein. For all small single
domain proteins this is in principle true. This is the reason why we can now fold proteins in a computer. We just apply the laws of physics, add water and everything, and the protein folds. It takes a gigantic amount of computer time, unfortunately, more than we can use in the lab here, except maybe for a small sequence, but this is the reason we can predict protein structures completely from physics. And this is important, because in bioinformatics you predict protein structure based on the similarity to another protein. That would work even if you had some magic energy involved, right? That's pattern recognition. Here we're talking about the laws of physics. But it's not always true. As we keep saying, to every rule in biology there is an exception, and the exceptions are things like prion proteins and amyloids. These are proteins that misfold, the protein misfolding diseases that Stanley Prusiner in particular pioneered, and we will come back to this later in the course too. It actually turned out that in some cases there are diseases that appear to be spread by an agent that was neither a bacterium nor a virus, and that broke the rule everybody knew. Those were prions. They didn't know what a prion was, but eventually they realized that prions are misfolded proteins. So there are likely proteins that can somehow convert into a different structure, and one bad protein would somehow spread this bad structure to other proteins. So there are some proteins that appear to have multiple states, but that's complicated. But again, biology is a bit different from physics; in biology, overall, this is the rule. Yes? In Anfinsen's proposal it is the global minimum and not any local minimum, because the big thing about that statement is that it must be the global minimum. Yes, and the caveat here is that this is for small single-domain proteins. There are some gigantic proteins where you might need chaperonins and everything, but here the question becomes, remember what I said about time scales. In many cases I would argue the
reason we need chaperonins and a lot of biological machinery is not that the native state doesn't have the lowest energy, but that it would take too long for the protein to find it spontaneously. But that is debatable. This is a very interesting research field, actually. 20 years ago most people would have said that this is always true. It's not as obvious anymore, and for membrane proteins it might be, I should not say completely wrong since I'm recording this, but there are some interesting ideas with membrane proteins where we have started to change our view of how membrane protein folding works. However, in that perspective, Cyrus Levinthal had a very famous paradox. Cyrus said: remember what I said, and make it even simpler. Forget about the 36 states. In the Ramachandran diagrams we saw that there were typically two regions where each amino acid could be. Let's oversimplify: each amino acid can only have two different states. Even with two different states, there would be two to the power of 100 different conformations for a 100-residue chain. If each of these conformations took like a nanosecond or so to explore, it would still take longer than the age of the universe for nature to fold this. And still this happens. Do you know how long it takes to fold a protein in your cells?
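To put rough numbers on this argument, here is a small Python sketch. The two states per residue, the 100 residues and the one nanosecond per conformation come from the argument above; the age of the universe (taken as about 13.8 billion years) and the function name are assumptions added for illustration.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9  # assumed value, not from the lecture

def exhaustive_search_years(n_residues, states_per_residue, seconds_per_conformation):
    """Years needed to visit every conformation once, one after another."""
    # Work with logarithms so that very large counts cannot overflow.
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations + math.log10(seconds_per_conformation)
    return 10.0 ** (log10_seconds - math.log10(SECONDS_PER_YEAR))

years = exhaustive_search_years(100, 2, 1e-9)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"{years:.1e} years, about {ratio:.0f} times the age of the universe")

# The finer grid used earlier: 36 values for each of two angles per residue.
print(f"about 10^{100 * math.log10(36 ** 2):.0f} conformations")
```

Even in the oversimplified two-state picture, the exhaustive search comes out at a few thousand times the age of the universe, which is what makes the observed folding times so striking.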
Nanoseconds? Nanoseconds is a bit fast. I would say from microseconds for the very fastest ones up to a second for the very slowest ones. Apparently it happens all the time in the cell. This is a very famous paradox: we know that the argument is wrong, but it's not obvious why. We'll come back to it later in the course. The other thing I should summarize: remember that I told you about sequence to structure to function. We have an amino acid sequence. What Anfinsen is basically saying is this white arrow: the amino acid sequence uniquely defines a 3D folded structure. Then, what we haven't brought up now but I will cover later on in the course, is that this structure, the way the amino acids are organized and everything, will cause different proteins to fit to each other through specific interactions. Think of this as a key and a lock. And this combination will lead to a unique function. But the arrows only go down. Structure does not imply a sequence, and function does not imply a structure. It might be very common for a specific function to have a specific structure, but it is not unique in the other direction. And then there are complications. What Anfinsen showed is called renaturation: you can destroy a protein and refold it again. Then there are of course a couple of exceptions here that complicate things a bit. You frequently have post-translational modifications, where you might bind something or somehow change the chain. For insulin that's actually the case; it's not that simple, we cut the chain in a complicated way. And for things like hemoglobin, you need to bind other molecules to get the function. So it's a bit more complicated in practice. But this is also the power of physics: don't think about the complications. If you focus on the complications you're never going to get anywhere, so try to cut away the complications and focus on the essence. Yes, there are examples where this is not true, but the amazing thing is that it appears to be true in
99.99% of all cases, and that's rather amazing. Bob Robson had a famous cartoon. This is a bit corny, but it's all that I have: the Great Proteino, see his amazing split-second leap from fully extended to tightly coiled. And then there's this quote that you probably can't read: "I don't know how he does it." There are a couple of things we'll come back to, but later on we're going to talk a lot about Levinthal's paradox. There's one slight complication here. There is formally an omega angle too, for the peptide bond. In principle that's almost always trans; for proline it can vary. I don't think that you will ever work with the peptide bond torsion, but if you know phi and psi and suddenly you see an omega, it's good to know what that omega stands for. And then in the side chains, depending on how long the side chain is, there are one or more bonds. The side chain torsions are usually called chi, and then chi 1, 2, 3 and so on, counted outward from the alpha carbon. These we are frequently really good at predicting with bioinformatics; predicting the global ones, phi and psi, that's hard. So this is really what will govern everything we know about protein structure. We're going to come back to this later too, but the primary structure, which you're really familiar with from bioinformatics, is simply the sequence of amino acids. It's a bit stupid to call that structure, but the reason we call it structure is that everything else on the slide is called structure. So occasionally we call the sequence the primary structure; that's just the sequence of residues. Those Ramachandran diagrams I showed you meant that for most of these residues there are pretty much only one or two regions where they like to be, and this causes the very small regular patterns that are called secondary structures, which we'll talk a little bit about in the next few slides. Secondary structure elements in turn will use their side chains to interact in even more
and more complicated manners, which are called tertiary structure. Most of the pictures of proteins I've shown you are tertiary structure, so that's one chain that has been folded up. And for these very large ion channels and everything, you typically have what we call quaternary structure: multiple chains, multiple domains like these, that have typically been folded independently. These subunits then aggregate together and form an even larger protein. So the ion channels I showed you typically consist of five different chains, or five copies of one chain. Helices you have probably seen before, so I'm not going to go through them in too much detail, and I'm also going to skip a couple of slides at the end here, so if you have questions, we will take those later. I would argue that the helix is by far the most stable structure. We will see why later on in the course, but the reason for this is that it's a very local structure. This amino acid is interacting with an amino acid three or four residues away. It's not a matter of this really complicated landscape where you need to form things with thousands of forces involved; you only need to interact with your closest neighbors. In principle there are lots of different helices. You don't need to understand the details of the helices, but you do need to understand that by far the most common helix is the alpha helix, and the alpha helix is characterized by each residue making a hydrogen bond to a residue that is four neighbors away. You always have an oxygen here, this carbonyl oxygen, and then four residues away you have a nitrogen and a hydrogen. That hydrogen loves to interact with that oxygen, and we'll come back to why later. That is the entire reason why alpha helices are stabilized. The 13 here really refers to the number of atoms in the ring closed by that hydrogen bond, and for an alpha helix there are on average roughly 3.6 residues per turn. Don't bother about that; that's not so important. These other helices are not really important either. But for an alpha helix, think of the hydrogen bonds, because that's really how you're going to be able to tell them apart. When you hydrogen bond to something four neighbors away, that's going to be the most relaxed of the helical structures, and again, the vast majority of all helices you're going to see in the PDB are alpha helices. It is possible for nature to take a helix and twist it harder, and then, rather than hydrogen bonding with something four residues away, you're hydrogen bonding with something three residues away. That leads to a very tightly wound helix, and that's the 3-10 helix. You occasionally see that in the PDB. The pi helix is the opposite; that's one where we have five residues between them. That pretty much never occurs; you can forget about the pi helix. We do see the 3-10 helix now and then in the PDB, in particular in those voltage-gated channels. In the voltage-gated channels we have those arginines, and normally, if you have an arginine and three residues later you have another arginine, they're going to be placed like a spiral staircase all around the helix. But in this type of helix you will get all the arginines on the same side, so that's likely why nature might use it. Take-home message here: the alpha helix is where we're hydrogen bonding to the residue four residues away. But there is another type of secondary structure element. You don't need to put the residues in a helix; you can just take the residues and put them straight. Which one of these would be more stable? What's the difference between them? That's a beta strand, yes, which by itself is not a local structure element; strands are the elements of sheets, and it can also just be a loop. So the advantage here of course is way
more flexible, so if you've got some large residues here, or something that doesn't like to be in an alpha helix, you can certainly see that this is favorable. But if you compare the two, this beta strand on its own is not particularly stable at all, because there is not a single hydrogen bond that it can make there, I think. If you put several of these next to each other, though, then you're going to get these large extended beta sheets. And that turns out to be the critical difference: the alpha helix is a local structure. Remember that I said it was stable because it's forming bonds with its closest neighbors, so an alpha helix will form just with local interactions. And that's a good question to think about until tomorrow: could you say anything about how fast an alpha helix is likely to form? In a beta sheet, on the other hand, all the interactions have to be with some other strand, and that's going to be residues from a different part of the protein, and that will give it quite different properties in the way it can form. It even turns out that there are two different ways we can pack beta sheets: they can either be parallel, where the strands go in the same direction, or they can be anti-parallel. Anti-parallel is easy, because you can just go up, down, up, down, up, down. When they are parallel, once we've taken our first sheet here, sorry, strand here, we somehow need to have another structure element to get back down here, and then go up again, and back down here, and up again. But both of these can form lots of hydrogen bonds between adjacent strands, and you see both of them in action. So what are beta sheets useful for?
Imagine that you are the divine creator or something. Yes, that is a smarter answer than you might realize, literally. So this is a small protein called FABP; I'll tell you why in a second. Here you have one layer, a beta sheet, so a beta sheet is multiple strands, and then, if you can see it, it's a bit faint, you have a second layer of beta sheet in the background. Could you imagine what this protein could do? This is a hard question, so I'll give you a little help. A special property of beta sheets is that they tend to have alternate residues facing in and out. So if you now take every second residue here and make them water liking, hydrophilic, you put them on the outside; the other ones, again every second residue, are hydrophobic, and you put them so they face the inside. What would that give you? It would be a lipid pocket that could bind things that are fat. So FABP stands for fatty acid binding protein, and that's literally what the body uses to transport things like fatty acids to a membrane, for instance, where we want them, because the fatty acid itself is not water-soluble. So it comes back to this: nature essentially uses fairly simple thermodynamics to create different patterns, and this is something that would be virtually impossible to do with helices. And of course there is no intelligent designer; it's all based on evolution. But that's why they have very different properties that we will keep coming back to. So when do you think these structures were determined? And how?
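As a rough illustration of the alternating pattern just described, the sketch below splits a strand into its two faces and measures how hydrophobic each face is. The example sequence, the set of residues counted as hydrophobic, and the function names are all hypothetical choices made for this sketch; none of them come from the lecture or from the real FABP sequence.

```python
# Minimal sketch: alternate residues of a beta strand point to opposite faces.
# The hydrophobic set below is a simplified, assumed classification.
HYDROPHOBIC = set("AVLIMFWC")


def strand_faces(sequence):
    """Split a strand into the residues on one face and on the other face."""
    inward = sequence[0::2]   # positions 0, 2, 4, ... assumed to face inward
    outward = sequence[1::2]  # positions 1, 3, 5, ... assumed to face outward
    return inward, outward


def hydrophobic_fraction(residues):
    """Fraction of residues (one-letter codes) that are in HYDROPHOBIC."""
    if not residues:
        return 0.0
    return sum(r in HYDROPHOBIC for r in residues) / len(residues)


# Hypothetical strand with alternating hydrophobic and polar residues.
strand = "VKITVEVR"
inward, outward = strand_faces(strand)
print("inward face :", inward, hydrophobic_fraction(inward))
print("outward face:", outward, hydrophobic_fraction(outward))
```

A strand whose two faces differ strongly in this fraction is the kind of pattern that could line a fatty pocket on one side while staying water soluble on the other.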
That's a good guess, but it's wrong. It's more impressive: they were predicted theoretically, before the first X-ray structures of proteins were determined. It's a very famous set of 8 papers in PNAS in 1951, where they determined both the alpha helix and the pi helix; that's why I include the pi helix. Basically they predicted every single helix, and they predicted both parallel and anti-parallel beta sheets, everything, before anybody had seen the structures, just from physics. And this is the 1950s; it's not like they had a supercomputer. It was paper-and-pen physics. It's pretty impressive, right? No idea? That's a significantly younger version of Linus Pauling, and this is Corey, I think. Linus is a very fun character. There is a review by Dave Eisenberg of those papers; I'm not going to ask you to read all those papers. We talked before about the role of theory versus experiments, and again, this was at a time when people weren't even sure whether proteins had a well-defined structure. Today, I bet you could probably do this with a bit of work, and in hindsight this was an obvious result, but they did this at a point where nobody was even sure whether there was any well-defined structure to a protein, a bunch of years before the first protein structures came out. So all this early work led to a lot of investment and to people being interested in determining structures. People spent a whole lot of work trying to determine structure experimentally too, and one of the most important molecules that people went after was DNA, deoxyribonucleic acid, where our friend Linus Pauling was also very active. This was a complicated start, because DNA apparently gave rise to different X-ray patterns depending on whether it was dry or wet, so whether it was a crystal or somehow had water in it: completely different shapes of the diffraction maps in X-ray. So there
were a bunch of people studying different diffraction patterns, and people came up with different models, because again, we did not have the computers you need to build a reasonable model and work out what makes sense here. So the structure of DNA was an important topic in the 1950s. And what you can see here, these are some of the first X-ray maps. This is not an X-ray course, but if you start looking at the specific diffraction patterns here, it turns out that you can relatively easily see this characteristic X shape, and that's a very strong indication that you have something helical. From a structure point of view it's essentially a zigzag, going back and forth and crossing itself multiple times. So there was a lot of consensus early on that there has to be some sort of helical structure to DNA. And what you see here is Linus Pauling's model of the structure of DNA, published in the Proceedings of the National Academy of Sciences in 1953. He proposed that DNA was a trimer with the backbone facing the inside and the bases pointing out. I would not call this a failure. Again, in hindsight we never show bad results. You can say lots of things about it, but it's not the structure of DNA. It's not bad; it certainly is helical, I bet it has exactly the right helical periodicity, and it fit really well with the data. But the problem with theory is that it's not always right; you can very easily go wrong. But you can go wrong in experiments too. Well, I think the less we say about that structure the better. But you can win Nobel Prizes even though you have some structures that are not necessarily first class. And then we had Rosalind Franklin in particular, and based on Rosalind Franklin's X-ray maps you had two young brats, Jim Watson and Francis Crick, who came up with a slightly different model of DNA. And do you know what their background was?
Because you might think of yourselves as transdisciplinary; I'm not sure whether you think of yourselves as physicists, chemists or something. Physicists and mathematicians? No. So Jim Watson was trained as an ornithologist and Francis Crick was a physicist. Now of course today we would call them molecular biologists, because their discovery kind of founded molecular biology, together with Max Perutz and some others. If you are in the middle of the interesting stuff you're always going to feel that you're a bit post-disciplinary. Research is most interesting at the point where the map is still white, and that's where we don't really have good names for anything. So Watson and Crick came up with one key amazing result. Do you know what that was? Actually, I'm not sure how much you know about DNA, but they came up with the reason why this one is wrong. DNA has a bunch of phosphate groups on its backbone, and I'm not sure how much you know about phosphates, but it's a phosphorus surrounded by a bunch of negatively charged oxygens, so there's a very large negative charge. Can you see any problem with that here?
Exactly. So with tons of phosphates here right next to each other, well, the phosphates would rather like to be as far apart from each other as possible, right? It's much smarter to put the phosphates on the outside and have these bases pointing inwards. Nasty people would say that Francis Crick did all the calculations and then you had Jim Watson just sitting and building the molecular models of the structures, but I'm not going to say that. This is a really cool paper that's available, and I think you should read it. It's one and a half pages, and they just propose a structure for the salt of deoxyribonucleic acid. And there's a beautiful formulation at the end of this paper that says, roughly, it has not escaped our notice that our proposed model also provides a mechanism for the transfer of the genetic material. And that is pretty much the start of everything we do in this building. And as I said, it's a beautiful British understatement, "it has not escaped our notice"; they didn't say more about it. In a paper today you would of course have three follow-up papers focusing just on that aspect, and they would have spent six pages of supporting information on it in the original paper. I think it's a really fun paper, and it founded much of what we do. This paper was published in 1953. Do you remember when they got their Nobel Prize? '62. Sadly, since that time people have pretty much criticised that Rosalind Franklin should have had the Nobel Prize too. I think that's a bit awkward, because Rosalind Franklin unfortunately died from cancer very young, so had she been alive I think she would have shared it, but we won't know. But an interesting question: why did it take nine years for them to get the Nobel Prize? I'm going to leave you hanging; I'm not going to tell you, so think about that. The cool thing is that all the notes on Nobel Prizes are secret for 50
years. For 1962, the secrecy of those notes was lifted in 2012, so since a couple of years I do know the reasoning of the Nobel committee, why they awarded them the Nobel Prize in 1962 and not in 1954 or something. I will tell you tomorrow, but think about why it took nine years; it's an important lesson in science. So there is one last thing, just to follow up on the Ramachandran diagrams that I showed you. The point with these Ramachandran diagrams is of course that the regions that are really well populated correspond to alpha helices and beta sheets, and some of these other strange regions are disordered, but they're not really that important. That's pretty much the reason why we don't have more secondary structures in proteins: there are only two regions in the Ramachandran diagram that are really favorable, and nature has used that to form the stable secondary structures. That doesn't mean that all proteins will form that, but here we have this interplay, with what is physically possible on one hand and evolution on the other, because there are patterns that are possible and that will be very stable, and evolution will naturally cause proteins to mutate to reach those patterns because they are stable. Even evolution could never cause you to reach this area, because there is nothing stable there. And in the same way, there are certainly many residues that can sit in an alpha helix, but if things are not stable in an alpha helix, that protein will likely not have a very stable structure, and if a protein won't have a stable structure it will likely not form. And that brings us to another question that we'll bring up later in the course and that you could think about: will all sequences form proteins? What is the likelihood that a sequence will form a protein? Is it like 50%, 10%, 1%, 0.1%? Do you have any idea? Yes, all sequences: if you just create a random sequence, what is the likelihood that a random sequence will
form a protein? You mean a functional one? Forget about function, just structure: a well-defined structure, not just secondary structure but a well-defined tertiary structure. So your guess was insanely high. Take 10 to the power of minus 10 or minus 20. Virtually no random sequence will form a stable protein, but this is something nature has done with 4.3 billion years of evolution. There are a couple of slides on the stabilisation of... actually, we have 20 minutes, so I can include them after all. The reason for including these slides is that I'm going to come back to them tomorrow, so I figured that it's good for you to have seen a bit. When it comes to helices, look at this peptide bond: atoms have charges. You're used to thinking that ions have a full charge, plus or minus 1 or plus or minus 2, right? But it turns out that in a large molecule some atoms attract electrons better than others, and the ones that attract electrons, like an oxygen in this case, will have roughly minus half a charge localised there. Nitrogen will also attract electrons, so that will also be negatively charged, but the hydrogen here will be positively charged. And this leads to these peptide bonds in particular having a relatively strong dipole, which corresponds to a more negative charge towards the oxygen and a more positive charge towards the hydrogen. There is a unit for that called Debye, and you'll need to know what it is, but the difference in charge over a very small distance will lead to a fairly strong dipole here. But it's just one dipole; that doesn't really matter, or does it?
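To put a number on the Debye unit mentioned above, here is a minimal sketch that treats a dipole as two opposite point charges a fixed distance apart. The half unit charge follows the rough figure quoted in the lecture; the 1.2 Å separation and the function name are assumptions made only for this illustration, and a real peptide group has more than two partial charges.

```python
# Minimal sketch: dipole moment of two opposite point charges.
# One elementary charge separated by one angstrom is about 4.803 debye.
DEBYE_PER_E_ANGSTROM = 4.803


def dipole_debye(charge_e, separation_angstrom):
    """Dipole moment in debye for charges +q and -q (in units of e)
    separated by the given distance (in angstrom)."""
    return abs(charge_e) * separation_angstrom * DEBYE_PER_E_ANGSTROM


# Assumed values: half a unit charge, roughly one bond length apart.
mu = dipole_debye(0.5, 1.2)
print(f"two-point-charge dipole: {mu:.2f} D")
```

Even with these crude inputs the result is a few debye, which is why a charge difference over such a short distance already counts as a fairly strong dipole.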
Because in a helix you don't have one dipole. So what happens? You remember when we showed this alpha helix, that one oxygen here formed a hydrogen bond to a hydrogen four residues away. So what happens when you line up all this in an alpha helix is that all these dipoles add up and point in the same direction. So when you have, say, 20 residues here you can have a 20 times stronger dipole, because all these dipoles line up in the same direction. Actually, they point in the other direction, so it's as if you had a positive charge at this end of the helix here and a negative charge at that end of the helix there. So a helix is going to be like a small, relatively charged rod. You can think of that as a purely physical detail, until you remember the KcsA channel that Rod MacKinnon determined, the first potassium channel structure. This is a very, very special protein because it conducts ions, but it's selective for one type of ion. Let me show you the parts I'm talking about. So here you have a pore, and then you have what we call a large cavity here where water is, and here you have the so-called selectivity filter, so this is the one that selects which ions go through. In KcsA the K just means that it's an ion channel that is conducting potassium ions, K. So it's conducting potassium ions, but not sodium, for that matter. Can you see any problem with that? Potassium is bigger. So this is a hole that lets through big things but does not let through small things, and to me that's a bit of a special hole. How do you stop small things from going through while you let through big things? So it's like you said: bigger. What happens here is that you have all these helices, there are four of them, so think of these like some sort of beams pointing in. When an ion comes here, an ion is never just an ion. Any ion in reality will carry around what?
Water. If you have a positive ion, water will turn its oxygen towards the positive ion. And this is a bit of electrostatics, but the smaller an ion is, the closer the water can get to that positive charge, and then it's going to bind the water harder. So the smaller an ion is, with the same charge, the stronger it's going to bind this water. So what these four helices do is that they essentially create a binding site for the ion here and stabilize the ion. An ion such as potassium, which is relatively large, doesn't bind its water so strongly, so the potassium ion actually lets go of its hydration water and is happy to be stabilized by these four helices instead. The sodium, on the other hand, is so small that it can't be stabilized by all four helices at the same time, so it's basically jumping between one of them at a time, and it's also binding its water too hard, so the sodium won't let go of its hydration water. So the potassium ion, as an ion, is larger, but when the potassium has lost its hydration water it's really easy for it to go through the pore here. The sodium doesn't lose its hydration water, and when the sodium has all its hydration water it's way too large and can't get through. The cool thing with this is that it is so efficient. You could imagine that this is going to be really complicated, with high energy barriers or something, but to the potassium ion this is as efficient as if it was just a hole there. The potassium ions are within one order of magnitude of the pure diffusion rate, so the potassium ion doesn't even feel a bump; it's just going straight through, it doesn't see anything stopping it. So how many sodium ions do you think this protein happens to leak through?
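The argument that a smaller ion holds its water harder can be sketched with a point charge interacting with an ideal, perfectly aligned point dipole. This is a crude model, and every number in it is an assumption for illustration: a gas-phase water dipole of about 1.85 D, and ion to dipole distances of roughly 2.4 Å for sodium and 2.8 Å for potassium. The function name is hypothetical as well.

```python
# Minimal sketch: energy of an ion interacting with one aligned water dipole.
# E = -k * q * mu / r^2, with k = 332.06 kcal*angstrom/(mol*e^2).
COULOMB_KCAL = 332.06          # Coulomb constant in kcal*angstrom/(mol*e^2)
DEBYE_PER_E_ANGSTROM = 4.803   # 1 e*angstrom expressed in debye
WATER_DIPOLE_DEBYE = 1.85      # assumed gas-phase dipole of one water


def ion_dipole_energy(charge_e, dipole_debye, distance_angstrom):
    """Interaction energy in kcal/mol for an ideal, aligned point dipole."""
    mu = dipole_debye / DEBYE_PER_E_ANGSTROM  # convert to e*angstrom
    return -COULOMB_KCAL * abs(charge_e) * mu / distance_angstrom ** 2


# Assumed ion-to-water distances (angstrom); rough values, not from the lecture.
e_sodium = ion_dipole_energy(1.0, WATER_DIPOLE_DEBYE, 2.4)
e_potassium = ion_dipole_energy(1.0, WATER_DIPOLE_DEBYE, 2.8)
print(f"sodium    : {e_sodium:.1f} kcal/mol per water")
print(f"potassium : {e_potassium:.1f} kcal/mol per water")
```

With the same charge, the shorter distance gives the more negative energy, which is the qualitative point: the smaller ion pays more to give up each water.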
Well, this is nature, so there will always be some that leak. 1 in 10, 1 in 100, 1 in 1000, 1 in a billion? This is pretty amazing; there is no normal machine like that. We have more errors in computers, and I'm not talking about the programs, I'm talking about the hardware. So you have these miniature machines that consist of a couple of thousand atoms, but they have both efficiencies and selectivities that are insanely high, and this is one of the reasons that, long term, we would like to engineer things this way, because this is going to be way more efficient than doing things with semiconductors. Water is also a very special molecule. We're going to come back to water lots of times later in the week, but there's this concept that water just consists of an oxygen and two hydrogens, and you probably don't think of water as being charged, but I'm going to tell you that it is. On average the oxygen has roughly minus 0.8 unit charges, so it's almost like a negative ion here, and each hydrogen has almost plus half a charge. So while you might think of water as a normal liquid, water has tons of extremely strong bonds between hydrogens and oxygens. And you have probably seen some of these classical experiments where you have a comb or something and you can deflect running water: there are such strong dipoles in the water that when you have a comb and have charged it electrostatically by combing your hair or something, the dipoles in the water will be attracted to the comb. Have you ever thought about how hard it is to heat water compared to ethanol or something? Ethanol is pretty much like water, right? Almost the same density, almost the same viscosity; they're kind of similar as liquids. It's extremely expensive to heat water; per gram it's almost twice as expensive to heat water as it is to heat ethanol. And the reason for that is that when you're pumping energy into water, we have to start breaking these bonds, and that's why water is such an excellent
carrier of energy; it can store lots of energy. But this is also going to be really important for other reasons that we'll see later this week; I'm not going to say more now. So the reason behind all this has to do with electrostatic interactions, and we're going to talk more about specific interactions tomorrow. But the special thing with electrostatics is that it's a super strong force and it decays very slowly: it goes as one over the distance. So if you take two unit charges separated by roughly one angstrom, the energy for that is going to be in the ballpark of 300 kcal per mole. Is that a large or a small energy? It's an insanely large energy. The bond rotations might be 2 kcal per mole, and those are the typical energies you see in a protein. So when you can have electrostatic interactions, a protein, or nature, will do almost anything it can to fulfill those electrostatic interactions, because they're so good. And equally, if the charges have the same sign, you're going to get an insanely strong repulsion, and that is the reason why Pauling's model for DNA was so bad. Doing bad things with electrostatics is pretty much a guarantee that it's a wrong structure. They also decay as one over r, which means that they're long-range and will influence things throughout the molecule, but we'll learn more about that later. And that leads to the last concept that I'll bring up today, the hydrogen bonds. So what was a bond? What different types of bonds do you know from chemical bonding?
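The ballpark figure for two unit charges one angstrom apart can be checked directly from Coulomb's law. The sketch below uses the usual conversion constant of about 332 kcal·Å/(mol·e²) and assumes vacuum (relative dielectric constant 1) unless told otherwise; the function and parameter names are chosen for this example only.

```python
# Minimal sketch: Coulomb energy between two point charges.
# E = k * q1 * q2 / (eps * r), with k = 332.06 kcal*angstrom/(mol*e^2).
COULOMB_KCAL = 332.06


def coulomb_energy(q1_e, q2_e, distance_angstrom, dielectric=1.0):
    """Energy in kcal/mol; charges in units of e, distance in angstrom.
    dielectric=1.0 corresponds to vacuum (an assumption of this sketch)."""
    return COULOMB_KCAL * q1_e * q2_e / (dielectric * distance_angstrom)


# Opposite unit charges attract (negative energy), equal charges repel.
for r in (1.0, 3.0, 10.0):
    print(f"r = {r:4.1f} A: attraction {coulomb_energy(+1, -1, r):8.1f}, "
          f"repulsion {coulomb_energy(+1, +1, r):7.1f} kcal/mol")
```

Because the energy falls off only as one over the distance, in vacuum it is still tens of kcal/mol at 10 Å, far above the roughly 2 kcal/mol quoted for a bond rotation.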
So, ionic. Ionic is a good start. Ionic bonds are complicated, because in a crystal they really share the electrons, but the second you dissolve, say, sodium chloride in water, it's going to be separated into sodium ions and chloride ions that diffuse around. What we typically have in a protein, if you have say one negatively charged group and one positively charged group, we typically call a salt bridge. They're extremely strong, because this would be a pure electrostatic interaction. The typical bonds in a molecule would be what we call covalent bonds, and they're even stronger, because of electron resonance. A protein will never break those bonds during normal action; it will require an enzyme or something to build a polypeptide chain. But then there are these bonds that are kind of electrostatic, like the ones I showed you: the minus 0.8 charge here on the oxygen, and then you have the hydrogen here with its roughly plus 0.4. What of course happens here is that these atoms start to attract each other so much that they almost form a bond, but not quite. And I know this sounds horrible, but on this scale this is much stronger than a normal electrostatic interaction. If you start to look at this, it's going to look as if the bond is present almost all the time; it's fairly rigid, but we can occasionally break it. In ice, virtually every single hydrogen will be part of a hydrogen bond, and each oxygen will be part of two, that is, it forms bonds with two hydrogens. And the energy for this, this is something you need to know, is roughly 4 to 5 kilocalories per mole. Compare that: the energy for rotating a bond might be 2 to 5 or something. It turns out that this is an energy range that's very important. Something that's 0.1 kilocalories per mole, you know, that's not even a speed bump; I'll come back to why tomorrow. Energies in the range of 4 to 5 start to matter, and if it's 500 or 1000 it's so large that we will never get over it biologically at room temperature.
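One hedged way to see why these three energy scales behave so differently is to compare each with the thermal energy kT at room temperature, using the Boltzmann factor exp(-E/kT). The lecture defers that explanation to the next session, so this is only an anticipatory sketch; the temperature of 298.15 K and the function name are assumptions.

```python
import math

# Minimal sketch: compare an energy cost with thermal energy at room temperature.
GAS_CONSTANT_KCAL = 0.0019872  # R in kcal/(mol*K)


def boltzmann_factor(energy_kcal, temperature_k=298.15):
    """Relative weight exp(-E/kT) of a state that costs energy_kcal (kcal/mol)."""
    kt = GAS_CONSTANT_KCAL * temperature_k  # about 0.59 kcal/mol at 298 K
    return math.exp(-energy_kcal / kt)


for energy in (0.1, 5.0, 500.0):
    print(f"E = {energy:6.1f} kcal/mol -> relative weight "
          f"{boltzmann_factor(energy):.3g}")
```

A cost of 0.1 kcal/mol barely changes the weight, 5 kcal/mol suppresses it by several orders of magnitude, and 500 kcal/mol makes it vanish for all practical purposes.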
These hydrogen bonds are the reason for DNA stabilization. So this is one strand and this is another, and every single base pair here, guanine and cytosine for instance, forms 3 hydrogen bonds, while adenine and thymine form 2 hydrogen bonds. And that's the reason why we have this specific so-called Watson-Crick base pairing, which is also something they postulated based on the model. This is the reason why we have the transfer of genetic information: we're going to have 2 copies, and then each copy will recruit a specific type of molecule that pairs up with the right hydrogen bonds. So I said that these are formed in ice; what happens when you melt ice? In water you have hydrogen bonds too. In principle, at 0 kelvin, absolute zero, every single hydrogen bond would be formed: you would have 2 hydrogen bonds per oxygen and 1 per hydrogen. So ideally we have 2 hydrogen bonds per water. At room temperature, how many hydrogen bonds do you think we have? That's a good question; let's see if we can show them. Yes, it might be a bit hard to see here, but this is a simulation of water at room temperature. It turns out that you have roughly 1.7 hydrogen bonds per water even at room temperature, because there is such an insane amount of energy stored in these hydrogen bonds that nature will do almost anything it can to try to keep them fulfilled. Hydrogen bonds are actually the reason why most proteins are water soluble; being water soluble versus not water soluble has to do with these hydrogen bonds, and I will show you later this week. It's not really until water eventually turns to the gas phase that we break all the hydrogen bonds. At any normal temperature, to a first approximation, we are closer to ice than we are to gas, and almost all hydrogen bonds are formed in water all the time. What nature will try to do is to rotate the water, or the protein, or the protein
side chains; it will try to do almost anything it can to keep these hydrogen bonds, because we gain 5 kcal per mole for every single hydrogen bond we can form, or conversely we lose that energy if we don't form the hydrogen bonds. I'm sure there is, yes, this is a summary slide. So we see that in water they are formed virtually all the time, they are the reason we have the structure we have in DNA, and they are the reason for the base pairing in DNA, but that's just repetition. I think we have 10 minutes remaining. I realize this was an insane amount of new slides, for some of you at least, but this is kind of deliberate: I want you to get started reading the book. How the bonds work in proteins we will talk about tomorrow. So what I would suggest: first, this was chapters 1 and 2 of this book, Protein Physics. Do read through them. In this particular case I think I covered virtually everything you need to know in the lecture, and that's why I pushed through a lot of slides, so try to skim through chapters 1 and 2, but also try to start looking at chapters 3 and 4 so you are a bit prepared tomorrow, and I can also take questions. I know Magnus, my colleague, kept doing a course evaluation at KTH, and basically the one recommendation from previous students is to keep reading the book: don't put off reading the book until 2 weeks into the course. The advantage is that it's a physics book, so it's short; there are no 100-page chapters, it's rather 15 pages per chapter. And if there are things in the book that I haven't covered at all, we will largely not bring them up, so you can ask me about that. Also, to help you a bit, I wrote down 20, I thought, but it's only 19, study questions. Some of these things I've covered here, some I haven't, and the simple reason is that all of this is described either in the book or in things I've covered here. If you know the answer to the vast majority of these, you know the chapter, and I'm going to try
to do that for every lecture, so you have something to follow there. What I'm going to do tomorrow is that we're going to repeat and recover some of the things I glossed over a little bit today. I think we're going to look a lot at these interactions, charges and electrostatics in proteins: what does this really mean, and why does electrostatics affect proteins the way it does? We're going to revisit our friends the hydrogen bonds and the peptide group, but that's roughly half the lecture. In the second half of the lecture we're going to start to dig into what this really means: why is it good to have things that have low energy, and why is it bad to have things that have high energy? This has to do with a concept called the Boltzmann distribution in physics, which is really statistical mechanics. We're going to bring this up in a fairly easy way, so don't worry, you're not going to have insane amounts of mathematics here. Historically we would derive this entirely with equations, and then eventually, when you have all these equations and proofs, you can start to apply it. So Bjorn and Darin have designed a very simple lab where we're going to try to just have things moving between different states. It's a super simple problem, you can even write it yourself from scratch, and then we're going to try to observe the Boltzmann distribution instead and see if we can learn things by observing rather than deriving it. I think that could be a fun, different way of approaching it, and this is going to be an extremely powerful way to start looking at probabilities in general. That in turn will be related to free energy. I will likely hold off with entropy all the way until Wednesday, but I might touch on it a little bit tomorrow, and then on Wednesday we will really go through entropy and free energy. But I think that's all I had today. Do you have any questions for me?
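The lab itself is not shown in the lecture, so the following is only a guess at what "things moving between different states" could look like when written from scratch: a two-state system sampled with the Metropolis rule, which is a technique swapped in here for illustration and not necessarily what the course lab uses. The energy gap, step count, seed, and function names are all assumptions.

```python
import math
import random

KT = 0.0019872 * 298.15  # thermal energy in kcal/mol at about 298 K


def boltzmann_population(delta_e, kt=KT):
    """Exact fraction of time spent in the upper of two states
    separated by delta_e (kcal/mol)."""
    weight = math.exp(-delta_e / kt)
    return weight / (1.0 + weight)


def simulate_two_state(delta_e, steps=100_000, kt=KT, seed=0):
    """Metropolis sampling of a two-state system; returns the observed
    fraction of steps spent in the upper state."""
    rng = random.Random(seed)
    energies = (0.0, delta_e)
    state = 0
    steps_in_upper = 0
    for _ in range(steps):
        proposed = 1 - state
        change = energies[proposed] - energies[state]
        # Downhill moves are always accepted, uphill moves with exp(-dE/kT).
        if change <= 0 or rng.random() < math.exp(-change / kt):
            state = proposed
        steps_in_upper += state
    return steps_in_upper / steps


gap = 1.0  # assumed energy gap in kcal/mol
print("observed :", simulate_two_state(gap))
print("predicted:", boltzmann_population(gap))
```

Running it for a few different gaps and comparing the observed fraction with the predicted one is a small-scale version of observing the distribution rather than deriving it.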