All right. Good morning, and welcome to the course again, in particular if anybody is just watching this online. So what I'm going to start doing today, I'm going to go through a couple of more general things that I didn't tell you before we started the recording. If there are any of you watching this online who did not sign up here today, make sure that you either send me an email or are here tomorrow, because you only have two or three days before the secretary gets upset if people are not in the course roster. This course is going to be a whole lot about molecules and very fundamental things about physics and biophysics. It's going to be a mix, mostly of things I like, because we use them in modern research. We're going to talk a lot about interactions and try to understand fundamental concepts and how these fundamental concepts cause higher level features in biology. For instance, why we can sequence genes, why proteins fold the way they do, why proteins work the way they do. I know many, if not most of you, took the bioinformatics course and the project course before this. I'm going to try to build on that. But since Arne gave you a great background on bioinformatics from the sequence and possibly structure prediction point of view, I'm going to focus more on understanding interactions and the molecules and also understanding the biology of these molecules at a higher level. I also think that it's important for you to get some sort of education. There's this classical saying that education is what remains when you've forgotten everything you learned. So I'm going to try quite a bit to talk about both the history of these fields and also the history of how knowledge is created, how we create knowledge. There are a bunch of spectacular failures, both in science in general and in these fields in particular. And it's interesting to go through them. Not so much to gloat at other scientists who failed, because of course we fail too, all the time.
And it's so easy to think that we live in the best of times and all the ideas we have right now, be it high-throughput sequencing, cryo-EM and whatever, we think that this is so the right thing to do. And it's important to remember that some 30 years ago there were people who thought that the things we now remember as great failures were so the right thing to do. It was the correct thing. But before we go through that, I'm going to give you a couple of snapshots of things that we are interested in. Do you know what this is? No, you don't, because you haven't seen it before. I would argue that this is the world's smallest machine. You've probably heard about nanotechnology and these things in engineering, right? So when engineers speak about nanotechnology, they frequently mean nano in the sense that it's not quite a micrometer, so 900 nanometers. And there are certainly beautiful machines you can make there. But these types of machines really are nanoscale machines. So this is a small part of a membrane protein that is probably, say, two nanometers across. It's probably one of the most important proteins you have in your body, because that's the small part that's responsible for all your heartbeats. So every time you have a nerve signal or something, this small blue part in the background is moving up and down. We're going to go through all these things in more detail later, but my point here is that proteins are not merely static molecules. It's not just that you should predict what the structure is, but they need to move to actually do something. And in this particular case, there is a very clear physical background to the motion. It moves because you're changing an electric field. This is another type of important protein. It's an ion channel, also part of your nervous system. And in this particular case, it sits on the receiving side of synapses, so that based on small molecules binding up here, it's opening or closing to let some things through down here.
To me, it's a marvel of nature how amazingly specific this is. It's far more specific than any computer you could ever imagine. Mistakes happen, of course, like one in a billion times. And again, this is controlled by very simple laws of nature, physics. We understand fully how these channels work. No, actually, I shouldn't say that, but we understand the mechanism of how they work; the details we might not necessarily understand. But we do understand that we can explain this from the laws of physics. There is no special magic biology involved here. You might laugh at that, but 30, 40 years ago that was not so obvious. There were lots of people who thought, well, biology is special. Not necessarily in the sense that there is some magic spirit in biology, right? But that biological systems work in completely different ways. There is no fundamental difference between these systems and a super small reaction in a test tube, apart from the fact that this is a million times more complicated, which is, of course, what makes it fun. And this is the reason why they are not just fun but important. This is what happens in the cell membrane of your cells all the time. And again, I deliberately skip the details. I don't talk about what these specific channels are right now. We're going to come back to that much later on in the course. But the idea here is to give you an idea of the importance of physics and structural studies in biology. Your cells, most of them, are kind of like a gigantic battery. So across your cell membrane, at least in all the excitable cells in your nervous system, you have a continuous flux of different types of ions through the channels. And most cells have a slight negative potential on the inside, roughly minus 0.1 volt. And the reason why you have a negative potential on the inside of a cell is that nature is making sure that you have fewer positive ions on the inside of the cell than you have on the outside of the cell.
So it's simply this redistribution of ions that creates the potential on the inside of the cell. Our bodies use this literally as a battery, because it's not just a different distribution of positive and negative ions. We actually have a very different distribution of the positive ions. So typically, you have a huge number of sodium ions on the outside, but potassium ions on the inside. And then you have these magic channels, which are literally just holes in the membranes, but they're holes that are very specific. So you have one type of hole that only lets through sodium ions and another type of hole that only lets through potassium ions. So the way your nerve impulses and everything work is that the body selectively opens one of these types. For instance, sodium ions flowing in. Then you're going to have positive charges going into the cell that will remove this negative potential. And when you remove this negative potential, your cell will actually interpret that as the start of a nerve impulse. And then a split second later, you're going to have potassium ions flowing in the other direction. And when the potassium ions flow in the other direction, you restore the balance. So then the potential drops again, and you've all seen this. It's the shape you have in an EKG, or electrocardiogram. Where does the body get all these ions from? So this is actually a much deeper question than you think. Let me rephrase this. Why do you have a membrane in the first place in your cells? Yes, well, that's the specific channel, right? But the reason why you need to enclose a cell in some way, for instance with a membrane, is that at equilibrium, you're all dead. If you mix anything in a test tube, the mixing process can be fast or slow. But eventually, if you just mix chemicals in a test tube, nothing will happen. Eventually, you will reach an equilibrium.
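To put a rough number on the ion gradients described above, one can use the Nernst equation, which gives the potential at which a single ion species would be at equilibrium across the membrane. This is only a sketch: the equation is not named in the lecture, the concentrations below are assumed textbook-style values rather than figures from the lecture, and the function name `nernst_potential` is made up for illustration.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_potential(c_out_mM, c_in_mM, charge=1, temperature_K=310.0):
    """Equilibrium potential in volts, inside relative to outside."""
    return (R * temperature_K) / (charge * F) * math.log(c_out_mM / c_in_mM)


# Assumed, illustrative concentrations in mM (not taken from the lecture):
# lots of potassium inside the cell, lots of sodium outside.
potassium = nernst_potential(c_out_mM=5.0, c_in_mM=140.0)
sodium = nernst_potential(c_out_mM=145.0, c_in_mM=12.0)

print(f"K+  equilibrium potential: {potassium * 1000:.0f} mV")
print(f"Na+ equilibrium potential: {sodium * 1000:.0f} mV")
```

With these assumed numbers the potassium value comes out negative and on the order of the minus 0.1 volt mentioned earlier, while the sodium value is positive, which is consistent with sodium flowing in removing the negative potential.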
And the problem is that if you just mixed all the chemicals in a human, eventually the human will die, right? Whether it's of age or because of lack of nutrition or something doesn't matter, but eventually, we will die. So the only way we can sustain all these life processes is that you have some sort of uneven distribution. For instance, positive ions in one place, negative ions in another place. So essentially, you save energy to be able to sustain processes later. And this energy concept, we're going to come back to it because it's so paramount in biology. You could even argue that biology is more about energy and how we distribute, generate, and consume energy. And that's the third part here. So the channels here, they're just holes. But the third part here is a small pump. So this pump is not just a hole, but the pump will use energy to move ions in the wrong direction. So if you just open a hole, you will equilibrate the distributions of ions, right? So that you eventually have the same number on both sides. But the pump can move things from a place where there are few ions to a place where there is already an excess concentration of ions. And the pump that does this is called an ATPase. I don't think I have a picture of it. Yes, there is ATP. How many of you know of ATP? Good, most of you. This is the energy molecule of life. And these pumps, it was Jens Skou who discovered this one, and he received the Nobel Prize for it in 1997. What this pump does is that ATP binds to this molecule. And then during a very complicated process that we will get back to later on in the course, it transports three sodium ions out and two potassium ions in. And in the process of doing this, it converts the ATP molecule into an ADP molecule. So it uses the energy bound in one of these phosphates to transport ions across the membrane.
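The bookkeeping of the pump cycle just described (three sodium ions out, two potassium ions in, one ATP converted to ADP) can be sketched as a toy counter. This is not a model of the real mechanism; the class `Cell`, the function `pump_cycle` and the starting numbers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    na_in: int                    # sodium ions currently inside
    k_in: int                     # potassium ions currently inside
    atp: int                      # ATP molecules available
    adp: int = 0                  # ADP molecules produced so far
    net_charge_exported: int = 0  # net positive charges moved out


def pump_cycle(cell):
    """Run one cycle: 3 Na+ out, 2 K+ in, one ATP converted to ADP.

    Returns False (and changes nothing) if there is no ATP left
    or fewer than three sodium ions inside.
    """
    if cell.atp < 1 or cell.na_in < 3:
        return False
    cell.na_in -= 3
    cell.k_in += 2
    cell.atp -= 1
    cell.adp += 1
    cell.net_charge_exported += 3 - 2  # one net positive charge leaves per cycle
    return True


# Arbitrary illustrative starting state.
cell = Cell(na_in=30, k_in=0, atp=100)
cycles = 0
while pump_cycle(cell):
    cycles += 1

print(cycles, cell)
```

The point of the counter is that every cycle exports one net positive charge, so the pump itself pushes the inside of the cell towards a negative potential while spending ATP.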
And I needed to put this intentionally upside down, because in all these places, the outside of the cell is here and the inside of the cell is here, in all three images. How frequently do you think your body does this? Yeah. So how much ATP do you think you use per day? Turnover, if you wish. Roughly your body's weight. Sure, a bit, but actually one of the largest uses of energy in your body is your nervous system and brain. And this comes back to this energy turnover, right? Because this is a way in which mammals or vertebrates in general are quite inefficient. We frequently talk about bacteria as a lower life form, but you could also argue that a bacterium is a life form that has optimized everything to sustain the life processes. There are certainly some nice things we get from our nervous system, but it's very expensive to sustain. And that's why, if you go through a bunch of other courses, there is so much in biology that really has to do with this whole energy chain. How do you convert energy? How do we turn food into energy? How do we store energy in ourselves? How do we later on use the energy to produce, well, for instance, nerve signals in your brain or anything? We will go through all these things in detail much later on. But my point here is that you see how much this is about energy, physics, processes rather than just biological information or something, whether something leads to cancer. Just a second. But of course, when you have mistakes in these processes, they are frequently what leads to disease. A small error in one of these molecules might mean that this pump is not really working. If the pump is not working, you're not gonna get the ions you need in your nervous system and you're probably gonna get a pretty horrible neurological disease. Yes. Oh, that's a good question, so let's see here. So the magnesium, so the ATP is basically an ion, right? So I think the ATP itself would be two-fold negatively charged.
And normally, so the magnesium here, I think, is bound to the ATP. Let's see, or is it a basic? The magnesium is not involved in the ATPase itself. Let me check that, but I'm fairly certain that it stays bound to the ATP during the entire process. I'm gonna go through lots of these proteins throughout this course, but the point is that there are a ton of them. There are proteins that help you catalyze reactions. For instance, splitting proteins in your stomach. There are proteins that are responsible for regulating which genes are expressed or not. There are proteins that determine who you are, skin, hair, everything. There are proteins that transport different things in your body. Proteins are basically the workhorses for everything you do. If there is something telling you something about biology and there is some molecule doing it, guess that it's a protein. That's a very safe first bet. Biophysics in general and this course in particular is very much about understanding how these complicated molecules work from simple principles. And the reason why the simple principles are important is that it's usually something going wrong with the simple principles that leads to disease. We're gonna try to explain these complex processes from atoms. We're gonna spend a lot of time understanding macromolecular structure. That might also seem boring. I can imagine that many of you went into this field because you wanted to cure diseases or help develop new drugs, right? Why should you understand the structure of molecules? Well, unless you understand the structure, you're not gonna get anywhere. So if you start a modern drug development process in a pharmaceutical company, the first thing they need to ask is: what is the target receptor, or receptors, we are after? What is the structure of these receptors? How do they interact with other molecules? What goes wrong when we get disease? Can we restore that by changing how it works?
And then you start designing a drug to do that restoration process. And that's also why, we're not gonna go through structure determination, but we're gonna talk a lot about why the structure creates the function it does. We're not gonna speak so much about prediction because Arne already did that, but we're gonna speak a lot about models. Later on in the course, you're gonna do a little bit of computer simulations, but my main idea with the models is that there is a very underestimated tool when it comes to modeling, and that's paper and pen and thinking. There are a surprisingly large number of difficult problems that you can solve just by sitting and thinking hard about them for a couple of hours. And this will likely also be one of the courses where I'm gonna force you to do a bit of math. The point of that is not that we're math fetishists, but math is actually a very useful tool to think with, using some equations. I know that many of you might have a background more in life sciences. The advantage with equations is that they can help you. When things get too complicated to try to hand wave about, the equations are what make it possible to solve problems anyway. And of course, this entire line of thinking has undergone a revolution as we've done more and more computational models and simulations. I think I already spoke a little bit about protein structure. Occasionally, you can say that this is just a protein structure course. I will deliberately try to include a little bit more information about DNA and RNA because that's important to our friends at Karolinska. And there will be some applied courses there in the last semester. But fundamentally, virtually all the features that describe proteins, they also apply to DNA, the fundamental interactions. The structure is a bit different and I will go through that, but the fundamental questions are the same. They're not different between different classes of molecules.
And the other thing that you should really think about and what I like with this topic is that, to me, this is a mixture of not just chemistry, biology, and physics, but also part computer science, thinking about information. Before I really get started on, there is some course meta-info. Stockholm University has a habit of running courses at 100%. For your sake, don't fall behind. I believe very much in freedom under responsibility. I'm not gonna require that you're present on the lectures. That's entirely up to you. If you prefer to sit at home and watch the lectures, well, if that's great for you, that's great for you. If nobody shows up here, I will eventually stop coming too. Same thing there. If you sit at home for three weeks and don't follow lectures, don't expect that I will spend half an hour going through all the things you missed. If you go skiing for a week, I trust that you will keep up with the course yourself and read up on the things you missed. Same thing with lecture notes. Don't come and ask me about the lecture notes tomorrow. It's not because I'm nasty, but I don't want to carry five lecture notes from the last week too. So I will bring the lecture notes if you want them for the same day. If you want the previous ones, they're available online to download. We typically, I don't think we average quite four lectures per week due to the holidays and everything here. This is probably around 3.5. I am well aware that that's a lot of information. I typically get to 80 slides per lecture or so. So that's 300 plus slides to go through. There will be computer practicals in the afternoon. Björn and Dari are not friends of having gigantic long lab reports for you to write. In some cases, I even know when there are few students, they occasionally even let you skip writing those lab reports if you actually listen through and it's obvious to them that you understood everything you do. Same thing there. If you can't be part of a lab, let them know. 
They will let you do the lab yourself, but if you're doing the lab yourself, we expect to get a report handed in to show that you understood the lab. So yeah, freedom under responsibility. Yes, that's up to Björn and Dari. Previously, again, last year there were so few students that, if they have time to talk to you and you understand the lab, they might even say that you don't need to hand in a report. But I like freedom under responsibility for my own team too. So they've done a great job, they've actually developed most of these labs. So I will hand that over to them and avoid micromanaging them. But if anything, they're gonna be super short reports per lab. There is an Easter break. You can actually go away already on Saturday, April 8th, and you would not have to be back until the evening of Monday, April 17th. I will not have any engagements. Actually, there will be a class here. There will be a PRACE spring school where we talk about simulations that you might be able to join at KTH if you're interested. But I'm not gonna post anything new for you to do in the course in this timeframe if you wanna go away. We tentatively hold the last lecture on Thursday, April 27th, and that will probably not be a lecture but course repetition. And then the written exam is on Friday, April 28th, and then the Monday after that is actually a red day. So you get at least a three-day holiday there if you wanna prepare for the last course of the semester. And then there is some stuff. I'm not gonna go through this since you have it in the slides. Information about the books, as I told you, and I'm gonna try to provide you with lots of extra reading material because some of this stuff is actually quite fun. And in particular, I love scientific papers. So for every lecture, there will frequently be three or four scientific papers. I don't expect you to have read them in the sense that I'm gonna interrogate you on the contents of the papers.
But I pick these papers quite carefully. We typically pick papers that are like four or five pages. They're very easy to read and they're good reading because they help you understand something. It's not gonna be complicated. So in most cases, it might very well be easier for you to skim through the paper to help you understand the concepts. Many of the authors are usually very good too. So today I'm gonna talk about the basic concepts. Some of this might be a little bit of repetition for some of you. And what you will see throughout this course, occasionally you're gonna get irritated with me because you're gonna think that I'm repeating stuff I told you three or four lectures ago. The caveat is that I'm not. Because this is complicated, we're gonna start with some, not quite mathematics, but physics, and then get on to more complicated things. I frequently try to layer this. So today, we're gonna start looking at, for instance, amino acids, DNA, RNA and proteins from the most simple structural point of view, just the atoms. Then later, either on Friday or next week, I'm gonna come back to this again. But then we're gonna start to look at this from a free energy point of view. And then we're gonna start to look at it from a structural transition point of view. So we will gradually move to a higher and higher level in the way we approach these things. And that's why it's so easy. Sorry, that's why it's so important that, the second or third time we go through this, you understand the basic stuff, because then I will assume that you know all the basic stuff. We're gonna talk about the basic physical properties mostly today, architectures of proteins and folding. And I'm gonna have a little bit of time to get into the elementary interactions of proteins. And then I'm gonna start hitting you with just a little bit of physics because biology is much more physics than you think. It's just that most of us hide this from you.
And I'm not sure whether you're gonna be happy or devastated here. Björn and Dari are gonna start having you do statistical mechanics in these labs. It's not the kind of statistical mechanics that you would take in physics, but you're gonna sit down with paper and pen and try to understand physics. I think I love the first two labs, but it's gonna be a bit of work. Most of you are probably aware of protein structure. Have you seen this way of putting it, from sequence to structure to function? Do you know what it's called? The central dogma. Yes, it's closely related to the central dogma. And the whole idea is that the sequence is basically what you have in DNA, which transfers information eventually over to protein sequences, amino acids, which give protein structures that eventually create function. This is as true for a simple protein like this, it's called the villin headpiece, as for a gigantic ion channel or the largest complexes you can imagine in the cell. There's no fundamental difference. Well, one difference is of course that, if you are in a ribosome, you might have 60 different protein chains, which are horribly complicated to understand, but the fundamental interactions that determine how they work, they're the same. But for that reason, trust me, it's gonna be easier to understand the fundamental concepts from that molecule than from that molecule. So unless you have any objections, I would suggest that we try to start with these molecules to understand it. It's gonna be easier, but before you know it, we will get here. So after the bioinformatics course in particular, I so hope that you are familiar with amino acids. And I'm not gonna go through all the details of amino acids that Arne and others have been covering, but I will try to introduce you to another way of thinking about amino acids. So amino acids are just small, you could call them stupid molecules.
Chemically, they're not really super special, they just happen to be very important to life. How many amino acids are there? This is a trick question, exactly. There are tons of them, there are probably hundreds. But you don't ever, ever hear about those hundreds because they can be super complicated. The first ones, the only ones you will ever see, are the alpha amino acids. And the alpha amino acids are the ones that have a central alpha carbon in the middle. They're pretty much the only ones that are interesting to biology. And then you also mentioned something important. There are the essential alpha amino acids, of which we typically say there are 20. There are a couple of others, selenocysteine and everything, selenomethionine, but around 20. Technically, not all of these are amino acids, but the point is that there are a small number of molecules that are very important building blocks for all biological information. These are important for a couple of reasons. We'll get back to that, sorry, we'll get back to that in a couple of slides. You probably know, well, you know everything about these from a bioinformatics point of view. So I will just let you know about a couple of things that will be important later on in this course. These are so-called zwitterions. Do you know what a zwitterion is? Yes, so it's really an ion, it is an ion in solution, in that you have a positive charge here and a negative charge here, and that means that the net charge on most amino acids is zero. If you look at these partial charges in water, it's pretty much plus and minus one, respectively. They are also stereoisomers, and that's the type of stuff that you might have studied in upper secondary school. I'm not sure whether you remember it. This is of paramount importance in biology. Did you remember this? And you know what it is? I didn't hear everybody saying yes, so let's go through it. Take any chemical compound where you have some site with four different groups bound to it.
Depending on the order you put these four groups in, it's gonna be impossible to take that molecule and rotate it to the molecule on the right. You can mirror them, but there is no way you could rotate that molecule and do that one. Normally you don't care about this at all in chemistry. They will have different physical properties the way you bend light and everything, and that's probably where you saw it, but normally the way chemists say that stereo isomers, you can only separate them physically in chemistry. They behave exactly the same way. That's not true in life science. It's true for a single molecule, but when you start building very large, complicated structures of these, such as proteins, they need to be compatible with each other. So it turns out for some reasons that the only amino acids you produce in your body are so-called L amino acids. The other type would be D. We have no idea why. Actually, I wouldn't even say that your genes code for L amino acids, but your cells make sure that the only amino acids that we extract from food and everything, biology is really about L amino acids. There are no D amino acids normally. If you started to introduce D amino acids, they could not be incorporated in proteins. They would not bind to the large molecules we have and everything. So here's actually an example where the stereochemistry pretty much separates biology into two completely different parts. You could imagine a different world where every single amino acid was a D amino acid. That would work great, but you can't mix L and D amino acids. Won't work. And they won't convert spontaneously. Actually, they might, but it would take a billion years. And before that, you're dead. Yes. Sorry? Yes. So what about glycine? Is glycine a stereoisomer? Because it doesn't fulfill this one, right? And while you're there, and that's actually the one out of the 20, that's not chiral. I'll let you in on another thing on this one. 
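The four-different-groups rule, and the glycine exception that came up in the question, can be sketched as a tiny check. This is a deliberate simplification: each substituent on the alpha carbon is represented by a text label, and the helper `is_stereocenter` and the group labels are hypothetical. A real chemistry toolkit would have to compare the full substituent structures rather than strings.

```python
def is_stereocenter(substituents):
    """True if a carbon carries four substituents that are all different."""
    return len(substituents) == 4 and len(set(substituents)) == 4


# Groups bound to the alpha carbon, written as simple labels (illustrative).
alpha_carbon_groups = {
    "alanine": ["NH3+", "COO-", "H", "CH3"],
    "glycine": ["NH3+", "COO-", "H", "H"],  # side chain is just a second H
}

chiral = {name: is_stereocenter(groups)
          for name, groups in alpha_carbon_groups.items()}
print(chiral)
```

Alanine has four different groups on its alpha carbon and so comes in mirror-image forms, while glycine carries two hydrogens and is therefore the one standard amino acid that is not chiral.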
There are not 20 amino acids that are essential. Everybody has been lying to you. You can represent the amino acids in a ton of different ways. These are different physical representations, how you draw them. But from our point of view, it's likely more interesting to think about different properties of amino acids. Yes, I have a slide about that. And you can classify them in tons of different ways. There are some that are typically positively charged: arginine, histidine, lysine. Typically negatively charged: aspartic acid and glutamic acid. Although that will depend on the pH. We'll get back to that too later in the course. There are amino acids that are polar, but not really charged. There are amino acids that have large hydrophobic side chains. Some of them are aromatic groups. And there are some special amino acids, such as cysteine, glycine and proline. You can also classify amino acids as being large or small. That matters if you're gonna pack something in a tight pocket, where you might not have room for a large amino acid. In bioinformatics, you were introduced to this from the concept of which amino acids are most likely to be mutated into another. But the alternative way of thinking about that is, of course, from a physics point of view, which amino acids are compatible with each other. If you have an aspartic acid, it's probably not gonna be too unlikely that you can mutate that one to a glutamic acid and the protein would still work. But taking an aspartic acid and mutating that to, say, a valine gives a completely different property, right? So it will likely not work. And we'll come back to that later on in the course. But both this image and everything else is a bit of a lie. This amino acid, proline, is not really an amino acid, it's an imino acid. Purely for-fun knowledge. I promise I will never ever ask you about that.
But it can be fun to know that, because in proline you have this ring where the carbon and nitrogen are first bound directly to each other, but then the whole carbon ring goes back to the nitrogen. So technically it's a slightly different class of chemical molecule. Everybody makes that mistake all the time. And I, too, will call proline an amino acid because for all intents and purposes it is. It's just not strictly one. So there are 20 of these. What determines when they're produced or not? Yes. I was a bit worried here that you had forgotten every single thing about the bioinformatics course. So this is the genetic code. A friend of order would see a problem here. Well, the problem is kind of illustrated there. You have 64 combinations of triplets, but there are only 20 amino acids. So which amino acid do you think is most common? Or in general, which of these amino acids do you think are more common in proteins and which do you think are pretty rare? It's as simple as that. Take notice, this is one of these questions I kind of like asking on tests because it appears nobody ever manages to remember this. And they come up with super complicated answers about evolution and everything, that this must be caused by selection for protein stability or something. It's super simple. Amino acids that are coded for by many combinations of triplets are gonna be more common in nature. And it's cool that the abundance of amino acids in all proteins, no matter what the species are, no matter how they're related or anything, is pretty much exactly proportional to how many triplets code for them. So this is likely also nature's way of regulating that small, simple building blocks like alanine or valine, we need lots of them. And then it makes sense to let them be coded for by many triplets. But tryptophan, for instance, we don't really need a whole lot of it. It's complicated and it's only used in special cases.
So you only need a single codon for it. Who discovered this? Yeah, when? It's newer than you think. Roughly, it's a bit over 50 years old. Prior to that, we had no idea. And of course, even I wasn't born in 1960, but it's not that old. My parents did their PhDs roughly at the time when this was discovered. The reason why we still like amino acids so much is that they have one very special property: they can polymerize, just as plastics do. So this COOH, the carboxyl group here on one amino acid, and the NH group on a second one here, they can react with each other and split off an H2O molecule and then form a so-called peptide bond. This was discovered already in the early 1900s by Emil Fischer. Sorry, actually the polymerization was probably discovered earlier than that, but Emil Fischer proposed that proteins are indeed polymers. Note the time frame from 1906 to 1960-something. It took us 60 years from the point where we understood that proteins consist of some sort of simple polymers, amino acids, until we actually found out how they were constructed. So you might think that 1964 is a very long time ago, but you are closer to the discovery of the genetic code than the time it took between understanding that proteins are polymers and actually discovering the genetic code. Science is not as fast as you think. And this polymerization is what gives the proteins a unique sequence. So, as I hope you're aware, a protein is different from plastic. If you pick anything, well, the pen I'm holding. If you forget everything that's red, if you just look at the backbone here, that's not really too different from plastic. It's a simple polymer that's just repeating. The way proteins get specific is that all the specificity is in these side chains.
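Going back to the codon counts discussed a moment ago, the mismatch between 64 triplets and 20 amino acids is easy to tabulate. The sketch below builds the standard genetic code from the usual compact one-letter layout (bases in the order T, C, A, G, with `*` marking stop codons) and counts how many codons map to each amino acid. The variable names are illustrative, and the sketch only counts codons; it says nothing about measured abundances.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' is a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map every triplet (TTT, TTC, TTA, ...) to its one-letter amino acid code.
codon_table = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid, ignoring the stop codons.
codon_counts = Counter(aa for aa in codon_table.values() if aa != "*")

for aa, n in sorted(codon_counts.items(), key=lambda item: (-item[1], item[0])):
    print(aa, n)
```

In this table tryptophan (W) and methionine (M) each have a single codon, alanine (A) and valine (V) have four, and leucine (L), serine (S) and arginine (R) have six.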
And as you will see later on, this is gonna give us kind of a lock-and-key feature, that you have all the features we need here depending on what side chains we pick. The person who discovered this was Fred Sanger. And Fred Sanger sequenced the very first peptide, insulin, in a very famous paper in 1952. That's one paper I think you should look at at least. Just seeing the paramount importance of seeing a sequence, and how he presents just one sequence, and that it's possible to determine the sequence. And today, this is something where you sequence entire human genomes in a couple of days. But the fact that it's possible to determine the sequence, and that it is unique and specific to the protein, took almost 50 years after we knew that they consisted of amino acids. I won't really go through the specific chemistry of those peptide bonds because I don't think that they're too important for you. But as you hopefully know, there is a huge span of different proteins, from relatively small proteins up to gigantic structural proteins that consist of thousands or tens of thousands of residues. And if you look at a ribosome, a ribosome here would have something like 60 different polypeptide chains. We are later on in the course gonna go through a lot of these proteins. There are some proteins that are typically fibers that determine the structure in your body. There are a bunch of different water-soluble proteins that are the ones we'll go through next. And then there's my particular love, because we work a lot with them research-wise, membrane proteins. Membrane proteins are important because, as I mentioned before about the importance of this compartmentalization in biology, membrane proteins deliberately break compartmentalization. Occasionally, they allow something to go over a membrane or they deliberately send a signal over a membrane. And that's at least why I personally find them so intimately related to biological function.
All of these are of course important, but I have a particular sweet tooth for these ones. We're gonna talk a lot about assembled proteins later on in the course. Both ion channels, but also something like hemoglobin. This is a small part, the prosthetic group, protoporphyrin, that when it binds an iron, we call it a heme group. And that iron is what makes your blood red. It binds the oxygen in your blood. Super cool molecule, one of the first structures to be determined. All your muscles are made up of a molecule called titin. I will go through this too later on. The reason why I'm showing you this is that you probably think of a protein as one small molecule or something, right? Or one sequence alignment. Just look at the hierarchy of these things. You have small proteins where each protein has some sort of feature that it can move, there is a bit of flexibility built in. On the scale of a single protein, this looks almost boring. Yeah, so what if you can have some beta sheets that can move away from each other a little bit and then you have a restoring force? It almost looks boring. But it's this fact, that you essentially have a small spring in the structure here, that then allows you to build larger and larger and larger structures and eventually get up to these layers where, depending on whether you apply an electric potential or not, you can cause the entire thing to either contract or relax. And that's of course how you get the muscular activity. So you're gonna see a ton of biological processes throughout this course where, from your high-level biological function, you can drill down, it might be one, two, three, occasionally 10 layers. But eventually you get down to very simple physical processes that determine whether something will contract or let through an ion. So for a very long time, we didn't really know much about proteins. We thought we knew much more than we did.
If you look at a small single-domain protein like a hemoglobin, they're relatively hard, not quite like a rigid ball or something, but it's somewhat easy to determine a structure of them. And most of the proteins you see if you search for proteins online, or even the ones we show you in these lectures, they typically belong to this class of proteins. These are the proteins where it's easy to predict their structure. It's relatively easy to understand why they work the way they do. If there is some mistake with them, we've usually been able to understand and solve these mistakes. These are all the other proteins. And the danger here is that you, as I, are frequently led by the things we see, and we think the things we see are the full truth. And we don't think about the things we can't see. And in proteins, that's in particular large multi-domain proteins, large assemblies, things like an entire nuclear pore complex with hundreds of different proteins that determines what can go in or out through the nucleus of a cell. Or what is it that determines the shape of your cells? And of course, if we don't know what they look like, we typically ignore them. And then we think we understand everything about proteins because proteins are small and simple. The danger is of course that the larger and more complex they are, the more likely they are related to problems and will lead to disease, for instance. And the more likely that they are very sensitive to misfolding or, say, moving between different conformations. And for most of these things, we don't have any good experimental methods to study it. It's not like you can take a movie of a protein as it's opening a channel to see what happens. So what if an ion channel is open a little bit too much or a little bit too little? Today we typically know what type of disease it leads to, but why this disease happens, we don't know.
And I would argue that in many of these fields, understanding proteins and structure, we are roughly where Watson and Crick were in 1945, that we start to think that we know biology fairly well, but we have no idea what's around the corner. All this is very much physics. We're gonna go through tons of it in the course, but to be able to understand that, the first thing we're gonna need to do is understand protein structure. And this is one area where the book is so completely outdated, which I think is great. All books are outdated here. We always present physics, science, biology as something that's finished. Yes? Sorry, just one question, what do you think? That's a great question. This is something that's soft, right? This is something that's hard. So what's the difference? They both consist of molecules. Yeah, and this is cotton, which is a bunch of small different molecules. Cotton actually gets its structure from small, simple chemical bonds, very similar to the ones we're gonna go through. But the way they get these different properties has to do with how they're connected internally, how they interact. If you just see a snapshot of something, for instance, if you just see a snapshot of a ball, it can be very hard to say, is that gonna be a soft ball or a hard ball? If you can touch it, you will know that it's soft, right? But if you only see a snapshot of it, will it move or not? I'll come back to that in a second. Because it's very much related to X-ray crystallography. X-ray crystallography is super easy when you're a physicist. It's the way to determine structure. And you can determine a structure of super complicated molecules such as sodium chloride. I've done that myself, probably almost 30 years ago. It's very fun. I think it took us three days in the lab to determine the structure of sodium chloride. It's a perfectly rigid lattice. You only have to measure the distance between the two atoms and the periodicity of the lattice. Trust me, this is hard.
It's super hard to do for sodium chloride. But the amazing thing with X-rays is the way they work. You use a large X-ray facility, such as the one in the south of Sweden, this is MAX IV. X-rays have a very, very small wavelength. And when you shine X-rays on some structure that's not just small and random, but where there is actually some sort of regularity, for instance because it's a crystal and copied billions of times, you will get a very specific diffraction pattern, and the location of all these dots will be determined by the distances between all parts of the molecules. For instance, a protein or a sodium chloride crystal. So for a sodium chloride crystal, there are only a few dots, and that makes it possible even for somebody like me to understand this with paper and pen. You can imagine the number of dots you would have for a protein, right? But at the time, there were no other methods that really could determine structures on these scales. And that's why this entire revolution really started from X-ray. Do you know what this is? Sorry? Yes, and what diffraction pattern? Yes, it's a super famous photo. So this is one of Rosalind Franklin's first images of, I think it's definitely the B form of DNA. I think it's Photo 51. And this is Linus Pauling's notes in the end about the specific crystallographic symmetry groups that he identified and everything. And there's a very special cross pattern here that has to do with the repeats you have inside the structure. Could you determine what DNA looks like based on that one? Let's be generous and say that you might have 20 data points here. So I think you can start to see the complexity of the problem, right? So there were a large number of groups that spent a huge amount of time on this, in particular one of them being Linus Pauling. Have you seen Linus Pauling's proposed structure of DNA? Ha, good.
This is one of the things where things went wrong. So Linus Pauling is probably my largest hero in physics and chemistry and biology. He's a superstar, an amazingly smart and nice person, who died in 1994. Linus Pauling was super smart because he was one of the people who very early on realized that this typical X pattern means that there is a repeat, and based on this repeat, you could even say that it has to be a helical structure, something that's helical. So Linus came up with a great idea that you have a backbone here that's sitting on the inside like a spiral stair, and then you have all the bases of DNA sticking out to the side from the spiral stair. So this is published in the Proceedings of the National Academy of Sciences, Linus Pauling's proposed structure of DNA. I forgot what the year was. So he's not the worst scientist, it's like two Nobel Prizes, kind of okay. It's a completely wrong structure. And of course at the time, I'm convinced that Linus Pauling was so convinced that he had cracked the problem. He had understood one of the most important molecules in molecular biology and he had solved the structure. But of course it was completely wrong. And the people who got it right, more or less by coincidence, were Jim Watson and Francis Crick in particular. The reason why Francis Crick got this right, and I would actually credit Francis Crick more than Jim Watson with this, is that when you have this few data points, there is no way you can invert these data points and decide what the structure must have looked like, right? Today you might wanna do that with a computer, but if we don't have the information, what do you do? So what they did is that they sat down first with paper and pen and tried to come up with a possible model. What could the structure look like? And the great thing with models is that they're cheap. You can have lots of them.
And then you say, based on my model, that looks for instance this way, based on all my distances between the atoms, and then you need to get the ruler out and measure the distances between the atoms and note them down, what is the diffraction pattern I would expect to see? And then you compare this so-called predicted or expected diffraction pattern with what you see here. If they agree, your model might be right, or there might be something you forgot. I bet that Linus Pauling's model probably mostly agrees here too, but there were probably some minor things he forgot. There was in particular one really cool thing that Watson and Crick realized: the bases in DNA, the concentrations of them in any sample, always match pairwise between A and T and between G and C. And if they match pairwise, they should somehow be paired in the structure, right? And that's of course something where Linus Pauling's model completely failed. And to Linus' credit, when they published this paper, Linus Pauling was the first one to immediately admit that his own model was wrong and theirs was correct. And that's how science makes progress. Making mistakes is great, but you should be the first one to realize when you're wrong and say that, hey, the other alternative is better. And this comes back to the whole thing with modeling. You think of models as computers, but this is really the core problem. Dare to have ideas. The way we present models in computers and everything is bad, because we only show you the good models. And I make mistakes all the time. I sit down and guess something and then I give up two days later because I realize this won't work. And this is partly the attitude I want to take in this course: dare to make an assumption and work with it and see how far you get with it. And that's also where we're gonna start throwing a bit of equations at you later. Equations are a great tool to force you to make these assumptions.
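The guess-a-model, predict-the-pattern, compare loop described above can be sketched in a few lines of Python. This is only a toy under loud assumptions: a one-dimensional "crystal" with point atoms at fractional positions, made-up atom weights, and two hypothetical candidate models. The agreement score is a simple R-factor-style sum of amplitude differences, and none of the numbers come from real data.

```python
import cmath

def predicted_amplitudes(atoms, max_order):
    """Amplitudes |F(h)| for point atoms in a 1D repeat of length 1.

    atoms is a list of (fractional_position, weight) pairs (toy values).
    """
    amplitudes = []
    for h in range(1, max_order + 1):
        f = sum(w * cmath.exp(2j * cmath.pi * h * x) for x, w in atoms)
        amplitudes.append(abs(f))
    return amplitudes

def r_factor(observed, predicted):
    """Sum of amplitude differences, normalised by the observed amplitudes."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / sum(observed)

# Pretend this is the unknown structure; we only get to see its amplitudes.
true_structure = [(0.10, 1.0), (0.35, 2.0), (0.70, 1.0)]
observed = predicted_amplitudes(true_structure, max_order=8)

# Two cheap guesses: model A misplaces the heavy atom, model B is nearly right.
candidates = {
    "model A": [(0.10, 1.0), (0.50, 2.0), (0.70, 1.0)],
    "model B": [(0.10, 1.0), (0.36, 2.0), (0.70, 1.0)],
}

scores = {
    name: r_factor(observed, predicted_amplitudes(model, max_order=8))
    for name, model in candidates.items()
}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: R = {score:.3f}")
print("best agreement:", best)
```

The point is the direction of the calculation: going from a model to a predicted pattern is easy, so you can afford to try many models and keep the one that agrees best.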
I'm gonna continue here for roughly 10 minutes or so, because I used quite a bit of time in the beginning, and then I'll let you have a bit of a break. You probably know a bunch about the chemical structure of DNA, so I won't go through too much of it. If you don't, spend 10 minutes on it. Have you seen this Compound Chem thing? It's an awesome website. And if you wanna cheat, if you wanna learn a new topic in two minutes, go to compoundchem.com and look up, for instance, the chemical structure of DNA. This small graphic tells you everything. What are the contents? What are the main things? And how does it work? Boom, you know everything about it. And there are tons. I can see if I can look up a couple more for this course. There are other molecules too, RNA, which is almost just half a DNA molecule. I guess you've all seen and heard of RNA, right? So I won't go through the details there. RNA is fundamentally different from DNA because this makes the molecule much floppier. I said floppy. Floppy. With DNA, you can probably see that this structure is a bit rigid, right? Due to all the bonds. If you look at that structure, it would be much freer to move. And while DNA virtually always has some sort of long extended structure, RNA in practice might look something like this. So this is a small transfer RNA molecule. So you have a chain here, and then it's coiled up and starts making hydrogen bonds back to itself. So you can even see all these pairs here. You can actually do a fairly accurate secondary structure prediction of RNA. So if I were to ask you a small question, if you look at these molecules, DNA versus RNA, which one of them is more stable? How long do you think an RNA molecule survives? Yes, but remember, if you're in a virus or something, you create a special environment for it and you pack it, and the virus might even infect other cells and stabilize it.
So if you've worked with RNA in the lab, you need to keep it on ice because otherwise it decays. So RNA has a lifetime of minutes. DNA, is DNA stable? Sorry? So for a very long time, we thought that DNA was so stable that it never ever decays. And did you see the Nobel Prize two years ago? So there was a student in Princeton at the time. The entire rest of the group was working on the stability of RNA. And for whatever reason, his experiments didn't work. And he got fed up with this. Eventually, he had had it with RNA. He was going to study DNA instead. And at this point, the entire group and the PI, I guess, just sighed, because this is stupid, we know DNA is stable. You can't study the degradation of DNA. But he persisted and kept studying this and eventually realized that you can actually measure the half-life of DNA too. So DNA degrades too, much slower than RNA, mind you. And today we know the half-life might be 500 years. But you can measure that there are random errors introduced in DNA. So basically the cytosine bases will turn into uracil by mistake now and then. And of course, 500 years of half-life, that doesn't sound too bad. But you have lots of DNA in your body. And of course, if the half-life for each individual small piece of DNA is 500 years, the probability that there will be some mistake somewhere in your DNA in one of your cells, that's gonna start to increase pretty rapidly, right? So it turns out that DNA is not stable. The DNA actually undergoes damage all the time. And this led this student, who's actually my uncle, Tomas Lindahl, to realize that you need to have special repair enzymes for DNA. And if you don't have these repair enzymes, you would die in a matter of hours.
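To see why a 500-year half-life is still a problem, here is a small back-of-the-envelope sketch in Python. It assumes simple first-order decay, and it treats every site as independent with the same half-life. The 500 years is the figure quoted above; the number of susceptible sites per cell is a purely hypothetical value chosen for illustration, not a measured one.

```python
import math

def expected_events(n_sites, half_life_years, window_years):
    """Expected number of damaged sites after window_years (first-order decay)."""
    rate = math.log(2) / half_life_years            # decay constant per year
    p_site = -math.expm1(-rate * window_years)      # 1 - exp(-rate * t), per site
    return n_sites * p_site

def probability_of_any_event(n_sites, half_life_years, window_years):
    """Probability that at least one of n_sites is damaged within the window."""
    rate = math.log(2) / half_life_years
    return -math.expm1(-n_sites * rate * window_years)

HALF_LIFE_YEARS = 500.0       # figure quoted in the lecture
ONE_DAY_IN_YEARS = 1.0 / 365.25
SITES_PER_CELL = 1e9          # hypothetical number of susceptible sites per cell

per_day = expected_events(SITES_PER_CELL, HALF_LIFE_YEARS, ONE_DAY_IN_YEARS)
print(f"expected damage events per cell per day: {per_day:.0f}")
print("chance of at least one event in a day:",
      probability_of_any_event(SITES_PER_CELL, HALF_LIFE_YEARS, ONE_DAY_IN_YEARS))
```

Whatever site count you plug in, the expected number of events scales linearly with it, which is why a slow process per site becomes a constant stream of damage per cell.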
So constantly, and remember this, the body uses energy to have these repair enzymes find these bad uracils in the DNA, cut them out, and then put in new bases, repairing this so that the DNA is pristine again. And over the last four decades, people have found a bunch of these different repair enzymes. There are lots of different types of damage, you can even cut off the entire spiral here. And this is something that's increasingly leading to cancer drugs. Because frequently what happens in cancer is not that the DNA is damaged, but since the cancer cells undergo so many more cell divisions, you can actually try to actively damage these processes. If you destroy the normal repair processes, that sounds really stupid, right? But normal cells have an easier time surviving this, and because cancer cells are more sensitive to it, you're basically gonna hit the cancer cells more than you hit the normal cells. So Tomas got the Nobel Prize for this in 2015 together with Paul Modrich and Aziz Sancar. It was pretty fun. The discoveries go back to the 1970s. But DNA is not everything. We're mostly gonna talk about protein structure, and I'll let you have a break in three minutes. This is another superstar, Max Perutz, and the molecule hemoglobin. So you see that they have metal rods here that he's placing atoms on. And the reason for this is that you constantly need to change your model, right? You didn't have computers at the time. This is like the 1960s. And then you needed to be able to move an atom just a little bit up or down and measure the distances between them. It took them 22 years to determine the first structure of a protein. Can you even imagine that? They started in the 1940s. And it's one thing to spend 22 years on something when you know that somebody already did it and it worked.
Could you imagine starting any project now and realizing, no, you're not gonna see the result of this until 2039 or whatever? And you're not even sure whether it's gonna work or not. The risk, it's insane. But of course, they managed to determine the structure of hemoglobin. And John Kendrew at the same lab managed to determine the structure of myoglobin. The structure of DNA, the first proteins, together with a couple of other things such as the bacteriophages and everything, that was really the birth of modern molecular biology as we know it. And molecular biology from the start was actually very much X-ray and physics and understanding processes. So frequently we credit this to Watson and Crick, but it could be important to tell that. Do you know what their backgrounds were? Yes. Yes, Crick was a physicist, and Watson? Sorry? He was an ornithologist by training, actually. I mean, you can laugh about this, but I think it's also important to realize that science always makes lots of sense when you look in the rear view mirror. But science is not made in the rear view mirror. Science is made going forward. Most of the fun new discoveries and topics that are gonna define the next two, three decades, they don't exist yet. So don't let that dissuade you from going there. I think that I've held you quite a bit. Let's take our 15 minutes of break and we can be in here at, let's say, 10.35, and then I'll try to go through the last 30 slides for today. Good, we talked about structure. So what I just told one of you here at the break is, as a physicist, it's very easy to think that a problem is simple in theory. And this problem is simple. Even though it took me three days, it's something a computer could do in 0.1 seconds, right? But can you imagine going from these two atoms to something like that? And that's of course why it takes 20 years. It's an insanely difficult problem.
The problem is so difficult that anybody who's sane should simply decide that it's impossible, we can't solve it. The other problem is that what you effectively get with X-ray crystallography is a Fourier transform of the entire protein. So instead of representing this as positions, you represent it as different types of periodic distances, wave vectors, as a sum of trigonometric functions. Now, if you had all those trigonometric functions, it would be easy, because then you could just do an inverse Fourier transform and go back. So are you familiar with Fourier transforms? So the Fourier transform is that if you have any type of signal, say an audio signal, you can represent this signal as a function of time. An amplitude as a function of time, and the signal can be anything here. Or if you're a musician, you might say that, well, you know, this signal is a combination. You have one note that's 440 hertz, a second note that's 600 hertz, and then a, well, second harmonic, whatever, 1720 hertz or something. So there are three important waves. So rather than having a long complicated signal as a function of time, you could also represent this signal in frequency space. So instead of time, we have one frequency here, one frequency there, and one frequency there. And this is general. You can always transform back and forth. What you're hitting on the keyboard is basically three notes, what you're hearing in your earphones would be this. But physically they're equivalent, just different ways of describing it. And the same way with a protein: you can describe this protein as the position of this atom and that atom, or you could describe it as a sequence of different periodicities, frequencies. In particular, because in a crystal, you don't just have one protein, right? You're gonna have trillions of them. Think of this as an infinite sequence of proteins.
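Here is a minimal sketch of the audio example in plain Python. It builds a signal from the three notes mentioned above and recovers them with a naive discrete Fourier transform. The sample rate and the number of samples are assumptions picked so that each note falls exactly on a frequency bin; a real analysis would use a fast Fourier transform from a numerical library rather than this slow textbook sum.

```python
import cmath
import math

SAMPLE_RATE = 8000            # samples per second (assumed)
N_SAMPLES = 400               # gives a frequency resolution of 20 Hz
TONES_HZ = [440, 600, 1720]   # the three notes from the lecture example

# The signal as a function of time: a plain sum of three sine waves.
signal = [
    sum(math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f in TONES_HZ)
    for n in range(N_SAMPLES)
]

def dft(samples):
    """Naive discrete Fourier transform, O(N^2), fine for a few hundred samples."""
    n = len(samples)
    return [
        sum(s * cmath.exp(-2j * cmath.pi * k * t / n) for t, s in enumerate(samples))
        for k in range(n)
    ]

# The same signal as a function of frequency.
spectrum = dft(signal)
half = N_SAMPLES // 2                     # bins above this mirror the lower ones
magnitudes = [abs(c) for c in spectrum[:half]]

# Pick the three strongest frequency bins and convert bin index to hertz.
strongest = sorted(range(half), key=lambda k: magnitudes[k], reverse=True)[:3]
recovered_hz = sorted(k * SAMPLE_RATE / N_SAMPLES for k in strongest)
print(recovered_hz)  # [440.0, 600.0, 1720.0]
```

Four hundred numbers in time collapse to three numbers in frequency, which is the same information described differently.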
This sounds like a very complicated way of describing a protein, right? The reason why this is so powerful is that when you're recording these so-called diffraction patterns, what you're gonna see on these slides, that's the Fourier transform of the protein. The only problem is that you don't see the entire signal. And this gets a little bit complicated mathematically, but if you take anything that consists of real coordinates, coordinates in space, x, y, z, and do a Fourier transform of it, you can end up with complex numbers. So every single one of these three-dimensional frequencies is gonna be a complex number, so you have a real and an imaginary part. And you can describe that as a circle, really, sorry, as a vector in a two-dimensional space. And another way of describing that is that you have a length of the vector, the amplitude, and a phase that determines how much complex versus real it is. The problem is that the second you record this in the experiment, we lose the phase. We can't record the phase. So you will only see how long the vector is. There is an arrow and it's three centimeters long, and we have no idea what direction it's pointing in. I can't take that arrow and go back. So the problem with this is that, although it works beautifully to take a structure here, if you have a model of this, and predict what the X-ray diffraction pattern should look like, that works great. But you can't take the diffraction pattern and go back to the model, because you throw away half the information. And that's why people need to spend all this time on models. But still, we have been super good at this, and people have done this for 40 years. This is the cradle of structural biology. If you want to get a Nobel Prize, go into structural biology. I'm serious. There is no other field that is close to the number of Nobel Prizes of structural biology.
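The phase problem described above can be illustrated with a toy example in Python. It uses a made-up one-dimensional "density" of eight numbers and a naive discrete Fourier transform, both of which are my own simplifications. A shifted copy of the density is about the simplest ambiguity you can show; real crystallography deals with far subtler ones.

```python
import cmath

def dft(values):
    """Naive forward discrete Fourier transform."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]

def inverse_dft(coeffs):
    """Naive inverse transform; returns the real part of each value."""
    n = len(coeffs)
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(coeffs)).real / n
        for t in range(n)
    ]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for x, y in zip(a, b))

# A toy "structure" and a second one that is the same thing moved three steps.
density = [0.0, 1.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
shifted = density[3:] + density[:3]

f_density = dft(density)
f_shifted = dft(shifted)

# What the detector records: only the length of each complex number.
amps_density = [abs(c) for c in f_density]
amps_shifted = [abs(c) for c in f_shifted]

same_amplitudes = close(amps_density, amps_shifted)
recovered_with_phases = close(inverse_dft(f_density), density)
recovered_without_phases = close(inverse_dft(amps_density), density)

print("different structures, same amplitudes:", same_amplitudes)
print("inverse transform with phases recovers it:", recovered_with_phases)
print("inverse transform of amplitudes alone recovers it:", recovered_without_phases)
```

Two different inputs give identical recorded amplitudes, so the amplitudes alone cannot tell you which one you had, while the forward direction from a model to amplitudes is always unambiguous.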
And the reason for this is that prior to having a structure, basically, look here, blank, if I ask you to understand how binding of oxygen in your blood works, you're gonna be hand-waving. Once you have this, you can start to make very detailed models of exactly how the binding works. So when you have a structure, you have something to work with. You can start describing the processes. And it's not a coincidence, this was another Nobel Prize. So virtually every one of these important structures I'm gonna show you resulted in a Nobel Prize. Membrane proteins. KcsA, I was a PhD student at the time, the very first ion channel determined. Rod MacKinnon got the Nobel Prize for this a few years later. You go from nothing, a blank paper, to actually knowing how something works. Beautiful voltage-gated ion channel. They actually determined this several years after getting the Nobel Prize. Peter Agre shared the Nobel Prize with Rod. These are water channels that regulate the osmotic pressure inside your cells. I think, well, yes, 2003. So we're gonna come back to structure in a second. There are not just proteins. If we take a step back and see where we create these proteins, it's gonna be an entire sequence. It's gonna be kind of a Nobel Prize parade, but it's fun. The first thing that we use to extract information from DNA to the messenger RNA is a small enzyme called RNA polymerase, that reads the DNA and copies the information to RNA. Roger Kornberg got the Nobel Prize for determining it. That's actually fun. I was a postdoc in the lab at Stanford right next to Roger's, Mike Levitt's, when they determined this. And of course, a few years later, Nobel Prize. All of these things are so obvious once you know the structure. But again, prior to that, we just knew, yeah, there is some sort of enzyme that somehow reads DNA. We have no idea how, it's related to tons of diseases, but you don't understand anything.
The second you have this messenger RNA, we're gonna need a ribosome that takes the messenger RNA. I love this old one, because this is from Gunnar and C. White. This picture is only some 20 years old. And the reason why this is this simple is that at the time we didn't know exactly what a ribosome looked like. So you drew something schematically, a large blob and a small blob, the large and small subunits. Now, of course, today we know what the structure of the ribosome looks like. It's a super complicated structure with tons of both proteins and RNA. Tom Steitz, Ada Yonath, and Venki Ramakrishnan got the Nobel Prize for this too. I can't even imagine the complexity of trying to determine a structure like that. All with X-ray crystallography. Signaling, G-protein coupled receptors, Brian Kobilka, Ray Stevens, Nobel, no, sorry, Ray Stevens didn't get the Nobel Prize for this. It's actually fun, because these slides are old. And when I first made the slides, Brian hadn't got the Nobel Prize for it yet. Today, we have dozens of these, and even this picture is a bit old. There are a bunch of them up here that we have structures of now. There's probably 75 plus now. We're eventually gonna understand everything about signaling in the cell. When I was your age, we thought that GPCRs were gonna be so hard to crystallize that there would never ever be structures of them. Whenever somebody says that, it's wrong. We will eventually understand what's happening. And there are pharmaceutical companies that have spent billions of dollars on determining structures of these, because it's gonna result in so many useful new drugs. So given that, you would assume that X-ray crystallography is not just important, right? But it's the thing to spend your time, money, effort on in this field. There are some other small methods such as NMR and everything, but they can't really compete when determining structures.
And this was the case until just a few years ago. And science tends to move like a step function. We think that nothing is happening. Nothing is happening. And suddenly there's a small revolution. And one of these revolutions happened roughly in 2011, called cryo-electron microscopy. This is actually a Nature piece from about two years ago. Have you heard, have you seen, there's a poem and a song called The Revolution Will Not Be Televised. So Nature did a fun blurb on this. I think it was September, yes, September 2015, that the revolution will not be crystallized. Well worth reading. I think it's available on the course site. So what you do in X-ray is that we shine an X-ray beam, which is photons, on a sample, a small sample here. And this sample will lead to scattering and interference, just like the one you see with water waves if you're dropping multiple things in water. And the interference pattern here is going to result in all these dots you see on the film or detector, that we then try to use to go back and determine the structure. Another way, which is actually quite old, is that we still have a beam, but in this case it's not photons, but electrons. So ideally what you really would like is, what if you had the world's best microscope? If you could just have a microscope that was so powerful that you could see your proteins, right? Why can't we do that? If you forget about all the shortcomings of technology and everything. So what's the problem with that? Sorry? Yes, so what's the wavelength of light? 500 what? Nanometers. How large is the protein? Yes, or a couple of nanometers if you like that. So you can't study something that's 50 times smaller than the wavelength of light. There was a Nobel Prize for this too. There are actually some tricks that allow you to go a bit further, maybe to 30 nanometers or something with super-resolution microscopy, but you're still an order of magnitude away from it.
Light microscopy will never allow you to study proteins. But if you know a little bit about popular quantum mechanics at least, you probably know that there's this particle-wave duality of matter, right? So if you take an electron and accelerate it fast enough, the wavelength of the electron gets shorter the higher the energy is. So if you accelerate this with, say, 100,000 or 200,000 or 300,000 volts, you're gonna get a wavelength of the electron that is on the order of a few picometers, so far below 0.1 nanometer. So you can use electrons as a microscope, and that's the whole idea of an electron microscope. That works great in materials science or anything. There is only one problem. If you've ever seen these pictures of atoms in the letters IBM or something, any picture of atoms you're gonna see is gonna be from an electron microscope. They've been around for almost 100 years, but 50 at least. So why haven't they been able to just solve this problem with proteins? What do you think happens to your proteins when you shine electrons with 200,000, 300,000 volts on them? Yeah, you kill it, you break the protein, right? Because these are sensitive biological samples. We're not talking about the level of metal. So in theory it works great. The only problem is that you're gonna need maybe 10,000 electrons or something before you see anything. By the time you've shot the second electron at it, you've broken it. It doesn't work. So what you need to do is take the dose down by a factor of 1,000. You need to have an extremely low intensity of electrons. So you maybe shine 10 electrons per square angstrom or something. And then you're gonna get images, but they're gonna be super faint. And since the protein is also gonna be in water or some sort of vitrified ice, it's gonna be noisy. So for a long time, we joked about this and called it blobology, because all you got was these rough blobs. Oh, we have an example there.
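If you want to put numbers on the electron wavelength, here is a small sketch in Python using the relativistic de Broglie relation, wavelength = h / p, with the momentum p taken from the accelerating voltage. The physical constants are the standard SI values; the function name is my own. At these voltages the relativistic correction matters, so it is included.

```python
import math

PLANCK = 6.62607015e-34             # J s
ELECTRON_MASS = 9.1093837015e-31    # kg
ELEMENTARY_CHARGE = 1.602176634e-19 # C
SPEED_OF_LIGHT = 299792458.0        # m / s

def electron_wavelength_m(voltage):
    """Relativistic de Broglie wavelength (metres) of an electron accelerated through `voltage` volts."""
    kinetic_energy = ELEMENTARY_CHARGE * voltage
    rest_energy = ELECTRON_MASS * SPEED_OF_LIGHT ** 2
    momentum = math.sqrt(
        2.0 * ELECTRON_MASS * kinetic_energy * (1.0 + kinetic_energy / (2.0 * rest_energy))
    )
    return PLANCK / momentum

for kilovolts in (100, 200, 300):
    wavelength_pm = electron_wavelength_m(kilovolts * 1e3) * 1e12
    print(f"{kilovolts} kV: {wavelength_pm:.2f} pm")
```

The values come out at a few picometers, comfortably shorter than the roughly 0.1 nanometer scale of atoms, so the wavelength itself is not what limits an electron microscope on proteins.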
That might be something you got from a membrane protein, not the blue and the red structure, but just the outline here, right? You could, if you were really lucky, see some rough shape. For 20 years, when I was a student, we spent like 10 minutes on cryo-EM just because all students needed to know about cryo-EM. And the second you had introduced it, you pretty much concluded by saying, as a teacher, that in practice it's pretty much a worthless method for biology. It's never gonna help anything. I'm exaggerating, but then you forgot about it and never touched it again. Nobody in their right mind would ever dream of using a cryo-EM structure for any structural or computational studies. They were too low resolution. And then suddenly in 2011, there were two companies that announced a new generation of semiconductor detectors that pretty much brought the theoretical resolution from five angstroms or so to 1.7 angstrom. So you went from that to that overnight. And in less than five years, the entire field has changed direction 180 degrees. Everybody is now doing cryo-EM. Take Venki Ramakrishnan, who got a Nobel Prize for his X-ray studies of the ribosome. There is not a single person in Venki's lab using X-ray crystallography anymore. Everybody in Cambridge has switched over to cryo-EM. We too have a brand new cryo-EM facility at SciLifeLab that I'm gonna show you later in this course. This is so amazingly cool. The problem is that the raw images are not gonna look like that. What you in practice get out is hundreds of thousands of noisy images, and we're gonna need to use insane amounts of computational processing. But the cool thing is that everything you know about structural biology is changing, and it's happening right now. 
Which also means that we can now get something like this, the TRPV1 channel. This is a membrane protein that's a pain and heat receptor. And if you eat chili peppers, for instance, this is the receptor that gives you this burning feeling, because capsaicin binds to it. We didn't have any structure of these receptors until Yifan Cheng tried it with cryo-EM, and then they got a structure of it. So this goes for all these channels and receptors that we have never been able to determine structures of before because we couldn't crystallize them. Well, there is something missing in this picture. It doesn't say crystal, right? You don't need a crystal anymore. You just need a bunch of proteins in a sample and then we put it under the microscope. So there have been generations of scientists that specialized in crystallization and overexpression. It's not needed anymore. Sorry? Here? So the point here, and we'll come back and talk about cryo-EM later. There's one small problem. You're not gonna get a three-dimensional picture of your protein. If this is my protein, what you effectively get is that if I had 100,000 of these remotes and then I took a very sharp knife and randomly cut them in different slices, you're gonna get random two-dimensional slices of your protein. And remember now, you have these proteins that consist of a couple of hundred thousand atoms, and the images are super noisy. So somewhere here, you're gonna need to take these two-dimensional images, you're gonna need to identify them, and then you're gonna need to go back to the three-dimensional structure, which is a super complicated computational problem. But here, there's one difference. You have phases, because it's a real-space image. It's not the diffraction pattern. So we can solve this computationally. I'm gonna go through that in much more detail later on in the course. The point here is, remember, this is also something that no book will tell you. 
So if we come back to this diversity, what I'm gonna speak about a little bit both today and in the next two lectures of the course is really the simple low-dimensional features, the boring parts, because you're gonna find this boring when we start, when I keep talking about single amino acids instead of all these fun physical concepts and biological concepts. But all the complexity and the features and the detail in these large molecules, and the biological function, is caused by single side chains. In many cases, we can trace a complicated disease down to a single side chain that has changed its function. It has been swapped for something else. And this is the hard part in biology, because on the one hand, you need to keep the overview. You need to understand the interactions. You need to understand the entire protein, right? But you also need to focus on the details. What is the difference between, in this case, we have a tyrosine. So there's an OH group here. What if you have a phenylalanine, without the OH group here? You can imagine it's the smallest difference possible, right? But losing that single hydrogen bond, in many cases, will lead to a disease because the protein can no longer bind to something. And this is something that at least for me is hard, because you think, well, it doesn't really matter if something has changed a little. The devil is in the detail. And part of the reason for this devil has to do with the complexity of the molecules. You have probably primarily thought about this from a bioinformatics point of view, but I'm deliberately gonna be a bit of a physicist here. Sorry. This is one amino acid, right? And then one amino acid and one amino acid, and they're bonded together by these peptide groups. This bond, it turns out, is completely rigid, and that has to do with how the electrons are placed here and everything. You can't rotate around the C to N bond here. It's gonna be effectively a double bond. 
So for each part of this protein, if you start to rotate that bond, you will rotate that group. You will rotate the aromatic group there, and you will rotate those atoms a bit. If you rotate that bond, you're gonna rotate those two methyl groups, but you won't change anything else in the structure. So any rotation or change you make in a side chain is gonna be a local change. And Arne probably introduced that in some sense when you looked at structures in bioinformatics. However, if you start to rotate these two bonds before and after each alpha carbon, you're gonna start to make large changes in the conformation of the entire chain, right? So the global structure of the protein, at least the global structure that the protein can assume, is gonna be based on what values are allowed here and how many different ways you can rotate each of these bonds before and after each alpha carbon. And these bonds even have names. You call them phi and psi. So starting from the N end, the bond before each alpha carbon is called phi, and the bond after each alpha carbon is called psi. Phi and psi. I'm gonna need to erase them, because otherwise they're gonna stay on the next slide. So each of these could in theory assume any value out of 360 degrees, right? But on the other hand, if you just change a bond by one degree, that's not gonna be a gigantic change. If you start changing the bond in the ballpark of 10 degrees or something, then it starts to be a bit different. So we assume that they can each take 36 different states. And again, this is just a ballpark estimate. You could pick 20 if you wanted to, or 30. So how many states do you think one should use? Well, the key is just the ballpark. This is how you think as a physicist. 
There is a saying that any physicist worth his salt should be able to estimate anything in the world to within an order of magnitude without actually knowing anything about it. It's not gonna be a million states. And it's not gonna be one state. Just pick a number roughly somewhere in the middle. You're gonna make a bunch of errors, but the idea is to try to make errors in both directions, and we'll see where it leads us. So this is wrong, but it's intentionally wrong. It's ballpark. So if we then pick a chain with 100 residues, there are gonna be two of these bonds per residue. That's gonna be in the order of 36 squared states per residue, right? 36 for the first one and 36 for the second. So the phi can take 36 values and the psi can take 36 values, and the total number of phi-psi combinations is gonna be 36 times 36. There are gonna be a bunch of these that are impossible, but again, don't worry about details for now. So that's gonna mean, if you now take two such residues, the first one can exist in 36 squared combinations. The second one can also exist in 36 squared combinations. And then you just keep multiplying this for 100 of them. And 100 is a small protein. So this is actually a fun number. This is gonna be 36 to the power of 200, which is roughly 10 to the power of 311. This is a bit complicated because we start doing mathematics. The reason why I limit this to 100 residues is that somewhere here, most computers can't follow anymore, because double precision on a computer can only handle numbers up to roughly 10 to the power of 308. If you started to sample these, either in a lab or in a computer, you couldn't do it. It's an astronomically large number. One of those states is gonna be the native structure. And remember, this was a small protein. So while it's superficially simple, there's an insane number of combinations you can achieve. 
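The ballpark count above is easy to check. The following is a small sketch, assuming exactly the numbers from the lecture (36 states per angle, two angles per residue, 100 residues); the variable names are just for this illustration. Python integers are exact, so the full number can be computed and compared against what a double-precision float can hold.

```python
import math
import sys

STATES_PER_ANGLE = 36     # one state per 10 degrees, a ballpark choice
ANGLES_PER_RESIDUE = 2    # phi and psi
RESIDUES = 100            # a small protein

n_angles = ANGLES_PER_RESIDUE * RESIDUES
total_states = STATES_PER_ANGLE ** n_angles          # exact integer arithmetic
exponent = n_angles * math.log10(STATES_PER_ANGLE)   # same number as a power of ten

print(f"states per residue: {STATES_PER_ANGLE ** ANGLES_PER_RESIDUE}")
print(f"36^200 is about 10^{exponent:.1f} ({len(str(total_states))} digits)")
print(f"largest double:     {sys.float_info.max:.3e}")

try:
    float(total_states)
except OverflowError:
    print("36^200 does not fit in a double-precision number")
```

Changing 36 to 20 or 30 shifts the exponent by a few tens, which does not change the conclusion that the number is far too large to enumerate.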
I'll come back to this in a second when we talk about Levinthal's paradox and others. And the point is that the number is likely wrong by a factor of 10. Do you see the point? Yeah, so it might just as well be 10 to the power of 310. Doesn't matter. We need to find out: is it large or small or realistic? This is so unrealistically large that, for all intents and purposes, it's infinity. Actually, I lied to you a little bit here, because that peptide bond can actually move a little bit too. This is something, and I'm deliberately gonna ask you to read this in the book, not so much in the interest of time, but to force you to read the book. That peptide group is normally oriented this way, what you call trans, so that you have one atom here and then two atoms and then the fourth atom on the other side. And that's normally how we always place the peptide bond, to have the protein as straight as possible. That does not hold quite as well for proline, because of this ring you have built into the proline. So for proline, you can actually also have the bond in the cis conformation, so that you start with the chain here and then the chain comes out on the same side. So that would be equivalent to starting here and then going out there and then going back there. That looks like a very small, minor detail, but it's actually one of the reasons why proline can influence protein function a lot. To make things worse, if you had to guess for a normal residue, you should always guess trans. But you can't say always trans for proline, because a real fraction of the prolines, several percent in known structures, are cis. So proline can exist in both these states. So particularly if you try to do structure prediction, did you see any proline problems when you did structure prediction? Proline is a royal pain when you wanna do structure prediction, because it's virtually impossible to guess which state it should be in just by looking at it locally. Look at this and then we'll cover this tomorrow. 
So we talked a little bit about the rigidity before. And there was a good question about that. How do we know that a protein moves? It's actually not an easy question. When we got the first X-ray structures, the only case where you can get an X-ray structure is when something is rigid, such as sodium chloride. So the fact that we can get X-ray structures of proteins means that they should be super rigid, right? You will only get diffraction if each and every protein, in billions and billions and billions of copies in the crystal, is in exactly the same conformation. So that means that we have shown that proteins are rigid. Can you see any problem with that? Sorry, but I'm telling you that it's perfectly rigid in the X-ray crystal. I have experimental proof for it. And what are those conditions? Yes, and under what conditions do we typically have them crystallized when we do the experiment? 100 Kelvin, liquid nitrogen. So what we know is that if you freeze proteins, they are very rigid. That's essentially the same argument as saying that chicken is rigid, because when I have it at minus 10 degrees centigrade, it's very hard. The problem is that life doesn't happen at 100 Kelvin. And that's the problem with X-ray crystallography, and cryo-EM for that matter. Ideally you would like to make experiments in a cell, but you can't really make these experiments in a cell. The way we know that proteins move is that we can do spectroscopy. There are a bunch of different techniques, IR, Raman spectroscopy and other things, and these are very fast. We can essentially measure how quickly bonds, either in water or a protein or something, vibrate, because we see how much energy is absorbed in these bonds when we shine light on them that corresponds to these vibrations. So there are lots of vibrations in proteins and water. 
We're gonna go through that tomorrow, but the point is that when we see proteins, we typically see them as static structures. That's not how they look in reality. That's, I think, one of the greatest things that computer simulations have brought us, that we see that proteins move. They're much softer at room temperature than we think based on X-ray crystallography. But that means that there are a number of motions that we should think about. The easiest ones might be these bonds vibrating. A small exercise could be to think a little bit about more motions that can happen. So let's look at some of these motions. Here is a small protein again. How much is it gonna influence this chain if we change that bond length by 1%? Well, it might be very important for the spectrum, right? But you wouldn't even see this. So these bond length vibrations, they're probably super important for physics, but they don't really change how the protein behaves. Or if the angle between the hydrogen, carbon, hydrogen there changes by 5%, it's just gonna be vibrating a little bit. You're not gonna see anything. But again, these two rotations around bonds, let's see, before or after that alpha carbon, they're gonna cause a tremendous change, because if you rotate that bond, this entire second half of the chain would instantly rotate down there instead. So rotations around bonds are gonna be by far the most important degrees of freedom for proteins. And as I already told you, there are a bunch of these. These are some things you need to know, but we will kinda come back to them. Phi and psi. So here's how I like to think like a physicist. You can choose. In biology or chemistry, it's very common that you wanna learn things by heart. I became, and this is probably a bad thing to say on YouTube, but I became a physicist because I'm lazy. I love medicine, but I couldn't even imagine studying these tens of thousands of pages of books. I wanna understand things. 
So when I need to know what these are, I sit down and draw my protein. I can derive this in two seconds. I only need to do it like every week or so, right? In two seconds, I know that in a protein backbone I have a nitrogen and a carbon and a carbon and a nitrogen, and that first carbon is the C alpha. And then we just continue that way. And then for each residue, I know that the bond before the C alpha is called phi and the bond after it is called psi. And then I can derive what these are. You should either be able to derive this or you need to know it by heart. You need to be able to identify what these bonds are. Arne might have talked about that in bioinformatics too. Just for fun, one more. The peptide bond here is actually called omega. You see it there? And the bonds out in the side chains are usually called chi, but that's extra information. We define these angles as so-called torsions or dihedral angles. This is also something I won't go through in detail, but if you have a plane through the first three atoms and then a second plane through atoms two, three, and four, the angle between those two planes is gonna define how much we have rotated around the bond, and that's what we call the dihedral or torsion angle. These are kind of difficult to work with, particularly in this case, because you would need some sort of molecular building toolkit to work with them. So for a large protein, it frequently makes sense to simplify. Remember how I said that we typically only have these two degrees of freedom. So for each part of a protein, it makes sense to take one axis and call that the phi angle, and a second axis and call that the psi angle. So that means that each residue in your protein has one phi angle and one psi angle. That's how we have oriented the bonds just around this residue. So different residues might have different angles, and for now we don't know anything about residues. Imagine that you're in the 1950s and people don't have these. 
You're one of the X-ray pioneers and you wanna understand how the proteins work. We know in principle how they look, but we have no idea how the larger structure is created. But if you know these angles for one residue, you could place it as a point in this diagram. So this would mean that the phi angle would be, say, minus 50 degrees and the psi angle would be, say, minus 30 degrees or something: one dot. And if you then keep placing dots here for one protein, 100 residues would mean 100 dots. If you have 10 such proteins, you might be able to create a thousand dots. These types of plots are called Ramachandran diagrams, and each black dot here is one amino acid. The reason why these are very popular, and these are something you need to know, is that it's a natural way to describe the inherent degrees of freedom in a protein. And do you see something here? I forgot what protein this is actually, but they are not randomly distributed, right? Virtually all these residues fall in some parts here, and here. You never have anything here. If you tried to put anything here, the amino acid would bump into itself. So it's completely impossible physically. There are very few of them here. The reason why you have some variation is actually that there are differences between the amino acids here. So most normal amino acids, that are chiral and have a few atoms in the side chain, will look something like the blue one over here. Since glycine doesn't really have a side chain, that one is more flexible. It's not gonna bump into itself so much. So glycine can be in many more conformations. Proline is pretty much the opposite of glycine, because proline is much more constrained than any of the other ones. So proline won't really be happy anywhere, but there are a couple of places where it can tolerate itself. And because proline binds back to the nitrogen, that's actually gonna influence the residue that comes before proline. 
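The torsion angle definition and the Ramachandran dots described above can be sketched in a few lines. This is only an illustration under stated assumptions: the coordinates are hypothetical points picked to give easy angles, not real protein atoms, and the helper names (`dihedral_deg`, `ramachandran_bin`) are made up here. The dihedral is computed as the angle between the plane through atoms one to three and the plane through atoms two to four, and the binning uses the 10-degree cells from the earlier ballpark estimate.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p1, p2, p3, p4):
    """Signed angle (degrees) between plane (p1, p2, p3) and plane (p2, p3, p4)."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1 = cross(b1, b2)  # normal of the first plane
    n2 = cross(b2, b3)  # normal of the second plane
    x = dot(n1, n2)
    y = dot(cross(n1, n2), b2) / math.sqrt(dot(b2, b2))
    return math.degrees(math.atan2(y, x))

def ramachandran_bin(phi, psi, width=10):
    """Map a (phi, psi) pair in degrees to a cell of a 36 x 36 grid."""
    cells = 360 // width
    return (int((phi + 180) // width) % cells,
            int((psi + 180) // width) % cells)

# Hypothetical coordinates, chosen only to give easy angles.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))  # all four atoms in one plane, same side
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # quarter turn around the middle bond
print(ramachandran_bin(-50, -30))                                 # the example dot from the lecture
```

For phi you would feed in the four backbone atoms around the N to C alpha bond, and for psi the four atoms around the C alpha to C bond; counting how many residues land in each cell gives the density you see in a Ramachandran diagram.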
So the residue just before a proline can almost only be up here. So it actually turns out that you probably know a little bit about protein structure already. We're gonna come back to that in a second, but all the protein structure we see corresponds to two or three areas in this Ramachandran diagram. So why do I go through this now? Well, remember the thing I told you about taking things in layers? Five minutes ago, I said that we have two degrees of freedom per residue, these two angles. And I said that there could be maybe something like 36 different states for each angle. And then I said that per such residue, you would then have 36 squared states, right? Which is in the ballpark of 1,000. Do we have 1,000 different states per amino acid? No, we have in the ballpark of three. If we're gonna be radical and generous, let's skip that one and say that we have two states. If you only had one state per amino acid, it's gonna be pretty boring, because one multiplied by itself any number of times is gonna be one. And we know that's too little, because proteins can exist in multiple states. They fold, right? So one is too little. The smallest natural number you can imagine after one is two. So at least it's not too much. So let's assume that it's just two states. So you go down from roughly 1,000 to two states per residue. That's pretty simple. But this leads to another point. No matter how many states we have, somehow the protein needs to find the best possible conformation it should be in. Do you agree with that? There's a harder question there, I think. What is the best possible state for a protein to be in? Sorry, almost. Actually, for now, I will say energy. We're gonna introduce this concept of free energy later. But let's say energy, I agree with that. But that sounds like a very boring physics way of talking about it. 
We know that this is biology, right? Can proteins be different? There's a magic happening. The body somehow naturally creates a protein. You could imagine that, without going into things about intelligent design or anything, you could imagine a cell creating a protein in a state the cell would like to have it in. Is there anything that says that a protein has to assume the lowest energy state? So this is actually a very, very important question that was discussed a lot in the 1940s and 50s. And it was an American researcher, Christian Anfinsen, who showed in a very famous experiment that he could take a small protein in a test tube, so you're no longer in a cell, pure protein in a test tube, and denature it so that it unfolds. He didn't determine the structure of the protein, but he could show that when he restored the original conditions, the protein started working again. So somehow the protein folded itself. And if you're in a test tube, you don't have any energy source, you don't have the cell pumping ions or anything, you don't have ATP, it's just water around it. So there can't be any magic. If you're isolated in a test tube, you have to follow the laws of physics and chemistry. And that's how you show that it has to adopt the lowest energy state. And somehow the lowest energy state then depends on things like the temperature, or whether you have urea or something else that denatures it. Do you think that was an obvious result? Well, it was non-obvious enough that he got the Nobel Prize for it. So again, most of the things you take for granted, somebody got the Nobel Prize for. And that's why we take them for granted today. But here comes the catch. It's easy to say that it must find the state by itself. But even if you just had two states per amino acid, for a small protein of 100 residues, that would be two to the power of 100 different states. 
That is about 10 to the power of 30 states. And again, two is a horribly low underestimate here. So even if you're horribly underestimating the possible number of states, it's gonna be so large that we can't even imagine sampling them all in a computer. So this is gonna be a very difficult problem. And it's even called Levinthal's paradox, because we of course know that this happens in your cells all the time. Proteins fold in a second at most. And still it's one of the most complicated problems you can imagine. So if we think about this from the sequence to structure to function point of view, if we start with the amino acid sequence, we know that the amino acid sequence somehow generates the 3D folded structure. And that's kind of what Anfinsen proposed, right? We know that somehow this long string of amino acids will fold itself down into a state. We know that this state is unique because, if you take hemoglobin, hemoglobin is gonna have the same biological function every time. You don't have to be lucky for your hemoglobin to carry oxygen. So we don't know how it happens, but we can be damn sure that it does happen. And then somehow we're gonna need to go from this 3D folded structure to a specific function. For instance, that the protein actually binds oxygen. Same thing here. A protein's function will not determine the structure, and a protein's structure will not determine the sequence. And then there is of course the renaturation, which is pretty much what Christian Anfinsen showed: if you destroy the structure of the protein, it can refold itself. As always in biology, there are no laws without exceptions, in contrast to physics. So there are post-translational modifications, where you somehow modify a protein after you've started producing it. And in that case, these things are not necessarily reversible from a physical point of view. And there can be other molecules that need to bind for a protein to work. 
In the case of hemoglobin, to bind oxygen you need that small heme group. And that heme group is not something you code for in your genome. That is another group. And if you remove the heme group, that heme group will not magically reappear out of nothing. You're gonna need to add the heme group. So this is a fun old slide by Ben Robson. The great Proteano, sorry, this is very low resolution. From fully extended to tightly coiled, twice nightly. "I don't know how he does it", Cyrus Levinthal. And this is really the source of this Levinthal's paradox. We know that it does happen. We have absolutely no idea why. Actually, we do know. And by the end of the course, you will be able to answer this. Because this has long been one of the hardest problems. Well, the hardest part of understanding how we're gonna reconcile simple fundamental laws of physics with advanced, complicated, functioning biology. But we're not gonna solve that one today. I already mentioned that there are a bunch of other torsions available in these too. I will deliberately not go through this either. So what I will normally do in these lectures, and I'm gonna do that tomorrow too, is spend the first 20 minutes of each lecture going through a bunch of study questions. You will have them at the end of each lecture, and then I'm gonna have you discuss them. So either you will pick them or I will distribute them in a round-robin fashion so that we talk about them. Because it's so dangerous that I just rush through this. Each and every one of you is gonna copy these things right away, but you're not gonna think about it. So I deliberately want you to think a little bit: what are the local structural features around an amino acid? Which ones are important? Which ones are not important? The bond vibrations here are not gonna be important, but these backbone and side-chain torsion degrees of freedom are what's gonna determine everything around proteins later. 
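The Levinthal estimate mentioned above can be made concrete with one more back-of-the-envelope calculation. This is a rough sketch, assuming the deliberately low count from the lecture (two states per residue, 100 residues) and a hypothetical sampling rate of one conformation per picosecond; that rate is an assumption picked for illustration, not a measured number.

```python
STATES_PER_RESIDUE = 2      # the deliberately low estimate from the lecture
RESIDUES = 100              # a small protein
SAMPLES_PER_SECOND = 1e12   # assumption: one conformation tried per picosecond
SECONDS_PER_YEAR = 3.15e7

total_states = STATES_PER_RESIDUE ** RESIDUES
years_to_try_all = total_states / SAMPLES_PER_SECOND / SECONDS_PER_YEAR

print(f"number of states:     {total_states:.2e}")
print(f"years to try them all: {years_to_try_all:.1e}")
```

Even with the horribly low estimate of two states per residue, an exhaustive search would take longer than the age of the universe, while real proteins fold in about a second. That gap is the paradox.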
But the cool thing, actually, I wouldn't just say cool, this is also the horribly complicated thing. Normally in chemistry and everything else, you're used to deciding whether reactions happen or don't happen, right? Some reactions happen, and for other ones you might need energy. If you just take ice and put it at room temperature, it will eventually melt. The reaction only goes in that direction. If you take water and put it at minus 10 degrees, it will eventually freeze. If you mix things in a test tube, either something happens or it doesn't happen. That's because you're working on the lab scale, where you have lots of things. When you're working on really small scales in physics, the problem is that things sometimes happen and sometimes not. And that is because these energies, the energies of rotating around these bonds, are in the ballpark of a couple of kilocalories per mole. We're gonna come back to these energies tomorrow or later, so don't worry too much about it now. But how do you work with things when they sometimes happen? That's where you're gonna need statistical mechanics, because we're gonna need to think in terms of statistics. And you can no longer say that things will happen or that they won't happen. Just as you did in bioinformatics, we're now gonna need to start thinking in terms of: is it likely that it will happen, or is it unlikely that it will happen? If I take this pen and drop it, it will fall, right? That's the law of gravity. I'm not gonna break the pen here, but you could argue that if I throw a brick through that glass, I'm not gonna do that, but the glass will break. But you have all the atoms and everything. Can't you unbreak the glass? Technically, all the atoms could go back, because each small part here is kind of reversible, right? It's just that when you combine so many events, it would be extremely unlikely that it would unbreak. 
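To get a feeling for why energies of a couple of kilocalories per mole make things happen only sometimes, here is a small preview sketch. It assumes the Boltzmann factor, exp(-E/RT), which the lecture has not introduced yet and only promises for later, so treat it as an assumption brought in for illustration; the function name `relative_population` is also made up here.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def relative_population(delta_e_kcal, temperature_k=298.0):
    """Boltzmann-factor estimate of how populated a state is, relative to
    another state that lies delta_e_kcal (kcal/mol) lower in energy."""
    return math.exp(-delta_e_kcal / (R_KCAL * temperature_k))

thermal_energy = R_KCAL * 298.0
print(f"RT at 298 K: {thermal_energy:.2f} kcal/mol")

for delta_e in (1.0, 2.0, 3.0, 100.0):
    print(f"{delta_e:6.1f} kcal/mol -> relative population {relative_population(delta_e):.1e}")
```

A state a couple of kilocalories per mole uphill is still visited a few percent of the time, so you have to talk about likelihood. At a hundred kilocalories per mole the factor is so small that, in everyday terms, it never happens, which is the pen and the broken glass.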
And that's why it's so fun playing, say, a movie backwards. You're never gonna see the pen moving up by mistake. If it's a very small energy fluctuation, you will see it now and then. But that's the reason why, with these large things, you instantly know that it's extremely unlikely. I will come back to that tomorrow, but I'm gonna spend the last 20 minutes here or so, 30 minutes, repeating a little bit of protein structure. I know that you've heard some of this, but we're gonna need it. This far, I've just talked about amino acids, and that's what you have hopefully seen a lot of in bioinformatics. We talk about that as primary structure. That's kind of a stupid name. I never say primary structure, nobody does. But if you ever hear anybody talk about a protein's primary structure, you should know that it's the amino acid sequence. The point where we really start to talk about structure is when you're gonna have these local organizations of amino acids, for instance, in helices or sheets. How do we know that proteins have helices? I'm not gonna tell you. Well, nowadays we know because we have X-ray structures, right. How do you think they discovered the first helix? I'll come back to it in a slide or two. These helices will eventually assemble into even larger structures, the tertiary structures, or folds as you probably call them in bioinformatics. And eventually you're gonna have something like a ribosome or hemoglobin, where you have multiple chains like this and super complicated structures. Could be a hundred of them. So the helix is the poster child of biological structure. All naturally occurring amino acid helices are right-handed. Why are they right-handed? Couldn't you imagine left-handed helices? So the laws of physics are symmetric. There are no special left-handed physics versus right-handed physics, right? So by definition, couldn't you just make a left-handed helix? 
All you would need is to take the helix and put a mirror right next to it and mirror it. And it would be left-handed. So does that mirror remind you of something? Chirality of the amino acid, right? So there's a built-in handedness to the amino acid. And here's the strange thing, because normally in chemistry, we always say that the chirality only matters for physics, not chemistry. But somehow, because biological structure is hierarchical, this built-in handedness of the simplest molecule now starts to propagate. So you could argue that because the amino acid is L, this handedness of the amino acid means that there is also gonna be a handedness to the helix, that the helix will only go in one direction. You technically could try to make a different helix, but it's not gonna work well. There are a bunch of different helices. There is one of these you should know, and that's called the alpha helix. Actually, I will skip that. It's easier to show them here. The alpha helix is when the protein is happy, and that's when each amino acid here is making a hydrogen bond to an amino acid four units further on in the helix. And that means that there are 3.6 residues per turn. Any time you have a spiral or something, if you're a kid, at least my kids always do this, you can try to twist it harder until it breaks, or you can try to untwist it, right? Eventually it will break if you untwist it too. But most helices or any type of helical structure, a spring or something, have a bit of built-in flexibility. You can twist it a bit harder and a bit looser. It might not be happy, because it would like to get back to the relaxed equilibrium. And this is the case for biological helices too. You can take a helix and twist it a bit harder. Normally these hydrogen bonds would be four units apart. 
If you twist it a bit harder, first it's gonna resist, but eventually all these hydrogen bonds will jump one unit. So you make a hydrogen bond to something three units apart instead. And that's when you get something called a 3-10 helix. The exact definition here doesn't matter, but this number is the number of residues between the hydrogen bonds. You can almost see that this leads to a perfect triangle, but it's only locally stable, and it's pretty unhappy because it's a relatively strained structure. This one, the alpha helix, is so much more common. If you look in the PDB, you're gonna see 99.9% of all helices in that conformation, maybe 0.1% here. Yes, that's actually very good. I'll get to that in a second. So the nomenclature here is really that the first number is the number of residues between the hydrogen bonds. And the second one is how many atoms we need to go through per helical turn. Forget about the 2-7 helix, I've never seen that in my life. But the other option, instead of twisting it harder, is that you can untwist it a bit. And then instead of having four residues, you will have five residues between the hydrogen bonds. And that's gonna lead to this so-called 5-16 helix, which is usually called the pi helix. That's actually a super good question. I don't know, I'll look it up until tomorrow. Compared with the 3-10, there's a reason why I don't know. I have never seen a pi helix in my life. I'm sure I have seen one, but I've never noticed it. I've never done anything that's relevant to it. These 3-10 helices are actually relevant in a couple of ion channels and so on, because they have one very nice property. If I have a bunch of side chains here, they're all gonna align. So they're gonna be on top of each other. For a normal helix, it's gonna be a spiral staircase, so they will go around all the sides. And maybe one time in a thousand, it can be useful to have this. 
I promise to never ask you about a pi helix on the exam. You should have seen it. You should be aware of it if somebody tells you about it. It's not important. I've never seen any biological relevance of it. Yes, that's a very good question. Well, what do you think? I'll draw a molecule for you. This is a molecule that you're probably very familiar with. So this molecule, is that a rigid or liquid molecule? If I were to tell you the name of the molecule, I would of course lead you in one direction, right? But this is a very small, simple chemical molecule. One oxygen and two hydrogens. If you're doing experiments on that molecule, will you find that molecule to be liquid or rigid? It depends. What does it depend on? Temperature, pressure, it depends on the surroundings, right? And you could even imagine if you take just one water and put it inside, say, a solar cell or somewhere else, that molecule will depend on the other molecules around it. It's gonna be exactly the same thing here. It will depend on the surroundings. Normally the helix is gonna be here, because it's happiest here. Most proteins will be happy with this state. But you just might have a molecule where you don't have a whole lot of space. And in that case, it might actually be easier to put the helix here. The helix itself will be a bit unhappy here, but together with the surroundings, it can just occasionally be better to have it here. I will show you one example of this later on in the course, for ion channels, where it actually is important. And that's the only reason why I need to introduce it here. If it wasn't important there, I would probably even skip the slide. But the 3-10 helix is important now and then. A bit, but these are much, much weaker effects than between different secondary structure elements. This depends more on the surroundings. 
So based on Linnick's question, that leads us to a larger question. You have probably seen a bit of secondary structure in the bioinformatics course. Which one of these structural elements is most stable? Or most rigid? Sorry? So that, again, would be a leading question, right? We haven't even brought up the beta sheets, right? But my point here is that in an alpha helix, because you have all these hydrogen bonds inside the structure, you're essentially adding a lot of glue to the structure. So you're creating something that's fairly rigid. Don't think so much of a beta sheet here. Think of this as just one long sequence of amino acids. Amino acids are normally flexible and can move. But the second you put them in an alpha helix, we lock them down to one state. So this is gonna be fairly rigid. So in general, local secondary structure in proteins, as you probably know from bioinformatics already, is highly ordered. Helices, sheets, turns. Because it is ordered, we can classify it. So rather than thinking of hundreds of different structures, you can think that something is either helix or sheet or turn, which I'm sure you've gone through in bioinformatics. And second, this has nothing to do with physics really. The physics is just based on atoms. The atoms are atoms and they interact. They don't really care that they are in something that you happen to call a helix or a beta sheet. So this is just conceptual, and now you're doing models. You might not think of these as models, but these are models in your brain that we use for organizing, rather than trying to think of a large protein or a ribosome with a million atoms. Instead of trying to think of that as three million degrees of freedom, X, Y, Z for each atom, you can try to describe this hierarchically, right? So you can say amino acids combine into helices, and then we start to create some sort of building blocks that you can then use on a higher level. 
This is purely for your brain; it makes it easier to understand and think about, but the interactions are not really different from other interactions. Beta sheets, which you've probably also seen in bioinformatics, we're gonna start looking at from a slightly different point of view here. This is the other very simple building block we see. Instead of rotating the amino acids and putting them in a helix, we just take chains of amino acids and stretch them out, and there are two ways we can stretch them out. Either we can go back and forth, or we somehow have them all going in the same direction. Remember, the second you have a sequence, because you're starting from one end and going to the other, there is a built-in direction of the primary structure. And these are called parallel and anti-parallel sheets. For both of these, you're gonna form lots of hydrogen bonds between the strands, between the oxygens of the CO group here and the hydrogens on the NH group in the next one. We're gonna go through this in much more detail later on too, so I will just whet your appetite for it. And what this gives beta sheets is that you can create very long, rigid beta sheets that become larger than alpha helices, because an alpha helix can't really be larger than the radius of the helix. With these beta sheets, you can create spread-out structures, and that's gonna be useful in many cases, but it's also gonna be more complicated to work with. So that comes back to this question: how on earth did we discover these structures in the first place? This was actually partly a theoretical prediction. There is a very famous set of papers in the Proceedings of the National Academy of Sciences in the early 1950s by Pauling, Corey, and partly Branson, where they sat down and went through all these structures and said what they should look like. 
These papers are somewhat difficult and there's lots of mathematics in them, but there is an amazing review written some years ago by Dave Eisenberg that I uploaded to this model system that is well worth looking at. You probably don't think this is as cool as I do. So what is the difference? Why was this cool? Why was this an amazing discovery? Look at the years. This was definitely before the DNA structure, and it was 10 years before the first protein structures. They said what the protein structure would have to look like long before we had the protein structures. There is a quote in the Dave Eisenberg paper where I think it was Max Perutz, it doesn't matter. So Linus Pauling, they sat down and did this theoretically, based on the distance pairs that you saw in the X-ray data, that is, what the different repeats were that you were seeing. Remember that I said that X-rays give a diffraction pattern, right? So for anything that repeats periodically, you're gonna see this periodic distance. And the problem here is that by far the strongest periodic distance people saw was something that was a bit over five angstrom. There was also a much weaker repeat somewhere around 1.5 angstrom, but the five angstrom one was the strong one. And this is much larger than the normal repeat in an alpha helix. So the problem is that most of the samples they looked at were not just one helix, but many helices that were wrapped around each other. And that set everybody off for many years. But I think there's a quote here where Linus Pauling eventually said that he realized that there were a bunch of other people and they kept proposing incorrect models. But Pauling was also convinced that if they just kept proposing incorrect models long enough, eventually they were gonna stumble on a correct model. So he had to beat them. 
So Linus then suggested this alpha helix model, which by now you're gonna think is obvious; it's a beautiful, stable model. The only problem is that this model is fundamentally incompatible with the data. This model would predict that there should be a sharp peak at 1.5 angstrom, while lots of people saw a peak at five angstrom. If you're a good modern scientist, what should you do when your hypothesis disagrees with the data? Well, rushing and publishing in PNAS or Nature is not necessarily what your advisor would be most happy with, right? So it requires a bit of courage. If your model disagrees with the data, Linus is basically saying the data is wrong, my model is correct. And of course the cool thing is that he was correct. So what happened after this publication? Everybody had focused on this five angstrom peak. After the publication of these papers, Max Perutz realized the beauty of this model. And he went home, and I think he had some horse hair or something around in the lab. So they took this horse hair and they mounted it in the local X-ray machine at an angle of, I think, 30 degrees or something, so that at this angle they would expect to see the 1.5 angstrom pattern. Today you would just look at the computer screen, but at that time they had to develop the film and everything, and after a few hours there is a beautiful peak at 1.5 angstrom. And after that everybody agreed that the alpha helix is gonna be one of the most important features of protein structure. And this was of course a tremendous help to, say, Kendrew, Perutz and everybody as they then tried to determine the structure of hemoglobin and myoglobin in particular, because they could then use these theoretical models. When things started to look like this, they could say, oh, this part of the protein has to be an alpha helix, and they could model that in and get it to fit the X-ray data. 
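To connect the two numbers in this story, here is a minimal sketch in Python. It only assumes an ideal alpha helix with the 3.6 residues per turn quoted earlier and takes the 1.5 angstrom repeat as the rise per residue along the helix axis; the function names are made up for illustration. One full turn then comes out at a bit over five angstrom, and each residue rotates the chain by 100 degrees.

```python
# Geometry of an ideal alpha helix, using the two numbers from the lecture.
RESIDUES_PER_TURN = 3.6    # residues per helical turn
RISE_PER_RESIDUE_A = 1.5   # angstrom travelled along the axis per residue


def helix_pitch(residues_per_turn: float, rise_per_residue: float) -> float:
    """Axial distance covered by one full helical turn, in angstrom."""
    return residues_per_turn * rise_per_residue


def degrees_per_residue(residues_per_turn: float) -> float:
    """Rotation around the helix axis contributed by each residue."""
    return 360.0 / residues_per_turn


def helix_length(n_residues: int, rise_per_residue: float = RISE_PER_RESIDUE_A) -> float:
    """Approximate length of a straight helix with n_residues, in angstrom."""
    return n_residues * rise_per_residue


pitch = helix_pitch(RESIDUES_PER_TURN, RISE_PER_RESIDUE_A)
print(f"pitch per turn:       {pitch:.1f} angstrom")
print(f"rotation per residue: {degrees_per_residue(RESIDUES_PER_TURN):.0f} degrees")
print(f"20-residue helix:     {helix_length(20):.0f} angstrom long")
```

This is idealized geometry only. It says nothing about why the strong reflection in real fibers sat where it did, which the lecture attributes to several helices being wrapped around each other.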
So if you look at these Ramachandran diagrams, I kind of hinted at it before but I didn't say it explicitly: different secondary structures occupy different places in these Ramachandran diagrams. The alpha helix is the large part over here. The 3-10 helix is kind of an outlier. The 2-7 helix would be somewhere in the middle there; you never see any density here, which is why we can forget it. Up here would be beta sheet; these are not helices. And if, despite having L amino acids, we tried to take our amino acids and create a helix that went the other direction anyway, you would be out here, and you never see anything there in a normal protein. There are a bunch of other helix types that I thought I might go through, but no, I won't. We'll come back to helices later on, but try to think a little bit about them, and think about what determines the helix structure from a local point of view. Look at the Pauling papers if you want to, or at least Dave Eisenberg's summary, because I think it's a beautiful recollection of how science really happens. This next slide looks like something only a physicist would add, but it's a beautiful illustration of how things in biology can come from something very small. At the lowest end, once you get down to the atoms, inside atoms you have charges. A normal atom would not have a charge, and an ion would have a unit charge, plus or minus one or two, but the second atoms are bonded to each other it becomes more complicated. Take a water molecule, for instance: oxygen, hydrogen, hydrogen. What will happen here is that the oxygen likes electrons, so the electrons along these bonds will move a little closer to the oxygen, there and there. Sorry, yes, the electrons move there, which means that in practice you would have something like maybe minus 0.8 charge on the oxygen and plus 0.4 charge on each hydrogen. 
So these are not unit charges, and they're not exact, because this is based on the rough average distribution of the electrons. We're gonna come back and talk about these types of partial charges later on in the course. This happens inside the peptide bonds too, because the nitrogen likes electrons better than the hydrogen, so the nitrogen will steal a bit of the electron from the hydrogen, and the oxygen here also likes electrons better than the carbon, so here you will also have the electron moving in that direction. What this means is that along this entire bond you're gonna have a net upward shift of all the electrons. And now you start thinking, why on earth am I taking a physics class? I thought this was biology or biophysics. This is pure physics, right? Electrons and atomic nuclei. We're not gonna talk a whole lot about that in the course. In physics you can describe this with something called a dipole: the dipole is drawn in the direction from the negative charge to the positive charge, and you measure it in a unit called the debye. It's not important for the course, you don't need to know what it is, but it means that along each such peptide bond there's gonna be a dipole, a fairly strong dipole actually. Why do you care? We just know that the peptide bonds exist. Well, here's the thing: the second you draw this in a helix, the helix is a very periodic structure, and I was about to say by coincidence, but it's not a coincidence of course, the helix is a helix because of this periodicity, it turns out that all these peptide bonds are oriented in the same direction in the helix, every single one. So if you now have a small helix with 20 such bonds, you have 20 relatively large dipoles and they're all pointing in the same direction. So now your entire helix, not just the atoms but the entire helix, will appear as if it was a large dipole, as if you had a plus charge here and a negative charge here. 
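If you want to see what a partial-charge dipole looks like in numbers, here is a rough sketch using the water charges mentioned just before (minus 0.8 on the oxygen, plus 0.4 on each hydrogen). The geometry, an O-H distance of 0.9572 angstrom and an H-O-H angle of 104.52 degrees, is an assumption that was not given in the lecture, and so is the function name. The only physics is that a dipole is charge times separation, and that one elementary charge times one angstrom is about 4.803 debye.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.803  # 1 elementary charge * 1 angstrom, in debye


def water_dipole_debye(q_h: float = 0.4,
                       oh_length: float = 0.9572,       # angstrom (assumed geometry)
                       hoh_angle_deg: float = 104.52    # degrees (assumed geometry)
                       ) -> float:
    """Dipole of a three-point-charge water: -2*q_h on O, +q_h on each H."""
    half_angle = math.radians(hoh_angle_deg / 2.0)
    # Put the oxygen at the origin. The sideways components of the two
    # O-H vectors cancel, so only the part along the symmetry axis survives.
    mu_e_angstrom = 2.0 * q_h * oh_length * math.cos(half_angle)
    return mu_e_angstrom * E_ANGSTROM_TO_DEBYE


mu = water_dipole_debye()
print(f"point-charge water dipole: {mu:.2f} debye")
```

With these inputs the result is a little over 2 debye. Treat it as an order-of-magnitude illustration of how partial charges become a dipole, since the charges themselves are only rough averages.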
Remember one of the first slides I showed you this morning, where I told you that the amino acid is a zwitterion, so there you have a distribution of the charge, right? This means that when you now pair two amino acids together, you're also gonna get distributions of charges, and suddenly we see that, in turn, in the third step at least, the helix will also have some sort of electrostatic dipole or distribution of charges. Okay, that was the last physics. Why do you think this is important? It turns out that this can be super important for biology. This is the ion channel that Rod MacKinnon got the Nobel Prize for, KcsA. We're gonna come back to ion channels later on. Now I'll ask you a very simple question. An ion here, like the potassium ion, how can it pass through the membrane, where you don't have any water or anything to solvate it at all? And it's pretty much as if it had no barriers at all. It just goes right through. Because somewhere along the road here it's gonna need to strip off all the water. And the way this works, don't worry, I don't expect you to know this. This just illustrates the importance of the physics. When this ion comes in here, you have four helices like this pointing at it, pretty much like small guns. I think all of these have dipoles. So what happens is that this ion basically sits here, and suddenly it's gonna be very happy, because all these four helices around it help to coordinate it. They basically help to bind the ion. And at this point the ion is gonna be so happy that it lets go of all its water, because it no longer needs all this water. It has the helices to help stabilize it. And when the ion has let go of its water, it will go through the channel. So if you did not have those helix dipoles, this would never ever work. And we know from simulations today that this is true. This also illustrates the point that unless you had the structure of the ion channel, there is no way we could understand why this works. 
This comes back to something else. Why does this one only let through potassium ions, K, rather than sodium, Na? Because they have different energies required to strip off the water. Yes, sorry, you had it. Normally sodium is a smaller ion. So it's kind of a hole, but the large ion goes through, not the small one. There is no way you could do that without electrostatics. I will come back to that. This also leads to a bunch of other things. These are the reasons why you can do the simple experiments with water and combs and everything. That has to do with these partial charges of molecules. So, one of the last things I will do today. I think we're good, I think we're roughly on time. I have eight more slides. The reason why I keep stressing electrostatics to you is that it's one of the strongest forces we have in nature. Two charges separated by just one angstrom, so a very short distance, would have an energy in the ballpark of 300 kilocalories per mole. Remember that I talked about these bond rotations, which may be two to four kilocalories per mole. So anytime you have a charge involved in something, the charge is completely gonna dominate that process. Biology is more about electrostatics and charges than you think, much more. And they are also very long range. They decrease as one over the distance. We will come back to that later. This is something you should have a rough idea about. I'm not gonna care whether you say 250 or 350, but the point is it's not a billion and it's not five. It's in the couple of hundreds of kcals, super strong. So if you now have an interaction where a protein needed to rotate two or three bonds so that it can form a better electrostatic interaction, would that happen? Sorry? Yes, because basically you would pay two to three times that rotation cost and you would gain the electrostatic energy, right? 
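The 300 kilocalories per mole ballpark can be checked directly from Coulomb's law. The sketch below is just that check: two point charges in vacuum, energy converted to kcal/mol. The function name and the `eps_r` parameter (relative permittivity, defaulting to 1 for vacuum) are additions for illustration, not something from the lecture.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, coulomb
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
AVOGADRO = 6.02214076e23     # particles per mole
J_PER_KCAL = 4184.0          # joule per kilocalorie


def coulomb_kcal_per_mol(q1: float, q2: float, r_angstrom: float,
                         eps_r: float = 1.0) -> float:
    """Coulomb energy of two point charges (in units of e) in kcal/mol."""
    r_m = r_angstrom * 1e-10
    energy_j = q1 * q2 * E_CHARGE**2 / (4.0 * math.pi * EPS0 * eps_r * r_m)
    return energy_j * AVOGADRO / J_PER_KCAL


# Opposite unit charges at 1 angstrom: a few hundred kcal/mol, attractive.
print(f"{coulomb_kcal_per_mol(+1, -1, 1.0):.0f} kcal/mol at 1 angstrom")

# The 1/r decay is slow: ten times further away is still tens of kcal/mol,
# far more than the 2 to 4 kcal/mol of a bond rotation.
print(f"{coulomb_kcal_per_mol(+1, -1, 10.0):.0f} kcal/mol at 10 angstrom")
```

In vacuum the number at one angstrom lands in the low 300s, which is why anything between 250 and 350 is a fine answer. The surroundings matter a lot in practice, which is what the `eps_r` knob is a crude stand-in for.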
So a protein will do virtually anything locally if it can get better electrostatic interactions. These are just bumps. And that brings us to the next part, hydrogen bonds. Here's where things start to get a little bit complicated. In principle, this is quantum chemistry, but I'm not gonna go through quantum chemistry with you. We still don't understand how water works. We have rough approximate ideas, but David Chandler in Berkeley still publishes papers in Nature and Science today about understanding how water works. This molecule is more important to life than anything else. We could not live without water. But these partial charges I told you about, this is gonna lead to that you will have a negative charge here, a positive charge here. Actually, I love being able to write on my slides. And then you're gonna have a positive charge there, a negative charge there, and a positive charge there. What this will mean is that this negative charge will now start to attract this positive charge. And this is a gross oversimplification, because this really has to do with electrons and quantum chemistry and everything internally. But you can think of this in terms of charges. Remember that electrostatic interactions are super strong. Hydrogen bonds will determine everything in water. And it's not just water. In DNA, the reason why the strands are paired together is that between all these bases, we are forming hydrogen bonds. So it's hydrogen bonds that keep DNA structure together. And that's why DNA is more rigid than RNA. The typical energy of a hydrogen bond might be five kilocalories per mole. And here's the thing to think about. If one thing was 300 and the other one is two, go with the 300. But if one is four, roughly, and the other one is five, roughly, this is the problem with proteins. We're gonna have lots of interactions that are roughly the same strength. So would you break a hydrogen bond here if you could have this slightly better interaction? 
Well, maybe, sometimes, sometimes not. This is a number that you need to know if I wake you up at three a.m. and ask, because it's so important for biology. And here is the other thing. Should you use kcals or kilojoules? Well, there is a good and a bad part to this. The good part is that you can choose, but you need to realize that they are not the same: you can't say five kilojoules when you mean five kcals. And again, I would say that hydrogen bonds can be three to six kcals, four or five kcals, or maybe 15 to 20 kilojoules per mole. You need to have a rough idea what the magnitude of a hydrogen bond is. We're gonna come back to that so many times. That's the other thing to learn until tomorrow. So let's look at water. Ha-ha, yes, that's the simulation, good. This is a room temperature simulation that I did a few years ago on my computer. At room temperature, virtually all hydrogen bonds in water are intact. Each water molecule participates in four bonds, right? Because each hydrogen can make a hydrogen bond, so this hydrogen binds to another oxygen there, that hydrogen binds to an oxygen here, and each oxygen participates with two other molecules. But of course, there are two waters involved in each hydrogen bond, so in the perfectly ordered system at zero kelvin, you would have 2.0 hydrogen bonds per molecule. At room temperature, you have roughly 1.7. They become looser, and waters can also move, right, so they sometimes break these hydrogen bonds and move around. As you're gonna see, I think tomorrow, this is what's gonna determine, for instance, whether water is a liquid or ice or even gas. And as we will eventually see, this is also what's gonna determine a whole lot of the protein interactions, whether they form hydrogen bonds and under what conditions they form different structures. And the hydrogen bonds will also tell you, for instance, which bases in DNA will bind to each other. 
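Two bits of arithmetic from this part are easy to get wrong, so here is a small sketch of both: the kcal to kilojoule conversion (1 kcal = 4.184 kJ), and the bookkeeping that turns "four bonds per water" into "2.0 bonds per molecule". The helper names are invented for illustration; the hydrogen bond energies and the 1.7 figure are the ones quoted above.

```python
KJ_PER_KCAL = 4.184  # 1 kcal = 4.184 kJ


def kcal_to_kj(kcal: float) -> float:
    return kcal * KJ_PER_KCAL


def kj_to_kcal(kj: float) -> float:
    return kj / KJ_PER_KCAL


# Hydrogen bond ballpark from the lecture, in both units.
for kcal in (3.0, 5.0, 6.0):
    print(f"{kcal:.0f} kcal/mol = {kcal_to_kj(kcal):.1f} kJ/mol")

# Counting hydrogen bonds in water: each molecule takes part in four,
# but every bond is shared between two molecules.
BONDS_EACH_WATER_TAKES_PART_IN = 4
MOLECULES_PER_BOND = 2
ideal_bonds_per_molecule = BONDS_EACH_WATER_TAKES_PART_IN / MOLECULES_PER_BOND

room_temp_bonds_per_molecule = 1.7  # rough value quoted in the lecture
fraction_intact = room_temp_bonds_per_molecule / ideal_bonds_per_molecule
print(f"ideal: {ideal_bonds_per_molecule:.1f} bonds per molecule")
print(f"fraction intact at room temperature: {fraction_intact:.0%}")
```

Five kcal/mol is about 21 kJ/mol, so mixing up the two units puts you off by more than a factor of four, which is exactly the difference between a hydrogen bond and a bond rotation.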
Two versus three hydrogen bonds: you can't pair these up perfectly any other way. You can't have two adenines bind to each other. And, well, two DNA strands bind better to each other than to RNA or anything else. So that's of course also why DNA will eventually heal itself with the help of some enzymes. All these interactions, how we transfer information in biological systems, pretty much come back to simple hydrogen bonding and interactions between molecules. Pretty amazing that it works. And I think I have one or two slides on large proteins. So what we're gonna look at later this week is that what starts to happen in proteins is that depending on what state the protein is in, for instance, whether it's a 3-10 or alpha helix, it will form different hydrogen bonds. And if you just had a long stretched-out chain of amino acids in water, they would all make hydrogen bonds with water molecules. But then under some conditions, you're gonna have the protein folding and creating some sort of interior, in which case we will have parts of the protein where there is no water, and other parts here where there might be some interactions with water, or the water will have to interact with itself. I'm well aware that this is a bit of hand-waving now. But the problem here is that to be able to understand this, we're gonna need to understand a little bit of very simple physics. What happens in hydrogen bonds? How should we compare these different things? What does it mean that the molecule is free to move here but it can't move here? So we're gonna need a bit of, call it a mathematical or physical toolbox to start to put numbers or at least letters on these different things. And that's in particular a concept called free energy that I'm gonna spend a lot of time talking about tomorrow. So I think, perfect timing, what have we spent most of the time going through today? 
And if you feel that you're weak on the protein side of things, I'm gonna go through this many times over. So all these things I just touched upon, where I said that I will come back to them later on in the course, don't worry about them. We will come back to them. But if you feel that you don't remember the basics of amino acid structure, scan through chapters one and two in this book or the PDF of it. And again, you need to understand the things that we're gonna come back to tomorrow and on Friday. And I want you to understand the hydrogen bonds. On the most simple level, not phase transitions or anything, but just understand why they happen. And then if you have time, I would suggest ideally continuing with chapters three and four. And then you can ask me more questions. We're gonna go through this all tomorrow. And here's also where I would like a little bit of input from you. As I told you at the beginning today, I know most of the stuff in this course. I think it's fun lecturing. But if you would like to, for instance, just scan through the YouTube versions of the previous lectures, in principle we could spend three hours per day talking about your questions. I've tried that some years. And what that usually leads to is that there are a handful of questions, and then there are no more questions, and then people head home instead or something. On the other hand, I also think it's a bit boring to just lecture. I wanna avoid you just taking notes. Don't take too many notes. So what I tried last year, and what I would suggest that we try too, is that I will spend 20 or 30 minutes tomorrow, not just on a quick recap, but talking about the things we covered today, and I'm gonna have you pick a question each, and then we'll see how that works. But this is something I want feedback on. If you would like more lectures or fewer lectures, let me know and I will change. 
We're doing this for your sake, not primarily for mine. There are a couple of pieces of advice here. You can read them later, and they say start reading, start reading, start reading. Everybody tells you that. These are the study questions, which I will not go through. If you understand all these questions, you're golden, but understanding questions means understanding them, not just citing something. I love to ask questions that have more to do with understanding than by-heart knowledge. So tomorrow I'm gonna present exactly the same slide. One danger when I let you answer whichever question you pick is that you're gonna pick the easy ones. You're gonna pick the questions you know the answer to. That's really stupid. That's probably what I would do too if somebody asked me, but the question you know the answer to is not the question you need to discuss. What you should pick is the question you really don't understand, because this is not an exam event. The point is that this is formative feedback. So what I will likely do tomorrow is start randomly and assign a question to each one of you, and that's to make sure that you have a bit more risk of actually getting questions you don't understand. It's perfectly fine not to understand them, but read, and if you feel that you understand them, hopefully you shouldn't have to do a whole lot of exam studying at the end. Does that sound like a good idea? Good. Tomorrow, after that, I'm gonna talk more about hydrogen bonds, and we're gonna do some hardcore physics, the Boltzmann distribution. This is actually quite fun. You will also see part of what I love with this course, and in particular this book's way of introducing it, which is that it flip-flops a bit. 
So you're gonna have some lectures where we talk just about chemistry, function, biology, and then suddenly we're gonna flip and have a lecture where we talk pretty much entirely about mathematics, deriving distributions, but then we're gonna come back to biology and use those equations we just derived to understand things on a higher level. So today we hand-waved about protein structure. Tomorrow we're gonna go through some of the mathematics and physics we need to understand this physically, and then we're gonna come back and interpret what we just talked about today, but interpret that properly in terms of free energy. I will also try to make a point of finishing at noon sharp or earlier every day. The worst thing that can happen is that I don't get through all the slides, and in that case I'll just move those slides to the next time. We have spare time.