Today I'm going to head back heavily into life science and talk almost entirely about protein structure, with very little mathematics, if any. It turns out that there are three big types of proteins: fibrous proteins, globular proteins and membrane proteins. You might see that one of these three is missing here, and that's because I'm going to cover it in detail tomorrow.

But before we head into that, let's spend the time we need on our normal discussion to see where we were yesterday and whether there are things you want to discuss in more detail. I would almost argue that this is more important than the lectures, because most of the stuff in the lectures you can read in the book and the lecture notes. You're passing around the handouts, yes? I should have put numbers on these, sorry, so you're just going to have to read them aloud and say which ones you're picking.

Yes, the difference between thermodynamics and kinetics. In particular, we discussed this in relation to the free energy landscapes yesterday. What you're saying is entirely correct, but there is something more you could say: different parts of the landscape are relevant for the two. So transition rates enter where? And for that, what part of the energy landscape is relevant? If that is wrong you get one more chance. When we talk about kinetics we talk about transitions, and then we're talking about barriers. So, surprisingly enough, with kinetics the worst parts of the free energy landscape, the ones where we virtually never spend any time, are almost the only ones that matter, because they determine whether we can get over the barriers or not.
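The split between the two can be sketched numerically. This is a minimal illustration, assuming a made-up two-state landscape: the equilibrium populations depend only on the free energies of the two minima, while the waiting time to cross depends on the barrier height through an exponential factor. The value of kT (about 2.5 kJ/mol near room temperature), the prefactor time tau0, the example energies and the function names are all assumptions chosen for illustration, not numbers from the lecture.

```python
import math

KT = 2.5  # kJ/mol, approximate kT near room temperature (assumed value)


def equilibrium_populations(g_a, g_b, kt=KT):
    """Thermodynamics: Boltzmann populations from the free energies of the two minima."""
    w_a = math.exp(-g_a / kt)
    w_b = math.exp(-g_b / kt)
    z = w_a + w_b  # two-state partition function
    return w_a / z, w_b / z


def transition_time(barrier, tau0=1e-9, kt=KT):
    """Kinetics: mean waiting time tau0 * exp(barrier / kT) to get over a barrier.

    tau0 is a hypothetical characteristic time in seconds, not a measured value.
    """
    return tau0 * math.exp(barrier / kt)


if __name__ == "__main__":
    g_a, g_b = 0.0, -5.0  # free energies of the two minima in kJ/mol (made up)
    p_a, p_b = equilibrium_populations(g_a, g_b)
    print(f"equilibrium populations: A = {p_a:.3f}, B = {p_b:.3f}")
    for barrier in (10.0, 40.0, 80.0):  # barrier heights in kJ/mol (made up)
        t = transition_time(barrier)
        print(f"barrier {barrier:5.1f} kJ/mol -> waiting time about {t:.3e} s")
```

Changing the barrier changes the waiting time by many orders of magnitude but leaves the populations untouched, which is the point of keeping the two questions apart.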
But with thermodynamics, and stability in particular, the key is what we get to in the limit of infinitely long time. No matter how high a barrier is, as long as it's finite you will eventually get over it, but it can take astronomical times. So: thermodynamics, low regions of the free energy landscape; kinetics, high free energies. Oh sorry, that was number two, my bad. It helps to read.

So what is most important for kinetics in terms of understanding things? It's related to the second one, but not entirely. How do you determine kinetics, or transition rates? It depends on the barrier. How does it depend on the barrier? Right, and that means that what type of expression do you get? Right. The reason why you get an exponential expression is that to get over a barrier you need to wait until, by chance, your particle has that energy, and that means that the probability of going over these barriers is governed by a Boltzmann distribution. That in turn gives the derivation I made, what was it, on Monday this week: you get a prefactor, some characteristic time that we can't predict exactly, multiplied by an exponential of the free energy barrier. It's because of the Boltzmann distribution that we get that, and that's important, because not every exponential you will see, not even in this course, is a Boltzmann distribution, even though it might look very similar. Just because something walks and quacks like a Boltzmann distribution doesn't mean that it is one. But this one is.

You know what, since I didn't put any numbers on these, let's just continue down the list. What is the point of separating the initiation and elongation free energies, and roughly how large are they for a typical helix? Exactly. So initiation and elongation are not really some fundamental physical concept. An alpha helix of any
given length has a free energy relative to being a coil, right, and it's up to us, or you, how you define that free energy. But by trying to separate out the high term that you will have to pay before you form the helix, we know that this corresponds to a barrier, and then there is some additional term that makes you benefit more and more the longer the helix is. It's a completely arbitrary separation. It's just that the separation enables us to identify this initiation.

So roughly how large are those terms for a helix, and what sign? Yes, and which one is larger in magnitude? Right, so it takes a while to get going. But in that case, the more residues you add, the better it is. So why aren't most helices hundreds of thousands of residues long? Yes, and that goes on to probabilities, right? You only have 20 amino acids, and this is pretty much statistics: pick out the amino acids that are very strong helix formers. They might correspond to one third of the amino acid residues, or at best 50%. With 50% it's easier to compute: the probability that this happens is 0.5 raised to the power 100,000. That's a relatively small number, so we can kind of forget it.

So what you said is quite true, but you also have to factor in the probability that you have residues that even favor helices in the first place. And of course, if you put a proline here, what would the elongation free energy be for a proline? That would be positive, right? Proline doesn't like to be in a helix. So this is just for residues that actually like to form a helix in the first place, and that's why it's not always going to be negative.

So how are numbers like that determined? Right, in this case CD spectroscopy. I think we had one or two slides where we tried to argue, based on these simple energies, what number of residues you expect to be helical in a typical helix-coil segment. The reason for doing
that is that the fraction of residues that are helical is something I can measure with CD spectroscopy. I can't see which ones are helical, but I can see that, say, 37% is helical. Then I can go backwards, because that fraction was expressed in terms of the free energies, and if I know the number of residues I can get the initiation and elongation free energies. That's frequently why you do these very simple models. It's relatively hard to say, based on the shape of a curve in a CD spectrum, what the energies of the underlying process are; at least I'm usually not smart enough to do that. But if you create a model of a very simple process, you can ask what the model leads to. The "leads to" part you can frequently observe, and then it's a matter of solving the equations to get back the original energy or probability or whatever you were looking for.

Also, you have three curves, right? You have one type of curve that is the shape you would have for coil, one that you would have for an entirely alpha-helical sample, and one that you would have for beta sheet, and then you pretty much fit it. You assume that the curve you see is a curve with three components: alpha times alpha helix, beta times beta sheet, and then you could say C times coil. But it's not a completely free fit: because the three fractions have to add up to one, the mix of three components leaves you with only two free factors. So we just assume that they add up linearly, which is a fairly good approximation.

You don't get a number from the machine, you get a curve from the machine. We know what the curve should look like for a pure coil, and we know what it should look like for a pure helix, but it's not obvious that the curve should always be a linear interpolation between those. For CD spectra that works very well. You could imagine the coil and the helix interacting in some nonlinear fashion, but they don't.

So what type of transition is alpha helix folding? Exactly, it's a cooperative
transition, but not a phase transition. That brings me back to two other things that I have here. So what's kT? Yes. Or in kilojoules? Know those by heart. It's not because of this course; it will help you a lot throughout your career, just having a gut feeling for when things happen and when they don't. What's the energy of a hydrogen bond? Sorry? Yes, sorry, I thought you said 0.5 for a second. That's right, and that varies by at least a factor of two, but it's not a hundred and it's not one. It's also one of those things that will help you tell quickly whether things are realistic, whether they happen or not.

Roughly how long does it take to fold a typical helix? Or 0.1. I would say a microsecond is fairly slow. The point is that it's fast. It's so fast that you're entering the regime where most experimental methods have a really hard time tracking it, and you even saw that in some of the simulations I showed: it happens so fast that first you don't see anything, and suddenly you see the entire helix.

And how is beta sheet folding different from that, in particular from a kinetics point of view? Why is it a phase transition? Yes. Well, you have a barrier for the alpha helix too, so the barrier is not specific to the phase transition. The point with this phase transition is that the width of the region, in energy or something else, where the transition happens will eventually go to zero. You will have a very sharp transition, but it might take a very long time before we actually get there. You could also argue it's not critical to know that beta sheet folding is a phase transition, but it shows in the way we see beta sheets: whenever we see beta sheets, we usually see much larger regions of beta sheet. That has to do with the things we mentioned. Beta sheets are not really happy to coexist with another structure locally. An alpha helix is perfectly happy to have helix and then a small coil and then a small helix and
then a small coil, but for beta sheets that is fundamentally unfavorable. That's why, as you're going to see today, beta sheets usually prefer to be in other regions of the protein than the alpha helices. We also showed yesterday that there turns out to be an exponential dependence on the stability of the residues here, so beta sheet folding is also different in the sense that it spans several orders of magnitude. It can be anything from, say, a millisecond up to weeks or so. That's also why you end up having these prion proteins misfolding and everything; that's virtually always beta sheets. And then I also covered the next one, sorry: how long does beta sheet folding typically take?

We talked a little bit about the coil and how the end-to-end distance varies with the number of residues, either in a coil or in DNA or something. To first approximation, how does the end-to-end distance vary? You don't need the formula. What is the power? A half, it's a square root, right? So if you're doubling the length, if you're moving from 100 residues to 200 residues, the average end-to-end distance is going to be 1.41 times larger. That is the same thing as the diffusion properties you saw in the lab, that R goes as the square root of time, or conversely R squared goes as time.

This is important not just for proteins. It's going to be very rare that you study the coil itself, but there are a number of experimental methods that rely on sorting or finding things on a gel based on their size, and in all of those you are going to see a square-root behavior in the curve, and that's because of this. You can also show that when you take into account that the chain can't cross itself, you get a slightly larger number, but that's a surprisingly small correction.

So when it comes to real biomolecules, in contrast to the very simple ones that we've studied, what typical terms do we need? That was the stuff I spoke about after the break yesterday.
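As an aside on the square-root scaling of the end-to-end distance discussed above, the factor of about 1.41 between 100 and 200 residues can be checked with a small simulation. This is a minimal sketch, assuming an ideal freely jointed chain with unit-length segments pointing in random directions; a real coil has fixed bond angles and cannot cross itself, so only the exponent is expected to carry over. The function names, chain counts and seeds are illustrative assumptions.

```python
import math
import random


def random_unit_vector(rng):
    """Uniformly random direction in 3D, obtained by normalizing a Gaussian vector."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:
            return x / norm, y / norm, z / norm


def rms_end_to_end(n_segments, n_chains=1000, seed=0):
    """Root-mean-square end-to-end distance of an ideal chain of unit segments."""
    rng = random.Random(seed)
    total_r2 = 0.0
    for _ in range(n_chains):
        x = y = z = 0.0
        for _ in range(n_segments):
            dx, dy, dz = random_unit_vector(rng)
            x, y, z = x + dx, y + dy, z + dz
        total_r2 += x * x + y * y + z * z  # squared end-to-end distance
    return math.sqrt(total_r2 / n_chains)


if __name__ == "__main__":
    r100 = rms_end_to_end(100, seed=1)
    r200 = rms_end_to_end(200, seed=2)
    print(f"RMS end-to-end distance, N=100: {r100:.2f}")
    print(f"RMS end-to-end distance, N=200: {r200:.2f}")
    print(f"ratio: {r200 / r100:.2f}, sqrt(2) = {math.sqrt(2):.2f}")
```

For this ideal chain the mean squared distance equals the number of segments, so the ratio should come out close to the square root of two, up to sampling noise.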
Well, first, those are the individual terms, but what do you call the whole thing? You need a, yes, or a potential, right? You need a way to describe, as a function of the coordinates in a molecule, what the energy, or enthalpy, is. That in turn contains several different terms. Lennard-Jones is one of them. What's the Lennard-Jones interaction? There's both the attraction at long distance that you have between any particles and the repulsion at very short distance, so that they can't pass through each other. Good, so that's one of them. There are at least another four. Angles between bonds, yes. Yes, sorry, impropers, as you usually call them, or out-of-plane torsions. There are many groups, let's see, a carboxyl for instance, where you might have a C and then an O and then an OH. That carbon should actually be completely planar for chemistry reasons, because of the way the electrons are organized, so you apply a potential to make sure that it stays planar. We also had the bonds, right, since you said angles. And there's one thing you haven't brought up: electrostatics.

Electrostatics is actually complicated. It is painful because all these other interactions decay very quickly, but electrostatics goes as one over R, and it turns out that you can't really cut that off and say that if you're far away it doesn't matter. I wouldn't call it a deep mathematical result, but if you take the function one over R and integrate it, you get the logarithm of R. So if the interaction goes as one over R, the sum of all these would be roughly proportional to the logarithm of R, and if you then make a cutoff at any distance, say a billion miles, the integral of one over R from a billion miles to infinity is still infinite. So no matter how far out you make a cutoff, you still make an infinitely large error. The reason why this works in practice is that we don't have free charges. You might think of an ion as a
free charge, but you can't give me a kilo of sodium ions. Sodium ions only occur together with, for instance, chloride ions, so we don't have free charges in nature at macroscopic scales, and that effectively gives us dipole-dipole interactions. There are a couple of tricks that you apply in simulations that you borrow from solid-state physics, from crystals really, so that you can sum up an infinite number of periodic copies of these interactions. The only reason I'm mentioning this is that Bjorn and Dari, when you start doing a small simulation, might mention something about using a special algorithm called PME (particle mesh Ewald) for electrostatics, and that's this special algorithm.

Then I spoke a little bit about energy minimization. So what is it that energy minimization does, and what does it not do? In our case, at least, I would even be careful not to use the word "refine". By refinement we usually mean that we get something that's substantially better, in the sense of closer to the native state, and that's not going to happen, because the energy landscape is way too rugged. What energy minimization does is that, at best, it gets to the closest local minimum. It will not on average move to significantly better parts of the energy landscape. The reason why we do it is that if we're unlucky when we start a simulation, we might be starting from a horribly bad part of the landscape, and then things will crash simply because the forces are too large, so the accelerations are too large and the velocities are too large. So the point of energy minimization is really to get away from colliding atoms and things like that. But the second we can run a real simulation, that's much more efficient, which connects us to the next question.

So what does the simulation do? Yes, and in particular, by exploring the energy landscape, meaning what? Yes, that's a good formulation, and by
"explore", I like the word "sample". While it might look as if something is moving, our point is not to get a time sequence of points. You might get that anyway, but the point is that you're sampling things according to the Boltzmann distribution. But you already showed that you could do that with Monte Carlo simulations too, so why don't you do Monte Carlo simulations? Well, it's not just that. For a large protein, what's the problem with a Monte Carlo simulation?

Monte Carlo simulations will explore the Boltzmann distribution just as a molecular simulation does. You've used Monte Carlo simulations for your simplified models, and the reason we use the simplified models is that they contain conceptually the same ideas as the complicated ones. The reason why Monte Carlo simulations are efficient is that you make much larger moves. You remember the old movie of the water I showed yesterday? When you're running a molecular simulation in the real sense and integrating Newton's equations of motion, it's wonderful in a way, because since you're moving according to the forces and the potential, you fulfill the Boltzmann distribution. You fulfill this criterion called detailed balance. It's amazing. The curse of it is that you do this by moving in insanely small steps, so that you're hardly moving at all.

The beauty of Monte Carlo is that it can take much larger steps, so you're much more efficient at sampling your energy landscape. But when you take larger steps you will, on average, bump into things. This works in a simple system, and it works if you have a protein without water, but the second you start having water or a neighboring protein, you will virtually always bump into something. You're going to keep trying Monte Carlo moves, but you will never accept any of them, because the motions are too large. That's why we normally give up on Monte Carlo simulations, but occasionally it can be very useful.
When we move to docking at the end of the course and show how you place a small molecule in a protein, we're going to use Monte Carlo simulations, which work much better than MD there.

So the point is what you then get from a simulation, and that's related to the second-last question. You can actually formulate the second-last question in another way: this sampling is an approximation of something. We had a name for that, the collection of all microstates and their probabilities. The partition function, right. The partition function is this holy grail in physics. If you know the partition function, you know every single microstate and you know how likely it is to be there. The second you know the partition function, you know everything about your system. Stop simulating and just calculate all the properties, because all properties are just going to be averages over all the microstates, and we know all the microstates. That's never going to happen in reality, so the simulation tries to approximate the partition function by at least focusing on the important parts.

That brings us to the last question: what do we then get from a simulation? Yes, and that in turn enables us to calculate averages of observables, say the free energy, or the secondary structure, or potentially a transition rate between a folded and an unfolded state, how likely it is for a big protein receptor to bind a drug, some spectroscopic property, you name it.

I'm not sure if it's obvious, but you see the analogy here. I mentioned that we had a very simple model where we knew the energies for initiation and elongation. By using that simple model and predicting what observable we would see, we could use CD spectroscopy to get the fundamental factors back. This is exactly the same thing we're doing here, but it's about ten orders of magnitude more complicated. We have a model in the sense that we have a potential, a way to describe your protein. Based on this model we
predict what you would observe, say a free energy, or sorry, not a free energy, in this case how likely it is for a molecule to bind. You compare that to an experiment, and based on the experimental result we can then go back and say: let's see, did we have a correct model for how this small drug molecule binds to my receptor? There are two answers to that; the answer can be yes or no. So it's a more complicated model, but this is really the modeling part of simulations.

It could also be that you have a very, very fuzzy cryo-EM map, and I haven't forgotten that we should try to bring you over to the facility. This map is simply too fuzzy to directly place your atoms, but if you can use a simulation to predict what your structure would look like, you can then use that. First, you can let your simulation be guided, steered, by the cryo-EM data, for instance. But in particular you can tell, based on the data you see in the microscope, that this seems to be a reasonable model of my protein.

That was all the questions we had. I think it's fairly good to go through them, because, I'll tell you a little secret, I've talked about these things before, so I know them fairly well, and that's why it's so easy for me to rush through things. Then I realize when we talk here that there are some things I feel I spoke about two times yesterday, and you haven't quite caught them. So if you have a chance, try to answer questions, all of you. If you don't know, that's an even stronger reason why you should speak up, and I will spend as much time on those as we need. Absolutely worst case, I'll start recording all lectures offline so that you can look at them yourself.

Today we're going to look at real proteins: lots of pictures, very little mathematics. As you might have guessed by now, the recurring theme here is really how we create these complex assemblies from simple building
blocks. I'm going to talk a little bit about something called supersecondary structure and motifs, which you might or might not have heard of, and then we're going to talk a lot about some classes of proteins in general. I and the book and a whole lot of other people would argue that there are three big classes of proteins. Now I see the slide and I realize that I've drawn them in exactly the opposite order from what they should be.

The first type of proteins we found, realized were proteins in that sense, and got structures for were globular proteins. Globular proteins are your vanilla water-soluble proteins. When you think "protein", this is usually what you think about. The reason why they're called globular is that they're typically small and very close to spherical, simply to minimize the surface where residues interact with water, because you have hydrophobic stuff on the inside. They perform a ton of functions in your body and they're very easy to work with. Well, I wouldn't say easy, but they're possible to crystallize. That is also why most of the known protein structures are globular proteins, not necessarily because most proteins are globular proteins. They probably are, but they were the ones that we found easiest to work with in the beginning.

This is a membrane protein, which is actually larger than the first one there. Or no, it's probably about the same size. Membrane proteins are still relatively rare in databases. It's a matter of a couple of hundred membrane protein structures, and the main reason for that is that they live in this lipid, fatty surrounding, and it's super hard to crystallize anything there. In fact, I would argue that it's worse: it's typically impossible. So there are two ways people get membrane protein structures: either by putting the entire protein in a sort of micelle, or by taking the protein and binding it to an antibody, a gigantic
antibody, and then you can crystallize the antibody. You essentially get the membrane protein in its micelle as a small cargo on the antibody, and then you just happen to see that in the X-ray structure too. There have been lots of Nobel Prizes for membrane protein structures in the last few years, and I bet we're going to see more of them. Roughly one third of the proteins in your body are either in the membrane or connected to the membrane surface, so they're much more common than you might think from the number of structures in the data banks. I have to confess that I'm not entirely unbiased here, because membrane proteins are essentially my life. But one of the reasons why they're cool is that they're very functional. They're doors and windows, they pass signals, they very concretely do something very biological in the cell.

Then on the right here we have the type of proteins that's pretty much the opposite of membrane proteins. These are boring proteins, I would argue: large fibrous proteins. These are extremely large structures that form bone, teeth, hair, skin and everything. The special thing is that they're so large that you can see them, and they're very boring, simple structures that just repeat. There are a couple of cases that can be fun to see.

So I'm going to try to take you through a bunch of these. Membrane proteins I'm going to spend all of tomorrow talking about. The book is completely outdated on membrane proteins, so I'm going to deviate completely from the book there. Today I'm going to start by talking a little bit about fibrous proteins and then about globular proteins. I very much doubt that you're going to do much work on fibrous proteins, but there are a couple of things that I want you to have seen at least, so take the fibrous protein part with a grain of salt and think most about the globular proteins when I go through them.

So all these structures, both the structures that I draw and other
structures, are actually not proteins, right, but models of proteins. There are a couple of programs for this. In principle we could make a lab on it if you want to, but I don't think it's worth it. There are some free programs that you can download so that you can actually look at proteins. The PDB file you get from the Protein Data Bank is just a horrible, simple text file where each line says ATOM and then gives the number of the residue and the atom, the name of the residue, the name of the atom, and the XYZ coordinates in ångström. You can't do anything with those files directly, so to be able to ever look at or investigate proteins you need to download one of these programs. VMD is one of them, PyMOL is another one, or you can just search for "PDB viewer" or something.

The reason why I still want to stress the importance of these programs is that a protein like this, the ribosome or something, can have hundreds of thousands of lines of atoms. Forget about the computations and simulations: just being able to understand what these do and what parts you have is a bit of an art, and you need to learn how to represent that well.

So this is hemoglobin, which is the oxygen carrier in your blood. The small green group you see there is what you call a prosthetic group. A prosthetic group is not an amino acid, but it's a group that exists in your blood cell, and when the protein has folded, this group binds to the protein. Do you know what this particular group is called? Yes, kind of. There are actually two names: it's a protoporphyrin, and when a protoporphyrin binds an iron atom it becomes a heme. Right next to that iron atom you can bind the oxygen. Hemoglobin actually consists of four subunits, so you see two on this side and two on the other. You would not have seen that had I shown you that image, and that is the image that you see by default, right? So to be able to
visualize and understand proteins, you need to learn a bit about working with chains or surfaces or coloring, different ways of drawing them. That one is really what the bonds and the atoms are, just bonds and coloring by atoms. You can decide to draw the main chain if you want, so that's essentially just the alpha carbons. Even that I find very hard to understand. Or you can actually draw all the chains, so here we see that it's alpha helices, and you might be able to start to see that there are four different subunits, right? You certainly don't see that up there. And you can draw the side chains, but here you see the curse too: the more information you have, the harder it is to see the forest for the trees. Eventually you might want to zoom in on the protoporphyrin and actually see the oxygen right next to it there. So it all depends on what you want to see. There are some simply stunning visualizations that are made, usually with these tools, in particular where people are awarded Nobel Prizes for their work, and there's actually even quite a lot of research in modern visualization.

So how do you like systems like this? This is a bacteriophage. I'm not going to go through the history of early molecular biology and everything, but these structures were super important when people first started studying molecular biology, Max Perutz and others. The ribosome, a gigantic structure, as you see here. The red chains here are actually RNA, while all the blue stuff, the helices, is protein. That's also not something you would see unless you try to visualize it correctly.

Or you can use it to simply visualize a simulation: what happens if you have a DNA and you're going to pull it through a nanopore? This is a device that doesn't really exist yet, in particular not with a graphene nanopore, but people are of course running simulations here. So why are you running simulations before we even have nanopores? Right, because the way we're going
to try to detect this is based on what bases you have here. The bases have slightly different composition, so hopefully there will be some sort of difference we can detect, say in the capacitance or in some electric current in the graphene layer. And if that is not possible to detect even theoretically, it's kind of a bad idea to start building the pores. The cool thing is that you can show in simulation that it is possible to see separate signatures for A, G, C and T here. That doesn't mean that we can discover it in an experiment, but it means that it's at least worth trying, and that's what a lot of groups in the world are working on right now. So this might fail, we don't know.

But right now there is at least one small company, called Oxford Nanopore, that has manufactured a device that is not based on graphene, so we're starting to see the first so-called nanopore devices for sequencing come out on the market. Have you seen them? They're super cool. Unfortunately I don't have any. They're the size of a USB stick. Actually, they're not just the size of a USB stick, it is a USB stick. This one is the type you connect via USB: you open it up, you put a drop on it, you close the lid and then run it. So it's a sequencer connected to your laptop. People are still a bit worried, because it's not entirely obvious how you reconstruct the sequence. Do you start having some systematic errors, because it's a completely new type of sequencing technology?

But in many of these cases, just being able to visualize it and show what happens matters. This is more connected to simulation than you might think. On the other hand, it is a bit dangerous, because it's so easy to be seduced by these beautiful plots, right? This is just a model. It's not reality until you simulate it and show that it matches reality.

So the first of these classes that I'm going to speak
about is fibrous proteins. The reason why fibrous proteins exist is that they are the structural building blocks of your body. Otherwise you would just be a small amoeba or something. Well, no, actually not an amoeba, a multicellular collection of amoebas, but you would just be a pool of cells, and most of us aren't pools of cells, and that's because of these rigid building blocks. That in particular means that it's pretty common that they don't really have a direct biological function. Their biological function is really to act as the scaffold.

They are usually hard, very hard, and to be able to create something that's millimeters or centimeters thick you can't have one protein structure that's that large. So they typically build up in a huge number of hierarchies: you have one helix, four helices, eight helices, sixteen helices that go together and together and together, and eventually you get to levels where you can form bone, teeth, nails, shells, claws and so on. That's all protein. The protein itself is large, but assemblies of these proteins are even larger. It's very common to have things like this that are long threads or filaments or chains, and then you go to larger and larger scales. I think that one is from bone, actually. And they're assembled in very simple ways, normally just hydrogen bonds.

Let's start with the very simple ones. Silk fibroin is one of the first ones. This is actually what gives silk its properties, and it's just beta sheets, pure beta sheets. It's serine, glycine, alanine, glycine, alanine, glycine, and then you repeat: serine, glycine, alanine, glycine, alanine, glycine, serine, glycine, alanine, glycine, alanine, glycine. Pretty boring, but that's the whole point. These should not have a very specific biological function. If you just want large extension, just keep repeating the same pattern over and over again. This, in particular the serine in
particular, means that they have separate hydrophobic and hydrophilic surfaces, which means that they also like to form layers, and these layers assemble into larger layers, and that's why it eventually gets large enough that you can see it. Of course, in real silk this would not be entirely pure, but it's kind of like a crystal, though still not quite. I'm not sure whether I have this here, it's either here or in another slide. Have you ever looked at a shampoo bottle and seen that they say it contains silk protein? That has absolutely nothing to do with silk. It's this. They advertise it as if it would be expensive or nice in some way. First of all, that has absolutely nothing to do with your hair. Your hair is also protein, but your hair will not spontaneously start to absorb other amino acids. There are other ingredients in the shampoo that give you shiny hair or whatever it is. So it's a complete marketing scam that has absolutely nothing to do with silk, and you can manufacture tons of this at no cost whatsoever. These are the cheapest amino acids you can imagine. No, because then you need this very long, very specific structure, and that would be completely different. So getting the large structure right would be hard. It's actually a good question. I bet you can make artificial silk, but it would probably be more expensive. Silk is not that expensive. Silk is expensive in the western world once it has gone through 14 different merchants and everything, but for the actual farmer with the silkworms, even silkworm silk isn't that expensive originally. I would assume that's why it's not a gigantic industry. There is way more money to be made from cosmetics or something, where you just pretend that it's expensive and sell the cheap stuff.

Collagen. Collagen is really screwed up. So collagen is a helix. It's actually three helices. Do you see that? It's proline, proline,
proline, proline, proline, proline, proline, proline. You would not believe that you would see proline in a helix, right? Note that it's not an alpha helix. There is not a single hydrogen bond inside each chain here, whether it is the white, gray or black one, but there are hydrogen bonds between the chains, and frequently with the water. Very strange structure. This is a quarter of the protein in your body. Very boring protein. Bone, teeth, skin: it just creates these large extended structures, and it can be either very hard or a bit brittle. And this can be like 300 nanometers long, extremely long fibers, and then it aggregates into larger and larger layers. This consists of mostly proline and then a little bit of glycine. It's the glycine here in between that gives the link so that it can actually turn, but it's two-thirds proline. Almost all the proline in your body is in collagen. If you want to go into detail, these chains are actually not strictly identical: there are two copies of one chain and one copy of a slightly different chain. It's not super important. But have you seen what we've done here? We've taken our chains, and I think we've assembled six of them or something here. This is very typical for fibrous proteins: you start from one large chain, and then you make a filament that's even larger, and then you have several of these filaments that are even larger, and eventually you're going to get to something like this. This is dentine in your teeth. I should say that this is actually an electron microscope image, and I think it says 100 kiloelectronvolts there, and I should know what the scale of that is, but I actually don't remember. But this is pure protein.

The other thing that can happen: remember that glycine? If that glycine is mutated to any other residue, it's no longer going to be as flexible, and when it is not as flexible you're not going to get all the beautiful hydrogen bonds in these structures, and then you're going to get brittle bone disease. So it's a very classical example of a
single residue mutation that leads to pretty severe disease, and it's all because it's influencing the structure. Remember what I said earlier in the course about the central dogma of molecular biology? So that's it: a sequence mutation leads to a problem in the structure, which leads to a problem in the function. Normal collagen is pretty much just one sequence. And again, you might have some very rare differences, that every hundredth residue you might have something else, but the reason why I can show you these sequences is that there is extremely little sequence variation, because there is no point, right? If you look at hemoglobin or something, and I will get back to that, there might be reasons to have slightly different properties of hemoglobin. But bone is bone is bone. We don't really have needs for different types of bone, so there is no point for natural selection. No, I would be almost certain that it's not assembled in the ribosome. That's actually a good question, I don't know. No, so this mutation is usually something that you're born with. Most mutations are. In rare cases, of course, mutations happen spontaneously, but then they happen in one cell. That's usually not a problem for you, unless it is a mutation that causes the cell division process to go berserk, i.e. cancer. And the problem with cancer is that that one cell is then multiplied into billions of cells before you know it, and then eventually it will take over your entire body. But in general, the body deals with mutations by killing that cell. You have plenty of cells to spare.

Alpha helices in general. Remember that I said that collagen was really a set of helices that assemble into more helices? This happens quite a lot. Technically I'm not sure whether I would say that this is a fibrous protein, but this whole point of having many alpha helices go together and build even larger structures, that is
something that recurs in lots of places. Remember that I spoke about titin in one of the first lectures? So there is a protein called myosin in your muscles, and that's this protein that is this coiled-coil helix here. What happens when a muscle contracts is that you go through a process where this protein essentially walks on this other muscle fiber here, and that's how they contract, in a millisecond or something, using a huge amount of ATP. There is a ton of different structural transitions they go through. There's no intelligence here; they're just going to something that's more favorable thermodynamically. It's easy to understand these things when you see one protein, but I'm still simply amazed at the way the body can handle these things purely by thermodynamics. It is only thermodynamics. Very good question. The reason why they're typically 3.5 residues per turn is that after two turns you want to get back to the same position, because otherwise you would keep shifting a bit. But that's not always the case, so in many cases, even when you have these coiled-coil helices, the entire coil tends to move around a bit too. It's not important, so I'll skip that.

And that brings me a little bit more to helices, sorry, helix geometry. Why am I gonna say this? The reason I'm gonna say this is because this will be super important tomorrow when I talk about membrane proteins. If you just look at a helix that way, it's not really gonna be too obvious. You can also look at one of those cartoons, but that doesn't really tell you a lot either. Or we can look at the surface of the helix. And if you look at the surface of the entire helix, in particular if you see this in a program, you will actually see that there's kind of a valley here, and then there's a peak here and a peak here, and a valley here. So depending on how you look at it, there are both valleys, and there are peaks between the valleys, and these peaks of course correspond to the side chains. And
depending on what peaks you look at, well, the side chains are points, right? There are kind of two peaks, one around 40 to 50 degrees and one around minus 20 to minus 30 degrees. What do you think can happen when two helices interact? Exactly. Or that depends on your residues, of course. What if you now put two gigantic residues, say two tryptophans, right next to each other? They're not going to be able to pack, right? Because even if they're not repelling electrostatically, there simply isn't enough room. So ideally, if you take one helix, move it on top of the other and then rotate it a bit, these can pack beautifully. If you just look at all your protein structures, how common is it that you see two helices pack that way? No, you never see it. They always pack that way. This might make sense to you, but then why on earth aren't they packed exactly in parallel? They almost always make an angle to each other, and there are two angles they can make, either roughly 20 degrees or 50 degrees in the other direction, and the reason for that is exactly this: otherwise you would not be able to pack their surfaces very well. Francis Crick predicted that in '53, ten years before the structures. Pretty cool. Yeah, they had a productive year, you see.

And there are several helices, in particular in coiled coils, where this is important. So when these helices cross each other, first it's really good to have glycines, because glycines are small and they don't really take up a whole lot of space. But it's also very favorable that these red residues be valines and leucines. So why would that be useful? Sorry? They're hydrophobic, right. They're both small, but they're also strongly hydrophobic, so it would be bad to expose these side chains to the water. They love to interact with each other, on the other hand, and when these are placed roughly every seven residues apart, they form a beautiful zipper. We literally call them leucine zippers, because it's a zipper that you can zip up very, very
long. If you just have an alpha helix, and these helices can be like 50 residues long, you're not going to see them in the Protein Data Bank. Why aren't you going to see that in the Protein Data Bank? Well, no, it's that these frequently occur as part of a receptor, or on a bacterium, something sticking out of a cell. So they're long, they're fairly floppy, and they're extremely hard to crystallize, right, if they're 50 or 100 residues long. So you're never ever going to see them in a crystal. But this is the great advantage with bioinformatics. We see this with sequence-based prediction nowadays: there is something that's alpha-helical, and in addition to the alpha helix you see that every seventh residue is a leucine, and by that time you know that you're going to have a coiled-coil leucine zipper.

Roughly 10% of these residues are cysteines. So how can you use that? Well, I kind of said that already, right? If you now add disulfide bridges to this, you can make it even stronger. Remember that for a second, because this is alpha keratin. It's hair. All of your hair is not entirely that, there are some trace amounts of other proteins in here, but hair is almost entirely alpha helix. So what you do is that you have an individual helix that first coils up in a coiled coil, so there are two of them, and then you have a pair of these two-stranded ones, and then, depending on exactly how these intermediate filaments or something pair up, you first form a matrix and you form a macrofibril, and then you eventually form these very large structures that actually are the hair, which in turn connects to the cortical cells and the cuticle in your skin. I actually forgot the exact numbers. That's a very good exercise: you can calculate how many helices there are in an average hair, if a hair is in the ballpark of 0.1 millimeter thick. But then I realized that I actually looked up how fast hair forms and how many alpha helices that
corresponds to, and I kind of lied to you yesterday. I thought that this would be almost at the limit of the speed at which alpha helices can fold, and it's not. Hair forms roughly a thousand times slower. Why is that? It actually makes sense if you think about it. Well, we need to build all the amino acids, right? We're limited by the speed at which the body can provide new amino acids to the cells and everything. So in this case we're not limited by the alpha helix formation speed. But it's still pretty decent, right? Something like every 30 milliseconds you're adding an amino acid to each helix.

So this brings us to another question, the cysteines. What could you do with the cysteines, do you think? Or a permanent wave, right. That's how you make a permanent wave. Normally in the hair those cysteines will form disulfide bridges. So you first add a reducing agent, which breaks all your disulfide bridges, and then you form the hair in whatever shape you want it, or just comb it straight, and then you add an oxidizing agent, so you re-form the disulfide bridges, and then the hair will tend to stay that way. The hair is certainly still flexible and everything, but for instance, if you want curls in it, you now force it to form the disulfide bridges in a shape where the hair is curled. Very simple chemistry. Sorry? I normally don't, but if you go to the hairdresser and get a permanent wave, they are working with the disulfide bridges in your hair. Yes, that's how it works. I think it was more popular in the 80s than today.

Elastin is another strange protein. It's a highly elastic fibrous protein which is kind of similar to collagen, but it's a bit softer. What happens is that you have these fairly short collagen-like helices, and then they're cross-linked by small lysines here, and this makes it very elastic. You can stretch it and pull it, and this is what nature uses in things like
your aorta. And in many of these diseases that you get when you're older, for instance if you have a deficiency in some of these lysine-modifying enzymes, eventually these proteins will change so that they're more brittle and not as flexible. That's for instance why you can eventually get an aortic rupture. It's very rare to have an aortic rupture when you're 20, but as you get older and older your vessels become more brittle and fragile. This is incidentally related to something: have you read about the scandal with Paolo Macchiarini at the Karolinska hospital, the Italian surgeon who kept inserting artificial tracheas and everything? Long term, this is of course the direction in which the entire field will go, using biomaterials instead. So what is the advantage of having a biomaterial instead of a plastic aorta? Well, lysine-modifying enzymes aren't going to work on the plastic either, so that's not a problem. What's the problem with plastic? Your immune defense does its job. What happens if you have a protein? It's biocompatible. There are of course some metals, such as titanium, that are largely biocompatible, but proteins are even better. Proteins are awesome, because the body won't even try to reject them. It's still not something that's used a lot, because it's fairly early research and it's hard to make these things. Again, nature has had 4.3 billion years to perfect these processes. But long term, if we could make things with biocompatible materials, you could replace organs or anything, and people would not be dependent on anti-rejection drugs, which is frequently the reason why people die after such a procedure: if you take these anti-rejection drugs for the rest of your life, suddenly you die from a cold, not from your new heart. So, well, the problem is kind of the antigens that the body is reacting to, right? Or when it's biological, I don't think it would be so much antigens. It's hard, because we don't know exactly. So
if that had been a biological material, then it would usually have been proteins, since your immune defense reacts to specific proteins in a foreign organism. The reason why this would work is that they would use the same type of protein as we have in a human. There are some materials that appear to be more biocompatible than others, for instance titanium, so that's what you frequently use in hip replacements or something. For whatever reason, it's a very smooth surface and it doesn't appear to upset your immune system a whole lot, so there you won't really need an anti-rejection drug. Plastic is just as complicated as proteins in a way, right? There are billions of ways to create these polymers. Plastic is nice in other ways: you can shape it, and in the longer term you could 3D print it. And people are actually doing that, not so much with plastics, but when you perform surgery on the brain or something and you remove a piece of the bone, they actually 3D print small metal cages to become part of your skull. So in principle the answer is yes, it's just that it's a very young science. The other problem with all these things is that it's very rare that you perform surgery on healthy 20-year-olds, right? And if most of your subjects are 85 years old, there will be lots of deaths, not necessarily because of the procedure, but because these patients were very ill in the first place.

So I think that's all I'm going to say about fibrous proteins, and now we're going to have much more fun: globular proteins. This is going to be biological. You know about the alpha helix and the beta sheet, so you could probably almost guess this: there are proteins that are only alpha helix, there are proteins that are only beta sheet, and there are some proteins that are mixtures of helices and sheets, and that's pretty much the way you divide them. Have you studied anything about folds in the bioinformatics course? A little bit. Some of you
might not have, so let me know if I'm repeating some stuff you already know; otherwise I'll take you through it here too. In some way you would like to try to classify things, but it's an extremely large space. And what you might not have thought about is what we were doing in the very first lectures in this course. I was studying atoms, right? I was studying how the alpha carbon is connected to the nitrogen, and how something else is connected to the hydrogen, and how that is connected to the beta carbon. After that we started looking at how amino acids are connected to each other. So we are essentially gradually moving up in some sort of hierarchy, from primary to secondary to tertiary structure. And the idea of doing that is that, just as you saw yesterday, all this is really a collection of packed atoms, but understanding how the chain goes helps us to trace and understand evolution in many cases, and that is hard to understand.

This is starting to get a couple of years old, so it's not that common anymore, but it used to be very common to try to look at topologies of proteins. You can for instance decide to look at this from the top and draw this complicated structure with triangles for the sheets and red circles for the alpha helices. And then this structure, which might be complicated, do you see the pattern here? So you have a helix, sheet, helix, sheet, helix, sheet, helix, sheet, and all these sheets line up together, so all these sheets are going to form hydrogen bonds with each other, right? But the actual sequence goes in and out, in and out, in and out. Here is an even more complicated protein. So while this one helps you see the sheet, this one helps you see how the sheet is formed in order of the sequence. These types of plots are getting more and more rare, but this is the reason why we typically like to color things by secondary structure. It helps us
understand in principle how things work. Of these structures, and this is maybe surprising, alpha helices don't form the simplest structures. Beta sheet structures are usually much simpler. You might disagree with me seeing that protein, which might not be the best example, but if you ignore all the coils here for a bit, these are actually very nice and simple. So you have a sheet there, you have a second sheet there, you have a sheet there, and there aren't too many complications with rotations and everything. You almost never have more than two stacked sheets. That's a very strange question: why don't you have more than two sheets? Well, we'll get back to that, but nature appears to prefer to keep things simple. You might not agree with that, considering some of the beta sheet proteins we will see later, but nature adds complications only when we actually need them. The larger the protein is, the more expensive it is to build, and the more expensive it is going to be to tear down when we're done with it. How long do you think a protein like that survives in your body, by the way? There's a reason I'm asking, because I have that on a slide later today. Ballpark? You know what, that's pretty good. It was like two to four weeks. Yes, you see, this is the advantage of guessing. If you have absolutely no idea and keep quiet, you're not gonna have a chance to get it right. The second thing: if you're gonna be a successful scientist, you should rather not say that it was a guess. No, most of the sheets are anti-parallel, but there are certainly parallel ones too. And the book goes into some detail about this. If you're interested, there are some statistical differences in whether you typically go this way or typically go that way. Why on earth would you care about things like that, unless you're a physicist who thinks that it's fun? Yes, or you can certainly think about it in
a specific structure, but this is remarkably useful if you're ever gonna start predicting protein structures, because it turns out some of these are extremely rare. If you now have two of these blue ones right next to each other but you can't see the red loop, are you gonna guess on the probable or the improbable one? If we don't know what it is, but we know that one is way more likely than the other, then that's likely going to be the structure. Let's guess and pick that one. It even turns out that some of these don't exist at all in nature. We have no idea why. It's just that we haven't seen them. Nature might be a bit late, but it will come by in five billion years.

Since beta sheets are flat, there are pretty much just two ways in which they can pack: they can be either orthogonal, or they can pack in an aligned, parallel way. They get just slightly different properties. I think, yes, I have an example. Start with the one on the right, gamma crystallin. That's what you have in your eye lens. Your entire eye lens consists of beta sheets too. You know what, this is kind of funny, right, because we had silk that was entirely beta sheets too. Silk is certainly not transparent; gamma crystallin definitely is transparent. So even though things have the same secondary structure and everything, they can definitely have quite different properties.

The other protein, I might not have had the other one. If you look a bit at how this protein, this was too much, we'll have to go here. We start at the N-terminus, and then we go up here, and then we have a very tight loop, and then we go down, and then we have a very tight loop again, and then we have a long loop. You see that the long loop here goes over the short loop, and then we go down, and then we keep repeating this pattern: a short loop, and then go down, and then the long loop over the short loop. That's a very, very common pattern in beta sheets. It's so common that it even has a name. It's called a Greek key.
And it has to do with these shapes, right? You see a short loop and then a long loop, so you always have a short loop. Why do you think nature uses this pattern? Apart from the fact that it's beautiful. I would argue, but here I am certainly sticking out my neck a bit, that nature uses this for exactly the same reason that it's used here. That's actually a very good reason too, but why do you think the artist picked this pattern? You can draw it without lifting your pen. This here, right? You just continue the pattern. It's a continuous pattern. You never, ever lift your pen. And lifting your pen here would correspond to having a break in your sequence, right? Then you would no longer have one protein. Then you would need two different proteins. So this is a way to have a single chain pack into something that's fairly rigid. So why is- It's all structured. Sorry? It gives no structure. That's a good question. I love this. What would be the use of just having a large beta sheet this way? I can keep drawing it. That's not the whole answer, right? Apart from the fact that it might be beautiful to have on your wall or something, nature needs to do something with it. So if you have two of these very large beta sheets together, what would the use be? Well, no, not really, because it's gonna be open on all sides, right? If I take two pieces of paper, these two, it's not really gonna be the world's best channel or pocket. There are just a bit too many holes in it. So if you move to another protein, this is the other type of protein. Do you see here how nature has frequently tried to close it up, either with a helix, or in this case with small loops or something? So nature frequently tries to use these loops to close the pocket. And ideally you would like to have something here, which you actually do have here, it's just that you don't see the side chains.
Something that is pretty much closed on three sides, and then you just have a small opening from one side, because suddenly you have a real pocket where you can bind something and it doesn't get out. So the reason why we want these loops is not necessarily that we want loops that are as small as possible, but that we wanna form something that is pocket-like, and then we can use these loops to close the pocket on the top and bottom and sides. So this is this fatty acid binding protein that I showed you very quickly at the beginning of the course. Here we have a fatty acid, an oleic acid, bound in it. This is an X-ray, a real X-ray structure of it. This is a really cool protein because it's completely hydrophobic on the inside. So it has this hydrophobic binding pocket while it's hydrophilic on the outside. And this is the protein that transports all the fatty acids in your body so you can build your membranes. Extremely cool, small, simple machine. Kind of amazing how nature does it.

But the funny thing with beta sheets is that you might think that there are so many possibilities, and in principle there are, but you don't see most of them. Compared to all the possibilities in the world, there's a relatively small number of topologies that we actually observe. We never see mixed parallel and anti-parallel beta sheets right next to each other. It's usually that you have a large region of the protein that's either parallel sheet or anti-parallel sheet. Don't ask me why. Well, it's likely related to entropy, but we don't really know. The left-handed crossover form for sheets, with these two red cords I showed you, virtually never happens. And part of this, of course, comes from the properties of amino acids, what's favorable both entropically and in terms of energies. And at the end of the day, we have the proteins we have because to form a stable protein that is happy to be folded, you need its building blocks to be stable.
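As a rough illustration of how a stability difference between two building blocks translates into how often you would expect to see each one, here is a minimal Python sketch. It assumes plain two-state Boltzmann statistics at 300 K; the function name `population_ratio`, the temperature and the 1 and 2 kcal/mol test values are illustrative assumptions, not numbers taken from any dataset.

```python
import math

# Gas constant in kcal/(mol*K); the temperature below is an assumed 300 K.
R_KCAL = 0.0019872


def population_ratio(delta_g_kcal, temperature_k=300.0):
    """Equilibrium ratio p(less stable) / p(more stable) for a free energy gap.

    Assumes a simple two-state Boltzmann picture (hypothetical helper).
    """
    return math.exp(-delta_g_kcal / (R_KCAL * temperature_k))


for delta_g in (1.0, 2.0):
    ratio = population_ratio(delta_g)
    print(f"dG = {delta_g:.0f} kcal/mol -> ratio {ratio:.3f}, "
          f"roughly 1 in {1.0 / ratio:.0f}")
```

Under these assumptions a gap of 1 kcal/mol gives a ratio of roughly 1 in 5, and 2 kcal/mol a bit under 1 in 30. That would make the less stable form rarer, but not absent, so a form that is never observed at all is hard to explain from such a small gap alone.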
So in most cases, it's that these building blocks are not quite as stable as the ones we do observe. But it's still very strange, no matter how small these effects are. You're talking about differences of one or two kcal or something, and it turns out one of the forms we never see, for whatever reason. I'm gonna come back to that a little bit, because this enters in prediction. David Baker had a really beautiful result that they, oh no, sorry, I have it in two slides.

So anti-parallel beta sheets are by far the most abundant. You can go up, down, up, down. And that's based on these hairpins we're talking about. Why is that more common than parallel beta sheets, do you think? So the problem is, what do you need to do to create a parallel beta sheet? Yeah, so once you have your first strand, right, you need to go elsewhere and create a helix or do something so that you can come back to the start, and then you start forming the next strand. And this, of course, makes it an even more large-scale, long-range interaction. Then you're gonna need to mix sheet, helix, sheet, helix, sheet, helix or something. There are some structures that do that, but it's a more complicated structure. Just going up, down, up, down, up, down is so much easier. And that likely makes it faster to form, but that's just hand-waving. I certainly have not proven that.

The other thing is that you can't really have a loop or a knot or anything, but the funny thing is that there is no rule in biology without an exception. So there is a small protein called pepsine where, if you look at the structure, you see there how the green one goes on the outside of the yellow helix and then in again. So that is actually a knot in the mathematical sense, in what is a small, very rigid protein. It doesn't matter what it does. And that brings me back to the other thing. You have actually seen some knots. When could you create knots in a protein? Yeah, so how did you create the knot there?
Right, and that's how nature can cheat a bit, right? It's very difficult. Sorry, why do you think that we don't have knots in normal proteins? Well, it's not too strong. It's something more complicated. Yes, because there's something important: you need to fold the protein, right? And it has to fold by following the normal rules of nature. And for that to fold and spontaneously form a really complicated knot, that's never gonna happen entropically. But the beautiful thing with the disulfide bridges, how do they get around that? Exactly, so the point is that when you fold the protein, you don't have your disulfide bridges. You can form those once you have folded it, to stabilize the protein. And that way it's not a knot in the sense that the backbone forms a knot; the backbone in these toxins doesn't form a knot. So the backbone is kind of a normal backbone, but once you have folded the protein, the cysteines will lock in the backbone in a way that you can't really unfold the backbone anymore. It's pretty smart.

So there are two more slides I'll take on beta sheets and then let's have a break. And this leads to something that David Baker's people realized a few years ago. It turns out that this is the normal beta meander, which refers to the way rivers flow. Anti-parallel, up down, up down, up down, where the sheets would be in order one, two, three and four. These are the Greek key patterns. And what I've drawn under here, that's kind of how common they are. And then there are a bunch of patterns that you pretty much never see. There might be some of these that you see once in a blue moon, but at least a couple of these have never, ever been observed in nature. Well, partly it's because they cross over each other. Partly, I would imagine, take this one for instance, right? Here you have the end of the sequence stopping right in a loop. And again, it's not impossible, but then you would need to have another loop and move that to the side or something. It's simply not that common.
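One way to make the "order one, two, three and four" description concrete is to write a sheet topology as the list of strands read from one edge of the sheet to the other, each labelled by its number along the sequence. The following Python sketch is only an illustration under that assumption; the function names and the example orders are hypothetical and not taken from the slides.

```python
def neighbour_connections(order):
    """Count sequence-consecutive strand pairs that sit side by side in the sheet.

    `order` lists the strands from one edge of the sheet to the other,
    each labelled by its position along the sequence (1, 2, 3, ...).
    """
    n = len(order)
    if sorted(order) != list(range(1, n + 1)):
        raise ValueError("order must be a permutation of 1..n")
    position = {strand: index for index, strand in enumerate(order)}
    return sum(
        1
        for strand in range(1, n)
        if abs(position[strand] - position[strand + 1]) == 1
    )


def is_meander(order):
    """True if every strand sits next to the strand that follows it in sequence."""
    return neighbour_connections(order) == len(order) - 1


# Illustrative orders only: a plain meander, an arrangement where the last
# strand returns next to the first, and one with more crossovers.
for order in ([1, 2, 3, 4], [4, 1, 2, 3], [1, 3, 2, 4]):
    print(order, neighbour_connections(order), is_meander(order))
```

In this representation the meander is the case where all connections are simple hairpins between neighbouring strands. The second example needs one longer connection that reaches back across the sheet, which is the kind of long loop a Greek key involves, and the third needs two such connections.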
So what David Baker realized is that when it comes to predicting beta sheet structure, you don't need to consider every single possible beta sheet. Because if you've never, ever observed a sheet in, let's say, that conformation, and this is arbitrary, I'm not saying that this is the one that's never formed, if you've never observed this in the 120,000 PDB structures we have so far, and you have a new structure that you're gonna predict, are you gonna propose that structure for a beta sheet? No, you will likely propose one of those three, right? So you can use this, even though we don't know why, which is both the power and the curse of bioinformatics. We don't know why, which is a bummer. But we can certainly see the writing on the wall, or the whiteboard in this case: if something hardly ever occurs in nature, in all likelihood it's gonna be extremely unlikely that it occurs in our protein too. So suddenly you can weigh the probability with which we predict different things based on how prevalent they are in nature.

Yes, at the end of the day that's the only thing you can do. But there is something else you can do. You can be fond of laws and obey the laws of nature, or is there something else you could think of trying to do? Well, can you try to design a protein that would be stable in that shape? Nature hasn't done it yet. But just because nature hasn't done it, that doesn't mean you can't. And people have actually been successful in that. So you can create structures that nature has not yet created. And this becomes complicated, because will you ever be able to predict that structure with bioinformatics? And that's not really a shortcoming of bioinformatics, because bioinformatics rests on the assumption that evolution works, that we find patterns in evolution. But if this pattern is not present in evolution, you've lost. And that's why big programs in bioinformatics, for instance Rosetta, tend to use a combination.
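The idea of weighing predictions by how prevalent a topology is in nature can be sketched as a simple prior multiplied by a model score. This is a minimal Python sketch under stated assumptions, not how Rosetta or any other real program does it: the topology names, the observation counts and the likelihood values are all invented for illustration, and the pseudocount is an assumed way to keep never-observed topologies possible but very unlikely.

```python
def topology_priors(observed_counts, pseudocount=1.0):
    """Turn observation counts into prior probabilities.

    The pseudocount (an assumption of this sketch) keeps a topology that has
    never been observed from getting a probability of exactly zero.
    """
    total = sum(observed_counts.values()) + pseudocount * len(observed_counts)
    return {
        name: (count + pseudocount) / total
        for name, count in observed_counts.items()
    }


def weighted_prediction(likelihoods, priors):
    """Combine a model's likelihoods with the priors and renormalise."""
    unnormalised = {name: likelihoods[name] * priors[name] for name in likelihoods}
    total = sum(unnormalised.values())
    return {name: value / total for name, value in unnormalised.items()}


# Invented numbers for illustration only, not real PDB statistics.
counts = {"meander": 900, "greek_key": 600, "never_seen": 0}
# Hypothetical scores from some physics-based or sequence-based model.
likelihoods = {"meander": 0.3, "greek_key": 0.3, "never_seen": 0.4}

priors = topology_priors(counts)
posterior = weighted_prediction(likelihoods, priors)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

With these made-up numbers the never-observed topology loses even though the hypothetical model scores it slightly higher, which is the trade-off described above: it helps for natural proteins and works against you for a designed fold that evolution has never produced.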
Homology modeling is insanely powerful when it works. You should always use that first. But there are of course cases where you have to go back to physics and start to optimize these things and see: can you create a stable sheet here? So in this case, this is still from the last five, six years. So this is on the toy level, where people are trying to see whether they can build something that's stable. But today I would say that there are probably two dozen examples in the literature where people have built proteins with completely new folds, folds that nature hasn't used before. So that's the other question. What function? What function do you want? Yeah? I thought of bringing this up. Okay, since you asked, I'll take this up. Because this is of course the real question. Here I was just talking about designing structure, right? What's the point of designing structure? Well, there is of course a point in designing structure just to see conceptually, can we do it? But the goal is to design function. Do you think anybody has done it yet? They have, David and others. People have designed enzymes. So remember that an enzyme is a catalyst, right? It speeds up a reaction. And people have not just taken a fold, a protein that is not an enzyme, and turned it into an enzyme; that was not so hard. What people have even been able to do is find a chemical reaction that is slow, where there is no known enzyme that speeds it up, and then design an artificial enzyme to speed up a reaction that was previously not catalyzed in nature. And the cool thing is, of course, if you can start doing this for important biotechnology processes and everything, there are insane amounts of money to be made from making these processes go faster. So this is certainly happening a lot in biotechnology today. Yes, and that's partly based on a tradition, and whether it's healthy or not doesn't matter so much.
The problem is, there is a general problem in life science today: the requirements for putting a new drug on the market or performing clinical trials are getting harder every day. Everybody wants a drug, but nobody wants to pay for it. It should be so safe that ideally you should test the drug for 15 years before you put it on the market, and after 20 years your patent expires. So that's why I would argue that the way we currently develop drugs is heading towards a failure. Or you can target the biotech industry. Like 12 months after you have your new protein, this can be used in a company, because you're producing soap or something, it doesn't matter. You're not administering this to humans, right? So in biotechnology, you don't necessarily need the ethical permits or anything; just start doing it, use it in the factory right away. Now, the reason why we still have pharmaceutical companies is that in general, people are willing to pay slightly more for a cancer drug than for soap, but that's why there's so much work in biotech today. It's easier to sell it. And that brings me to, I think this is, yes, the last slide on beta sheets. Beta sheets have this funny ability, oh, I mentioned this yesterday when we talked about prions and misfolding, that you can have two beta sheets dimerize with each other. And at this point, it starts becoming a matter of definition. Is this one or two beta sheets, right? Effectively, this is one sheet. Okay, this one goes up and that one goes, sorry, this one goes down and that one goes up. So it's just a continuation of an anti-parallel beta sheet. And this leads to extremely strong dimerization surfaces. You can just start counting up the hydrogen bonds. This is important because this is also where you can start to design protein-protein interactions.
At the end of the day, a protein-protein interaction is nothing else than two proteins binding to each other. And you can sit down and try to design something, make two proteins bind each other better. Or in many cases, if you're developing drugs, it turns out that protein binding is often what causes something to activate a receptor. Assuming, in this case, it's a dimer, could you imagine doing something here if this is bad? Could you imagine doing something to break up this interaction? Yes, or tryptophan might very well be good, but that's the whole point: either you put in something that disrupts that beta strand, or you put something here that's so large that they don't really want to bind. And that's frequently how people develop drugs. There is a famous drug called lispro, which is a form of insulin. I'm not sure you know about insulin, but when people are diabetic, one of the problems with human insulin is that it takes quite a while for the insulin to reach its functional form. Insulin is frequently hexameric in solution or in the body, and it has to dissolve first into dimers and then monomers. And that takes like 30 minutes or an hour or so, which is part of the problem, because it's hard to administer the dose correctly. And I think it was Eli Lilly who had this; lispro is their internal name. It has some other market name, Humalog, insulin lispro or whatever. They just took two residues, the lysine and the proline, I think it is, that's the reason for the name, and swapped the order of them in a coil. And for whatever reason, this influences the insulin's ability to dimerize. And suddenly it doesn't dimerize as well anymore. And if it doesn't dimerize, it's gonna be in the monomer form. And if it's in the monomer form, it acts directly in the body. Super simple. But of course, somebody had to be pretty smart to realize that those are the two residues to do it with. And at that stage it is based on modeling. Somebody likely sat down and looked at that.
You know what? Let's try to swap those two residues. That's kind of dangerous. Can you make a drug just based on a computer model? So that's of course the whole point, right? The second you have the idea that, oh, this would be really smart, now you can tell the lab guys: have you tried the swap? Try this sequence and see if it works better. And it worked better. And then you're done. And once you are at that sequence, it doesn't really matter how you came up with the idea of swapping them. It just works. It's 10.30. Let's take our 20 minutes and meet here at 10 to 11. And then I'm gonna continue talking about alpha helices, a lot more. I finished the beta sheets before the break and I'm not gonna go back to them, but just to explain, there are lots of reasons why alpha helices are different. We went through a lot of them both the week before the Easter break and in the last few days. And you know everything about the properties of the structures. The only thing that we maybe haven't really mentioned is that all these hydrogen bonds in the beta sheets lead to lots of constraints, because they are not local, right? So the more long-range distances you're freezing, saying this has to be exactly one nanometer or whatever it is, the more you restrain different parts of the protein together. While alpha helices, formally they have just as many hydrogen bonds, as every residue participates in two, with the NH group and the CO group. But since all of these are local, it means much, much fewer global constraints. And this is the reason why, surprisingly, although it's easier to understand how an alpha helix folds, the larger-scale structure of alpha helical proteins is gonna be more diverse and more complicated than for the beta sheets. So in general, for alpha helices, the only requirement we have here is that we somehow need to pack the helices together.
There are usually never any, or at least not a lot of, hydrogen bonds between different alpha helices. And here you can probably see, right, that any time two helices cross, they appear to cross at an angle of something like 30 degrees or so, which I spoke about before. So this diversity is certainly important, and particularly when you have mixed structures with helices and beta sheets, it's gonna get even more complicated. There are some channels like, sorry, not some channels. There are some structures like hemoglobin that just seem to be large blobs of packed helices. In particular, membrane proteins, which I'm not gonna talk about today but which we will look at tomorrow, frequently have these helices that go through the membrane, which I'm sure you also saw in the bioinformatics course. And then we somehow tend to arrange the helices around a pore or something that you wanna transport things through. But similarly, we actually have very simple, plain four-helical bundles, just collections of four helices, in nature. They're not entirely parallel, so they do make a bit of an angle there. You can take those four-helical bundles in turn and start to subdivide them. Eventually, this is how you classify proteins in nature. So all these are examples of four-helical bundles: cytochrome C, TMV coat protein, and hemerythrin. In all these cases, adjacent helices are anti-parallel. So you go up, down, up, down. That's always the case for these four-helical bundles. I'll just show you briefly. So which one of these do you think is most stable, or is there any of these where you find the structure a bit surprising, that it looks strange or something? Yeah, that one looks like it should be really floppy or not particularly good, right? There is a reason for everything in nature. This almost looks like a wedge. It's thin here and wide over there. The first one, on the left, is cytochrome.
The cytochrome fold is a gigantic family of proteins. And it's usually related to electron transport or binding metals. This is a very fun project that we worked on some 15 years ago at Stanford. We were actually supported by DARPA, which is the US defense research agency. And we were trying to do fold recognition for an entire organism at the time. So basically trying to determine every single fold in Shewanella oneidensis MR-1. Why on earth would the US defense be interested in something like that? Well, let's see. This one can bind to metals. And it's likely a bacterium that can chew radioactivity. And this bacterium has more cytochrome folds than any other known organism. We still don't know, because then the Iraq war happened and they needed the money for cruise missiles instead. I guess I've been able to waste some research funding that was spent on us instead of cruise missiles. And that gives me a bit of a good conscience. So we never got further in that project. That's the danger with DARPA. If they feel that they need the money for something else, they will just take it back instantly. But the point is that there are lots of really complicated processes in nature that we have no idea how they work. This is still one of them. We have no idea how a bacterium interacts with radioactivity or heavy metals. And it's kind of important, because heavy metals are a common source of pollution, right? Even in drinking water. So what if you could get bacteria to somehow absorb the heavy metals? Because once the heavy metal is in bacteria, it would be much easier to filter it out of the water. Unsolved problem. Tobacco mosaic virus. Tobacco plant leaves. All the black stuff here is a virus that has attacked the plant. Absolutely horrible for tobacco growers. And already in the 1940s and 50s, people were able to identify this virus in an electron microscope. Do you see these long rods? It's all virus. And you can keep magnifying it.
And at the end of the day, this is what the virus looks like. It was Rosalind Franklin who determined that. These small proteins, you see that each is a small wedge. So each and every one of these small virus proteins is a four-helical bundle. This is frequently how nature does it. So viruses in general, they're coated by something. What is the red stuff in here? Exactly. So RNA. A virus is dead, right? A virus isn't alive, but still it's a dead particle that contains genetic material. I love viruses, but they're kind of exactly halfway between life and death. So there is absolutely no cellular process or anything alive in it. A virus literally just consists of the genetic material and then some sort of coat protein to protect that. But then, of course, depending on what proteins you have and everything, these viruses get really good at infecting other cells. And then when it infects another cell, it takes that RNA and inserts it into the other cell. What do you think that RNA codes for? Tobacco mosaic virus, right? So it codes for this protein. So then you have more RNA and more protein, and then you have more viruses. It's an extremely simple life form that way. Absolutely beautiful, remarkably efficient. But the reason is, of course, if you're going to pack this in a circle around the RNA, you'd better have something wedge-shaped, right? So that's why the protein looks the way it does. Had this been a normal four-helical bundle, it wouldn't work. Hemoglobin we've talked a lot about before. I'm not going to go back to it. But hemoglobin is a very amazing molecule because it's such an insanely good binder of oxygen. And I will come back to why later on. Different organisms also have hemoglobin with different oxygen-binding properties. There's a related myoglobin too, which matters when you're a whale. If you're a llama living up in the Andes, that hemoglobin binds oxygen stronger. Why? There's less in the air.
So I have no idea how nature does it. Fetal hemoglobin binds oxygen better than normal human hemoglobin. Why? Well, that's certainly true. But a fetus doesn't breathe, so how does the fetus get its oxygen from the mother? It had better be able to steal oxygen from the mother's hemoglobin, right? Otherwise, the oxygen would go the other way, which would not be good. But that, of course, would not work once this fetus grows up, if this is a female, right? By the time that person is pregnant herself, you have to have changed this. Because when you've grown up, you need to have weaker hemoglobin. Or you will never be able to be pregnant and support a fetus yourself. So this is something that has to change. And you have both fetal and normal hemoglobin in your genes. But the gene expression changes during your life cycle. And exactly why it binds oxygen stronger or weaker, I'll come back to later. So this is a single unit, six helices that essentially form a tetrahedron. So it's like a shell around it. And then you have this small protoporphyrin heme group in the middle there, and then the yellow iron. And that's actually an ion, really. Myoglobin consists of just a single subunit. Hemoglobin consists of four subunits. Why on earth is that? I think I'm going to come back to that in like two or three or four lectures from now. But this, of course, has to do with how you deliver oxygen. Where do you have myoglobin and where do you have hemoglobin? Yes, and hemoglobin is where? Yes, in your blood, right? And the hemoglobin somehow has to be able to deliver the oxygen to the muscles. Hemoglobin is great at binding oxygen, but at some point it has to release it. Otherwise you would just have lots of oxygen stuck in your blood. So hemoglobin needs to be able to get the oxygen to bind in the lungs. But once you're out in the muscles, you need to release it.
And this can't be any complicated biological process. This has to work all by itself. And this is how nature has evolved the molecules. So they have slightly different properties. Exactly why, we'll talk about later. Is it because of this cooperative system, like binding one somewhat enables binding of the second? Exactly, and it's a remarkably cool process. And it's just a freak of nature that hemoglobin and myoglobin turned out to be the first two proteins that we ever got structures for, and that's a coincidence. Remarkable coincidence. But again, it's a very typical way that you create a protein. So where do we create these proteins from? Why do they have the amino acids they have? Yes, and this sequence comes from DNA. And by now you've taken all these courses. You should know what an intron and an exon is, right? Do all organisms have introns and exons? Why not? Yes, you're not as cool as bacteria. Sorry. It's very easy as a human to think that you're the height of evolution and everything. First, you're rather embarrassing compared to a virus, right? That virus doesn't need any coffee to work. A bacterium would say, why on earth are you wasting all this space and everything? Do you seriously need to keep changing the proteins you express during your lifetime? Can't you just get to work and work? I'm actually not joking. Both bacteria and viruses are far more efficient organisms than you, partly because they evolve so much more rapidly. So we are the junkyard of evolution, because we evolve so slowly. They're far more beautiful organisms than humans. We do have one or two advantages ourselves, but in different ways. So eukaryotes have these exons and introns. So they have exons, regions that are expressed, and then introns, which are not just junk.
Today we know that they are responsible for quite a lot of deciding how much protein we are gonna express and when we express it. For instance, this fetal versus adult hemoglobin and everything. So what do you think these blue, green and red regions correspond to? And that's such a good answer to the question, but it is completely wrong. This seems so natural, right? If you have red, blue, yellow, then this would be the first helix, the second helix and the third helix in your protein. But it has nothing whatsoever to do with the structure. And the reason for that is on the previous slide: the introns are cut out long before you get to the ribosome and actually start to assemble the protein. So introns and exons, and this is the only slide where I ever talk about them, are important for deciding when to express things; they have nothing whatsoever to do with structure. And most importantly, the cut is not here. For hemoglobin, it actually turns out that exon two, the middle one, the blue one, can bind a heme group by itself. It doesn't work that efficiently. And in particular, it doesn't bind oxygen. So that's, again, obviously why nature needs all three of them. I already mentioned this helix pair packing a little bit when I spoke about structure earlier, so I'll skip that slide. And you remember this, I said that you can pack the helices, but you can actually turn this the other way too. So it turns out that you get something that's either roughly 50 degrees between the two helices or roughly 20 degrees between them. And if you think that that's just my hand waving or something, just go to the databases; you can look this up and it works beautifully. You get exactly these helical crossings. So almost all helices group into very beautiful cases, and it's true for membrane proteins and everything. So this works great.
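If you want to check the crossing angles yourself, the calculation is simple once you have a direction vector for each helix axis. Here is a minimal sketch under stated assumptions: the axis vectors are given by hand rather than fitted to real coordinates, the function names are made up for this example, and it only returns the unsigned acute angle, whereas a real analysis would use a signed angle and axes fitted to the backbone atoms.

```python
import math

def crossing_angle_deg(axis_a, axis_b):
    """Unsigned acute angle (degrees) between two helix axis vectors."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm_a = math.sqrt(sum(a * a for a in axis_a))
    norm_b = math.sqrt(sum(b * b for b in axis_b))
    cos_angle = abs(dot) / (norm_a * norm_b)  # abs(): ignore helix direction
    cos_angle = min(1.0, cos_angle)           # guard against rounding error
    return math.degrees(math.acos(cos_angle))

def nearest_packing_class(angle_deg, classes=(20.0, 50.0)):
    """Assign the angle to the closer of the two classes from the lecture."""
    return min(classes, key=lambda c: abs(angle_deg - c))

# Hypothetical helix axes: one along z, one tilted 45 degrees towards y.
helix_1 = (0.0, 0.0, 1.0)
helix_2 = (0.0, 1.0, 1.0)

angle = crossing_angle_deg(helix_1, helix_2)
print(f"crossing angle: {angle:.1f} degrees, "
      f"closest class: {nearest_packing_class(angle):.0f} degrees")
```

Running this over helix pairs extracted from database structures is the kind of check meant by "just go to the databases".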
And in particular, this means that if you now have a small helical bundle or something that you want to create, this is essentially your first building block, because this is going to decide how you can pack two helices relative to each other. If you now have two such pairs, how can you pack four helices against each other? So why would it be important to pack four helices? Exactly, right? So two helices, they can certainly interact with each other, but there is no inside here. They just pack against each other. Even three helices would be complicated, because there's still no inside. And sadly, I've run out of pens here. But the second you have four, you're gonna start having some hole in the middle where you can put something. And if we're gonna do protein design, that's a pretty darn good start. So we take these helices, assuming that you would like to bind something like a heme group. What do you think that protein could do? Yes, or if you're not interested in transporting iron, there's something else you could transport once the iron is bound. Oxygen. You could kind of make artificial blood, right? To get that to happen, there are quite a lot of hydrophobic residues, so you're gonna need this whole bundle to be stable. So we would like hydrophilic residues on the outside and hydrophobic on the inside. Remember the thing that I told you, that there are these helical wheel structures, right? So these numbers just tell you how the residues will be placed on a helix, and then we make sure to put lots of leucines on the inside between the four helices and then charged groups on the outside. And once we have those, we just place them out in the sequence. We have one cysteine that's bound to the porphyrin group here, and then you synthesize that (sorry, the porphyrin is of course not in your genome), and they put this in a bacterium and try to express it. Do you think it works? It does. You can create artificial blood.
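The helical wheel placement described above can be sketched like this. It is a toy illustration, not the actual design from the slide: it assumes an ideal alpha helix with 3.6 residues per turn (100 degrees per residue), and the width of the core-facing sector, the function names, and the choice of glutamate/lysine for the outside are all assumptions made for the example. The single cysteine that binds the porphyrin is left out.

```python
def wheel_angle(i, degrees_per_residue=100.0):
    """Position of residue i on the helical wheel, in degrees (0-360)."""
    return (i * degrees_per_residue) % 360.0

def design_helix(length, core_half_width=70.0, inside="L", outside=("E", "K")):
    """Place hydrophobic residues facing the bundle core, charged ones outside.

    The core is assumed to lie at 0 degrees on the wheel; residues within
    core_half_width degrees of that direction get a leucine, all others
    alternate between the charged residues in `outside`.
    """
    sequence = []
    n_outside = 0
    for i in range(length):
        angle = wheel_angle(i)
        distance_from_core = min(angle, 360.0 - angle)
        if distance_from_core <= core_half_width:
            sequence.append(inside)
        else:
            sequence.append(outside[n_outside % len(outside)])
            n_outside += 1
    return "".join(sequence)

helix = design_helix(18)
print(helix)
for i, residue in enumerate(helix):
    print(f"residue {i:2d}: {residue} at {wheel_angle(i):5.1f} degrees")
```

After 18 residues the wheel is back at its starting angle, which is why 18 is a convenient length for looking at the pattern. Four such helices, with their leucine faces turned towards each other, would give the hydrophobic inside and charged outside described in the lecture.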
Why would that be useful? It's not just that it's a commodity, right? Blood is live tissue. There are cells in it, which of course means that it can carry infections, and we're pretty good at detecting, say, HIV a couple of months after you've been infected, but if you were infected yesterday, there is no test in the world that can detect that. So if you donate blood, what we typically try to do is save that blood a couple of weeks before we give it to somebody, to at least see if you had something, but it's not gonna catch HIV, for instance. And that's why we have all these rules that might seem horrible, that if you belong to a risk group, we prefer that you do not donate blood. It's not based on discrimination; we can't detect these things until it's too late, and therefore we prefer not to take risks. There are, of course, lots of parts of the world where it's not safe, or you don't have enough blood donors, or you have a major disaster. So the whole concept of being able to create blood in any amount you want, that would be pretty neat, because with this blood you won't have any worries about specific antigens or that it could be rejected; you could tailor-make blood. The problem is that it's not quite as efficient as normal hemoglobin. So in the rich part of the Western world, where we have fairly good ways of controlling blood, we still prefer to work with human blood. It's still better. But still, that's where we are. That doesn't mean that you can't create something that's even better in the future. And in principle, we could tailor-make hemoglobin or something, right? It's not quite that simple, you will need the entire red blood cell too, so we're not there yet, but give it another 10 or 20 years and I think you're gonna see way more completely artificial proteins that are binding in that way. Now I'm not saying that it can't be recognized, but you remember it's artificial, right? So we can design it.
Of course it will be recognized by the immune system. When it comes to blood, for instance, if you donate blood to me or vice versa, there are lots of factors in this blood. And in particular, there's the main blood group, right? O, A and B. And A and B are antigens here, and they're common antigens, so if you have blood group A, that means that you have antigen A. Let's see, yes. So I have blood group O, which means that my blood is nice in a way. Anybody can get my blood, because I don't have any of these antigens. I actually happen to have O and then Rhesus negative too, and that's even better. That's even better from the point of view of other people, because my blood can be donated to anybody. I don't have any antigens. The only problem is that I can't receive any blood except O Rh-negative, because I would not like AB-positive blood or something. So when it comes to blood, you certainly recognize it, but the immune system is not binary, right? There are some of these antigens that are so common that we know that you would react to them, and it's not the hemoglobin you would react to. But no, in general, you're right. What do you think an antigen is? Yes, it's a protein. That's of course how you make vaccines. You take the virus and you pretty much grind it down in a food processor so that it is no longer alive. And then you inject this in a horse or something. You get antibodies, and these antibodies you can just take as gamma globulin and inject, or you can try to grow the vaccine in eggs or something. Oh, sorry, you grow the virus in eggs so that you get enough of it. That is all about getting your immune system to recognize parts of the proteins. This can be used for something else. Small proteins such as alpha helices, if they're hydrophobic on one side and hydrophilic on the other, can actually work as emulsifiers, protein emulsifiers. So why would that be important?
There are plenty of emulsifiers in the industry. Why would it be neat to have a protein emulsifier? Yes, and in particular you can eat it, right? Being digestible is important, but it's also that you don't get sick from it or anything, and it might even taste good. So that's frequently what we use in low-calorie products. Like these modern margarines or something with 20% fat, how can you have 20% fat? That would be 20% oil in water. Based on what you know about thermodynamics from this course, it's impossible to dissolve 20% oil in water. But that's because you have some sort of emulsifier that sits between the water and the oil. So first, not all emulsifiers are proteins. There are many of them that are small lecithins and fatty acids and things. But in general, using proteins as emulsifiers is great because you can tailor-make them any way you want. And then they have one horrible drawback. A horrible drawback. You're not gonna die. What happens if you try to fry in low-fat products? Yes, the protein denatures, right? And that's why it looks beautiful, it spreads great, and then you put it in the pan and before you know it, you have a mixture of water and fat because you broke the protein. And that's why, as you can imagine, there's quite a lot of development in the food industry here. Can you make them more stable so that you can fry with them, for instance, or at least boil? Do you know what this is? And I was a bit nasty. I'm gonna have some more images, but I didn't print them out because then you could guess. What type of structure is it? Beta sheets. Yes, lots of beta sheets. So this is part of an antibody. Actually, probably even the whole antibody. Antibodies are mostly beta sheets, and these are then some regions binding antigens or something. Do you know what else this is? Well, you can't see it here, but it actually turns out that there are lots of disulfide bridges.
So this is a very complicated multi-domain protein where you use disulfide bridges to stabilize it, and it's not that beautifully drawn here, but it actually corresponds to having this entire domain with this Y-shaped part. And here you have these antigen-recognizing regions, and this is the backbone of the antibody. Do you know what else this is? There's a reason why I'm showing you this. It's not just for protein structure. No, well, that's a good point, but virtually all vaccines are related to antibodies. Typical vaccines work by injecting a very small part of the destroyed protein or something in your body so that your body develops its own antibodies. So you could say that this could be an antibody derived from blood that corresponded to a vaccine. Gamma globulin or something would rather be that type. No, that was not what I was thinking about. It is actually a monoclonal antibody. Most antibodies are monoclonal, right, if you look at an individual molecule. But when you express a monoclonal antibody, that means that what you express is completely pure. Every single antibody in a small flask or something is exactly identical, and they will only bind one specific antigen. It's a bit expensive to develop and everything. And this is actually a drug, Humira, adalimumab, subcutaneous use only, so it's a small pen with a needle that you inject. This is a medication used for rheumatoid arthritis and psoriatic arthritis and a bunch of things, Crohn's disease, ulcerative colitis. So this inhibits part of your immune system; these drugs are called TNF inhibitors, TNF being tumor necrosis factor. How long does that one live? There's a reason I asked you before. So when you inject this, it stays around in your body for maybe two weeks or so, and then you need more. So, yes, how much money do you think people are making from this? 12.5 billion dollars last year. This is the world's best-selling biological.
Do you know what a biological is? Do you know what a classical drug is? Yes, aspirin, right? One or two small hydrophobic rings or something. It's hydrophobic because hydrophobic things tend to bind to small binding pockets. Most signal peptides or signal substances of the body are usually small and hydrophobic. Do you know what the molecular weight of aspirin is? Ballpark. So you frequently measure these molecular weights in atomic mass units, right, in daltons. So an aspirin is roughly 180 daltons or so. Do you know what the molecular weight of the thing that I'm now hiding behind that one is? 150,000 daltons. This is a completely new type of drug. 20 years ago, nobody developed drugs like this because it was too complicated. For Humira in particular, I'm not sure whether this is computational design, but there are a bunch of companies in the world now that are trying to design antibodies to specifically target one cancer cell when you know what the antigen looks like. There is no natural antibody for that. But the question is, can you then tailor-make these surfaces, even with computer simulations, so that you get this antibody to bind one specific thing stronger than another? And there are drugs like that in the pipeline now. You remember, pharmaceutical development is slow. So from the point where things work in the lab until they are on the market, it's at least 10 years. But this is a revolution happening. As of two years ago, there are more biologicals being patented than small-molecule drugs. So I wouldn't say all, but a vast majority of new drug development is focused on biologicals. And biologicals literally means proteins. We're making protein drugs, or small peptide drugs. So everything in sequencing and everything is gonna lead to a revolution in this field. 10 years from now, I think it's gonna be very rare that we develop new small-molecule drugs.
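To get a feeling for the size difference, you can work out the aspirin number from its chemical formula and compare it with the antibody. This is just a small sketch: the formula for aspirin (acetylsalicylic acid, C9H8O4) and the rounded atomic masses are standard values, the 150,000 daltons is the approximate figure quoted above, and the function name is made up for the example.

```python
# Standard atomic masses in daltons, rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

def molecular_weight(composition):
    """Sum atomic masses for a composition given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count
               for element, count in composition.items())

aspirin = {"C": 9, "H": 8, "O": 4}   # acetylsalicylic acid, C9H8O4
aspirin_da = molecular_weight(aspirin)

antibody_da = 150_000                # approximate figure from the lecture
ratio = antibody_da / aspirin_da

print(f"aspirin:  {aspirin_da:.1f} Da")
print(f"antibody: {antibody_da} Da, roughly {ratio:.0f} times heavier")
```

So the antibody is not a bit larger than a classical drug; it is several hundred times heavier, which is a large part of why it cannot simply be handled like a small molecule.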
But my point here is that, forget about the car industry. It's an insane amount of money. These are no narrow research topics. There is one really big problem here though. Can you come up with it? Can you imagine what that problem is, if you're a pharmaceutical company? Does this have to be injected? Yes, and that's really bad. Because now you need to get a patient to visit a doctor every two weeks. Well, it might be possible to inject it yourself. So let's see, how many of you would spontaneously like to take injections every two weeks? Not to mention that it's expensive. This will possibly work in the Western world and everything. It's not a pill. Pharmaceutical companies love pills. First, because eventually there's a patent, say Prilosec, or omeprazole, which is called Losec in Sweden. Astra had a blockbuster drug for several years with just that pill, right? Now you can get it without a prescription at the pharmacy. It's great. They don't make as much money when the patent protection has expired, but they're selling way more of it. Do you think that people are gonna go and buy this one in the local supermarket? So this is a problem. And why do you have to inject it? Yes. And this is another one of these things I touched upon in the early part. If you could start making this type of biological drugs, but put them in a pill, you would make insane amounts of money. Yes, so there are a couple of different strategies. One of them is to somehow try to put this in a delivery mechanism, which could be a vesicle or something, with lipids or something around it, or some other way to protect it. Another possibility, which we are actually working on together with a company, purely on the research side, I don't have any financial interest in it, is to apply patches on your skin. Because that is by far one of the nicest ways to deliver drugs. You get a small continuous dose administered into your blood all the time.
The problem with that is that it only works for small and relatively hydrophobic molecules today. And why? Well, we have an idea, right? That's because they have to pass through membranes. But the question is, can you design components of drugs to make them pass through membranes more efficiently? Because you're not gonna get antibodies delivered that way, right? But it's a very cheap, efficient and simple way to deliver drugs. Lots of work remains here. So you can kind of see there are two parts here. One of them is the actual biological function of the drug. The other is drug delivery. And in many cases, there's actually more money in drug delivery. Because if you find a good drug delivery mechanism, you can use it for hundreds of different compounds. So what would you do if you were a pharmaceutical company? Let's assume that you have developed this awesome new drug that is super good. Then you would seek out another company, specializing in drug delivery, and say, hey, we have this particular antibody. Could you help us deliver this in an efficient manner? And in this case, they failed, so they have to inject it.
And of course, if you have said alpha and beta, we're not gonna say gamma, but mixed alpha and beta. It's virtually impossible to mix a single beta strand directly with anything else, because that strand needs to form hydrogen bonds with other strands. And it turns out that when you start mixing things, you have either alternating helix and sheet or completely separate helix and sheet parts. And you should remember those, if you did bioinformatics: alpha/beta or alpha+beta. So alpha/beta, these are the type of structures that go strand, helix, strand, helix, strand, helix. And the reason why they are like that is that these strands really form one continuous sheet, right?
They're virtually always parallel, because you go up with the strand, down with the helix, up with the strand, down with the helix, up with the strand, down with the helix, et cetera. This is a very classical structure called the TIM barrel, which can bind things in there. Alcohol dehydrogenase is another fun molecule. Do you know what this one does? Yes, it breaks down ethanol in your body. Large parts of the Asian population are deficient in it. And that's why, for instance, in the Japanese population in particular, you get intoxicated very easily. You can't stand as much alcohol. Apart from in the Nordic countries, I guess, it's not an important function of your body to be able to drink huge amounts of alcohol. It's kind of a freak of nature that we have it and they don't. But the point is that this is not cultural, it's genetic. There is another cool fold called the Rossmann fold, and that's also mixed: beta, alpha, beta, alpha, beta, alpha. So you have one beta sheet in the middle and then alpha helices on both sides. And that actually creates more binding pockets on both sides of this sheet. Beta sheets are awesome for binding pockets. And it was Michael Rossmann who found it in the early 1970s. Actually, I have a slide on that. It's similar here, right? So you have one binding pocket here and one binding pocket in the middle. And it's very common to have these binding pockets right next to the sheets. And these places are usually awesome active sites or binding sites. Why are they awesome active sites? Oh, that's not easy to guess, but remember all those loops right next to a beta sheet, right? Those loops give you some flexibility to create the pocket. It's very hard to do that with an alpha helix. You can't distort the backbone in an alpha helix, it's rigid. You can't really distort the backbone inside a beta sheet either. That's also relatively rigid.
But up here we have some freedom to create, well, nature has some freedom to swap out residues to create a good binding site, and it can be either polar or hydrophobic. That doesn't mean that you always see them there. But if you saw a structure like this and had to guess where the binding site was, guess at the edge of the beta sheets. That's usually where you see them. And that's more important than you think, because when you first see a structure, you're gonna see a structure like that. You might be surprised, but there are no blue arrows in the Protein Data Bank telling you where things bind. People determine a structure and then you need to start guessing. Where might things bind? Because you might know that it's an important receptor. How do you test that? You can simulate it, but I meant really test it. You can actually test it for real, and you can even simulate it first. But assuming that you run a simulation and you find a residue here that is your candidate, what do you do? Yes. So the point is, if this is an alanine, mutate it to, say, a tryptophan or something, change it completely, right? And so if you change the residues around that site and absolutely nothing happens, your prediction was probably incorrect. But if you start mutating the residues and that completely changes the binding, you were likely right. So we've said alpha/beta, and we can say alpha+beta too. The only difference is that here you have one part of the protein that's beta strands and another part of the protein that is alpha helical, typically. So the point is that it's never alpha, beta, alpha, beta, alpha, beta. You have at least two or so of each together. But by far the most common is like you have here, that you have strand, strand, strand, strand, and then two helices. And in this case, they usually fold independently of each other. Remember what I said about the phase transitions? Nature prefers this because the sheets stay together.
It's actually quite okay to have the other ones too, but even there you never mix one strand directly with a helix. The sheets wanna stay together at any cost. These are very common when it comes to DNA-binding proteins, the so-called TATA-binding proteins. So TATA is this motif in the DNA, right? Zinc fingers and everything. So these proteins, they love to bind in these grooves in the DNA. I'm not really gonna talk too much about that. But these are important because they can initiate transcription. What could you use that for? Well, assume that you have a gene that for some reason is expressed too much, right? Or a gene that's bad or something. If you can find something that binds to the right sequence here, you should be able to silence that gene. Because if you bind something else there, you can't initiate transcription at that point. And that's also what people frequently try to do in cancer: if there is a known mutation that's bad, can you silence that mutation by binding something directly to the DNA? It doesn't really work too well, to tell the truth. So these haven't been exceptionally efficient as cancer drugs. But what all this comes back to is that if you wanna start interacting with the cell, you'd better find the process that goes on and find a protein that can bind to it. Every single interaction in the cell has to do with protein binding at some point. In many cases, we don't know what the structure is, but that doesn't change the fact that it is protein binding. Possibly so, but cancer is a super complicated disease. And I think there's a mistake that we've all made, although I haven't worked on cancer myself, so I don't really know the details of it.
But we should be aware of this: there are so many different factors in cancer, things protecting DNA, things splitting DNA, other parts of the cell machinery that go wrong, so this whole hope of being able to find one or two miracle drugs that cure all cancer is not gonna happen. So it's really a collective process that involves the entire cell's natural life cycle, right? How a cell divides. And the problem with cells is that, again, nature has designed itself to be robust. If something happens to bind to your DNA, the cell should not immediately die. Nature is kind of like an airplane, right? It always tries to have a second route or something, so that it should be able to survive no matter what happens. So cells are awesome at surviving. And the problem when it comes to cancer is that you're trying to fight something that the entire cell has been designed for: to survive. And that's why it's virtually impossible to find one single process and fight just that one. So last year's Nobel Prize in Chemistry was related to this, right? Tomas Lindahl got the prize for what they actually discovered: that DNA naturally breaks down, and then there are these repair proteins that repair DNA. And that was kind of amazing. We all thought that DNA was stable for billions of years, but even DNA spontaneously breaks down. So how do you think you could use that to form a drug? Yeah, something like that. So with what people ended up doing, it's very easy to go wrong here too, right? Because you think that, oh, should you try to create the repair protein? It actually turns out that you pretty much do the opposite. You knock out the repair proteins. That sounds horrible. And no, you knock them out in all cells. It would be great if you could do it only in the cancer cells, but that doesn't work. So didn't you kill the person now? Exactly, so the point is that cancer cells divide a factor of a thousand more rapidly.
So for every normal cell you kill, you're gonna kill at least a thousand cancer cells. So the point is that because cancer cells divide so much more, they're gonna be much more fragile to anything that hits this procedure. But that's also why it's so easy to go wrong. You need to think from the entire cell's perspective. But there are lots of proteins being designed now to try to target different parts of a cell's natural cycle. And it's the same with cancer in general. When you see cancer as a non-expert from the outside, well, we are still fighting cancer, but it's very easy to think that we haven't made any progress. There are huge areas of cancer where we have made tremendous progress. In particular, for instance, childhood leukemia. Just 30, 40 years ago, the mortality in leukemia was like 95% among children. If a five-year-old got leukemia, they were dead within two years. Today it's pretty much the opposite: 75% to 90% survival because of modern drugs. And once again, it's just 20, 30 years, right? It's an amazing development. Sorry, I had that part on the toxins. We spoke a little bit about toxins before. Toxins are special. On the one hand, they should be super stable. And then they should just bind and somehow destroy the way another protein works. The point is that many of these proteins might look absolutely horrible. The reason why this looks horrible is that it's going to bind something, and that part has been optimized to bind. So that might bring us to another point: I've shown you a bunch of folds, and I don't expect you to know all these folds by heart. But as you move up in this ladder of successively higher hierarchies of organizing things, at some point you don't even want to think about a helix. Instead you can think of, say, a globin-like fold. A globin-like fold is the type of fold you have in hemoglobin, right? You have this roughly tetrahedral shape with five, six helices.
And at some point you start to realize that nature reuses these folds. Below the overall folds, so anything down in a family or something, the blue curve here, they would be evolutionarily related. But a fold is literally just a shape. And just because two things share a shape doesn't mean that they have the same genetic origin, as you would assume in bioinformatics. And there are surprisingly few folds. In 1992, Cyrus Chothia in Cambridge wrote a famous paper called One Thousand Families for the Molecular Biologist. And he predicted there are probably only in the ballpark of a thousand folds in nature. That's not a whole lot. Imagine the number of sequences we have. And we talked a little bit before about whether all sequences fold into proteins. We haven't answered that question yet. But even if they do, these billions of different sequences will likely fold into a relatively small number of structures, which is a bit strange. So Mike Levitt, whom I worked with for several years, actually found a couple of years ago that this isn't quite true, as beautiful as it was in 1992. We are a bit over a thousand folds today, maybe at 1500 or so. But the cool thing is that the number of new folds is shrinking. We're discovering fewer and fewer new folds, which likely means that, while there are still some white spots on the map, we have covered most of it. So we pretty much know the entire fold space. I think that it's getting less and less likely every year that we find something we haven't seen before at all. So fold space is way more limited than you might imagine from sequence space. So the last thing I'm gonna say is this: if fold space is so limited, can we design things to fold? Sorry, that is a bad quality image, but if sequence determines structure, which determines function, the question is at what point things change over from one fold to another. Because it's very obvious that if I change one residue, nothing happens.
In bioinformatics, what do you usually say? At what point can you start to detect that things have the same evolutionary origin? 30% identity or something. So you should be able to change at least 30% of the residues, sorry, two thirds of the residues, and still say it's the same, right? But there are examples. Lynne Regan at Yale worked on this for many years. They can keep 50% of the residues identical in a protein, but still turn that into that. So, again, no rule in biology without an exception: not all residues are created equal. Some residues are more important than others. And in this case, it was of course completely fake in the sense that you design this in the lab. But in theory, this could happen in nature. Nature might, in theory, end up swapping 50% of the residues in a protein. So occasionally, a change in sequence actually results in a change in structure. It's just very rare. And it's one of those things: if you see something you don't understand, don't think that this is the reason, but you should be aware that while things are robust against single changes, at some point you change enough, and once you have changed enough, you will suddenly get a structure change. Would the function change just as drastically? Yes, sure. Unless there's a small binding site or something. In theory, of course, if you're really lucky, you would still have the binding residues around it. And again, one time in a thousand or something, you might have a better property there. And then this one would gradually die out instead. Yeah. Yes, and a good example of that is hemoglobin. If you're gonna bind this iron group, there are only two types of residues that are really good at doing that, histidine and cysteine. If you only have those two, the amount of variation you have is not gonna be large. You can choose cysteine or histidine. Those are the ones you're gonna see. There are a bunch of study questions there too.
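The 30% and 50% figures above refer to sequence identity between two aligned proteins. As a minimal sketch of how that number is computed, assuming the two sequences have already been aligned to equal length with `-` marking gaps (the function name `percent_identity` and the two short sequences are made up for illustration, and other conventions for the denominator exist):

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over aligned, non-gap positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # positions where neither sequence has a gap
    matches = 0   # positions where the residues are identical
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Two hypothetical, already aligned fragments.
seq_a = "MKT-AYIAKQR"
seq_b = "MKTLAYLGKQK"
print(f"{percent_identity(seq_a, seq_b):.1f}% identical")
```

Here 7 of the 10 compared positions match. Note that the number treats every position as equally important, which is exactly why two proteins can share 50% identity and still end up with different folds.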
I'm actually gonna let you leave 20 minutes early today. I'm not gonna go through those, but I will go through them tomorrow. And what I'm gonna do tomorrow is focus almost entirely on alpha helices, but we're not gonna talk about alpha helices on their own. I'm gonna talk about alpha helices as part of membrane proteins. In particular, the globular protein part is really good to know well then, because the good part is that it's actually gonna be a bit less diverse in membrane proteins, but there we're gonna talk much more about function. What do membrane proteins do? Why are they built the way they are? What do the channels and transporters and everything do? And that's gonna be fairly biological, with a bit of neuroscience too. I think that's all I had today. Do you have any questions?