So, it's our great pleasure to have Robert Penner from the IHES. He will talk about how to apply geometric tools to study viruses. More precisely, the title of his talk is Protein Backbone Free Energy to Discover Sites of Interest for Antiviral Targets. Please go ahead, Bob.
Okay. Well, welcome. I want to thank IHES and PIM for giving me this opportunity to talk about my recent work. Let me start the board. There are four sections to the talk. First we'll discuss aspects of chemistry, then math, then physics, and then biology. This may sound daunting, since it's so comprehensive, but please don't be dismayed. I'm going to take just the tiniest little topic from each of these disciplines in order to present this new method from first principles. So there'll be four natural pieces to the talk.
I'm going to do something a little weird and start with the references. This is for later attribution, and so that I don't forget my collaborators. So here are the references. The first three are survey monographs. The book on protein physics by Finkelstein is a masterpiece. I can't recommend it highly enough. The second is background on viruses, and it's a gentle introduction, not very technical. You don't need to know much biology to read it: the Scientific American Library monograph by Levine. The third one, also background on viruses, by Fidio Nostro, is a beautiful book, but it's much more technical and you really need to know some background biology in order to pursue it. But it's a great book, and I recommend all three of those quite highly. However, the two monographs on viruses give rather short shrift to what is actually the main topic of our discussion, viral fusion mechanisms. So the fourth reference is a survey paper.
This is excellent, and you can pretty much pick it up and just start reading on viral fusion mechanisms, which we'll discuss later in the lecture. The next is a paper about proteins that really treats the topology of proteins. That was, in a sense, the necessary first step towards understanding the geometry of proteins, which is in this SO(3) graph connection paper, where you see an army of mostly Danes. You'll notice two Andersens and two Nielsens among the co-authors. This really spanned multiple fields, disciplines and universities. And that is the starting point for our lecture today, this paper on the geometry of proteins. Next is a survey that I wrote a few years ago on applications to RNA and protein. In fact, there's a parallel, or anyway related, theory for RNA to what we'll discuss today for protein. I won't say anything more about that, but there's a survey paper that treats everything that came before. And then the final two are my two recent papers in the Journal of Computational Biology, the first on backbone free energy and viral glycoproteins, and the second specifically on coronaviruses. So here are the references, and I'll cite them as we go along as necessary.
Okay, so as I said, the first topic is chemistry. Let's begin by discussing the chemistry; I guess we need that board. Let me explain what a protein is. A protein is a linear polymer of amino acids. Okay, what's an amino acid and what's a linear polymer? An amino acid is slightly complicated: there are 20 plus 2 plus 1 of them. There are 20 standard, common, gene-encoded amino acids. Then there are two more that are rather rare, selenocysteine and pyrrolysine. And then you have another one that is a variant of one of the first 20, called N-formylmethionine. Let us just pretend there are 20, for simplicity.
And the 20 can be explained as follows. It's biology, so there will always be an outlier: 19 of them have this basic chemical structure, and one of them, proline, has this chemical structure. H is hydrogen, N is nitrogen, O and C are oxygen and carbon. R denotes the residue, one of 19 possible submolecules that determine the identity of this amino acid. C-alpha is the alpha, or first, carbon atom, where this residue submolecule attaches. So that's the indication on the left, and you see there's this slightly different structure for proline. So that's what an amino acid is, and we're going to pretend there are 20 of them.
They combine to form a linear polymer, in effect by condensing off water. Imagine two of them next to each other. This is called the amine and this is the carboxyl. The OH of the carboxyl can combine with an H of the amine. It condenses off the water, and what remains is a bond called the peptide bond between the carbon in one amino acid and the nitrogen in the next one. And you can imagine yet another over here: as these condense, a peptide bond forms between this carbon and this nitrogen, and in this way these guys combine into a long molecule called a peptide or protein. And here is the part of a peptide that is independent of the identities of the residues.
So let's go back up a bit. The proteins that we shall study, the proteins that occur in nature, are these linear chains, these linear polymers of amino acids. The sequence of residues from the amine side to the carboxyl side uniquely determines the protein. And so a protein's identity is given by a word in this 20-letter alphabet (20 plus 2 plus 1, but pretending 20). That's called the primary structure of the protein, and as I said, it uniquely determines the protein. There's a substructure here, with this little song: C, N, C-alpha, C, N, C-alpha. It's called the protein backbone.
It's the alpha carbon, then the C and N involved in the peptide bond, then the next C-alpha, then the C and N involved in the next peptide bond, and so on. That's called the protein backbone, and like I say, it's the song C, N, C-alpha, C, N, C-alpha. There's another subunit that will be of critical importance to us, called the peptide group. I'll indicate it in black here. This subunit is called the peptide group. And the geometrically amazing thing about it is that it's planar: these six atoms lie in a plane, by which I mean that the centers of the atoms, in their Bohr models, lie in a plane. This happens due to quantum chemical effects. You notice this carbon isn't 4-valent, as you would expect; it's 3-valent, so something fishy is going on. What's going on is that there's what is called an sp2 hybridized orbital. So this is a higher quantum state, and there's a figure-eight-shaped electron orbital perpendicular to this plane that locks the whole thing into place. This is really kind of an amazing quantum chemical effect that constrains the geometry in this way. So this is the peptide group, and the amazing thing is that it's planar. That will play a critical role for us. Moreover, the angles in the peptide group are more or less as indicated. They're not quite 120 degrees, but very close, to within a few degrees. So that's a good approximation to the geometric structure of this peptide group.
At this point, it's useful to anthropomorphize. So imagine that I am a C-alpha wearing a hydrogen hat. In our little song, C, N, C-alpha, C, N, the neighbors are at my knuckles, and my arms are sticking out at tetrahedral angles, so more or less 109 degrees.
And sticking out of my back is the residue, which ranges from just a little hydrogen pimple, in the smallest case of glycine, to some great big hunchback thing, for instance for arginine or tryptophan, and so on for the other amino acids. So there's still a little bit of geometry left. Namely, along the C, N, C-alpha, C, N backbone, with me in my hydrogen hat, there are still these two angles, which are called the conformational angles. They're indicated on the board. The incoming one is called phi. The outgoing one is called psi. So for each residue, there is a torus, S1 cross S1, of potential conformational geometry left. Of course, the residues have other moduli, so it's more interesting and complicated. But the backbone and the peptide groups themselves are actually quite rigid. And again, you imagine a bunch of copies of me holding hands, with these angles between them. OK. I will later on want to remind you of the conformational angles, so that's why the anthropomorphism is useful.
OK. So proteins fold into crystals, not crystals in the mathematical or physical sense of the word, but crystals in the sense that there is a lowest energy state and the nearest competitor is rather far away. So the protein folds into some characteristic shape, and the thermal fluctuations that are inevitable are not large enough to bump it into another competing state. This is necessary in order that the protein have the correct shape for biological activity. It's critical for your life, for life, that the protein folds reliably into this crystal. That's the terminology that's used: to fold into a crystal. OK. There are a number of forces that lead to this structure: van der Waals, ionic, hydrophobic, entropic, electrostatic. And I'm going to pick on one in particular because of its critical role for us going forward, so-called hydrogen bonds.
And here, I'm going to use the board. I'm going to hide it there and get it back here. On to hydrogen bonds. A hydrogen bond forms when an electronegative atom like oxygen (oops, we're still in black here) comes in proximity with another electronegative atom, like nitrogen, that is covalently bonded to a hydrogen. Electronegativity is a notion that Pauling introduced. It's how hungry the atom is for an electron. Both oxygen and nitrogen are rather hungry for electrons. They have reasonably high electronegativity; carbon, not so much. Anyway, let me not go further into that. So this nitrogen has this hydrogen covalently bonded. This oxygen wants electrons. And the nitrogen very generously shares the electron cloud of the hydrogen with the oxygen. They come really quite close together, within 2 to 2.9 angstroms in a hydrogen bond. The nitrogen, the generous one, is called the donor of the hydrogen bond, and the oxygen, in this case, is the receptor, or properly the acceptor, of the hydrogen bond. So that's the notion of a hydrogen bond. They will play a critical role for us.
I guess we don't need to go back to the picture of the protein for you to imagine that one of these COs in one peptide group, after you travel along the backbone far away, or not so far away, some distance away, could come back into proximity with an NH of another peptide group, and a hydrogen bond might form. We're going to call those backbone hydrogen bonds. To be clear about what I just described: a hydrogen bond that occurs between two different peptide groups in a given molecule is called a backbone hydrogen bond, BHB for short. These will be critical for us going forward. We're almost done with the chemistry. There are just a couple more things I have to say.
So the primary structure is the word in this 20-letter alphabet. The folded crystal structure is called the tertiary structure; namely, the actual spatial locations of the constituent atoms of the molecule are called the tertiary structure. And let me put on this board the PDB, as I'll call it, the Protein Data Bank. This is the repository of all human knowledge of the crystallized proteins, all of the known tertiary structures. It's an incredible resource. There are 170,000 entries in it by now. I urge you to check it out. It's very accessible.
OK, so the primary structure is the word in the 20-letter alphabet. The tertiary structure is the actual spatial locations of the constituent atoms. What's the secondary structure? The secondary structure is this: there are certain motifs of backbone hydrogen bonds that are extremely common. Let me indicate them quickly; I don't have time to go into any great detail. Here's a picture of the standard secondary structure motifs. I'm drawing the backbone as a straight line. It's not, remember. The Xs indicate the C-alphas. The backbone is the horizontal stuff, and everything else is a hydrogen bond. This just indicates the notions of alpha helix and beta strands, both parallel and anti-parallel, as indicated. And notice that the beta strands can combine via hydrogen bonds into beta sheets. I'll talk about the energetics of hydrogen bonds later, but using these secondary structures is a good way to saturate the hydrogen bonds.
I guess the one other thing that I do need to say, let me put it here in the back row, is that there's something called sequence alignment. This is the key tool in bioinformatics. It's a way of taking two words in an alphabet and, allowing insertions and deletions, aligning them in order to compare.
You could do this with four letters for RNA, or 20 plus 2 plus 1 for proteins. This is a crucial tool, sequence alignment. OK, end of chemistry. Not so bad, right? We can move on to the mathematics. But before we do, let me come here and explain what's going to be the takeaway. What's the main takeaway from our discussion of the chemistry? It is simply that proteins give concatenations of peptide groups.
So, Gaten, this is one of those pauses to refresh, if you have some questions that you'd like to put forward, or we can proceed. Should I carry on? Gaten, are you there? We lost him, maybe?
Yes. No question has been written so far.
OK, very good. I guess that's good. Either good or bad. Let's continue, on to the mathematics. Here we make another board. For the mathematics, let me begin where we ended up last time, with this peptide group. We have, here in white, C-alpha, C, N, C-alpha, with the remnant of the carboxyl, O, and the remnant of the amine, H. So here we have a peptide group. What I claim, and will convince you of, is that a peptide group gives rise to a positively oriented orthonormal three-frame. The displacement from C-alpha to C gives a vector. The displacement from C to O gives another vector in this planar peptide group. We can take their cross product in oriented three-space to give an orientation to the plane of the peptide group. So what I just said is that the chemistry naturally gives rise to an orientation on the plane. Not only that: sitting inside this now oriented plane is the displacement vector of the peptide bond itself. So we have a vector in an oriented plane. Well, that's an oriented orthonormal three-frame, namely three unit vectors such that the third one is the cross product of the first two. OK, so a peptide group gives a positively oriented orthonormal three-frame.
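As a minimal sketch of the construction just described, the following Python computes a frame from the coordinates of four atoms of one peptide group. The function name `peptide_frame`, the argument order, and the choice to project the plane normal off the bond direction (real coordinates are only approximately planar) are assumptions made for illustration, not conventions taken from the papers.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def peptide_frame(ca, c, o, n):
    """Return (u, v, w), an orthonormal frame with w = u x v.

    ca, c, o: C-alpha, carbonyl C and O of one residue; n: N of the next.
    (Hypothetical helper; the argument order is an assumption.)
    """
    # Cross product of the C-alpha->C and C->O displacements orients the plane.
    normal = cross(sub(c, ca), sub(o, c))
    # Unit vector along the peptide bond C->N, which lies in that plane.
    u = unit(sub(n, c))
    # Measured coordinates are not exactly planar, so remove any component
    # of the normal along u before normalizing (an illustrative choice).
    k = dot(normal, u)
    w = unit(tuple(x - k * y for x, y in zip(normal, u)))
    # Completing the frame this way guarantees u x v = w.
    v = cross(w, u)
    return u, v, w

if __name__ == "__main__":
    # Idealized planar toy coordinates, not a real peptide group.
    print(peptide_frame((-1.0, -1.0, 0.0), (0.0, 0.0, 0.0),
                        (-1.0, 1.0, 0.0), (1.0, 0.0, 0.0)))
```

With exact planar input the projection step changes nothing; it only matters for measured coordinates.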
Now, an ordered pair of positively oriented orthonormal three-frames gives a unique rotation of three-space. Well, in a backbone hydrogen bond there's the peptide group of the donor and the peptide group of the acceptor, so the pair is ordered. So a backbone hydrogen bond gives rise to a rotation of three-space, in other words, an element of the Lie group SO(3). So backbone hydrogen bonds give rotations.
Now let me just remind you that SO(3) is RP3. It has S3 as its universal cover. But maybe more to the point, of course, it has its Killing form, which is bi-invariant and symmetric, and the associated Haar measure, so it's a metric space; it's a rich differential geometric object. But more primitively, we know from Euler that a non-trivial rotation of three-space is determined by an axis L of rotation, some line in three-space, and an angle of rotation about it. So let me choose a unit vector u in the direction of this line. Given theta, the rotation amount, and the vector u, I'm going to scale: just take theta times u to get a vector in three-space whose length tells the amount of rotation and whose direction is the direction of the axis of rotation. This is for a non-trivial rotation. To the trivial rotation, I just assign the zero vector. In this way, and I'm just reminding you of something very elementary, SO(3), or RP3, is described nicely as a three-dimensional ball of radius pi, the maximal amount of rotation, with antipodal points on the boundary identified. I say all this because there's something quite natural to do.
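To make the last two steps concrete, here is a small Python sketch, under stated assumptions, that turns an ordered pair of frames into a rotation matrix and then into a point of the ball of radius pi. Expressing the acceptor frame in the coordinates of the donor frame is one natural convention that is unchanged by rigid motions of the whole molecule; whether it matches the convention used in the papers is an assumption, and both function names are hypothetical.

```python
import math

def relative_rotation(donor, acceptor):
    """Rotation matrix R with R[i][j] = donor_i . acceptor_j.

    Each frame is a triple (u, v, w) of orthonormal vectors. The result is
    the acceptor frame written in donor coordinates (assumed convention).
    """
    return [[sum(a * b for a, b in zip(d, e)) for e in acceptor]
            for d in donor]

def rotation_vector(R, eps=1e-6):
    """Return theta * u: the point of the radius-pi ball representing R."""
    trace = R[0][0] + R[1][1] + R[2][2]
    theta = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))
    if theta < eps:
        return (0.0, 0.0, 0.0)          # trivial rotation -> zero vector
    # The antisymmetric part of R equals 2 sin(theta) times the axis.
    ax = (R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1])
    s = math.sqrt(sum(x * x for x in ax))
    if s > eps:
        return tuple(theta * x / s for x in ax)
    # theta is (nearly) pi: R = 2 u u^T - I, so read u off the symmetric part.
    # The sign of u is arbitrary here, matching the antipodal identification.
    i = max(range(3), key=lambda k: R[k][k])
    ui = math.sqrt(max(0.0, (R[i][i] + 1.0) / 2.0))
    u = [0.0, 0.0, 0.0]
    u[i] = ui
    for j in range(3):
        if j != i:
            u[j] = (R[i][j] + R[j][i]) / (4.0 * ui)
    return tuple(theta * x for x in u)

if __name__ == "__main__":
    donor = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    # Acceptor frame: the donor frame turned a quarter turn about the z axis.
    acceptor = ((0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
    print(rotation_vector(relative_rotation(donor, acceptor)))
```

The length of the returned vector is the rotation angle and its direction is the axis, exactly as in the ball model described in the talk.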
Let us go to the PDB, this database of the proteins that we know, sample all of the backbone hydrogen bonds there, modulo a caveat I'll get to in a second, and just take a histogram in this ball that is SO(3) and draw a picture. What does the collection of all rotations coming from BHBs look like? In fact, that's not quite what we're going to do, and the reason is that the PDB is highly biased. There are popular proteins and unpopular proteins. For example, there's a protein associated with influenza called hemagglutinin, and there are over 200 examples of hemagglutinin. So there are some proteins that are popular and others that are not. There are other issues too. Some proteins are easier or harder to crystallize to get the actual data. That part we can't address. But we can address this popularity of proteins. We do that with a subset of the PDB that we call HQ60, which is meant to be an unbiased subset of all the proteins that are there. HQ stands for high quality, because some PDB files are more reliable than others, and we take the reliable ones, the high quality ones. Moreover, remember I mentioned this notion of sequence alignment. The 60 refers to less than or equal to 60% sequence identity in this sense. So we find an unbiased subset of the PDB, using sequence alignment so as not to oversample the popular guys, and we only look at high quality things. There are some other aspects that we impose that I don't want to go into. But this HQ60 is meant to be an unbiased subset of the Protein Data Bank. What we can do is take the collection of backbone hydrogen bonds that occur in this HQ60, which are roughly 1.17 million in number. So it's a fairly large collection of BHBs that we're going to sample, and then draw a picture in three-space, in this ball that is SO(3).
So let me show you that picture. Yeah, here it is. This is a rendering, in slices from the North Pole to the South Pole, of the density. It's a heat-plot density, red, yellow and blue, from North Pole to South Pole. And it's absolutely striking. What's striking about it is two things. One, look at all the white space. We in Denmark certainly expected to see just noise. But in fact there's this huge structure; there's all this white space. Nature is very conservative and uses only part of the geometry available to it. Indeed, only about 33% of the volume of SO(3) is employed by this HQ60. Not only that: within that 33%, it's not uniformly distributed either. You can see this clustering in these red regions, places where nature uses a lot of backbone hydrogen bonds. You might wonder where the alpha helix is, and it's right about here. The hottest spot is the alpha helix, which is not too surprising. And you might wonder also where the beta strand is. I forget which is which, but I think the anti-parallel is here, and the parallel spot is nearby. Here's another red spot. I've got to say, there were some bugs in the code, and I remember the day in Denmark when we finally produced this graphic. I went running down the hall to show it to someone. It was unbelievable, because it jumps off the page. There's structure that one isn't really expecting, and there it is.
Okay, let me give you another rendering of the same thing. That really was the first graphic of this whole deal, circa 2012 or something like that. Here's maybe a more sophisticated version, what I call a graphical abstract.
So here's the ball with a rather less colorful rendering of the density in it, and representative BHBs from various locations within SO(3). There was this further structure of the clustering, and what we did with the Danish group was study that. We managed to reproduce, refine and extend the existing classification of geometries of backbone hydrogen bonds; there was an existing classification for things that were short range along the backbone. Like I said, we reproduced and refined that, but not only that, this is the first classification of things that are long range along the backbone, and that's what was worked out in this paper with the many authors. The clustering will play absolutely no role for us in what comes next. After all, you're presented with a density, and you as a human being are going to figure out some way of clustering. You interfere with nature. But if we just stand back, there's this God-given density on SO(3), and that's the only thing we're going to use here. So clustering will play no role. How are we for time? Let me make a couple of remarks, and let me be brief.
You might wonder whether the white space comes from so-called steric obstructions, that the atoms would bump into each other. It doesn't; we checked that. In fact, there's something called DFT, density functional theory, where you can give an approximate solution to the Schrödinger equation for the 12 atoms involved in two peptide groups. One member of our team did that, and it roughly reproduced the density, not the fine structure of the clusters. But this suggests that quantum chemistry is somehow behind the scenes in the reason for this clustering in the first place. Let me make two more remarks, or three more remarks, I guess.
We varied HQ to low-Q, to lower-quality PDB files, and we varied the 60 to 30, to 90 and so on, to check that the basic properties of this clustering, of the density, were robust against the data used to compute it, and indeed they were. You might wonder about the following; this is interesting too. We have these two planes and there's a rotation that we're studying, but there's also a translation, say from the first C-alpha of one peptide group to the first C-alpha of the other. So there's a translation as well, and you might wonder whether the translations cluster, what's going on with that. And the answer is that the rotation essentially determines the translation. So there isn't extra data there, and that's an interesting statement: the rotation basically determines the translation. This was a latter-day realization of mine; I think my collaborators did not realize it any sooner than I did. Also, the clusters aren't very cluster-like. They're highly anisotropic. There's nothing like a normal distribution within them. They have kind of swirls, and the local structure is something that we'd like to understand better.
Okay, enough of that; back to matters that we shall need. Let's continue with our main takeaway business. Our main takeaway is that a backbone hydrogen bond determines a 3D rotation. So HQ60, this subset of the PDB that is unbiased, determines a density on SO(3). And it is this density we shall now proceed to study. Maybe it is worth mentioning that there's a server in Denmark where you can upload a PDB file, and it will compute the free energies and let you download a file that has the free energies in it. Let me not put the URL up; you can find it in the papers. But this is freely accessible on the web if you want to analyze your own stuff.
Okay, so, physics. I guess we can just start the physics discussion. Or is this a good time? So we pause to refresh, if you have questions. Okay, let's proceed.
There is a powerful method in protein physics called the Pohl-Finkelstein quasi-Boltzmann Ansatz. This was observed by Pohl in 1971 and proven by Finkelstein and collaborators only about 25 years later. What's amazing is the breadth of this statement. Proteins have various local details: for example, these phi and psi angles, which is where Pohl first observed this phenomenon; what we're studying here, these rotations; salt bridges that form; the angles in the residues. There are various local details of a protein. And for any local detail of a protein, and this breadth is what is so amazing, the occurrence of the detail is proportional to the exponential of the negative of the free energy over the usual Boltzmann constant times an effective temperature called the conformational temperature. So there's a Boltzmann-like law that governs the occurrence of any local detail. F is the free energy, like I said, and Tc is this kind of effective temperature. This is not a Boltzmann law in the usual sense of a particle at equilibrium visiting energy states with the probabilities given by the law, where the difference of the log probabilities is the difference of energies. No, no. The protein isn't jumping around between the different conformations. It's not visiting these states. Rather, this is a statement about the statistics of which primary structures, which words in this 20-letter alphabet, stabilize the detail that you're studying. It's really the dynamics of flipping a 20-sided coin that is described by this law.
Bob?
Yes, sir.
There is a question. So I will let Aurora talk.
Hi, Aurora.
Hi, I just had a quick question about the clustering behavior of the BHB-induced rotations.
Sure.
I was wondering if people have done follow-up studies in terms of correlating the clustering behavior with the hydrogen bond strength and cooperative behavior of the hydrogen bond network?
In fact, our paper has gone largely unnoticed. I think there are five or so citations, three of them from one of my co-authors and two of them from me. No, I don't think there's been any follow-up really to speak about, but I agree there should be. I think there's much that's interesting. I don't know of any follow-up studies on that. Sorry, I just don't.
Okay, thank you.
Sure. Okay. So there's this quasi-Boltzmann Ansatz of Pohl and Finkelstein, and its power, and that's what we're going to capitalize upon, is that you can go off and do experiments and, on the basis of the data you observe, estimate the free energy. That's the point: the empirical data give you a tool, via this Pohl-Finkelstein Ansatz, to estimate free energy.
Let me talk a little bit about free energy and proteins. I had promised you some energetics earlier, so let me do this here. Overall, the protein has to have negative free energy in order to retain its configuration. In fact, the limit of positive free energy for protein stability is about 8 to 9 kilocalories per mole. Take a protein and hit it with more than this and it'll just fall apart. So the positive free energy parts of the protein have to be compensated for by other low free energy places in the protein. It's sort of a balancing act. Now, why would the protein permit these high free energy regions? Why would evolution permit these unstable high free energy regions? And the answer is that high free energy can be useful for protein function.
That is, the high free energy parts are tolerated by the rest of the protein, and preserved by evolution, only if they're useful for protein function. And like I said, the limit is 8 to 9 kilocalories per mole. I thought I would give you a little bit more energetics. kT, so the thermal fluctuations, is about 0.6 kilocalories per mole. A hydrogen bond, an H-bond, is about minus 1.5 kilocalories per mole. There's a difference between aqueous and non-aqueous environments; this is in the aqueous environment. That's not legible enough: it's about minus 1.5 kilocalories per mole. And I wanted to give you one more, which will come up later. An internal turn of an alpha helix, I'll just write internal alpha helix turn, has a nominal free energy of minus 2 kilocalories per mole. I guess you can start to see why proteins use a lot of alpha helices: it gives them a lot of negative free energy to compensate for the high free energy they might require elsewhere. Okay, so I promised you some energetics, and there it is.
Now let me explain how we're going to use this fabulous tool, the Pohl-Finkelstein Ansatz. We're going to compute the density from HQ60 that we had these pictures of, and describe it as a function D, for density, a positive real-valued function defined on SO(3). I'm going to treat this like an applied mathematician. There are maybe more elegant ways to do it, but this is how we did it in the Danish group and how I shall do it here. There's this ball of radius pi. It is inscribed in a cube of edge length 2 pi in a natural way. I take that cube and cut it up 81 by 81 by 81, into about half a million little boxes. And now I'm just going to count, in each box, how many BHB rotations from HQ60 there are. And then, at least being a good applied mathematician, I'm going to make sure to scale by the SO(3) volume of the box.
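The counting procedure just described can be sketched in Python as follows. This is only an illustration under several assumptions: the SO(3) volume of a box is approximated by the Haar density at the box center times the Euclidean box volume, the Haar measure is normalized to total mass 1, and the names `box_index`, `haar_weight` and `density` are hypothetical. The normalization used in the papers (for which the alpha helix mode is about 19,000) may well differ.

```python
import math
from collections import Counter

N_BINS = 81                  # boxes per edge, as in the talk
EDGE = 2 * math.pi           # the cube [-pi, pi]^3 contains the ball
BOX = EDGE / N_BINS

def box_index(rv):
    """Index (i, j, k) of the box containing the rotation vector rv."""
    return tuple(min(N_BINS - 1, int((x + math.pi) / BOX)) for x in rv)

def box_center(idx):
    return tuple(-math.pi + (i + 0.5) * BOX for i in idx)

def haar_weight(theta):
    """Haar density relative to Euclidean volume at rotation angle theta.

    In axis-angle coordinates this is (1 - cos theta) / (4 pi^2 theta^2),
    with limit 1 / (8 pi^2) as theta tends to 0 (total mass normalized to 1).
    """
    if theta < 1e-6:
        return 1.0 / (8 * math.pi ** 2)
    return (1 - math.cos(theta)) / (4 * math.pi ** 2 * theta ** 2)

def density(rotation_vectors):
    """Piecewise-constant density: count per box / approximate SO(3) volume."""
    counts = Counter(box_index(rv) for rv in rotation_vectors)
    dens = {}
    for idx, n in counts.items():
        c = box_center(idx)
        # Boundary boxes can have centers just outside the ball; clamp.
        theta = min(math.pi, math.sqrt(sum(x * x for x in c)))
        volume = haar_weight(theta) * BOX ** 3   # center-point approximation
        dens[idx] = n / volume
    return dens

if __name__ == "__main__":
    sample = [(0.0, 0.0, 0.0)] * 3 + [(1.0, 0.0, 0.0)]
    for idx, d in sorted(density(sample).items()):
        print(idx, d)
```

A more careful version would integrate the Haar density over each box and treat boxes that straddle the boundary sphere separately; the center-point approximation is used here only to keep the sketch short.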
So, in other words, I construct a piecewise constant approximation to the density, scaled correctly for the geometry of SO(3), computed with these 1.17 million or so backbone hydrogen bonds. Okay. Having constructed D, I'm then going to finally define the normalized free energy Π of a point P in SO(3). It's the logarithm of the density at the alpha helix divided by the density at P. Remember, I said the alpha helix was the mode of the density; it's the point of highest density in this distribution. So P lives in some box, that gives the density at P, and we take the logarithm. Let me tell you that in fact the density at alpha is so close to 19,000 that it's fair to call it 19,000. It's 18,999 point something; I don't remember the remaining digits. So in other words, this Π of P is the log of 19,000 over D of P.
Let me tell you a little bit of the statistics of this guy. Π equal to 7.5, 8.5, 9.5 and 9.85 are, to quite a good approximation, the 90th, the 95th, the 99th and the 100th percentiles. The log of 19,000 is 9.85. So this is sort of the statistics. I'll show you pictures of the histogram, the density of Π values, in a moment. But let me first make a definition that will be key for us. I'm going to say that a BHB is exotic, that's the term I'm going to use, if Π of P is greater than 7.5, where by the BHB I mean its rotation P. That is, it's in the top 10 percent, beyond the 90th percentile. And now I actually want to extend this notion of exoticness to residues. So suppose we have our backbone, C-alpha, C, N, C-alpha, C, N, C-alpha. In other words, for a given residue there's a nearby CO and a nearby NH.
And if any of the hydrogen bonds associated with the nearby carbon, or any of the hydrogen bonds associated with the nearby nitrogen, are exotic, then I'll call the corresponding residue exotic. So a residue is exotic if any nearby BHB is. This notion of exotic will play a key role going forward. So I promised you a plot or two. And here is a histogram of these Π values across all of HQ60. The Π values are along here, and the occurrences are here. You notice there's a lot going on, which I'll explain in a second: a peak down at low Π values, then this slow hump up around 5, and then the tail, which is where we're in the exotic range, and which is where we'll be keenly interested subsequently. Below that histogram are the alpha helix and beta strand occurrences across the same energy spectrum. I apologize that, both here and in my paper, the labels on the x-axis are off by a factor. In any case, the plots are aligned: the top plot is aligned with the other two. And you see the big spike at low free energy is the alpha helices. There's something I must say here. There are two peptide groups, one on either side of the hydrogen bond, and hence there are four residues that you could think of as flanking. The secondary structure type is an attribute of the residue, so I look at the secondary structure of what I'm calling the four flanking residues. So you see the big spike up here is alpha helices. Remember that the nominal ideal alpha helix is the point of highest density, so Π is 0. That's here. And it's interesting: that is not, by any means, where the maximum is. Notice the different scales on the three plots. And this sort of slow hump is the appearance of beta strands, that's what E stands for, which occur starting around 3.6.
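The construction just described (boxes on the cube, counts scaled by SO(3) volume, the free energy Π, and the exotic cutoff) can be sketched in a few lines. This is only an illustration under stated assumptions, not the code behind the papers: it assumes rotations are given as rotation vectors (axis times angle) in the ball of radius π, it uses the standard Haar weight in those coordinates, and it takes the densest box of the grid as the reference where the talk uses the alpha helix mode. All function names and the `flanking` input format are hypothetical, and the normalization that gives a mode density near 19,000 is not reproduced.

```python
import numpy as np

GRID = 81              # boxes per edge, as in the talk
EDGE = 2 * np.pi       # cube of edge length 2*pi containing the ball of radius pi
EXOTIC_CUTOFF = 7.5    # roughly the 90th percentile quoted in the talk


def box_index(rotvecs):
    """Map rotation vectors (assumed coordinates) to integer box indices."""
    idx = np.floor((np.asarray(rotvecs) + np.pi) / EDGE * GRID).astype(int)
    return np.clip(idx, 0, GRID - 1)


def so3_weight(centers):
    """Haar weight in rotation-vector coordinates, up to a constant:
    2 * (1 - cos t) / t**2 with t = |v|, tending to 1 as t -> 0."""
    t = np.linalg.norm(centers, axis=-1)
    safe = np.where(t > 1e-9, t, 1.0)
    return np.where(t > 1e-9, 2 * (1 - np.cos(safe)) / safe**2, 1.0)


def density_on_grid(rotvecs):
    """Piecewise constant density: count per box divided by its SO(3) volume."""
    counts = np.zeros((GRID,) * 3)
    np.add.at(counts, tuple(box_index(rotvecs).T), 1)
    ticks = (np.arange(GRID) + 0.5) * EDGE / GRID - np.pi   # box centres
    centers = np.stack(np.meshgrid(ticks, ticks, ticks, indexing="ij"), axis=-1)
    return counts / (so3_weight(centers) * (EDGE / GRID) ** 3)


def free_energy(rotvecs, density):
    """Pi(p) = log(D_mode / D(p)); infinite for a point in an empty box."""
    d = density[tuple(box_index(rotvecs).T)]
    with np.errstate(divide="ignore"):
        return np.log(density.max() / d)


def exotic_residues(pi_values, flanking):
    """Residues with at least one nearby BHB above the cutoff.
    flanking[i] lists the residue ids near BHB i (hypothetical format)."""
    exotic = set()
    for pi, residues in zip(pi_values, flanking):
        if pi > EXOTIC_CUTOFF:
            exotic.update(residues)
    return exotic
```

With real data one would build the density once from the whole database and then evaluate `free_energy` on the BHB rotations of the protein of interest.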
Now, I mentioned before that there are other types of secondary structure: so-called 3-10 helices, beta bridges, pi helices and so on. And that's the bottom plot here. Notice again the different scales: it's 15,000 here, and it's 200,000 there. So I couldn't really put these all on the same graph, and that's the reason they're broken up. And what you see is that in this exotic tail, everything happens. So down at low free energy are alpha helices, then there's this something interesting at positive free energy for beta strands, and then when you get out in the tail of free energy, it's anyone's guess; everything happens. Okay, so there's another sensible plot to look at. How are we for time? Okay. Namely, let's look now not at flanking secondary structure but at flanking primary structure. I was going to put this in the supplementary material of the paper; I thought this was going to be just boring. But something absolutely dramatic happens. Here's the exotic tail. So again, on the x-axis are the Π values. And here are the 20 residue types; the amino acids have one-letter codes, and here I used the one-letter codes. The key is at the very top. And this is what happens in the exotic tail. First of all, this is glycine, the one that I mentioned that just has a hydrogen atom as its residue. And you notice the prevalence of glycine out in this exotic tail. Presumably, as the free energy goes up, the backbone is becoming more and more convoluted and twisted, and you need this small residue to accommodate the contortions. But that's not the striking thing about this. The striking thing is this: choose some little window of Π values, and what you see is that there are certain fellow travelers. So for instance here, alanine and valine, the blue and the gray, between 86 and 87 travel together.
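The "fellow travelers" observation amounts to tabulating, for a small window of Π values, which one-letter residue codes flank the BHBs in that window. A minimal sketch follows, assuming each BHB comes with its Π value and a string of the one-letter codes of its flanking residues; the function name and this input format are made up for illustration.

```python
from collections import Counter


def residue_counts_in_window(pi_values, flanking_codes, lo, hi):
    """Count one-letter residue codes over BHBs with lo <= Pi < hi.
    flanking_codes[i] is a string of codes for BHB i (hypothetical format)."""
    counts = Counter()
    for pi, codes in zip(pi_values, flanking_codes):
        if lo <= pi < hi:
            counts.update(codes)
    return counts


# Toy usage: two BHBs fall in the window, one does not.
window = residue_counts_in_window([8.65, 8.62, 3.0], ["AVGA", "AVLG", "LLLL"], 8.6, 8.7)
```

Sliding the window along the exotic tail and comparing the resulting counts is one simple way to look for residue types that rise and fall together.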
So what this strongly suggests is that there are primary structure motifs, small snippets of letters in this 20-letter alphabet, small words, that govern the high free energy. And the exciting thing about this is that it might be recoverable with machine learning. Hence there's the potential of finding the high free energy sites, and you will see in the subsequent sections why we care about that, from the primary structure alone, no PDB file required. And this would vastly extend the method. So this seems to be screaming to be done: to apply machine learning to this database to try and understand these fellow travelers, which you see with your eyeball. Okay. So that's it for the physics. And as you see, it was only sort of physics; quasi-physics, I guess you could say. Let's go to our takeaway board. And the takeaway is that BHB free energies can be estimated from our density. And here's a suitable time to pause for questions, if you want, before we move on to the biology. Okay. So let's move on to the biology. Mathematicians love our definitions most of all, so let me give Modrow's definition of a virus, because now we're going to turn our attention to viruses. So Dr. Modrow says that a virus has four attributes. First of all, it's an obligate intracellular parasite with an infectious extracellular stage. I'll come back and talk about the words in a second. Obligate just means it's obligated: it can't reproduce, it can't live its life, without being an intracellular parasite. The infectious extracellular stage speaks for itself. Second of all, it has to have at least one so-called capsomer. A capsomer is a protein, just like the ones we have been discussing, that forms the so-called capsid, which is a protein shell enclosing the genetic material of the virus.
The genome is fairly fragile. Not only that, but out in the world, extracellularly, the immune system is especially alert to genetic material. So the capsid protects the genome in both these senses. And that's the second axiom. The third axiom is that it replicates by assembly. You'll see what I mean in a moment: the various pieces of the viral particle are constructed, and we'll discuss that, and then they're put together, they're assembled, before the viral particle enters its extracellular infectious stage. And finally, and very tellingly, it's subject to evolution. So like I said, here's what I think is actually quite a beautiful definition of what a virus is. There are estimated to be over a billion different viruses on our planet, which is astounding, frankly. And, okay, we'll come to the life cycle in a minute, but there are two kind of peculiar tasks that they must pull off. One of them is, well, this capsid is pretty small, so the genome has to be pretty small to fit inside it. So they have to very cleverly code enough proteins for their function in a small genome. And so they need to fit in their small... oh, I'm in trouble. I'm not sure what happened. Francois? Yes, Francois, we need you. Sorry, I don't even know what I did. Maybe it's a good time to pause for questions. If you push that, it goes back to the first one. Sure. All right, thank you. Okay, we got it, thank you. So the two peculiar tasks are, first of all, to fit all of the protein information into a small genome. I don't know what I did, but okay.
So that's first of all. And second of all, throughout their life cycle there's Big Brother watching: there's the immune system, both in the cell and outside the cell. So they need to trick, let's say evade, the host immune system, both in the cell and outside the cell. And it is absolutely stunning, the clever, cunning, brilliant solutions to these two problems that viruses pull off. There is a collective intelligence to viruses that is absolutely stunning. And of course there's not a neuron to be found; the kind of neural intelligence that we human beings appreciate is not what's in play. What's in play is evolution. And I'm reminded of this wonderful quote of Leslie Orgel, in a slightly different context, where he says evolution is smarter than you are. And boy, is that evident with the viruses. It's incredible how they solve these and their other problems with brilliance and finesse. It really is a stunning collective intelligence. Okay, enough poetry. Let me show you a diagram of the viral life cycle. In fact, I took something off the internet for hepatitis C, because it was in the public domain; we're going to turn our attention to coronavirus presently, and Hep C and coronavirus are similar enough that this should suffice. So here's a picture of this viral particle. It's an icosahedral capsid for Hep C; not so for coronavirus, which has a different structure. Both of them are what's called enveloped: outside of the capsid is a lipid bilayer, a fat membrane. For coronavirus, the capsid is about 100 nanometers in diameter, and this lipid bilayer is three to four nanometers thick. And sticking out of this lipid bilayer are viral glycoproteins. There are actually two different glycoproteins for Hep C and just one for coronavirus, called the spike. And that's what's depicted here: the viral particle, the capsid, icosahedral for Hep C, this lipid bilayer, and the spikes sticking out.
I learned in practicing for this lecture that if I start going into detail, it'll be two hours from now, because there's so much intricacy and so much to say. So I'm really just going to regard this as a very rough cartoon. And in fact, it's not even quite accurate for Hep C either. Anyway, this viral particle is taken in. Cells are welcoming; they say, come on in. They're going to eat you for dinner, but you're brought in, in this endocytic particle. And then the virus has to escape this particle, and it does so with this spike. First of all, the spike glycoprotein mediates the attachment. And second of all, some viruses come in directly through the membrane, while Hep C and corona come through this endocytic pathway: they're enclosed in a bubble, if you will, a lipid bubble, and they have to bust their way out. They bust their way out through so-called fusion. And this is the part we'll concentrate on, attachment and fusion, in a moment. So here is a picture with no lipid and no spikes. And then this so-called nucleocapsid has to uncoat, in the vernacular; it uncoats to release its RNA. In both these cases, it's positive-sense RNA, so that it can immediately start being translated into protein by the host ribosomes. And notice there are two things that must be produced. First of all, the genome: in order to throw off a bunch of offspring, you need a bunch of copies of the genome, so you have to replicate. The virus also has to produce all the proteins necessary for the capsid and many other functions. So that's the synthesis stage, of replicating and producing all the proteins. Then there's an assembly stage, where it's all put together. And then both Hep C and coronavirus are believed to exit the cell through kind of the reverse of this endocytic pathway, with a so-called multivesicular body that carries the offspring out, and it buds out and then starts over.
So then there's a release phase. So those are the basic stages in the life cycle of a virus. And like I said, I learned that if I go into more detail, it gets so complicated and beautiful that we'll run right over. You know what? We lost one of the figures. Francois, we lost one of the figures, because here's supposed to be... oh, no. Okay. Fine. Sorry, just me. All right. There's just a little bit more I want to say, and then I'll give a sort of survey of this first JCB paper. There are two structures that we're going to be considering subsequently. One of them is called the RBD, the receptor binding domain. This lies on the viral glycoprotein, in the coronavirus case on the spike glycoprotein. And that is the part of the spike that recognizes and grabs onto the host cell. The other structure in the viral glycoprotein that we want to study is called the FP, or fusion peptide. Remember, I mentioned that the viral particle, for some viruses, comes in directly through the membrane of the cell, or, for the two guys we're studying here, has to fuse its lipid bilayer with the boundary of the endosome. So it's somehow puncturing and getting out into the cytoplasm, either directly or through this endosomal pathway. Okay. So there's the RBD and the fusion peptide. So here's what I did in this first paper in the Journal of Computational Biology. I took four well-understood viruses and looked at their viral glycoproteins. Influenza is perhaps the best studied of all enveloped viruses. By the way, I guess I should have said that not all viruses have this lipid bilayer. Those that do are the enveloped ones. There are also non-enveloped, or so-called naked, ones, and their attachment and fusion is not really so well understood, except maybe in a couple of cases.
But anyway, back to the JCB 1 paper. I took influenza's viral glycoprotein; actually, again, there are two, and hemagglutinin is the one that I mentioned before, which carries both the RBD and the fusion peptide. Another virus was paramyxovirus 5, then tick-borne encephalitis virus, and another virus called vesicular stomatitis virus. I took these four because those are the ones featured in the survey paper that I mentioned on viral fusion, and because they represent the classes of fusion: there are three classes of fusion, and these four gave representatives of all three classes. So I took these viral glycoproteins before and after fusion. And this isn't sequence alignment, because it's actually the same molecule; I aligned them atom for atom. And then I analyzed what happened in the case of exotic residues; we had this notion of exotic residues, where the free energy of one of the neighboring BHBs is in the 90th percentile. And in those four cases, with statistical significance, we find the following: exotic implies that at least one of the two conformational angles, either phi or psi, changes by at least 180 degrees. So in other words, exotic implies conformationally active. And here's maybe a reasonable way to think of it. This viral glycoprotein is like a little machine, and with the right stimulus it wildly reconforms; I'll show you an example in a second for coronavirus. It dramatically, wildly reconforms. The trigger for this might be the binding. Or it might be that this endocytic pathway is actually acidifying.
So some viruses use the acidity, the lower pH, to provoke the viral glycoprotein to reconform. But anyway, it reconforms dramatically. The term tectonic changes is often used in the literature. These tectonic changes come about because there are various chemical springs, like disulfide bonds, that want to form, and you hold those in check with these high free energy hydrogen bonds, so that you just have to blow on it. That's all it takes to break the hydrogen bonds. Remember, at best about 1.5 kilocalories per mole will bust them, and the high free energy ones are just itching to break. They're like latches on a gate, a gate that always wants to spring open. These latches break, and suddenly the molecule can reconform as it needs to for its function. And that's what is at play here. The exotic guys control, they target, the conformational change. The converse is not true, which makes perfect sense. If you have a hinge, there'd be conformational changes at the hinge, but there's no free energy stored there. The free energy is on the latch, keeping the hinge from moving. So it makes perfect sense that the converse doesn't hold. So there really is a perfectly good statistical proof of the statement that exotic implies conformational change. And there was one more thing I wanted to say. So I did these four examples in detail. And then I went and looked at the whole PDB, at all the viruses and all the viral glycoproteins I could find. I forget whether it's in the fifties or seventies, but some huge number of viruses.
And I made a table of the high free energy residues as targets for antiviral vaccines and drugs, and we're back to the title. And what I found in this table is that the RBDs and the FPs, fusion peptides and RBDs, often, I'd even say always, are exotic; always is a tough word in biology, but biologically always. Okay, so we're now done with the first part of the biology, the first Journal of Computational Biology paper, and there's a takeaway page, which is here, because I do want to say one more thing. So, like I said, first of all, exotic implies conformationally active. And I'm sorry, I misspoke before. It's not that if a residue is exotic then that residue's phi or psi changes by at least 180 degrees, but rather that one nearby along the backbone does, with an error of about one along the backbone. So exotic implies nearby conformational activity. It's not the residue on the nose, but maybe the next one, which kind of makes sense if you think about it. So that's the first takeaway. The second one I want you to take away is that exotic residues should be, and that's the best I can say, should be good antiviral targets, for two reasons, one that I already mentioned and one that I didn't yet. First of all, because their interruption, for instance by attaching an antibody to them, should, again this word should, block infection: these being the fusion peptide and the RBD, with them incapacitated the virus can't enter the cell. But there's another reason, and in fact this was the very first thing we put on the table: exotic residues are rare.
Remember, that was the whole point: they're not very common in this density on SO(3). Well, if they're rare in the universe of all proteins, then probably they're rare in the host organism. And if they're rare in the host organism, the drug or vaccine that you're developing won't have side effects. This is all just should, wishful, hopeful, modal: should, would, could. So exotic residues are rare, and so targeting them should not have side effects. Okay. So here are two reasons for thinking these sites might be useful. There's going to be another one.
There is a question from Alessandra Carbone. Go ahead.
Okay. Do you hear me?
Yes, I do.
Okay. Hi. So I had two questions. I found the talk interesting, and this notion of exotic residues is indeed interesting. And I wonder whether you checked this special role of exotic residues in viruses where you actually know that there is a conformational change of, for instance, the surface protein, which might play a role in fusion. There are viruses like hepatitis C where you can see this type of conformational change. So it would be nice, if you didn't do it, to try to check the theory on them. But if you did, maybe you can say some words about that.
Oh, indeed I did. I'm sorry if I wasn't clear. In the JCB 1 paper, for influenza, paramyxovirus 5, tick-borne encephalitis and vesicular stomatitis, that's precisely what I did. And in fact, what I didn't say there concerns, for example, influenza hemagglutinin.
It happens quite specifically in the hemagglutinin. There's a narrative discussion in the paper. It's stunning: every single exotic residue is explained by function, and every function is explained by the exotic residues. So I did do exactly that, quite carefully, with these four examples. And so thank you. As for Hep C, it is a little more complicated. Hep C is not so well understood as the others, which is why I looked at the others.
There is a paper in 2018 on hepatitis C, speaking about indeed these changes of conformation.
Yeah. Anyway, the four examples are not controversial.
Not even hepatitis C is controversial, anyway. So I pass to my second question, maybe, which is about the evolution and co-evolution of residues. I would like to know whether you studied the relation between your exotic residues and this type of property.
I did not. And that's a very interesting suggestion. Co-evolution is a very interesting suggestion. I haven't looked at that at all.
Okay. Thank you.
That's a valuable suggestion. Thank you for your questions.
Thank you. Okay. Carry on.
Okay. I'm going to turn now to the JCB 2 paper. Let me start, though, with a graphic, the one I thought I'd lost. Just to give you a sense of the huge reconformation that takes place in viral glycoproteins: here on the left is the spike glycoprotein of corona, CoV-2, SARS-CoV-2, namely the virus that causes COVID. And a little explanation is in order. There is this stalk-like region, and then on the top there are these three heads. And the heads go up and down. This is a depiction where two of the heads are down, and the one on the far left is up.
This is what's called the cartoon rendering of a protein, where the alpha helices are given by these, in this case pink, coils; the beta strands are given by fat yellow ribbon arrows; and the loops, the other parts of the backbone, are given in white. So on the left, as I was saying, you see the spike glycoprotein for SARS-CoV-2, which causes COVID, with one of the heads up. These heads independently go up and down. The interesting thing is where the RBD, the receptor binding domain, is. The receptor is known to be ACE2, and it's only when the head is up that it can bind to ACE2. So there are these three heads. When they're down, it can't bind, and when they're up, the edge of the head can bind to ACE2. Moreover, you should know this is typical behavior for coronaviruses, in particular for SARS-CoV-1, the SARS infection from 2003, and for MERS, Middle East respiratory syndrome, those being the three high-morbidity coronaviruses. The neutralizing antibody also binds on the edge of the head, and presumably locks onto the up position; it keeps it from binding. The fusion peptide, the FP, is down at the very center, I'm not even sure you can see it, at the center of where the three heads are. So when the heads are down, the fusion peptide is shielded from the immune system, as is the RBD. And this is consistent with all the examples I know. It's as if the immune system can smell exotic, and therefore the RBD and the FP, which are typically exotic, have to be hidden. And so when the heads are down for SARS-CoV-2, both the RBD and the FP are hidden, and when they're up, it's vulnerable. So that's the left-hand picture. The left-hand picture was pre-fusion, and the right-hand picture is post-fusion.
It's not SARS-CoV-2. The PDB file we have there is for another coronavirus, murine hepatitis virus, so a mouse coronavirus, but the behavior is expected to be essentially the same. The spike glycoprotein has two domains, an S1 and an S2, and what happens upon binding is this. The S1 is the binding part, and it's cleaved away. And then in the S2, which contains the fusion peptide, if you remember, right in the middle of the three heads, these bent alpha helices straighten out, and the fusion peptide is now up here. It's a weapon ready to stab its way through, in this case, the endosome. But you see that the level of reconformation from the left-hand picture to the right-hand picture is dramatic. And this is what we're going to try and block with antiviral targets. Okay, so I already mentioned the three human coronaviruses that have high morbidity: SARS, MERS, and COVID, those are the viral diseases. There are also five endemic coronaviruses with not particularly interesting names: OC43, 229E, NL63, HKU1, and 4408. Fully 20% of human colds are caused by these guys, so surely everyone in the audience has had one of these. They're not serious, and one recovers. In fact, all of the endemic ones except 4408 are represented in the PDB. And so in this JCB 2 paper, there are about 45, something like that, human coronavirus spike glycoproteins in the PDB, in various configurations of up and down, at different pHs and things like that. So for each of these three plus four, because remember one of them isn't represented in the PDB, so for each of these seven human coronavirus spikes, I chose a representative such that they were comparable: the heads were all down. And I did the following. There's the notion of a bifurcated hydrogen bond. We're short on time.
Let me not describe it in detail. It's just what you think: it's where the acceptor gets to share with two hydrogens. That's a so-called bifurcated hydrogen bond. They occur, but they're kind of rare. So I looked at bifurcated hydrogen bonds where at least one of the two hydrogen bonds had the maximum free energy, 9.85, the 100th percentile. So it was kind of exotic squared; it was as exotic as it could be. I looked at these seven examples for such bifurcated hydrogen bonds. And then, for the chosen representatives, I now finally did sequence alignment: I took the sequence alignment of the primary structures of these glycoproteins, and then I looked nearby for other bifurcated bonds or other anomalies of free energy. Here's the point, and it's actually kind of an interesting new point: sequence alignment does not really correspond to structural or functional alignment of proteins, but the belief is that it nearly does. In other words, though two sequences align as words, the structure represented here in one might be over here in the other, but nearby, not so far away. So I refined the sequence alignment by the free energy and hydrogen bond alignment. This is a general technique, a new tool to refine a sequence alignment into a structural alignment. Anyway, using this, I then looked for things, like I say bifurcated bonds or nearby anomalies, that were universal for all of these seven examples. And in fact, there were five such. Moreover, remember I found these five with just the chosen representatives, so I then checked that they persisted over all 45 PDB files for spike glycoproteins. And they nearly did; it was a compelling argument that they did. And here they are in a table. So remember, there's SARS-CoV-2, SARS-CoV-1 and MERS, the ones with high morbidity.
And here are the four of the five endemic ones on the lower lines, the four that are represented in the PDB. And listed are triples of residues. Why triples? Well, because it's a bifurcated bond, so you have this residue and also the two residues on the other side of the hydrogen bonds. So here in this table are listed these aligned sites. There's also an accessible surface area given in this table, which I think I won't go into; it tells you how exposed the residue is. I put in boldface the residues that were well exposed, and maybe I should even have included site three; it's sort of a judgment call what is relatively more or less exposed. But what's the point here? The point is that maybe, if we're lucky, we can build a vaccine or a drug that targets the strain of coronavirus that we're dealing with today. But these RNA viruses have high mutation rates, and a year and a half from now, when it's deployed, there's no guarantee that the vaccine we try to develop now will be of any utility against the mutated version. So the idea here was to find sites that were universal for all the human coronaviruses, with the belief that they will then be invariant under whatever mutations we're going to be dealing with for SARS-CoV-2. So first of all, let me show you a picture. Oh, sorry, I guess I should say, I think I said this: there are five sites.
Just to tell you, it's half past. You may have 10, 15 minutes, but I just wanted to let you know.
I'll be finished in two more minutes. We're very close to finished. Thank you.
Thanks.
I was saying, there are five of these sites that are universal for all the human coronaviruses.
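The search for universal sites can be illustrated with a toy sketch: given, for each virus, the alignment columns where an anomaly (say, a maximally exotic bifurcated bond) was flagged, keep the columns of a reference virus that have a flagged column nearby in every other virus. This is only a schematic reading of the procedure described in the talk. The function name, the input format, and the tolerance `window` are assumptions, and the actual refinement of the sequence alignment by free energy is not reproduced here.

```python
def universal_sites(sites_by_virus, reference, window=3):
    """Columns flagged in `reference` that have a flagged column within
    `window` positions in every other virus (hypothetical criterion)."""
    others = [name for name in sites_by_virus if name != reference]
    universal = []
    for col in sorted(sites_by_virus[reference]):
        if all(
            any(abs(col - other_col) <= window for other_col in sites_by_virus[name])
            for name in others
        ):
            universal.append(col)
    return universal


# Toy data: alignment columns with flagged anomalies, invented for illustration.
toy = {
    "virus_A": {120, 305, 611},
    "virus_B": {121, 450, 609},
    "virus_C": {118, 300, 612},
}
shared = universal_sites(toy, reference="virus_A")
```

In the toy data, column 305 is dropped because virus_B has nothing flagged within three columns of it, while 120 and 611 survive.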
And for SARS-CoV-2, three of them are especially accessible and visible. And here are those three. This is the region just below the head. There are these three heads that I mentioned, and adjacent to each head is another little lobe. The sites in figures A and B are in the lobe, and the site in figure C is down below. Figure D shows all three of them. So here are sites that, it seems to me, should be of prime utility in drug or vaccine design for SARS-CoV-2. Because not only do they have the two attributes that I mentioned before, namely there should be few side effects, and, because they're high free energy, they should interfere with the function, but now they're also universal and should be robust under mutation. I guess just two more things. This is a little bit misleading, because coronavirus spikes are highly glycosylated; they're covered with sugar. So it looks more accessible, maybe, than it is, because these regions may be shielded by sugars. Not only that: in general, for any virus, there is a layer of water about three molecules thick around it. So things are not quite as accessible as this diagram makes them appear. Let me just say there are many other applications for this tool. First of all, any other virus. In fact, my next goal is to maybe try and look at dengue and its four serotypes, and see if I can find some universal sites as we did here. At any rate, there are many other places in biology where proteins reconform, and knowing in advance where they do could be useful or significant. For example, in signaling pathways, tyrosine kinase receptors are well known to undergo dramatic reconformation. For prions and amyloids, so for the diseases of scrapie and Alzheimer's, this should have applications. And also for motility proteins, which by their very nature reconform.
So for neural crest migration and melanoma metastasis. So let me close finally with two more biological takeaways. One is that there's a third reason, which I already alluded to, that these residues should be good antiviral targets, and that is that they should be robust under mutation. That was the whole point of the second JCB paper, to be robust under the inevitable mutation. And I guess the other thing I want to say is that, more generally, BHBs and free energy, all these tools, should be useful going forward, a useful tool in general in structural biology. So I guess just the final sentence is, never mind any of this. All this is phenomenological, and the next thing to do is to go into a wet lab and take these sites for SARS-CoV-2 and see if they have the desired and required properties. And that's something I'm trying to do, to find wet lab collaborators to participate in that. So thank you.
Thank you very much, Bob.
Thank you.
We do have questions. So we'll first give the mic to Aurora Clark.
Hi. So that was really lovely and I learned a lot. Thank you very much.
Thank you.
I do have a question, and this is just a big-picture kind of question, in the sense that there's a lot of uncertainty associated with hydrogen atom positions from the PDB, and this is one of the reasons why people do a lot of atomistic molecular simulations of proteins, right? To try and refine those hydrogen atom positions. And so I'm wondering, have you done sensitivity analyses to try and understand how...
Yeah, go back to the construction of the positively oriented orthonormal 3-frame. It didn't use the hydrogen. It just used the backbone and the adjacent oxygen. I understand that what you're saying is that the location of the hydrogen atoms themselves is quite problematic. They just have to be inferred. We can't see them. But I don't use them.
Oh, okay. I see, yeah, you're right.
I understand exactly now what you mean. That's really interesting. Okay. And so then I have another quick question maybe, and this has to do with the quasi-Boltzmann statistics that you use in the free energy. I'm not familiar with the quasi-Boltzmann representation. I think I understand why you used it. But if you used Boltzmann statistics and took more of a statistical mechanics approach, you would construct a partition function over all the populated states associated with the system. And so what is, from a practical perspective, the change in the population of states as a result of using the... It's different, right?
No, I tried to allude to that. This is not a population of states. It's not Boltzmann statistics. It's not as if the residues are jumping around. They're not. It's not there to achieve... stabilizing a particular protein detail. It is rather an analog. It's an analogy. And the proof, you know, it's a real-life mathematical proof. I think Finkelstein's is a beautiful and substantial argument, but it's different. It's not Boltzmann statistics. It's not something visiting states.
You can't make an equivalence between the two.
No, they're really just different. As I said, to the extent that I understand, it has to do with the statistics of the number of words, the number of primary structures, that stabilize the protein detail you're looking at. It is not as if anything is jumping anywhere. You know, the primary structure of the protein is the primary structure of the protein. That's all you've got. It's not changing. So it's really sort of different. This is explained well in Finkelstein's book, the very first reference that I gave. It's explained quite well.
OK, I will go there. Thank you very much.
Sure, my pleasure.
So now we have a question from Eleni Panagiotou.
Hello. Hi. Thank you very much for the talk. I'm not very familiar with the backbone hydrogen bonding.
So I wonder, if you know, whether these are local or distant in sequence?
Both. Hydrogen bonds occur both locally along the backbone and also long range. For example, the bonds in alpha helices are nearby along the backbone. And there are various turns and so on. However, the hydrogen bonds involved in beta strands can be quite far apart along the backbone, and can even involve different protein chains, different proteins. So both occur. So it's short range and long range. I understand your question.
Yes, yes. And so is there anything noticeably different about them in the exotic ones? Are they in one family or the other?
And the answer there is no. Look, you have these clusters. Let's go back to the clusters. At the mode of the cluster, things are very dense, so it's not so exotic. Unless, of course, the cluster has only very few members. But in a large cluster, like those for the common long-range bonds or alpha helices, for example, near the mode it's not exotic. However, even in that cluster, so in something that you would regard as qualitatively similar, far away from the mode there could be very low density. It's high free energy. It's something very exotic. So there's no particular family. You know, I think the answer to your question comes from the graphic that I put up of the flanking primary structures in the exotic tail. Somehow the answer resides there: there are these motifs of primary structure that themselves confer high free energy. And that has yet to be understood. Like I say, there's a machine learning problem there that seems like an interesting thing to pursue.
Thank you.
So I would have a question.
Sure.
So for the sites, you mentioned at some point that for some well-known viruses one can link the exotic residues to some function of the corresponding proteins. So for the sites that you've isolated in the coronaviruses, what should one do? I mean, I hope it's not a bit naive.
What can one do to try to link each of the sites with a function?
OK, listen, I mean, I should have said this when I had the time. We have no idea of the actual mechanics. Remember that I had this picture of the pre- and post-fusion coronavirus spike? We have no idea how we get from one to the other. The mechanics is absolutely not known. It took decades to understand influenza, doing point mutations and so on. We have no clue how to get from one to the other. But what these free energy techniques suggest is that, however you're getting from here to there, this site matters. And that's why these are suggested as antiviral targets. So just to emphasize, we have no clue how the coronavirus spike reconforms between pre- and post-fusion. These tectonic changes are completely unknown. And indeed, that's the case, I would say, for almost all viruses. It's a handful of them, like influenza, where decades of study have uncovered the mechanics. The four that I mentioned are presumably well understood from all this heavy analysis. Polio is also partly understood; that's a naked virus. But it's a handful of them where we really understand the mechanics a priori. And, like I said, that's kind of the point. Never mind that incredibly complicated and time-consuming question. Whatever the mechanics is, here are some sites that probably are involved. And again, I go back to this model: they should be involved.
OK, so it's some kind of blind testing. We don't know how it works.
Like I said, I mean, I can now throw my hands up and say, I'm a biologist. Now we've got these sites. Let's go do the biological experiments. Never mind where they came from. But again, just to emphasize, for the specific mechanics of viral glycoprotein reconformation, we know a handful of examples. And that's all.
OK, thanks.
Sure. I mean, apparently we know Hep C better than I thought.
Apparently there's a 2018 paper for Hep C that Carboni mentioned, which I should go study, and more is known there than I'm aware of.
So I don't see any more questions in the chat. Should we see if the locals have questions? You guys have any questions? Are you guys tired? Yeah, probably. Nina has a question.
Just a quick question. So you found these rare sort of places in the core of this spike that is still folded up.
Yes.
Don't you expect also to find special things in the hinges of these spikes when they flip out?
I sort of alluded to that. I don't really expect free energy there. Imagine that there's a spring, like a cysteine that can form a disulfide bond, that wants it to be like that. So here's the hinge. And the high free energy isn't in the cysteine. It isn't in the hinge. It's in the pin holding the hinge. Like I said, it's like the latch on the gate, is the way I've come to think of it. So the hinges aren't necessarily there. And indeed, hold on. Remember, the implication goes just one way. Exotic implies conformationally active, not conversely. So there are all these other conformationally active places, like hinges, that you don't expect will dissipate free energy.
Can you find interesting places like the pre- and post-fusion conformation of your proteins?
That's exactly what I did in these four examples. But there are precious few where we even have the pre- and post-fusion structures. Period. I mean, I sort of did what I could in these four little examples. There are probably a couple more I could have done. That's it.
OK.
Sure. OK. Anyway, thank you all for your attention.