Hello, I'm Robert Penner, and I'm here to explain my work from the past several years that identifies promising sites of interest for antiviral therapeutics and vaccines. With the recent apparent successes of mRNA vaccine technology for COVID, it appears that we've entered a new era of vaccine technology in general. And it seems clear that in this era, it is not so much the delivery or toxicity of potential vaccines that dominates development, but rather the identification of appropriate target epitopes so that the delivery of messenger RNA can elicit an immune response. And it is to this purpose that my methods are specialized. For COVID, it seems that the RBD or receptor binding domain is a viable such target, for instance in the BioNTech Pfizer vaccine, while it is a stabilized mutant of the entire spike molecule that is targeted by Moderna. For other viruses, various targets such as the FP or fusion peptide could provide corresponding viable target epitopes. Or perhaps still other peptides critical for viral binding and penetration could be targeted. And this brings us to the importance of discovering RBDs, FPs, and other critical peptides in viral glycoproteins in the case of viruses where such peptides may not be known a priori or where there are no stabilized versions of the entire molecule available. Again, it is to this end that the methods I describe are focused, though I shall concentrate at the end on COVID as a case in point. My method is based on analyzing protein backbone geometry in order to target regions primed for large conformational change. It depends upon a database compiled from the Protein Data Bank or PDB, and the method arises from first mathematical principles. The underlying mathematics is easy to explain, and it is here that I shall begin. A protein is the concatenation of its peptide groups along its backbone as illustrated here, somewhat simplistically, since we do not fully represent amino acid residues.
The backbone is the zig-zagging repeating sequence C-alpha, C, N, C-alpha, C, N, C-alpha, and so on, where C stands for carbon and N for nitrogen, and each consecutive C and N participate in a so-called peptide bond. Only the alpha carbon atoms C-alpha of the constituent amino acid residues lie in the backbone, which runs on for dozens, typically hundreds, or even thousands of such units. A peptide group is the collection of six atoms: C-alpha, C, N, C-alpha in the backbone, together with the oxygen O bonded to C and the hydrogen H bonded to N, as pictured. The important geometric characteristic of a peptide group for us is that these six atoms lie in a plane, as is indicated. The standard conformational angles phi and psi, which are also depicted, describe the rotations between planes of consecutive peptide groups. Furthermore, the angles between bonds in a peptide group are very nearly 120 degrees, as is indicated. The plane of the peptide group contains the red unit displacement vector from C to N as well as the blue unit vector obtained from the red vector by a 90 degree counterclockwise rotation in the plane of the peptide group. Each peptide group thus determines a pair of red-blue vectors. This is the first main mathematical point: a peptide group determines a pair of red-blue vectors. The NH and CO in one peptide group can participate with the CO and NH of another peptide group through hydrogen bonding, with donor N and acceptor O as depicted here, still with our color scheme of red-blue vectors, but now labeled X, Y, as well as their usual cross product labeled Z, and with the planes of the peptide groups indicated in gray. The two peptide groups in the figure are labeled I for donor and J for acceptor, and the intermediary region between these peptide groups along the backbone is not pictured here.
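As a minimal sketch of this first mathematical point, the red-blue pair and its cross product can be computed from three atomic positions. The function name `peptide_frame` and the sample coordinates are hypothetical. The talk does not say which side of the red vector counts as counterclockwise, so the sketch assumes the blue vector is the in-plane perpendicular that points along the direction from the preceding C-alpha to C; that convention is an assumption, not something confirmed by the talk.

```python
import numpy as np

def peptide_frame(ca_prev, c, n):
    """Return unit vectors (x, y, z) for one peptide group.

    x: unit displacement from C to N (the "red" vector).
    y: unit vector in the peptide plane perpendicular to x (the "blue" vector).
    z: cross product of x and y, normal to the peptide plane.

    ASSUMPTION: the sign of y is fixed by requiring a positive component
    along the displacement from the preceding C-alpha to C. The talk only
    says "90 degrees counterclockwise in the plane".
    """
    ca_prev, c, n = (np.asarray(p, dtype=float) for p in (ca_prev, c, n))
    x = (n - c) / np.linalg.norm(n - c)
    ref = c - ca_prev                      # in-plane reference direction
    in_plane = ref - np.dot(ref, x) * x    # remove the component along x
    y = in_plane / np.linalg.norm(in_plane)
    z = np.cross(x, y)
    return x, y, z

# Idealized planar coordinates in angstroms (hypothetical values):
# the C-alpha to C bond makes a 120 degree angle with the C to N bond.
ca = (-0.76, 1.316, 0.0)
c = (0.0, 0.0, 0.0)
n = (1.33, 0.0, 0.0)
x, y, z = peptide_frame(ca, c, n)
print(x, y, z)
```

In practice the three positions would be read from the ATOM records of a PDB file for consecutive residues.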
The hydrogen bond might occur either in the direction of the backbone, with I less than J, or in the other direction, with I greater than J, and may be either short range along the backbone, with the absolute value of I minus J small, or long range along the backbone, with this absolute value large, though the distance between the donor N_{i+1} and acceptor O_j is always but a couple of angstroms. The acceptor may even participate in a so-called bifurcated hydrogen bond with more than one donor, but this is not pictured here. Suppose that the two peptide groups participate in a hydrogen bond, called a backbone hydrogen bond or BHB for short. There is then a pair of red-blue pairs of vectors, one from the donor and one from the acceptor peptide group, as illustrated. For any two such red-blue pairs, there is a unique rotation of space carrying the first one to the second one. As I indicate here with my primitive visual aid, you simply first rotate the first red vector to agree with the second one and then rotate the first blue vector around the line that now contains both red vectors until the blue vectors also agree. So for any BHB with its pair of red-blue pairs, there is a corresponding rotation of space. And this brings us to our second main mathematical point: a backbone hydrogen bond determines a rotation of space carrying the red-blue pair of the donor to the red-blue pair of the acceptor. For our final mathematical point, we must explain how to draw pictures of rotations. And it is easy. A rotation is determined by a line, its axis of rotation, spanned by a unit vector u, together with the amount in radians of rotation about it, call it theta. So theta lies between minus pi and pi. We can assign the vector theta times u, which lies along the axis and has length the absolute value of theta, in order to determine the specified rotation. So u times theta is a vector in space of length at most pi.
Or in other words, as a displacement vector from the origin, it is a point in the three-dimensional ball of radius pi. This is the third and final main mathematical point: the rotation associated to a backbone hydrogen bond can be drawn as a vector in space of length at most pi and hence as a point in the ball of radius pi. And this leads to our fundamental question about the structural biology of proteins. What is the nature of the collection of all the rotations of all the backbone hydrogen bonds in the Protein Data Bank, or more precisely, in a suitably unbiased representative subset of the PDB? And here is the answer to the basic question. It is the fundamental discovery and the heart of what we shall employ here to study viruses. We investigate the collection of all rotations of BHBs coming from a fixed, suitably unbiased representative subset of good experimental quality PDB files called HQ60, where HQ stands for high quality, with at most three angstrom resolution and a suitable bound on B factors, if you know of these PDB details. And the 60 indicates at most 60% homology identity of representatives as determined by the Dunbrack Lab's PISCES software. HQ60 contains PDB files comprising 1,166,165 BHBs, and surprisingly their rotations occupy only about a third of the volume of the ball of radius pi. Depicted here is the distribution in the ball of all rotations from BHBs in HQ60, together with representative BHB rotations indicated for several points within this ball. Furthermore, within this one third of the volume, the BHB rotations in HQ60 cluster into 30 regions. And this gives a new classification for the geometry of BHBs that reproduces and refines what was known for BHBs that are short range along the backbone and provides the first such classification for long range BHBs. But the classification plays no role in our further considerations, just the full distribution on the ball itself. And so we shall say nothing further about the clustering.
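The second and third mathematical points can be sketched numerically as follows: form the rotation relating the donor and acceptor frames, then convert it to the vector theta times u in the ball of radius pi. The function names are hypothetical. The sketch assumes each frame is stored as a 3 by 3 matrix with columns x, y, z, and it assumes the rotation is expressed in the donor's own frame so that the answer does not depend on how the protein sits in space; the talk does not spell out this choice of coordinates.

```python
import numpy as np

def relative_rotation(frame_donor, frame_acceptor):
    """Rotation carrying the donor frame to the acceptor frame,
    written in donor coordinates (ASSUMPTION: this coordinate choice)."""
    return np.asarray(frame_donor).T @ np.asarray(frame_acceptor)

def rotation_vector(R):
    """Return theta * u for rotation matrix R, with theta in [0, pi]
    and u a unit vector along the axis, so the result has length <= pi."""
    R = np.asarray(R, dtype=float)
    cos_t = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_t)
    # The antisymmetric part of R equals 2 sin(theta) times the axis.
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    s = np.linalg.norm(w)
    if s > 1e-8:
        return theta * w / s
    if cos_t > 0:
        return np.zeros(3)          # identity rotation
    # Half turn: R = 2 u u^T - I, so recover u from (R + I) / 2.
    M = (R + np.eye(3)) / 2.0
    k = int(np.argmax(np.diag(M)))
    u = M[:, k] / np.sqrt(M[k, k])
    return theta * u                # u and -u describe the same half turn

# Donor frame is the standard basis; acceptor frame is turned a quarter
# turn about the z axis.
donor = np.eye(3)
acceptor = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
print(rotation_vector(relative_rotation(donor, acceptor)))
```

For a half turn the axis is only defined up to sign, which reflects the fact that antipodal points on the boundary of the ball represent the same rotation.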
If you know about the classical two-dimensional Ramachandran plots that give distributions for conformational angles, pictured here for the four residues nearby an ideal alpha helix, then the ball is the three-dimensional analog for the distribution of rotations of BHBs. This new kind of plot should have the same impact on all aspects of protein theory as the Ramachandran plots. Here is another representation of the distribution in slices of the ball from north to south pole, illustrating the fine detail of the distribution of BHB geometry that we have computed. It is this distribution that will play the crucial role in the sequel. It is colored by density, where the RGB color is linear in the density within an 81 by 81 by 81 grid, and where the peak density is 19,000, corresponding to the reddest spot in the fourth box down and the fourth box over from the top left corner, which turns out to correspond to the BHB of an interior turn of an ideal alpha helix, as one would expect. It is worth emphasizing that these findings and the basic properties of this distribution are robust over all subsets of the PDB probed with various homology identity and quality cutoffs. It is also worth pointing out that a density functional theory solution of the Schrödinger equation for the 12 atoms comprising two peptide groups also reproduces these basic properties of the distribution, but without the fine detail of the specific clusters within the one third of volume achieved. Furthermore, the constraints leading to this distribution are not simply steric, since in excess of 95% of the volume of the ball is achievable by pairs of peptide groups at a distance scale consistent with backbone hydrogen bonds. Thus this empirically discovered distribution is borne out also by theoretical calculation, however without compelling explanation, other than that the DFT results suggest it is somehow a constraint of the quantum chemistry rather than a steric constraint.
I shall next briefly explain how we can employ this distribution of BHB rotations using a basic tool of protein theory called the Pohl-Finkelstein quasi-Boltzmann Ansatz, which is the following law. The occurrence of any protein detail is proportional to the exponential of the negative of the free energy F divided by k Tc, where k is the usual Boltzmann constant and Tc is the so-called conformational temperature, essentially the protein melting temperature. This was first observed by Pohl in the 1970s and finally proved by Finkelstein and collaborators in the 1990s. The upshot for us is that we can use the distribution of rotations of BHBs from HQ60 to predict free energies, where the sparse regions of the distribution correspond to large free energy. Let us emphasize that the input to the method is any PDB file, and the output is an estimate of the free energy of the protein feature associated with each of its BHBs, based upon the a priori distribution derived from the database HQ60. Specifically, we define Pi of p to be the logarithm of D of m over D of p, where D of m is the maximum density 19,000 over the entire HQ60 distribution and D of p is the density at the subject rotation p. Notice that the natural units for Pi are multiples of k Tc, but for simplicity we shall suppress this in the sequel and refer simply to Pi values. Usually it is only differences of free energies that can be computed in this manner, but we shall argue in a moment that these Pi values can be taken as absolute approximations across different peptides in different proteins. The upshot is that the so-called exotic rotations, those above the 90th percentile of Pi values, for which the distribution from HQ60 is relatively sparse, are distinguished by the fact that they correspond to protein details of large free energy. Here on the top is a plot of the histogram of Pi values across HQ60 and its correlation in the plot immediately below with alpha helices and beta strands, and on the bottom plot with the other various secondary structure types.
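The definition just given can be sketched in a few lines: bin the rotation vectors into an 81 by 81 by 81 grid over the cube containing the ball of radius pi, and take the logarithm of the peak density over the local density. The function names and the synthetic sample are hypothetical, and two choices are assumptions rather than the method as published: density is taken to be the raw count per grid cell, and an empty cell is given infinite Pi value.

```python
import math
import numpy as np

GRID = 81  # grid size quoted in the talk

def density_grid(rotation_vectors, bins=GRID):
    """Histogram rotation vectors (an N x 3 array) on a cubic grid
    covering [-pi, pi]^3. Returns the counts and the shared bin edges."""
    edges = np.linspace(-np.pi, np.pi, bins + 1)
    counts, _ = np.histogramdd(np.asarray(rotation_vectors, dtype=float),
                               bins=(edges, edges, edges))
    return counts, edges

def pi_value(p, counts, edges):
    """Pi(p) = log(D(m) / D(p)), with D(m) the peak density.
    ASSUMPTION: an empty grid cell is reported as infinite Pi."""
    idx = np.searchsorted(edges, np.asarray(p, dtype=float), side="right") - 1
    idx = np.clip(idx, 0, counts.shape[0] - 1)
    d = counts[tuple(idx)]
    if d == 0:
        return math.inf
    return math.log(counts.max() / d)

# Synthetic stand-in for HQ60: a peak of 19,000 bonds at one rotation
# and a single bond at another.
sample = np.vstack([np.zeros((19000, 3)), np.ones((1, 3))])
counts, edges = density_grid(sample)
print(pi_value((0.0, 0.0, 0.0), counts, edges))   # peak cell
print(pi_value((1.0, 1.0, 1.0), counts, edges))   # sparsest occupied cell
```

With a peak density of 19,000, a cell containing a single bond has Pi equal to the natural logarithm of 19,000, roughly 9.85, which matches the largest Pi cutoff quoted later in the talk.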
Notice the different scales on the two lower plots, the prevalence of alpha helices for low free energy, and the mixture of secondary structure types for high free energy. As was mentioned, usually only differences of free energies make sense, but here the free energy of the ideal alpha helix, which has Pi value zero as was also mentioned before, has been computed theoretically at minus two kilocalories per mole, and so all Pi values may be simply compared across different peptides and different proteins. We admit there is a small swindle here, since the conformational temperature varies from one protein to another, but not really so much as to disturb the utility of this approximation. I include here parenthetically a most surprising finding from the histogram of amino acid residues flanking a BHB across the tail of exotic free energies. One observes that families of flanking amino acids vary together in lock step in various regions. This strongly suggests that there are specific primary structure motifs that correspond to particular regimes of high free energy, and this suggests the likely utility of machine learning to liberate these methods from requiring the structure in a PDB file, in order to predict peptides of high free energy from the sequence of amino acids alone. This is work for the future. So what has all this to do with viral diseases and COVID in particular? While most features of a protein in general must have low free energy in order to stabilize the structure, there are also energy defects, as reflected by exotic protein details. Such exotic features occur somewhat infrequently and presumably arise for functional reasons, preserved by evolution and compensated by other low free energy regions because they are required for protein function, especially in cases where the function consists of conformational change.
An unstable feature will be more likely to change conformation in a biologically reasonable time, while a stable feature without defects would take too long to reorganize. Receptor binding domains and fusion peptides are just such cases, as their function is connected with conformational change. More generally, exotic peptides provide sites of interest in viral glycoproteins because their obstruction should interrupt function, since their energetic cost should imply some conformationally functional dependence of the protein. Viral glycoproteins, because of their typical tectonic reconformation from pre- to post-fusion, thus provide a natural laboratory to test and then exploit the hypothesis that exotic protein features have functional consequence. More precisely, first of all recall that a protein feature is exotic if the BHB stabilizing it lies above the 90th percentile of Pi values. Furthermore, say that a residue itself is exotic if at least one of its adjacent backbone oxygen or nitrogen atoms participates in an exotic feature. And finally, say that a residue is conformationally active if at least one of its conformational angles changes by at least 180 degrees. Then the precise hypothesis to be tested for viral glycoproteins is that an exotic residue lies within one residue along the backbone of a conformationally active one in the transition from pre- to post-fusion conformations. We do not, however, assert the converse, that conformational activity implies exoticness; it is not valid. We test our hypothesis in four explicit examples where the mechanics of pre- to post-fusion reconformation are relatively well understood. Namely, we consider the fusion glycoproteins of influenza type A, paramyxovirus, tick-borne encephalitis virus and vesicular stomatitis virus. In each case the PDB contains both pre- and post-fusion structures, so we can compare PDB files for their viral glycoproteins in the two conformations.
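The bookkeeping behind this hypothesis can be sketched as follows: given the set of exotic residues and the set of conformationally active residues, measure for each exotic residue the distance along the backbone to the nearest active one and count how many lie within one residue. The function names and the residue numbers are hypothetical, and the sketch takes both sets as given rather than deriving them from PDB files; it also stops short of the P value computation.

```python
from collections import Counter

def distances_to_nearest_active(exotic, active):
    """Map each exotic residue number to its backbone distance from the
    nearest conformationally active residue (active must be non-empty)."""
    active = sorted(active)
    return {r: min(abs(r - a) for a in active) for r in sorted(exotic)}

def hypothesis_counts(exotic, active, max_distance=1):
    """Return (number of exotic residues within max_distance of an active
    residue, total number of exotic residues, histogram of distances)."""
    dist = distances_to_nearest_active(exotic, active)
    within = sum(1 for d in dist.values() if d <= max_distance)
    return within, len(dist), Counter(dist.values())

# Hypothetical residue numbers, for illustration only.
exotic_residues = {10, 20, 35}
active_residues = {11, 20, 50}
within, total, histogram = hypothesis_counts(exotic_residues, active_residues)
print(f"{within} of {total} exotic residues lie within one residue of an active one")
print(dict(histogram))
```

Only the forward direction is counted here, from exotic residues to active ones, in keeping with the remark that the converse is not asserted.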
These are illustrated here in the respective rows, where the color scheme is that blue is non-exotic, yellow is 90th to 95th percentile, orange is 95th to 99th percentile and red is the top percentile, and where the 90th, 95th, 99th and 100th percentile cutoffs correspond to respective Pi values 7.5, 8.5, 9.5 and 9.85. Across each row the one or two figures on the left are pre-fusion and those on the right are post-fusion, in the order full polymer, monomer, monomer, full polymer. First of all, there is a narrative comparison, one residue at a time, of free energy relative to known reconformation mechanisms in these four examples. And it is fair to say that the predictions of conformational activity based upon Pi values compared to known function are remarkable for influenza and quite compelling for the others. But we shall not elaborate further about this narrative, which is given in the paper. For a more analytical hypothesis test, this first table lists the total number of residues common to pre- and post-fusion structures, where comparison can be made, and among these the number of exotic and active residues, as well as the number of residues at distance one and at distance greater than one from active ones. This second table presents a histogram of distances from exotic residues to the nearest conformationally active one, where the data is presented as a fraction, with the denominator the number of data points where the free energy is conserved within one residue along the backbone between pre- and post-fusion conformations, and the numerator the number of dissipated examples where it is not. The first P values test our hypothesis from before, that an exotic residue lies within one residue along the backbone of a conformationally active one, with respect to the natural trinomial distributions and probability tails in the four examples, and they are quite satisfactory. In fact, VSV is special because it can oscillate between pre- and post-fusion conformations.
So if the exotic free energy is conserved between these states, then it is ignored for the hypothesis test in the computation of the second P values. It is also worth mentioning that the pre- and post-fusion PDB files for paramyxovirus are for different strains, giving a plausible explanation for its less compelling P values. Further evidence supporting the hypothesis is given by computing exotic residues for a comprehensive collection of relevant viral glycoprotein PDB files, pre- or post-fusion, as of about a year ago. This omnibus table enumerates exotic BHBs in pairs, donor over acceptor, with the same percentile cutoffs as before. In the cases where FPs, as indicated in boldface, or RBDs are known, the method accurately identifies them except in a few cases. When these peptides are not known, these methods therefore provide novel predictions for them, assuming they have high free energy. This is an extensive table for enveloped viruses. There is a similar table for a number of naked or non-enveloped viruses, where little is known about FPs, and RBDs are understood in just a few cases. This database should provide useful targets for mRNA vaccines across the entire universe of viruses whose structures are known from a PDB file. Let us emphasize: choose a virus from this table. We may take the several most exotic sites in the table for this virus in order to determine sites of interest. Then reverse translate these exotic regions in order to derive their coding messenger RNAs. Package them together in the one-size-fits-all lipid nanoparticles for mRNA delivery. And voilà, you've designed a prospective multivalent vaccine for the specified virus. This illustrates the power of mRNA vaccine technology in general and indeed the potential utility also of these tables and methods. There are more refined embodiments of the methods as well, as we next describe for SARS-CoV-2. I did something a little more involved for COVID, as follows.
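The reverse translation step in this recipe can be sketched with a fixed table from amino acids to codons of the standard genetic code. The function names and the example peptide are hypothetical. Because the genetic code is degenerate, many messenger RNAs encode the same peptide; the single codon chosen per amino acid below is an arbitrary illustrative choice, not a codon optimization and not a construct described in the talk.

```python
# One standard-genetic-code codon per amino acid (arbitrary choice among
# the synonymous codons; ASSUMPTION made for illustration only).
CODON_FOR = {
    "A": "GCU", "R": "CGU", "N": "AAU", "D": "GAU", "C": "UGU",
    "Q": "CAA", "E": "GAA", "G": "GGU", "H": "CAU", "I": "AUU",
    "L": "CUU", "K": "AAA", "M": "AUG", "F": "UUU", "P": "CCU",
    "S": "UCU", "T": "ACU", "W": "UGG", "Y": "UAU", "V": "GUU",
}

# Inverse table, used only to check the round trip.
AMINO_ACID_FOR = {codon: aa for aa, codon in CODON_FOR.items()}

def reverse_translate(peptide):
    """Return one mRNA coding sequence for a peptide given in
    one-letter amino acid codes."""
    return "".join(CODON_FOR[aa] for aa in peptide.upper())

def translate(mrna):
    """Translate an mRNA built by reverse_translate back to a peptide."""
    codons = (mrna[i:i + 3] for i in range(0, len(mrna), 3))
    return "".join(AMINO_ACID_FOR[codon] for codon in codons)

# Hypothetical three-residue peptide, for illustration only.
peptide = "MKW"
mrna = reverse_translate(peptide)
print(mrna)
print(translate(mrna) == peptide)
```

A deployable construct would need more than the coding region, so this covers only the reverse translation step named in the talk.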
There are eight known coronaviruses which infect humans, seven of which are represented in the PDB in multiple pre-fusion conformations in roughly 50 structures. It is worth mentioning that though the post-fusion conformation is known for several of these, including the SARS-CoV-2 virus which causes COVID, the pre- to post-fusion mechanism is not well understood, in contrast to influenza, for example. There are several experimental techniques employed in these PDB files in order to freeze the spike glycoprotein in its pre-fusion conformation, the most prevalent of which appears to be effective for all seven human coronaviruses, namely the so-called 2P mutation, where two residues at the top of the central helix in the S2 subunit are replaced by proline. In order to align these 50 PDB files across the seven represented human coronaviruses, I started with bifurcated BHBs in each case, at least one component BHB of which had Pi value in excess of 9.0, which is roughly the 97.5th percentile cutoff. Then all seven genomes were aligned as usual using Clustal Omega in order to determine, for each exotic bifurcated bond, whether there is a nearby very exotic BHB, bifurcated or not, though often so, in each of the other six viruses, where nearby means within seven residues of the bifurcated acceptor and very exotic means Pi greater than 9.5, or in the 99th percentile. The idea is that homology alignment is not exactly functional alignment, but nearly so, in the sense that any functional alignment should at least be nearby along the backbone. So the scheme here is to find functional sites that are conserved across all seven human coronaviruses in the sense just made precise. The idea of taking only these conserved sites is that one desires epitopes that will be preserved across strains, so that the vaccine developed at this moment for SARS-CoV-2 will still be effective for later deployment across different mutant strains.
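The conservation criterion just described can be sketched as a simple search over positions in a shared alignment. The function names, virus labels and positions are hypothetical. The sketch assumes residue numbers have already been mapped to columns of the multiple alignment, and it assumes that "within seven residues" is measured in alignment columns; the talk does not specify either detail.

```python
def alignment_columns(aligned_sequence, gap="-"):
    """Map each residue index (0-based, gaps skipped) of one gapped,
    aligned sequence to its column in the alignment."""
    return [col for col, ch in enumerate(aligned_sequence) if ch != gap]

def conserved_sites(seeds, very_exotic, window=7):
    """Find seed sites matched by a very exotic BHB in every other virus.

    seeds:       {virus: [alignment columns of bifurcated acceptors with a
                 component Pi value above 9.0]}
    very_exotic: {virus: [alignment columns of BHBs with Pi above 9.5]}
    A seed is kept if each other virus has a very exotic BHB within
    `window` columns of it (ASSUMPTION: distance in alignment columns).
    """
    hits = []
    for virus, columns in seeds.items():
        others = [v for v in very_exotic if v != virus]
        for col in columns:
            if all(any(abs(col - q) <= window for q in very_exotic[v])
                   for v in others):
                hits.append((virus, col))
    return hits

# Hypothetical data for three viruses, for illustration only.
seeds = {"virus_A": [100, 300]}
very_exotic = {"virus_A": [100], "virus_B": [104], "virus_C": [95, 400]}
print(conserved_sites(seeds, very_exotic))
print(alignment_columns("AC-D"))
```

In this toy input the seed at column 100 survives because both other viruses have a very exotic bond within seven columns, while the seed at column 300 does not.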
And functional sites conserved across different coronaviruses should be of universal function and therefore also be conserved across different strains of the same coronavirus. Five such conserved sites of interest were discovered, as indicated in this table, where the entries are typically triples, namely the three residues involved in a bifurcated BHB, specified as a triple of residue numbers, below which are given the corresponding triple of one-letter amino acid codes followed by the accessible surface areas in square angstroms. Entries listed in boldface are especially accessible in the so-called all-heads-down conformation of the spike. The RBD, which is an apparently useful target for SARS-CoV-2 according to the success of the BioNTech Pfizer messenger RNA vaccine, was successfully identified on the sole basis of free energy as our site number three. Note that the table provides analogous targets for all the human coronaviruses at once, even though different peptides, and so different potential mRNA vaccine targets, are indicated in each case. It is the equivalence of presumed function through combined homology and free energy that links the various sites for different viruses. Here is a figure illustrating the five sites of interest for the SARS-CoV-2 spike glycoprotein, each of which comprises a triple of residues with the same color. More than one target epitope is desirable in a vaccine, so-called multivalency, in order to protect against evolutionary vaccine evasion by the virus. So a natural conclusion from my analysis might be to extend the BioNTech Pfizer vaccine to target all five conserved sites in SARS-CoV-2. The Moderna vaccine skirts the issue of targeting specific peptides by delivering the mRNA for the entire spike, relying however on the spike being stabilized in its pre-fusion conformation by the 2P mutation.
The AstraZeneca-Oxford vaccine relies on another approach entirely, delivering the native spike without any stabilizing mutation in a replication-incompetent adenovirus. Our methods thus arise from first mathematical principles, leading to sites of interest as potential antiviral targets. mRNA vaccines can be engineered directly to such sites, so our methods go hand in hand with these breakthroughs in vaccine technology, not only in cases where knowledge of promising epitopes is not otherwise available, but also, for example, for robustness against mutation, as in our approach to SARS-CoV-2. We hope these methods may provide a useful tool going forward. I think you can say we've been lucky with the Moderna and BioNTech Pfizer mRNA vaccines. First of all, delivery and cellular uptake are successful using lipid nanoparticles. For the former vaccine, there is this 2P pre-fusion stabilized mutation, patented in 2016, without which the whole-spike approach might not have worked. And for the latter, the RBD for SARS-CoV-2 was known in analogy to its cousin SARS-CoV-1. But for the next viral outbreak, where such stabilization or a priori data may be lacking, my methods already provide multivalent targeting, as evidenced by the extensive tables. More generally, my method predicts function from structure, the tertiary structure of a PDB file in its current embodiment, and potentially from the primary structure alone using machine learning, as was mentioned before. It is thus a tool of greater utility in structural biology than only vaccine development. In particular, the technique employed for human coronaviruses of supplementing homology alignment with free energy alignment in order to infer functional alignment is also of potential greater utility. With that, let me close by giving references for the work we have discussed. First is the text on proteins by Finkelstein and Ptitsyn, an absolutely marvelous book starting from first principles.
Next, a preliminary math paper from 2010 that discusses the topology of proteins. And then the Nature paper from 2014, which introduced and computed the distribution on the ball of all rotations, which we have discussed, and furthermore studied the clustering of the BHBs. Then a survey paper from 2016 for mathematicians, also starting from first principles, on the techniques we have discussed for proteins, as well as related tools and results for complexes of RNA molecules. And finally, the two JCB papers from this year about which I have spoken in detail today.