 Luca, and he will talk about the influence of mutations on the compactness of viral single-stranded RNA genomes, in continuation of the work that we started in the morning. Thank you.
Thank you very much for the introduction, Roja. First of all, I want to thank the organizers for inviting me here. In a way, I consider the beginning of my career as a scientist to be coming here to ICTP for a conference just before my PhD started, and of course I was in the audience. I wasn't presenting anything. So it is a bit emotional to be, for the first time, on the other side of the auditorium at ICTP.
As Roja said, I will talk about the effect of synonymous mutations in single-stranded viral RNA genomes and their impact on the size of the genomes. I will also present some new results about the effect of the distribution of mutations on the genomes. Many of you will have already heard the first half of this presentation; the second half will be dedicated to the new results.
For those who haven't, it all started when I was in Ljubljana working with Trudy. I started working on RNAs with Ange, who was also finishing his thesis. One night, driving back from Trieste to Ljubljana with Christian, who was coming there to give a seminar, we started talking about Bill's work. Christian mentioned, "You know, I would very much like to know what happens if you consider synonymous mutations, because I asked them and they had not looked into it yet." And we said, okay, we can try to look into that. This came out of it, and I'm very happy about this collaboration.
The introduction I believe I can skip by now, since we had it from Gelbart, from Bill, and from Avi, together with part of the results.
But yes, it all starts from the different packaging mechanisms of double-stranded DNA viruses and single-stranded RNA viruses. Bill pointed this out very efficiently with the slide showing how much bigger the double-stranded DNA is than the single-stranded RNA, and it becomes clear how the phenotypic properties of these two polymers influence how they are packed into the capsids. You require a motor for double-stranded DNA viruses, while packaging single-stranded RNA is a self-assembly process.
Again, as was pointed out this morning, in vivo the right RNA is recognized by the capsid. How that happens is a fascinating object of research. There are, as Reidun pointed out, packaging signals in MS2; some other viruses are actually believed not to have them. Size turns out to be important, and perhaps there are several other factors as well, but those are the ones known for the moment. And of course electrostatics plays a huge role in all of this.
Talking about the size of the RNA fold: as Avi pointed out, there is a correlation between the size of the viral RNA and the size of the capsid, which led them to study (it is a kind of temporal loop now, with me presenting your old results) how the size of viral RNAs compares with that of random RNAs. Avi already presented all of this this morning, but I will show the slide again, because it is the starting point for basically everything we did.
Long story short, you can, as I said, associate a measure of size to an RNA fold, which is its diameter: the maximum ladder distance. You have to consider an ensemble, a thermal ensemble of folds, since these RNAs are so large that the minimum free energy fold will not tell you much.
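
As a rough illustration of the maximum ladder distance (MLD) just mentioned, here is a minimal Python sketch; it is not the code used in the study. It assumes a fold is given in dot-bracket notation without pseudoknots, treats every loop as a tree node and every base pair as an edge, and takes the MLD to be the largest number of base pairs crossed on a path between two loops (the tree diameter). The function names `maximum_ladder_distance` and `average_mld` are hypothetical, and the thermal ensemble of folds is assumed to come from an external folding program.

```python
from typing import Iterable


def maximum_ladder_distance(structure: str) -> int:
    """Largest number of base pairs crossed between two loops of a fold.

    Assumption: `structure` is pseudoknot-free dot-bracket notation.
    """
    best = 0
    # One entry per currently open loop: the two largest heights among
    # the subtrees hanging below it. The first entry is the exterior loop.
    stack = [[0, 0]]
    for ch in structure:
        if ch == "(":
            stack.append([0, 0])
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure")
            first, second = stack.pop()
            # Longest path passing through the loop that was just closed.
            best = max(best, first + second)
            height = first + 1  # add the base pair leading to the parent loop
            parent = stack[-1]
            if height > parent[0]:
                parent[0], parent[1] = height, parent[0]
            elif height > parent[1]:
                parent[1] = height
    if len(stack) != 1:
        raise ValueError("unbalanced structure")
    first, second = stack.pop()
    return max(best, first + second)


def average_mld(structures: Iterable[str]) -> float:
    """Average the MLD over an ensemble of folds of the same sequence."""
    values = [maximum_ladder_distance(s) for s in structures]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Two hairpins, each two base pairs deep, hanging off the exterior loop.
    print(maximum_ladder_distance("((..))..((..))"))
```

Averaging over many sampled folds, rather than using a single minimum free energy structure, is the point of the last remark above: for long sequences many different folds have comparable free energy.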
There are very many folds within one k_BT of energy. If you then consider what the scaling is for a random RNA with the average nucleotide composition of viral RNA, you obtain that power law, which goes like this: N to the power two-thirds. If instead you measure numerically the size of the viral RNAs coming from icosahedral capsids, you get the points (oh, I do have the pointer; sorry, it changed slide, my bad, I should have tried this before), you get the points below. So the viral RNA is more compact.
This is seen also in experiments. This slide was also shown before, and, as was already said, the interesting part is that if you get the radius of gyration from this MLD, you get a size which is comparable with that of the mature capsid. If you look at the tracing of the RNA in cryo-EM and at the size of the capsid, again you get a comparable size.
The question is: is this compactness, this phenotypic characteristic of the genome, something which is selected for independently of other properties, or is it a consequence of some other evolutionary pressure acting on the virus? The first one to check is the requirement for the virus to encode its own proteins, and that is what we set out to study.
The first part of the talk will be dedicated to this question, and this is the plan we followed. First of all, we recovered the original result, just to be sure we had everything set up right. Then we started to mutate the viral RNAs with stricter and stricter constraints. We started by including the dinucleotide frequencies, which allow you to distinguish one viral family from another.
Then we allowed only synonymous mutations, which preserve the protein product. Finally, we also conserved the untranslated regions at the beginning and the end of the genomes, which are important for the secondary structure present there, and the codon frequencies, which I will explain in more detail later.
So, recovering your result: we considered the same viral RNA composition for the power law here at the beginning, but we also considered Tymoviridae-like random RNA sequences. The Tymoviridae were not considered in the original study because they have quite a different composition, and it turns out that those changes affect only the prefactor, while the scaling is basically the same.
Having this set (all was done with the RNAs in our case, by the way), we went on to include the first constraint, the dinucleotide frequencies, that is, how frequently a pair of nucleotides appears in the virus with respect to a random population. As you see here from the graphs of all the different families we considered, you can really distinguish one from the other if you check. We put this constraint in as a fictitious energy in our Monte Carlo scheme, and in the end we kept only those sequences which were within a certain distance from the average dinucleotide frequencies we wanted.
Synonymous mutations were the next constraint. I believe all of you already know this, but normally we go from DNA strands to messenger RNA to protein synthesis. There is a triplet code, so each amino acid is encoded by three nucleotides, and that is how we go from the four-letter code of the nucleic acids to the 20-letter alphabet of the proteins.
The main point is that if you consider how you can translate triplets from a four-letter alphabet into an alphabet with 20 letters, it turns out you have 64 possible triplets and only 20 letters to encode, with three of those triplets coding for stop signals, so there must be some degeneracy. There will be several triplets which code for the same amino acid, and that is what we call the genetic code in the end. Here it is, taken directly from Alberts' book, and you see that some amino acids are very degenerate and others are not.
The question is: are the mutations which preserve the protein product neutral, or do they disrupt the RNA fold? For example, if you mutate a GGU triplet only into one of these other three, you still have glycine.
So we set out to find out, and this is our method. We have a Monte Carlo scheme in which mutations that change an amino acid are rejected, there is an energy on the dinucleotide frequencies, and then we filter the sequences a posteriori. We considered 122 viruses. For each one we kept between 500 and 2,000 sequences, and for each sequence we produced a thermal ensemble of 500 folds, on which we then evaluated the MLD. Just as a check, the nucleotide frequencies turn out to be about right if we fix all of this.
This is the result which you already saw this morning. As you can see, allowing only synonymous mutations and maintaining the dinucleotide frequencies brings the average MLD of the viral genomes all the way up to the scaling law for random RNAs. This is pretty interesting. We were very happy when we found this result, because it was beautifully confirming Bill and Avi's paper, and we decided to go on and include further constraints later. But before doing that, since those sequences were mutated as much as possible, we wanted to see how many mutations it takes to destroy this compactness.
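
The rejection rule described above (mutations that change an amino acid are rejected) can be sketched as follows. This is a simplified illustration under stated assumptions, not the scheme used in the study: it covers only the synonymous-mutation constraint, it omits the fictitious energy on dinucleotide frequencies and the a posteriori filtering, and it assumes the input is a single open reading frame written in the RNA alphabet with a length that is a multiple of three. The names `CODON_TABLE`, `translate` and `synonymous_mutations` are hypothetical; the table itself is the standard genetic code.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard genetic code, with codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate an in-frame RNA sequence, one letter per codon ('*' = stop)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def synonymous_mutations(seq: str, n_steps: int, rng: random.Random):
    """Propose random point mutations; keep only those that leave the
    encoded amino acid unchanged. Returns (mutated sequence, accepted count)."""
    if len(seq) < 3 or len(seq) % 3 != 0:
        raise ValueError("expected a non-empty in-frame coding sequence")
    letters = list(seq)
    accepted = 0
    for _ in range(n_steps):
        pos = rng.randrange(len(letters))
        start = pos - pos % 3  # first position of the codon containing pos
        old_base = letters[pos]
        old_codon = "".join(letters[start:start + 3])
        letters[pos] = rng.choice([b for b in BASES if b != old_base])
        new_codon = "".join(letters[start:start + 3])
        if CODON_TABLE[new_codon] == CODON_TABLE[old_codon]:
            accepted += 1
        else:
            letters[pos] = old_base  # reject: the protein would change
    return "".join(letters), accepted


if __name__ == "__main__":
    gene = "AUGGGUGCUUUACGAUAA"
    mutated, n_accepted = synonymous_mutations(gene, 200, random.Random(0))
    print(gene, mutated, n_accepted)
    print(translate(gene) == translate(mutated))
```

Because each accepted move changes a single nucleotide and is checked against the codon it sits in, the protein product is preserved at every step, which is the property the constraint is meant to enforce.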
In order to do that, we considered only four viruses, four genomes, and we started to do what we call the mutation dynamics. Instead of printing the sequences out only when they were really independent of each other, we printed them out every few steps and computed the MLD again for each one of them.
Here you can see how it goes. The time here is in Monte Carlo steps, basically, and it depends on the length of the genome, but you see that almost immediately the average MLD goes all the way to the value we have at saturation of the mutations, so basically the random RNA one. This one is a Tymoviridae, which behaves a bit more strangely. On the right-hand side you have sequence similarity instead of time, and you can see, for example here, that when the sequence similarity is reduced by 5 to 10%, we have basically already destroyed the compactness. It depends a bit on the virus, but for these two, for BMV for example, 5 to 10% is enough. This will become important in the second part of the talk.
Another interesting part, which we will see again later, is that if you look at the very beginning here, when there are few mutations, those can even enhance the compactness. Why? Arguably because if a virus, or an organism for that matter, optimizes for some phenotypic property, it will have to compromise with something else. So no optimization is complete. You want to stay in a spot, perhaps, that leaves you some possibility to optimize something else if the environment around you requires it.
As I was saying, we moved on to include two other constraints. The untranslated regions at the beginning and the end we left untouched, so we mutated only the genes, and we conserved the codon usage bias. What I mean by codon usage bias, if someone doesn't know: as I pointed out before, there are several equivalent codons which code for the same amino acid.
But depending on the organism, one codon is usually more represented than another. So they are not really equivalent in the cell, since the equivalent codons appear in different proportions. In order to take this into account, we shuffled the equivalent codons within the genomes. In this case we did not conserve the dinucleotide frequencies, but we did conserve the nucleotide frequencies and the codon frequencies. Those are the orange points. You see that still, even with these far stronger constraints, the mutations do destroy the compactness of the virus. So from this we can conclude with some safety that, yes, the compactness appears to be evolutionarily selected.
Now, going to the second part of the talk: as Avi was pointing out this morning, there is the question of what causes this compactness. We really saw that the pairing frequencies are the same, as you said, between the mutated sequences and the wild-type sequences. The average length of the duplexes is the same. Even if you keep the same proportions of branching-point degrees, the same distribution, if you move the branching points around randomly you do get a size which is larger. One observation they made (I am always starting from your papers, it appears) is that in viral RNAs the high-degree branching points tend to cluster in the center of the fold, and that does reduce the size of the fold.
From this, we started to wonder whether there are perhaps some hot spots on the sequence, some stretches of sequence which force the high-degree branching points to come near each other. The final question we would like to answer is, of course, what causes the fold compactness? Is there a particular property of the sequence, a code if you want, that encodes for this? We are moving toward trying to answer that step by step.
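
The codon shuffling described earlier in this part of the talk (swapping equivalent codons within a genome so that codon and nucleotide frequencies are conserved) could look roughly like the sketch below. It is an illustration under assumptions rather than the procedure actually used: it treats the input as one in-frame coding region, ignores untranslated regions, and uses the standard genetic code. The names `CODON_TABLE` and `shuffle_synonymous_codons` are hypothetical.

```python
import random
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, with codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def shuffle_synonymous_codons(seq: str, rng: random.Random) -> str:
    """Permute codons among positions that encode the same amino acid.

    The protein, the codon counts and the nucleotide composition are all
    unchanged; only the placement of equivalent codons along the gene moves.
    """
    if len(seq) % 3 != 0:
        raise ValueError("expected an in-frame coding sequence")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    positions_by_aa = defaultdict(list)
    for index, codon in enumerate(codons):
        positions_by_aa[CODON_TABLE[codon]].append(index)
    for positions in positions_by_aa.values():
        pool = [codons[i] for i in positions]
        rng.shuffle(pool)
        for index, codon in zip(positions, pool):
            codons[index] = codon
    return "".join(codons)


if __name__ == "__main__":
    gene = "AUGGGUGGCGGAUUAUUGCUUGGGUAA"
    shuffled = shuffle_synonymous_codons(gene, random.Random(0))
    print(gene)
    print(shuffled)
    print(Counter(gene) == Counter(shuffled))
```

Since codons are only exchanged between positions coding for the same amino acid, the codon usage of the gene is preserved exactly, while dinucleotide frequencies across codon boundaries are free to change, in line with what is said above.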
The first thing we looked at after the first study is whether there are simply some stretches such that, by mutating them, you destroy the compactness, or whether there are not. To answer this, we took into consideration two viral genomes, BMV RNA2 and phage MS2, which are believed to have very different packaging mechanisms: MS2, as we have seen, has some strong packaging signals, while for BMV RNA2 packaging is believed to be mostly electrostatic. They are both hand-sized, they are both compact, and they are about the same size, from 2,600 to 3,500 nucleotides for BMV RNA2 and MS2.
On these two viruses we considered different kinds of mutation distributions, localized mutations if you want. First, we considered batches of mutations which are local on the genome. So up here we mutated everything in the orange window, then we moved it, and moved it, and so on. Then we considered the average fold, and we mutated the portions of the fold which are the most central first, and then moved out to the least central. And finally, for comparison, we took into account completely randomly placed mutations, the dispersed ones.
For each window we produced 100 independent sequences, and for each sequence 500 folds. Again, we conserved only the dinucleotide frequencies this time; the mutations are not synonymous, because we wanted to keep it as general as possible.
So let's go on with the first one; perhaps the movie will clarify it a bit. We simply move the window around, each time we produce 100 independent sequences, and for each one of those we compute the MLD. We then consider different sizes of the window to see if there is some specific contiguous part of the genome which encodes for the compactness. If there is, at some point we will mutate it and everything will explode. That is the idea.
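
To make the window scan above concrete, here is a minimal Python sketch of localized versus dispersed mutations. It is only an illustration and relies on a loudly stated simplification: the mutated positions are randomized by shuffling their nucleotides, which preserves nucleotide composition but not the dinucleotide frequencies that were conserved in the study. The names `window_starts`, `shuffle_window` and `shuffle_dispersed`, and the toy sequence, are hypothetical.

```python
import random


def window_starts(length: int, size: int, step: int) -> list:
    """Start positions of a window of `size` slid along a genome of `length`."""
    return list(range(0, length - size + 1, step))


def shuffle_window(seq: str, start: int, size: int, rng: random.Random) -> str:
    """Randomize one contiguous block; everything outside it is untouched."""
    block = list(seq[start:start + size])
    rng.shuffle(block)
    return seq[:start] + "".join(block) + seq[start + size:]


def shuffle_dispersed(seq: str, size: int, rng: random.Random) -> str:
    """Randomize the same number of positions, scattered along the genome."""
    positions = rng.sample(range(len(seq)), size)
    letters = [seq[i] for i in positions]
    rng.shuffle(letters)
    out = list(seq)
    for index, letter in zip(positions, letters):
        out[index] = letter
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGU") for _ in range(600))  # toy sequence
    for start in window_starts(len(genome), size=120, step=120):
        # In a real scan one would build many independent mutants per window
        # and fold each of them; here only one mutant per window is produced.
        mutant = shuffle_window(genome, start, 120, rng)
        changed = sum(a != b for a, b in zip(genome, mutant))
        print(start, changed)
    print(sum(a != b for a, b in zip(genome, shuffle_dispersed(genome, 120, rng))))
```

In a full analysis each mutant would be folded into a thermal ensemble and its average MLD plotted against the window center, once per window size.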
So here you can see, for BMV RNA2 and phage MS2, the expected average MLD of the wild type, that of the shuffled genome, which is basically the power law, and, for two different sizes of the mutation blocks, 120 nucleotides in green and 720 nucleotides in green, what the MLD is along the genome. Here you have the center of each window.
There is not much happening. For BMV RNA2 there is a spot in the center which is a bit more sensitive to this kind of mutation, but even if we consider 720 nucleotides, which corresponds to 20%, by the way, the amount that was disrupting the MLD with the synonymous mutations, even in that case we don't reach the power law. For phage MS2 things are even worse, if you want, in that we get even less close to it. On the bottom row you can see the histograms of the MLD obtained by combining all the MLD values for all mutation sites, and really they don't change much from the wild-type distribution. For the small windows, basically they don't change at all.
So we went on, wondering whether perhaps it is being central in the fold which is important, so that it is not a contiguous batch along the genome which encodes for this property, but several different batches which come together to form the center. To check this, we introduced a measure of centrality, which I describe in the next slide, we ranked all the nucleotides of the fold with it, and we formed our batches of mutations depending on how central each nucleotide is.
It goes like this. From the most central batch, our mutations move outward until they reach the tips of the average fold. Here you can see what the most central batch would be, which on the genome looks like this, and the least central batch, which corresponds to the three tips and is given by three different stretches of the genome. As a measure of centrality, we use the ladder distance again.
If you consider the ladder distance, the set of points which have the lowest ladder distance to every other point on the fold defines the center, basically. You can then consider, for each nucleotide, what its ladder distance to this set of points is, and that is our measure of centrality, because the lower this ladder distance, the more central we are. For those of you who are familiar with graph theory, this is basically equivalent to the eccentricity, which is the maximum distance from one point to every other point on the graph; the point with the lowest eccentricity is the center of the graph. So we take this measure, we average it over the ensemble to get an average distance from the center, and we rank our mutations according to that.
Here you can see, for BMV RNA2 and phage MS2, the profiles of this average distance from the center, and in gray the standard deviations from the average profile. You see that near the center the deviation is not large for BMV RNA2, while MS2 is a bit more noisy, but still, if we consider a big enough window, we are safe that even on different folds we are keeping the most central regions.
So now we consider the most central 20% of the fold, the least central 20%, and also the 20% which is in the middle, and you can see the effect of the mutations here. Unexpectedly, the most central portion is actually the one which causes the least disruption: if you mutate the most central portions of the genomes, you influence the fold the least, while the other two, the middle and far 20%, have more or less the same effect. Still, none of these reaches the shuffled genome distribution.
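
The centrality measure defined above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the code used in the study: folds are pseudoknot-free dot-bracket strings, loops are tree nodes and base pairs are edges, an unpaired nucleotide is assigned to the loop it sits in, and a paired nucleotide is assigned to the loop its base pair encloses (a convention chosen here for simplicity). The function names are hypothetical.

```python
from collections import deque


def loop_tree(structure: str):
    """Return (adjacency lists of the loop tree, loop index of each nucleotide)."""
    adjacency = [[]]  # node 0 is the exterior loop
    owner = []
    stack = [0]
    for ch in structure:
        if ch == "(":
            adjacency.append([])
            child = len(adjacency) - 1
            adjacency[stack[-1]].append(child)
            adjacency[child].append(stack[-1])
            stack.append(child)
            owner.append(child)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure")
            owner.append(stack.pop())
        else:
            owner.append(stack[-1])
    if len(stack) != 1:
        raise ValueError("unbalanced structure")
    return adjacency, owner


def ladder_distances(adjacency, sources):
    """Breadth-first distances (in base pairs) from the nearest source loop."""
    dist = [-1] * len(adjacency)
    for s in sources:
        dist[s] = 0
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if dist[neighbour] < 0:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def distance_from_center(structure: str):
    """Ladder distance of every nucleotide from the center of the fold."""
    adjacency, owner = loop_tree(structure)
    # Eccentricity of a loop: its largest ladder distance to any other loop.
    eccentricity = [max(ladder_distances(adjacency, [v]))
                    for v in range(len(adjacency))]
    lowest = min(eccentricity)
    center = [v for v, e in enumerate(eccentricity) if e == lowest]
    to_center = ladder_distances(adjacency, center)
    return [to_center[loop] for loop in owner]


def average_distance_from_center(structures):
    """Average the per-nucleotide profile over an ensemble of folds."""
    profiles = [distance_from_center(s) for s in structures]
    return [sum(column) / len(profiles) for column in zip(*profiles)]


if __name__ == "__main__":
    print(distance_from_center("((((...))))"))
```

Ranking nucleotides by the averaged profile and cutting the ranking into batches would then give the most central, middle, and least central portions discussed above. The all-pairs breadth-first search is quadratic in the number of loops, which is acceptable for a sketch but would deserve a faster tree-center routine for large ensembles.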
Finally, to have a comparison, we took into account dispersed mutations, that is, mutations which, as you can see, are just taken randomly around the genome. So we have more or less the situation we had with the synonymous ones, if you want: they are uniformly distributed. In this case, going from 120 to 720 nucleotides, we really see the change toward the shuffled genome distribution.
Going back to the previous slide and comparing these different kinds of mutations, block, centrality-ranked, and dispersed, we see that the dispersed mutations, which are in dark blue here, are the only ones which basically reach the power law for the MLD when we mutate 20% of the genome, which was the amount of mutation seen to be disruptive in our previous study.
So what we get from this is that whatever property is encoding for the compactness is really a global property of the RNA, or at least it lives on a scale which is bigger than a stretch of 20% of the RNA. We went up to 720 nucleotides; perhaps if we had gone up to a third or so we would have seen something different. I don't know, but the results point to it really being a global property.
So, wrapping up (sorry, I should be on time), the take-home message is that viral compactness appears to be evolutionarily selected for, and it also appears to be resilient to localized mutations. The open question that remains, of course, is what is causing this compactness: what is the code, if there is a code, in the genomes. We don't have an answer to this yet; we would like to. So, thank you very much for your attention, and thanks to all those who contributed to the study.