of this workshop after the meeting is over. Rudi is the editor of the Journal of Biological Physics. You can write new articles or review articles, short ones or long ones; we will tell you about the deadline, and Rudi will communicate with each of you about the proceedings. OK, so let's start. The first talk of the day is Avinoam. He is going to talk about RNA versus ideal randomly branched polymers. And you changed the title — and the Prüfer shuffle. I changed the title, I don't know. OK, so let's start.

Good morning. I thank all three organizers for this invitation. Being here, I think I'm the oldest in the audience — I said I'm the oldest, I know. This is not really directly about viruses. It's a little piece of theory that we have been doing for about ten years, starting with the thesis of Aron Yoffe at UCLA. It was done with my good friend Bill Gelbart, who is sitting here. Actually, we started working on viruses around the year 2000, first on the theory of phage ejection. In fact, I wanted to talk about phages and loading, because I like the stories about entropy and energy. But I saw Rudi in Israel a few months ago and he said, no, that's too complicated — talk about RNA. So we'll see.

The work was done with students of Bill Gelbart, because I quit taking students about ten years ago; I get the manpower from UCLA, and they visited me in Jerusalem. In reverse chronological order: Walter (Surendra) Singaram, who is now with Mike Hagan; Ajay, who is now at a company in Los Angeles and who did beautiful work, especially cryo-electron microscopy; Li Tai, who is somewhere in San Francisco, who visited me, and we did some theory after he did experiments with Bill; and the first was Aron Yoffe, as I told you.

The plan of the talk — I will try to finish on time. I will first compare viral RNA to random-sequence RNA. This was one of the first exercises we did, years ago.
Then I will mention, in one or two slides, the work of Luca Tubiana, who will talk later in more detail, which is relevant. Then I will say a few words about viral versus synonymously mutated RNAs. And then I will try to see whether random-sequence RNA — which sets an upper limit on viral RNA, an upper limit in the sense of the radius of gyration, as you will see — is similar to or different from randomly branched polymers. The key words you can see here: MLD, the maximum ladder distance; Rg; Kramers' theorem, a way of calculating Rg; tree graphs; the vertex degree distribution. And I'll tell you about the Prüfer shuffle — I guess most of you don't know what it is, but it's a nice trick that we have recently used. It's in Hebrew; it goes right to left here.

So we started long ago, when I learned that there is a correlation between the size of the RNA and the size of the capsid. This is a very nice example, where the virus divides its genome into three pieces, all about 3,000 nucleotides, all packaged in identical T=3 capsids. What you see here at the bottom is a capsid. Yesterday, in Christian's poster, you could see much more advanced cryo-EM pictures, but what you see here is RNA packaged inside — what is it, a BMV or CCMV capsid? — and you see the RNA outside. And you see that in order to put it into the capsid, you don't have to squeeze it very much. You have to, but not very much — not like DNA in phages.

A more recent example, also from the lab at UCLA — at least three of the people who did the experiments are sitting here, or should be sitting here. Here you take RNAs of different lengths. You see that if you take 3,000 nucleotides, it forms the expected T=3 capsid. Take 6,000 and they form a dimer; take 9,000, a trimer; take 12,000 and they form a tetrahedron of identical capsids. That's just to show that there is, of course, a correlation between capsid size and RNA length.
People here did theories about the relation between the size of the RNA, the spontaneous curvature of the proteins, et cetera. So, to characterize the size of RNA, we first start with the secondary structure. Here is a simple example. We adopted, from the Bundschuh and Hwa paper of 2002, what they call the ladder distance: you take two base pairs and you count how many rungs — how many base pairs — you have between them. In particular, we looked at the maximum ladder distance, the MLD, which is the largest hairpin-to-hairpin distance. That is a measure of the size of the RNA.

Here you have an example. This is a viral RNA, and you see that its MLD is about 200 — that's how many base pairs you have from this point to this point, the longest path. It's comparable to what you have between here and here, or here and there. But if you go to a non-viral RNA — in this case a random sequence: you take the same base composition as this RNA, you shuffle it randomly, you go to one of these programs, RNAfold or Mfold, and you calculate the secondary structure — you see a big increase in the size, suggesting that viral RNAs are indeed more compact than non-viral RNAs. It holds also for biological non-viral RNAs, like yeast RNA, et cetera.

Now, this is from the thesis of Aron and a paper we published together. These are different viruses — their MLDs as a function of sequence length. And what you have here: you take the sequence, you randomly shuffle it, and you get a very nice straight line with a slope of two-thirds. The MLD scales like N to the two-thirds, N being the number of nucleotides. These guys here sit on the random-sequence line, but they are not icosahedral. So why are they here? Here I come to the work done by people sitting here in the audience. What they did is a very elegant idea, we think. Take a secondary structure — this is an example that I made before this talk.
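The maximum ladder distance just defined can be computed directly from a dot-bracket secondary structure (the output format of RNAfold and Mfold). Here is a minimal sketch — the function names and the recursive scheme are mine, not from the talk: each maximal helix contributes its number of base-pair rungs, and the MLD is the largest rung-weighted path between any two points of the structure.

```python
def pair_table(db):
    """Map each '(' position to its ')' partner and vice versa."""
    stack, pt = [], {}
    for i, c in enumerate(db):
        if c == '(':
            stack.append(i)
        elif c == ')':
            j = stack.pop()
            pt[i], pt[j] = j, i
    return pt

def mld(db):
    """Maximum ladder distance: the largest number of base-pair 'rungs'
    crossed on a path between any two points of the structure."""
    pt = pair_table(db)

    def scan(i, j):
        """Return (deepest ladder depth, best MLD) inside region [i, j)."""
        depths, best = [0], 0
        k = i
        while k < j:
            if db[k] == '(':                  # a helix starts here
                rungs, a, b = 0, k, pt[k]
                while db[a] == '(' and pt[a] == b:   # walk the stacked pairs
                    rungs += 1
                    a, b = a + 1, b - 1
                d, m = scan(a, b + 1)         # recurse into the inner loop
                depths.append(d + rungs)
                best = max(best, m)
                k = pt[k] + 1
            else:
                k += 1
        depths.sort(reverse=True)
        # the two deepest branches meeting at this loop give a candidate path
        best = max(best, depths[0] + (depths[1] if len(depths) > 1 else 0))
        return depths[0], best

    return scan(0, len(db))[1]
```

For a single hairpin `"(((...)))"` this gives 3; for two hairpins hanging off a common loop, the MLD is the rung count of the path from one hairpin through the loop to the other. (Python's default recursion limit would need raising for genome-length structures.)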
You have a duplex. Now suppose you make a synonymous mutation: you replace some of the nucleotides by others without changing the protein — you see, the serine and arginine are still serine and arginine — but you spoil the base pairing. The duplex becomes unstable and the structure opens. The protein coding is still the same, but the structure is different — it becomes bigger here, though it can go the other way around, too. This is what they did a few years ago. These are the viruses and their MLDs. You synonymously mutate them, and they go onto the line of the random-sequence RNA — slope two-thirds again. You will hear more about them soon.

As a measure of the size, we want to go from the secondary structure, which if you like is a 2D size, to the 3D size. And what we did — we asserted, we assumed, we conjectured — we took the MLD. We could take this one, we could take this one, we could take anything; by the way, the scaling is the same: if you take the average hairpin-to-hairpin distance, it will also scale like two-thirds. And we said that to get the 3D size, the radius of gyration, all we need to do is take the square root, assuming ideal polymer behavior, namely ignoring excluded-volume interactions.

Years later — is Shura here? Yes, he's here. Shura, that's from your book, you recognize it. Years later I saw that you have this argument to prove the one-fourth scaling; I'll show you if we have time. Anyway, we also said what they had said, but we didn't know that they had said it: you take the hairpin-to-hairpin distance and you calculate it — it's a tricky method, you do a random walk around, it doesn't matter — and then you take the square root and you get N to the one-fourth, which is the scaling of a randomly branched polymer.

OK, here is also a recent paper from these people — some of this work we heard about yesterday morning. They measured the radius of gyration by fluorescence correlation spectroscopy.
And Walter calculated Rg for them from the same idea that we used. There are two methods: one is to take the square root of the secondary-structure size, and the other is a method that I'll show you soon. And you see good — or reasonable — correlation between the experiment and the calculation.

Now the question is: what is the origin of the difference between random-sequence RNA and viral RNA? Is it due to a different fraction of base pairing? Is it due to different duplex lengths? Is it due to a different order of the loops? By order we mean how many duplexes emanate from the loop — like here, you see many duplexes emanating from the loop, so this is a high-order loop. Hairpins are order 1, bulges are order 2, et cetera, and this one, I think, is of degree 5. So we wanted to check this, and Aron already did this calculation — I'll tell you shortly. None of these parameters is very different between random-sequence and viral RNA. They both have about 65% base pairing; the duplex length is somewhere between 4 and 5; and even the energies are very similar. So we concluded that most likely it is something to do with the branching pattern of the molecule.

What do we mean by the branching pattern? To analyze it, we made a simplification: we take the secondary structure and map it to a tree graph. It's very coarse-grained, but you'll see it works nicely. We assume that all the loops are flexible joints and all the duplexes are stems — a very simple mapping; we just want to understand the qualitative behavior. There are some simple relationships. By the way, the average degree is always 2 — well, it's 2 minus 2 over L, L being the total number of loops, so about 2. L_D — I will use it — is the number of vertices of degree D. Here you have vertices of degree 3, degree 2, degree 1. Let's see what we get. These two guys have different degree distributions, and obviously it doesn't take much.
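The coarse-graining just described — loops become flexible joints (vertices), duplexes become stems (edges) — can be sketched in a few lines on a dot-bracket structure. The helper names are mine; vertex 0 stands for the exterior loop, so a hairpin loop comes out with degree 1 and a bulge with degree 2, as in the talk.

```python
from collections import Counter

def loop_tree(db):
    """Coarse-grain a dot-bracket structure into a tree graph:
    loops (including the exterior loop) are vertices, duplexes are edges."""
    stack, pt = [], {}
    for i, c in enumerate(db):
        if c == '(':
            stack.append(i)
        elif c == ')':
            j = stack.pop()
            pt[i], pt[j] = j, i

    edges, next_id = [], 1

    def walk(i, j, parent):
        nonlocal next_id
        k = i
        while k < j:
            if db[k] == '(':
                a, b = k, pt[k]
                while db[a] == '(' and pt[a] == b:   # skip the stacked pairs
                    a, b = a + 1, b - 1
                child = next_id                      # loop at the duplex's far end
                next_id += 1
                edges.append((parent, child))        # one edge per duplex
                walk(a, b + 1, child)
                k = pt[k] + 1
            else:
                k += 1

    walk(0, len(db), 0)    # vertex 0 is the exterior loop
    return edges

def degree_distribution(edges):
    """{degree D: number of vertices L_D} of the tree graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())
```

A structure with two hairpins off a common loop maps to a four-vertex tree: three degree-1 vertices (two hairpins and the exterior loop) and one degree-3 vertex, consistent with the average degree being 2 − 2/L.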
With different degree distributions you get different conformations — assume freely-jointed-chain behavior — and obviously this one has an Rg which is smaller than this one. That's fine. But look at these two guys. They also appear to have very different Rgs. However, if you look at their degree distributions, they are exactly the same — I made them to be the same. You have eight hairpins, no second-order vertices, four vertices of order three, and one of order four. Yet the Rgs are obviously different. In case you don't remember what Rg is, it is reminded here. Here is another example: the tree graphs on the left and right have the same vertex degree distribution, but very different Rgs. So the difference between viral and non-viral tree graphs is not simply the vertex degree distribution. What can it be?

Here is a tree graph corresponding to the secondary structure. Remember the MLD: it scales like two-thirds, and assuming ideal behavior we take the square root, so Rg scales like N to the one-third. One-third is like a collapsed polymer — but remember, we ignore excluded volume in this case. So the scaling of Rg that we get for the random RNAs is N to the one-third, not N to the one-fourth like a randomly branched polymer.

There is another elegant way to calculate Rg for a freely jointed chain. Here is our RNA; we map it to a tree graph. This is called the Kramers formula. What you have to do is very simple and elegant. You take your polymer and you cut it at every bond. You count how many monomers you have on one side and how many on the other, and you multiply one by the other. You do this for all divisions, you sum, and you divide by the total number squared. That gives you the mean-square Rg. We used this Kramers formula to calculate Rgs. And what you see here, again, are the Rgs — and frankly, I forget which case this was — as a function of the number of loops. The number of loops scales linearly with chain length.
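The cut-every-bond recipe is a few lines on a tree graph. A minimal sketch (function name mine), for an ideal polymer with unit-length bonds: cutting the bond above each vertex splits the N monomers into a subtree of size N1 and the remaining N − N1, and the mean-square Rg is the sum of the products over all bonds, divided by N squared.

```python
from collections import defaultdict

def kramers_rg2(edges, b=1.0):
    """Kramers' theorem for an ideal (phantom) tree polymer:
    Rg^2 = (b^2 / N^2) * sum over bonds of N1 * N2,
    where cutting a bond splits the N monomers into N1 and N2."""
    adj = defaultdict(list)
    nodes = set()
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        nodes.update((u, v))
    n = len(nodes)
    root = next(iter(nodes))
    size = {}

    def subtree(u, parent):
        """Count monomers in the subtree rooted at u."""
        s = 1
        for w in adj[u]:
            if w != parent:
                s += subtree(w, u)
        size[u] = s
        return s

    subtree(root, None)
    # each non-root vertex corresponds to the bond joining it to its parent
    total = sum(size[u] * (n - size[u]) for u in nodes if u != root)
    return b * b * total / n ** 2
```

For a linear chain of N monomers this reproduces the discrete ideal-chain result Rg² = (N² − 1) b² / (6N), which is a quick sanity check on the formula.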
That the number of loops scales linearly with chain length may sound strange, but it can be proved. You see, these are the random ones, and the scaling is one-third, both here and here. These are the viruses. So we wanted to understand the one-third. By the way, I didn't tell you, but we get exactly the same scaling from a very simple model. What time is it? I want to get to the more important stuff, so I may skip this one. Oh, that's wonderful — I never have enough time in my talks.

Anyway, we wanted to understand this, and we played with all kinds of models. One model we called the SFM — at first it was the "simple folding model," but it's not nice to publish a paper with "simple," so we called it the sequential folding model. It is indeed sequential; if you want, you can even call it fractal. Take the RNA, and — this is what we suggested — form the longest duplex you can. There can be more duplexes of the same length; you pick one randomly. So you form a duplex, and now you have two loops. You keep repeating it — here, here, here — until you get the final structure; you stop when the loops get too small to form a duplex.

Using this very simple model, we generated structures. You must admit they don't look very different: this comes from Mfold or RNAfold, and this comes from the same sequence with the SFM — we did many of them. Even the scaling is similar; it's not exactly the same for both of them, but very close, within error bars. I can also tell you that in an approximation of this simple model, where you always divide into loops of the same size, you can analytically calculate the average duplex length, the average base-pairing fraction, and so on, and it is like the random RNAs that you get with RNAfold and Mfold. So the statistics are the same. What, unfortunately, is not provided analytically by the SFM is the one-third scaling. We get it numerically.
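The SFM folding rule can be sketched as follows. This is my own toy reconstruction of the rule as stated in the talk, not the published implementation: Watson–Crick pairs only, the first longest duplex found instead of a random tie-break, the two flanks of the outer loop simply concatenated, and a hypothetical minimum-loop cutoff of three bases.

```python
COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}
MIN_LOOP = 3   # hypothetical cutoff: smallest loop a helix may enclose

def longest_duplex(s):
    """Longest helix (i, j, k): pairs (i, j), (i+1, j-1), ... k rungs deep,
    leaving at least MIN_LOOP unpaired bases inside. Brute force, O(n^3)."""
    best = (0, 0, 0)
    n = len(s)
    for i in range(n):
        for j in range(n - 1, i + MIN_LOOP, -1):
            k = 0
            while i + k + MIN_LOOP + 1 <= j - k and COMP[s[i + k]] == s[j - k]:
                k += 1
            if k > best[2]:
                best = (i, j, k)
    return best

def sfm(s, duplexes=None):
    """Sequential folding model (sketch): form the longest duplex, which
    splits the loop in two; recurse on both until no duplex can form."""
    if duplexes is None:
        duplexes = []
    i, j, k = longest_duplex(s)
    if k == 0:
        return duplexes
    duplexes.append(k)
    sfm(s[i + k : j - k + 1], duplexes)   # inner loop
    sfm(s[:i] + s[j + 1:], duplexes)      # outer loop, flanks joined
    return duplexes
```

For `"GGGAAAACCC"` the longest duplex is the three GC rungs closing the AAAA loop, after which neither remaining loop can pair and the recursion stops.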
When Li Tai took the sequence and started folding it, using RNAfold and calculating Rg, we got the one-third. But from the approximate analytical formula we couldn't get it; it's something different. A mathematician in Jerusalem likes this problem — he's an expert on combinatorics and graph theory, and he's now very interested in trying to help with this game.

So the next question we ask: is it a randomly branched polymer? Shura, this is an elaboration of the argument from your book — well, not in this form in your book; I do it for laymen. It's a very nice idea; it's not mine. You take a randomly branched polymer, a Cayley tree in this case. We want to calculate, first of all, the contour distance; if we have the contour distance, then we take the square root and we get Rg. So to calculate the contour distance, what they suggested is: take a random point here and start moving, either away from the point — this is in red — or back toward the point, in blue. Altogether you have 2N steps if you have N bonds, and that's like a random walk of length 2N. You diffuse 2N steps, so the distance you reach goes like the square root: the contour distance is of order the square root of 2N. And then to get Rg you take another square root, and you get the one-fourth. It's a very easy way to get the one-fourth that Zimm and Stockmayer derived — I struggled with that one, and there are papers by de Gennes and many others. But I think I found a problem here. Well, it's so elegant, let's not look for problems. When you do scaling — I don't know, ask Shura, he's the polymer expert.

Look, now we wanted to understand the difference. We wanted to know what's going on with the degree distribution. Oh, I have 10 more minutes? That's plenty. The idea first came from Ajay, who visited me in Jerusalem, and he discovered something called the Prüfer method. I bet most of you don't know about the Prüfer method.
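Before the Prüfer story, the two scaling arguments just given can be put in formulas (this is simply the blackboard argument written out, with $b$ a bond length and $N$ the number of nucleotides or bonds):

```latex
% Random-sequence RNA: ladder distances scale as N^{2/3}; treating the
% longest ladder path as an ideal chain and taking the square root gives
\mathrm{MLD} \sim N^{2/3}, \qquad
R_g \sim b\,\mathrm{MLD}^{1/2} \sim N^{1/3}.
% Ideal randomly branched polymer: traversing the tree out and back from a
% random point takes 2N steps, so the typical contour distance is that of
% a 2N-step random walk, and another square root gives Rg:
\ell \sim (2N)^{1/2} \sim N^{1/2}, \qquad
R_g \sim b\,\ell^{1/2} \sim N^{1/4}.
```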
The Prüfer method is a mapping of a tree graph to a sequence, and vice versa — you can go both ways. Take a tree graph here. You label the vertices, and you start plucking leaves — you see, the leaves are green — one after the other, always the leaf with the smallest label. You could label them randomly, but what we did is give the smallest labels to the interior vertices, so the leaves all have the larger numbers. In this case we have 12 vertices. So you pluck number seven — here you pluck number seven, and you are left with this — and when you pluck seven, its neighbour is three, so you write down three. Now the smallest leaf is three; you pluck three, and what you get is five. And you continue, and you get the sequence. OK? Very good. You can also go backwards: you can take the sequence and reconstruct the tree.

But here is what is interesting. You see, none of the leaves appears in the sequence, because you pluck them one after the other. What is left? Only the skeletal vertices appear, and all of them appear. Moreover, a skeletal vertex like this one, number four, which has a functionality of four, will appear three times. Every vertex of degree D appears D minus one times, because D minus one of its neighbours have to be plucked before it becomes a leaf itself. So the sequence reflects the degree distribution: this four, which appears three times, is of degree four; vertex number one appears twice, so it's of degree three — where is number one? Here you see: one, two, three, et cetera. Altogether there are n minus two elements in the Prüfer sequence of a tree with n vertices, because you are left with the last two and you don't need to go any further.

Now, remember, the sequence is a reflection of the degree distribution. Suppose you permute the sequence — shuffle it. It's the same degree distribution. But if you go from here to here, you get the sequence.
You shuffle the sequence, you get another sequence; you go back to the graph, and it's a different graph. So you take the original tree, you map it to a sequence, you shuffle, and you get another tree with the same degree distribution. It's like cutting the tree up and gluing it back together differently. So we did that — let's not bother with how to go back from the sequence to the graph. Walter did it. And using the sequences — we also wrote a paper devoted to our friend Bill Gelbart, in his Festschrift, where we show this — you don't even have to generate the trees; you generate sequences, which is much easier. Easy not for me, but for Walter. And when you have the sequences, you can calculate leaf-to-leaf distances, you can calculate the graph diameter, which is the maximum distance between leaves, you can calculate the analog of the MLDs — you can calculate everything. And when you have all these distances, you can use Kramers' formula, for example; or you have the MLDs and you take the square root — whatever you want to do. And you do the scaling.

So look: you do the Prüfer shuffle, and you calculate Rg. We did it by several different methods — always the same Prüfer shuffle, but Rg calculated in different ways. The graph diameter — there is a built-in function in Mathematica, and it's equivalent to the graph diameter that we got from the sequences. You can also get the average leaf-to-leaf distance from the sequences. You see, you can go here to a very large number — the square root is about 20 — so these tree graphs are huge. It is slow, because it's not optimized, but you can go to huge trees. And you see that no matter what you do, once you shuffle — in this case, you shuffle something with the composition of RNA: you start with 26, 75, 14, that's the ratio of the vertex degrees in a random RNA, or maybe a viral RNA, it doesn't matter.
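The Prüfer encoding, decoding, and shuffle described here fit in a short script. A minimal sketch (names mine; the O(n²) smallest-leaf search is kept simple rather than optimized): permuting the sequence preserves how many times each label appears, hence the vertex degree distribution, while the decoded tree is generally a different one.

```python
from collections import Counter
import random

def prufer_encode(edges):
    """Prüfer sequence of a labelled tree: repeatedly pluck the
    lowest-labelled leaf and record its neighbour, until two remain."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seq = []
    for _ in range(len(adj) - 2):
        leaf = min(u for u, nb in adj.items() if len(nb) == 1)
        (neighbour,) = adj.pop(leaf)
        adj[neighbour].discard(leaf)
        seq.append(neighbour)
    return seq

def prufer_decode(seq, n):
    """Rebuild a tree on labels 0..n-1 from a Prüfer sequence:
    a label of degree d appears d-1 times in the sequence."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = (x for x in range(n) if degree[x] == 1)   # the last two vertices
    edges.append((u, w))
    return edges

def degree_distribution(edges):
    """{degree: number of vertices} for a tree given as an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())

# Prüfer shuffle: same degree distribution, generally a different tree.
tree = [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5), (5, 6)]
seq = prufer_encode(tree)
random.shuffle(seq)
shuffled_tree = prufer_decode(seq, 7)
```

Because encode/decode is a bijection between labelled trees and sequences, decoding the unshuffled sequence returns exactly the original tree, which makes the pair easy to test.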
And this one is another case, a randomly branched polymer where you start with equal numbers of hairpins, second-order, and third-order vertices. In both cases — and we have more examples — they all scale like n to the one-fourth. In other words, they all become randomly branched polymers after you do the shuffle. Yes? What? Rg? It is here — it is here. Is it in the square? Yes, it's hidden under this, trust me; I looked for it yesterday, too. No, no, you're in good company, I think so.

Anyway, let me — I'm almost done; almost at the 35 minutes. But I think it's a good point to summarize. What I showed you: we took viral RNAs; they were compact. We randomized the sequence; they became bigger, and they scale like n to the one-third. We want to understand why it is n to the one-third. Maybe it's not relevant for viruses, but it's a puzzle we want to understand, and so far we don't. Maybe my mathematician friend, maybe smarter people, can understand it. It has to do with the folding of the RNA; it's not a simple randomly branched polymer; it's something in the sequence.

Yes, Robin? No? Yes, on the last slide, after I shuffle — because Roya is getting an important call from somewhere here — no, it's just the time. Robin, can we wait a second? You will be the first to comment. Throw it away — so I get 20 seconds more.

Look, I remind you: the viral RNAs have small MLDs. You shuffle the nucleotide sequence and you get what we call random-sequence RNA, which scales as n to the one-third. You get the same if you do a synonymous mutation on the viral RNA — but that's not what we did here. Now you take the random-sequence RNA and you do the Prüfer shuffle, keeping the vertex degree distribution, and they become n to the one-fourth. They behave like randomly branched polymers.
In other words, the random-sequence RNAs are a subset of the ensemble of randomly branched polymers with the same vertex degree distribution. Why? I don't know. So the conclusion is simple — well, I told you there were more conclusions along the way, but the final one is that RNA is not a randomly branched polymer; it's different. And why, for random-sequence RNA, Rg scales like n to the one-third and not one-fourth — I don't know. It's a challenge. And I thank you for your attention.