This is the third seminar of the series that we are organizing with Misha Gromov and François Képès, called Fundamental Questions on the Amazing Logic of Molecular Biology. The seminars in this series are about very interesting topics in molecular biology, and they are all linked to a book that we are currently writing with Misha and François, a book about grand ideas and breakthrough ideas in molecular biology. Well, a book that we are planning to write. Today we are pleased to welcome François Képès. François is a research director at the iSSB, the Institute of Systems and Synthetic Biology, which he founded inside Genopole. He is a cell biologist studying and engineering genome architecture using synthetic, systems and molecular approaches, and more recently he founded his own company, Synovance, if you want to say a word about it.

Thank you, thank you very much for the introduction. No, I don't want to say anything. I will not read you the patent that is pending.

Okay, so as you understand from this introduction, the seminar aims at showing how ideas came about in molecular biology, and I have really tried my best to go in this direction. It's not easy, but in at least one case, on which I have been working since 2001, I can testify for myself and try to be honest about how the ideas came about. That's what I'll try to recount here. And because you are scientists, to avoid frustration at the end, I'll also tell you where this idea and project stand now, even though the main purpose is to indicate how it came about. I'll just make a sort of final retrospective of where we stand with respect to this project. So in the first part I'll really try to explain what the influences were, who brought something to the table that I used, then how I used it, and then how I tried to handle the issue.
Okay, so in this first part I'd just like to express one element that was pretty obvious long ago but has been increasingly documented. People often look at DNA as an inert piece of information that can be read by interpreting the sequence of As, Cs, Gs and Ts along the DNA strands. This is really, I'd say, the minimal information we can draw from DNA. DNA encodes, and of this we are well aware, the sequence of the macromolecules: RNA, protein and so on. But it also encodes the level of their expression, and it encodes more surprising things, such as how fast the protein (the protein, not the RNA) should be made at the beginning, a little later, and later still along the sequence of the protein. This is encoded in the DNA as well. So here you see that we are already far away from the mere sequence of nucleotides or amino acids in the RNA and the protein, the first item here.

Yet this still forgets a little piece of information, which is that DNA is actually chemical and physical. It has a physical chemistry. Why would you want biology, or living organisms, not to use this feature? Well, actually it's not our choice. We are the outcome of this choice; evolution made it. Indeed, DNA also carries information at the level of its physico-chemical properties, including the folding. And it's on that side that I'll dwell today, which is underlined here: folding versus expression.

So now to the influences. The first influence for the story I'll recount to you is about the local organization of gene transcription. Benno Müller-Hill, now retired, was at the time a professor at the University of Cologne in Germany. He was also the writer of the first books and the first inquiries about medicine in the Nazi period. He investigated this himself and wrote books about it; I'm not sure you know that. Benno Müller-Hill was the first to show this little twist to the story of Jacob and Monod, published in 1961, which is this: here is the lac operator, that's the DNA here.
That is where the regulator protein, the Lac repressor in that case, will bind and influence the level of expression, that is, the rate of transcription initiation of the downstream gene or operon, the lactose transcription unit. So that's the 1961 story. Müller-Hill brought an important twist to it by showing that, even though in the traditional scheme there was one binding site for the regulator, there was actually more than one, at least two, and that changes everything. It allows for what biologists call cooperativity, or what physicists would call a sigmoidal behavior. When you look at the effect versus the dose, you have an S-shaped curve instead of the simple curve of the one-binding-site configuration. We can reason with two sites even though there may be three or four; the result is the same. We do have cooperativity, and this he showed very, very well.

So this cartoon here shows an interpretation of it. First of all, there is more than one binding site on the DNA for this protein, which represses the expression of the genes. Second, the protein has two hands to grab DNA. It's divalent. Again, this is essential; otherwise how would you have any cooperativity? Whether it is a dimer, a tetramer or an octamer, what is important is that it is divalent. And so on this cartoon what you see is the protein bound to two operators, that is, two binding sites, inducing a loop in between. And he showed that the physiological repression is 1,000-fold, that is, when the protein is bound, the level of expression is 1,000 times lower than when it is not bound. If you touch any of the features that make it divalent, you lose this: you go from 1,000-fold repression to 15-fold repression. So you lose a factor of about 70 in the control of expression.

Why should it be squared? It should be that big a difference, right? There are two factors, so it's squared, it's not just... right. Each of them is 15, and 15 by 15 is more like about 200. Why 200?
Well, this will multiply by 5, or one factor by another factor, so it kind of multiplies it. Oh. Why have more? No, but it's the same, it's one protein. It's really one protein complex only. It's not more, it's not several proteins assembled. No, there are two factors in the same behavior, right? And usually in fact it kind of multiplies, no more than that. Oh, so you mean it should be 15 times 15? Yes, in terms of probability. But the 15-fold is really without any effect; there is no loop anymore, right? And perhaps we should also take into account, and you're right on the numerics, that there are actually more than two binding sites in this specific case. There are four, but the two others are weaker. That is possible. The exact numbers would indicate there must be another one.

So anything he could do works, like for instance cutting this in the middle, making it monovalent, or making the intervening loop here very, very long. But even something more subtle works, really subtle. The physiological size of this loop is 93 base pairs, that's here. And you see the fold repression. Don't look at the scale here, it's in arbitrary units, but it was 1,000-fold up there. But if you remove or add two nucleotides, just two nucleotides, then you are down to 15, that's here. Just look at this point, take two points backward or two points forward, and you are down. So it's extremely sensitive to removing or adding two to eight nucleotides at the different positions.

Yeah, it's a matter of the respective positioning of the binding sites, because the double helix has a pitch of more or less 10 nucleotides. Yeah, that's what you see here. If you remove 10, you're back up; 20, 30, you're back up. I mean, it's a little bit more than 30. And if you go forward, same thing. Actually can be used as quality. Here with this period of 10 you can add again, by the way. Yeah. I mean, is it the only system that is...
Yeah, you can, yeah. And I said that there was a third binding site in reality here. It's not shown, but it's at 401 base pairs, and it's the same thing. So here, the loop I drew with the main binding sites is 93 base pairs, but there is yet another operator here at 401 base pairs, and again that's like 39 turns or something. So that gave me a lot of inspiration. You'll see why in a little while.

Besides, on some of these slides, it's difficult to read, there is Vilar and Leibler 2003. Actually, Misha, if you remember, we had organized with Alessandra and Paul Bourgine a meeting in 2002 in Évry. It was really the first meeting on systems biology in Europe, and we invited Müller-Hill and Leibler, and that's how they met. And then a paper came out in 2003, the next year. Another thing is that Leibler is no longer working on it, but Vilar is still working on this story. So that's one thing.

As a case it's like this. It's wonderful. Some examples? Yes, there are many other examples. Yes. But it was the first one. In eukaryotes the situation is different. The distances are much longer, and you have this notion of enhancers, long-distance enhancers and so on. It's slightly different. The physico-chemical principle must be the same, but the details are different. No these ideas may have been very long class with a particular protein which can form the ring like that. So these are extra mechanisms. Yes. But you mentioned enhancers. It's another case. What about the repressor? Here actually it's a repressor. It works both ways. It's about the strength of the control, not about being an activator or a repressor. It's about the strength of the control, how much you can bind and stay.

There are different ways to tell the story afterwards, at the level of interpretation. For instance, Müller-Hill would tell the story like this. He would say it's a matter of local concentration effect.
So if you have this loop of 93 base pairs, you have to imagine that of course you have an on-rate, but you also have an off-rate. The protein is there. These are my two feet now, and that's the DNA here. Sorry, I'm not sure you can see my feet, but anyway. So you have the two feet on the DNA. Now there is an off-rate. At some moment one of the binding sites may become unbound: I lift my foot. So now, how much do I have to search to find it again? Well, if I'm still bound with the other foot, I will search a small volume. If I'm fully unbound, then I have to search the volume of the cell. That's how we explain this, because we know that it goes on and off.

So if you have only one foot on one site, then once you get off you have to explore the whole volume of the cell, 10 to the minus 12 cubic centimeters. But if you are still bound by one foot, then you have to explore a volume that is 10 to the minus 4 times that, a much smaller volume. If the loop is 401 base pairs, then it's 10 to the minus 14 cubic centimeters. But if you increase it to 1,000 base pairs, as he did in some of his experiments, then it's 10 to the minus 13 cubic centimeters. That is not a big advantage, but with the short loop it is a big advantage.

In the first approximation, the probability of getting both feet off at the same time is the square of the probability of getting one foot off. And so in practice this will not happen. It does happen that one foot gets off, but it doesn't happen that both feet get off, I mean, reasonably, just because it's the square of the probability of the first event. So it's very improbable. We are talking about specific binding here. The protein will stay on average a minute, an hour. So you see the probabilities are very low, and getting both feet off is really not probable.

So the first piece of reasoning that I made at the time was: let's go from intragenic to intergenic. Here we were talking about the two binding sites of one gene, the lactose gene. How about having longer loops and having binding sites in different genes?
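The local-concentration argument above can be put into numbers with a minimal sketch. It uses only the volumes quoted in the talk (whole cell, 401-base-pair loop, 1,000-base-pair loop); the function names are illustrative, and treating the two feet as independent is the speaker's own first approximation rather than a measured property.

```python
AVOGADRO = 6.022e23  # molecules per mole

def molar_concentration(n_molecules: float, volume_cm3: float) -> float:
    """Concentration (mol/L) of n_molecules confined to volume_cm3."""
    litres = volume_cm3 * 1e-3  # 1 cm^3 = 1 mL = 1e-3 L
    return n_molecules / (AVOGADRO * litres)

def both_feet_off(p_one_foot_off: float) -> float:
    """Probability that both binding sites are free at once,
    assuming (first approximation) the two feet unbind independently."""
    return p_one_foot_off ** 2

# Search volumes as quoted in the talk (cubic centimeters).
CELL_VOLUME_CM3 = 1e-12          # protein fully unbound: whole cell
LOOP_401_BP_VOLUME_CM3 = 1e-14   # still held by one foot, 401 bp tether
LOOP_1000_BP_VOLUME_CM3 = 1e-13  # still held by one foot, 1,000 bp tether

for label, volume in [
    ("whole cell", CELL_VOLUME_CM3),
    ("1,000 bp loop", LOOP_1000_BP_VOLUME_CM3),
    ("401 bp loop", LOOP_401_BP_VOLUME_CM3),
]:
    conc = molar_concentration(1, volume)
    gain = CELL_VOLUME_CM3 / volume
    print(f"{label:>14}: {conc:.2e} M for one molecule, {gain:.0f}x the whole-cell value")

# Hypothetical illustration: if one foot is off 1% of the time,
# both feet are off together about 0.01% of the time.
print(both_feet_off(0.01))
```

The smaller the volume left to search, the higher the effective concentration of the free foot around its site, which is the sense in which a short loop is a big advantage and a 1,000-base-pair loop a modest one.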
So this is the binding protein, the regulator protein, which is repressing or activating. It's divalent, which is the case for almost all these regulators. And now, how about bridging two sites that are distant, that are not in the same gene? And how about going from intra- to interchromosomal, with these proteins that were bound to one chromosome by one site now binding another chromosome? Well, why not. But besides some physical difficulties, when the distances become large, in getting these two sites to the same place to bind the protein, there are other difficulties. Notably, I asked myself the question: how can the cells do this, knowing that there are hundreds of sites to accommodate for a given regulator, and knowing that there are hundreds of different regulators, each claiming its own importance, relevance and activity? It was difficult for me to imagine. But I'll come back to that.

Another source of inspiration, which was distinct, was the case of the nucleolus in higher, sorry, in eukaryotes. So you have this nucleus in the cell. Prominent morphologists at the end of the 19th century, using the optical microscope, were able to show that there was a little organelle within the nucleus that they called the nucleolus. It was visible with the optical microscope. No trick, almost no trick. Then it turned out much more recently, and one person who did important work on that is Danièle Hernandez-Verdun at Jussieu, so I mention her paper, but many people have worked on the nucleolus, that the nucleolus is actually not bounded by a membrane or an envelope. To summarize, it just exists out of its activity. It's the place where there is a lot of activity of making ribosomes, which are one of the major elements in the cell. So making ribosomes is massive in terms of what the cell has to achieve. Another way to see the importance of the ribosome is to notice that it's actually the point of self-catalysis.
Because it's made of proteins and RNA, but it is made of proteins and it's the thing that makes proteins. So it's the point of self-catalysis. In a rapidly growing bacterial cell or yeast cell, the ribosomes are something like a quarter of the dry weight. Okay, so I'm not yet at the picture, I'm coming to it. This is just to have you feel how active this little place is, where ribosomes are made and assembled within the nucleus.

Now these are just fluorescent images of the nucleolus in activity. I don't remember what was labeled there; I forgot in the meantime. But you can tell these are images of the nucleolus within cells, within the nuclei. So I think it is significant that this has no lipid membrane or envelope or anything around it, yet was considered an organelle. It's just made out of its own activity, the activity of polymerizing RNA and assembling proteins with the RNA.

It's still chemical. It's mostly proteins. Ribosomes, RNA and proteins. We already have half-made ribosomes and proteins which make this many ribosomes in the cell. These are just proteins made by a ribosome in the nucleus, but normally that doesn't happen in the nucleus.

Exactly, exactly. No, you're right. So, to tell the story more completely: it's the place where the ribosomal RNAs are made. The messenger RNAs that encode the ribosomal proteins are of course exported out of the nucleus and translated into protein, and the proteins are imported back into the nucleolus, and there they are assembled with the RNA that was made, and so on. So it's a huge factory, really. It's very impressive in terms of activity. And this activity without a membrane envelope, which self-assembles as needed for the work to be done, was enough to be seen as an organelle by the morphologists of the 19th century. A little bead within the bead of the nucleus.
Okay, so just keep in mind that the activity of making RNA and so on can be focused in structures that are very active, such as the nucleolus. So it's another piece of information that I thought was useful, in the sense that at the time it was not so clear that there was an equivalent for the messenger RNA. Some morphological work came later, but at the time it was not clear. So it seemed to be the case only for the very active assembly of ribosomal RNA.

So what happened then is that in August 2001 I was in Normandy by the beach, on holiday. And because the weather was nice, I decided that I would do like everybody else and go to the beach. So I took my little towel and I went to the beach with an umbrella. It was the 5th of August, I think; I can't remember for sure. But no more than 10 minutes later, I think, I thought I had something to check. I thought that we had a problem. I had in mind Müller-Hill and the DNA loops and their importance. I had in mind the nucleolus for the ribosomal RNA, which I thought we could perhaps use as a model for smaller things that would be factories for the messenger RNAs of anything. And I had in mind discussions with Victor Norris from the University of Rouen about hyperstructures as well. But, as I showed with my two little chromosomes, I couldn't see how the cell could do it. I had a little problem with exactly this: how can the cell, this little bacterial cell with 300 different transcription factors, regulators, each dealing with between one and 600 targets, cope without some minimal order in the story?

So I thought it shouldn't work. Of course it was totally qualitative reasoning, without quantitative modeling, which no one would have been able to do at the time anyway. And even now I think it's very tough. I thought it could not work. And of course I know that bacteria are around and have been around for over three billion years.
So I thought there must be a principle of organization that we had not looked at so far and that should help. And the idea I had after these 10 minutes on the beach was that it has to do with loops, not loops of the size I showed, but larger loops that would allow for some phasing of any chromosome and would allow the cell to aggregate things that belong together, a bit like the nucleolus, into transcription factories: hyperstructures, but in the case of transcription, so transcription factories.

So I went back to the flat immediately. I had my computer, and I had some Excel sheets with the data, later published in 2002, about the yeast transcriptional interactions. It's a simple chart, for yeast in that case, which says that for transcription factor one, this is the list of targets, targets A, B, C and so on, and likewise for transcription factor two. We had assembled that, and we published it in Nature Genetics in April 2002. That's what I had on the Excel sheet, but I also had the sequence, so I knew where the genes were.

Affinity, did you have that at the time, did you see the affinity? Sorry? Affinity. Forget it, there is none. Yeah, exactly. There is no weight on that. So, position on the chromosomes: we had the positions from the sequence of 1996. Yeah. So it was on an Excel sheet, and was it experimental, an experiment? Yes, this was my experiment. No, no, it was based on experiments, yes.

So this was for, at the time, I think 54 of the 300 transcription factors, and the lists were not complete but were already long. And so I used my Excel sheet, and by midnight I had the first result. I have to say that it took months and years to really work this out properly, and I'll say a few words about that later. But yeah. So, to find a pattern there? I think the point is to find a pattern. The logic. Yeah.
To find a pattern, exactly. And there was a pattern. My tools were rudimentary, the data were still not complete, and there were lots of problems. But it already seemed encouraging by midnight the same day. So here is a sort of implicit lesson for the youngest of you in the room: it's actually important to get bored sometimes.

So you expected to have something like the nucleolus? The nucleolus, yeah. It's the same localization. Yeah. Colocalize all this. Yeah. It's a pattern that would simplify the work of the cell, a pattern on the DNA, on the chromosome sequence, that would simplify the work of the cell in organizing itself internally for the purpose of doing all these things, with 300 transcription factors each controlling one to 600 targets, all at the same time. Colocalize. Right. Localize. Yes. But how do you do that when you have this many transcription factors and this many targets, a total of 6,000? Yeah. No, I'll tell you the pattern that I found. Sure. It's actually simple in a way. When you get to the details it's complicated, but it can be captured simply.

So then I had this view that if the chromosome was organized in such a way that genes belonging together, let's call them co-regulated genes or co-functional genes, could be together in space, in the space of the cell, then I would get the best from the world of Benno Müller-Hill and the best from the world of the nucleolus. And here is a second factor, with its targets as well. And there are more, of course. So that was more or less how it boiled down to this view. And in a very simple-minded way, the pattern I was looking for, that I imagined for a while, was of this type. Imagine a solenoid. The DNA is coiled. Imagine you have periodic positioning of the genes, of the green genes that belong together and of the blue genes that belong together.
And then you get some sort of a solenoid where, if you now compress it in your mind, if you imagine that it is compressed, you do have local concentration effects for the blue and local concentration effects for the green, but you don't have crosstalk between the two systems, because they are actually far apart on the solenoid, because they have a different phase, right? And this is important as well. You don't want to have too many green guys in the area of the blue binding sites on the DNA, because they will bind. The on-rate is the same whatever the factor and the binding site. It's the off-rate which is very different. So because the on-rate is the same whatever segment of DNA we are talking about, crosstalk is a big problem, and the cell would probably avoid crosstalk between things that should not talk. One way would be to use the phase. And that predicted that if I look at...

Phase meaning what? It is the phase of what? No, phase is nothing specific. It's just a different place on the solenoid, in the phase view. That's all. Do you know what that is saying for yeast, how organized the DNA is? Because chromatin organization, like for all... Yes, yes, yes. But I'm not addressing this issue, because I don't really know how this could fit with the chromatin organization. There is no contradiction, but I don't know how to draw that.

What you say is verifiable, right? Yeah, yeah. So, yes, sure. It's simple. For example, if there is some effect, that things are brought close together in space, right? Yes. At this stage, yes. And this is, of course, verifiable. Experimentally, you can check. Oh, sure. It's a very interesting experiment to make. Yeah. Sure. So the situation now, today...
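The "phase" on the solenoid discussed above can be illustrated with a minimal sketch: a gene's phase is its chromosomal position modulo the period, expressed as a fraction of a turn. The period, the gene positions and the function names below are hypothetical, chosen only to show two gene families that are each tightly phased but sit far from each other.

```python
def phase(position_bp: int, period_bp: int) -> float:
    """Fraction of a turn (0 <= phase < 1) at which a position sits on the solenoid."""
    return (position_bp % period_bp) / period_bp

def circular_distance(phase_a: float, phase_b: float) -> float:
    """Shortest distance between two phases on the circle (0 to 0.5)."""
    d = abs(phase_a - phase_b) % 1.0
    return min(d, 1.0 - d)

# Hypothetical example: a 10,000 bp period and two families of co-regulated genes.
PERIOD_BP = 10_000
green_genes = [1_000, 11_000, 21_050]  # targets of one regulator
blue_genes = [6_000, 16_000, 26_100]   # targets of another regulator

green_phases = [phase(p, PERIOD_BP) for p in green_genes]
blue_phases = [phase(p, PERIOD_BP) for p in blue_genes]

print("green phases:", green_phases)
print("blue phases: ", blue_phases)
# Genes of the same color share a phase (they would stack up when the
# solenoid is compressed); the two colors are about half a turn apart.
print("green vs blue:", circular_distance(green_phases[0], blue_phases[0]))
```

In this picture, local concentration comes from genes of one family sharing a phase, and the absence of crosstalk comes from different families having well-separated phases.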
So it all started in 2001. The first papers about that were published in 2003. Currently there are about 26 papers on this topic, mostly, well, all from the team, plus maybe two papers from outside the team. So that's the situation.

So what do we know, in brief? This was just to present the idea. It's very naive, very simplistic. The patterns are actually quite complicated, but the notion of periodicity and proximity holds, even though we sometimes have several periods that overlap and we sometimes have regions that do not fit. We do have transcription factors that agree on the same period. We do have transcription factors that have different periods that are not reducible to the first one, and so on. So it's actually a bit more complicated than what this view shows. What happened is that we had more and more data, with the new transcriptomics methods of the time, 2001, 2002, with so-called chromatin immunoprecipitation and so on. So that's the team. I should also say, besides the papers, that a couple of European grants were awarded for this work, and so on, up to now.

Okay, so that's the current view. The current view is: do not forget this. Do not forget the physico-chemical aspect of DNA. The cells have not forgotten it; they take advantage of it. So it is a good idea to consider these three elements at once, rather than by pairs as people usually do. All three of them at once. So what are they? Genome layout simply means the respective positions of co-functional genes, genes that belong together, maybe co-regulated, maybe encoding proteins of the same complex, or other. So, the respective positioning of co-functional genes. This is the expression of those genes. And finally this is the conformation of the chromosome, the physical folding of the chromosome. And the idea is that it becomes important.
So what we have to demonstrate, and what was in part demonstrated in the meantime, is first that the genome layout changes the folding. We showed that with a biophysical approach; I'll just show two slides about that later. Then, that the chromosome conformation allows factories to be made for messenger RNAs, in addition to the nucleolus; you'll see the data. That this changes the expression is more or less the work of, well, first it is the mass action law, of course, for all the chemists, and in the case of transcription it was shown by Müller-Hill and later by others.

And finally there is another issue. Why do we see patterns in current genomes, patterns that extend over the whole chromosome? This cannot be a random effect, right? There must be some selective pressure to explain why these improvements to the control of gene expression lead to such a strong patterning of the genome. On that side, little was done. We did publish a paper in 2008 with an evolutionary model which seems to say that yes, indeed, such effects will regularize the patterns of the genome layout. But I mean, it's just a little evolutionary model.

Yes? Did you try to look at the correlation between the expression level of the transcription factor and this kind of localization? If the expression level is high, there is enough of it to manage the different targets. If it's low, then they should be closer to one another. Did you try to study such relations?

Okay. So you have to remember that we do have these kinds of relations, but we generally do not have weights. We don't know how strong the binding is, and so on.
But there is one element of information that goes in your direction, which I didn't want to mention because it complicates the story a bit. In bacteria, not in yeast and not in other eukaryotes with a nucleus, but in bacteria with no nucleus, there is a mechanistic coupling between transcription, translation, and even the insertion of membrane proteins into the membrane. This has been known for a long while. It means that before the messenger RNA is finished, it is already being translated into the beginning of a protein. And before the protein is finished, it is already being inserted into the membrane, if it is a membrane protein. And so, to get to your answer in an indirect way, because we don't have...

So before the messenger is finished, the protein has already started to be made?

Yes. So because of this mechanistic coupling, I asked the question: will the gene, not the protein now but the gene that encodes this transcription factor, be positioned in a neutral way, or be positioned like its targets, in register with its own targets? And the answer is that in yeast, in eukaryotes, it's positioned randomly. In bacteria, the gene encoding the transcription factor is in register with its targets, very clearly. Sometimes it may be because this gene is actually under the influence of itself, of its own protein product. At other times this is not the case, and it's still in register.

So think of it in kinetic terms, and again, sorry for being qualitative, but this is sometimes how you actually understand things at the beginning. So this gene encoding TF1 is there. Imagine there is a stress to the cell, for instance oxidative stress. You have a little wound, you use H2O2 to disinfect it, and immediately those bacteria will be under heavy stress. Now it's a matter of seconds for them to react before they are killed by the H2O2. So imagine that this is the TF, the regulator for the response to oxidative stress.
The gene is immediately induced. Whatever the reason, and I can come back to that, but it has nothing to do with my story, it is induced. So it takes a few seconds to make the messenger RNA, then a few seconds to make the protein. And then here is the protein. And remember my mechanistic coupling? In bacteria the protein is made close to the gene that encodes it, and this was shown long ago, in 1970, and again in 2010. Close to it, because of the mechanistic coupling: the protein is being made while the RNA is still attached to the DNA. This has been well shown.

So now the protein is close to its gene. If there is nothing special about the organization, it will sample the DNA of the cell. It has been calculated that, given the on-rate, which again is not specific, it will sample about 1,000 sites before it finds a specific one, that is, a target, where it will stay. 1,000 wrong sites at, let's say, a tenth of a second each: that's 100 seconds. It's too late. Now, if it's made close to its gene, and its gene is in register with, close in space to, the targets, we don't know how much it will sample, but less than 1,000. It's already close to the targets. It will sample, but it may find the right targets after 10 attempts and not 1,000. I don't know, maybe 10, maybe 100. You save 10-fold or you save 100-fold. I don't know, but you save. Qualitative reasoning, sorry. You save something. You save time. So that might be a reason for these genes to be in register with their targets, even when they are not under their own regulation. Anyway, it's an observation that I made in 2003, that they are in register with their targets, in bacteria, not in eukaryotes. So as you see, it's an indirect response. I cannot say more. Sorry.

So it is in bacteria, but not in eukaryotes? In yeast, we do have this phenomenon with the patterns here, but the gene encoding the factor is not in a special position with respect to the targets. It's not.
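The back-of-the-envelope search-time argument above can be written out as a minimal sketch. The figures of 1,000 nonspecific sites and a tenth of a second per site are the speaker's; the reduced counts of 100 and 10 are his "maybe" values, not measurements, and the function name is illustrative.

```python
def search_time_s(n_wrong_sites: int, dwell_s: float = 0.1) -> float:
    """Time spent on nonspecific sites before reaching a target,
    assuming a fixed dwell time per wrong site (speaker's rough figure: 0.1 s)."""
    return n_wrong_sites * dwell_s

# Unorganized chromosome: about 1,000 wrong sites sampled (figure quoted in the talk).
print("no organization:", search_time_s(1_000), "s")

# Gene in register with its targets: fewer wrong sites, by an unknown factor.
# 100 and 10 are the speaker's hypothetical values.
for n_sites in (100, 10):
    print(f"{n_sites} wrong sites:", search_time_s(n_sites), "s")
```

With these rough numbers, the response goes from about 100 seconds to somewhere between about 10 seconds and 1 second, which is the kind of saving that could matter when the stress kills within seconds.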
Because you are talking about oxidative damage, right? Yeah. So usually, you have the loops, right? So you have a chance, right? And so whereas this is a damage, they make damage that can occur more easily. Because it's a... Because it's a charm. You could argue that the proteins are protecting the DNA as well. Right, right. So probably there we should check the repair factors, the DNA repair factors. They should be localized, whereas... Because, understand, one thing is this charm. It can be broken easily.

So for the oxidative damage, the targets would be many things. One would be something to destroy the H2O2. Another one would be to repair the DNA. Another one would be to prevent the lipids from getting oxidized. You have lots of things, but they all fit in this scheme, actually.

So now, briefly, just to give you a little update. Okay, so I explained this notion. On top of the Jacob and Monod view from 1961, which is a view from the gene (I'm a gene, I see the transcription factor come, bind to me, and influence my rate of transcription initiation), there is this Müller-Hill view, with short loops internal to the same gene. Now, with the notion that we sometimes have factors, regulators, that agree on a certain pattern, that have the same pattern, we think that there may be a view from the cell on top of the others: a view where you see, from the point of view of the cell, that there is an overall organization of all the transcriptional activities in a live cell. That's the idea, which has not been fully proven, but we do know a couple of things.

So what do we know, actually? We know from many morphologists about the transcription of messenger RNA, not ribosomal RNA now, which is in the nucleolus. Actually, you see the nucleolus here. Green is the activity of transcription; red is just the background. So the nucleolus is here in this human cell nucleus. But you do see 10,000 dots.
I mean, don't count them. I invited Peter Cook one time some years ago to talk about that. 1,000 spots of activity, transcription factories that are for the messenger RNA, not the ribosomal RNA, which is in the nucleoli. But you do also see patterns and spots and dots in bacteria, as shown here. That's one aspect. Another aspect is this Müller-Hill story, which I explained. It is the exquisite sensitivity of the transcription control to tiny changes in the loop sizes. And third is what I explained. I just looked at transcriptional interactions and, given one transcription factor, looked at the positions of the targets, based on the sequence, and then did some simple math. And less simple math: in 2010 we elaborated a new tool to deal with biological data, because you'd say, okay, you're telling me that there is some periodicity. Okay, so let me use Fourier analysis or wavelet analysis or something derived from Fourier. It typically doesn't work on biological data, as we know. We devised a measure which is based on information theory, which is rather simple, but has the advantage that it gives a bonus not only when we have several genes coinciding on the solenoidal phase view, like here, but also when there's an exceptional void or an exceptionally high density of genes; these will get some bonus in the scoring system. And this has been published. So I am just illustrating the path that we took and don't want to get into the data. It works very well, and much better than Fourier on real biological data. It gives peaks at given periods like 9,510 nucleotides and also double, triple, quadruple. So the topic can be filled in and on linear. So Fourier analysis doesn't work. It's not energy, but it's still an instrument. It's a communicator. Yes, so I wanted to skip that. But see here, we took an example where there are two periods. And for each period, we have several points. And then Fourier simply crashes totally.
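To make the idea of an information-based periodicity score concrete, here is a minimal sketch. It is not the published measure, whose details are not given in the talk. As a stand-in assumption, it folds gene positions onto a candidate period, bins the resulting phases, and scores the histogram by its Kullback-Leibler divergence from a uniform distribution, so that both crowded and empty phase bins raise the score. The function name `phase_score`, the bin count, and the synthetic positions are all illustrative.

```python
import math


def phase_score(positions, period, n_bins=20):
    """Divergence (in bits) of the phase histogram from uniform.

    positions: genomic coordinates of the target genes (base pairs)
    period:    candidate period to test (base pairs)
    Returns 0 for perfectly uniform phases, log2(n_bins) when every
    position falls into a single phase bin.
    """
    counts = [0] * n_bins
    for x in positions:
        phase = (x % period) / period          # phase in [0, 1)
        counts[min(int(phase * n_bins), n_bins - 1)] += 1
    n = len(positions)
    score = 0.0
    for c in counts:
        if c:                                   # empty bins contribute 0
            p = c / n
            score += p * math.log2(p * n_bins)  # p * log2(p / uniform)
    return score


# Synthetic example: 30 positions spaced 9,510 bp apart, with a
# deterministic jitter of at most 200 bp (made-up data, not E. coli).
positions = [3000 + k * 9510 + ((k * 37) % 401) - 200 for k in range(30)]

for period in (7000, 9510, 12000):
    print(period, round(phase_score(positions, period), 2))
```

On such data the score at the planted period is high and the score at an unrelated period stays low; a real analysis would also need a significance estimate, which this sketch leaves out.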
This is minus noise. This is plus noise. Our method works. Yeah, sure. So this 10,000 base pairs is reminding us of the so-called microdomains of the bacterial chromosomes, even though we didn't show that it has anything to do with them, but it's the same size. Again, besides the microdomains, there are macrodomains in bacterial chromosomes, with the origin of replication, the terminus of replication, and areas that do not interact much with the others: they have lots of interactions internally and not much externally. And actually the patterns that we see fit. So for instance, the borders here, which were determined by biochemistry in 2004, that's the paper shown here, are found by the fact that we have this period of 9,510, centered on the origin of replication and taking up half of the chromosome. And then we have another period, 7 kilobase pairs, which is significant and which has these two boundaries. And actually, so it was fitting the macrodomains except here. So we talked to the authors of the paper and actually they said, no, since 2004, we changed our evaluation. We refined it and actually it's exactly there. So you see that the macrodomains and the microdomains might have to do with this. Here again, it's just for illustration, to avoid some frustration. This is proximity on the chromosome. The genes are next to each other. And this is periodicity. That's when genes are periodically disposed, and it can be interpreted like this. So without going into the details, looking at all enterobacteria including Escherichia coli, the proximity phenomenon is lost, or is no better, above about 20 kilobase pairs of distance. That is, you know, you are the neighbor of you and you are the neighbor of Fer and so on. But at some point, I mean, are you a neighbor of Nazim down there? So the answer is here. The answer is, beyond 20 genes in a row, it's not really proximal any more.
Just to the contrary, starting with 20 genes, we have periodicity in the groups of genes that are bigger than 20 genes in the group. And all together, and that's more important perhaps. So we have size effects, right? More important is that, if we add up all periodic genes for Escherichia coli, for instance, we get 500 genes, that's 12% of the protein-encoding genes. They function in the synthesis of DNA, RNA and proteins. That's one aspect, the functional aspect. If you look at the transcriptional interaction map, they are in the core, they are the most connected. But, very important for the biologists, we looked at all the sequenced genomes of bacteria. We took these 500 genes, we looked at the best homologues in all the 800 sequenced genomes of bacteria, and we found that the orthologues were periodic as well, in all bacterial phyla. So it's a ubiquitous phenomenon, and I will not show it in detail. What is the phenomenon again? The phenomenon is the same. Just take the collection of periodic genes in Escherichia coli: if you look at the homologues, they are periodic as well in all other phyla. I think it's homologues in particular, they are periodic here and periodic there. Exactly. Orthologues is a way to say, in brief, homologues, and in principle the real homologues, based on some bioinformatics criteria. Okay, so it's true in all bacteria. It's true in one archaebacterium out of the one we looked at, and it's true in one eukaryote out of one, which was this, because it was the first one we actually tried. That's all we know. We don't know more. So far. Is it the same period? Sorry? Is it the same period, the same rank? No. No. Not the same periods. It's always in the same area, but it's not at all the same period. It's just in the portrait. Yeah.
Yeah, in the end what's important is the principle, and then you can realize it with different periods, which are in some sense arbitrary, and if you look at cousin bacteria, you'll find something similar, but if you look further away, it's going to be not similar, right? It's an arbitrary question, I mean, this question. Okay, so what time... we had one hour? One hour, so I'll go a little faster. I'll skip this and just say a little word about the biophysics. Okay, so it's a Monte Carlo Metropolis model of a polymer, where we have the notion of energy for the binding of the protein onto the DNA. The protein can grab the DNA with more than one hand. That's very important. And we also have a certain flexibility of the DNA. We took the physiological parameters. We have the notion of flexibility, of the persistence length of the polymer. Given this one model, one of the key results we had in 2010 was that if you take a strand of DNA and you put sites of four kinds, let's say red, green, yellow, blue, red, green, yellow, blue, red, green, yellow, blue, in a sort of regular pattern, with the equivalent of four transcription factors, four regulators, four colors, then you get something which certainly is not the solenoid, this is polymer physics, but which is going through these colors one after the other in a regular way. Now, this is not shown here, but if your DNA is long, if your polymer is long enough, then you get anomalies. It doesn't go circular like that. But with a short DNA, you get something rather solenoidal. Okay, that's one aspect. Now, if you randomize the site positions of these different colors, this is an example, you get two major differences. One is that some genes, some dots, will not be accommodated. They will not be in groups with the others. So you would predict a weaker transcriptional control. Second, all colors merge in the middle. So you would predict a high crosstalk between things that perhaps should not talk.
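For readers unfamiliar with the Metropolis scheme named here, the following is a minimal sketch of a Metropolis Monte Carlo run on a bead chain with colored sites. It is not the model used in the work described. Everything in it is an assumption made for illustration: a 2D chain, harmonic bonds, a simple bending penalty standing in for persistence length, a fixed energy bonus when two same-colored sites come within a cutoff (a crude stand-in for a protein holding two sites), and all parameter values.

```python
import math
import random


def chain_energy(beads, colors, k_bond=50.0, k_bend=2.0, eps=1.0, r_cut=1.5):
    """Total energy: bond stretching + bending + same-color site attraction."""
    e = 0.0
    n = len(beads)
    for i in range(n - 1):                       # bonds of rest length 1
        e += k_bond * (math.dist(beads[i], beads[i + 1]) - 1.0) ** 2
    for i in range(1, n - 1):                    # stiffness: 1 - cos(turn angle)
        ax, ay = beads[i][0] - beads[i - 1][0], beads[i][1] - beads[i - 1][1]
        bx, by = beads[i + 1][0] - beads[i][0], beads[i + 1][1] - beads[i][1]
        na, nb = math.hypot(ax, ay), math.hypot(bx, by)
        if na > 0 and nb > 0:
            e += k_bend * (1.0 - (ax * bx + ay * by) / (na * nb))
    sites = [i for i, c in enumerate(colors) if c is not None]
    for a in range(len(sites)):                  # bonus for same-color contacts
        for b in range(a + 1, len(sites)):
            i, j = sites[a], sites[b]
            if colors[i] == colors[j] and math.dist(beads[i], beads[j]) < r_cut:
                e -= eps
    return e


def metropolis_accept(delta_e, kT, rng):
    """Metropolis rule: always accept downhill, uphill with exp(-dE/kT)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / kT)


def run(beads, colors, steps, kT=1.0, step=0.3, seed=0):
    """Single-bead displacement moves; returns final beads and energy."""
    rng = random.Random(seed)
    beads = [tuple(b) for b in beads]
    e = chain_energy(beads, colors)
    for _ in range(steps):
        i = rng.randrange(len(beads))
        old = beads[i]
        beads[i] = (old[0] + rng.uniform(-step, step),
                    old[1] + rng.uniform(-step, step))
        e_new = chain_energy(beads, colors)
        if metropolis_accept(e_new - e, kT, rng):
            e = e_new
        else:
            beads[i] = old                       # reject: restore the bead
    return beads, e


# 48 beads, one site every 6 beads, colors cycling in a regular pattern.
palette = ("red", "green", "yellow", "blue")
colors = [palette[(i // 6) % 4] if i % 6 == 0 else None for i in range(48)]
start = [(float(i), 0.0) for i in range(48)]
final, energy = run(start, colors, steps=2000)
print(len(final), round(energy, 3))
```

Comparing a regular color pattern with a shuffled one, as in the talk, would mean running this with two different `colors` lists and inspecting which sites end up in contact; a run this short is nowhere near equilibrium and is only meant to show the mechanics.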
Whereas here, without putting it explicitly in the model, we got separated colors, no crosstalk. So that's the key result. And now, I mean, we did a lot more, including on mammalian DNA. We used other approaches, like a Langevin type of model here. I'll skip that, really. It's not so important. So, conclusions so far. First of all, and that's mostly our work, co-functional genes tend to be positioned periodically in all microorganisms looked at so far. And second, I alluded to it, I didn't show it: gene spatial clustering favors periodical positioning in an evolutionary model. Third, if you do have periodical positioning, it will favor clustering in factories, in space. This we have really been working on a lot. And also, as I just showed, the solenoidal organization of chromosomes. And finally, work from others. Gene spatial clustering optimizes transcription regulation, Mirahela and others. Transcription occurs in focal points. Focal meaning what? What do you mean by focal? Just gathering in the same places. Remember the morphological data? So that's what I meant. So these are the conclusions. This is how we understand the system now. And I'll finish in the last minute by saying that these are observations and so on. But observations that are precise enough, and bear on genes where we can say this is the name of the gene, this gene is doing this. Precise enough that now we are envisioning that we can engineer genomes by applying the principles of natural genomes. One of them being this. Okay, yeah. So we have work ongoing on that. We are rearranging. In answer to your question, I said that the gene encoding the factor in bacteria is in register with its targets. We have moved the gene. We are waiting for the results, currently. We have moved the gene that encodes the factor. We expect to see the effect. And many other experiments are underway actually. What kind of effect do you expect?
We think that the efficiency of the transcriptional control will be weakened. And this we can measure in principle. And so bacteria will suffer. They will suffer. They will suffer either just like that or under pressure with some stimulus or stress. You can get some similarity. If you want to make, for example, a premature model, a model of a bacteria machine. The major problem was to eliminate undesirable interactions. This is exactly what happens here. We eliminate undesirable interactions. Yes, we think that it's also important. Yes, I agree. It must be important as well to eliminate unwanted. It's perhaps as important, perhaps more important. People try to do the most costly treatment problem. It costs about $10 billion. Because you need time. And it's exactly to plant a figuration. Yes, because there are examples that were known long time ago and unexplained. There is a factor in our cells that responds to glucocorticoids. And it not only responds but is the same molecule that actually influences the transcription of some genes. People were very surprised that it does not bind and influence a gene that has a nice binding sequence in vivo, in the physiological situation. But it does so on one that doesn't have a good binding sequence for the glucocorticoid receptor. These types of things are unexplained unless you accept that the binding sequence is one thing. But it's also relevant to consider the position with respect to other things. And so it seems that it's one of the tools in the toolbox of the cells to avoid undesirable interactions. To play with the respective positions. And it remains in my cooperation, so square, so to speak, 0. Two opposite things there. And they changed the medicine. These are huge facts. So if you arrange all genes, they're probably dead. So we are at the verge. I mean, people have been neglecting this phenomenon. Now, I didn't have the time to show some data, but we see that it's an important phenomenon. 
Quantitatively very important. And so just to finish, because we can have questions after the break, and I think we are all sweating: now that we understand one more thing about how to organize a genome that works, that functions, we are using this information to try and engineer desirable features on chromosomes, on genomes, on bacterial genomes that can be used for different purposes. To be useful to produce a drug or the precursor of a fabric or things like this. And this is the reason why we founded this company called Synovance: to use this principle, to apply this principle, but also other principles that were known before, that are constraints on the architecture of the genome. But it can't be that the correlation is what is known about chromatin structure. So chromatin is the nugget after all, right? Yeah. So we don't know. I mean, the only case where we touched upon that was in the first paper in 2003, when I worked with yeast; all the other work has been done with bacteria. And for yeast we don't know how to articulate these observations. I think it's at a shorter scale than the chromatin story. Not that the nucleosome is smaller, yes. So it's at a larger scale than the nucleosome. But it is at a smaller scale than the long-range organization of chromatin. But these are just words. I don't know. It's the honest response. In transcription factors, in the promoter regions, there are not many nucleosomes. And not heavy chromatin in the promoter regions. Yes. So it receives proteins. Less than in the open reading frame, even less. Less than in the open reading frame, yes. Yes. Okay. So maybe we'll take a break. Okay. I see. I was asking about the possibility of modeling here. That you try to model and make some quantitative questions. Partly we can do it, of course, analytically, but partly by modeling. True. So you could model from scratch. You could say, okay, this, give me, give me some information, or model from scratch.
Some people have been doing that. They didn't bring much new information. Another option would be to say, give me the average distance between the partners, which we can give with our Monte Carlo model. Really, we give distances, real distances. We cannot give time because it's Monte Carlo, but we give distances at the best equilibrium level. Give me distances. And I will model in order to give you now the level of activation. So if this is the DNA binding site and this is the transcription factor protein, I'll try to tell you how much this will activate or deactivate or repress the rate of initiation of transcription. Quantitative rate. What is rate? Quantitative rate. Rate. Quantitative rate. How do you measure it? Do we measure it? What is it, how do you define the rate? So it's the number of initiations of transcription per minute or per second, per unit time. How many per second, for instance, how many times will this initiate transcription? That is what is controlled by the transcription factor. And this would be valuable. Now, on a more analytical approach... so this would be simulations. And before I forget, one of the difficulties is that we do not know the parameters well, so you could search for the parameters, but then you'd like to know the value in some cases to calibrate your parameters, and this is difficult to measure. There are a few data, but not many data, on measuring this. This is the difficulty. For an analytical approach, I would suggest that Hill, 1910, could be a starting point. Meaning, what kind of vector field? It's mathematician field? Yeah. But I don't know how much. I never met him, unfortunately. I don't know how much. And the paper... What kind of vector is that one? Which field? Well, what I can tell you is that biochemists use his work in the following way. And this doesn't tell you much about... Sorry, but I looked into that probably eight or nine years ago, and I don't remember the specifics.
What I do remember is how biochemists use this paper. Biochemists. So any biochemist, he proposes a way to... I'll look. He can? He doesn't want. I see. Okay. His work is used for the following. Consider how? Very nice. Thank you. Consider a beaker with liquid and a semipermeable membrane here. And consider solutes of two types, one that can cross and one that cannot cross. Say a macromolecule that cannot cross this semipermeable membrane, and a small salt that can cross. And put them all here. And then the macromolecule will not equilibrate and the small molecule will equilibrate. Imagine you have a way to measure how this distributes. And, oh, sorry. I forgot the most important thing. These small molecules can bind the big molecule. And these small molecules, so to measure them, what do you do? You make them radioactive or whatever. Fluorescent, radioactive, something. So the small molecule binds to the big molecule and you can follow it. Now measure how much small molecule is in this compartment and how much is in this compartment. And then you can use Hill 1910 to find out two things: the concentration of the big molecule, but also how many small molecules bind per big molecule. And don't ask me more. Sorry, I forgot fully. At that time we had a mathematical physicist in the team, a German guy who was pretty good. And I tried to talk him into restarting from Hill 1910 and following up to obtain this type of information. Not this, but another application, towards this result, in an analytical way. And in the end he shied away, he didn't do that, and then I forgot all the details of what we had been discussing. So don't ask me more. But I can dig it up and see if I can find it again. What was the IEDL? But here I meant a very simple kind of issue. When you have a kind of random walk, and in the size when you turn it, it terminates.
And how much depends on the distribution of distance from the way you start. This is a kind of elementary, kind of homogeneous space where you can easily do that. Because here it is kind of more complicated. But first I would make a simple computation and see how much it helps. In two spaces, in this space, but where it would be a different answer, where it might depend on the dimension. But there is computation in this. And actually I can think of a mathematician who can easily do that. Okay, we still have to find the right person. But I'm open. I mean, of course. Okay, let me know. Thank you. I have some questions. About these periodic genes in bacteria, are there more assumptions and more concerns than genes which are not? So, I cannot answer directly, but indirectly, yes: these 500 genes from E. coli are involved in two categories of roles. Two roles. One is spatial organization, whatever it means, but it's funny, because we start from a hypothesis on the spatial organization and we end up with a list of genes that are labeled spatial organization. Funny. But most of those genes are involved in macromolecular synthesis, which means synthesis of DNA, RNA and protein, which means they are very central to the life of the organism. One subhypothesis could have been that cells had to optimize the expression of genes which produce a lot of product, massive products. For instance, we mentioned ribosomes. Okay, so then it would mean that they have to massively produce protein altogether and also massively produce some RNAs, like the ribosomal RNAs. So you'd think these are the massive products of the cell and that's where you need more optimization. But no, that is not the case, because we also have DNA synthesis in the story, and DNA synthesis is not massive, unlike RNA and protein synthesis. It's not massive. So the criterion is not the fact that it is a massive production.
But indeed DNA, RNA and protein synthesis is among the top roles of the genes that are periodic. So to evaluate what exactly, the importance, how many genes and importance you should put into the model to understand some interactions. It's a difficult question, because it would not work if it was not for DNA synthesis. It would not work if you put in four or five genes, because it's a collective phenomenon. It would not work if you reduced the genome to a small size, because it needs several loops, and the loops cannot be small because of the lack of flexibility of the polymer DNA. So it's not an easily reducible case. You cannot say, I'll work on two genes and see how it works, or even five genes. You need a genome-scale pattern. In the hundreds of genes. Hundreds of genes. No, but maybe some kind of approximation. It's a huge thing, but you can use, for example, a cluster, an operation with this cluster. So it's not maybe modeling point by point. Some kind of evidence of a certain looks. It's such huge. It could take one thousand. One thousand square. And it's my one million. It's not computerized. Then I should play with this parameter. It's a good data. I would like this to be more. Yes. Usually we would use more computations. No, but I'm saying that you don't have to do it point by point. You can have some vision. Immediately use computations. You have to first invent a class of models and think about how they work and then do it. Actually, I forgot there's a third approach, which would be to generalize Vilar and Leibler. Vilar and Leibler. Stan Leibler. In this paper, they deal with the case of two sites. Two very close sites. So not only are we saying that the sites can be far away along the 1D sequence of the genome, but we are saying that it's not two. It could be 200 sites. And this model works for two. Intrinsically, it works for two. It doesn't work for three.
But there might be a way to generalize their approach. Again, it's something that we were discussing in 2007-08 with this German mathematical physicist, Tim Wolf. And I don't remember the specifics but it didn't look to be an easy thing to do. The way he described it was so many. There may be a very simple mathematical argument and immediately we will show that he has the tremendous effect. In two issues where it's kind of clear, it's exactly multiply three. When we multiply numbers, immediately the effect becomes very strong. Together, so it's not one pair but many times the effect multiplies. It might be a very huge effect. Yes, yes. I agree from my intuition. I mean, your intuition is probably of a different nature. And furthermore, we have to find a person who can deal with this and will be motivated to deal with this. I'm ready to talk to the person to dig out these previous ideas as well and I suppose that it could be done but not me. Okay, so should we stop? Okay. No, it's so kind of tantalizing. Formalizing a rough combination of that. Of course, to make it interesting we have to know data, something more information, for example. When you have this size, you have kind of correlations. Then you may try to use it by by clustering analysis of that. There are kind of routine feelings that can be connected. For example, what these genes, what these proteins can experiment with. Of course, you can make a very dangerous mathematical model but you miss some crucial points. Sure, sure. You mean in simulations? In simulations? We do have some reasonable orders of magnitude for the energy, the binding energy for the polymers of the polymer DNA. We do have that. This is a very, very remarkable thing. It seems to be a huge factor. It's not just one. Yeah, because it's not just two. It's like 600 and they can exchange in a dynamic way. I leave on that side and then I can go back to the same or go to another one. 
Evolutionary, again, how do you estimate the structure being organized? It's a high-scale structure. So how evolutionary it should work? We had this simple model which basically said: genes belonging to the same list, sorry, sites that belong to the same list will be subjected to an attraction. So it was not very realistic. It was more like a distance-to-the-minus-two type of gravitational or Coulombian attraction, which we thought would accelerate the reaching of an equilibrium. And we had to be fast. Why? Because it's an evolutionary model, so evaluating one chromosome for its fitness should not take more than hundredths of a second, because we have to evaluate many and we have to go through many evolutionary steps. So we had two timescales. One is the ontogenic timescale, that is, the evaluation of the fitness of the genome, actually made by saying: so we have, let's say, 200 sites, and after folding it we measure all pairwise distances between these sites, that's what we did, all pairwise distances, and we take the average, and that's the coefficient of fitness; actually it's the inverse which is involved. Probably the major... it's not just point mutation, it's also the scope. That's what we were doing. I didn't say that, because that's the short timescale: we evaluate the fitness of the genome. Then, after evaluating 100 genomes, so we had the fitness coefficient, which was the inverse of this number, of the average pairwise distance, we take the best. Now we go one step further in the evolution, so it's a different timescale, a longer timescale. Now we used variation operators, exactly what you said. These were only of this type: choose randomly two sites and a third site where you insert. Transpositions; we had only transpositions. So from the winner chromosome we make 100 copies or more, each using a random transposition, and then we evaluate them, and so on and so forth, and we keep the best in terms of clustering again, and then another evolutionary step, again with
100 chromosomes made with this variation operator called transposition, and so on and so forth. And after a couple of hundred thousand evolutionary steps, in a few hours with the computer, we got things that were regularized, that is, we did have some proximity and periodicity patterns appearing from a random initial sequence. We can do with the activities you have related to sub-technical transpositions, or some other mechanism involved? Transpositions are supposed to be random, but there may be some problems attached inside the scheme. I mean, Misha, you are going too fast. It is possible, but it's really a wild hypothesis. It's a wild hypothesis. It's really interesting just to relate and understand. But if you relate it to something biological, you have a nice idea. We know that when horizontal gene transfer occurs, normally it's a continuous piece which goes to another place or to another organism. Horizontal gene transfer. However, your hypothesis amounts to saying that it could be a piece of this one plus a piece of that one that together will... And there are very few... there is no proof of that. There are some hints perhaps, but no more than hints. And if you ask people in the field, I would say no, no, no, it's not possible. It's a proposition. But it creates a book about evolution of genomes, but for him it's completely random. Exactly. The structure is not the picture. What you say, you have to write the whole book. Right. It's random until we find the regularity. It's a pattern of organization. Exactly. I'm curious how you can... it doesn't matter if you use something, and this is... I'm just curious: if you do what he does, but now make your assumption, if you get the different answers, it can be better to react to the real world. You see, this is a kind of computation: constraints on organization of genomes, and of certain type. And so we live in a different space and in different statistics. So in a two mathematical one: one inside of the cell and the other in the space of genomes and evolution. His book was
written before that. Yes, sure. No, the book was written about three or four years ago, five years ago. So he didn't read the good authors. He didn't read the first one. That was my conclusion. Hmm. No, this is a tool when you go out and look at it: one in the cell and the other in the space of genomes. True. Yep.
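The evolutionary loop described in the answer above (fitness as the inverse of the average pairwise distance between sites, about 100 copies of the winner each altered by one random transposition, keep the best, repeat) can be sketched as follows. This is a simplified illustration and not the model used in the work. In particular, the folding step is replaced by plain distance along the chromosome, which is a loud assumption: with it, only proximity can emerge, not periodicity, which in the talk comes from measuring distances after 3D folding. Keeping the parent in the candidate pool is also a choice made here, and all names and sizes are illustrative.

```python
import random
from itertools import combinations


def mean_pairwise_distance(chromosome):
    """Average distance between all pairs of sites.

    chromosome: list of booleans, True where a locus carries a site.
    Stand-in assumption: distance is measured along the chromosome,
    not in space after folding as in the talk.
    """
    pos = [i for i, is_site in enumerate(chromosome) if is_site]
    pairs = list(combinations(pos, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def fitness(chromosome):
    """Inverse of the average pairwise distance, as described in the talk."""
    return 1.0 / mean_pairwise_distance(chromosome)


def transpose(chromosome, rng):
    """Cut a random segment (two cut points) and reinsert it at a third point."""
    i, j = sorted(rng.sample(range(len(chromosome) + 1), 2))
    segment = chromosome[i:j]
    rest = chromosome[:i] + chromosome[j:]
    k = rng.randrange(len(rest) + 1)
    return rest[:k] + segment + rest[k:]


def evolve(chromosome, generations=200, offspring=100, seed=0):
    """Repeat: make transposed copies of the winner, keep the fittest."""
    rng = random.Random(seed)
    best = list(chromosome)
    for _ in range(generations):
        candidates = [best] + [transpose(best, rng) for _ in range(offspring)]
        best = max(candidates, key=fitness)
    return best


# Random starting chromosome: 100 loci, 10 of which carry a site.
rng0 = random.Random(1)
site_positions = set(rng0.sample(range(100), 10))
start = [i in site_positions for i in range(100)]
evolved = evolve(start, generations=20, offspring=20)
print(round(fitness(start), 4), round(fitness(evolved), 4))
```

Swapping `mean_pairwise_distance` for a function that folds the chromosome and measures spatial distances is where the two timescales of the talk would meet: the fast evaluation inside the loop, and the slow succession of evolutionary steps around it.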