Let's start. I like the laid-back attitude we have, so we started 10 minutes late. Maybe tomorrow let's try to get here on time. The first thing I would like to do is to go quickly through the topics we covered yesterday in the second part of my lesson, that is, the analysis of experimental data showing the patterns in the organization of chromosomes in the cell nucleus. Then, hopefully, I will move on to what we can do with physics to try to understand those patterns.

So you remember, the key point is that we now have technologies that enable us to derive so-called contact maps. A contact map is shown here. You remember that this type of quantitative data tells us the probability of seeing two sites on a chromosome in contact, and we have this information genome-wide. We discussed the patterns in those data. One important discovery was that if you zoom along the diagonal, moving from the 100-megabase scale, the size of an entire chromosome, down to a 2-megabase scale, the matrix looks as if it were made of blocks. The axes here are the genomic coordinates along the string of characters of the DNA. What one notices is that there is a region that interacts strongly with itself and much less with the rest of the genome, then the neighboring region does the same, then the next one, and so on. At the beginning, that gave the impression that the human genome is composed of a sequence of domains that interact strongly within themselves and not with the rest of the world. They were named TADs (topologically associating domains), and this was the early picture in the field of how our chromosomes are folded. But then I showed you that the definition of TADs is quite phenomenological, quite, say, brutal.
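Since a contact map gives, for each pair of sites, the probability of seeing them in contact across a population of cells, it can be sketched as the fraction of an ensemble of 3D conformations in which two sites lie closer than a cutoff. This is a minimal illustration under stated assumptions: the input format, the distance cutoff and the function name `contact_map` are hypothetical, not part of the Hi-C protocol.

```python
import math


def contact_map(conformations, cutoff):
    """Estimate contact probabilities from an ensemble of 3D conformations.

    Each conformation is a list of (x, y, z) coordinates, one per site.
    P[i][j] is the fraction of conformations in which sites i and j lie
    within `cutoff` of each other (the cutoff is an assumed parameter).
    """
    n = len(conformations[0])
    counts = [[0] * n for _ in range(n)]
    for coords in conformations:
        for i in range(n):
            for j in range(n):
                if math.dist(coords[i], coords[j]) <= cutoff:
                    counts[i][j] += 1
    total = len(conformations)
    return [[c / total for c in row] for row in counts]


# Two hypothetical conformations of a 3-site chain.
ensemble = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],      # stretched: sites 0 and 2 apart
    [(0, 0, 0), (1, 0, 0), (0.5, 0.5, 0)],  # bent: sites 0 and 2 close
]
P = contact_map(ensemble, cutoff=1.0)
print(P[0][2])  # 0.5: in contact in one of the two conformations
```

A real Hi-C map is measured by ligation and sequencing, not from coordinates; the sketch only shows what "probability of contact" means as a population average.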
Basically, the definition of TADs relies on the fact that you can define some simple methods to decide whether a region interacts more to its left or to its right. You remember, TADs are brutally defined in this way. You take a position along the genome, say position i, you count how many interactions that location has to its left and to its right, and you subtract. When you plot that quantity as a function of the position along the genome, it looks like you have regions that are coherent in sign: here there is a positive block, then a negative one, then another positive one, and so on. This is how TADs are defined: you set a threshold, and you say that when the sign changes beyond that threshold, I switch from one TAD to the next. You see the complications that this type of heuristic definition raises, because, of course, the result depends on the threshold. For instance, with a very high threshold, the example in red, you get those big TADs there; with a tiny threshold, you get all those different, smaller TADs. So what are TADs? Heuristic definitions are, of course, not good enough. Nevertheless, you do see those patterns in the data, and so TADs have been considered a sort of fundamental unit of organization of our chromosomes.

What I tried to discuss at the end of the last lecture was an attempt to show that the cartoon of chromosome folding I showed you is inadequate, because there are interactions among TADs, and therefore higher-order structures, exactly as you have higher-order structures in proteins, for instance. What I showed you is, again, a brutal heuristic approach, which I will try to summarize. Suppose that, whatever the exact definition of TADs, you take the TADs in your system as given by the original discoverers. In the example shown, the TADs are the blocks along the diagonal marked by black numbers.
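The left-versus-right counting and the threshold rule described above can be sketched as follows. This is a simplified illustration, not the exact index used in the original TAD papers: the window size, the rule "a boundary is where a significant negative run turns positive", and the names `directionality` and `call_boundaries` are assumptions made for the example.

```python
def directionality(contacts, window):
    """Right-minus-left contact count for every bin.

    contacts: square, symmetric matrix (list of lists) of contact counts.
    window: number of neighboring bins counted on each side (assumed).
    """
    n = len(contacts)
    scores = []
    for i in range(n):
        left = sum(contacts[i][j] for j in range(max(0, i - window), i))
        right = sum(contacts[i][j] for j in range(i + 1, min(n, i + window + 1)))
        scores.append(right - left)
    return scores


def call_boundaries(scores, threshold):
    """Place a boundary where a significant negative run turns into a positive one."""
    boundaries = []
    last_sign = 0
    for i, s in enumerate(scores):
        if abs(s) <= threshold:
            continue  # bias too weak to count
        sign = 1 if s > 0 else -1
        if last_sign == -1 and sign == 1:
            boundaries.append(i)
        last_sign = sign
    return boundaries


# Toy map with two 4-bin blocks: 10 contacts inside a block, 1 across.
toy = [[0 if i == j else (10 if (i < 4) == (j < 4) else 1) for j in range(8)]
       for i in range(8)]
scores = directionality(toy, window=3)
print(scores)                       # [30, 11, -8, -27, 27, 8, -11, -30]
print(call_boundaries(scores, 5))   # [4]
print(call_boundaries(scores, 28))  # []: too high a threshold finds no boundary
```

The last line makes the lecture's point concrete: the same data yield different domains, or none, depending on the threshold.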
If you plot the data of that example with a slightly better color scheme, a slightly broader palette, you see that it is difficult to believe that TADs 1 and 2 are independent and not interacting, because they share contacts: they are in contact with each other. The approach we used to see whether those contacts are significant is the one I mentioned yesterday at the end of my lecture, and it goes as follows. Take the list of your TADs, whatever they are, and select the pair of TADs that interacts most strongly, that is, the pair with the highest number of contacts among all pairs. By definition, that is the most likely candidate pair to form a higher-order structure, because they share a lot of contacts. So we call them a higher-order domain, a metaTAD, we add that domain back to the list of the previous domains, and we iterate. If you think about it, at each iteration you bring together the most likely candidates to form a higher-order structure. Since this is just hierarchical clustering, you get a tree, and the tree tells you how the 3D structure is organized across levels.

Now, as we discussed yesterday, this is a heuristic definition of higher-order domains. Are they significant? Talking with some of you yesterday, I understood that I went too quickly in explaining why we think they are. Significant means: do the data statistically support the existence of those higher-order interactions? We assessed that through a number of measures; for the sake of brevity I will mention only one. We want to run a hypothesis test, so we have to set a null hypothesis, which is a random model, and compare the interactions we observe within those higher-order domains with those we would have in the random control model.
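The iterative merging of TADs into metaTADs is agglomerative hierarchical clustering, and can be sketched as below. The lecture does not say how the interaction of a freshly merged domain with the others is recomputed; here, as an assumption, it is the mean pairwise TAD-to-TAD contact count (average linkage). The function name `build_metatad_tree` is hypothetical.

```python
def build_metatad_tree(inter):
    """Greedy agglomeration of TADs into metaTADs.

    inter: symmetric TAD-by-TAD matrix of contact counts.
    Returns the merges as (cluster_id_a, cluster_id_b, member_TADs).
    Assumption: a merged domain's interaction with another domain is the
    mean pairwise TAD-TAD count (average linkage).
    """
    clusters = {i: (i,) for i in range(len(inter))}
    next_id = len(inter)
    merges = []

    def linkage(a, b):
        total = sum(inter[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the pair of current domains with the strongest interaction.
        p, q = max(
            ((p, q) for p in clusters for q in clusters if p < q),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        merged = tuple(sorted(clusters.pop(p) + clusters.pop(q)))
        merges.append((p, q, merged))
        clusters[next_id] = merged  # add the metaTAD back to the list
        next_id += 1
    return merges


inter = [[0, 9, 1, 1],
         [9, 0, 1, 1],
         [1, 1, 0, 5],
         [1, 1, 5, 0]]
for step in build_metatad_tree(inter):
    print(step)
# (0, 1, (0, 1)), then (2, 3, (2, 3)), then (4, 5, (0, 1, 2, 3))
```

The sequence of merges is the tree: early merges are the lowest levels of the hierarchy, and the last merge is its root.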
The random control model we used is the following. Take the Hi-C data, that is, the experimental contact probabilities. By eye, without entering into too many details (I will discuss this more later on), you see that the colors fade towards blue as you move away from the diagonal. A point far from the diagonal corresponds to the interaction of two sites that are very distal along the linear genome. You can read it in different ways; for instance, this point here is the interaction of that site with that one, on the opposite side of the chromosome. And you see that when two sites are genomically distal along the linear coordinate, their interaction is, on average, weaker, as you would expect.

So the random model we built is the following. You want to preserve the effects of genomic distance, because in a sense they are trivial. You do not want to fully randomize the matrix of data, because otherwise you would also wash out the effects of genomic distance. The random model we construct for our hypothesis test is therefore a randomization of the contact data sub-diagonal by sub-diagonal. If you think about it, a sub-diagonal comprises all the pairs of sites that have the same genomic distance. So we only scramble the entries corresponding to pairs of sites at a given genomic distance, and we repeat that for all possible genomic distances. In this way, the random matrix we get has the average behavior of the original matrix: the average value, sub-diagonal by sub-diagonal, is the same as in the original matrix. You preserve the average trend due to the genomic separation of the sites, but all the other patterns are scrambled. This is done in the classic way, by bootstrapping. The idea is the following.
You take the entries of a sub-diagonal and produce a new matrix where that sub-diagonal has entries taken at random from those. Since you start from the real data, you do not change the distribution inherent in the original data. This is classic bootstrapping, a classical method to produce random models from real data: you use the original data to keep the structure of their statistical distribution, but you randomize their positions in the system. Perturb them, I mean.

No, no, let me clarify this. The question was whether we are assuming that the interactions are proportional to the distance. Are we? No, let me try to clarify. If you have a signal with a trend and you want to produce a random model of it, you have to take the trend into account; otherwise you are perturbing the structure of the signal itself. That is the only thing we do. You take a sub-diagonal, whatever the dependence of the signal on genomic distance (we make no assumption about it), and you permute its entries. Exactly, there is no assumption. There is an observation that the average interaction decreases with genomic distance, because the farther you move from the diagonal, that is, the farther apart the two sites are, the less intense the signal. But you are not making any assumption about what that dependence is. Is it clear? You observe that there is a dependence, a trend in the data, and you want to keep the trend, whatever it is. OK, I understand that yesterday I went too quickly on this.

So what we found in the end is the following. You take the randomized system and use it as a control. When you have two domains, that one and that one, for instance, you can measure their interaction, that is, the number of contacts they share, both in the real data and in the random control.
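The sub-diagonal randomization described above can be sketched as follows. Following the answer to the question ("we take that and we permute"), this version permutes the entries of each sub-diagonal, i.e. it samples without replacement; a resampling-with-replacement variant would use `rng.choices(values, k=len(values))` instead. The function name and the toy matrix are illustrative assumptions.

```python
import random


def shuffle_subdiagonals(matrix, seed=None):
    """Random control that keeps the distance trend of a contact map.

    Entries are permuted only within each sub-diagonal (pairs of sites at the
    same genomic distance d), so each sub-diagonal keeps its values, and hence
    its mean, while all other patterns are scrambled. The matrix is assumed
    symmetric; the lower triangle is rewritten to mirror the upper one.
    """
    rng = random.Random(seed)
    n = len(matrix)
    out = [row[:] for row in matrix]
    for d in range(1, n):
        values = [matrix[i][i + d] for i in range(n - d)]
        rng.shuffle(values)  # permutation: sampling without replacement
        for i, v in enumerate(values):
            out[i][i + d] = v
            out[i + d][i] = v
    return out


# Toy symmetric map: distance decay plus two extra-sticky blocks (sites 0-2, 3-5).
toy = [[1.0 / (1 + abs(i - j)) + (0.5 if (i < 3) == (j < 3) else 0.0)
        for j in range(6)] for i in range(6)]
control = shuffle_subdiagonals(toy, seed=1)
```

Because only values at the same genomic distance are exchanged, the decay with distance survives in `control`, while the block pattern is scrambled.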
The interaction in the random control, the one you would expect in the random system, we call the control interaction. What I am showing you in this plot is the ratio of the real interaction of two domains over the interaction expected in the random control, as a function of the size of those metaTADs, those higher-order domains. The size of a metaTAD is expressed as the number of fundamental TADs it includes. A fundamental TAD is roughly half a megabase, so this scale goes from half a megabase up to 200 TADs, which means 100 megabases, practically the size of an entire chromosome. In green you see what the random expectation would be, and in blue the real signal. If TADs did not interact with each other above background, you would expect the blue curve to collapse rapidly onto the green one, because as soon as you are a couple of TADs apart, there would be no interaction. Instead, as I concluded yesterday, you see that the blue curve remains statistically significantly above noise, above the green one, up to huge scales.

You had a question? No, it is always the ratio. Blue is the real data divided by the expected interaction in the random model, and green is only the noise level, for comparison. What we do here is basically the following. In an ideal world, in the random system, this ratio should be 1 by definition, since control over control is 1. However, the data have fluctuations, and that is why the noise level goes up here: you want to take them into account when you decide whether the gap between the blue and the green curves is significant or not. That's all.

So hopefully I have told you why the picture of how chromosomes are folded has changed. Rather than thinking of distinct TADs that do not interact with one another, the picture that is emerging is of TADs, whatever they are, forming higher-order structures that have strong interactions with one another.
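The ratio of real over control interaction can be sketched without running the randomization at all: under a sub-diagonal permutation, every entry at genomic distance d has an expected value equal to the mean of its sub-diagonal, so the expected control interaction of two domains is a sum of sub-diagonal means. This is a sketch under that observation; in practice, as in the lecture, many random controls are also needed to estimate the noise level. Function names are hypothetical.

```python
def subdiagonal_means(matrix):
    """Mean contact value at each genomic distance d (d = 0 is the diagonal)."""
    n = len(matrix)
    return [sum(matrix[i][i + d] for i in range(n - d)) / (n - d) for d in range(n)]


def observed_over_expected(matrix, dom_a, dom_b):
    """Ratio of real to control interaction between two domains.

    dom_a, dom_b: iterables of bin indices. Under a sub-diagonal permutation
    each entry at distance d has expected value equal to the sub-diagonal mean,
    so the expected control interaction is computed directly from those means.
    """
    means = subdiagonal_means(matrix)
    observed = sum(matrix[i][j] for i in dom_a for j in dom_b)
    expected = sum(means[abs(i - j)] for i in dom_a for j in dom_b)
    return observed / expected


toy = [[20, 10, 1, 1],
       [10, 20, 1, 1],
       [1, 1, 20, 10],
       [1, 1, 10, 20]]
print(observed_over_expected(toy, (0, 1), (2, 3)))  # 0.4: below control
print(observed_over_expected(toy, (0,), (1,)))      # ~1.43: above control
```

A ratio persistently above 1 for large domains is what the blue curve in the plot shows.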
By strong interactions I mean interactions much larger than what you would expect in a randomized control system. And, by the way, the results I discussed are found in mouse cells and in human cells; they appear to be systematic across higher organisms, so they are not specific to one cell type. What I showed you, hopefully, is that the impression we get from the data is that chromosomes are folded in the complicated, hierarchical way I told you about.

The next question is whether such a complex folding has biological relevance, and I want to try to guide you through that. At this point I may enter into some more complex biology. If I go wild and you do not understand what I am saying, please stop me. That is absolutely natural, and it's my fault: maybe I'm using terms that are hard to digest for those of you who have not been exposed to biology. So let me try to guide you step by step.

When people discovered TADs, a lot of attention was given to the boundaries between them. The rationale is almost trivial: why do you have a boundary, and what does a boundary correspond to? So people focused on what sits at the boundaries between TADs, on what determines the fact that a position is a boundary, and a number of statistical analyses were made. For instance, it was found that TAD boundaries are enriched for GC. But there are many more enrichments at TAD boundaries; I want to mention only one of them, to give you a sense of what is known. CTCF is a protein known to play a number of roles in the functioning of the nucleus. With a technology named ChIP-seq, which I am not going to discuss in detail, it is nowadays possible to measure where that protein, and other proteins too, are bound along the sequence. So you can tell how much CTCF is at position x, how much is at position x + 1, and so on.
What was observed is that CTCF, for instance, is enriched at TAD boundaries. Let me show you how the data tell us that. You take all the TAD boundaries in your genome, align the genome on the boundaries, and measure how much CTCF there is to the left and to the right of the boundaries and, of course, at the boundary itself. Then you measure the enrichment with respect to what you would expect. To cut a longer story short, this is what I'm showing you here. Look at the violet curve: 0 is where the boundary sits, and the violet curve tells you how much CTCF is enriched with respect to noise, to a random expectation. And you see, it was discovered that CTCF is indeed enriched at TAD boundaries. The same holds for a number of other signatures; I'm showing you only a few of them. This one, you see, is a form of polymerase (Pol II), the machinery that transcribes genes; if I have time, I'll get back to that. CAGE, as I told you, measures transcription, et cetera. For now, let's forget about the more complicated things.

So when we discovered metaTADs, the question was: OK, they may be architecturally relevant, they have strong interactions, but are they also biologically relevant? For instance, are the boundaries between metaTADs enriched for CTCF? Once again, the expectation was that, if metaTADs had no meaning, the bigger the metaTADs you consider, the weaker the enrichment should be. Instead, look at the green line: it is the enrichment you find when you consider only metaTADs at least 10 megabases large, that is, metaTADs 20 times larger than a single fundamental TAD. Not only is the signal not washed away, it is even stronger. Since this is found consistently across a number of features, as you have seen, it was a first indication that there is a deep link between the way chromosomes fold and their activity.
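The boundary-aligned enrichment profile described above (the violet and green curves) can be sketched as an average of a per-bin signal at fixed offsets from each boundary, divided by a background. Using the genome-wide mean as the background is an assumption here; the lecture compares with a random expectation whose exact construction it does not specify. The function name is hypothetical.

```python
def boundary_profile(signal, boundaries, flank):
    """Aggregate a per-bin signal around aligned boundaries.

    Returns {offset: mean signal at boundary + offset / genome-wide mean},
    so values above 1 indicate enrichment relative to the genome average
    (the choice of the genome-wide mean as background is an assumption).
    """
    background = sum(signal) / len(signal)
    profile = {}
    for offset in range(-flank, flank + 1):
        values = [signal[b + offset] for b in boundaries
                  if 0 <= b + offset < len(signal)]
        profile[offset] = (sum(values) / len(values)) / background if values else float("nan")
    return profile


# Toy CTCF-like track: flat at 1, with peaks of 5 at two boundaries.
ctcf = [1.0] * 20
ctcf[5] = ctcf[15] = 5.0
profile = boundary_profile(ctcf, [5, 15], flank=2)
for offset, value in profile.items():
    print(offset, round(value, 2))
# offset 0 gives about 3.57; every other offset about 0.71
```

Restricting `boundaries` to those of metaTADs above a given size gives the analogue of the green curve.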
To put it differently: if TADs have a meaning from a functional point of view, as I will show you, then metaTADs must be at least as important. The next question was to move beyond the boundaries, and I have only one slide about that; let me see if I can show it to you too. Is there a more profound link between the spatial organization of chromosomes and biological signatures? Let me try to guide you through this slide. I am not sure I will manage, so please help me if I am not clear.

This is an example of real data: chromosome 16, and what I am showing you is 40 megabases of that chromosome, roughly half of it. At the top you see the metaTAD hierarchical tree, constructed in the heuristic way I told you about. Beneath it you see a number of other signals: biological features located along the sequence. There is CTCF, which you have already encountered; the intensity of the color in this strip tells you how much CTCF is positioned at each site along the sequence. You also see CAGE transcription and the Pol II signal, and there are a number of other features that are important for reasons I will come to if I have time. But just to get a bird's-eye view of the data (I don't know if you see what I see), the impression is, first of all, that the different features are correlated with one another. This is visual, isn't it? You see that there are regions where the signals are weaker and regions where they are stronger. But you also see that they tend to be correlated with how the tree is organized. For instance, there is a block of high levels that correlates with a big metaTAD here, and it is distinct from the neighboring metaTAD, where the signal is weaker.
This is a visual impression, and I'm trying to guide you through it. The impression I'm trying to convey is that the way the hierarchy of higher-order structures is organized is correlated with the way a number of biological signals are distributed along the sequence. To cut a longer story short, the way we showed that there is indeed a complex correlation between the two is the following. You know how to measure the correlation of a function whose values are distributed along a line. If I give you CTCF, you can tell me the correlation of CTCF at one position with CTCF at the next position, at the next-nearest neighbor, and so on, and you expect this to be a decreasing function of the genomic distance. But now that you have the tree, you can compute correlations in a different way. Those of you paying more attention have already understood that you can look at correlations over the tree: you can measure not only the correlation along the linear genome, but also the correlation between sites at the same distance over the tree.

For instance, in my cartoon, these are four TADs, neighbors along the linear sequence. By definition, the orange one has the same genomic distance, which I call one, to each of its two neighbors along the genome. But you can also measure how far apart they are over the tree, that is, in the hierarchy of folding of higher-order structures. The tree distance of the orange TAD from its left neighbor is still one step, because you only have to go up one level in the tree. To reach its right neighbor along the tree, instead, you have to go up three levels. So although the two neighbors have the same linear genomic distance from the orange TAD, their distances over the tree are totally different, and the tree reflects the way they are folded.
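The two kinds of correlation can be sketched side by side: the same signal is correlated with itself over all pairs of positions grouped by a distance, once with the linear genomic distance and once with a distance over the tree. Here the tree is a nested tuple, and "tree distance" is taken, as in the cartoon, to be the number of levels one must climb from the first leaf to the lowest common ancestor; both the representation and that definition are assumptions, and all function names are hypothetical.

```python
import math


def leaf_paths(tree, prefix=()):
    """Map each leaf of a nested-tuple tree to its path of child indices from the root."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for k, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (k,)))
    return paths


def tree_distance(paths, a, b):
    """Number of levels to climb from leaf a to the lowest common ancestor of a and b."""
    pa, pb = paths[a], paths[b]
    shared = 0
    while shared < min(len(pa), len(pb)) and pa[shared] == pb[shared]:
        shared += 1
    return len(pa) - shared


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation undefined for a constant sample
    return sxy / math.sqrt(sxx * syy)


def correlation_by_distance(signal, distance):
    """Correlate signal[i] with signal[j] over all pairs grouped by distance(i, j)."""
    groups = {}
    n = len(signal)
    for i in range(n):
        for j in range(i + 1, n):
            xs, ys = groups.setdefault(distance(i, j), ([], []))
            xs.extend([signal[i], signal[j]])  # both orders keep it symmetric
            ys.extend([signal[j], signal[i]])
    return {d: pearson(xs, ys) for d, (xs, ys) in sorted(groups.items())}


# Eight TADs in linear order 0..7, folded into a balanced hierarchy.
tree = (((0, 1), (2, 3)), ((4, 5), (6, 7)))
paths = leaf_paths(tree)
print(tree_distance(paths, 3, 2), tree_distance(paths, 3, 4))  # 1 3

# A toy signal that is high on one branch of the tree.
signal = [1, 1, 1, 1, 5, 5, 5, 5]
linear = correlation_by_distance(signal, lambda i, j: j - i)
over_tree = correlation_by_distance(signal, lambda i, j: tree_distance(paths, i, j))
print(round(linear[1], 3), round(linear[2], 3))  # 0.714 0.333
print(over_tree)  # {1: 1.0, 2: 1.0, 3: -1.0}
```

In this toy case the correlation decays along the line but stays maximal over the tree until the root is reached, which is the qualitative pattern the lecture describes for real signals.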
So what you can measure, for CTCF or, in the case shown, lamina, the first of those signals (I'm not telling you for now what lamina is), is the correlation of the signal with itself along the linear genome, which gives the pink curve. You see that the correlation coefficient decreases with genomic separation, as you expect. Or you can compute it over the tree, and you see that the correlations over the tree (notice the log scale) extend one order of magnitude further than those over the linear distance. This is found systematically across a number of biological features that mark our genome. That was evidence not only that the genome is folded in the complex way I told you about, but that the way it is folded has crucial implications for the functional activity of the cell, because a number of signals, including Pol II, transcription, CTCF and so on, are deeply correlated with the way the hierarchy of metaTADs is organized.

I told you that we ran our experiments at three time points during neuronal differentiation. We looked at embryonic stem cells, the pluripotent cells that give rise to all the other tissues of our organism; at neural precursors; and at post-mitotic neurons, that is, developed neurons. What we found is that the structure I mentioned, this higher-order hierarchy, is consistently found across the different time points. The impression I'm trying to convey with this slide is that the overall hierarchical organization persists during neuronal differentiation, but the details can change. To highlight that visually without spending too long on it, I'm showing you here the higher-order metaTADs of chromosome 6 in mouse embryonic stem cells (top) and in neural precursors (bottom). So you take the same chromosome and look at how the tree changes between the two.
Again, by visual impression, you see similarities, but not complete ones. A simple measure of correlation between trees is the cophenetic correlation, which here is 0.84, so 84%, and it represents the degree of similarity between the two trees. To cut a long story short, the overall structure is more or less conserved, but there are important changes. The color scheme at the center tells you whether the local region there is conserved or not. You see, there are regions that are well conserved, where that portion of the tree is really the same in the other cell type, and there are regions that instead change.

What we tried to understand is whether there is a connection between changes in activity, in transcription, and changes in architecture. The naive expectation would be: I change the architecture, the structure, and I get a change in activity. What we found is that it is not that simple; the correlation is not so straightforward. What we did find is that if you consider a conserved niche in the two cell types, take a TAD within the niche and look at how its activity changes, then the other TADs in the conserved niche tend to behave in the same way. For instance, if this one goes up, the others also tend to go up. So within conserved niches there is a coherent change of activity from a transcriptional point of view, but only within conserved niches, and only statistically, which means there may be a lot of exceptions.

So up to now I have tried to summarize for you some key results on how chromosomes fold on themselves. Of course, we have a number of chromosomes, so what are the interactions across chromosomes in the nucleus? I think I already mentioned that briefly yesterday. The picture that is emerging is roughly summarized by that figure.
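The cophenetic correlation mentioned above compares two trees on the same leaves: for every pair of leaves, take the height of the node where they first join in each tree, and compute the Pearson correlation of those heights across all pairs. Since the nested-tuple trees used here carry no merge scores, a node's height is assumed to be 1 plus the height of its tallest child; libraries such as SciPy use the actual merge distances instead. The names below are hypothetical.

```python
import math
from itertools import combinations


def cophenetic_heights(tree):
    """{(a, b): height of the node where leaves a and b first join}.

    Assumption: a node's height is 1 + the height of its tallest child,
    with leaves at height 0 (the trees here carry no merge scores).
    """
    heights = {}

    def walk(node):
        if not isinstance(node, tuple):
            return 0, [node]
        child_heights, leaf_sets = [], []
        for child in node:
            h, leaves = walk(child)
            child_heights.append(h)
            leaf_sets.append(leaves)
        height = 1 + max(child_heights)
        for left, right in combinations(leaf_sets, 2):
            for a in left:
                for b in right:
                    heights[(min(a, b), max(a, b))] = height
        return height, [leaf for leaves in leaf_sets for leaf in leaves]

    walk(tree)
    return heights


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when all heights are equal
    return sxy / math.sqrt(sxx * syy)


def cophenetic_correlation(tree1, tree2):
    """Pearson correlation of cophenetic heights over all leaf pairs of two trees."""
    h1, h2 = cophenetic_heights(tree1), cophenetic_heights(tree2)
    pairs = sorted(h1)
    return pearson([h1[p] for p in pairs], [h2[p] for p in pairs])


tree_a = ((0, 1), (2, 3))  # hypothetical tree in one cell type
tree_b = ((0, 2), (1, 3))  # hypothetical rearranged tree in another
print(cophenetic_correlation(tree_a, tree_a))  # 1.0, up to rounding
print(cophenetic_correlation(tree_a, tree_b))  # about -0.5
```

Identical trees give 1; a value such as the 0.84 quoted in the lecture indicates a largely, but not fully, conserved hierarchy.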
In the figure I am showing now, each ball is a chromosome, and what it schematically represents is that chromosome's network of contacts. This is a zoom, and the simple network you see there is nothing more than a representation of the contacts: when you see a link between two sites, it is because there is a contact. This opens up everything you know about network analysis to this type of contact data. Nevertheless, the key take-home message is that the network of contacts within a chromosome is much stronger than the contacts across chromosomes, by two orders of magnitude. As you would expect from the chromosome territories I showed you yesterday, there are strong interactions within a chromosome; those are the real architectural determinants of folding. But then, as this picture roughly summarizes, distinct chromosomes also tend to interact in a non-random way, so there are also contacts across chromosomes. The nucleus can thus be seen as the final level of a hierarchy of those networks, of those trees I showed you before: a network made of the distinct networks of contacts within each chromosome, with the system organized at a global scale. This is the impression that is emerging.

You have already seen this slide; I show it again to give you a break and to summarize what I did. Hopefully I conveyed the message that the 3D organization of chromosomes is crucial for functional purposes. Even at the scale of single genes, within a TAD, the regulators of a gene and the gene itself have to come into contact, so there is an organizational scale really at the level of single genes.
Then you see from the data that such an organization extends across scales: from the scale of TADs to the scale of metaTADs comprising entire chromosomes, up to (this is a nucleus) functional, non-random interactions across chromosomes in a global organization that is formidable, if I may say so. The way our cells are controlled is, in some way, written in such a formidable organization.

What I would like to start now, and in a few minutes we will have a real break, is the second part of my lectures: how we can understand the patterns I discussed with you not just through heuristic approaches, some clustering algorithm or whatever, but through hard physics, that is, from the principles of physics. The second part of my lectures will be about that, and I will try to guide you through the steps that have been made in the literature to date. I insist on this; I think I have conveyed the frustration I have with heuristic definitions of patterns. You can find the patterns, you see them, so they are there. But when they are defined heuristically, you never know what you are looking at, so whatever you extract, you never know whether you can trust it or whether it depends on the definition. That is why in this field physics is making important contributions to understanding the mechanisms, the real definitions of the patterns, their origin, and so on, and why there is an important quest for fundamental theories of those patterns: what are the fundamental mechanisms that originate them?

To guide you through this, let me start again from an analysis of the bare experimental data. What I'm showing you here is what was observed at the beginning: how the average contact probability within a chromosome decays as a function of the genomic distance. It is the following.
On the y-axis you have the probability that two sites are in contact, and on the x-axis their genomic distance, s. When you average over all the chromosomes in the system, you get the blue curve. This was of course already noted in the first paper that introduced the Hi-C technology, from which the data I'm considering are derived. And it was exciting, because you have a sort of power-law behavior. In fact, I think we have to be very critical about power laws, but at least as a first impression, you see a straight line in a log-log plot. Then, if you look more closely, it is more complex than that. Maybe there is a power law: the authors of that paper found it in that range, which is one decade, and on a log-log scale practically any function looks like a power law over one decade. Nevertheless, if you want to find a power law, yes, there is power-law behavior there.

The problem with that power law is that its exponent is close to -1. The reason why that is a problem is that the basic models of polymer physics, which I hope Angelo introduced, give something else; take, for instance, the self-avoiding walk model. Have you heard about it? Do you know what a self-avoiding walk is? OK, not everybody. It is the gold-standard model of non-interacting polymers: essentially a random chain that cannot overlap with itself. If you take a self-avoiding walk as a model for a polymer, you expect the contact probability to decay with an exponent of about 2.1 in three dimensions, very far from what is seen here. That is why a quest started for a polymer model that describes the data. What you see here in brown is what has been dubbed the fractal globule model of chromosomes.
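Fitting the exponent amounts to a straight-line least-squares fit of log P against log s, P(s) = A s^(-alpha). The sketch below, with a hypothetical function name, also returns r_squared; as the lecture warns, a high r_squared over a single decade is weak evidence for a genuine power law. The synthetic data and the exponent 1.08 are made up for illustration.

```python
import math


def fit_power_law(s_values, p_values):
    """Least-squares fit of P(s) = A * s**(-alpha) as a straight line in log-log space.

    Returns (alpha, A, r_squared). A high r_squared over a narrow range,
    e.g. a single decade, is weak evidence for a true power law.
    """
    xs = [math.log10(s) for s in s_values]
    ys = [math.log10(p) for p in p_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return -slope, 10 ** intercept, 1 - ss_res / ss_tot


# Synthetic curve over one decade (0.1 to 1 Mb) with a made-up exponent of 1.08.
s = [10 ** (5 + k / 10) for k in range(11)]
p = [3.0 * x ** -1.08 for x in s]
alpha, amplitude, r2 = fit_power_law(s, p)
print(round(alpha, 3), round(r2, 6))  # 1.08 1.0
```

Running the same fit chromosome by chromosome, rather than on the genome-wide average, is how one sees that the exponent varies between chromosomes and cell types.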
The fractal globule is a known non-equilibrium, transient state that some polymers go through when you start from the right initial conditions. For instance, take a self-avoiding walk, compress it into a very small sphere, and then release the constraint. If you look at how the system evolves in time, there is a time window where you see this fractal-globule behavior. I'm trivializing something slightly more complicated, but more or less this is it: a transient window in time that a self-avoiding-walk polymer goes through if you start it from the right initial conditions. The point is that within that window of time the contact-probability exponent of the fractal globule is 1, and you see that it fits the data decently well: the data are not exactly at 1, but close. So at the beginning there was enthusiasm: we have understood how chromatin folds, it is a fractal globule.

But then a number of complications arose. The first is shown here. Suppose you take a self-avoiding walk or the fractal globule model. By definition, the contact matrix is featureless, because these models are uniform along the chain and their contacts are random, so there is no reason to see anything other than a genomic-distance effect. You capture, at least over one decade, the decay of the average contact probability with genomic distance, but all the features, all the TADs, all the action in the data is completely lost. So it became clear that the fractal globule cannot be a model, or at least not a fully general model, for chromatin. It may be that in some circumstances some portion of some chromosome does behave as a fractal globule, but it is not the general model. And we showed that there are a number of pieces of evidence for that.
For instance, if you look at the data on TADs and, instead of taking the genome-wide average, you look at different chromosomes separately, you see that even at the level of the average contact probability (forgetting about the patterns), different chromosomes or different organisms have different power laws. These are the different colors: data from different cell types. And you see that if you want to fit a power law, you can, but the result depends on the system. What I'm showing you here are two different chromosomes; take one example, this one. In gray you have the average across chromosomes, but if you look at distinct chromosomes, you see that they deviate around it, so each chromosome has its own exponent. This rules out a single, universal conformation for everything: if you look at the details of the system, they depend on the cell type, on the chromosome you consider, and so on. So the quest was: how can we reconcile that with a simple description? Can we go back to statistical mechanics, to polymer physics, and explain the data? This is what I want to do, but first, what time is it now? I can't see it; I always forget. Our lecture goes up to 10:45, so shall we have a 15-minute break? See you here sharply at a quarter past 10. Thank you. Is there a secret entrance there? No. Can you go out from there? No, no, I don't know.

So the starting point for trying to build first-principles theories of how chromosomes fold was roughly the following. If we want to understand, from a physics point of view, why chromosomes take the shape they take, we have to focus on the mechanisms that produce the interactions, exactly as we do in ordinary physics. And one scenario, of course, is the one depicted in my slide.
The idea is that two DNA sites come into contact because there is a molecule, something, holding them together. That scenario is summarized by the strings and binders model, which, in fact, we introduced before Hi-C data became available. And the idea is literally nothing more than what I said here. You have binders; those molecules are binders. And as you understand from the color scheme, you have, for instance, green binders, which can bridge green regions along the polymer, and red binders, which can bridge red regions. What I want to discuss with you, step by step, are the advances that were possible within this framework. And if I have time later today or, I guess, tomorrow, I'll discuss other mechanisms that we think are important in folding chromosomes. So the scenario is the one I mentioned before, and it's this. Take a piece of a chromosome. You model it as a polymer with the gold standard of polymer physics, the self-avoiding walk that Angelo discussed: literally a string of beads that cannot overlap one another. The variant with respect to the self-avoiding walk model is that along the chain there are binding sites, here marked in dark red, for the binders of the strings and binders model. So the string is the polymer, and those are the binders. And the binders can only bridge pairs of sites with the same color. That's it. So this is a standard model of an interacting polymer, which Angelo may already have introduced. We'll go step by step. What's the idea? Suppose this is a good model for folding a chromosome. It's clearly oversimplified, but let's take it as a starting point. The idea is that we can use Newton's equations, with the Hamiltonian of the system, to predict how the system folds, and then check whether that explains the data or not. So the first thing I would like to do with you is to explain the basic physics of this type of interacting polymer model.
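The lecture specifies only the ingredients of the model: beads that may carry a colored binding site, and binders that attract only sites of their own color. A minimal sketch of the attractive part of such an energy, assuming a square-well contact of range `cutoff` and depth `epsilon` (in units of k_BT) in place of a Lennard-Jones-type potential; the `Particle` class, parameter names, and values are hypothetical. Chain connectivity, excluded volume, and the dynamics (Newton's equations with a thermostat) are deliberately omitted.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Particle:
    position: Tuple[float, float, float]
    color: Optional[str] = None  # None marks an inert polymer bead with no binding site


def binding_energy(beads, binders, epsilon=4.0, cutoff=1.5):
    """Attractive energy in k_B T: -epsilon for every binder/bead pair of the
    same color closer than `cutoff` (hypothetical square-well contact)."""
    energy = 0.0
    for binder in binders:
        for bead in beads:
            same_color = bead.color is not None and bead.color == binder.color
            if same_color and math.dist(binder.position, bead.position) < cutoff:
                energy -= epsilon
    return energy


# Three beads along a line: red site, inert bead, red site.
chain = [Particle((0.0, 0.0, 0.0), "red"), Particle((1.0, 0.0, 0.0)), Particle((2.0, 0.0, 0.0), "red")]
red_bridge = Particle((1.0, 0.5, 0.0), "red")      # touches both red sites: a bridge
green_binder = Particle((1.0, 0.5, 0.0), "green")  # same place, wrong color: no binding
print(binding_energy(chain, [red_bridge]))    # -8.0
print(binding_energy(chain, [green_binder]))  # 0.0
```

A single binder in contact with two same-colored sites counts twice, which is exactly what a bridge is energetically: it pays off only when two distant sites of the chain are brought together.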
To let you understand what is going on here, suppose you have no binders in this model, none of those particles. In the end you have a freely, randomly arranged polymer, because the polymer, subject to thermal fluctuations, just moves around, and there is no constraint. You have literally nothing more than a randomly folded chain. That's it. That conformation, if you think about it, is a thermodynamic state, because the system spontaneously folds that way: if you stretch the polymer and let it go, it naturally refolds into such a random conformation. So that's a thermodynamic state of the system, and it's called the coil state, or open state, of the polymer. That's the state shown here. This is the system's phase diagram in its simplest version; the full one is slightly more complex. On the x-axis is the concentration of binders, called c_m: how many of those you have. If you have no binders, as in my example, you are far to the left on this axis. The other control parameter, as you immediately understand, is the interaction strength, on the y-axis: the binding energy, that is, how strong the interaction between a binder and its cognate binding site is. If there is no interaction energy, of course, the polymer is free again. So at low binding energy, you are again in the thermodynamic state where the polymer is randomly folded and open. Now, you see from the diagram that there is another thermodynamic state. Let me see if I have a picture. Yes, I have one. Suppose you start from the open state and start adding binders. With a binder, you can have a situation like the one described above: you have a binder, whatever it is, and by chance it can form a bridge and loop the polymer.
What happens is that the binder, floating around randomly, finds its first target and sits there. Then, by chance, another red binding site floating around may collide with it, and a bridge is formed. Now, if you think about it, this is very unlikely and entropically very disfavored: the binder has to find its target by chance, and before it moves away, another binding site has to reach exactly the same location in space and form the bridge. Very, very unlikely. That's why the phase diagram has a broad region with the open state: when you start increasing the binder concentration, that's not enough to fold your polymer. At the beginning, it doesn't succeed. But then a phase transition occurs, and let me try to explain why. Suppose you increase the concentration of binders, so you have a lot of binders around, and suppose that by chance you form a first bridge, as before. Once the first bridge is formed, if there are many other binders locally, the chance of forming a second bridge is much higher, because these two segments are already close by. And the second bridge reinforces the first and helps the formation of a third. So there is a threshold concentration above which, once the first contact forms, it helps the assembly of the others. There is literally a transition point where the entropy loss due to the collapse of the polymer into loops is balanced by the energy gain from forming all those bridges. It's an energy-entropy balance, as in standard phase transitions. The other phase, which the polymer enters, is called the globular or compact phase: compact because the conformation the polymer takes is much more compact than the open one, since all those bridges are formed and the polymer is looped onto itself. And that's a thermodynamic phase transition.
It occurs at a transition point, in fact a line, because, as you see, it also depends on the affinity. That transition line is called the theta point of the polymer. Beyond it, you enter the new thermodynamic phase. Here we have the power of statistical mechanics, whereby we know that there are universality classes: those major thermodynamic phases correspond to a vast class of polymer models, and they are not affected by the tiny details of the model. That's why I didn't even mention exactly what type of bond you have here: a Lennard-Jones or shifted Lennard-Jones potential, a hydrogen bond, whatever. The Nobel Prize-winning concept of universality tells us that the structure of the phase diagram and the properties of those thermodynamic phases are independent of the tiny details of the model. That's why we can hope that, even with very simplified models, such as the Ising model for magnetism, we can explain much more complex cases, such as the folding of real chromosomes. So the advantage we have with respect to biologists is that we know there was a Nobel Prize for universality, we know there was a Nobel Prize for polymer physics, self-avoiding walks included, and we know the power of very simple models. With them, we can hopefully describe from very simple principles what is happening in folding. When this was understood, an important step ahead was made, because now you can reason as theoretical physicists do in high-energy physics. You can ask: in what energy range do you expect this type of phase transition to occur? What concentration of factors do you expect? That's shown here. You see that the binder concentration is of the order of nanomoles per liter. Now, of course, I'm not delving into those numbers in detail.
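To get a feel for what "nanomoles per liter" means inside a nucleus, the concentration can be converted into a copy number with N = c · N_A · V. A small sketch, assuming a nuclear volume of about 500 µm³ (roughly a sphere 10 µm across); that volume is an illustrative assumption, not a number from the lecture.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
LITRES_PER_CUBIC_MICRON = 1e-15  # 1 µm^3 = 1e-18 m^3 = 1e-15 L


def molecules_in_volume(concentration_nm, volume_um3):
    """Expected number of molecules at a nanomolar concentration in a volume given in µm^3."""
    moles_per_litre = concentration_nm * 1e-9
    return moles_per_litre * AVOGADRO * volume_um3 * LITRES_PER_CUBIC_MICRON


# Assumed nuclear volume of ~500 µm^3; not a value from the lecture.
for c in (1, 10, 100):
    print(f"{c:>4} nM -> about {molecules_in_volume(c, 500):.0f} molecules")
```

Under this assumed volume, each nanomole per liter corresponds to roughly 300 copies of a factor per nucleus.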
What is interesting is that those numbers correspond to the known typical concentrations of binding factors, such as CTCF, in the nucleus of cells. And the energy range, a few k_BT at room temperature, is that of hydrogen bonds. So it is exactly what you expect: weak biochemical energies for the formation of this type of interaction. You don't want permanent bonds, because this is a regulatory mechanism. You want to form the contact when you have to activate a gene, and dissolve it when you want to deactivate the gene. So you don't want strong covalent bonds; you want hydrogen bonds, the weak bonds of biochemistry. And so the theory naturally predicts that this type of phase transition happens in the right energy and concentration range, much as the Standard Model predicts that the Higgs boson must lie in a given energy range. And if this is true, there is another intriguing point: we can understand how the cell controls those changes of conformation. Suppose you want to activate a gene: you want to bring the regulator into contact with the gene. How do you do that robustly and sharply, without fine-tuning? You want it to be sharp, because that's vital. In this scenario, we understand this from statistical mechanics, because there is a phase transition. Suppose that at the beginning you have no binders in the system and you start producing them. Because this is a phase transition, there is a threshold point. If you are below the transition, even though you may have some binders around, produced by chance, you are not folding your polymer: you are in the open state. But as soon as you cross the threshold, the phase transition point, you don't need to fine-tune exactly how many particles you want. No. If you cross the threshold, you fold. You change state. You activate the gene, fully and sharply.
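To put "a few k_BT" on a familiar chemical scale, here is a small conversion sketch; the temperature (298 K) and the sample energies are assumptions for illustration. It converts k_BT to kJ/mol and evaluates the Boltzmann factor exp(E/k_BT), which shows how strongly a bond of that energy is favored while still being breakable by thermal fluctuations.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact SI value)
AVOGADRO = 6.02214076e23  # 1/mol (exact SI value)


def kbt_in_kj_per_mol(temperature_kelvin):
    """Thermal energy k_B T expressed per mole, in kJ/mol."""
    return BOLTZMANN * temperature_kelvin * AVOGADRO / 1000.0


def boltzmann_ratio(energy_in_kbt):
    """Relative Boltzmann weight exp(E / k_B T) of a state lowered by E, ignoring entropy."""
    return math.exp(energy_in_kbt)


kbt = kbt_in_kj_per_mol(298.0)  # assumed room temperature
print(f"k_B T at 298 K is about {kbt:.2f} kJ/mol")
for e in (2, 4, 8):
    print(f"{e} k_B T = {e * kbt:.1f} kJ/mol, weight factor {boltzmann_ratio(e):.0f}")
```

So a few k_BT is roughly 5 to 20 kJ/mol, the typical scale of hydrogen bonds, whereas covalent bonds are of the order of hundreds of kJ/mol, around a hundred k_BT, and effectively permanent on biological time scales.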
So if the scenario is correct, we may understand how cells can easily control complex processes such as those I described. The theory has a number of appealing elements. But let's test it, because we can now make quantitative predictions and compare them with experiments. We have quantitative data, exactly as in atomic physics or high-energy physics: hard quantitative data against which we can compare our models. This is a perfect scenario for theoretical physics. This slide is just to say that the phase diagram is more complex; I don't want to delve into that, otherwise we'll never get out of here. I didn't have to show this. But quickly, depending on the interaction strength, the system also forms ordered arrangements: it's not a random lump like the one I showed before, but something more ordered. To make a comparison with the data, I want to show you how the contact probabilities, starting with the average contact probability, change in this toy model across the different thermodynamic phases. Yes, I understand your question. What I said is that this is the number of binders. Oh yes, of course, absolutely. What you are saying is that if you have no binding sites, you cannot fold the polymer, so their density is in principle another ingredient. To cut a longer story short, what matters is the product of the affinity and the number of sites, because that's the total binding energy you can gain. For simplicity I'm not showing that, but yes, that's another parameter; I'll get to it. Because you see, now we can predict. Just as in high-energy physics you can say, I expect those particles acting at that energy scale and producing that interaction, here we can do exactly the same. I'll come to that.
So, very quickly: to compare a model with experiments, we first have to see how the average contact probabilities behave in the toy model I'm showing you. This is how they behave. This is the average contact probability in the toy model I showed you before, the string with binders of one color, as a function of the genomic distance, or rather the distance along the polymer. And as you know from the universality concepts of polymer physics and phase transitions, the contact probability depends only on which thermodynamic phase you are in. It doesn't depend on details such as exactly how many binders or binding sites you have. If you are in the open state, it's one thing; if you are in the other thermodynamic state, it's another. The two are shown here. Again illustrating the concept quickly: in the open state, the average contact probability versus genomic distance is indeed a power law, and you know the exponent: it is the self-avoiding walk value, about 2.1 in three dimensions. And now you see it's really a power law over a few decades, not just over a fraction of one. All of this is well known from the good old polymer physics of the 80s, maybe even earlier. If you move into the other thermodynamic state, the globular state, the structure changes, so the contact probability changes too, and it looks more like the pink curve. The violet one is roughly at the theta point. And the point is that those contact probabilities, at least their general behavior, do not depend on the details, so it's either one or the other. So the idea we proposed is the following. A chromosome is a very complex thing: you have seen that there are complex patterns of molecules bound to it, CTCF, Pol II, and so on. So we proposed that a chromosome is not in a single state but is a mixture of pure states.
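The open-state curve is measured from an ensemble of conformations by counting how often two monomers a distance s apart along the chain are closer than a contact cutoff. Simulating a self-avoiding walk properly takes more machinery, so the sketch below swaps in the simpler ideal (Gaussian) random walk, for which the textbook contact exponent is 3/2 rather than the self-avoiding value of about 2.1. Chain length, number of chains, cutoff, and fitting window are hypothetical choices; the sketch requires NumPy.

```python
import numpy as np


def ideal_chain_contact_probability(separations, n_beads=200, n_chains=2000, cutoff=2.0, seed=0):
    """Estimate P(s) for an ideal (non-self-avoiding) Gaussian random walk in 3D."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(size=(n_chains, n_beads - 1, 3))  # unit-variance Gaussian bonds
    positions = np.concatenate([np.zeros((n_chains, 1, 3)), np.cumsum(steps, axis=1)], axis=1)
    probabilities = []
    for s in separations:
        distances = np.linalg.norm(positions[:, s:, :] - positions[:, :-s, :], axis=-1)
        probabilities.append(np.mean(distances < cutoff))  # fraction of pairs in contact
    return np.array(probabilities)


separations = np.array([10, 14, 20, 28, 40, 56, 80, 113])
p_of_s = ideal_chain_contact_probability(separations)
slope, _ = np.polyfit(np.log(separations), np.log(p_of_s), 1)
print(f"fitted exponent: {-slope:.2f}")  # roughly 3/2 for an ideal chain
```

Changing the cutoff or the bond variance shifts the prefactor but, for s large compared with the cutoff, not the exponent, which is the universality argument in miniature.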
The idea of a mixture of pure states is that along a single chromosome you may have a portion that is in the compact phase, because you have to activate the genes there, and another that is in the open phase, because you don't have to activate the genes there. So a single chromosome is a sequence of different folding states. And in a population of cells, a given locus, a given region, can be open in one cell and closed in another. So when you look at Hi-C data, what you actually see is a mixture, as if you mixed a paramagnet and a ferromagnet together and looked at the average properties. And at the beginning, to our surprise, such simple, basic polymer physics models explained the data to an unexpected extent. This is real data: the average contact probability I have already discussed. In this panel the different colors are different cell types; in this one, different chromosomes. This is the average contact probability versus genomic distance, over roughly two or three decades. The gray region is the one where you could fit a single power law, whose exponent nevertheless depended on the system, the chromosome, and so on. Instead, you take those two curves from textbooks. This is standard polymer physics; you don't even need to compute them. By mixing the two, say 50% of cells in one state and 50% in the other, or 50% of regions in one and 50% in the other, you see that, with only one fitting parameter, the fraction in the mixture, you can fit comparatively well essentially all the average data available to date. So this was an indication that maybe such a simple model is not too far from reality. You can also understand why the curves bend instead of being pure power laws, and why this depends on the system. It depends on the system because, as is well known in biology, there are chromosomes that are very gene-rich.
And there are chromosomes that are instead very gene-poor, so you expect them to differ. This is what you recover with this standard approach. As for your question, I think I see what you mean. The reason I say there is one parameter is that those two functions, the average contact probabilities in the open and in the compact state, are universal, so they are model-independent. If you are in the open state, you find one; if you are in the other state, you find the other. So you have no fitting parameters there. In fact, I am exaggerating a little, because small details, such as where the plateau sets in, do depend on the model; they are not universal. But the universal part of those curves is given by basic polymer physics. Any interacting polymer has a coil-to-globule transition: you decide which polymer model you want, and you get those curves. So all you need to produce those fits is the composition of the mixture. I don't know if I'm making myself clear. However, you're right: technically speaking, the model parameters are those I discussed before, the number of binding sites, the concentration of binders, and the binding affinity, plus the mixture. At this point, we start making predictions. Say this is chromosome 19: what mixture best fits chromosome 19? We derive that, and then we compare it with what is known from biology. It turns out that chromosome 19 comes out comparatively open, consistent with its known biological properties. Chromosome X tends to be compact. It is known that in females, in particular, one of the two X chromosomes is completely compacted, for reasons I'm not explaining here; this is called X chromosome inactivation. We infer the fraction by fitting the first-principles model to the data, and once again we found that the fractions we extract make sense in the light of other biological knowledge.
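The fit described above has a single free parameter: the fraction f of the population (cells or regions) in the open state, with the observed curve modeled as f·P_open(s) + (1 − f)·P_compact(s). A minimal sketch of that one-parameter fit, assuming both reference curves are given on the same genomic distances as the data: a brute-force grid search over f that compares curves in log space, so that all decades weigh comparably. The open curve uses the s^-2.1 law quoted above; the compact curve is a placeholder with a plateau, not the textbook globule curve, and the "data" are synthetic.

```python
import math


def fit_mixture_fraction(data, p_open, p_compact, grid_points=1001):
    """Grid search for the open-state fraction f that best matches
    data ~ f * p_open + (1 - f) * p_compact, comparing in log space."""
    best_f, best_err = 0.0, math.inf
    for k in range(grid_points):
        f = k / (grid_points - 1)
        err = sum(
            (math.log(d) - math.log(f * po + (1 - f) * pc)) ** 2
            for d, po, pc in zip(data, p_open, p_compact)
        )
        if err < best_err:
            best_f, best_err = f, err
    return best_f


s_values = [10 ** (k / 10) for k in range(31)]            # three decades of genomic distance (arbitrary units)
p_open = [s ** -2.1 for s in s_values]                     # open-state power law quoted in the lecture
p_compact = [0.5 * s ** -0.5 + 0.01 for s in s_values]     # placeholder shape with a plateau, NOT the textbook curve
synthetic = [0.3 * po + 0.7 * pc for po, pc in zip(p_open, p_compact)]
print(fit_mixture_fraction(synthetic, p_open, p_compact))  # recovers 0.3
```

A grid search is used only because it is transparent and has no dependencies; any one-dimensional minimizer would do the same job.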
With this type of model, you can also investigate other data, such as FISH data; why limit ourselves to Hi-C? FISH data are instead based on microscopy. FISH works as follows: you attach fluorescent probes to two positions on a chromosome and measure, for instance, their distance. You can measure how the distance between the two probes depends on how far apart they are along the chromosome, and there are tons of data like that. This briefly summarizes what the strings and binders model predicts. Again, the mean squared distance between the two probes depends on which thermodynamic phase you are in. In the open phase, guess what? It is a power law set by the standard, known self-avoiding walk exponent, 0.588 in three dimensions (the typical distance grows as s^0.588, so the mean squared distance grows as s^1.176). In the compact state, instead, it plateaus. And this is real data, from different cell types and different chromosomes: you see how the mean squared distance between two probes on the same chromosome depends on the genomic distance between the two probes. The black line is the prediction of the strings and binders model, and the dashed line is what the fractal globule, which I mentioned before, would predict. No match at all. So once again, with no additional hypotheses, you can fit FISH data too. But one set of data that I personally found particularly striking is the one I'm showing now. Here we are really delving into polymer physics. What I'm showing you is the so-called moment ratio: the average of the fourth power of the distance between the two probes, divided by the square of the mean squared distance, <r^4>/<r^2>^2. This quantity is crucial in polymer physics because it is dimensionless, so either your model predicts it correctly or not. And I'll show you the comparison between the strings and binders model and real data.
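The moment ratio is simple to compute from a set of measured distances. A minimal sketch (requires NumPy; sample size and seed are arbitrary) that also checks two reference properties: the ratio does not change if all distances are rescaled, which is what makes it free of parameters, and for an isotropic three-dimensional Gaussian distribution of separation vectors its exact value is 5/3 (in two dimensions it would be 2). Real chains, self-avoiding or compact, are not exactly Gaussian, so the values quoted for the model and the data need not coincide with these reference numbers.

```python
import numpy as np


def moment_ratio(distances):
    """Dimensionless ratio <r^4> / <r^2>^2 from a sample of probe-to-probe distances."""
    r2 = np.asarray(distances, dtype=float) ** 2
    return np.mean(r2 ** 2) / np.mean(r2) ** 2


rng = np.random.default_rng(1)
vectors = rng.normal(size=(200_000, 3))  # isotropic Gaussian separation vectors
distances = np.linalg.norm(vectors, axis=1)
ratio = moment_ratio(distances)
print(f"3D Gaussian sample: {ratio:.3f} (exact value 5/3 = {5 / 3:.3f})")
print(np.isclose(moment_ratio(2.7 * distances), ratio))  # rescaling the distances leaves it unchanged
```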
So what I'm showing you here is FISH data: real experimental measurements of that moment ratio, taken from that reference. You see that experimentally the moment ratio is scattered, but it has a sort of lower bound at roughly 1.5. If you think about it, this is surprising, because the moment ratio could in principle be anything. Instead, looking at the scatter of the data, you see an accumulation point at 1.5. Why? We understand it from thermodynamics: look on the right at what the strings and binders model predicts. Once again, you know from statistical mechanics that when you are well within a thermodynamic phase, fluctuations are essentially Gaussian, so the moment ratio sits at a fixed value, here about three halves. And indeed it is about three halves in both the open and the compact phase. And as you cross the theta point, the transition point, it changes and grows, up to roughly 5. Remember, this is dimensionless. So there is a complete correspondence for a dimensionless quantity that is totally independent of any parameter in your model. It is dictated by the fundamental result of statistical mechanics that fluctuations are Gaussian deep inside thermodynamic phases. This explains the range of values observed, and why there is this limiting value of three halves, approached from above. And so, if we take this seriously, it tells us that different regions, different portions of chromosomes, tend to be near thermodynamic equilibrium in one of the allowed thermodynamic phases. And this comparison also predicts that the reason why some values scatter upward may be that in a cell there are regions close to the theta point: they are changing, being regulated, opening or closing.
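The lecture attributes values above the baseline to regions sitting near the theta point, switching between open and compact. One way to see why coexisting states push the ratio up is a deliberately crude stand-in: a population split between two Gaussian distance distributions of different sizes. This is a hypothetical illustration, not the strings and binders calculation at the theta point. By Jensen's inequality, the mixture's ratio is never below the single-population Gaussian value and equals it only when the two sizes coincide.

```python
def mixture_moment_ratio(fraction, var_a, var_b, dim=3):
    """<r^4>/<r^2>^2 for a mix of two isotropic Gaussian populations.

    A share `fraction` has per-component variance var_a, the rest var_b.
    For one Gaussian population: <r^2> = dim * var and <r^4> = dim * (dim + 2) * var**2.
    """
    m2 = dim * (fraction * var_a + (1 - fraction) * var_b)
    m4 = dim * (dim + 2) * (fraction * var_a ** 2 + (1 - fraction) * var_b ** 2)
    return m4 / m2 ** 2


print(mixture_moment_ratio(0.5, 1.0, 1.0))  # single population: 5/3
for size_ratio in (1.5, 2.0, 3.0):
    # size_ratio: how much larger the typical distance is in the second population
    print(size_ratio, round(mixture_moment_ratio(0.5, 1.0, size_ratio ** 2), 3))
```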
This way, we could also make sense of the existence of that transition region in real cells. I was very, very excited by that, as you can tell from how I talk about it, because there are no fitting parameters, there is a striking limit in the data, and you explain it so naturally just by invoking the Gaussian nature of fluctuations in thermodynamic phases. Okay, but now I want to take a further step. What I have discussed to date are average properties. If I want to know, on average, the contact probability as a function of genomic distance, or the average physical distance as a function of genomic distance, we can derive that from basic concepts of polymer physics. But what about the patterns in the data? The patterns are where the action is: when I see a spot of interaction, it is because an enhancer is contacting a gene, and I want to predict that. So let me try to guide you through how we think those patterns can arise. We want to explain not just the average behavior with genomic distance but also why there are blocks of interactions, what they are, and so on. Again, the problem with the fractal globule and the other simple models I told you about is that they are patternless, featureless: no regulator interacting with known enhancers, just a sort of random mess. So in the remaining part of this lesson, without entering into technical calculations, I would like to give you a glimpse of how patterns can be obtained, and in the next and, hopefully, final lecture tomorrow, I'll go further. Let me guide you through this. Suppose you have a variant of the simple toy model I showed you before. Rather than a polymer with one type of binding site, red, and one type of binder, red, you now have another segment in the toy model with green binding sites and green binders. For this audience, I don't need to explain what happens.
You immediately understand that the two different segments have their own thermodynamic phase transitions. So, for instance, if you go into the compact, globular state, the system spontaneously folds into a structure where you clearly see a red lump and a green lump, just because each segment is brought together by its own bridging factors, red binders and green binders, and that's that. If you compute the contact matrix in this toy model, trivial at this point, you get a pattern of squares. You know why: the red sites interact only with red ones, and the only reason you have some remaining interactions with the green ones is chance. The two blocks are not fully independent, because a polymer links the two, but that's it. So the average contact map is that one, and visually it recalls TADs. How do you obtain more complex patterns? Once again, it's just a matter of understanding the Hamiltonian. Suppose you take that toy model and add a third color, blue, but now the blue binding sites are interspersed within both the red and the green segments. It's trivial again: you understand that the two blocks start interacting with each other. In fact, there is something non-trivial, because it turns out that the red and the green blocks each remain intact. If you think about it, that's a matter of proximity: you first fold the red and the green, and then the blue glues the two together. I don't know if I'm making myself clear. It's non-trivial that you keep the two substructures and that they are then glued, bridged, by the blue. But this is what happens, and now polymer physics starts playing a slightly more exciting role. If you compute the contact map of this system, it looks like that: you have two blocks, as before, as we expected.
But you see that there are now stronger interactions between the two blocks, just because there are blue binding sites and blue binders that can bridge them. So this is a higher-order domain, and this is what is written in the data if this model is correct. TADs would simply be distinct globular structures produced by specific binders, and I will come back to that. Higher-order domains, instead, come from particles that also mediate interactions between distinct blocks. And so we can make sense of TADs, metaTADs, and so on. But I don't want to leave you with the impression that this is all hand-waving: we can explain the experimental data to 95% with this. That will be the topic of the lecture I will give tomorrow. See you then, unless you have questions. Okay.