Okay, maybe the first thing I would like to ask is whether there are any questions on the topics we have discussed together, including general questions on the basis I'm starting from. Is the biological basis clear enough, or should I spend some more time discussing it? All clear.

What I would like to do now is to start delving into more detail within the framework I highlighted in the first part of my lecture. I think I will only give two more lectures, so overall what I will discuss is, on one hand, what the data is and what the data is telling us about how our chromosomes are folded. That is what I call part one on the slide. Then in part two of my lectures, I guess tomorrow and the day after, I'll try to move on instead to the use of polymer physics and machine learning to make sense of the data, and to what could be the physical principles behind the organization I discussed with you.

I will touch on a number of topics, and the first thing I really want to do is to mention at least some of the main co-workers I have on this. I will discuss approaches developed in collaboration with Ana Pombo in Berlin at the MDC, with whom we have an outstanding collaboration, but also Stefan, Max Planck, Jim, Dag, Marieke in Oxford, Josie and Mikael, Colin in Edinburgh, Davide also in Edinburgh, Hanperhella in Naples and, most importantly, I really want to stress the driving force behind most of what I'm going to discuss with you, which is the team working with me in Naples and in part in Berlin.

Okay, so what is the data, and what is the picture emerging from the analysis of the data? I told you already that we know that chromosomes are not randomly folded in the nucleus of cells.
You recognize the picture: this is the human cell nucleus and those are the different chromosomes. You see they form territories, a sort of geographical map, and they tend to have a specific pattern, statistically speaking: say, chromosomes one and three tend to be close to each other, five is further apart, and so on. As I told you, the limitation of microscopy is a resolution limitation. So how can we really go to the scale of single genes and single regulators to understand who is contacting whom? That was done, in a real major step forward in the field, by Job Dekker roughly 10 years ago when he devised the Hi-C technology. Anja told me he has already given you a glimpse of what that is about, so if I'm getting too boring please stop me; otherwise I will try to summarize it for you again.

The idea behind Hi-C is very nice and very simple, and it's roughly the following. We want to measure who is contacting whom genome-wide. Suppose you have two sites, the blue and the orange, which are in contact in some way, with something holding them together. By using special enzymes, restriction enzymes, you can cut your genome into fragments. Those enzymes cut at specific positions genome-wide, so after their action the genome is scattered in pieces. If you think about it, you have two types of pieces: single fragments, those that were not interacting with anything else, and pairs that are held together by the fact that they were originally in contact. Then by some biochemical trick you can ligate the ends of those fragments and you come up with a circle. The circle includes a portion of DNA that originally was at a given position and another portion, the orange one, that was at another position. You sequence those, and now it is just a matter of sequencing power, because you sequence what you have. When you sequence a fragment like those (you change them into something like that, I don't want to enter into many details), you sequence two portions that should be very far apart along the genome, so you can map where one piece of the fragment is from and where the other is from. You just count how many times the two fragments have been sequenced together, and you come up with a map like this: this is called a Hi-C contact map. The entries here tell you how many times those two fragments, the orange and the blue, have been sequenced together, and so that is a proxy for how frequently the two are together. You can do that genome-wide.

In this type of experiment the resolution is set by the sequencing depth, that is, how much you can sequence the DNA you have. There is another technical limit, which is how big the fragments are: the average length of a fragment is given by the enzymes used, and so that is the other limiting factor for the resolution. With current developments you can go down to, say, hundreds of bases, so very, very short fragments.

In this technology there is an important bias, which is summarized here, and it is related to the ligation step. For the technology to work you have to bring together the two dangling ends of the fragments you want to sequence; this is the ligation step. The ligation step has a problem, if you think about it: it introduces a strong bias. Suppose, for instance, that you have three segments close to one another. I told you that one gene on average has four regulators, but consider the simple case where you only have three things together. When you ligate, by definition you ligate either one pair, or the other pair, or the other pair, out of all the pairwise combinations you can think of.
And so, when you sequence one of the ligation products, by definition you are sequencing only one of them. So although you have three doublets here (it is one triplet, three pairs), you only sequence one: you miss two pairs out of three, and so you have a bias of about 66% in the estimation of contacts. That is one of the reasons why we developed the other technology I mentioned, GAM. But anyway, Hi-C was a real major step ahead, because it was the first time that we could access, in very fine detail and in a quantitative way, the frequencies of interactions of genome segments at genomic scales. It was the first time we could access at genomic scales who is interacting with whom, which genes and which regulators, and we could start delving into that.

Absolutely. I don't know if you heard the question. The question is: is the pattern random, or do you see non-random patterns? I think if you stare at it you immediately see that there are non-random patterns, don't you? You see that there are blocks, blocks within blocks, and there are portions that are more strongly interacting than others. Oh yeah, exactly: this is precisely the type of question that the community is working on at this moment. I guess you're right; let me go very slowly on this.

This is the contact map of an entire chromosome. This is in mouse. This is human chromosome... no, this is mouse chromosome 14. So this is almost the entire chromosome 14, which is roughly 100 megabases, and this is telling you which portions of it are interacting with one another. I think what you notice is that there are interactions at different length scales. You have, let me call them, more local interactions, although on that scale these are distal interactions. Look at this block: say this is roughly 10 megabases, and you find that you have a block of interactions. This means that within 10 megabases things are contacting each other in a non-random fashion. But what I think you notice is that there are also interactions in bigger blocks.
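To make the counting step behind a contact map like this one concrete, here is a minimal sketch in Python. It is not the actual Hi-C pipeline: it assumes the read pairs have already been mapped to positions on a single chromosome, and the function name `contact_map`, the bin size and the toy coordinates are all illustrative assumptions.

```python
import numpy as np

def contact_map(read_pairs, chrom_length, bin_size):
    """Count how often two genomic bins were sequenced together.

    read_pairs: iterable of (position_a, position_b) in base pairs,
    one entry per sequenced ligation product (hypothetical input format).
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = np.zeros((n_bins, n_bins), dtype=int)
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        counts[i, j] += 1
        if i != j:
            counts[j, i] += 1  # keep the matrix symmetric
    return counts

# Toy data: two ligation products link bin 0 and bin 2, one stays within bin 1.
pairs = [(120_000, 1_450_000), (130_000, 1_480_000), (900_000, 950_000)]
cmap = contact_map(pairs, chrom_length=2_000_000, bin_size=500_000)
print(cmap)
```

Each entry of `cmap` is the number of times two bins were sequenced together, which is the proxy for contact frequency discussed above. Finer bins need more read pairs to fill the matrix, which is why sequencing depth limits the resolution.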
And if this is 10, this is 20, 30, so very distal interactions. Maybe what you notice is that there are also interactions at a much bigger scale, at the scale of the entire chromosome. This is the picture which I think is emerging from the data I'm going to discuss in a second: far from random, and long-range. And this is exciting because, you see, from a physicist's point of view this is quantitative data. It's not just a picture saying, okay, the two are contacting; we have frequencies of contact, or a proxy for them, with all the biases I mentioned. So what can we do with that?

The GAM technology essentially was an alternative to Hi-C, and I told you the idea is to avoid the ligation problem. In fact we started earlier than Hi-C; it took 10 years, though, to make all the steps I discussed before. In a nutshell, I repeat the concept: to avoid the ligation step you use a statistical idea. You slice the nucleus, you see which loci are present in each slice by sequencing, and by collecting statistics you can decide whether A and B are in contact or not. The advantage here is that you can do that without ligation, so without that bias with respect to Hi-C, and you can do it for very few cells, because we work at the single-cell level. In the standard approach Hi-C is not single-cell, so you have to squash a number of cells together to have enough DNA to reconstruct the contacts. And you see the patterns as well, so the patterns are there independently of the technology you use.

Now this is what I want to discuss, so I'm following up on your question. This is again the map I showed you, same chromosome, chromosome 14, roughly 100 megabases, and I think by staring at it you see the patterns by eye, as we already started discussing. But an important step ahead was made a few years after the introduction of Hi-C, with slightly better data. What a group in California and one in France discovered is what happens if you zoom in along the diagonal. Notice this is 2 megabases and this is 100, so this is roughly two orders of magnitude higher in resolution: you are looking within one of those blocks at very, very high resolution with respect to the previous experiment. What came out is that the contact matrix is organized as a sort of block-diagonal matrix, and I think you have understood what that means: there are blocks of interactions. The chromosome is composed of a sequence of regions where each region has strong interactions with itself and much weaker interactions with the rest of the chromosome. You see, if I move along the chromosome, there is this block here which is strongly interacting with itself, then there is another portion which is also strongly interacting with itself but much more weakly with neighboring objects, and then another. Those blocks, you see by eye from the scale, are roughly one megabase; in fact between half a megabase and one megabase. They have been named, a very quirky name for physicists, topologically associated domains, TADs. But anyway, that is to say that you can see a chromosome as a sequence of regions with strong interactions with themselves and very weak interactions with the rest of the chromosome. This is the picture which emerged, a cartoon of how chromosomes fold, in 2012: you have different regions which interact strongly with themselves and very weakly with the others.

However, I think by staring at the data, as you noticed, you see that it is more complex than that. But first let me express my frustration with the way in which TADs are defined: they are defined in a very heuristic way. I think all the techniques you have studied in the previous lessons in this course, and your background as physicists, are enough to go well beyond the state of the art in the definition of TADs. They are defined this way, or at least this is one definition; by now there is a whole literature on the topic, but the basic definition of TADs is roughly the following.
Suppose you take a site on your chromosome. You literally count how many interactions it has on the left and on the right, you take the difference of those two numbers, and you normalize in some way. This was named the directionality index, DI. If you plot DI along the sequence you see that it is a sort of step function: it is very positive in one portion, then it becomes negative and remains negative, then positive again. By eye, you see that more or less the blocks of consistent sign correspond to the blocks that you see in the matrix. But my frustration is that this is by eye, in the sense that: would you call this the passage to a new TAD, or just a little fluctuation in the data? That is why there is a literature on the topic of how exactly we define that, and my impression is that this is going to depend on the depth of the data, so the quality of the data, the level of noise, et cetera. But even today, practically, this is the heuristic definition of what a TAD is. And of course what came out is that if you have better data, then you start discovering structures within the TADs. So it is no longer one megabase but half a megabase, and then it's no longer half a megabase, it's 100 kb, because there are sub-TADs, and so on. In fact what we showed is that you have to be careful with this and you really have to take into account confidence thresholds and so on. But what I would really like to convey to you is that any good idea in this field is welcome: you can improve on this even without me lecturing any further, I guess.

But anyway, as I told you, the impression from staring at the data is that not only do you have fundamental blocks of interactions, and maybe blocks within blocks, but you also have bigger structures. I showed you the map of the entire chromosome and you immediately saw interactions at the chromosomal scale, so orders of magnitude bigger than single TADs. We showed that, again in a very stupid and heuristic and frustrating way, which I will try to illustrate just to give you a sense of how the picture we have of how our chromosomes fold has changed.

We ran our own experiments. This is again Hi-C data, again in mouse cells. To be precise, what we did was an experiment in a model system where you move from embryonic stem cells to neural precursors and then to neurons, post-mitotic neurons, because we wanted to have two types of information: on one hand how chromosomes fold, and on the other hand how that changes during differentiation. Is the chromosomal architecture of a neuron different from the architecture of an embryonic stem cell, as you asked at the beginning? So we have a time course with three points, we have our own Hi-C data, and we have transcription data, CAGE data. CAGE is a technology that gives information on which genes are being transcribed, and I guess you have understood the reason: we want to link architectural changes to transcriptional changes.

What you see here is an example of our data. By the way, this is from 2015, the original work; it's only three years ago but it's already old in this field in terms of the quality of the data. This is 5 megabases, and I've changed the color scheme with respect to the one I showed you before, used by Job and co-workers, which was just red and white. The data being slightly better, we added shades to give a sense of the scales of interactions. I think here you immediately perceive what you could already see in the original data, that is to say that there are, yes, blocks of interactions along the diagonal. If you want to call them blocks, these would be the TADs, the squares with the black numbers. But you clearly see that the TADs do interact with one another. Look at one and two: you clearly see that they form a higher-order structure. You see what I have in mind; you know what higher-order structure is in proteins: it means that you have a local folding and then a folding at a bigger scale, whereby distal portions come together. Here it is the same: you see half-megabase blocks come together in a one-megabase block, and those come together in a two-megabase one.

I'll explain it better. You're right, I have not described that yet, which is why you are confused. Let me go to that; I was trying to first introduce the concept and then delve into it some more. By staring at the data, if you define TADs the way I showed you before, so very heuristically, the TADs are those blocks, but you see by eye that they are interacting. This is the point I'm trying to make: you see that one and two are forming a bigger block, and four, five and six are also forming another block, and this block is distinct from that one, and so on. A heuristic approach to identify higher-order structure, as he asked, is the one that we followed. It is a very naive, very stupid one, and it is the following. Suppose you want to find the most likely candidates for forming a higher-order structure. Well, you have to look at the pair of TADs which share the most interactions. So we start from the list of TADs, in this cartoon the list of TADs, and we bring together the pair which is the most strongly interacting and call it a higher-order domain: by definition this is the most likely candidate to be forming a higher-order structure. We add that higher-order domain, we call it a metaTAD, back to the list of the remaining domains, and iterate. So this is stupid hierarchical clustering, really stupid, really naive, maybe totally wrong. Nevertheless, the idea is that, starting from the TADs identified as I mentioned before (and I don't like how they are identified, but whatever TADs are, we start from them), at each iteration we are clustering together the most interacting pair of objects, and so by definition that is the most likely candidate.
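The iterative merging just described can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the published procedure: it assumes TADs are given as contiguous, ordered bin ranges, that only neighbouring domains may merge (so that a merged domain stays contiguous), and that the interaction between two domains is the total number of contacts in the corresponding block of the matrix. The name `build_metatads` and the toy matrix are hypothetical.

```python
import numpy as np

def build_metatads(contacts, tads):
    """Greedily merge the most strongly interacting pair of neighbouring domains.

    contacts: symmetric bin-by-bin contact matrix.
    tads: ordered list of (start_bin, end_bin) ranges, end exclusive.
    Returns the list of merges (left, right, merged), i.e. the tree.
    """
    domains = list(tads)
    merges = []
    while len(domains) > 1:
        # Interaction of each neighbouring pair: contacts shared by the two domains.
        scores = [
            contacts[a0:a1, b0:b1].sum()
            for (a0, a1), (b0, b1) in zip(domains, domains[1:])
        ]
        k = int(np.argmax(scores))
        left, right = domains[k], domains[k + 1]
        merged = (left[0], right[1])
        merges.append((left, right, merged))
        domains[k:k + 2] = [merged]  # put the new metaTAD back in the list
    return merges

# Toy matrix: four 2-bin TADs; 0 and 1 interact strongly, as do 2 and 3.
contacts = np.zeros((8, 8))
contacts[0:2, 2:4] = 5
contacts[4:6, 6:8] = 4
contacts[2:4, 4:6] = 1
contacts = contacts + contacts.T
tads = [(0, 2), (2, 4), (4, 6), (6, 8)]
tree = build_metatads(contacts, tads)
for left, right, merged in tree:
    print(left, "+", right, "->", merged)
```

On this toy input the first two merges pair up the strongly interacting neighbours and the last merge joins the two resulting metaTADs, which is the tree structure of domains within domains.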
And you come up with a sort of tree, as you expect from hierarchical clustering, and the tree tells you at each level what the organization is at that scale. Now, this is really naive, and the first thing we had to do, of course, was to check whether the objects defined in this way really carry some statistical information. You bring together the most interacting pair, but is that pair interacting above noise or not? Of course you can apply this to whatever you want, but maybe the objects you are defining, the higher-order structures, are meaningless from a statistical point of view, with interactions comparable to background. So we of course made a number of tests, including experiments, to show that metaTADs do exist, and I want to mention only one for the sake of brevity.

Please. Absolutely, exactly, I like this; this is the first question that comes to the mind of any statistical physicist. So: is there scale invariance, if you apply Kadanoff transformations to those matrices, how do they look? I will get to that, and it is not as trivial as you might think. It's not just a power law; let me put it this way, it's more complex. But anyway, let me go exactly in that direction, step by step. At some point we have to stop; it's already late.

Very quickly: one of the tests we made to show that those objects, the higher-order structures, are real, at least from an architectural point of view, is shown here. I'll try to guide you through this and then we meet again tomorrow. The idea is the following. Suppose you have two domains, that one and that one, whichever they are, fundamental TADs or higher-order metaTADs. We define their interaction I as literally the number of contacts they share in the Hi-C or GAM matrix, so you count how many interactions they have overall. I of course is interesting, it gives you a measure of how strongly they are interacting, but you want to compare that with a certain, say, null hypothesis, the background. So we produced a random control model, which I want to briefly mention because this is important. The random model is the following. You take the Hi-C data and you randomize the matrix, but you do not fully randomize it: you randomize sub-diagonal by sub-diagonal. You see immediately by eye that there is a genomic distance effect, and you don't want to mix up the trivial genomic distance effect, which you must keep and which you would also have in a background, random system, with real interactions. If you think about it, a sub-diagonal is the locus of pairs which have the same genomic distance, so we are randomizing only the contacts among all the pairs at that given genomic distance. That is why the randomization, the bootstrap, is done sub-diagonal by sub-diagonal. In this way the random model retains the trivial effects linked just to genomic distance: you expect that interactions at large genomic distance are weaker than interactions closer to the diagonal, and you want to keep that, it is trivial. But then all the other patterns are washed away. So that is the control system we have.

So I come to the test. What we measure, then, is the real interaction I and what we call the control or background interaction, that is to say what I would be in the control system I just described. I'm going to discuss that after answering this question. Right, this is a bootstrap, so you resample from the same distribution and you can produce as many samples as you want, so you have a very good quality random model.

What we find, to cut a long story short, is shown in this plot. You see here the ratio of the real interaction to the control interaction as a function of the size of the metaTADs considered, where the size of a metaTAD is counted as the number of fundamental TADs it includes. In green you have roughly the expected random background, and in blue instead is the real signal. The expectation we had was: if there are only TADs in the system, with no higher-order structures and no scaling whatsoever, that is, if TADs do not interact with each other or their interaction is comparable to what you expect in a background system, then the blue signal should rapidly collapse onto the green. Instead you see that the blue remains statistically above the green up to huge scales comprising hundreds of TADs. I told you that a TAD is roughly half a megabase, so 200 TADs is an entire chromosome. This is showing you what you noticed by eye the very instant I showed the data: that there are significant interactions at chromosomal scales.

What we found is that this holds not only in our mouse model, not only in mouse embryonic stem cells, not only in mouse neural precursors, not only in mouse neurons; it holds in human cells too. All the higher mammals for which Hi-C data are available show that hierarchical organization of chromosomes. And that is tested and confirmed by FISH experiments: we ran experiments by FISH microscopy, so a different technology, to show that indeed bigger portions of chromosomes come together at very large genomic distances. So this has changed the picture we have of chromosomes. At least the way we think of it, we no longer have single independent TADs; whatever those TADs are, and I don't like the definition of TADs, as you understand, they tend to coalesce and form higher-order structures, exactly as in proteins. There is a hierarchical organization of domains within domains across scales, from very tiny scales to very big scales, orders of magnitude bigger, comprising entire chromosomes.

Well, I really would stop here. So see you tomorrow, unless you have questions.
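As a rough illustration of the control model described above, the following Python sketch randomizes a contact matrix sub-diagonal by sub-diagonal and compares the real interaction between two domains with its background value. Everything here is an assumption made for illustration: the function names `shuffle_subdiagonals` and `interaction`, the synthetic matrix with a 1/(1 + distance) decay, and the use of a permutation of each sub-diagonal (resampling with replacement would be a closer reading of "bootstrap").

```python
import numpy as np

def shuffle_subdiagonals(contacts, rng):
    """Permute contacts within each sub-diagonal, i.e. at fixed genomic distance."""
    n = contacts.shape[0]
    control = np.zeros_like(contacts)
    for d in range(n):
        shuffled = rng.permutation(np.diagonal(contacts, offset=d))
        idx = np.arange(n - d)
        control[idx, idx + d] = shuffled
        control[idx + d, idx] = shuffled  # keep the matrix symmetric
    return control

def interaction(contacts, dom_a, dom_b):
    """Total number of contacts shared by two domains given as (start, end) bins."""
    (a0, a1), (b0, b1) = dom_a, dom_b
    return contacts[a0:a1, b0:b1].sum()

rng = np.random.default_rng(0)
n = 40
# Synthetic map: contacts decay with genomic distance...
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
contacts = 100.0 / (1.0 + dist)
# ...plus an extra, non-trivial interaction between two neighbouring domains.
dom_a, dom_b = (0, 10), (10, 20)
contacts[0:10, 10:20] += 20.0
contacts[10:20, 0:10] += 20.0

control = shuffle_subdiagonals(contacts, rng)
real = interaction(contacts, dom_a, dom_b)
background = np.mean([
    interaction(shuffle_subdiagonals(contacts, rng), dom_a, dom_b)
    for _ in range(200)
])
ratio = real / background
print(f"real / control interaction: {ratio:.2f}")
```

Because each sub-diagonal keeps its own set of values, the decay with genomic distance survives in the control while the extra block is washed out, so a ratio above one signals an interaction beyond the trivial distance effect.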