Excellent. Welcome, Gabriele, and thank you for showing yourself. And welcome to everyone who has just joined. You didn't really miss anything. We were just giving a broad introduction, except that there are some small schedule alterations because, unfortunately, Shaya Mukherjee is ill and could not come. The changes to the schedule are on the website of the school, Antonio, right? So check there and you'll find out. Without further ado, let me introduce Gabriele Schweikert, who is our first lecturer. She is a PI at the University of Dundee in Scotland, in the United Kingdom, where she runs a computational epigenomics group. Gabriele has a degree in physics from Munich and then went on to do her PhD at the Max Planck Institute for what in those days was called Biological Cybernetics, and is now called Intelligent Systems, in Tübingen, under the supervision of Gunnar Rätsch. After that, she moved to Edinburgh, where she worked with some excellent people, including Sir Adrian Bird, the father of epigenomics. Since 2018 she has been running her own group in Dundee. She has received a number of prestigious awards, including a research fellowship from the Medical Research Academy (what exactly it is called, I cannot remember) and also a UK Future Leaders Fellowship, which supports her research in Dundee on computational epigenomics. And with that, Gabriele, the floor is yours. I will certainly enjoy it and I hope everyone else does too. If there are questions in the chat, I might just speak up and interrupt you, Gabriele, is that okay?
That's wonderful, of course. Thank you very much, Guido, for the very kind introduction and indeed for the invitation. I would have loved to come to Italy this December. Sadly, it was not possible due to family constraints. So I'm speaking to you from Scotland, and it's beautiful here too.
So I thought that today I would start by giving you an introduction to epigenomics, given that we have quite a few lessons together. I think I'm scheduled for today, tomorrow, and then again on Friday, so we have quite a lot of time for some in-depth motivation and an introduction to the biology that I'm interested in, which is epigenomics. I take it that this is potentially a very interdisciplinary audience, so I thought it might be useful to explain a little bit about that. Usually people know a lot about genomics, but epigenomics is maybe still not quite as well known. I do hope you can see me and hear me well, and I will now share the screen with you. Here we go. Wait a second. I have to say, this is the first Zoom talk that I'm giving since lockdown, I suppose. We have now moved pretty much back to in-person. So I'm sharing my screen, but I think you're probably not seeing it. No, we don't see it yet. Okay, does that work for you? Do you see the screen now? You should be seeing the title slide. Yes. Okay, perfect. And I think it would be amazing if this is interactive, since it's obviously for you. So do interrupt me, either in the chat, and Guido will interrupt me, or raise your hand, or just unmute yourself to ask questions if something is unclear. Okay. So I will start with the very basics. I've already been introduced by Guido, but from a biological point of view, I suppose I'm also very much described by the genetic material that is stored in my DNA. And as you probably all know, this constitutes a two-metre-long DNA molecule. I have something like a two-metre string here in front of me. It consists of A, C, G and T, so it stores quite a lot of information.
If you think about the number of letters in that two-metre-long DNA string, it's roughly three billion letters, which is the equivalent of 4,000 copies of Charles Darwin's On the Origin of Species. So that's quite a lot of words for something as complex as a human individual. But that's not the end of the story, of course. It's not just the DNA that makes us unique. As you all know as well, we are made up of a very large number of cells. The precise number of cells varies, of course, and it's also only an estimate, but it's estimated that there are 40 trillion cells in the human body, which is a proper universe. What is remarkable is that many of these cells can be classified as a certain cell type. So, for example, nerve cells or skin cells, muscle cells, and here you have these beautiful hair cells that are in the inner ear and help your hearing. And all of these cells, of course, have exactly the same DNA string that encodes the genetic information. And yet they have a huge variety of different phenotypes and functions. It is actually still a matter of research how many different cell types there are in the human body. Is it 150? Is it 300? What is the precise number of different cell types? And how is it possible that all these different cells have such different phenotypes, but all contain the same instruction material? And then there's another little bit of a puzzle, which arises when we compare different organisms. So, for example, take the mouse and the human, and look at the same cell type, for example Purkinje cells. They have different DNA, as we know, but the cell types look remarkably similar. And that's probably the reason why we can use model organisms, for example mouse or even fruit flies, and learn something about humans by studying these other organisms.
So cell types can be very similar across organisms despite not sharing the same DNA. And I have to get rid of this. Okay, that's better. So phenotype and function clearly depend on the specific genetic programs, the specific books that are actually active in any given cell at any moment. What you read is really what matters and what determines who you are. And how is that accomplished in the molecular context of the cell? That's where epigenetic mechanisms come into play. The cell has a number of tricks in its pockets that allow it to very specifically access certain genetic programs at a given time point. On the right here, you see an electron tomographic image of a cell nucleus. What you are seeing in red is one chromosome, in green another chromosome, and so on. And clearly the DNA is not nicely sorted as in the picture of the library that we have seen before. It seems to be quite a mess. So the question is, how can you relatively quickly access the right books, the right genes, when you need them? One of the tricks that the cell uses is DNA methylation. What DNA methylation does is locally change the physical properties of the DNA. So, I've now managed to click myself away. I'm trying to find myself again. Here I am. What I'm trying to show you is this: if you have a long string of DNA and it's all muddled up in your cell nucleus, how do you very quickly find some particular words or books in that mess? One thing that you could do is use a little bit of tape. And I've done that here. It's a different color, but mainly it changes the property of the DNA. So I can find that bit very easily, even though the string is compressed into a ball. This is the way I'm thinking about DNA methylation. It basically allows you to find certain bits of the DNA very quickly.
DNA methylation is shown here as these little red dots in the left corner. That's just adding a methyl group to the DNA molecule. In reality, it would be the inverse of that: the methylation covers most of your string, but there are little bits where there's no methyl group, no CpG methylation. We'll discuss that a little bit more in a moment. There's a second layer of epigenomic control, which is these histone proteins that we are seeing in yellow down here. The DNA is wrapped around histone proteins, forming something like a beads-on-a-string type of architecture. And again, this is quite controlled. You have 147 base pairs that are wrapped around one histone protein, forming what we call a nucleosome. The spacing or positioning of these nucleosomes is not random. At least in some important places, like at the beginning of genes, it's certainly very organized. And you can change the position of these beads with chromatin remodellers, which allows certain other machinery to actually access the information on the DNA. In addition to that, you have histone modifications. That's symbolized here with these yellow circles and the red triangles. Here you change bits of these nucleosomes, these histones, and you add something like a bookmark, a very specific chemical modification along the genome, so that you have a record that some process has already happened along this gene. Maybe the gene has been transcribed in the past, or you are marking it for transcription. There are some other levels of control, namely non-coding RNAs, but I'm not going to go into much detail about those. So I will first tell you a little bit about histone modifications. And then in the second part, I will tell you more about DNA methylation.
And then tomorrow I will discuss more of the machine learning side and how we can make sense of quite a lot of data there. But before I do that, I want to dig deeper into what epigenetics actually means. There has been quite a lot of debate about it, and it's quite interesting. When we discuss science, things are usually presented as facts. But it turns out that within the community there is actually a discussion around these facts, around what we believe in and how we understand certain words, theories and concepts. And sometimes that's quite interesting to follow. So, epigenetics. This was an important paper recently from New York. Mark Ptashne said there are some core misconceptions. In that paper he writes, "and finally to that dreaded word epigenetics". So he clearly doesn't like that word. "Histone modifications are often called epigenetic. One can only wonder why." So here I've introduced you to a biological mechanism, histone modifications, and the majority of people would classify them as epigenetic mechanisms. But he says histone modifications should not be considered epigenetic mechanisms. So let's think about what epigenetics is, how it was defined and what the problem is, because that is really quite important for understanding why these mechanisms matter. The word epigenetics was first coined actually in Scotland, in Edinburgh, by Conrad Waddington. He defined the epigenetic landscape. What he had in mind was how you arrive at differentiated cells from a pluripotent cell. As you know, we all derived from a single cell, in prehistoric times in my case at least. Then these cells start to differentiate into, for example, neurons or skin cells and so on. Waddington described that in analogy to physics, really, as a ball rolling downhill through a certain landscape.
And the landscape in this case is what he calls the epigenetic landscape. So it starts out as a pluripotent cell, then there come a number of cell fate decisions, and eventually you end up with a large number of different cell types. The remarkable thing is, of course, that these developmental processes are driven without any changes in the DNA at all. So there must be some other mechanisms, which weren't known at the time of Waddington, that drive these developmental processes. And while he didn't know what kind of processes these were, he called them epigenetic changes. Very importantly, some experiments came about which showed that what he suggested, that the differentiation process is irreversible, so that the ball can only travel downhill, is actually not true. We can now reverse the process and induce pluripotency. There have been experiments by Takahashi and Yamanaka showing that we can reprogram already differentiated cells into a pluripotent state. This is something that is quite routinely done in the lab today. And the amazing thing is that you only need to add four transcription factors to achieve this reprogramming. So the forced expression of four specific DNA-binding regulatory proteins can reprogram, in this case, fibroblasts to stem cells. So now you know that you have transcription factors which recognize specific motifs on the DNA, bind to the DNA, and then control gene expression. If that is the case, what role do the epigenetic mechanisms that I have just described actually play, like DNA methylation, chromatin remodellers, and histone modifications? So the question here is: is it really the transcription factors that cause this differentiation to happen, and are the epigenetic mechanisms that I was describing not causal in that scenario?
So this is the definition of epigenetics that Waddington gave: epigenetics is the processes by which the genotype brings the phenotype into being. It was similarly formulated as the systems that regulate the expression of the library of specificities, that is, the genetic material, which is meant to be the DNA or RNA sequence. What is clearly important about this definition is that it has to do with regulation and control. When we turn that into something more related to machine learning, the discussion is really about the direction of this arrow here in the middle, where you have epigenetic mechanisms that cause the phenotype or expression. And in the reprogramming case, it seems that transcription factors are doing the heavy lifting. Transcription factors are the very important parts here. So in that sense, are epigenetic mechanisms really epigenetic in the Waddington sense? Do they cause gene expression? Of course, we also know that when we have a correlation between X and Y, in this case histone modification and expression, it doesn't have to be that X causes Y. It could also be that there's a hidden confounder, for example the transcription factor binding sites, which cause both the histone modification and the expression. There is a paper by Guido, actually, where he showed that transcription factor binding predicts histone modifications in human cell lines. So this would be a result that supports this kind of model. And then you have another definition of epigenetics, which turns the arrow around. It says that epigenetics is important because it provides the cell with a memory. That's quite important, because you have to maintain your cell state and remember it.
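The hidden-confounder scenario described above can be made concrete with a small simulation. This is only a sketch under invented assumptions: the variable names, effect sizes and noise levels are made up for illustration and do not come from the lecture or from any real data set. Transcription factor binding drives both the histone mark and the expression level, while the mark has no effect on expression at all. The mark and expression still come out correlated, and the correlation disappears once we look separately at bound and unbound genes.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_genes=5000, seed=0):
    """Toy model (assumed, not measured): TF binding causes both mark and expression."""
    rng = random.Random(seed)
    tf, mark, expr = [], [], []
    for _ in range(n_genes):
        bound = 1 if rng.random() < 0.5 else 0   # hidden confounder
        tf.append(bound)
        mark.append(2.0 * bound + rng.gauss(0.0, 1.0))  # mark depends only on TF
        expr.append(2.0 * bound + rng.gauss(0.0, 1.0))  # expression depends only on TF
    return tf, mark, expr


tf, mark, expr = simulate()

# Correlation across all genes: clearly positive, although mark never causes expression.
r_all = pearson(mark, expr)

# Correlation within each stratum of the confounder: close to zero.
r_bound = pearson([m for t, m in zip(tf, mark) if t == 1],
                  [e for t, e in zip(tf, expr) if t == 1])
r_unbound = pearson([m for t, m in zip(tf, mark) if t == 0],
                    [e for t, e in zip(tf, expr) if t == 0])

print("all genes:    ", round(r_all, 2))
print("TF bound only:", round(r_bound, 2))
print("TF unbound:   ", round(r_unbound, 2))
```

In real data the confounder is usually not observed this cleanly, which is part of why the direction of the arrow is so hard to settle.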
What we'll also discuss in a moment is that if you're thinking about, for example, tumorigenesis, then in these cases the cell will lose some of its acquired identity and start doing things that it shouldn't do. For example, it starts dividing again, it might start migrating in the body, all things that it shouldn't do in its fully differentiated state. And therefore, epigenetic mechanisms might be a very important way to provide the cell with a memory. So here's a definition that says epigenetics is the study of mitotically and/or meiotically heritable changes in gene function that cannot be explained by changes in DNA sequence. So it's about heritability, but for the mainstream of epigenomics and epigenetics it is inheritance between generations of cells rather than individuals. Another definition is: properties of a cell, mediated by genomic regulators, that confer on the cell the ability to remember a past event. And that's really important. For me, the most plausible way of understanding epigenetics is that it provides a barrier. When you're starting from a pluripotent cell with a single genome, and you then move into different precursor cells and down the path towards different differentiated cells, you need barriers so that cells can't move between different cell states. In the case of a disease state like cancer, these barriers would be removed and the cells would change. And then Adrian Bird came and said that all of these mechanisms are probably true. So epigenetics is the study of structural adaptations of chromosomal regions so as to register, signal or perpetuate altered activity states. And I think the most important thing to understand is that this is a very difficult model of causal relationships. Epigenetic mechanisms can cause gene expression.
We also have transcription factors, which may cause both the modification... I apologize. Apologies, that's me back. This lecture is really bringing me back to lockdown. I'm sorry. That was our lockdown puppy, which is now a little bit of a horse, and he just wanted to say hello. So where have I been? Okay. Why is the study of epigenomic mechanisms so difficult? It is because there's a complex causal structure involved, where you have regulation and control on the one hand, and a memory function on the other hand. To disentangle these two is really difficult. So why are we interested in epigenomics in the first place? It's because we see that a lot of these patterns are changed and go wrong in disease, both in developmental diseases and, for example, during tumorigenesis. And then it is not only important to understand correlations; we want to understand causal relationships and the mechanisms that cause a certain phenotype. In this case that's really hard, because we have these very difficult causal relationships. Okay. So this was the philosophical overview of what epigenetics is. With that, I'm going back to the mechanics of epigenetics, and I will start by talking about histone modifications. Histone modifications are shown here as these yellow dots or the red triangles. I've already said they are really important because they seem to be changed in a number of different diseases, from developmental disorders to cancer. So we want to understand them a little bit better. We already know that there are specific enzymes in our cells that we call epigenetic writers. For one of these histone modifications, which we call H3K4 trimethylation, for example, there are six different writers.
It's a family of proteins which establish these marks, both in a cell-type-specific way and in a temporally changing way. Then we also have erasers, proteins that can remove these marks. And we also have proteins that can specifically recognize these marks and then recruit other transcription factors or the expression machinery and so on. So what has been emerging here for a long time is that there's actually an additional code. You have writers and readers, and you have erasers. So it's possible that there's a histone modification code on top of the DNA molecule. And I think it's quite exciting to think that we could potentially decipher this additional code. But we are not there yet, I would say. What is also very important is that some of these marks are highly correlated with gene expression, so with the reading of a given book or a given gene. I have to explain this slide a little bit. This is very important, because this is what we are going to work on. What you're seeing here is a bit of the genome, in this case of yeast. The x-axis shows you a stretch of the genome, of the DNA string. The green boxes here are genes. They are very dense in this case. The arrow tells you in which direction they are being transcribed. So these are transcribed in this direction, and this is the only one transcribed in the forward direction. What you can also see is that this gene is marked in green here. We can measure that it's highly expressed, so it's being read, while this gene is not expressed. It's repressed, it's silenced. And when we look at different histone modifications, for example H3K4 trimethylation, we find that it's highly correlated with the activation of genes like this one.
In particular, we usually find it at the beginning of the gene, at the promoters. I think this mark here is actually H3K4 trimethylation. We can measure the enrichment of this mark along the genome. That's what this symbolizes. We are doing that across an ensemble of cells, and I'll discuss the experimental measurement tomorrow, I think. But this just means that there's a high likelihood of H3K4 trimethylation at this location on the genome. And here, for example, there's no H3K4 trimethylation, and here there is also none. So what you can see is that where the gene is expressed, there's a high amount of H3K4 trimethylation, and where it's silenced, there's none. On the other hand, you have marks such as H3K27 trimethylation, which require different readers, different writers and so on, and they are more correlated with silencing. Again, this suggests that they have a functional role in controlling gene expression. Both of these facts, that you have writers and readers and that there's a correlation with active gene transcription, suggest that there's a histone code, an instruction for gene expression. However, you could also think about a code in a different way. When you go to the library again, you will also record who has actually read the book in the past. So a code is a system of symbols that represents a message and that can record information. It could be that when the gene is transcribed, you are setting an epigenetic mark to remember that this gene has been transcribed in the past and that it should remain transcribed, for example. So again, you have the instruction for gene expression and a record of previous transcription. And now if you are seeing changes, for example in a disease state, you would want to know: is that actually a causal change in the epigenome that causes changes in gene expression?
Or is it rather a byproduct, a consequence of altered expression? What we do know is that these epigenomic patterns change dynamically during development and differentiation. And with increasing single-cell data, we have an increasing understanding that there's also at least some causal relationship between these histone modifications and the phenotype. What you are seeing here is that when you look at single-cell data and order the cells according to their differentiation state, according to similarity, you find genes that are repressed, where you have the H3K27 methylation mark. Then they go through a poised state, where you have both of these marks. And in the last state, the active state, the H3K27 methylation has been removed, you have the transcription factor binding site, and the gene is transcribed. We can look at these mechanisms in a number of different ways. One way is to record the marks themselves across different conditions, or across different time points and cell lines like here. So these would be the actual marks. But we could also look at the enzymes, for example the readers and the writers. When we look at H3K4 methylation again, I already mentioned that there are six proteins in this family that establish these marks. All of these family members have been identified, in genome-wide association studies for example, as carrying mutations in very severe neurodevelopmental syndromes. That shows you that these writers are really important, and that it's not just the mark that has changed as a consequence of a disease. Also, when we are looking at tumorigenesis, we again observe abnormal changes in the epigenomic patterns.
So we can compare these histone modifications in tumor samples versus normal cells, and we see differences. But again, we also see that the readers and writers are very much implicated in a number of very different cancer types. This was a very comprehensive study that was trying to identify cancer driver genes and mutations across a large number of different cancers. It was pan-cancer data, and 299 cancer driver genes were identified in this case. A lot of these driver genes and mutations were shared across anatomical origins and cell types, and quite a lot of them harbour actionable oncogenic events. So if we are looking at the types of genes that are identified here...
There is a question in the chat. What do poised and repressed mean in this case? I think the concept of poised is particularly non-trivial.
Yes, that's a very good question. In this case, the gene is ready to be transcribed, but it's not yet transcribed. So it has the capacity of being transcribed. It's not fully silenced. I'm not sure there's a precise definition of poised. But it's a state where the gene is not yet transcribed, but has acquired quite a lot of properties such that it can be transcribed in the future.
Okay, thank you. Other questions? All clear so far.
Okay, so what I wanted to show here, and what I find quite fascinating, is that these processes are important during development, so early in our life, and this is demonstrated here with all these different early problems, but they are also really important later in life, during tumorigenesis and aging. So it is kind of the same in tumorigenesis. In terms of the epigenomics, there are a lot of things going in one direction or another in these two processes. So in the second part here, I'm talking more about cancer.
What is really interesting is this: if you are looking at these different cancer types, which you have on the bottom, these are all different sorts of cancer. And then you have genes which are identified in these genome-wide association studies as important for the onset of these cancers. You find that quite a lot of epigenetic DNA modifiers are implicated in quite a few of them. In chromatin, the SWI/SNF complex is very important in a large number of cancer types. The histone modification readers and writers are important, and so are other chromatin and histone modifiers. So the context, the way in which books are read in the library, is really important for cells to function. The important thing to remember is also that epigenomic patterns are, to some extent, reversible, because we have these readers and writers. And so they offer potentially really interesting drug targets. So again, this is the same family of proteins. You can see that they are not redundant: there's some overlap, but they are also implicated in a number of different cancer types. So the same family of proteins that I've shown you before for the neurodevelopmental diseases is also important for cancer development. So what are the challenges here? Why do we talk about epigenomics in a setting that has more to do with machine learning and data analysis? Well, what you can see here is that the data is high dimensional. And it's also quite tricky. It's not independent. This is a larger stretch of what I've shown you before. Again, you have the genes on the top and the different histone modifications on the y-axis. You can see that there are quite complex dependency structures between these marks. For example, here it's pretty clear that these marks are not written independently. They show very similar patterns.
That is because a lot of the complexes actually have domains which are readers and writers. So it's possible that you have a domain in your protein that recognizes a certain histone modification and then writes another one. So the marks are not independent, but this dependency structure also varies. I am not sure if I'm finding a good example, but you can believe me that histone modifications which are similar in one area of the genome may look quite different in another area of the genome. So there's a non-trivial dependency structure. We have large amounts of data now, but the data is actually quite sparse, because we have not only the individual component, we also have the different cell types, and potentially a temporal component as well. For example, these patterns change during cell division or during development. So age is important, and so on. So the data that we actually have is quite sparse. And again, it's very difficult to say, if we see changes in one part, whether they actually cause a change in the expression of the genes and ultimately in the phenotype of that given cell.
We have another question in the chat, from Tasmin. Can the epigenetic writers act as erasers and vice versa, or is the function exclusively associated?
That's a good question. The readers and writers are usually bigger complexes. They, and the erasers too, are made up of a number of different proteins. The erasers, in the first place, need to be recruited to the histone modification pattern at a given time. So they need to recognize this pattern, and so they contain readers as well. So basically there are shared elements between the complexes, but they are distinct. Yes.
Some of the complexes have something like six to eight different proteins, the proteins have different domains, the domains can be reader and writer domains, and some of them can also interact with the transcription machinery and so on. So it's really complex, and I think this combinatorial action is what makes the data analysis so tricky. Thank you. Okay. So this is, I guess, where machine learning can come into play. I was told I should start with the very, very basics, so I have a couple of slides on machine learning, just to give you an idea of how this could help. I sometimes also show these slides to biologists, so I hope this is okay. So when is machine learning useful? Well, I think it's useful when you have a lot of data available. I said the data is sparse. That's true. But at the same time, we do have quite a lot of data, at least more data than can usually be analyzed by visual inspection. There's also the assumption that there's a relationship between your different data sets, for example between an epigenetic mark and gene expression. So in a sense, the idea is that the data is predictable. The problem, however, is that we do not really know the laws that govern the relationship between X and Y. We know that there is a relationship, otherwise it doesn't make too much sense, but we don't understand the laws that govern it. Most of machine learning in general, or what you would probably even call AI, has to do with supervised learning. You start with a training set where you have, for example, X and Y measured for the same instances, for n genes, right? Then you try to find a function F such that F of X is Y. Once you have this function, you might get new data for X, and you can apply the function F to X and get new predictions for Y. And of course, there are a lot of different things that can go in the middle here.
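The supervised setup described above, learning a function F from a training set of (X, Y) pairs and then applying it to new X, can be sketched with a very small perceptron. Everything concrete here is an assumption made up for illustration: the two features (the signal of an activating mark and of a repressive mark at a gene), the six training genes, and the labels (1 for expressed, 0 for silenced). It is not the speaker's model or data.

```python
def train_perceptron(X, y, epochs=50, lr=1.0):
    """Classic perceptron learning rule for binary labels 0/1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            pred = 1 if score > 0 else 0
            update = lr * (yi - pred)          # zero when the prediction is right
            if update != 0:
                w = [wj + update * xj for wj, xj in zip(w, xi)]
                b += update
                mistakes += 1
        if mistakes == 0:                      # converged on the training set
            break
    return w, b


def predict(w, b, x):
    """Apply the learned function F to a new feature vector x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


# Hypothetical training set: (activating-mark signal, repressive-mark signal) per gene.
X_train = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1),
           (0.2, 0.9), (0.1, 0.8), (0.3, 0.7)]
y_train = [1, 1, 1, 0, 0, 0]                   # 1 = expressed, 0 = silenced

w, b = train_perceptron(X_train, y_train)

# New, unseen genes: apply F to get predictions.
print(predict(w, b, (0.85, 0.15)))
print(predict(w, b, (0.15, 0.85)))
```

Note that a model like this only captures the correlation between marks and expression in the training data. It says nothing about which way the causal arrow points.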
In this case, I put a simple perceptron model, but it could also be a much more complex deep learning model, or it could be a support vector machine, or it could be something completely different. But in many cases, this is to some extent a bit of a black box. And in most standard scenarios, this is based on correlation, not causation. If you find a correlation, it is very, very easy to think about causation. But as I was trying to make clear throughout this whole talk, this is much more difficult. And that's where I want to stop here with the histone modifications and the machine learning. I think we'll go into more detail later. One example that I want to show is where we have used a transformer model, in this case to make predictions of histone modifications themselves. As I said, it's a sparse data set, we can't measure all these different histone modifications across a large number of cell types. So we have developed our model to make predictions: given that we have seen some of the histone modifications, can we make a prediction on others? And we'll talk about that in more detail tomorrow. So here I'm going to continue with the biological motivation, and I'm going to talk a little bit more about DNA methylation. How much time do I have left? I think 20 minutes or something like that. Oh, no, you have much more, you have 45 minutes, but there is a question in the chat, and I'll read it to you. It says, I know ML is a suitable method for your problem, but I want to know about any modeling of the mechanism for epigenetics. Is there any model for that? So are there also mechanistic models that address epigenetics? I imagine you could decide whether globally or locally, to some extent. Yes, I think that's really important. I think it's going back and forth. I don't think there's one model that answers that question.
I think we haven't properly understood epigenomic mechanisms and how they contribute to the transitions, for example. But there are efforts to make these models more explainable. I guess the problem is also that we are lacking a gold standard to some extent. So I think it's always a cycle between making predictions and then going back to the lab, validating, doing experiments and so on. My sense is that if we are trying to make global assumptions, and that has been done for a long time, like H3K4 trimethylation is associated with active promoters, so it might be related to switching on genes, I think that is too simplistic. I think we have to understand more about the different types of promoters and the different contexts of these marks. Because we have seen experiments, for example, where some of the marks are almost completely removed, because we have knocked out some of the writers, but the transcriptional readout remained almost the same, or the same for most genes, except for a few. But this also depends on the cell type. A lot of these experiments have been done in ES cells. So is that the same in fully differentiated cells? And how about if we want to use these knockout models in ES cells and then try to differentiate these pluripotent cells? Are they still able to go through the same differentiation pathways? So I think in a lot of cases, there are so many different combinations that need to be tested, and it seems like there's no global answer to that question. So I think we have to go back to some extent to more specific cases. And I don't think there are models out there that satisfactorily explain the epigenomic code and what it does to transcription. But there are some efforts, and we can discuss some of that tomorrow, I think. Okay, thank you. Okay.
So the other thing is DNA methylation. That's the second of the epigenomic mechanisms that I want to discuss briefly. DNA methylation means that there's a methyl group added to the cytosine, so this one letter C in your DNA, and it usually happens in the context of CGs. These are just two-letter words, CGs, and in the context of these CGs, the C can get a methyl group added to it. What is quite interesting is that mCpG is very widespread in the mammalian genome: about 80% of CpGs are methylated. I've shown you this long string before, and I've said you can add a little bit of tape to it. But in reality it's the inverse: the majority of the string would be taped, and then there are little bits that are free from methylation. Right. And if you don't know too much about genomics, you might also be interested to know that we have about 20,000 genes on our DNA, and they make up only something like, I think, 3% of this very long DNA molecule. So there are long stretches of intergenic material. Some of the DNA is coding and some of it is not. The majority is not coding, but there could be other elements in it which are important functional elements, and for some of them we still don't know what they encode. And there are lots of repetitive elements and retroviruses which have entered the genome and so on, which are mainly silenced. And the majority of that as well contains methylated CpGs. And interestingly, that creates a certain structure, a certain landscape on the DNA. So here again, the X axis would be your DNA. And the way we are symbolizing CpGs and methylated CpGs is by these little balloons. So if it's black, it means that it's methylated.
And if it's an open circle, a white circle, it means that it's unmethylated. And it turns out that in the bulk genome, in the majority of the genome, you find fewer CpGs than you would expect by chance, and also that these CpGs are predominantly methylated. And then you have, on the other hand, so-called CpG islands, which have a high density of CpGs. These CpGs tend to be unmethylated. And they also tend to overlap with promoters and sometimes with enhancers and other functional elements. So you could very well imagine that this is a mechanism for finding these important bits, the starts of the genes, very easily, without having to read the whole DNA, just by changing the property of the DNA. Okay. What is also interesting is that when you look at these DNA methylation patterns, they are highly conserved, but they are very cell type specific. So in this case, this is human. It's part of a genome browser, so it's a bit difficult to read if you're not used to it. What you're seeing at the bottom is a number of genes, these blue ones here. You can see that, as opposed to the data that I showed you before, which was taken from yeast, this is human data, and the genes are much more spaced out. So there's a huge area here where there are no genes. This is actually still a dense area of genes, but you have an area here without many genes. And then the yellow lines here are DNA methylation, and DNA methylation in this case can vary between zero and one. Basically, it means that the molecules you are getting from a sample of lots of cells are either all fully methylated, so all of them have a methyl group at a given C, and that would be a value of one, or they are completely unmethylated, and that's a value of zero. So you have little CpG islands, for example, here. Do you see my mouse, my cursor? Yes, yes, we see it. Okay, so here's the CpG island, for example, the blue one here.
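Editor's sketch: the remark above that the bulk genome contains fewer CpGs than expected by chance is often made quantitative with an observed/expected ratio, computed here with a commonly used definition (number of CG dinucleotides times sequence length, divided by the product of the C and G counts). The function name `cpg_obs_exp` and the toy sequences are assumptions for illustration; no thresholds for calling CpG islands are implied.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for one DNA sequence.

    observed = number of 'CG' dinucleotides
    expected = (#C * #G) / length, i.e. what independent C and G
               frequencies would give by chance
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")   # 'CG' cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0          # no CpG possible; avoid division by zero
    return cpg * n / (c * g)

# Toy sequences (made up): a CpG-dense stretch versus a stretch where
# C and G are present but rarely adjacent in the CG order.
island_like = "CGCGCGCGCG"
depleted = "CCCCGGGG"

ratio_island = cpg_obs_exp(island_like)
ratio_depleted = cpg_obs_exp(depleted)
```

A ratio near or above 1 indicates CpGs at least as frequent as chance would predict, while values well below 1 correspond to the CpG depletion described for the bulk genome.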
And then you have a structure here, and that's very well conserved across all these different samples. It's a fine structure as well. But you have samples here from the brain, and they have a very different pattern. And then you have blood up here, and it's again very different. So these methylation patterns, where the yellow bits are the measured ones and the blue and green ones are actually inferred areas of hypomethylation, are strongly cell type specific, but they are also conserved across different samples of the same tissue, for example. And that's why we think that they might be functionally very important. And with DNA methylation we do have a better understanding of how important it really is. What is also really important about these epigenetic mechanisms is that you can actually inherit them from one cell generation to the next. And for DNA methylation we actually understand quite well how that happens. You probably know that we have the double-stranded DNA sequence, so we have a forward strand and a reverse strand. Wherever you have a G on the forward strand, you have to have a C on the reverse strand, and where you have a C on the forward strand, you have a G on the reverse strand. And then you read in this direction on the forward strand, and in the opposite direction, from right to left, on the reverse strand. And that means if you have a CG on the forward strand, you also have a CG on the reverse strand. Then if you have fully methylated CpGs, they carry the methyl group on both of their strands. And you could also have hemimethylated CpGs, which carry the methyl group on only one strand, or unmethylated CpGs. And this provides a mechanism for heritability. So there are enzymes that specifically recognize these hemimethylated CpGs that have only one methyl group. The enzyme is called DNMT1.
And they are creating these fully methylated CpGs. And then there are enzymes, again, which are erasers, and they are called TET proteins, which remove the methyl group. And you also have de novo methyltransferases, DNMT3A and B. So when you go through differentiation, you want to have the possibility to add new patterns. Therefore, you need the de novo methyltransferases. But you also want to remember that you are, for example, a progenitor cell on the pathway to a fully differentiated neuron. And while the cells are still proliferating, you have to bring that pattern over to the next generation. And that's done with DNMT1. And how does that happen? When the cell divides, your double-stranded DNA is going to be copied. You have the replication fork here. This part is going to go into one daughter cell, and this one is going to go to the other daughter cell. And when you newly synthesize your DNA, there won't be a methyl group on the new strand. So you have DNMT1 to recognize that there's a hemimethylated CpG, and then it's basically copying the pattern from the mother cell's strand to the newly synthesized strand. And that means that the methylation can really be stably inherited across a number of generations of cell divisions and cell cycles. So the most stable way of saving information is, of course, the DNA. The information there is stored across generations of individuals. But then DNA methylation provides a mechanism that can store information over almost a lifetime. That's why we also believe that, for example, events that happen early in your childhood might be stored into adulthood with these kinds of mechanisms. And they are also very important, for example, for things like learning and memory formation and so on.
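Editor's sketch: two points from the passage above can be checked in a few lines. First, a CG on the forward strand reads as CG on the reverse strand, because the reverse complement of "CG" is "CG". Second, after replication each daughter duplex is hemimethylated wherever the parent was fully methylated, and a DNMT1-like maintenance step restores the parental pattern. The representation (one pair of booleans per CpG site) and the function names are assumptions for illustration, and the maintenance step is idealized as perfectly efficient.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def replicate(sites):
    """Semi-conservative replication of a list of CpG sites.

    Each site is (forward_methylated, reverse_methylated).
    Newly synthesized strands carry no methyl groups.
    """
    daughter_a = [(fwd, False) for fwd, _ in sites]  # keeps parental forward strand
    daughter_b = [(False, rev) for _, rev in sites]  # keeps parental reverse strand
    return daughter_a, daughter_b

def maintain(sites):
    """Idealized DNMT1-like maintenance: hemimethylated sites become fully
    methylated; unmethylated sites are left alone."""
    return [(True, True) if (fwd or rev) else (False, False) for fwd, rev in sites]

# Toy parental pattern: fully methylated, unmethylated, fully methylated.
parent = [(True, True), (False, False), (True, True)]
daughter_a, daughter_b = replicate(parent)
restored_a = maintain(daughter_a)
restored_b = maintain(daughter_b)
```

In this idealized picture both daughters end up with the parental pattern, which is the sense in which methylation is inherited across cell divisions.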
Also very interesting, but that's a completely different story. So the DNA methylation is basically completely erased at the beginning of life, when you start over with a new individual. Okay. That's a much more complex version of the same thing. We got more questions in the chat. We have a very active chat compared to the actual room. So guys, I'm expecting lots of questions from you at the end. So, Tasnim again: how can you say the different pattern is hypomethylation, based on what criteria? So I guess the question is, how do you say it's hypomethylated in a particular tissue? What do you compare to? Ah, okay. I think, yeah, it's always a relative question, I suppose, hyper- or hypomethylation. In a sense, you could say that the bulk of the genome is assumed to be fully methylated. So if you don't have any genes there, for example here, you would expect that the majority of the genome is fully methylated, and you can see that with the solid yellow colors here. The yellow here is actually individual bars at every single CpG, but it's so dense that it's almost like a solid yellow bar. So it's fully methylated. And then you have areas, predominantly at the starts of genes, these CpG islands, where you have a high density of CpGs and where you expect hypomethylation. In this case, hypo relative to the bulk of the genome. So for example, the blue regions here are called hypomethylated regions. That means that they are basically unmethylated as compared to the rest of the genome. But you could of course also think about looking across cell types. And then this bit, in this case, would be hypomethylated in these cells as compared to the brain data, or you could say the brain data is hypermethylated as compared to this data.
What is also very interesting, if you're talking about cancer, for example, is that this separation of CpG islands and the bulk genome is to some extent lost. So you gain methylation in the CpG islands and you lose methylation in the bulk genome. And therefore this very strong contrast is kind of lost. And I think that's one of the reasons why that causes quite significant problems. Excellent, yes. And there is another question still in the chat. Are there any non-genetic factors that could control or regulate DNA methylation? So non-genetic, I imagine we're thinking about environmental factors. Yes, I think that's really important. I think that the epigenome is really able to integrate external stimuli with the DNA, the genetic basis. And that's really important too. But how this happens is, in most cases, not well understood. I think there are some examples, for instance with pain. We know that if, for example, in early childhood you are exposed to severe pain repeatedly, then you observe methylation changes at a number of different genes, which are linked to this, but a lot of this is still kind of hypothetical. So these methylation changes are a response to a systemic experience of pain. And they can be remembered by these cells and then lead to a different form of chronic pain in later life. Yeah, so one interesting example: people that go on polar exploration vessels, ships, have to train to dive into progressively colder water, because a person that is not used to it who falls into the Arctic waters will die instantly, while if you train, you can actually train your body to not respond so severely to the cold shock. And it's presumably an epigenetic mechanism, although I don't think anyone has sacrificed one of these humans to see exactly which patterns of DNA methylation were changed.
But certainly also in sports, via blood samples, people detect changes in DNA methylation after training. So if you start running, and you do a blood epigenetic test now and in a month's time, if you run every day, you will see some changes. But how these then link to actual gene expression, I guess nobody knows. But we have a question in the room. So the methylation pattern is inherited by the next cell when there is a symmetry, so if both strands are methylated. But if it's just one strand, so this symmetry is broken, is it also inherited? So did you hear the question, Gabriele? No. The question is whether hemimethylation is inherited. So when there is full methylation, you've explained that hemimethylation then becomes fixed to full methylation by DNMT1 or 3. But what about hemimethylation? Is it inheritable, or does it just not exist? Inheritable, that's a good question, and I should know the answer to that. So in the data that I've been looking at, I think that it's mainly a transient state. I think it's difficult to do the experiments, and I know that there are experiments and papers out there that have studied that. I can give you the answer to that one tomorrow. I think what is difficult is that most of the DNA methylation data is still, I would say, not single cell data. So we are using instead averages across a large number of cells. You would have to look whether the forward and the reverse strand are methylated in the same cell and at the same site, or if it's just an average across many cells. So I know Jörn Walter was doing these hairpin experiments. Yeah, exactly. I don't know what they came up with in that experiment. I would have to read up on these papers. But without single cell data it's rather difficult to say, and single cell data is even more sparse, I would say, and it has its own challenges in the analysis.
So what I can say is, we have done, for example, an experiment looking at mouse brains, and that is in fact a mixture of a large number of different cell types. And we looked at the methylation patterns, and not just the zero, the unmethylated ones, and the fully methylated ones, but also the intermediate levels of methylation are very, very well conserved. With the intermediate levels of methylation it's getting really complex, because if you have cells which are proliferating, which are dividing, then with every cell division you're losing half of your methylation, and that needs to be reestablished. And replication doesn't happen instantaneously. Instead, you have replication foci across the genome, where replication starts at slightly different time points. Some of these happen earlier in the cell cycle, and others happen later in the cell cycle. So if you don't have synchronized cells, you would not only have an average over a large number of cells, but you would also have kind of a time average. And the time average depends on how quickly the methylation is reestablished after replication, and that will vary over different loci. So it's a difficult question, to be honest, and I can't give you the answer straight away. By and large, it's believed to be a fairly transient state. So it doesn't really happen very often that you have hemimethylation, I think. Yeah, I don't think it's a stable state. But you never know. Yeah. Okay, we've been there. So I just wanted to give you again a little bit of an idea of this methylation cycle. And to some extent, that might also shed some more light on this. So you have an unmethylated CpG. You then have the de novo methyltransferases, which are called DNMT3A and 3B. They mainly create hemimethylated CpGs, but potentially also fully methylated CpGs.
Then you have DNMT1, which is copying the methylation pattern from one strand to the other. And you have this passive demethylation, which happens during replication. So when your cell replicates, the unmethylated CpG will of course stay an unmethylated CpG. It will produce two unmethylated CpGs. The hemimethylated CpG will produce one unmethylated CpG in one daughter cell and another hemimethylated CpG in the other. And your fully methylated CpG will go into a hemimethylated state. So here you can already see that after the first progression through a cell cycle, you have only half the number of hemimethylated CpGs, which also suggests that it's not a stable state. Then you also have active demethylation, which happens through these different TET enzymes. And again, this active demethylation can also happen to the hemimethylated state. And as you see, these are kind of transient states as well. To some extent, the hydroxymethyl-CpG might also be functional. In particular in the brain, it does seem to be quite functional. But with that, you are creating something that is quite remarkable. On the one hand, you are allowing these dynamic changes during differentiation, through the de novo methyltransferases, but you also create an epigenomic memory which helps to preserve cell identity. So it's a very intricate system of information passage, which I find quite fascinating. And I just wanted to also include some of the work that I did in Edinburgh, in this case with Adrian Bird. Guido said he's the father of epigenomics. I think there are, as always, many fathers and potentially one or two mothers as well in any bigger scientific development. But he was the first one who identified an epigenetic reader, namely MECP2, methyl-CpG binding protein 2, which specifically binds to CpGs, and to methylated CpGs in particular.
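Editor's sketch: the passive demethylation step of the cycle described above (unmethylated gives two unmethylated sites, hemimethylated gives one unmethylated and one hemimethylated, fully methylated gives two hemimethylated) can be followed over several divisions by simple counting. The sketch assumes that no maintenance or de novo methylation acts at all, which is a deliberate simplification; the function name `passive_step` and the starting counts are hypothetical.

```python
from collections import Counter

def passive_step(counts):
    """One round of replication with no maintenance methylation.

    counts holds the number of CpG sites in each state:
    'un' (unmethylated), 'hemi' (hemimethylated), 'full' (fully methylated).
    """
    return Counter({
        "un": 2 * counts["un"] + counts["hemi"],     # un -> 2 un, hemi -> 1 un
        "hemi": counts["hemi"] + 2 * counts["full"], # hemi -> 1 hemi, full -> 2 hemi
        "full": 0,                                   # full never survives replication
    })

def hemi_fraction(counts):
    total = sum(counts.values())
    return counts["hemi"] / total if total else 0.0

# Start from 100 fully methylated sites (made-up number) and divide three times.
state = Counter({"un": 0, "hemi": 0, "full": 100})
history = []
for _ in range(3):
    state = passive_step(state)
    history.append(hemi_fraction(state))
```

Under these assumptions the share of hemimethylated sites halves with every division after the first, which matches the argument that hemimethylation is not a stable state unless something actively maintains it.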
And then a little bit later, this protein was linked to a disease, which is called Rett syndrome. It's a very, very severe disease, and we have been studying it. It was a bit of a painful experience, I think, which I shared in that case also with Guido. I mean scientifically: it was a difficult problem at the time and still is to some extent. So this protein is the single cause of a very severe neurodevelopmental disease called Rett syndrome, which affects little girls. It's X-linked, so it's on the X chromosome. These girls are born without many symptoms. But after maybe nine months, 12 months, they start losing some of the abilities that they might have already acquired, for example speech and purposeful hand movement and so on. And it can develop quite severe symptoms. Someone said, imagine the symptoms of autism, cerebral palsy, Parkinson's, epilepsy and anxiety disorders all in one little girl. So it's a very, very complex disorder. But it's one of these disorders which has indeed a very simple root, namely mutations in this one protein. A number of other diseases, like in fact Parkinson's disease or autism and so on, are complex diseases, with many loci being involved, all having a small effect but together having a big effect, which are difficult to study. But in this case, we really know that this protein, which has a functional domain to recognize methylated CpGs, is responsible for this very complex disorder. And so it seems central, really, for how our cells work, and neurons in particular. So as I said, it's X-linked. It shows methylation-dependent DNA binding. And it's very highly expressed specifically in neurons. And you have a question in the room. Can we use what? Ah, can we use erasers to solve Rett syndrome? Well, not yet. I think we haven't solved Rett syndrome yet.
But I think Gabriele is going to tell us a little bit more about it now. So, I mean, that's a good point. But I think you got it the wrong way round. What I'm trying to tell you here is how important DNA methylation is. In this case, it's the reader that can't recognize it properly. So the problem is not with the DNA methylation itself. In fact, what we have started to do in a different experiment is that we have disturbed the DNA methylation patterns and knocked out DNMT1, so the maintenance methyltransferase, to go one step beyond MECP2 and see what happens if the methylation patterns are indeed disturbed. But here, erasing the methylated CpG patterns would actually make the problem more severe, because apparently they are important. You need them, and you need to be able to recognize them with MECP2. What is also interesting is that in little boys there's another very severe condition, and it's basically the opposite of Rett syndrome. It's MECP2 overexpression syndrome. So if you have too much of MECP2, it's also a problem. You apparently need to have exactly the right dosage. And in females, it's actually complex, because we have two X chromosomes, and that means we have two copies of MECP2. Okay, I should have said: if you have a deletion or a loss of function mutation in MECP2 in males, this is a severe congenital encephalopathy, and that causes death within two years. So it's really severe, because you have only one copy of MECP2, and if you have a loss of function mutation in that copy, it's a problem. In females, because you have two copies of the X chromosome, you might have one MECP2 that works just fine and another that doesn't work so well. And then the problem is that one of the X chromosomes in females is usually completely silenced. We call that X chromosome inactivation.
And that means that we have a mosaic pattern in females, where in some cells it's the genes from the paternal X chromosome that are expressed, and in some neighboring patches of cells the maternal chromosome is expressed. And it is quite random whether the paternal or maternal X chromosome is used. That means that some cells will have the functioning MECP2 and some others will have the disturbed MECP2. And that's very difficult. And if you have too much of it, it's also a problem. We have another question in the room, Gabriele. Yes? Say that again. So this is quite a complicated question. Oh yeah, of course. Well, I can repeat it now, but in future you can come up to the mic. The question is, why is MECP2 particularly relevant in neurodevelopmental disorders? And actually, maybe Gabriele may want to specify whether these are actually neurodevelopmental disorders. And the second part is, is DNA methylation particularly important in brain development? I'm starting with the second one, I guess. Is DNA methylation particularly relevant in brain development? I think it is. And I think that is because brain development requires a very high amount of plasticity. In particular in the early time postnatally, you have a lot of external signals that you need to integrate, and these cells have to change dynamically in response to external stimuli. And I think that is why there's this kind of delay of the onset. If this integration of external stimuli is not functioning, the organism cannot develop normally. I have the feeling that it is potentially also important in other cell types, but they are not so much dependent on the external signals. If that makes sense.
But it could also be a bias: we have been studying DNA methylation a lot in the context of neurodevelopmental diseases, but also in the context of neurodegenerative diseases. And maybe that's a bias in how we study DNA methylation. As I said before, DNMT3A and B, for example, also popped up in a lot of cancer studies. So for example, a lot of blood cancer types are related to problems with the writers of DNA methylation. But my sense is that it's particularly important in the brain, because in the brain you have to integrate a lot of external signals and remember those over a long timeline, and you have to control that. And that's why this is important. That's maybe not as difficult a question. What was the first question? Sorry. So I wanted to know if these factors are tissue specific, the readers and writers, especially if MECP2 is specific to the brain? No, they are not tissue specific. Well, they are very strongly expressed in the brain, in neurons, not in all brain cells. We have more than 70 different cell types in the brain, and in certain types of neurons in particular, MECP2 is very highly expressed, but it is expressed in almost all cells. It's quite universal, but very strikingly highly expressed in neurons. Thanks. Okay. So I think one of the questions was also whether it's a neurodevelopmental disease, and that's also quite interesting, because I would answer that it is not. Apparently you have a normal development in the very beginning of your postnatal life, but then the main symptoms start around one year of age. You have a developmental regression, including loss of speech and hand skills. And you also have breathing difficulties and those kinds of things.
What is really stunning is one experiment that indeed happened in Edinburgh, in Adrian Bird's lab, which showed that the symptoms can actually be reversed in a mouse model. So if you engineer a mouse such that you can switch MECP2 off, and then on again at a certain stage, then you can actually cure the disease. That means that the neurons are not degenerated, they are not broken in a sense. They still work, but you need MECP2, you need to have this readout of the DNA methylation for them to fully function. And I think also, if you engineer the mouse such that you can switch off MECP2 later in life, if I remember correctly, this will also lead to Rett syndrome. So it's not restricted to a certain time in development. And it's not neurodegeneration that is happening, it's just important for the function of the neurons. And the reason why I'm bringing this example is really to illustrate how important DNA methylation and the proper readout of this pattern is for how neurons are working, but also other cell types. And how MECP2 does what it does is still very much unclear. We know that it's recruited to methylated CpGs, like here in the center, but what the function is, is still to some extent under negotiation. The first model was really that MECP2 binds to the methylated CpG and then causes the genes to be switched off. But it was also shown that it could lead to activation of genes. So it's not just repressive, it could also be activating. And if we look at knockout cells, which don't have MECP2, and we compare them to normal cells, we find as many genes being repressed in response to that knockout as being falsely activated. So this is still problematic to understand. There's chromatin condensation.
There are also bigger-range effects, where the whole chromatin is packaged more densely following MECP2 binding. It was linked to alternative splicing and also to protein synthesis and a number of other functions. Here's one function, that it helps to form heterochromatin condensates. That's a new paper, and it's very convincing to me, where, for example, the whole organization of the DNA inside the nucleus changes as a consequence of having MECP2 or having a pathogenic mutation. So it's not individual genes, it's more the bigger organization of the DNA in the cell nucleus that either allows the right programs to be read out or not. So the precise function, again, is still under investigation. It's also very difficult to analyze the data. But what I wanted to show here is really the importance both of histone modifications and DNA methylation, despite the fact that to a large extent we don't really know how they work yet, and that the data sets are quite difficult to interpret. And I'm going to talk a little bit more about that tomorrow. So that's basically the last slide that I wanted to share with you today. And I think there's a little bit of time for discussion if there are still questions. Thank you very much for a wonderful introduction to the mysterious world of epigenetics, a little bit less mysterious now for us, but still endlessly fascinating. I've been interested for many years, and it's still not totally clear what it does, but it does something, and how it does it is a big question. Are there any questions arising either in the room or in the chat? Yeah, we have a question. I wanted to ask, going back to the machine learning, if there are some mechanistic models that are not good enough to use on their own, but could be integrated in the loss function to help with the training of the machine learning?
I think, yeah, I think that's definitely a way forward. I think it's difficult to answer these questions very generically, because there are so many different sub-problems that each require very different strategies to tackle them. And so what I usually find the hardest bit is to actually identify the question that you are trying to answer. Yeah, and I think the most important bit is to understand as much of the biology that is confirmed by experiments as you can and use that when you construct your model. I don't know, would you want to give an answer to that question as well? I mean, I wouldn't want to abuse my powers as chair, but it depends what you want to predict, right? Okay, your loss function is generally formulated in terms of the prediction problem at hand. For some cases, there is no mechanistic model. For some other things, like for example that cycle of methylation and demethylation, there was this model proposed by Verena Wolf in Saarbrücken, based on data from Jörn Walter, also in Saarbrücken, which was a Markov chain model of mechanistically how the mark was deposited and then removed. Obviously, such models also contain a number of parameters to be optimized, but in principle that could also be done in a gradient-based way. The question obviously is whether that works in those cases, because it tends to be that the mechanistic models have been developed either completely separately from data, as just conceptual exercises, or they've been calibrated on very precise and small data sets, which often are not so suitable for machine learning. But it's a very interesting question whether you can then take these and embed them into larger models. I guess, you know, if one wanted to understand how the kinetics of DNA methylation or demethylation are affected by the sequence context, one could do a physics-based machine learning model, as you're suggesting. That would be my answer.
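To make the idea of fitting a mechanistic Markov chain by gradient descent a bit more concrete, here is a minimal sketch. It is not the published model mentioned above. It assumes a toy two-state chain in which a CpG site is either unmethylated or methylated, with a hypothetical per-step gain probability `a` and loss probability `b`. The function names, the synthetic data and the learning rate are all illustrative assumptions.

```python
import math


def sigmoid(x):
    """Map an unconstrained number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(a, b, m0, steps):
    """Methylated fraction over time for a two-state Markov chain.

    a: per-step probability that an unmethylated site gains the mark
    b: per-step probability that a methylated site loses the mark
    """
    m = m0
    fractions = [m]
    for _ in range(steps):
        m = m + a * (1.0 - m) - b * m
        fractions.append(m)
    return fractions


def loss(theta, observed, m0):
    """Mean squared error between simulated and observed fractions."""
    a, b = sigmoid(theta[0]), sigmoid(theta[1])
    predicted = simulate(a, b, m0, len(observed) - 1)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def fit(observed, m0, lr=1.0, iters=2000, eps=1e-6):
    """Plain gradient descent with central finite-difference gradients."""
    theta = [0.0, 0.0]  # unconstrained parameters; start at a = b = 0.5
    for _ in range(iters):
        grad = []
        for i in range(2):
            up, down = theta[:], theta[:]
            up[i] += eps
            down[i] -= eps
            grad.append((loss(up, observed, m0) - loss(down, observed, m0)) / (2 * eps))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return sigmoid(theta[0]), sigmoid(theta[1])


# Synthetic "observations" from made-up rates, only to exercise the code.
observed = simulate(0.2, 0.05, m0=0.0, steps=30)
a_hat, b_hat = fit(observed, m0=0.0)
print(f"fitted a={a_hat:.3f}, b={b_hat:.3f}")
```

In this toy chain the methylated fraction settles at a / (a + b), so the steady state alone only constrains the ratio of the two rates, and it is the transient that separates them. In a larger machine learning model, the same misfit term could be added to the training loss as a mechanistic penalty, with `a` and `b` predicted from sequence context instead of being fitted directly.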
Yeah, I think the design and then also the validation is really tricky. You have to think about how you validate either your model or your prediction, so that you're not just going back to validating correlations. It's a very interesting question. At the end of the day, I think you will have to have a combination of mechanistically motivated models and others. Any more questions from the room or the virtual room? Oh, Alex, come here and speak. Thank you for the talk. My question was about data. Could you give us an idea of what the state of single-cell data in epigenomics is, and have there been instances where people use single-cell data and the biological conclusions they draw from the analysis are very different from the ones you get from bulk data? Because for the DNA methylation data, you described bulk data, which really is very different from a single-cell measurement, I guess. Yeah, I guess with epigenomic data, I think the most used technique would be ATAC-seq, where there's a lot of single-cell data, and genetic and expression data as well. I think with DNA methylation, when you compare to gene expression: for gene expression in a given cell you have, in most cases, for the highly transcribed transcripts, multiple molecules per gene, because it's not transcribed just once, it's transcribed a number of times and you measure that. But for DNA methylation, you really have, for every CpG, just one molecule, or two molecules, the forward and reverse strand. And that means that the data is quite noisy, to some extent. And that poses other questions. What I also find interesting is to look at combinations of different marks, so for example the histone modifications. I've been looking at panels of histone modifications, really many different histone modifications at the same time.
And I think that because the writers often contain reader domains and they are not acting independently, to understand the epigenomic pattern it is quite important to record many histone modifications at the same time, because, for example, H3K4 trimethylation in the context of acetylation at other residues might have a different function than if the acetylation was not there. So it's the combinatorial patterns. And with single-cell, it is still very difficult to assay the same nucleosome across many different histone modifications. So what you really would want to know is which of the histone modifications are present in this nucleosome. And that means that you have to do 10 assays for the same cell. And that's really difficult. You can do two and then combine them, but you can't do 10. And also the data is to some extent noisy and has other challenges. So what we are trying to do is to combine single-cell data and bulk data and to try to learn to separate the signals there. The other thing that I want to say, where I think single-cell really makes a big difference as well, is for understanding the role of the cell cycle. So for example, I haven't talked about it very much, but for histone modifications, we don't really know how the patterns are propagated across cell divisions. I mean, you have double-stranded DNA, so you can easily split that over two daughter cells. You have the CpG context, which is symmetric, so you can kind of copy the methylation patterns to the next cell, but the histones are actually kind of diluted. So they are removed from the DNA, and half of them are going to be inherited by one cell and the other half by the other cell. So if you're looking in a locus-specific way, it is not very clear how these marks are preserved. And if you are re-inserting a histone protein at a given locus, then you have to decorate it again with the right patterns.
So how that happens is not quite clear. So how histone modifications are actually inherited across cell divisions is not very clear, and I think by looking at single-cell data, where you can say precisely in what stage the cells are in relation to their cycle, we will learn quite a lot about how histone modifications are being inherited from one cell to the next. We also have another question from the chat. It's quite an interesting question from Béatriz. How is MECP2 mutated? Is it from exposure to external factors or are the mutations just inherited from recombination? Also, are there instances where there are mutations, but the syndrome was not expressed? Can you repeat the second part? Well, let's do the first part. So I think it's mainly, no, not inherited, it's spontaneous mutation. So the recombination of the third part is there? Yeah, it's germline variations. It's essentially lethal in males, so it essentially cannot propagate. It's not something that has a recessive effect. But the question is, are there instances where there are mutations, but the syndrome was not expressed? There are mutations, but what is not expressed? But you don't have the syndrome? Yes, absolutely. So as far as I know, and I haven't actively worked on MECP2 for a couple of years now, we don't have a very good structural model of MECP2 yet, at least not of the whole protein. We have structural models of the DNA binding domain and some other bits of the protein, and there are mutations. So if you're looking at the healthy population, people do carry mutations in the protein, but some of them have no functional consequence. So there are actually only two or three domains where, if you are unlucky and you have a mutation in these domains, and in specific amino acids in particular, then you're really unlucky. But most of the time there's some tolerance in other areas of the protein.
Yes, there is this bridge model of Bird's, right? There is the methyl-DNA binding domain, but there is also another position which interacts with another protein, and that is also a very specific position. It's a rare disease. I think the prevalence is about one in 12,000 girls or something like that. Yeah, having said that, I mean, I think it wasn't easy to diagnose for a long time. So by chance, on holidays, I have met one girl who was only diagnosed when she was, I think, seven years old. And I also know by chance a boy, who sadly has actually died now, who had MECP2 overexpression syndrome. So I think the prevalence is probably not, well, I mean, it depends. In the UK, I think they would be genetically tested in big parts. But this girl was from Austria and she got her diagnosis at, I think, seven or eight years. So I think the prevalence is not entirely clear now. And there are also some mutations which cause a mild phenotype. So there's a spectrum, and there is also the random mosaic structure of the cells. So if you're lucky and you have a lot of cells which are expressing maybe the maternal gene, which is the healthy gene, and not the paternal gene, which might be the gene that has the mutation, then your disease would also be milder. So there's a spectrum, despite the origin of the disease being just one gene. Okay, I see no further, oh no, there is one hand, there is one more hand. I was just about to proclaim lunch, but we'll wait for knowledge. Yeah, so MECP2 has both repression and activation functions, right?
So it is associated with both. If you're intervening in the system and you're knocking out MECP2 and you're then comparing the expression levels of the different genes, you will find genes that go up and you will find genes that go down, almost the same number. So it's associated with both activation and repression. Whether that is its precise function, I'm not sure. I would say it potentially plays a more global role in the packaging of the DNA in the cell. And if that packaging is wrong, then certain genes are being activated and certain genes are silenced. Yeah, so what factors sort of decide which of those two functions it can perform? So I think it's an interplay of transcription factors, so I guess... Yeah, so can you see that? Yeah, so this is somehow the most convincing model at present. I have shown you this at the beginning, right? This is what the DNA looks like in the nucleus, and you have areas which are much more densely packed than others. And you have these hetero... Yeah, here. You have these condensates of heterochromatin in there as well. So in the healthy cell, MECP2 helps to kind of package all this because it's binding to methylated CpG. Methylated CpG is found mainly in the bulk of the genome, while the promoters are often in CpG islands, which are largely unmethylated. So the bulk of the genome that is maybe not expressed and that you don't need is kind of collected together in one big heap in the middle here. While if you have MECP2 mutations, it appears that you don't get these foci. You don't get this kind of condensate of the bulk genome. And I think that will mean all kinds of different things for the rest of the genome, which you actually need to be easily accessed by transcription factors and so on and regulated.
So I think it depends on other factors, for example transcription factors, whether the gene is then really activated or not. But your whole library is just in a complete muddle. So if I understood you correctly, you said that the Rett syndrome symptoms can be on a spectrum depending on how many cells in the mosaic are expressing the mutant gene and how many have it inactivated. But from what I knew, X chromosome inactivation is a random process, in which case roughly 50% of cells should have one chromosome inactivated and the other half should have the other one inactivated. Or is that wrong? Or what is the case here? I think it's probably roughly 50-50, but it's not strictly 50-50. I think the proportions vary to some extent. And I think that has an effect on the disease. Apologies. Also, where the mutation is can have more severe or less severe effects. So it's an interplay of all of these factors. There's some very interesting work from Angus's lab coming out, or it came out actually, where they looked at that. And one thing that they showed is that in some individuals there's actually a huge proportion of X chromosome inactivation escape. So they actually do express higher levels of the second X chromosome. And also some genes can escape X chromosome inactivation. And this in turn has a huge impact on the expression levels of the genes on the autosomal chromosomes. So I think again, this is not very well understood. I don't think it's 50-50 in every individual. I think there's quite a big variability in X chromosome inactivation between individuals. Thank you. Welcome. Okay. It's now time for lunch. Thank you so much, Gabriele. You're welcome. And probably we want to make announcements. Yeah, but we can close the session. I guess it's too early for you to have lunch, but we can thank you anyway. Thank you. See you tomorrow. See you tomorrow. Bye-bye.