Thank you very much for the fantastic invitation. It's a great pleasure to be here. Of course, I would have liked to be there in person, but it's very good to meet and hear from many of the friends and collaborators of the past years. Today I would like to give my perspective on the effort that has been going on in the 20 years since the completion of the human genome. We know that the completion of the human genome has driven a great deal of progress in many fields of science, biology, and medicine; it has truly produced remarkable scientific advances. However, although we have understood quite a bit about how the genome works, and here I list some of the main facts about the genome and the variability between individuals, it still needs more study, particularly because it soon became obvious that only a minor part of the genome encodes proteins. Right after the completion of the genome project, various projects started to ask what needed to be characterized to better understand how the genome is regulated. The ENCODE project started and was announced in 2004, and in parallel, here in Japan with other colleagues, we were starting the FANTOM project. It was based on quite different ideas, and in the end it turned out to be quite complementary to ENCODE. I also mention various other projects that have been very important to complement the human genome, but I'm going to focus essentially on these two, because I know them better: I have been interacting with ENCODE in particular, and I have been deeply involved in FANTOM. In FANTOM, we have been focusing over the years on understanding what the genome produces.
We have been focusing on the identification of RNAs, capped RNAs, in order to capture all the RNAs present in a tissue as complete copies, that is, full-length cDNAs. As everybody knows, we and others identified only 20,000 to 21,000 protein-coding genes. What was striking at the time, however, was that there were other transcripts, particularly non-coding RNAs, and these were actually the majority of the transcripts produced by the genome. At the beginning we did not know much about them, but year after year many groups started to study the biology of individual RNAs. It became clear that the part of them studied so far, no more than 4 or 5 percent at the moment, regulate chromatin, transcription, and translation, and interact with proteins. So they do quite different things, although for the remaining 95 percent we still know nothing. Here are some numbers from FANTOM3, which at the time was mouse: 63 percent of the genome is transcribed, and many transcripts are long non-coding RNAs, including a remarkable number of sense-antisense pairs. In fact, we inferred that 73 percent of genes show some sort of antisense transcription, and for many of them we still do not know the type of regulation. This is one of the first images of how genes are organized in the genome. With many exceptions, we found RNAs that start in one gene and finish in the downstream gene. We found, for the first time, quite complex patterns, with genes spanning into other genes, antisense regions, and antisense of antisense. We defined as a transcription unit everything that clustered together through shared exons.
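The idea of grouping transcripts into transcription units, and of spotting sense-antisense pairs, can be sketched in a few lines. This is a minimal illustration, not the FANTOM pipeline: it assumes a transcription unit is a set of transcripts on the same strand linked, directly or transitively, by at least one shared exonic base, and that an antisense pair is two transcripts on opposite strands with overlapping exons. The record layout and the `txs` example are hypothetical.

```python
from collections import defaultdict

# Hypothetical transcript records: (transcript_id, chrom, strand, [(exon_start, exon_end), ...])
# Coordinates are assumed 0-based, half-open [start, end).

def exons_overlap(a, b):
    """True if any exon of transcript a shares at least one base with any exon of b."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in a for s2, e2 in b)

def transcription_units(transcripts):
    """Group transcripts into units via union-find on same-strand exon overlap."""
    parent = {t[0]: t[0] for t in transcripts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for i, (id1, chr1, str1, ex1) in enumerate(transcripts):
        for id2, chr2, str2, ex2 in transcripts[i + 1:]:
            if chr1 == chr2 and str1 == str2 and exons_overlap(ex1, ex2):
                parent[find(id1)] = find(id2)

    units = defaultdict(list)
    for tid, *_ in transcripts:
        units[find(tid)].append(tid)
    return sorted(sorted(u) for u in units.values())

def antisense_pairs(transcripts):
    """Pairs of transcripts on opposite strands of the same chromosome whose exons overlap."""
    pairs = []
    for i, (id1, chr1, str1, ex1) in enumerate(transcripts):
        for id2, chr2, str2, ex2 in transcripts[i + 1:]:
            if chr1 == chr2 and str1 != str2 and exons_overlap(ex1, ex2):
                pairs.append((id1, id2))
    return pairs

# Toy example (hypothetical coordinates)
txs = [
    ("t1", "chr1", "+", [(100, 200), (300, 400)]),
    ("t2", "chr1", "+", [(350, 450)]),  # shares exonic bases with t1 -> same unit
    ("t3", "chr1", "+", [(500, 600)]),  # no exon overlap -> its own unit
    ("t4", "chr1", "-", [(150, 250)]),  # opposite strand, overlaps t1 -> antisense
]
units = transcription_units(txs)
pairs = antisense_pairs(txs)
```

Because linkage is transitive, a transcript spanning two otherwise separate genes merges them into one unit, which is how a complex locus can collapse into just a few units. The all-pairs loop is quadratic and only suited to a single locus.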
In the example shown, with so many different transcripts, we have only three transcription units, forming one gene forest; where there is nothing, we call it a gene desert. This also suggested to us that there are many different transcripts, and that transcript variability still has to be fully understood. Of course, there were other groups, and this is a very inspiring work from Tom Gingeras. It has been a real pleasure to work with him and his group over the years. This work was inspiring because it showed that, compared to cytoplasmic poly(A)+ RNA, there is actually much more nuclear RNA, including nuclear poly(A)− RNA. This paper inspired us for a long time with the idea that we should really try to capture all of these RNAs that are usually not represented in the cDNA libraries commonly used to clone and identify our preferred genes. So there is a lot of transcription. The ENCODE approach was complementary, and here is one of the famous slides from ENCODE 1. The strategy was totally different: to characterize 1 percent of the human genome, using cell lines because they can be easily grown, replicated, and distributed. This was very important for developing technologies to identify DNase hypersensitive sites and regions of chromatin bound by transcription factors. At the beginning everything was based on microarray hybridization, and then sequencing rapidly started to replace all of this. It has been a very long technology marathon. At the same time, we also developed technologies to map promoters, which turned out to be very complementary to many of the other technologies.
This is the CAGE technology. To make the story short, CAGE captures the capped 5′ ends of randomly primed cDNAs and sequences only the initial part of the transcript, very close to the initiation site. By doing this we essentially sequence the transcription start site, and with this we map gene expression, but also each promoter that drives expression, across many different tissues. On the right there is a simple comparison showing the difference between CAGE, which focuses the signal on the initiation site, and RNA-seq, which usually covers the whole body of the gene but is often not very efficient at capturing the initiation site. Here is a brief example of alternative promoters: all these CAGE tags, from FANTOM3 many years ago, already showed that genes may have up to nine or ten different initiation sites, many of them tissue-specific. This is quite important for understanding gene regulation, because the set of transcription factors and the epigenome differ in each cell type. We did not know that next-generation sequencing would develop so vigorously, but the development of these technologies has been very welcome, not only to sequence genomes but also to understand what the genome is doing, using ChIP-seq, RNA-seq, CAGE, and many more sequencing technologies that have really enabled functional genomics. Of course, this curve shows the decrease in sequencing cost; we would really welcome a further decrease, as the cost seems to have been quite stable over the past four or five years. Building on sequencing, the next phase of ENCODE expanded, going genome-wide and including 147 cell types.
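Because CAGE reads pile up at transcription start sites, a common first step is to group nearby 5′ read ends into tag clusters, each roughly one promoter, with its dominant start site and total count; a gene with several clusters has alternative promoters. The sketch below uses simple distance-based clustering with an assumed 20 bp merge gap; it is an illustration, not the FANTOM clustering method, and the toy tags are hypothetical.

```python
from collections import Counter, defaultdict

def cluster_cage_tags(tags, max_gap=20):
    """Cluster CAGE 5' read ends into tag clusters (distance-based; max_gap is an assumption).

    tags: iterable of (chrom, strand, pos), the 5' end of each mapped CAGE read.
    Positions on the same chrom/strand at most max_gap bp apart are merged.
    Returns dicts with chrom, strand, start, end (inclusive), dominant TSS (peak), and count.
    """
    by_strand = defaultdict(Counter)
    for chrom, strand, pos in tags:
        by_strand[(chrom, strand)][pos] += 1

    clusters = []
    for (chrom, strand), counts in sorted(by_strand.items()):
        current = []
        for pos in sorted(counts):
            if current and pos - current[-1] > max_gap:
                clusters.append(_summarize(chrom, strand, current, counts))
                current = []
            current.append(pos)
        if current:
            clusters.append(_summarize(chrom, strand, current, counts))
    return clusters

def _summarize(chrom, strand, positions, counts):
    """Summarize one cluster; ties for the peak go to the leftmost position."""
    peak = max(positions, key=lambda p: (counts[p], -p))
    return {"chrom": chrom, "strand": strand, "start": positions[0], "end": positions[-1],
            "peak": peak, "count": sum(counts[p] for p in positions)}

# Toy tags (hypothetical): two promoters on the plus strand, one on the minus strand
tags = ([("chr1", "+", 1000)] * 5 + [("chr1", "+", 1003)] * 2
        + [("chr1", "+", 1500)] * 3 + [("chr1", "-", 1001)])
clusters = cluster_cage_tags(tags)
```

Clustering separately per strand matters: a start site on the opposite strand a few bases away is a different promoter, not part of the same cluster.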
Some of the findings were similar to the early observations of FANTOM, particularly one that has been discussed very extensively: that 80% of the genome participates in at least one biochemical event. So it seems to do something, although we don't understand it very well. Other parts of the genome are important for understanding genetics and the role of GWAS variants. There are also better-understood quantitative relationships among transcription factors, chromatin, and RNA, both at promoters and in splicing; transcription factors co-associate; and chromatin can be classified into multiple classes. This was an important project for introducing new concepts for analyzing genomes. Here I list some of the summary points; in particular, the 62% of the genome that produces some sort of RNA is quite similar to previous observations and tends to converge with observations in FANTOM and in other important projects. So pervasive transcription is clearly there. RNA is a very important component, although we still know very little about it. FANTOM and ENCODE have been evolving quite in parallel without any common planning, but often converging and comparing data around the end of each project or when the data were prepared. In 2010 and 2011 we started to think that we should prepare a map of promoters using CAGE on as broad a collection of primary cells and tissues as possible, because we really needed a collection of human promoters. This was FANTOM5, where we took 100 primary cells and followed up 32 of them with time courses to see how promoters change during activation and in various other time courses. This is a fairly complex slide, so we cannot go into the details of all the different cell types.
There are more than 150 human primary cell types and many more tissues here, but just to give you an impression of the types of samples: adipocytes, immune cells, mesoderm, endoderm, ectoderm, differentiating neurons, and so on. This was quite important for identifying the diversity of promoters, and something else that I'll comment on later. In particular, we performed CAGE on all of these samples, so that we could identify promoters and infer the type of network, that is, the transcription factors that are important and whose motifs are enriched in the promoters active in each cell. For a subset, we also did RNA sequencing, small RNA sequencing, and some other assays to characterize these different cells. To get a first look at this diversity, we clustered promoters with similar expression features and examined how they map onto the cells of origin. We can see the whole diversity of promoter activity here, and we can also see that promoters tend to cluster by cell type: immune cells, central nervous system, placenta, epithelial cells, mesenchymal cells. I think this was an important exercise, because we also found that the cancer cell lines, many of which were used in ENCODE 1 and ENCODE 2, occupied quite a specific space of the transcriptome. This suggested that to broadly characterize human biology we need to characterize many different cell types, in particular primary cells, and from now on, as much as possible, tissues from the human body. When you use such powerful technologies, you always find something new, and there were some surprises. One was that promoter architecture can differ between tissues, so the same promoter can have different modalities. For instance, in astrocytes we find only a single start site here. But if you look at immune cells, CD14+ or CD4+, the start sites used migrate further upstream.
And in this ENCODE cell line, everything moves even further upstream, and there is no signal here from the sharp peak of the TATA-box promoter. So the idea is that promoters have sub-modalities: within the very same promoter, various elements are differentially active. Usually a TATA box drives transcription from a single position in the genome, whereas CpG islands drive it from more distributed, broad regions, and the two can coexist in the same promoter. Counting groups one, two, and three together, we had more than 223,000 promoters in human and fewer in mouse, not because the mouse is simpler but because we did not sample as much. Another thing we did not expect, although we had already seen it a little in ENCODE 2, is that we can also characterize enhancers by their bidirectional transcription. Because of the selection of samples, these enhancers tend to be quite tissue-specific. The number of enhancers FANTOM identified is much smaller than the number identified by ENCODE; however, they have a useful feature, namely that they are overrepresented in one cell type, tissue, or group. This may help if you are looking for enhancers that are very specific for a given cell type. Importantly, they are also relevant for genetics. We compared GWAS variants for various related diseases mapping in exons, promoters, and enhancers: exons carry a relatively small number compared to promoters, which are mostly outside exons, and even more variants fall in enhancers. This has helped to map and assign possible functions to some GWAS variants in the years following this paper. And again, after a FANTOM project there is always an ENCODE project, and this was a really remarkable achievement.
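The distinction between sharp (TATA-like) and broad (CpG-island-like) promoters can be made quantitative with the interquantile width of the CAGE signal, i.e. how many base pairs hold the central 80% of the tags. The sketch below is an illustration under stated assumptions: the 10th/90th percentiles and the 10 bp "sharp" cutoff are arbitrary choices here, not FANTOM5 parameters, and the toy promoters are hypothetical.

```python
def interquantile_width(tss_counts, lower=0.1, upper=0.9):
    """Width in bp of the region holding the lower..upper fraction of a promoter's CAGE signal.

    tss_counts: dict mapping genomic position -> CAGE tag count within one promoter.
    """
    positions = sorted(tss_counts)
    total = sum(tss_counts.values())
    cum = 0
    q_low = q_high = None
    for pos in positions:
        cum += tss_counts[pos]
        if q_low is None and cum >= lower * total:
            q_low = pos
        if q_high is None and cum >= upper * total:
            q_high = pos
    return q_high - q_low + 1

def promoter_shape(tss_counts, sharp_max_width=10):
    """Label a promoter 'sharp' or 'broad' (the 10 bp cutoff is an illustrative assumption)."""
    return "sharp" if interquantile_width(tss_counts) <= sharp_max_width else "broad"

# Hypothetical promoters: one dominant start site vs. signal spread over ~100 bp
tata_like = {500: 90, 501: 5, 502: 5}
cpg_like = {p: 10 for p in range(400, 500, 5)}
```

Applied per tissue, the same promoter could score sharp in one sample and broad in another, or shift its dominant site upstream, which is the kind of sub-modality described above.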
I would really like to congratulate all the authors of those papers. The number of cells, datasets, and biosamples was massively expanded, to 503 biological tissue types. I would say that probably the biggest achievement here is the Registry of candidate cis-regulatory elements (cCREs), because it can definitely help us understand what is happening in the genome, what is regulated, and in which cell types. The registry is massively large, with more than 926,000 candidate cis-regulatory elements in human, and fewer in mouse, again because fewer samples were profiled. They cover a remarkable part of the human genome, about 8%, and also cover 80% of the elements marked by chromatin marks and part of the previous GWAS and FANTOM collections. So, as always, data tend to converge, which is good, and it also tells us that these complementary projects are very, very important. Another paper I like very much is the landscape of chromatin loops in the human genome, which maps 24 cell types in 3D at high resolution, identifying 125,000 loops, with variation in chromatin loops among cell types, which is also quite important. It also describes features of the enhancers and genes associated with disease in relation to chromatin conformation. A third important finding, and again I am not trying to comprehensively summarize all the findings, just some of the most important, concerns RNA-binding proteins. This is only in two cell lines, but the extent of the experiments is very remarkable, and it identifies quite extensively, with different technologies, what RNA-binding proteins actually bind. This is very important for understanding regulation at a different level from promoters and enhancers.
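A figure like "the cCREs cover about 8% of the genome" requires merging overlapping elements before summing their lengths; otherwise overlapping elements are double-counted. The sketch below shows that calculation on hypothetical toy intervals (0-based, half-open, as in BED files); it is not ENCODE's code, and the numbers it produces say nothing about the real registry.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals, per chromosome."""
    merged = {}
    for chrom, start, end in sorted(intervals):
        blocks = merged.setdefault(chrom, [])
        if blocks and start <= blocks[-1][1]:
            blocks[-1][1] = max(blocks[-1][1], end)  # extend the current block
        else:
            blocks.append([start, end])
    return merged

def genome_fraction(intervals, genome_size):
    """Fraction of the genome covered by at least one interval (overlaps counted once)."""
    merged = merge_intervals(intervals)
    covered = sum(end - start for blocks in merged.values() for start, end in blocks)
    return covered / genome_size

# Hypothetical elements on a hypothetical 1,000 bp genome
elements = [("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 10, 60)]
fraction = genome_fraction(elements, genome_size=1000)
```

Here the two overlapping chr1 elements contribute 150 bp, not 200 bp; naive summing would report 25% coverage instead of 20%.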
Our group in particular is using the RNA-binding protein data very, very extensively to understand the motifs in many of the long non-coding RNAs that we are studying, and I'll show some examples later if there is time. So much for ENCODE; now a little of what we are working on, long non-coding RNAs, because they really need some effort. We made a new catalog of long non-coding RNAs in 2017, with Chung-Chau Hon. When we published, we found some divergences from GENCODE, which we have been discussing since. I did not update my slide; apologies for this. The FANTOM CAT lncRNAs for which we found evidence from CAGE promoters and additional chromatin marks are quite remarkable; this is a robust dataset. If you use the permissive, more aggressive way of counting them, there are even more. Importantly, long non-coding RNAs also map in interesting ways relative to disease. If you look at the expression of non-coding RNAs versus traits that have GWAS mappings, we find that very often the long non-coding RNAs mapping to a trait important for some brain disorder are actually expressed in the regions and cells that are important for that disease. So there is quite a strong correspondence. This is a zoom-in of a map for all the tissues that we don't have time to discuss today, but we found this to be the case for the other organs as well: diseases and the expression of non-coding RNAs tend to correlate. I will skip this and go to our summary slide on the possible function of long non-coding RNAs, inferred from genome conservation (particularly conservation of the transcription initiation region), conservation of exons, implication in GWAS traits, and implication in eQTLs. The numbers are about 3,000, 2,000, and 13,000.
In total, this suggests that about 19,000 long non-coding RNAs may have some potential function, if we assume that conservation is associated with function. With this in mind, over the years we have been running FANTOM6, with the idea of knocking down long non-coding RNAs. As shown on the left, we knock down an lncRNA and look at what happens afterwards: which transcripts change and how the whole network of the cell responds. We do this systematically. We select the long non-coding RNAs to knock down and perturb them with GapmeRs, antisense oligonucleotides that also act on non-coding RNAs in the nucleus. We extract the RNA, prepare and sequence a library, do the bioinformatic analysis, and then, jumping to the right side, we see the pattern. This comes from knocking down about 250 long non-coding RNAs and looking at which pathways and which GO terms change. At the same time, we also look at cell morphology and proliferation, measuring cell growth with an IncuCyte robot. Interestingly, there is not just a stress response after transfection; various processes such as splicing, translation, cell cycle, and metabolism change, so these RNAs seem to do quite a few things. A caveat: the response is not always strong. You need to replicate the study very carefully, and very often two antisense oligonucleotides give different results, also because we don't know the full structure of each non-coding RNA: by using two different antisense oligonucleotides against what we think is one non-coding gene, we may hit two different splicing isoforms. Anyway, the paper was just published in Genome Research and all the data are available, so please feel free to download them. There are also several thousand CAGE libraries, so you can download them, check, and draw your own conclusions. We have a few more studies in the pipeline with iPS cells, on similar work that has not yet been published.
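Asking "which GO terms change after a knockdown" is typically an over-representation test: are genes annotated with a term more common among the knockdown-responsive genes than expected by chance? The sketch below uses a one-sided hypergeometric test, a standard choice but not necessarily the statistic FANTOM6 used; the gene names and annotations are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n drawn genes)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total

def go_enrichment(de_genes, background, go_annotations):
    """One-sided over-representation p-value per GO term among knockdown-responsive genes.

    de_genes: genes called differentially expressed after knockdown.
    background: all genes tested.
    go_annotations: dict mapping GO term -> set of annotated genes.
    """
    bg = set(background)
    de = set(de_genes) & bg
    results = {}
    for term, genes in go_annotations.items():
        annotated = genes & bg
        k = len(annotated & de)
        results[term] = hypergeom_sf(k, len(bg), len(annotated), len(de))
    return results

# Hypothetical toy data: 20 genes tested, 5 respond to the knockdown
background = [f"g{i}" for i in range(20)]
de_genes = background[:5]
go_annotations = {
    "cell cycle": {"g0", "g1", "g2", "g3", "g10"},   # 4 of 5 annotated genes respond
    "metabolism": {"g10", "g11", "g12", "g13", "g14"},  # none respond
}
res = go_enrichment(de_genes, background, go_annotations)
```

With thousands of GO terms tested per knockdown, the raw p-values would still need a multiple-testing correction (for example Benjamini-Hochberg) before calling any term changed.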
This is a little of the interrogation of function. It is not only genomics but also experiments: after scratching cells, how long does it take for the wound to close with different antisense oligonucleotides? It seems that for this gene, a zinc finger, one antisense oligonucleotide inhibited cell migration quite efficiently. We have been looking at cell migration and proliferation, correlating cell growth and proliferation with the GO terms and KEGG pathways coming out of the genomic analysis. When we see less growth, knockdown of the non-coding RNA causes changes in genes related to apoptosis or the immune system; if instead growth or proliferation increases, the changes concern the cell cycle and DNA replication. So this seems to make sense. We have also been working quite a bit to characterize the non-coding RNAs in chromatin. We know that a considerable amount of non-coding RNA in the nucleus is attached to chromatin; these are some of the numbers. With Alessandro, a former postdoc in my lab, and other members, we developed a method to capture, by cross-linking, RNAs bound to chromatin, that is, to genomic DNA. After various steps, the key point is to add a linker that is ligated on one side specifically to the RNA and on the other side specifically to the DNA, all in cross-linked nuclei, so molecules don't move much. We therefore think we capture much of the situation as it is at the time of cross-linking. After cross-linking, converting the RNA into cDNA, and adding our linkers, we sequence on Illumina, using one or two lanes per library, and we study the whole interactome. The result is a pattern that correlates to some extent with the Hi-C pattern, although of course one side is RNA; the correlation is between 0.5 and 0.6. But we also see quite different things.
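A correlation such as 0.5 to 0.6 between an RNA-DNA contact map and a Hi-C map can be computed bin by bin once both are binned on the same genomic grid. The sketch below assumes two equally shaped matrices and uses a plain Pearson correlation over all cells; the actual analysis may have normalized, log-transformed, or distance-corrected the maps first, and the toy matrix is hypothetical. Note that the RNA-DNA matrix is not symmetric (rows are RNA bins, columns DNA bins), so all cells are used rather than one triangle.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def matrix_correlation(rna_dna, hic):
    """Correlate two binned contact matrices of identical shape, cell by cell.

    rna_dna[i][j]: contacts with the RNA end in bin i and the DNA end in bin j.
    hic[i][j]: Hi-C contacts between bins i and j.
    """
    x = [v for row in rna_dna for v in row]
    y = [v for row in hic for v in row]
    return pearson(x, y)

# Hypothetical 3x3 Hi-C-like map (strong diagonal)
hic = [[5, 2, 1], [2, 6, 2], [1, 2, 7]]
```

One caveat worth keeping in mind: both maps are dominated by contacts near the diagonal, so part of any correlation reflects genomic distance alone rather than shared 3D structure.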
We found many interactions within the same chromosome between RNA, on the y-axis, and DNA, on the x-axis. These interactions are usually caused by introns of protein-coding genes, which was quite surprising: introns are often not degraded, and they interact along the same chromosome. The others are the distributed spots, often in green, which are often long non-coding RNAs or other non-coding RNAs showing some interaction. These are the numbers of events: a huge number of interactions, including non-coding RNAs. Here is one example, Neat1, which is not very active in mouse embryonic stem cells; there it interacts with its own locus on chromosome 19 and a little outside it. But in oligodendrocyte progenitor cells, also from mouse, it interacts not only with its own locus but also with different regions of the same chromosome, and even with different chromosomes. We have hundreds and hundreds of such graphs; if you look at the paper, just recently out in Nature Communications, you can find all the data there. There is some specific organization: these RNA-chromatin interactions are enriched at TAD boundaries. There is also an effect on their distribution: RNA interactions within the same TAD are more frequent than interactions outside it. This tells us that these interactions probably have some biological meaning, which we still do not understand. Importantly, we also see an important role for repeat elements. We don't know how important it is, but transcripts containing repeat elements, compared to non-repeat transcripts, tend to interact over long-range distances along the chromosome, and this is statistically significant for SINE, LINE, and LTR elements. In the last few minutes, I'll show one example of repeat elements embedded in a transcript.
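The statement that RNA-chromatin contacts are more frequent within a TAD than across TADs amounts to assigning each end of a contact to a TAD and counting. The sketch below does that with binary search over sorted TAD intervals; the TAD coordinates and contacts are hypothetical, and contacts with an end outside every TAD are simply skipped. A fair test would also compare against distance-matched random contacts, since short-range contacts are intra-TAD more often by construction.

```python
from bisect import bisect_right

def tad_index(pos, tad_starts, tad_ends):
    """Index of the TAD containing pos, or None if pos lies outside every TAD.

    tad_starts/tad_ends: sorted, non-overlapping half-open [start, end) TADs on one chromosome.
    """
    i = bisect_right(tad_starts, pos) - 1
    if i >= 0 and pos < tad_ends[i]:
        return i
    return None

def intra_vs_inter_tad(contacts, tads):
    """Count RNA-DNA contacts with both ends in the same TAD vs in different TADs.

    contacts: (rna_pos, dna_pos) pairs on the same chromosome.
    tads: sorted, non-overlapping (start, end) half-open intervals.
    """
    starts = [s for s, _ in tads]
    ends = [e for _, e in tads]
    intra = inter = 0
    for rna_pos, dna_pos in contacts:
        a = tad_index(rna_pos, starts, ends)
        b = tad_index(dna_pos, starts, ends)
        if a is None or b is None:
            continue  # one end falls in an inter-TAD gap
        if a == b:
            intra += 1
        else:
            inter += 1
    return intra, inter

# Hypothetical TADs (with a gap at 250-300) and contacts
tads = [(0, 100), (100, 250), (300, 400)]
contacts = [(10, 90), (120, 240), (50, 150), (260, 350), (310, 390)]
counts = intra_vs_inter_tad(contacts, tads)
```

In this toy case the contact at position 260 is dropped because it falls between TADs, leaving three intra-TAD and one inter-TAD contact.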
Here is one repeat element embedded in this antisense transcript, in red. When we overexpress the antisense, it does not affect the level of the sense RNA, but it affects the level of the protein: overexpression of the antisense causes overexpression of the protein. This was published before, and this effect is caused by a SINE element, a SINEB2 element: if you take it out, we don't see the effect anymore. What we have been studying with Harshita, still unpublished, is that this is not just one exceptional RNA: we have a whole class of SINEB2 RNAs, and also human repeat elements embedded in transcripts, that are effective at enhancing protein expression. So there is a class in mouse and a class in human, and we have also identified them in Arabidopsis and in fish, and they work across species. So repeat elements embedded in antisense transcripts enhance protein expression of the target RNA in different species. Is this caused by the primary sequence? Looking at the sequence, we believe it is not; it is mostly a combination of motifs. Very briefly, using a model of COX7B, a haploinsufficient gene, in medaka fish, the mouse element can rescue a phenotype induced by a morpholino that interferes with splicing. The mouse SINEB2 can enhance translation of the fish protein and rescue the phenotype, which is quite interesting for future applications in humans. And again, please do not underestimate the importance of repeat elements. Now for perspectives: everything is going to move to single cells. Last year we published a version of CAGE that works in single cells, first with the C1, and we see similar things with 10x. We identify enhancers, bidirectional enhancers, but in each individual cell they are essentially monodirectional, not bidirectional.
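How an enhancer can look bidirectional in a population while each cell transcribes only one strand can be shown with a simple directionality score, D = (plus − minus) / (plus + minus), which is +1 for plus-strand-only reads, −1 for minus-strand-only, and near 0 when balanced. The per-cell counts below are hypothetical, and the |D| < 0.8 cutoff for calling an element bidirectional is an assumption for illustration, not a parameter taken from the single-cell study.

```python
def directionality(plus, minus):
    """Directionality score in [-1, 1]; None if there are no reads at all."""
    total = plus + minus
    if total == 0:
        return None
    return (plus - minus) / total

def is_bidirectional(plus, minus, max_abs_d=0.8):
    """Call an element bidirectional if |D| is below a cutoff (0.8 is an assumed threshold)."""
    d = directionality(plus, minus)
    return d is not None and abs(d) < max_abs_d

# Hypothetical per-cell (plus, minus) read counts at one enhancer:
# every expressing cell uses a single strand.
cells = [(3, 0), (0, 2), (4, 0), (0, 3), (0, 0)]
per_cell = [is_bidirectional(p, m) for p, m in cells]

# Pooling the same cells, as in a bulk experiment, mixes both strands.
pooled_plus = sum(p for p, _ in cells)
pooled_minus = sum(m for _, m in cells)
pooled = is_bidirectional(pooled_plus, pooled_minus)
```

No single cell passes the test, yet the pooled counts (7 plus, 5 minus) give D ≈ 0.17, so the element would be called bidirectional in bulk.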
Averaged over a cell population, however, they are bidirectional. There is a good correlation with DNase hypersensitive sites and with the chromatin marks. We can map trait-associated promoters in single cells and go into genetics with single-cell data. In particular, the concept is that we will map these regulatory elements, correlate them to SNPs, and try to understand eQTLs in each specific cell type. I've been going very quickly here. I'd just like to mention that this is also part of the Human Cell Atlas; there is probably no need to discuss the Human Cell Atlas extensively here, but we are quite strongly engaged in this project. There is a Human Cell Atlas activity in Japan, with data collection starting from various hospitals; we are looking forward to this, and here are some general details of how the project will go. There is also a Human Cell Atlas initiative in Asia, and we are going to have the next meeting later this month in China. So please stay tuned, and if you are in Asia, please contact us if you want to be part of it. Finally, I would like to introduce Human Technopole, an institute in Milan where I am progressively moving a good part of my activity this year, and more in the coming years. This was done in collaboration with many people; there is no time to list them, but I'll leave this slide up for any questions you may have. Thank you very much for your attention.