But in any case, I'm going to be talking about insights we have gotten from the analysis of the C. elegans genome. And I should just say at the outset that I've really enjoyed being part of the modENCODE project for the past number of years. I think it's been a very nice community to work in and really inspiring scientifically. And I'm hoping we get our first slide. Here we are, great. So I wanted to start off by putting the analysis that we're talking about in the C. elegans genome in the broader context of the ENCODE project, and Elise has done a really great job in the previous talk of introducing all these projects. I think an important thing to realize, as Elise pointed out, is that the modENCODE projects, in some sense, were pilot projects for the annotation of the human genome. And this year, we hope to see the annotation of the human. So what I'm going to focus on is the annotation we did of the worm genome, the lessons we learned from that, and how we applied them a little bit in the annotation of the human, so you can see how the research translates. And in talks later on, we should hear about future analyses that people are thinking of doing, where they're really trying to compare the worm, fly and human in an apples-to-apples comparison. So I just thought I'd start off by giving you my own high-level view on genome annotation and how you do it. As I see it, there are two main tracks. One I call comparative analysis, and the other is functional. In comparative analysis, you basically take the genome of interest, the worm genome, and you compare it to lots of other things. You find things that are conserved, that don't change much, to a greater or lesser degree.
And those regions of conservation in and of themselves become annotations, and of course you can do this on quite an elaborate scale with the human genome: you can look at it versus itself within the human population and so forth. The second type of annotation is what you might call functional annotation, and this of course is when you look at the readout of a functional experiment, such as RNA-seq or ChIP-seq, over the genome. There you get a noisy signal over the whole genome that gives you some readout of what the genome is doing, whether it's being transcribed or something's binding to it. What we often do is process the signal and smooth it, and then, from looking at the signal, we get little regions that we call initial annotations. We might group them together into larger and more meaningful units, and that's my high-level view of what we do in this process. Now, much of the worm genome, and particularly the human genome, is not genes. There are large intergenic regions, and people often call them the dark matter of the genome. And I thought I would just say a little bit about why I think they're really important to think about. I actually really like the analogy with the actual dark matter in the sky. So this is a picture that shows the stars, and it shows how the dark matter in the universe acts as a kind of lens on the starlight. In a similar way, I think the non-coding bits of the genome act as a kind of lens that modulates the effect of the genes. That's where all the regulation and control of the genome is. Another thing that we found within ENCODE is that a lot of the non-coding regions of the genome are transcribed and in some sense functional. They also form a very nice historical molecular record of the genome. They have a lot of molecular fossils, and of course, within the human context, many of the disease associations we find are in the non-coding regions of the genome.
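The functional-annotation track described above (smooth a noisy signal, call small regions, then group them into larger units) can be sketched in a few lines. This is only a minimal illustration under assumed choices: the moving-average window, the threshold, and the function names `smooth`, `call_regions` and `merge_regions` are all hypothetical and are not taken from the modENCODE pipeline.

```python
def smooth(signal, window=3):
    """Centered moving average; windows are truncated at the ends of the signal."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def call_regions(signal, threshold):
    """Return half-open (start, end) intervals where the signal is >= threshold."""
    regions, start = [], None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i                      # a region opens here
        elif value < threshold and start is not None:
            regions.append((start, i))     # the region closes just before i
            start = None
    if start is not None:                  # region runs to the end of the signal
        regions.append((start, len(signal)))
    return regions


def merge_regions(regions, max_gap):
    """Group initial regions separated by at most max_gap positions."""
    merged = []
    for start, end in regions:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Toy per-base read-out (assumed values, for illustration only).
raw = [0, 0, 5, 6, 5, 0, 0, 0, 4, 5, 4, 0]
initial = call_regions(smooth(raw), threshold=2.5)
grouped = merge_regions(initial, max_gap=3)
```

On the toy signal, the two initial regions are close enough to be grouped into one larger unit; with a smaller `max_gap` they would stay separate.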
So I'm going to focus on the analysis we did on the worm genome. I thought I'd first introduce the data very briefly, and I'm going to introduce extremely briefly the human data that I'm going to connect a little bit with this. The worm data consists principally of a time course through the development of the organism, and we have lots of data on this time course. We have RNA-seq data, small RNA data, and long RNA data. We also have tiling arrays on various tissues, and then we have ChIP-seq of a number of factors, about 22 factors for the integrative analysis that we published. We also have polymerase in most of these stages, and we have chromatin marks in embryo and larva. Now, just to give you a sense, because I'm going to be talking about how we applied some of the approaches to the human: in the human, they don't have these wonderful time courses, but they have lots and lots of cell lines. They focus most of the data on three of them, which they call Tier 1 cell lines, and they have many more transcription factors, about 120 or so, lots of deep RNA-seq, just tons of it, and maybe about 12 main histone marks. So this is a tremendous oversimplification, but just to get people on the same page. What I'm going to touch on today is these five topics: the analysis of the expression time course, the analysis of non-coding RNAs, looking at the activity over the whole chromosome, regulatory networks, and then how we can relate these things together, the histone marks and the transcription factors to the gene expression, with statistical models. And I put this little word, "Hu", next to four of these five points, because for four of them we really can talk very directly about how analysis approaches that we built in modENCODE were directly applicable to human ENCODE. So first of all, the expression time course analysis.
We have all the different stages, and let's just start out by looking at the traditional clustering that you get. When you cluster the gene expression over the time course, you very nicely see the embryo and the larva separate. Now, this is probably not unexpected, but it's, of course, satisfying to see this type of thing. And then we looked at the tissue samples that we had for both embryo and larva. When we looked at a principal-components projection of them, we could see all the embryo tissues being in one place and the larval tissues being in another place, moving into two different regions of gene expression space. Now, one of the really nice things about the modENCODE data set was that all the experiments were coupled. So we also have a coupled polymerase binding experiment where we can look at the binding of the polymerase in the same stages, the same situations. And you can see here, if you cluster that, you get the nice separation between embryo and larva. We can also directly correlate the binding with expression. And here we found something that was not completely obvious. We found this interesting situation: if we looked at expression early, it actually tended to be correlated with binding late. We puzzled about this for a while, and we never got a completely satisfactory explanation. Some ideas we had have to do with the stalling of the polymerase, but it remains a somewhat unexplained observation. Now, since these were RNA-seq experiments, we could, of course, look at splicing in detail, and we found there are about 280 genes, out of the total worm genes, whose splicing changed greatly over the time course. And here's an example of one of the genes that changed greatly in its splicing structure over the time course. Okay, so now we talk about non-coding RNAs.
So one of the really nice things about the RNA-seq experiment, of course, is that we could identify lots of regions of the genome that were active beyond genes. We identified about 7,000 candidate regions, small regions that we thought could potentially be transcribed non-coding RNAs. These were validated, and to some degree we believe they have a very good positive predictive value. Now, one of the lessons we got from this is that no single individual experiment was really successful in pulling out the non-coding RNAs. You really got a lot of value from evidence integration. And you can see this pretty clearly here. So this is a total RNA tiling array. Here are the known non-coding RNAs, the ones we knew were non-coding RNAs. Here are protein-coding genes. And here are intergenic regions. You can see that with this one experiment, you could not draw a threshold in any one place that would cleanly separate the blue guys from these green and yellow things. But if you start looking at more things, plotting two different features of non-coding RNAs relative to each other, you can start to get a better separation. And as you get more and more features, you're able to discriminate better. So I think this was a principled approach that was really borne out here. And you can see this specifically if you look at a particular gene. I'm particularly interested in pseudogenes, so let's look at a non-coding RNA that intersects a pseudogene and creates a transcribed pseudogene. Here's the transcription of the parent gene. And here's the transcription of the pseudogene. Now, these are in different stages. And of course, when you're looking at a pseudogene, you're always wondering: am I looking at mismapped reads or some form of cross-hybridization and so forth?
But you can see, when you look at these many different experiments, that this pseudogene has a very different transcriptional life than its parent, right? It's not in any way correlated with its parent. And that gives us a lot of confidence that it has an independent transcriptional life. Overall, in the worm genome, we found about 1,200 pseudogenes, with about 16% of them having fairly strong evidence of transcription. So we carried essentially this approach and these lessons over to when we did the human analysis. We drew very similar pictures here, and here's a particular example of the same type of thing for a transcribed pseudogene. Overall, the human project found many, many more non-coding elements. Within the GENCODE project, there were about 5,000-some-odd lincRNAs and about 12,000 pseudogenes that were annotated to high quality, and about 900 of them had very good evidence of being transcribed too. Okay, so the next thing that we did in our analysis is we looked at the overall distribution of activity over the chromosome. And the first thing I'll show you is a kind of accounting plot of the different regions of the genome and how much they were covered by different bits of activity. So if you take the entire worm genome, you find that about a third of the genome is under constraint, okay? And then if you take that bit, you can ask: can you account for all those constrained bases in terms of some form of activity? Some, of course, is in genes, but lots of the other regions are accounted for by TF binding or by particular punctate chromatin marks and so forth. And only about 20%, or a fifth, is unaccounted for. So we found that overall, we could account for most of the constrained bases in the worm genome in terms of some form of genomic activity. Now, some of the patterns we found were kind of interesting. So we looked at the histone marks on the chromosomes.
Here's a picture of a chromosome, this is chromosome three, and here is the variety of histone marks. You probably can't read these very well, but these are all the different marks here. One of the things we found, which was very striking, was that we had an elevation of repressive marks, like H3K9, at the arms of the chromosome and a kind of depletion of activating marks. And this was on all the autosomes. If you look in detail at the junctions here, and they're zoomed in, you can see there are fairly sharp junctions for these repressive arms. The sex chromosome was very different. It has, you can see just right away, a completely different marking than the autosomes. So, very interesting large-scale patterns. Another thing we found, when we started looking at the chromosomal distribution of binding, came from the transcription factors. We found that often they were just bound all over the genome, sort of higgledy-piggledy, but there were a number of regions of, I want to say coordinated, but clustered binding in a particular spot, and we called these hot regions. Here's an example of what I mean. Here are a number of different transcription factors, and you can see these are various places in the genome where one's binding here, one's binding there, but in this particular spot here, all of these guys are really zapping down on that particular spot. Overall, we found about 300 hot regions in the genome, and then we looked at the properties of the genes that were near these hot regions, and they had very distinct properties. They were much more likely to be essential genes, and they were also much more likely to be expressed in all the different tissues of the worm. And this type of approach for looking at hot regions was applied in human.
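One simple way to make the idea of a hot region concrete is to bin the genome and count how many distinct factors have a binding peak in each bin. The sketch below is an illustration only, not the procedure used in the worm analysis: the bin size, the `min_factors` cutoff, the factor names and the function `find_hot_regions` are all assumptions.

```python
from collections import defaultdict


def find_hot_regions(peaks, bin_size=1000, min_factors=3):
    """Return sorted (chrom, bin_start, n_factors) for bins bound by many factors.

    peaks: iterable of (factor, chrom, position) tuples, one per binding peak.
    A bin counts as "hot" when at least min_factors distinct factors bind in it.
    """
    factors_in_bin = defaultdict(set)
    for factor, chrom, position in peaks:
        factors_in_bin[(chrom, position // bin_size)].add(factor)
    hot = [
        (chrom, bin_index * bin_size, len(factors))
        for (chrom, bin_index), factors in factors_in_bin.items()
        if len(factors) >= min_factors
    ]
    return sorted(hot)


# Hypothetical peak calls: three factors pile up near III:1000-2000.
example_peaks = [
    ("TF_A", "III", 1200),
    ("TF_A", "III", 1300),   # second peak of the same factor, counted once
    ("TF_B", "III", 1500),
    ("TF_C", "III", 1900),
    ("TF_A", "III", 5000),
    ("TF_B", "X", 5100),
]
hot_regions = find_hot_regions(example_peaks)
```

Counting distinct factors rather than peaks keeps a single factor with many nearby peaks from creating a hot region by itself. Fixed bins are the crudest choice; a sliding window or a background model for expected co-binding would be natural refinements.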
Of course, in human we identified many, many more hot regions because of the much larger extent of the genomic space and also the larger number of transcription factors that we looked at. There were tens of thousands of hot regions identified in the human genome. But essentially this is really an example of the same approach, scaled up pretty directly. Okay, so now I'm going to talk in a little more detail about the transcription factor analysis and the work on the regulatory network. When we took the binding of all the transcription factors in the worm genome, we found that we could arrange these into a regulatory network. One of the nice things about the worm genome is that it's fairly compact, so it's fairly direct going from the binding site to the target gene. It does not have huge intergenic spaces. And we looked at the regulatory network. It has about 25,000 edges. This is just a larval network that we're looking at. These 25,000 edges involve probably about 6,500 target genes. These are non-transcription factors. And then here are the transcription factors, and of course they have their own little network, where one TF regulates another. We found from looking at the connectivity of the network that we could put some of the TFs on the top, these are things that mostly just regulate, some of them on the bottom, which tend mostly to be regulated by other TFs, and some in the middle. And then, in a very rough way, we could find differences between these levels. So for instance, I've colored the essential TFs red and the Hox ones yellow. And you can see the Hox ones tend to be more on the top. We also looked at the tissue specificity of these TFs, and you tended to get the more specific ones at the top. But obviously we don't have very many TFs here, and the statistics are a little weak. As I'll point out in a second, when we scaled this up to the human, I think it was really satisfying to see the same approach work, but scale up with much better statistics.
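The top, middle and bottom placement described above can be illustrated with a simple degree-based score: a TF that mostly regulates others scores near +1, and one that is mostly regulated scores near -1. This is a sketch under stated assumptions, not necessarily how the worm hierarchy was built: the score (out - in) / (out + in), the 0.5 cutoff, and the function names are illustrative choices.

```python
def hierarchy_scores(edges):
    """Score each TF by (out-degree - in-degree) / (out-degree + in-degree).

    edges: iterable of (regulator, target) pairs among transcription factors.
    """
    out_degree, in_degree = {}, {}
    for regulator, target in edges:
        out_degree[regulator] = out_degree.get(regulator, 0) + 1
        in_degree[target] = in_degree.get(target, 0) + 1
    scores = {}
    for node in set(out_degree) | set(in_degree):
        out_n, in_n = out_degree.get(node, 0), in_degree.get(node, 0)
        scores[node] = (out_n - in_n) / (out_n + in_n)
    return scores


def assign_levels(scores, cutoff=0.5):
    """Map each TF to 'top', 'middle' or 'bottom' using a symmetric cutoff."""
    levels = {}
    for node, score in scores.items():
        if score > cutoff:
            levels[node] = "top"
        elif score < -cutoff:
            levels[node] = "bottom"
        else:
            levels[node] = "middle"
    return levels


# Hypothetical TF-TF edges (regulator, target).
tf_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
tf_levels = assign_levels(hierarchy_scores(tf_edges))
```

Once every TF has a level, any per-TF property (essentiality, tissue specificity) can be averaged within each level and compared across levels.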
Another thing that we found when we looked at this hierarchy was that we could merge the microRNA regulation with the TF regulation, and that it nicely fit into this hierarchical view. So here I show the microRNAs that regulate TFs and the microRNAs that are regulated by TFs. And you can see you get more microRNA regulation at the top, regulating the TFs at the top of the hierarchy, and so forth. Now, another thing we could do, in addition to looking at the global, overall hierarchy of the TFs, is look at little sub-clusters of TFs, what you might call network motifs. Here, for instance, is a network motif of two TFs, and this triangle represents a microRNA, and it shows how they all work together. Now, one thing you can do is enumerate all these little motifs and count how many times you see them in the overall bigger network, and some of them are going to be more common than you might expect and others less common. So here's the complete enumeration. You probably can't read this, but here's how many times each little thing occurs, and then here's a picture of the seven over-represented motifs, and let me just focus on this one over-represented motif here. This is a feed-forward loop, and that means we have a microRNA that regulates a transcription factor and actually represses it. This TF activates another TF, and that microRNA also represses that second TF. This is a very over-represented motif, and you can actually think about what it might be doing. Well, if this microRNA wants to turn off a transcription factor, it turns it off, but at the same time it also turns off the thing that activates it, and so it kind of makes sense. Now, as I said, we scaled this approach up to human, and I think it was really satisfying to see the exact same approach, literally the same machinery, scaled up to human and getting much better statistics because of the larger number of transcription factors.
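Enumerating a motif such as the feed-forward loop comes down to a search over directed edges: find every X, Y, Z where X regulates Y, Y regulates Z, and X also regulates Z directly. The sketch below shows only the counting step, with hypothetical node names; judging over-representation would additionally need a comparison against randomized networks, which is not shown here.

```python
def find_feed_forward_loops(edges):
    """Return sorted (x, y, z) triples with edges x->y, y->z and x->z.

    edges: iterable of (regulator, target) pairs; self-loops are ignored.
    """
    successors = {}
    for regulator, target in edges:
        if regulator != target:
            successors.setdefault(regulator, set()).add(target)
    loops = set()
    for x, x_targets in successors.items():
        for y in x_targets:
            for z in successors.get(y, ()):
                # z must be a third node that x also regulates directly
                if z != x and z in x_targets:
                    loops.add((x, y, z))
    return sorted(loops)


# Hypothetical mixed network: a microRNA represses TF1 and TF2, TF1 activates TF2.
network = [
    ("miR", "TF1"),
    ("TF1", "TF2"),
    ("miR", "TF2"),
    ("TF2", "TF3"),
]
loops = find_feed_forward_loops(network)
```

The edge list here carries no sign, so activation and repression are not distinguished; adding a sign to each edge would let the same loop be split into its coherent and incoherent variants.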
So here's the human transcription factor network. Now we're looking at 120 factors, and we found that we could arrange this very nicely into a hierarchy. This hierarchy is drawn so that the downward-pointing edges are shown in green and the upward-pointing edges are shown in red, and so you can see that you can arrange it very nicely into a hierarchy where most of the edges are pointing downwards, okay? So you very much have a sense of some TFs sitting at the top and some mostly being regulated. And then, just as with the worm, we can take this hierarchy, paint it with various genomic properties, and look at the differences. I like to compare this TF hierarchy sometimes to social hierarchies. People have a lot of intuition for that, and so we might ask: what are the differences for the TFs at the top, are they more influential, and how can we measure the influence of a transcription factor? Well, we can measure it in terms of how it affects the level of gene expression. So we take a given transcription factor and look at how correlated its binding is with the expression of its target genes, over all of its targets. A more influential transcription factor will have a greater correlation. Then what I can do is color all of these transcription factors by their influence, and that's what you see here, and then I can average that number over each of the levels, and you see that the top levels tend to be somewhat more influential overall than the bottom level. Another thing we can do is of course look at the connectivity with the microRNA network. Now, since there are so many TFs and so many microRNAs, it's kind of hard to make the same picture as I did for the worm. We found it better to arrange things in a circle.
So here are all the microRNAs, all the TFs, and all the connectivity between them, and you can see the highly connected TFs tend to be connected to the highly connected microRNAs, and there's a very strong positive correlation there. We can also look at how this relates to the hierarchy, and just as we saw for the worm, you get more microRNA regulation at the top than at the bottom. There's also another column here, because you also have the degree of regulation of the transcription factors onto the microRNAs, and again, there's a little bit more at the top. So the top of the hierarchy is better connected and a little more influential. The other thing we can do is of course the motif analysis, just like we did for the worm genome. Here I'm just going to look at all triplets of TFs, and we found, just as we found for the worm, this tremendous prevalence of feed-forward loops. So I'm going to show you a feed-forward loop now that involves three TFs: one that regulates a second, it also regulates a third, and the third regulates the second. This is the most over-represented motif in the network, and the other over-represented motifs are essentially feed-forward loops with one variation, a kind of toggle-switch variation on them. You can get some understanding of what these feed-forward loops are doing if you actually paint them onto the hierarchy. You can see that most of the feed-forward loops tend to involve regulation through the middle level, and so you can see very much how the middle level is mediating the regulation through these little feed-forward loop constructions. Okay, so that's the analysis of the regulatory network, and now I'm going to talk a little about statistical models that try to put these things together. As Elise pointed out, one of the main things that we looked at in the modENCODE project was the process of transcription, and there are many different elements that are part of that.
There's obviously the read-out, the gene expression, and there's the binding of the polymerase, but there's also the binding of transcription factors, the modifications of the histones and so forth, and how do all these things fit together? Now, the arrows really go back and forth. You can't necessarily say that, for instance, the chromatin structure causes the expression. Sometimes the act of expressing in itself changes the chromatin structure. But we can look for statistical correlations between all these things. I'm going to show you three main types of statistical correlations: one looking at the histone marks relative to gene expression, one looking at the TFs relative to gene expression, and then one just looking at the histone marks relative to the TFs. So first of all, the simple thing you can do is take all the histone marks, aggregate them upstream of the TSS of genes and so forth, and look at how they look for the highly expressed genes, the lowly expressed genes and so forth. Let me just zoom in on one mark here. This is H3K4, and here's what the mark looks like for highly expressed genes and for lowly expressed genes. You can see there's an obvious difference. It's elevated for highly expressed genes. This suggests that we can probably get some predictive value from this. And so what we did is we built a simple model. We took the TSS and looked at lots of little bins upstream and downstream of it, and in each of the bins we looked at the level of the different marks. And we tried to see the degree to which those levels would be predictive of the gene expression level. So let me just show you the degree to which each of those predictors is successful. Here are all the different histone marks, here are all the different bins, and what I show you in each bin is the correlation of that bin with the level of gene expression.
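The per-bin picture described above amounts to one correlation per bin: for a given mark, collect its signal in bin b across all genes and correlate that vector with the genes' expression levels. The following is a minimal sketch with made-up numbers; the use of Pearson correlation, the data layout, and the helper names `pearson` and `per_bin_correlation` are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def per_bin_correlation(mark_signal, expression):
    """Correlate one mark's signal in each TSS-relative bin with expression.

    mark_signal[g][b] is the signal for gene g in bin b;
    expression[g] is the expression level of gene g.
    """
    n_bins = len(mark_signal[0])
    return [
        pearson([gene_bins[b] for gene_bins in mark_signal], expression)
        for b in range(n_bins)
    ]


# Four hypothetical genes, three bins around the TSS (values are invented).
signal = [
    [1, 4, 5],
    [2, 3, 5],
    [3, 2, 5],
    [4, 1, 5],
]
expr = [10, 20, 30, 40]
correlations = per_bin_correlation(signal, expr)
```

In the toy data the first bin tracks expression, the second behaves like a repressive mark, and the third is uninformative. Repeating the calculation for every mark gives a marks-by-bins grid of the kind shown on the slide.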
And so you can see some of the marks are very correlated, some are anti-correlated, these are repressive marks, and some of the correlations are very dependent on the positioning, okay? The nice thing is you can put all this together into a predictive model, and when you build a predictive model you often make one of these ROC curves, and here's a ROC curve classifying highly expressed versus lowly expressed genes. You can build one of these curves for each of the different bins, and you can see which bin is more predictive, and that tells you where the histone marks are, in a sense, more important for governing gene expression. And lo and behold, the most predictive position turns out to be right at the TSS. Then the next thing you can do is ask how well you can predict the overall level of gene expression in a statistical sense, and so here's what you get. And lo and behold, and this was very surprising to us, you actually get a good prediction. This is a pretty good R value of 0.75, where you put all the histone marks together and get a good sense of expression. But what I think is even neater is that you can take this model that we built on protein-coding genes, okay, parameterized on protein-coding genes, and without touching any of the parameters, we can go upstream of the microRNAs and try to predict their level of expression. Then we can compare that against the matched data sets we had for the small RNA expression. We don't get as good an R, but I still think we get a fairly satisfying result, where we find that the model actually encapsulates something about what goes into transcription. So now we scaled this up to mammalian systems, with a lot more data points, as you can see. This is what we have in the ENCODE production project. Look at that R value, 0.9. That's an extremely good R value, a very, very good correlation. And here's the relative importance of each of the TFs.
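The bin-by-bin ROC comparison mentioned earlier in this part of the talk can be reduced to one number per bin, the area under the ROC curve, which equals the probability that a randomly chosen highly expressed gene gets a higher score than a randomly chosen lowly expressed one. This is a small sketch with invented scores; the pairwise formulation and the names `roc_auc` and `most_predictive_bin` are illustrative assumptions, not the model used in the analysis.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half).

    scores: predictor values, one per gene.
    labels: truthy for highly expressed genes, falsy for lowly expressed genes.
    """
    positives = [s for s, is_high in zip(scores, labels) if is_high]
    negatives = [s for s, is_high in zip(scores, labels) if not is_high]
    if not positives or not negatives:
        raise ValueError("need at least one gene in each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def most_predictive_bin(signal_by_bin, labels):
    """Return (bin_name, auc) for the bin whose signal best separates the classes."""
    aucs = {name: roc_auc(scores, labels) for name, scores in signal_by_bin.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


# Invented mark signal for four genes in three bins; first two genes are "high".
is_high = [True, True, False, False]
bins = {
    "upstream": [0.9, 0.2, 0.8, 0.1],
    "TSS": [0.9, 0.8, 0.3, 0.1],
    "downstream": [0.5, 0.5, 0.5, 0.5],
}
best_bin = most_predictive_bin(bins, is_high)
```

The pairwise form is quadratic in the number of genes, which is fine for a sketch; a rank-based calculation gives the same value more efficiently on genome-scale data.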
Okay, so now I'm going to tell you about the same thing, but focusing on transcription factors as opposed to histone marks. We can play the same game that we did for the histone marks. Here are all the transcription factors in the worm genome, here are all the bins, and here's how each of the factors is correlated with the level of gene expression. You see a similar thing: some things are correlated, they're positive, some things are repressive, they're negative, and you can see how this is very punctate. It's very, very clear that right at the TSS is where the binding is significant. And then you can ask how predictive the TFs are of the level of gene expression. Just in the interest of time, I'm only going to show you the human result, which is a bit better than the model organism result because of the scale, and the really neat thing is that it works really well. You can predict the level of gene expression from all the ENCODE TFs, using this machinery we developed in modENCODE, with an R of about 0.81, and here's the relative importance of all the different TFs. Now, this actually produces a paradox that you might think about for a second. How is it that with a fairly small number of TFs you can predict the level of gene expression for all the genes, when there are thousands of TFs in the genome? This is in the human and mouse genomes. You might say, well, I thought that the level of expression of a gene was determined by the intricate binding of literally thousands of factors. How can I do so well with only a few? And that's actually shown here, where I show, using just one, two, three, four, five, six, seven factors, how well we're able to predict, and actually at seven or eight factors we're able to predict extremely well. I think that people have had various rationalizations for why this works.
We're not actually sure why it does, but one rationalization is that, remember, this is a statistical correlation, and it might be that what happens is you have something like a pioneer transcription factor that comes in, pries things open and really is associated with the specificity, but once the chromatin is open you just have a large group of TFs that are going to bind there irrespective of the type of gene, and so you get a fairly good correlation. In any case, a very surprising result. The other thing we found was that we could look, for the TFs just like for the histone marks, at which bins were important. Here you see very clearly that the TFs have a very, very strong signal just at the TSS, as opposed to the histone marks, which are spread all over. And then you could ask: if you put the TFs together with the histone marks, would you do better? And actually you don't do better. So they're redundant. There's no new information in putting the TFs together with the histone marks. And of course that might suggest to you that the TFs and the histone marks contain some redundant information, and that the TFs in themselves might be able to predict the histone marks, or vice versa. So we played the same game. Bins for the histone marks, bins for the transcription factors: can we go back and forth and predict them with the usual machinery of statistical prediction? And here's what we found. This is the result now from the worm. This is what we had in the worm analysis. These are all the different histone marks. Here are all the different TFs. Let's just look at this one row, which I don't know if you can read, but it says HLH-1. This is one factor. Here's how each of the marks predicts the binding of that factor. The blue ones are obviously not so strong. Okay, but then when you integrate all these things together, that's shown in this column. You get a pretty good prediction, and the value in this column is actually shown in this bar chart here.
So you get a fairly good prediction just using a chromatin model. But you might say, well, geez, I can figure out where a TF's going to bind because I have a motif. I know specifically where that TF is going to bind. And so you can actually put the PWM together with the chromatin model and you'll get a more successful prediction. I think there's a very easy intuition for that. You find the regions of open chromatin with the model, and then you find where the factor binds with the PWM. I'll just quickly say that we used the same machinery, building a chromatin model, to find lots of enhancers for the human, the exact same machinery that we developed in modENCODE. And very quickly I'll just say that when you do that, you can build not only a normal proximal regulatory network but a distal regulatory network for the distal edges. It has a very different structure, you can see, from the proximal network, showing the very different type of regulation you get distally. And so I'd just like to summarize what I've talked about today. I talked about insights we got from the worm modENCODE and how we've used this for the human genome annotation. I talked about how we did the analysis of the expression time course. We found coordinated binding and expression, and we also found lots of splicing changes. I talked about the way we looked at non-coding RNAs and the importance of evidence integration, not just looking at one data set. And I focused on this example of transcribed pseudogenes, where we could find that about, say, 10% of them were transcribed. Then I talked about the overall distribution of chromosomal activity. We found these repressed arms in terms of chromatin, and we also identified hot spots of TF binding. And we found that most of the constrained regions had some form of activity. And then I spent a lot of time talking about the regulatory network.
And here we found that we could arrange the TF binding into a hierarchy with great differences in properties between the levels. This hierarchy could be integrated with the microRNA regulation. I also talked about how we could drill down and look at little network motifs. And we saw this great prevalence of feed-forward loops, both in the worm and in the human. And then finally I tried to put all these things together in the framework of statistical models, looking at models that could predict gene expression from histone marks. It's rather amazing, I think, that you can do this for protein-coding genes and microRNAs. We find similar results for transcription factors, and it's a bit of a paradox that you can get such great predictive performance from only a few TFs. And then at the very end, I talked about how you could find where the TFs themselves were binding from the histone marks, both with the PWM and without, and how we later found this very useful for identifying enhancers. So now of course I want to acknowledge all the people, and I'll first just do the ENCODE acknowledgments. As Eric Green said, we expect to see really big lists of people here, and you're going to see that right now. I'm not even going to try to list all the people in the main ENCODE project, but I'll list two sub-collaborations within it. One is called the Networks/Elements collaboration and the other is called the GENCODE collaboration. In the Networks/Elements collaboration, we focused on the regulatory networks. This was partially led by Mike Snyder. A lot of the data production was from him. A lot of the main analysts were Anshul Kundaje, Chao Cheng, Jasmine Mu, Ekta Khurana, Joel Rozowsky, Roger Alexander, and Renqiang Min. And a lot of the enhancer work was done with Ewan Birney, who led the overall ENCODE analysis, and Peter Bickel and Ben Brown.
For looking at the human non-coding RNAs, this is part of the GENCODE thing, which was really work from Jennifer Harrow, Adam Frankish, Suganthi Balasubramanian, Baikang Pei, and Cristina Sisu. Now finally, of course, most importantly, I'm gonna acknowledge the worm modENCODE project. And so this is the entire worm modENCODE bunch. There are more than 130 names on this thing. It's a very large effort that really encompassed a lot of people. A lot of the data generation was, of course, led by Mike Snyder for the TFs and Bob Waterston for the transcription; Jason Lieb and Lincoln Stein participated a lot in the analysis. And there were a lot of other contributions from many of the other co-PIs, like LaDeana Hillier, Valerie Reinke, Gos Micklem. And a lot of the main analysts include John Lu, Eric Van Nostrand, Chao Cheng, Tao Liu, Kevin Yip. They did a lot of the work with finding hot regions, finding non-coding RNAs, and building the statistical models. I should also acknowledge, I mean, there's 130 people on this list. This is a serious challenge, keeping all these people working together. And so I think we should also acknowledge the NHGRI program officers, particularly Peter Good, who put a lot of effort into keeping us all working together. And with that, I'll thank you for your attention.
And you'll have to speak simply because I'm a fly guy. It's very impressive work, the analysis over the time course and so on. It's difficult as biologists to know how to use sort of ground-up-animal transcription factor levels. Also, I'm unclear, since different transcription factors can act wildly differently in different cell types and so on, on how your models and your networks are able to account for that, or how I'm going to be able to make use of those to make predictions for my tissue of interest.
So the way I would answer is, I think one of the goals of the integrative analysis in modENCODE, and this is also true of ENCODE, is to build a reference annotation that can be used as a resource for people to come to. And I think to get a sense of how you might use a reference annotation, let's think about how you might use the gene set. Different genes are active in different tissues and different cells and so forth. But it's still useful to have a comprehensive human gene list, and to go to that as a reference to look at which genes are active in different cell types. And likewise, we'd like to have a comprehensive reference of an overall wiring diagram for the organism, with the knowledge that different bits of that wiring diagram are going to be turned on in different cell types. And I think that's the way I think about it. I think of it like the blueprint for this building. There are all the wires in this building; only a fraction are turned on at this particular time in this particular room. But I think it's nice having the overall wiring diagram to get a global sense of things. And that's the kind of way I think I would deal with it.
Since C. elegans is used as an aging model, and since presumably in the metadata you have ages for all the basic assays done on this, have you looked at any of these parameters along an aging axis?
Well, certainly for the gene expression measurements, there was a lot of very careful work done by Bob Waterston, and also by Frank Slack, looking at old worms and so forth. And I can't say to what degree we found a particular thing that would characterize the gene expression of things getting older. But I think that's more my lack of knowledge on that particular thing. I don't know if Bob would want to comment or something. But we certainly looked at that. Were there any signatures for age that we found there?
Frank worked harder at looking at the microRNA data with stages.
We did less of that with the polyA RNA. And Frank certainly has microRNAs that look like they're signatures, or certainly are more highly expressed in older worms, and others that are lost.
Yeah. So I think one of the big challenges in biology today is figuring out which cis elements and trans elements actually regulate a particular gene in an animal, as opposed to some simplified model system. And hearing you talk about the pseudogenes, I'm struck that you have these 200 pseudogenes that are differentially regulated with respect to their paralogs, if you will. Has anyone gone back and asked what the differences are in the cis elements and the trans binding factors between these, to try and figure out what's regulating the so-called wild-type paralog or what's regulating the pseudogene?
No, that's a really excellent question. And I should say I'm extremely interested in that. Some of that was done to some degree in human ENCODE. One of the things that we found in human ENCODE, which I found particularly interesting, is lots of examples of partial activity of pseudogenes. In particular, we found some degree of active chromatin and some degree of transcription factor binding upstream of them, relative to their living parent. And you can speculate what that means. One idea is you have a gene that dies. And, well, it doesn't all die at once. People are only aging. This is the aging gene. It's not transcribed, but maybe its upstream region is still maintained to some degree, and you can see things bind to it. I think that's an extremely interesting question that we haven't done enough on. But I'm particularly interested in it.