They're testing things upstairs, I think. So while they're trying to get that set, I just want to thank Elise for organizing all this and giving us the opportunity to summarize what we've been working on for the last several years. And we're going to shift gears a little bit. Mark and Manolis gave you a great sense of what you can do by taking all the data sets that the different labs have been generating and making sense of them. I'm going to get down into the trenches here and talk a little bit about what data really was generated, how it was generated, and some of the things we've learned about it. So if I can move this up here, maybe not, we're back. OK, maybe? Yeah.
All right, so I'm going to try to tell you about what we're learning about the transcriptomes of both fly and worm. These are efforts that have involved long and short RNAs from both flies and worms, from our group involving several different labs: Fabio's group, Sue Celniker's group with Eric Lai and Brian Oliver, and analysis help from Manolis and Mark. Both the worm and fly genomes were reasonably well annotated at the start of the project. But in worm, for instance, about half of the transcripts really had no experimental evidence in 2007, despite the very significant efforts at capturing cDNAs and directed studies by Marc Vidal, Yuji Kohara, and others. And so our idea was that we were going to do RNA-seq, proteomics, and some directed studies as well on a whole variety of different stages, figuring that some things had been missed because they only occur at particular times, or they're pretty rare. We wanted to look very deep. We also wanted to be able to capture particular cells and tissues from the worm. And for that, David Miller has been working on methods to isolate fluorescently labeled cells by FACS. And then we've been making our RNA-seq libraries from them.
And similarly, Sue's group in flies looked at, again, RNA-seq across development, at various times in development. They've looked at cell lines. They've looked at particular tissues. They've chopped off heads and pooled those, and done different conditions, and looked at RNA-binding proteins. And we're now trying to see what we can do in the context of the landscape of the transcriptome compared to human. And so we're also trying to work with Tom Gingeras. They've collected a massive amount of data from different cell lines. You can see down here that represents a billion reads for each of these different cell lines under a whole bunch of different conditions. And so there's a huge amount of data available in human.
In terms of what we've actually produced in worms, this gives you an idea. We've looked at 106 different samples, from embryos, with replicates from samples taken every 25 minutes, and from other stages of the life cycle. We've also looked at four different species. We've treated worms with pathogens to see what kinds of things can be induced under those conditions. And with David, we're looking at different tissues and cells. Similarly for fly: cell lines, tissues, treatments, poly-A tail enrichment, the developmental time course. Whoops, that I didn't do. And just lots and lots of data. You can see Sue's group has generated more than 12 billion reads here. And I don't know how many nucleotides that represents.
What have we learned from all this? Well, what we get from the RNA-seq are parts. We get splice junctions. We get coverage of exons. In the case of the worms, we get splice leaders and poly-A tails. And then we have to put them together. And sometimes it's fairly straightforward. Here's an example where the gene structure is pretty simple. There's just this one alternative splice.
There is an alternative five-prime end, as evidenced by different splice leaders in the worm, and then different three-prime UTRs based on the different poly-A signals we get. These are the introns. These are splice junctions, basically. And so we can reasonably put together a model like that. But sometimes it gets harder, where you get too many different signals. Trying to put all these pieces together gives us some uncertainty about really how many transcripts there are. We'd love to have technology that would give us millions or billions of 3 to 5 kb reads or something like that. Then we could actually get this straight. But this is from fly, from Ben Brown's work, where they're trying to sparsify and yet represent all the different signals that they get. And then you come up with really hard ones, and we're making our best guess as to what's going on in those. Underlying this, of course, we have evidence for each of these different elements. It's the combinatorics that we can only infer.
But with this kind of thing, we've been able to significantly improve the gene models. Here's one where we started out with four different genes in WormBase. With our data, we see more than one transcript for this, and more than one transcript for this, with alternative five-prime ends. Here, for instance, whoops, I lost this slide. No, I'm all right. OK. Anyway, so what happened is that these two genes actually merged, and these two then in turn merged with that one. And this is well supported by the underlying evidence. So instead of four genes, we actually end up with two in this situation. Another example here is something we saw quite a bit in both worm and fly. The gene model started out with most of the body of the gene OK, but we found additional five-prime exons, or three-prime exons, in this case here for one of the fly genes. So that significantly alters what you think might be the promoter region. We also find long non-coding RNAs.
Here's an example from the X chromosome in worm, where this block here represents the longest open reading frame we could find in this gene. It's about 4 kb altogether, with splices and some evidence for alternative splices. And so we've been finding and adding these kinds of features throughout the genome.
So when you take all that together, where do we stand? With the worm, these were the numbers that we started out with in terms of the different elements, and we're pretty confident of those. WormBase had 105,000 splice junctions. Only 70,000 of those were supported by evidence. We now have 131,000 splice junctions that we've incorporated into transcripts. We have more if we look for rare splice junctions, but we're ignoring those for now. We have a lot better sense of where genes start and stop. We have lots more exons. And in particular, we have added to the gene count. We suspect a lot of these are non-coding RNAs at this point, but there are multiple genes that appear to be coding from the mass spec data. And we've increased the number of transcripts some four-fold. Similarly, in the fly, you have another comparison there. Again, more splice junctions, more exons. And in this case, about three-fold more transcripts than were represented in FlyBase.
All this has changed the landscape, our view of the genome, a bit. Here from fly, you can see that this is the intergenic distance represented in FlyBase 5.32. And these are the intergenic distances after taking into account the modENCODE data. So we're encroaching on and reducing the intergenic space. Similarly, if you look at the worm, the number of transcribed bases was 28 million in WormBase 230. We've now got about 37 million bases with good evidence of transcription. So that's changing things considerably. Now, can we compare worm and fly and look at this landscape with respect to human? For that we're going to use human GENCODE, which is a combination of HAVANA and Ensembl.
We're going to take the modENCODE data from June 15th and the worm data from the same time. And so the number of loci in worms... oh, PowerPoint has encountered a problem. See, that's too bad, because it was going to be a pretty good punchline. Worms are going to have more genes than people. Let's see. Let's hope this works. All right. Sorry about this. I'm sure this is coming off my time, too, right, Elise? Let's see where we are. It's slowly waking up. OK, I've got about the right place here. We'll back up a little bit. It did it again. Oh no, here it goes.
So for protein-coding genes, it looks like, here anyway, worms are slightly ahead of humans. Flies are still down at about 15,000. These are estimated protein-coding genes. But humans still have lots more transcripts. The transcript landscape for worms and flies has increased dramatically, but humans still have many more transcripts. And in terms of exons, humans have many, many more. And from Tom's work, they have evidence for some 40,000 additional genes, 94,000 other exons, and 73,000 other transcripts. These are probably mostly non-coding RNAs of various kinds and at various levels. But this is the comparison with the current view of human. In terms of numbers of long non-coding RNAs, you can see, again, human has many more. And so there's still a considerable difference.
Now, you could worry about those kinds of comparisons because we're using different methods and we're trying to put things together in different ways. But here are the human pseudogenes versus the fly and worm pseudogenes, processed in the same way. And you can see humans have a huge number of processed pseudogenes. We knew that. And a bunch of duplicated pseudogenes. Worm has maybe 1,100 of these. Fly has about 500. So there's a huge difference in the transcription landscape between the three organisms. OK, so I want to give you a couple of anecdotes at the end here about what we have been able to do with this.
As I mentioned, we've been collecting data from different stages of the life cycle for both worm and fly. And Jessica Li in Steven Brenner's group, working with Steven, asked the question: can you find genes that are associated with specific stages in one organism and see whether they are also associated with particular stages in the other organism? These are the different data sets that were available. To start out, Jessica defined the stage-specific, or stage-associated, genes and then looked across different stages within fly. And it's not surprising that you get a nice diagonal, and that adjacent stages share some of those same genes. Now, these are just looking at the orthologs, of course. For worm, you get a similar picture. Again, a strong diagonal. And so the question is, what do you get when you compare worm with fly? And quite satisfyingly, there is a strong diagonal here, especially through the embryonic stages.
Now, to make sure you understand what we're talking about here, I want to explain how Jessica got the significance of this one score. She started out, as I mentioned, looking at worm-fly orthologs for each organism. She then looked to see which genes could be associated with one stage or another. In this case, this particular fly stage, the 12-hour embryo, had 762 associated genes. And worms had 363 genes associated with the dauer. And then she looked to see how many of those were shared between the two sets. Here it's 107. And then she used the hypergeometric distribution to determine the significance of that: how likely is it that you would get this number by chance? Importantly, she applied a Bonferroni correction because this is a 36 by 30 array. She basically increased the p-value by three orders of magnitude and still got very highly significant values. I want to come back to this one then and look at it a little harder. And I want to point out these two regions.
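The significance calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study: the talk gives the two set sizes (762 and 363), the overlap (107), and the 36 by 30 grid of comparisons, but not the total number of orthologs tested, so `N_ORTHOLOGS = 5000` is a placeholder, and the function names are hypothetical.

```python
from math import comb

def hypergeom_tail(k, N, K, n):
    """P(X >= k): chance of sharing at least k genes when n genes are
    drawn from N orthologs, K of which belong to the other set."""
    lo = max(k, n - (N - K), 0)   # smallest feasible overlap at or above k
    hi = min(K, n)                # largest feasible overlap
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return numerator / comb(N, n)  # exact integers, converted to float once

def bonferroni(p, n_tests):
    """Bonferroni correction: scale p by the number of tests, capped at 1."""
    return min(1.0, p * n_tests)

N_ORTHOLOGS = 5000   # ASSUMPTION: background size is not stated in the talk
FLY_STAGE = 762      # genes associated with the fly 12-hour embryo
WORM_STAGE = 363     # genes associated with the worm stage
SHARED = 107         # genes present in both sets
N_TESTS = 36 * 30    # every cell of the 36 by 30 stage grid is one test

p_raw = hypergeom_tail(SHARED, N_ORTHOLOGS, FLY_STAGE, WORM_STAGE)
p_adj = bonferroni(p_raw, N_TESTS)
print(f"raw p = {p_raw:.3g}, Bonferroni-adjusted p = {p_adj:.3g}")
```

Multiplying by 36 × 30 = 1,080 tests is what raises the p-value by roughly three orders of magnitude. The actual p-value depends on the true background size, so the printed numbers are only indicative.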
And we didn't see this, if you remember, with fly against fly. But when we look at worm against fly, we actually see that genes associated with the late embryo also appear in the pupae. Now remember, these are orthologs, not just one-to-one orthologs, but many-to-one and one-to-many. And what we think is going on here is that fly has taken a bunch of genes that are important in this aspect of development and duplicated them. Worms have a single copy, and so when we use worms, we can see both copies when we look against fly. And so it says that there are indeed expression patterns that are shared between orthologs. And indeed, they may be reiterated in development here, something we did not see in fly by itself.
OK, so Mark mentioned that you can take histone marks and correlate them with transcription levels and do very nice predictions of gene expression, and not just in worm, where they started, but in human and in fly. And so we're now at a stage in the project where we're getting a lot more data. Mike Snyder tomorrow is going to tell you about the data on 110 transcription factors. But we would like to be able to get still other kinds of expression data that would limit the kinds of hypotheses that you could make, and see if even stronger predictions could be made from it. So one of the avenues that we've pursued, with David Miller, is looking at individual cells. This is an example of two neurons from the pharynx. What we're sorting on are just the two NSM neurosecretory cells in the pharynx. You can dissociate the worm with SDS and pronase treatment and end up with cells. Here are some examples of those. And if you sort on those and do an RNA-seq experiment and look for enrichment, these are the top 10 genes that are enriched in this neurosecretory cell. This is a serotonergic cell. And these are some of the genes that you would expect to be involved in that.
And these are here. So mod-5, tph-1, and cat-1 are here. And these are very highly enriched. Remember that these are two cells out of 1,000. And so this is about as much enrichment as we should expect. These are probably just because these genes are so hard to detect. The signal is so low in the whole L1 stage that it's probably noise. Anyway, so that's one approach. We're doing this on a variety of different cells. I think if we can do this on two cells, we can probably do it on one. And so we should be able to get signal one cell at a time, if we can find the right labels.
And the other approach that we'd like to be able to bring to bear on this (oh, there it goes) is to look at movies. So we're using these confocal movies to see if we can get gene expression at the single-cell level. Each nucleus is labeled with histone GFP. And I don't know, this is about 28 cells or something. So these are confocal stacks. We take a stack every minute. And we're just showing you the movie. We don't ever actually look at things this way. But in the same strain, we have introduced a red fluorescent reporter. And you can see it coming on here and labeling these other cells. And because the worm has a constant lineage, and we can follow all this, and Zhirong Bao and John Murray have done amazing work with all this, the idea is that now we have a picture of which cells are expressing this gene at any moment. And I'll just go to the next slide. This is now the C. elegans lineage, not a cluster diagram. And you can see the individual cells expressing. Here they are. And we know what lineage they came from. We know the anatomy.
And so the idea is, can we begin to intersect this information with the transcription factor information and other expression data that we've been getting from other things, and begin to think about how all these things interact to create a regulatory network? And I'm going to let Mike talk about that tomorrow.
In the last couple of minutes here, I just want to tell a story from Eric Lai and Sue Celniker about three-prime ends. Basically, when you look at tissue-specific expression patterns, you see that the proximal poly-A sites are used in the testis. Here, a medium poly-A site is being used in ovary. But if you look at the head, there's a very long three-prime UTR, in this case 18 kb. And this is only really used in heads. And they went ahead and actually did a Northern analysis on these to confirm that, for these different genes, heads all had significantly elongated three-prime UTRs compared to other body parts. And testis was generally short. I'm not going to go through all of these, but you get the picture. And they then also did in situs, where they took probes from the whole message and compared them to the probe from just the extended three-prime UTR. And these are all neuronal-specific. These are showing different patterns. For instance, these spots here are gone here. So these are long UTRs that are specifically used in the nervous system. And presumably, these are excellent targets for microRNA regulation. This just gives more of that, so that you get a picture of how long they are compared to the rest of things. And indeed, the longest ones are way out here, and they're all neuronal-specific. So that's an unexpected finding from all this data.
So I've told you that the worm and fly genome annotations are much more complete. They're much more accurate. And unfortunately, for the worm investigators, they're much more complex.
The worm and fly stages do share common stage-associated genes, and we're anxious to see if we could possibly correlate this with mouse. Human is going to be another challenge. But are these same sets of genes echoed in mouse embryogenesis? We hope that the expression data combined with other modENCODE data can begin to predict expression in a serious way. And then I ended with this little vignette about neuronal-specific alternative polyadenylation and long UTRs. And as everybody else has done, we have a long list of collaborators. These are the people on the worm transcriptome project. These are the people involved in the fly. And I think that's it for now. Thanks.
Bob, right, oh, go ahead. Go first, on your left.
Great presentation. My question is, I mean, it was pretty striking to see these different lineages from worm coming up, the gene expression coming on in all those different lineages at the same time. So the obvious question is, is it some extracellular signaling that sends a message to each of these cells and they're all responding? Or is it that each of them is programmed to independently come on at exactly the same time, creating this resulting sort of uniform structure?
So that was that particular example. Susan can talk more about pha-4. We certainly see other transcription factors that come on at different times in different lineages. For pha-4, I think that's thought to be cell autonomous. And so these should just be all coming on. Is that fair? OK, so a little bit of both, basically.
A little bit of both? OK, fascinating. Wow, thanks.
Ben.
Hey, Bob, two things. One, I just wanted to credit Nathan Boley, who is here today, for the fly annotation. It never, ever would have happened without him. Must thank Nathan. And two, sorry, Nathan. Two is, with the cell sorting, is there a concern about RNA degradation during the sorting process? I mean, how fast are you treating with RNase inhibitors?
So all of these are concerns.
I mean, the cells are viable at the end of this process. You can plate them out and culture them. We don't do that. We take them and harvest them. But it's true that the cells are dissociated and then put through a FACS machine, unfortunately at room temperature. And all this takes time. They're then plunged immediately into TRIzol. So while they're alive, we don't know what's going on. But we are at least encouraged that the things that are there make sense.
So have you tried doing a time course where you literally just sort slower, do things slower, and see if you get big changes?
That would be a good idea.
Bob, I was struck by the low exon count in the Drosophila genes. I'm wondering if you think that relates to the low repeat density in the euchromatic arms. We have noticed that the fourth chromosome genes, where we have a 30% repeat density, have an average of 6 exons, compared to an average of 3 exons in the euchromatic arms.
I don't know. Nathan or Ben, you want to?
That was the slide you sent me yesterday. Rapid evolution here. Is that right? I put the wrong one on the bar chart. I thought I got it off your slide. OK.
Sean.
Bob, I really appreciated in particular the color development. It's very nice. Is that from your lab or from Fabio?
No, no. The development is all from the movies and so forth. Oh, no, that's from John Murray, who is in the audience. He probably took that movie.
Very beautiful. Then I have a question. With exactly the same method, we have been annotating for 10 years 50,000 genes in human and about 20,000 in worm. So we always had this big factor. Now, are you sure that you are counting 20, 20, and 15 with exactly the same method?
So as I said, when we compare worm and human and fly, we are using different methods. We're using the data as best we can. And for GENCODE, this is a combination of manual annotation and Ensembl. And so we're going to try to dig deep into this and see what we can come up with.
OK, thank you. OK.