Okay, good morning. We can get going while people continue to straggle in. So I'm Brent Graveley. I'm going to be the moderator for this session and the first speaker. But first I'd like to thank the members of the Data Coordination Center, who've put a lot of work into organizing this meeting. I think they've done a fabulous job of making it a great experience for all of the users. So yes, especially Jean. Not that everybody else hasn't done a lot of work. And also for having a very good music list during the breaks. Okay, so as I said, my name is... oh, we need to get to the beginning here. There we go. Okay, so my name is Brent Graveley. I'm one of the production PIs within the ENCODE project, and I'd like to thank myself for inviting me to talk to you today. What I'd like to talk to you about is a project that we've been working on: characterizing the functional RNA elements in the human genome. This is a new component of the ENCODE project. We're currently in phase three of ENCODE, which is ending in the next few months. This is one of my favorite figures from the previous ENCODE publications. Part of why I like it is that it highlights the diversity of cell types and assays that have been done. The other reason I like it is that you can see here, for RNA binding proteins, very few proteins, cell types, or assay types were actually covered. Yet RNA binding proteins are a really important part of the biology of the genome. In fact, there was a paper recently published in Science from Jonathan Pritchard's lab showing that RNA splicing is one of the primary links between genetic variation and human disease. It's a really great paper, so go read it. It highlights how important RNA biology is in human disease. There are many aspects of RNA biology that occur once you synthesize a pre-mRNA. These are then spliced.
They can undergo RNA editing. They can undergo cleavage and polyadenylation at their three prime ends. And then these RNAs are either retained in the nucleus or exported to the cytoplasm, where they are translated. They can be localized to different regions within the cytoplasm, and they can also be controlled at the level of RNA stability. Each of these steps can be controlled by a large number of RNA binding proteins, and these proteins recognize elements within the RNA. For instance, proteins involved in splicing can bind to intronic or exonic elements that either enhance or repress splicing of a particular exon. Proteins involved in polyadenylation can bind within exons or downstream of the cleavage and polyadenylation site, and they can either enhance or repress use of a particular poly(A) site. Proteins can also bind to many different regions of a messenger RNA and control the translation, localization, or stability of that RNA. So in this project we're really trying to identify the binding sites for all of these proteins. This is a figure that was published in Cell several years ago highlighting the proteins that were known at the time to regulate splicing. There are maybe 30 or 40 proteins on this list, but there are over 100,000 alternative splicing events in the human genome. So there's no way that only 40 proteins regulate all of those splicing events. In fact, censuses that our group and others have done over the last few years suggest that there are at least 1,000 and possibly up to 2,000 different RNA binding proteins, yet only a handful of these proteins have ever been characterized in much detail. These proteins are involved in nearly every aspect of RNA biology within the cell, so there are many different complexes and processes in the nucleus and the cytoplasm, and many different categories of RNA-protein complexes.
Okay. So the goal of the project that we're engaged in is to comprehensively identify the elements in the human genome that are recognized by these RNA binding proteins. In the current phase, our goal is to do this for 250 different RNA binding proteins, and eventually we would like to do all of them. We also want to characterize the binding affinity of each protein for all possible RNA sequences and then determine the functions of these protein-RNA interactions. Conceptually, we think of this as developing something like a periodic table for human RNA binding proteins: for each protein, the same set of assays is conducted in the same manner and in the same cell types, so that you can directly compare them and every protein has the same data sets. So we're creating a reference set for these RNA binding proteins. For instance, for each protein we'll be doing an assay that Eric Van Nostrand introduced called eCLIP, which characterizes the protein-RNA interactions of each of these proteins. We'll also be doing an assay called RNA Bind-n-Seq, which is being done by Chris Burge's lab, in which they measure the binding affinity of each RNA binding protein for all possible RNA sequences. Then, if we see a mutation in a particular patient, we can predict what would happen to the binding of a protein at that particular site. For a subset of proteins that are localized to the nucleus, we're also performing ChIP-seq to see how those proteins associate with chromatin. We're also trying to assess the function of these binding sites, and to do this we're doing either RNAi knockdowns or CRISPR knockouts of these RNA binding proteins, followed by RNA-seq, so we can see how depleting these proteins impacts the transcriptome.
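To make the mutation-prediction idea concrete, here is a minimal Python sketch. It assumes a hypothetical table mapping each k-mer to a relative binding affinity, the kind of value an RNA Bind-n-Seq experiment could provide. The scoring rule (the maximum affinity over the k-mers that overlap the variant) and the function names are illustrative assumptions, not necessarily how the project scores variants.

```python
# Hypothetical sketch: score how a single-nucleotide variant changes
# predicted RBP binding, given a k-mer -> relative affinity table.


def max_overlapping_affinity(seq, pos, affinity, k):
    """Highest affinity among the length-k windows of seq that cover index pos.

    k-mers missing from the table are treated as affinity 0.0 (an assumption).
    """
    best = 0.0
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(seq) - k)
    for start in range(first_start, last_start + 1):
        best = max(best, affinity.get(seq[start:start + k], 0.0))
    return best


def variant_effect(seq, pos, alt, affinity, k):
    """Return (ref_score, alt_score, delta) for substituting alt at index pos."""
    if seq[pos] == alt:
        raise ValueError("alt allele matches the reference base")
    mutated = seq[:pos] + alt + seq[pos + 1:]
    ref_score = max_overlapping_affinity(seq, pos, affinity, k)
    alt_score = max_overlapping_affinity(mutated, pos, affinity, k)
    return ref_score, alt_score, alt_score - ref_score


if __name__ == "__main__":
    # Toy affinities: UGCAUG is the well-known RBFOX motif; the numbers are invented.
    toy_affinity = {"UGCAUG": 10.0, "UGCAUA": 1.5}
    print(variant_effect("AAUGCAUGAA", 7, "A", toy_affinity, 6))  # (10.0, 1.5, -8.5)
```

A negative delta means the variant is predicted to weaken binding at that site. This toy score ignores sequence context and RNA structure, which a real predictor would need to consider.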
And finally, unlike transcription factors, which interact with DNA in the nucleus, we hope, RNA binding proteins can function anywhere within the cell. So with the reagents that we're generating, we're also performing immunohistochemistry experiments so we can see where within the cell each RNA binding protein is. These experiments are being done by Eric Lécuyer's lab in Canada. Okay, so again, we're going to be defining the binding sites and hopefully identifying the function of these sites. One thing we're really interested in, once we have a large collection of these, is looking at the composition of RNPs. This is sort of the RNA-world equivalent of chromatin structure. You can take RNA strands and start decorating them with the proteins, and see how individual transcripts are bound by different proteins and how that might impact their function. We also hope to be able to predict how mutations will impact RNA processing, and hopefully we or others will use this data to obtain new insight into RNA biology. We're conducting all of these assays in two different cell lines, K562 and HepG2 cells, which are being extensively studied by others within the ENCODE consortium. So I think when you merge the RNA binding protein data with the transcription factor, DNA methylation, and histone ChIP-seq data, we'll have a very large collection of genomic data with which we can hope to really understand how these particular cells work. Okay, so there are over a thousand RNA binding proteins. How do we decide which ones to study and which ones to prioritize? I'd like to tell you it's some really scientific method, but the way we did it was we scoured the earth for antibodies and started testing them. We acquired over 800 antibodies for RNA binding proteins. We also have over a thousand shRNAs targeting these RNA binding protein genes.
So far we've tested 700 of these in IP-western experiments. For each RNA binding protein, we perform an IP along with a control, and we identify the antibodies that efficiently immunoprecipitate the protein of interest. Of those, we have 438 antibodies that work. We then perform a secondary characterization for each of these antibodies to make sure that the band recognized on the western is actually the protein you think it is: we perform shRNA knockdown experiments and make sure that the band actually goes away when you deplete the transcript. We have 362 of these done so far. So we have reagents that have passed all of these standards for 276 RNA binding proteins. For the IP-western experiments, we've come up with a grading scheme, from really good antibodies, shown in green, to really bad ones, shown in red. We're focusing all of our experiments on the ones shown in green, which efficiently IP the protein of interest. If we look at just the proteins with good antibodies, this is the domain composition of that set. Most of them have canonical RNA binding domains. Some of them have domains that are known to interact with either DNA or RNA. And then there's a bunch of proteins with other domains that are not typically annotated as RNA binding, but these proteins have been shown to be covalently cross-linkable to RNA, so we're studying them anyway. The antibody resources are all available at the DCC site, and some of this is similar to the DCC tutorials that have been shown. You can go there and search for these. There are a large number of them in a category called "not pursued". Those are the bad antibodies: they don't work very well, and we think that information is just as important as the good ones, so that you don't go and order a bad antibody.
So you can identify these, and if you click on a particular antibody, you can see both the IP experiment and the RNAi knockdown experiment. If data sets have been generated to date, you can also access those directly on that site. Just like the antibodies, where there's a gradient of how well they work, there's a gradient for how well the shRNAs work as well. Some of the shRNAs work extremely well and deplete the protein very efficiently. Others don't work very well at all. And some proteins can't be depleted even after trying 10 or 12 shRNAs, so for those we're now shifting to using CRISPR to knock them out. Okay. So like I said, for many of the proteins where we have these antibodies, we're also doing immunostaining of the cells to look at where within the cell these proteins localize. For each of these antibodies, we stain with the RBP antibody itself, we have a collection of 10 different markers for different subcellular compartments, and then we merge them. This is just an example for three different proteins, but Eric Lécuyer's lab in Canada has an entire database that you can go to, and the URL is shown down here. You can search for your favorite antibody, or you can just browse the images. We hope within the next few months to get many of these images over to the DCC. So if you were, for instance, to go in and look at a particular protein, this one's AARS, you see the antibody staining here, these are all the different markers, and then we have the co-labeling of them here. You can just browse these, and there's another view that shows the merged images for all of these. So go look at these. They're really amazing images and just fun to look at. Okay. As far as the genomic assays go, Eric Van Nostrand introduced this to you in the workshop the other day.
This is the eCLIP assay that he developed, a really high-throughput assay that's very good at identifying protein-RNA interactions that occur in vivo. I won't go through it in much more detail, since Eric spent the whole workshop on it, but I can give you some highlights of the types of data we're getting now. For instance, we're able to classify proteins as having different types of binding profiles. Proteins such as PRP8, which is a core component of the spliceosome, tend to bind preferentially to five prime splice sites, whereas another spliceosomal protein, U2AF2, binds preferentially to three prime splice sites. There are proteins such as RBFOX2, which binds to intronic sequences, IGF2BP1, which binds to three prime UTRs, and FXR1, which binds to coding exons. So we can now categorize these proteins into these different binding profiles. Another thing we can do is start to look at co-association: we can take each protein and ask which of the other proteins we've characterized have the most similar binding profiles, and so try to identify co-associated proteins. Some of these are things we already know about. For instance, down here in the bottom left corner is a cluster that contains both subunits of U2AF. This protein has two subunits that form a very tight heterodimer, and we're seeing that those two proteins have the same binding profiles. We also have another cluster here, which contains some proteins that we know work together, but also other proteins that we didn't know work together, some of which have never even been characterized. So we can start to look at the functions of these and make guesses about what those functions are based on their co-associations with other proteins.
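The two analyses described here, assigning each protein a dominant binding category and finding its most similar partner, can be sketched in Python as follows. This assumes hypothetical per-protein counts of peaks falling in five transcript regions and compares proteins by cosine similarity of their normalized region profiles. Both the region scheme and the similarity measure are illustrative stand-ins, not the consortium's actual pipeline, and the counts are invented.

```python
import math

# Transcript regions used for this illustrative profile (an assumption).
REGIONS = ("5'ss", "3'ss", "intron", "CDS", "3'UTR")


def region_profile(counts):
    """Convert a dict of region -> peak count into proportions in REGIONS order."""
    total = sum(counts.get(r, 0) for r in REGIONS)
    if total == 0:
        return [0.0] * len(REGIONS)
    return [counts.get(r, 0) / total for r in REGIONS]


def dominant_region(counts):
    """Region holding the largest share of a protein's peaks."""
    profile = region_profile(counts)
    return REGIONS[max(range(len(REGIONS)), key=lambda i: profile[i])]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(name, peak_counts):
    """Other protein whose region profile is closest to that of `name`."""
    target = region_profile(peak_counts[name])
    others = [p for p in peak_counts if p != name]
    return max(others, key=lambda p: cosine(target, region_profile(peak_counts[p])))


if __name__ == "__main__":
    # Invented counts, loosely shaped like the profiles described in the talk.
    toy = {
        "PRP8": {"5'ss": 80, "intron": 10, "CDS": 10},
        "U2AF2": {"3'ss": 70, "intron": 30},
        "U2AF1": {"3'ss": 65, "intron": 25, "CDS": 10},
        "IGF2BP1": {"3'UTR": 75, "CDS": 25},
    }
    for protein, counts in toy.items():
        print(protein, dominant_region(counts), most_similar(protein, toy))
```

With these toy counts, U2AF1 and U2AF2 are each other's nearest neighbor because both concentrate at three prime splice sites, mirroring the heterodimer cluster mentioned in the talk. A real analysis would more likely compare peak overlap or binding positions directly.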
Okay, we can also take an RNA-centered point of view, and Eric showed a little bit of this in his talk. For instance, if we look at XIST and the proteins that bind to it, we can see that HNRNPK, PTBP1, HNRNPM, and SRSF1 bind to the XIST transcript, but they do so at different locations. Similarly, we see a collection of proteins that bind to the noncoding RNA MALAT1. This is what I was talking about: we can now start to look at RNP structure by decorating individual transcripts with different RNA binding proteins. Okay, and just to give an overview, this is a demonstration of how many binding sites we're identifying as we do new assays with these proteins. Here we're looking at the cumulative number of data sets that have been generated; the left panel is the number of peaks that have been identified, and the right is the number of nucleotides in the genome that are covered. We've done 100 different iterations of this, basically putting the assays in different orders, and red shows the results for HepG2 and blue for K562 cells. So every time we add a new protein, we're still generating new binding sites that have not been identified in previous assays, and I think we have a lot of RNA binding proteins to go before we start saturating this.
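The saturation analysis described above, adding data sets in many random orders and tracking how many distinct binding sites accumulate, can be sketched in Python like this. It simplifies by treating each data set as a set of exact site identifiers; real peaks from different experiments would first need to be merged by genomic overlap, which is omitted here. Filling each set with individual (chromosome, position, strand) tuples instead would give the nucleotide-coverage version.

```python
import random


def saturation_curves(datasets, n_iter=100, seed=0):
    """Cumulative count of distinct sites as data sets are added in random order.

    datasets: list of sets of hashable site identifiers.
    Returns n_iter curves; curve[i] is the number of distinct sites after i + 1 data sets.
    """
    rng = random.Random(seed)
    curves = []
    for _ in range(n_iter):
        order = list(range(len(datasets)))
        rng.shuffle(order)
        seen = set()
        curve = []
        for idx in order:
            seen |= datasets[idx]
            curve.append(len(seen))
        curves.append(curve)
    return curves


def mean_curve(curves):
    """Average the curves position by position."""
    n = len(curves)
    return [sum(curve[i] for curve in curves) / n for i in range(len(curves[0]))]


if __name__ == "__main__":
    # Invented site sets standing in for per-protein peak calls.
    toy = [{"a", "b", "c"}, {"c", "d"}, {"e"}, {"a", "e", "f"}]
    print(mean_curve(saturation_curves(toy)))
```

If the mean curve is still climbing at the last data set, as the talk reports for both cell lines, each new protein is still contributing sites not seen before.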
Okay, another assay that's being done, in Chris Burge's lab, is the assay we call RNA Bind-n-Seq. They take a pool of randomized RNAs, incubate it with recombinant protein at several different concentrations, isolate the bound RNAs, and sequence them. They also sequence the input RNA, so they can look at the enrichment of different hexamers, 7-mers, 8-mers, et cetera, calculate the relative binding affinity of the protein for each of them, and get out of that the motifs that these proteins optimally recognize in vitro. We can then intersect the motifs identified from the RNA Bind-n-Seq data with the peaks identified from the CLIP data. This pointer's not working very well. All right, so when we do that, we can actually identify the motifs within the peaks. This provides an orthogonal validation of the peaks, or of the motifs identified in the peaks, because we can now cross-reference the Bind-n-Seq data with the CLIP data, and it also allows us to pinpoint the binding sites at very high resolution within the transcripts.

Another type of assay that we're doing is knockdown followed by RNA-seq, where we can identify exons that change in their splicing, for instance, or gene expression changes that occur after knockdown, and then intersect that with the CLIP data. These are some figures we call RNA maps. What we're doing here, on the bottom for instance, is taking all of the exons that are regulated by this protein, RBFOX2, and separating them into the ones that show increased inclusion when you knock the protein down and the ones that show increased skipping when you knock it down. Then we look at the binding profile of RBFOX2 based on the CLIP data. Pointers, okay. What we see is that for the exons that are activated by RBFOX2, we tend to see binding of the protein downstream of the exon. So now we can not only identify the elements these proteins recognize, but also partition them into functional categories, and we're using this to assign functions to the different RNA elements. Here's another use you can get out of this data: this is a paper that we published with Chris Burge this year, where they identified a set of alternative last exons and wanted to identify proteins that regulate them. What we're able to do here is highlight proteins that enhance utilization of the most distal last exon versus proteins that repress it. So just by scouring through this data, we can come up with really good candidates for proteins that regulate certain types of splicing events, and Grace Xiao, in the next talk, will talk about some of this in more detail for other types of RNA processing events.
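The RNA Bind-n-Seq enrichment step and its intersection with CLIP peaks can be sketched in Python as below. This is a simplified illustration: for each k-mer it computes the ratio of its frequency in the protein-bound pool to its frequency in the input pool (an enrichment value sometimes reported as R), then reports what fraction of peak sequences contain the top-ranked k-mers. The pseudocount, function names, and toy reads are assumptions; the actual pipeline handles multiple protein concentrations and other details not shown here.

```python
from collections import Counter


def kmer_frequencies(reads, k):
    """Frequency of each k-mer across all reads (overlapping windows)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def kmer_enrichment(bound_reads, input_reads, k, pseudo=1e-6):
    """Ratio of bound-pool to input-pool frequency for each k-mer seen in the bound pool.

    The small pseudocount avoids division by zero for k-mers absent from the input.
    """
    bound = kmer_frequencies(bound_reads, k)
    background = kmer_frequencies(input_reads, k)
    return {kmer: f / (background.get(kmer, 0.0) + pseudo) for kmer, f in bound.items()}


def top_kmers(enrichment, n):
    """The n most enriched k-mers, highest first."""
    return sorted(enrichment, key=enrichment.get, reverse=True)[:n]


def fraction_of_peaks_with_motif(peak_seqs, motifs):
    """Fraction of peak sequences containing at least one of the motifs."""
    if not peak_seqs:
        return 0.0
    hits = sum(1 for seq in peak_seqs if any(m in seq for m in motifs))
    return hits / len(peak_seqs)


if __name__ == "__main__":
    # Invented reads: the bound pool is enriched for UGCAUG relative to the input.
    bound = ["UGCAUGAA", "UGCAUGCC"]
    input_pool = ["UGCAUGAA", "GCAUGCC", "GCAUGAA", "CAUGCC"]
    top = top_kmers(kmer_enrichment(bound, input_pool, 6), 1)
    print(top, fraction_of_peaks_with_motif(["AAUGCAUGAA", "CCCCCC"], top))
```

With these toy reads, UGCAUG ranks first and appears in one of the two toy peaks. Because k-mers missing from the input get a very large ratio, a real analysis would rely on a deep, randomized input library or filter out rare k-mers.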
Another type of experiment we've done is to fractionate cells. We have a total fraction, and then we generate subcellular fractions (nuclear, cytosolic, membrane, and insoluble), and we can look at how different transcripts partition among these fractions. Once we have this data, we can start doing integrative analyses. For example, this is the protein RBM27, and what we find is that this protein tends to bind to mitochondrial transcripts more than other transcripts. If you look at the imaging, it turns out that the protein is very highly localized to mitochondria, which is good. And if we look at the genes whose expression changes upon depletion of the protein, it turns out most of these tend to be mitochondrial transcripts as well. Then, using the fractionation data we generated, we can partition the transcripts by where each one tends to localize among those fractions and, in red and blue respectively, look at the genes that are over- and under-expressed upon depletion of the protein. These tend to correlate with where the mitochondria migrate within those fractions. So all of this provides evidence that RBM27 is involved in the function and activity of mitochondrial transcripts.
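The fractionation comparison described for RBM27 can be sketched in Python as follows, assuming a hypothetical table of each transcript's abundance in the four subcellular fractions, plus lists of genes that go up or down after knockdown. Converting abundances into within-transcript proportions is a simplification (each fraction's library would normally be normalized separately), and all names and numbers here are illustrative.

```python
FRACTIONS = ("nuclear", "cytosolic", "membrane", "insoluble")


def localization_profile(abundance):
    """Share of one transcript's abundance in each fraction (dict fraction -> value)."""
    total = sum(abundance.get(f, 0.0) for f in FRACTIONS)
    if total == 0:
        return {f: 0.0 for f in FRACTIONS}
    return {f: abundance.get(f, 0.0) / total for f in FRACTIONS}


def mean_profile(genes, fraction_table):
    """Average localization profile of the genes present in fraction_table."""
    profiles = [localization_profile(fraction_table[g]) for g in genes if g in fraction_table]
    if not profiles:
        return {f: 0.0 for f in FRACTIONS}
    return {f: sum(p[f] for p in profiles) / len(profiles) for f in FRACTIONS}


def up_minus_down(up_genes, down_genes, fraction_table):
    """Per-fraction difference between up- and down-regulated gene profiles."""
    up = mean_profile(up_genes, fraction_table)
    down = mean_profile(down_genes, fraction_table)
    return {f: up[f] - down[f] for f in FRACTIONS}


if __name__ == "__main__":
    # Invented abundances for two hypothetical genes.
    table = {
        "geneA": {"nuclear": 1, "cytosolic": 1, "membrane": 8, "insoluble": 0},
        "geneB": {"nuclear": 5, "cytosolic": 5, "membrane": 0, "insoluble": 0},
    }
    print(up_minus_down(["geneA"], ["geneB"], table))
```

A positive value for a fraction means the up-regulated genes are, on average, more concentrated there than the down-regulated ones. Lining such shifts up with where mitochondria migrate is the comparison the talk describes.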
Okay, so here's a snapshot of where we are to date. So far we have at least one data set for 344 RNA binding proteins, and about 1,300 data sets that are completed and/or released, in both K562 and HepG2. These are the numbers for individual data sets. What we're really working on in the next few months is maximizing the number of assays available for each protein; our goal is to fill in as many of these black squares as we can, so that we have the same data for every RNA binding protein, and eventually we'll have our periodic table for RNA binding proteins. Okay, so this is a lot of work from a lot of people. My lab is at UConn, and these are the people involved in my lab. Chris Burge's lab is doing all the RNA Bind-n-Seq experiments. Gene Yeo's lab is spearheading all the CLIP data and doing a lot of heroic work, which is really led by Eric Van Nostrand. Eric Lécuyer's lab is doing the imaging data, Fu's lab is doing some ChIP-seq data that I didn't really talk about, and we're collaborating closely with Grace Xiao. I have to thank the Data Coordination Center for working with us very closely to get all the data in there so that you can access it, and I have to thank NHGRI for funding. So thank you for your attention, and I'm happy to take any questions.

Maybe you already mentioned this, but do you have a sense of how many RNA binding proteins bind to a sequence element and how many bind to structure, secondary or tertiary?
I don't have numbers for you, but I would say most of the proteins that have, for instance, RNA recognition motifs tend to bind to particular sequences. Some of those do so within a structured context: the sequence has to be single-stranded, for instance in the loop of a stem-loop, but the actual sequence of the stem is irrelevant, as long as it forms a structure that displays the binding site within a loop.

And an example would be XIST RNA, which binds proteins, but not at the same place.

Yeah, and then there are double-stranded RNA binding proteins that we're looking at, which obviously tend to bind to double-stranded RNA. There are also a lot of proteins that just seem to coat RNA: there may be one site that acts as a sort of nucleating site, and then the protein just coats from there, so most of the sites for some proteins are pretty nonspecific.

Thank you.

So there's one over there. Oh, you already have the mic, okay.

This is perhaps a naive question, but you make an analogy between the RNA elements and the DNA elements. Is there anything in the RNA elements similar to the communication we now know about between histone marks and DNA elements, like long-distance interactions between the proteins and the bound RNA?

Yeah, so that's a good question. We have information on protein-RNA interactions, and some of those occur in the nucleus and some in the cytoplasm. The nuclear ones could be interacting at the chromatin level. For some of those proteins, Fu's lab is also doing ChIP-seq, so we can get an idea of where the protein is on chromatin.
But there are other groups working on a ChIP-seq-like assay that, instead of just sequencing DNA, sequences both the RNA and the DNA that's bound, sort of like a ChIA-PET or Hi-C type approach where you're looking at protein-mediated RNA-DNA interactions. We're not doing that ourselves, but I know there are other groups doing it, and that would probably be the best way to address that question.

And more generally, what role do those RNA elements, or RNA binding proteins in general, play in gene regulation? It seems like two different worlds, but they interact, right?

Yeah, so there are some experiments showing this. For instance, the HIV Tat protein binds to the TAR element and sort of reaches back to the promoter to enhance transcription, so that's one way in which a protein-RNA interaction in the nucleus could engage the transcription machinery to enhance it. I think a lot of these proteins are also involved in regulating translation or transcription elongation, and they could regulate transcription termination, so I think there are a lot of different ways this could work, and hopefully we and others will be able to figure this out.

So, Jason.

I was wondering whether, and how much, signal you see outside of annotated genes?

Outside of annotated genes? I would say there's very little signal outside of annotated genes. There's some, but the vast majority of it is in something that's annotated, or at least within an intron of an annotated gene, so we see very little signal outside of where we know transcription occurs in these particular cells.

For the proteins that cluster together, are they binding the same sequence and competing, or are they part of the same complex?
It seems like, for the most part, they're forming a complex and binding together, so they'll actually have different binding sites, but they're binding as a complex.

So can you say roughly how many complexes there are from your data?

We probably could, but I don't know the number off the top of my head.

Okay. All right, one more, way back there, and then we should probably move on.

Have you looked at all at RNA editing using, like, the total RNA-seq data?

That's an excellent segue. In fact, our next speaker will talk about precisely that. So, yeah, Jeeping, we can catch up, but we'll move on so we can keep on time. I'm going to introduce the next speaker, Grace Xiao, and she will talk about these RNA binding proteins, how they control splicing, and possibly editing today. I know she's working on it, so.