Hi, my name is Brenton Graveley from the University of Connecticut Health Center, and I'd like to present to you today the work that we've been doing over the last five years as part of ENCODE 3, and going forward as part of ENCODE 4, under our own sub-project called ENCORE, which is the Encyclopedia of RNA Elements. This work is a large collaboration over many years from the labs of Xiang-Dong Fu, Eric Lécuyer, Chris Burge, and Gene Yeo, who is also a PI of this project. Today what I'd like to tell you about is the work that we published as part of the ENCODE 3 publication package that recently came out.
The goal of our project is to generate a large-scale binding and functional map of human RNA binding proteins. We believe that there are approximately 1,500 different genes in the human genome that encode RNA binding proteins. The majority of these have only been characterized based on the fact that they have a motif that's known to function as an RNA binding domain, or that they've been identified in mass-spec experiments as proteins that associate with messenger RNAs. So the goal of our project eventually is to study each of these and determine the binding sites, and the function of each of those binding sites, for these proteins.
We're doing this by using a series of five different assays. Our main assay to study binding in vivo is eCLIP, which stands for enhanced cross-linking and immunoprecipitation. This is an experiment in which we use ultraviolet light to cross-link proteins to RNA in vivo. Then we lyse the cells, purify the proteins using antibodies specific to those proteins, and sequence the bound RNAs. This allows us to identify the binding sites for each protein in vivo.
Another binding assay that we're using is called RNA Bind-n-Seq. This is an in vitro binding assay in which we incubate purified recombinant protein, at several different concentrations, with a pool of synthetic RNAs containing all possible sequences, then purify the protein and sequence the bound RNAs. This allows us to determine the relative binding affinity of each protein for all possible RNA sequences, and therefore to derive in vitro motifs from them.
We're also performing immunofluorescence to look at the subcellular localization of each of the proteins within cells. This helps us gain better biological insight into the binding and functional data we observe in other experiments.
We are also performing knockdown RNA-seq, where we use either shRNAs or CRISPR to either deplete or delete the protein of interest, followed by RNA-seq. This allows us to identify RBP-responsive gene expression and splicing changes, which we can then associate with the observed binding from eCLIP experiments.
And finally, for a small subset of proteins that localize to the nucleus, we're performing traditional ChIP-seq experiments, which allow us to look at the association of these proteins with chromatin. I won't go into too much more detail about these experiments, but they are discussed in our recent publication in the ENCODE 3 package.
So this is an overview of all the experiments that we've performed to date. We've generated at least one confirmed data set, from those five different types of experiments that I just discussed, for 356 different proteins. On the left, we've listed the functions of these proteins based on annotations from the literature. What I'd like to point out is that 23% of the proteins that we have studied to date have no known function in RNA biology other than having a motif suggestive of RNA binding or being identified in a mass-spec pull-down experiment.
So there's a lot of new biology we can uncover by studying these previously uncharacterized proteins. The center section depicts the localization patterns for these different proteins. Next to that are the different types of domains that each of these proteins contains, and on the right is the list of different types of experiments that we've performed for these proteins, so you can get an idea of how much data is available for each protein.
In the next slide, we can look at the CLIP data and associate it with the different regions within the annotated transcriptome that these proteins might bind to. Here are just some representative examples. For instance, the protein PRPF8, which is part of the spliceosome, strongly associates with 5′ splice sites. U2AF2, shown in blue, is also part of the spliceosome, but this protein binds to 3′ splice sites. RBFOX, shown in red, is a regulatory protein that binds to intronic sequences; IGF2BP1, shown in orange, binds to UTR sequences; and FXR1, shown in red, binds to coding sequences. So we can use this type of analysis to bin the various binding sites for each protein into the annotated features of the transcriptome.
On the next slide, what we observe is the binding profiles across these annotated regions for all of the data sets that we've generated to date. On the left are those that bind to coding sequences. In the middle are those that bind to 3′ UTRs. In blue to the right are those that bind to intronic sequences. And in orange on the far right are the non-coding exon binding proteins. So we can get a good glimpse of what these proteins might be doing based on just the regions that each of them tends to interact with.
On the next slide, we can look at this from an RNA-centric point of view.
So this allows us to take an RNA of interest and examine the proteins that might bind to that RNA, and not only which ones bind to it, but where within that RNA they bind. On the bottom left, you can see this for the XIST RNA, which is a non-coding RNA involved in dosage compensation. Shown here is data for four proteins that bind to this RNA: HNRNPK, PTBP1, HNRNPM, and SRSF1. You can see that each of these proteins binds to this RNA, but each binds to a different region within it. So we can now use this type of data to build up what RNP particles might look like within cells for any given RNA of interest. The same type of data is shown on the right for the MALAT1 non-coding RNA, in which we see about five or six proteins that, again, bind to this RNA but do so in different regions. So the type of information you can get out of this is, if you have an RNA of interest, which proteins bind to your RNA and also where within it they bind.
This slide is a little complicated. It's an overview of the RNA Bind-n-Seq data, and what I'd really just like to point out from this particular slide is that by characterizing the motifs that these proteins bind to, we can also examine at the same time the types of domains that these proteins have. One of the surprising findings is that often multiple proteins can bind to the same sequence motif, yet these proteins often have different types of domains. So you may have a protein composed solely of RRM domains and another protein composed solely of KH domains, yet these two proteins will bind to the same motif sequence.
We can then compare the motifs that each of the proteins binds to in vitro, from the RNA Bind-n-Seq data on the left, to those from CLIP on the right, which is motif analysis from the CLIP peaks.
What we observe is usually a very strong correlation between the in vitro and the in vivo binding sequences that are preferred by each protein. This strongly suggests that sequence-specific binding in vivo is primarily determined by the intrinsic affinity of the protein for RNA, rather than by cofactors that may modulate the binding affinity of a given RNA binding protein.
Another type of thing that we can do is build what we call splice maps. These are maps that describe the location-dependent activity of a protein. What we do is take all of the, for instance, alternative exons that a protein both binds to and regulates, and then make a map: when the protein tends to activate splicing of that exon, where is the binding location? And if the protein represses splicing of that exon, where does the protein bind when it does so? From this, we can derive the splicing maps shown in the middle, where what we can observe is that, for instance, this protein RBFOX2, when it binds downstream of an exon, tends to activate splicing of that exon. So this is a location-dependent splicing regulator, and this can be depicted in the heat map format shown on the bottom.
In the next slide, we can compare splice maps between different proteins, as shown here. What we generally observe from these types of analyses is that hnRNP proteins, when they bind to exons, tend to repress splicing, whereas when they bind to intronic sequences, they can enhance splicing. In contrast, proteins from the SR protein family have the opposite activity: when they bind to exons, they tend to activate splicing, whereas when they bind to introns, they tend to repress splicing of those exons.
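As a rough illustration of the splice-map idea described above, here is a minimal Python sketch. It is not the ENCORE pipeline: the region names, the `splicing_map` function, and the toy events are all hypothetical, and it assumes each regulated exon has already been labeled as activated or repressed by the protein (for example, from knockdown RNA-seq) and annotated with the regions where a CLIP peak was found.

```python
from collections import defaultdict

# Hypothetical coarse regions around an alternative exon (an assumption for
# illustration; a real splice map would use per-nucleotide positions).
REGIONS = ("upstream_intron", "exon", "downstream_intron")


def splicing_map(events):
    """Summarize where a protein binds, split by its effect on the exon.

    events: iterable of (effect, bound_regions), where effect is a label such
    as "activated" or "repressed" and bound_regions lists the regions that
    contain a CLIP peak for that exon.
    Returns {effect: {region: fraction of events bound in that region}}.
    """
    counts = defaultdict(lambda: {region: 0 for region in REGIONS})
    totals = defaultdict(int)
    for effect, bound_regions in events:
        totals[effect] += 1
        # Count each region at most once per event; ignore unknown labels.
        for region in set(bound_regions):
            if region in counts[effect]:
                counts[effect][region] += 1
    return {
        effect: {region: counts[effect][region] / totals[effect] for region in REGIONS}
        for effect in totals
    }


# Toy, made-up events for a single protein.
events = [
    ("activated", ["downstream_intron"]),
    ("activated", ["downstream_intron", "exon"]),
    ("repressed", ["upstream_intron"]),
    ("repressed", ["upstream_intron", "exon"]),
]
smap = splicing_map(events)
for effect, row in smap.items():
    print(effect, row)
```

Each row of the result corresponds to one row of a heat map: a protein whose "activated" row is concentrated in the downstream intron would look like the location-dependent activator pattern described for RBFOX2.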
One of the surprising things that we observed from these types of experiments concerns where the regulatory elements are located. Traditionally, people who've studied splicing have thought of the elements that regulate splicing of a particular exon as being located very close to that exon. But one of the things that we find is a strong enrichment of binding sites for RNA binding proteins in the far upstream region of the intron flanking the alternative exon. So for the exon in the middle here, which is alternatively spliced, the most enriched location of binding sites, genome-wide, is actually just downstream of the splice site of the upstream constitutive exon. This is a region that very few people have looked into as far as being involved in the regulation of alternative splicing, but it strongly suggests that this is an area where researchers should look in the future.
And the final data slide I'd just like to show is the type of analysis that can be done going forward with this data in helping to interpret the genomic variants that are observed in patient samples. This is an example where we intersected the variants from the ExAC (Exome Aggregation Consortium) database with our CLIP peaks, and this is one location in the UTRM gene where we observe a CLIP peak for RBFOX that overlaps this particular G-to-C variant. This G-to-C variant is predicted to convert a perfect binding site for RBFOX2 into one of the worst possible binding sites for RBFOX2, suggesting that this mutation may disrupt the interaction of RBFOX with this RNA sequence. And this may be functional, because if we look at the RNA-seq data, we see that the upstream exon in control cells is almost constitutively included, whereas in RBFOX knockdown cells, this exon is predominantly skipped.
So this strongly suggests that this particular SNP observed in the ExAC database may disrupt the binding of RBFOX, which in turn may disrupt the splicing of the exon upstream of it. So by using this data, we can go beyond understanding splicing and actually start to interpret genomic variants without doing any additional experiments to figure out what those variants may be doing.
And so just to finalize, our future goals for this project are to continue this work. We are currently funded under a U41 community resource project to continue generating this data. We're affiliated members of ENCODE, and we are working to generate data for 300 additional RNA binding proteins in the two cell lines that we've been using, with the same types of experiments. We hope over the next year or two to be able to complete generating that data and get it released to the public.
And finally, I'd just like to acknowledge the people involved in this work. Again, this was performed in my lab and the labs of Gene Yeo, Eric Lécuyer, Chris Burge, and Xiang-Dong Fu. We've also collaborated a lot with Grace Xiao's lab at UCLA. I would like to thank NHGRI for funding this work over several years. And thank you very much for your attention.
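For readers who want to try the variant analysis from the final data slide on their own data, the two steps (intersecting variants with CLIP peaks, then asking whether a variant destroys a binding motif) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the coordinates, sequence, and function names are made up, peaks are assumed to be 0-based half-open intervals, and the motif used is the well-known RBFOX recognition element GCAUG written as DNA (GCATG), which is general background rather than something stated in the talk.

```python
# Core RBFOX recognition element, written as DNA (assumption: exact match
# only, no scoring of partial or degenerate sites).
MOTIF = "GCATG"


def variants_in_peaks(variants, peaks):
    """Return the variants whose position falls inside any peak.

    variants: list of (chrom, pos, ref, alt) with 0-based positions.
    peaks: list of (chrom, start, end), 0-based half-open intervals.
    """
    hits = []
    for chrom, pos, ref, alt in variants:
        for peak_chrom, start, end in peaks:
            if chrom == peak_chrom and start <= pos < end:
                hits.append((chrom, pos, ref, alt))
                break  # one overlapping peak is enough
    return hits


def disrupts_motif(window_seq, offset, ref, alt, motif=MOTIF):
    """True if the reference window contains the motif and the mutated one does not.

    window_seq: reference sequence around the variant.
    offset: index of the variant base within window_seq.
    """
    if window_seq[offset] != ref:
        raise ValueError("reference allele does not match the sequence")
    mutated = window_seq[:offset] + alt + window_seq[offset + 1:]
    return motif in window_seq and motif not in mutated


# Toy, made-up data.
peaks = [("chr1", 100, 150)]
variants = [("chr1", 120, "G", "C"), ("chr1", 300, "A", "T")]
hits = variants_in_peaks(variants, peaks)
print(hits)
print(disrupts_motif("AATGCATGTT", 3, "G", "C"))
```

For genome-scale data, an interval tree or a tool such as bedtools would replace the nested loop, and a position weight matrix would replace the exact-match motif test.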