Thank you, Brent, and thanks to the organizing committee for the invitation to speak. I'm going to focus mainly on genetic variants instead of RNA editing today, but we are working on how these RNA binding proteins may affect RNA editing, and that is for another day. Okay. So, Brent gave a really good overview of splicing and gene regulation, so I'll be very brief. And since today's talk is on genetic variation, I'll mention the different pathways through which genetic variation may impact gene regulation. This is from a review paper written by Jonathan Pritchard and his co-authors. Shown here are SNPs, and they're influencing different pathways of gene regulation. So obviously, SNPs can impact epigenetic modifications or transcriptional initiation, and we have heard a lot of exciting talks in the past two days about these aspects. In addition to epigenetic and transcriptional regulation, there is increasing appreciation of post-transcriptional regulation and how genetic variation may influence it, such as pre-mRNA processing, including alternative splicing and post-transcriptional processing, and mRNA stability, or RNA export, or translation. So there are different aspects of post-transcriptional processes that may be influenced.
I will be very brief on splicing, and I think this was touched upon by the questions in Brent's talk as well. Both DNA elements and RNA elements may influence splicing, and there is crosstalk between epigenetic regulators and splicing regulators. The most obvious pathway that affects splicing, and the most dominant, is perhaps the RNA binding proteins: they act on the regulatory elements that are encoded in our RNA, and they also influence other aspects of RNA processing in addition to splicing. So regardless of the mechanism, what we are interested in first is identifying which splicing events may be influenced by genetic variation, and after that we can deal with the question of mechanism.
So in our lab we first took an allele-specific approach. In ENCODE and other big projects we have accumulated a large amount of RNA-seq data. So using the RNA-seq data alone, can we examine the genetic variants within the RNA-seq reads? This is just a hypothetical example to show the underlying principle of our approach. Suppose you have an RNA binding protein or splicing factor that has a very specific sequence preference. If you have a C allele at a SNP, then the factor binds very well, but if you have a G allele, then the binding is abolished. And suppose this splicing factor enhances splicing. So if you have a C allele in the exon, it is recognized by the splicing factor, and you may splice by including this middle exon in yellow. But in the other case, if you have a G allele, then binding does not occur, and without this splicing enhancer you may not have splicing of the middle exon, so you have skipping of this exon. So what we normally observe in poly-A-selected RNA-seq is that the reads contain predominantly the C allele in this case, because the G allele induced exon skipping, and you cannot observe that allele in the RNA-seq reads. Okay, so given this kind of rationale, we developed an approach, published in 2012, to identify these exonic SNPs that may affect splicing.
And what about the intronic SNPs? Using regular poly-A-selected RNA, we actually miss most of the intronic content in the RNA-seq reads. To go after these intronic splicing regulators, we made use of the ENCODE cell fractionation RNA-seq data that was generated in Tom Gingeras's group a few years ago. So for a number of ENCODE cell lines, we have four different types of RNA-seq data. They are separated into nuclear RNA or cytoplasmic RNA, and they can be poly-A-minus, which means without poly-A tails, or polyadenylated RNA.
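The exonic rationale above (one allele dominating the reads because the other allele causes exon skipping) can be illustrated with a simple test for allelic imbalance in read counts. This is a minimal sketch, assuming an exact two-sided binomial test against an expected 50/50 allele ratio at a heterozygous SNP; the talk does not specify the statistic used in the published method, and the function names, read counts, and thresholds below are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))

def allelic_bias(ref_reads, alt_reads, alpha=0.05, min_reads=10):
    """Test whether reads covering a heterozygous SNP deviate from 50/50.
    Returns None when coverage is too low to test (assumed threshold)."""
    n = ref_reads + alt_reads
    if n < min_reads:
        return None
    p_value = binomial_two_sided_p(ref_reads, n)
    return {
        "ref_fraction": ref_reads / n,
        "p_value": p_value,
        "biased": p_value < alpha,
    }

# Hypothetical counts: the C allele dominates the exonic reads.
print(allelic_bias(38, 4))
# Hypothetical counts: both alleles are about equally represented.
print(allelic_bias(21, 19))
```

In practice, many SNPs are tested at once, so the per-SNP p-values would need multiple-testing control, and mapping bias toward the reference allele would have to be considered before interpreting any imbalance.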
So we have these data, and we developed a new approach to deal with the intronic regulators. First of all, we wanted to examine the data sets. After mapping these reads to the genome and transcriptome, we found that, as we expected, the nuclear poly-A-minus RNA is enriched with intronic content, so a lot of reads map to the introns, and then the intronic reads decrease in the other fractions. We were very happy to see this, because it means that we can make use of these nuclear poly-A-minus RNA-seq data to examine the intronic content of these RNAs.
Our new method is called iGMAS. It's identification of the intronic tag SNPs for genetically modulated alternative splicing. And this is the rationale for this approach. Suppose now we are focusing on an intronic SNP, which has an A or G allele here, and this intronic SNP is a regulator. So that's our assumption, a regulator of splicing. If you have the A allele, then you have exon inclusion. If you have the G allele, you have exon skipping. And just ignore the exonic SNP for now. If we look at nuclear poly-A-minus RNA, then some of these reads come from the spliced-out products. These are the splicing intermediates that will eventually be degraded, but at the snapshot where we took the RNA, we may have captured these kinds of RNA molecules. So we look at paired-end reads; in this case, all of the ENCODE RNA-seq data in these cell lines are paired-end. If we look at the pairs, and we have reads covering the intronic SNP and reads that cover the exonic regions, then we can examine the allelic content of the intronic SNP that we are interested in. In this case, if you are examining spliced-out products, then in this example the A allele will not be associated with pairs of reads that cover the intron and exon, but the G allele will. So you will observe a significant allelic bias by comparing A and G. Of course, there are some subtleties here.
We have nascent RNA that can also be captured by this protocol, and these molecules are not spliced-out products. So we developed a Bayesian approach to estimate the fraction of the nascent RNA, which I don't have time to go into in detail. And because we also have cytoplasmic poly-A-plus RNA, we can examine these mature RNA molecules as well. If there was an exonic SNP to start with, then we can observe this SNP in the exon in the mature RNA, which serves as a validation of this approach.
So we applied this method to a number of ENCODE cell lines listed here. In total, we applied both the intronic and the exonic methods to identify both types of SNPs, and we identified 622 SNPs that can be predicted as regulators of alternative splicing. We did some randomization to estimate the FDR, which is about 3%. And we did experimental validation. Here we used splicing reporter assays in HeLa cells, and our validation rate is 80%. If we use an additional cell line, because the splicing factors may be cell type specific, then our validation rate can be as high as 90%.
With these events, we wanted to understand a little more about the splicing regulation. First of all, we examined whether the different cell types share common events. If you have a SNP in two different cell lines, does the same SNP induce the same type of splicing regulation? And the answer is yes: the events overlap across cell lines much more significantly than you would expect by chance, indicating that the genetic variants that alter splicing may be quite ubiquitous across cell lines. We also examined evolutionary conservation of these events. As you can see here, we separated coding events and non-coding events and examined their controls respectively. And if you look at both coding, here, and non-coding, you can see that these GMS exons are less conserved than the controls, and it's statistically significant.
This indicates that these regions may be evolving faster than the controls. And if we look at the sequence identity between human and other genomes along evolution, then you can see that the divergence occurred before primates appeared in the evolutionary tree. So the GMS exons have accelerated sequence evolution in primate lineages specifically.
OK, we also examined the overlap of GMS SNPs with GWAS SNPs. There are a lot of GWAS SNPs for which we do not have an indication of functional involvement. And we have over 100 GMS SNPs that are in LD with GWAS SNPs associated with different traits. For GWAS SNPs and GMS SNPs that are located in the same gene, which accounts for 66% of these GMS SNPs, a lot of them are in the introns. So a lot of the GWAS SNPs that we could explain through the GMS pathway were located in the introns. Similarly, for GWAS SNPs that are in different genes from the GMS SNPs, a majority are also in the introns. So through splicing regulation we could explain a lot of intronic GWAS SNPs that were not known to have functional implications.
OK, so now given these hundreds of GMS events, what are the mechanisms mediating the SNP regulation of GMS, or alternative splicing? As we discussed in the background slide, the most dominant mechanism or pathway is RNA binding proteins that regulate splicing. We used motif search to predict which splicing factors may be affected by the GMS SNPs, and we identified this list of splicing factors. I want to highlight the first one, SRSF1, which affects the largest number of GMS SNPs. And luckily, we had the ENCODE RNA-seq data set generated by Brent's group, where we had SRSF1 knockdown. We also had eCLIP-seq data generated by Gene Yeo's group for this protein. So we could validate whether our predictions were correct.
So basically, for these exons that we predicted to be under regulation by this protein, the majority showed a splicing change when we knocked down the protein. So that confirms the prediction. Similarly, the majority of them had eCLIP-seq binding sites. Also, using the eCLIP-seq data, we could actually analyze the SNPs within the binding sites, and we saw that most of these eCLIP peaks are associated with allele-specific binding. So there is an allelic bias in the binding sites themselves. These serve as very nice validations. And we also did experimental validation. In this case, it's an in vitro gel-shift assay to show the allele-specific binding of this protein given a G allele or A allele in one of the target exons or introns. And we did a bunch of these gel-shift assays.
But what I want to draw your attention to is this pictogram, where we see the motif of SRSF1. We asked, where do the SNPs that influence splicing most often fall in the motif? And interestingly, we found that the majority of them actually target this nucleotide in the motif, and only a small fraction, about 30% of these SNPs, target the strongest consensus nucleotide in the motif. So this is interestingly consistent with what Mike Snyder reported yesterday for transcription factors, that the SNPs oftentimes do not target the strongest consensus nucleotide, which I will come to later in this talk.
So we have all these splicing factors predicted, and we are using the other ENCODE RBP knockdown and eCLIP data to make global predictions, which I will not have time to talk about today. But what I will talk about is the conservation patterns of these splicing factors. For the predicted splicing events, the GMS events, I said that they are evolving faster, so their conservation level is lower than that of the controls. But for the splicing factors, it's the opposite. These splicing factors tend to be conserved at higher levels than controls, that is, other splicing factors that are not predicted as regulators of GMS events.
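The notion of consensus strength at individual motif positions, used above for the SRSF1 motif, can be made concrete with per-position information content. This is a minimal sketch, assuming the standard definition of information content with a uniform background over four nucleotides; the frequency matrix below is a made-up four-position example, not the real SRSF1 motif, and the helper names are hypothetical.

```python
from math import log2

def position_information(freqs):
    """Information content (in bits) of one motif position, assuming a
    uniform background: 2 + sum over nucleotides of f * log2(f)."""
    return 2.0 + sum(f * log2(f) for f in freqs.values() if f > 0)

# Hypothetical 4-position motif (illustrative frequencies only).
motif = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25},  # no preference
    {"A": 0.5, "C": 0.0, "G": 0.5, "U": 0.0},      # two-fold degenerate
    {"A": 0.0, "C": 0.0, "G": 1.0, "U": 0.0},      # invariant position
    {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},      # moderate preference
]

info = [position_information(pos) for pos in motif]
strongest = max(range(len(info)), key=info.__getitem__)

def relative_strength(snp_position):
    """Information content at the SNP's motif position, expressed as a
    fraction of the strongest position in the motif."""
    return info[snp_position] / info[strongest]

for i, bits in enumerate(info):
    print(f"position {i}: {bits:.3f} bits")
print("strongest position:", strongest)
```

With per-position values like these, one can ask, for each SNP inside a predicted binding site, whether it falls on the strongest position or on a position of intermediate strength.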
And if we look at the consensus motifs of the splicing factors, just as we did for SRSF1, then, as expected, on average these splicing factors have strong consensus. Because if a single nucleotide change can alter the binding of a protein, then that means the protein is very likely to have a strong consensus, a strong binding affinity to certain motifs. So that's what we found: on average, they have higher information content in their motifs. But if we look at single nucleotides, just as we did for the SRSF1 protein, then we found that the GMS SNPs oftentimes fall into the nucleotides that have higher consensus strength than you would expect by chance, but oftentimes lower than the nucleotide with the maximum consensus strength. That is consistent with what is shown here for SRSF1: the nucleotide most often targeted by the SNPs is not the strongest one. And interestingly, if we go to the GMS SNPs that are in linkage disequilibrium with GWAS SNPs, then these GMS SNPs tend to have higher consensus strengths, indicating that these likely functionally relevant SNPs are disrupting the stronger nucleotides in the motifs.
So in summary, we used ENCODE RNA-seq data and identified more than 600 alternative splicing events that may be regulated by SNPs or mutations. We found that oftentimes these events are regulated by splicing factors through allele-specific binding. These exons demonstrated accelerated evolution, but they are regulated by highly conserved proteins with strong sequence consensus. And the ENCODE RBP knockdown and eCLIP data have been essential for inferring the regulatory roles of the proteins in GMS. With that, I'd like to thank the people in my lab who did this work, both the computational work and the experimental work.
And I specifically thank the data production groups of ENCODE: Brent Graveley's group for producing the RBP knockdown RNA-seq data, Gene Yeo's group for generating the eCLIP data, and Tom Gingeras's group for generating the fractionation RNA-seq data. And thank NHGRI for funding. And thank you for your attention. I'd like to take questions.
Hi. How far into the intron from the exon boundaries did you go to find the SNPs?
Oh, that's a great question. We used about 300 nucleotides only. And also, because we require the paired-end reads to cover the intron and the exon, by default, because of the RNA-seq library generation, they are quite close. So we are restricted to intronic regions that are close to exons.
And did you find a difference between the ones that were closer to the exon boundaries and the ones farther away?
Yes, in general, there is a trend where if the SNP falls into the splice site regions very close to the exon, it tends to have a much stronger impact on splicing. And then there is a distribution of strengths negatively correlated with the distance.
Hi, beautiful talk. I want to ask you a more general question, more related to your opinion. In imprinted regions, there are in some cases very long RNA transcripts, right, where they don't even know where the promoter is. Have you looked at, or do you know, whether these SNP differences between alleles or splicing variants can be involved in the expression of different transcripts from different alleles in the imprinted regions?
Yeah, great question. So for the imprinted genes, if we do allele-specific expression analysis, we often see them to be very biased. And unfortunately, for splicing, we are looking at a local region, a small region around the exon or within the exon, so we don't deal with the imprinted regions in this case. But they are in the RNA-seq data and they're very obvious.
They oftentimes are the strongest allele-specific expression targets. But for splicing purposes, we didn't consider them.
Hi, Grace. So your conservation and GWAS analyses suggest that there are many of these SNPs that aren't strongly deleterious or don't have a big functional effect. Have you thought about next steps for trying to figure out some of those more subtle effects that they might have, for example, computationally by looking at allele frequencies, or functionally through experiments?
Yeah, that's a great question. So we were trying to follow up in different directions for that. We looked at the population-wide allele frequencies and asked whether these SNPs tend to be positively selected. It turns out that a lot of these SNPs are located in positively selected genes, and their allele frequency distribution across different populations in the 1000 Genomes Project tends to show evidence of positive selection. Other than that, we are more interested in the regulatory mechanisms after we identify them, using the RBP data sets within the ENCODE data. So that's another direction that we were going after for the follow-up work, yeah.
It is very nice work. So in your study, there's another factor, SRSF3. Have you observed it too?
Oh, I have to go back to the list. Right now, we don't have it. So we are trying to analyze more data sets. Here, we analyzed only a limited set of seven or eight data sets, cell lines. So we are trying to expand to many more cell types with different genetic backgrounds, and that may enrich our analysis.
Thank you.
OK. Thank you very much. Dr. Graveley had to run to catch a flight. So I'll be closing the session, but I'd like to thank our speakers for-