or a heterozygous mix, or you could have little b, little b. In typical RNA-seq expression analyses, we don't care which allele of the gene is being expressed; we just count reads from either allele. But there are reasons why it's interesting to look at the expression of the two alleles separately. So here I'm introducing some language. I'm showing this gene now along the genome. Let's say upstream of the gene there is a SNP or variant, or it could even be a copy number variant, which is somehow regulatory for the gene. Sometimes that's called an rSNP, or regulatory SNP. And then inside the gene there could be another SNP in the transcribed region, sometimes called a tSNP or eSNP. It can be interesting to take into account which allele is being expressed. So if this is a regulatory SNP for this gene, and you look across individuals from homozygous reference, to heterozygous, to homozygous for the alternate allele, and you see a change in the number of transcripts, that indicates an eQTL, an expression quantitative trait locus. The pair of the SNP and the gene describes a locus, and that might tell you what that SNP is doing, so it's important for understanding non-coding genetic variation. So if you look across these different types of individuals and you see a change, that's called an eQTL. You can also see this within the middle group: within a heterozygous individual, you can count the difference between the two alleles, if you can observe it. So this is a little diagram. The y-axis is whether there is allelic imbalance. In the top two panels, there is no allelic imbalance: you can see the numbers of reads are the same, and that's because there's no genetic variation in this regulatory element, so it's the same in both cases. On the bottom, there is some allelic imbalance, because let's say this SNP, this little blue or brown line, is somehow disrupting the binding of this yellow transcription factor.
So then there's less transcription. So in the bottom two diagrams, there is allelic imbalance in the system: there's true biological allelic imbalance. And then left to right is whether there's a SNP inside the transcribed region, the tSNP, and that determines whether we can observe the allelic imbalance. So there needs to be genetic variation in the transcribed region. It doesn't have to be in the coding region; it could be in the UTR. But if there is genetic variation there, we can see it. And if there's both genetic variation in the regulatory regions and genetic variation in the transcribed region, then we can detect an imbalance. So I hope that diagram is not too complicated. I also wanted to mention that this is a context-specific property. It's not that within an individual there will always be allelic imbalance; it actually differs over time and by cell type. So this is a diagram showing that depending on which elements are accessible, their epigenetic state, and depending on which transcription factors happen to be in the nucleus, you might or might not see allelic imbalance. And we'll see this today: we'll see an example of dynamic allelic imbalance. And then another member of the Love lab, Wancen Mu, worked on a package to detect allelic imbalance across cell types in single-cell data. Let's see, I see a question from Ryan: if you see allelic imbalance, it could be explained by an eQTL, but also by something like a splicing QTL. Yes. So I'm giving a super quick intro to allelic imbalance, and we're going to focus on this regulatory-region type of imbalance, but in the text introduction of this workshop we talk about a bunch of other things that could be going on, like imprinting or splicing QTLs. So I'm just showing a cartoon, but there are all these other interesting things that you can detect, and those are covered in the text.
So why a workshop today? There are many workflows for detecting allelic imbalance, and they have lots of advantages. A very popular one is called WASP. Often allelic imbalance is considered at the SNP level or at the gene level, and that does not account for isoforms at all: it just looks to see whether there's a stack of reads at a SNP and whether those are imbalanced. There are a number of advantages to that approach, but the disadvantage we focused on is that, across the two alleles, there could be different imbalance for different promoters. That's what we're going to focus on today. So for example, here there's a SNP which is producing less expression from this promoter, of transcripts starting at that promoter, and then suppose this variant on the other allele, on the other chromosome, is producing less expression of transcripts from that other promoter. These dotted lines represent chromatin contacts, so maybe there's some complex looping going on here, but we do see these examples, and we wanted to be able to observe this isoform-level regulation as well as the gene level. So, coming up on nine minutes: what type of data do we need to run this? Oh, and let me just respond to that: Ryan asked whether these are alternative promoters for the same gene. Yes, and we see this happening in real data, so that's what we're focused on; we want to be able to detect that as well as gene-level imbalance. So we're going to assume that there are allelic reads, and we're focusing on genes where there's some heterozygosity. There needs to be a heterozygous SNP in the transcribed region of the gene, or else we just don't see the imbalance. There could be imbalance, but we'd be stuck with the total count; we can't disambiguate the reads.
And so for our pipeline, which we've worked on with a number of collaborators (I'll have a slide that acknowledges all the collaborators in a minute), you start with a FASTA for the genome and a strain-specific VCF. We've done a lot of work with model organisms and F1 crosses. I'll mention that if you want to do human allelic imbalance, we are very interested to support that, but right now our pipeline is really for the isogenic case: all the samples are from the same F1, or it could be a heterozygous human donor where you're doing something over time in a single donor. We're working on generalizing this. So you start with this FASTA and VCF and you build diploid transcript sequences using a tool called g2gtools from the Churchill lab. Then you take your RNA-seq reads and quantify allelic expression against a diploid transcriptome, which has two versions of every transcript. Here we're using Salmon to quantify expression of the two alleles, and you also generate bootstrap distributions for the allelic data. We import into R with a new function in the fishpond package called importAllelicCounts, and then we do differential analysis, which we'll see today. So I just want to pause and acknowledge the many people who have worked on this project, including the team for the real data analysis of this mouse osteoblast dataset. We've done methods development with Rob Patro's group, including Noor, who's here today, and Mohsen, and I want to acknowledge the biostatistics mentorship and community at UNC and the financial support. Okay, so I'm going to switch off from the slides and over to this pipeline, and I'm going to go fast because I went slower than I thought on the slides. Let me just zoom in real quick and then I'll hand off soon. There's a lot of text above which is redundant with what I was just talking about, and I'm jumping in here where we talk about how to import this data.
So we've quantified with Salmon against the two alleles of the organism, and we have bootstrap data that tells us about the uncertainty. Now we create transcripts from some transcript database. We assume we have all these different samples, and then we basically import the data, and there's an option for what level of analysis you want to do the allelic comparisons at. You can do it at the gene level; we've done that, and it gives you very similar results to WASP. Or you can do it at a sub-gene resolution: you can do it at the isoform level, but we found that it's preferable to do it at the transcription start site level, where you can find interesting things. The way you do that is with a function we've added that helps you group isoforms by their transcription start site, and we also have a little wiggle parameter, plus or minus 50 base pairs; we group those together because we're interested in potential imbalance that's affecting specific promoter regions. After doing this grouping of the isoforms to the TSS level, you import the data using this new function called importAllelicCounts, and it builds what we call a wide SummarizedExperiment. It's wide because it has first the samples with their reference allele and then the alternate allele, so it's double wide compared to a normal SummarizedExperiment. And then the last thing before I pass off to Euphy: I just want to very quickly describe the dataset we're looking at. It's a really cool dataset from our collaborators Cheryl and Gary, who, among other things, are mouse geneticists. Cheryl Ackert-Bicknell also studies osteoblasts, the bone cells that generate the bone density and the matrix in the bone. This is a differentiation time course where they start with precursor cells to the mature osteoblasts and then differentiate them over 18 days.
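The grouping and import steps just described can be sketched in R roughly as below. This is a hedged sketch, not the exact workshop code: the `edb` annotation object, the file paths, and the `coldata` columns are placeholders, and the argument names follow the fishpond documentation as I recall them (check `?makeTx2Tss` and `?importAllelicCounts`).

```r
library(fishpond)

# group isoforms by transcription start site, with a +/- 50 bp wiggle window
# (edb is assumed to be an EnsDb/TxDb-style annotation object loaded earlier)
t2g <- makeTx2Tss(edb, maxgap = 50)

# hypothetical sample table pointing at the Salmon quantification directories
coldata <- data.frame(
  names = paste0("day", seq(2, 18, by = 2)),
  files = file.path("quants", paste0("day", seq(2, 18, by = 2)), "quant.sf"),
  day   = seq(2, 18, by = 2)
)

# a1 = alternate allele (e.g. CAST), a2 = reference (e.g. B6);
# format = "wide" builds the double-wide SummarizedExperiment described above
gse <- importAllelicCounts(coldata, a1 = "CAST", a2 = "B6",
                           format = "wide", tx2gene = t2g)
```

The same call with a gene-level `tx2gene` table would give the gene-level object instead of the TSS-level one.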
So we have nine time points of an F1 mouse, and we can track, for the two alleles of every gene, how they change over time. We're going to release all the data very shortly; the data we have here today is a subset, just chromosome 1. Yeah, so I think I'll switch over now, and I'll just be around in the background if there are any questions. Before we switch over, does anybody have any questions for Mike? Including "what is allelic?" if you totally missed that. I know it's not a common analysis, so I'm sure there are people in the room who haven't done it before; very basic questions about allelic analysis, or why you'd do this, are totally welcome. Okay, so this workshop is on Orchestra, if anybody wants to try to log in and follow along. Thank you, Mike, for the amazing introduction. So if you want to follow along with the code, you can open up RStudio right now. In this workshop, we have pre-packaged the datasets. As Mike mentioned, it's the F1 osteoblast dataset, subset down to only chromosome 1, and one version is at the gene level and one is at the transcription start site level. First of all, we just want to explore what the dataset looks like. Here we first load up the gene-level dataset as a SummarizedExperiment, and we can see that in the assays we have counts, abundance, length, and then infReps 1 through 30: those are the bootstrap replicates from Salmon that capture the inferential uncertainty. The row names are the gene names, because we're looking at the gene level. We can also look at the metadata. There are three pieces of information: the alleles, where a2 is the reference, that would be B6 (black 6), and a1 is the alternate, either 129 or CAST, so we have two different crosses here; and then the day column captures the different time points.
As Mike mentioned, we have nine time points, from day 2 to day 18, every other day. So, okay, let me just go down here and show you the assay names as I just described: the read counts; the abundance, calculated as TPM; and the length, which is the transcript length averaged over all the transcripts within the given gene. And then we have 30 different bootstrap replicates. We think 30 is a good number because it doesn't take too long to run in Salmon, and it also does a pretty good job of capturing the uncertainty. I had a question here: why do we need bootstrap replicates? And I just said that they're used to capture inferential uncertainty. Essentially, as you're probably familiar, when you're doing the read mapping, a read could go to multiple places: different alleles, different transcripts, different genes. And especially when we're doing allele-level, isoform-level analysis, there are more options for where a read can go. This uncertainty about where exactly the read maps will decrease the statistical power downstream. So hopefully, when we have 30 different bootstraps, they can capture the variance of this uncertainty, whereas if you just have one number you might land on an extreme end of the spectrum. To visualize the bootstrap replicates, one option is to use the function getTrace and plot a histogram. Here we're plotting the first gene, in the CAST × B6 sample, day 2, for the reference allele. You can see that the histogram is not very wide; it's fairly concentrated. And if you look at the mean and the variance of these counts, you can see the mean and the variance are pretty close. I can show you a not-very-close example, one sec. Okay. And here you can see that the estimated counts go from 100 to 350, instead of just 80 to 120.
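The bootstrap inspection just described might look roughly like this in R. This is a sketch under assumptions: `gse` is the imported SummarizedExperiment, and the `getTrace` arguments (`idx` for the feature, `samp_idx` for the sample) follow the fishpond documentation as I recall it, so check `?getTrace` before relying on them.

```r
# pull the bootstrap (inferential replicate) trace for one feature in one
# sample, then compare its mean and variance
df <- getTrace(gse, idx = 1, samp_idx = 1)  # first gene, first sample
hist(df$count, breaks = 20,
     main = "bootstrap counts for one feature/sample")

mean(df$count)  # when mean and variance are close (Poisson-like),
var(df$count)   # the mapping uncertainty for this feature is low
```

The same two lines at the end are the mean/variance comparison used in the workshop to distinguish the low-uncertainty and high-uncertainty examples.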
And if you look at the mean, you can see the mean and variance are not similar at all. Since the bootstrap roughly follows a Poisson distribution when uncertainty is low, when the mean and variance are similar we consider the uncertainty to be pretty low. So this is an example of high uncertainty, and the previous example was one of low uncertainty. And another way of... go ahead. Can I interrupt? Because that's a great example; I hadn't seen that one yet, but that's fantastic, because 129 is another mouse strain, and it's very similar to B6, so there's more uncertainty because we can't tell the alleles apart. Yeah, so that's fantastic. Yeah, thank you for that information; I actually didn't know that. Okay. So another way of visualizing the bootstrap distribution is using plotInfReps. This is actually my favorite function, FYI, because I'm going to show a lot of these plots later on. The way to show the bootstrap replicates is to specify x = "allele"; essentially you're telling the function that you want to group by alleles. The left side will be the reference allele and the right side will be the alternate allele. If you see a bar that's huge, it means that sample has really big uncertainty from the mapping, and the small bars are low uncertainty. So this is an example of just grouping by alleles. Since we have multiple time points, if you want a better visualization of how your samples behave over time within each allele category, you can additionally specify cov = "day". Here it's the same, or similar, information but categorized differently: the first column will be the alternate allele and reference allele at day 2 for this CAST × B6 cross, and the second column will be day 4. In this particular way of plotting, we're viewing day as a categorical variable. In a scenario where you want to view it on a continuous scale, you need to compute the infRV first.
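The two plotInfReps calls just described can be sketched as below. A hedged sketch: `gse` and the feature index are placeholders, and the argument names (`x`, `cov`, `legend`) are as I recall them from the fishpond documentation, so check `?plotInfReps`.

```r
# bootstrap point-and-interval plot, grouped by allele
# (reference allele on the left, alternate on the right)
plotInfReps(gse, idx = 1, x = "allele", legend = TRUE)

# same information, but the samples within each allele are further
# categorized by day (day treated as a categorical covariate here)
plotInfReps(gse, idx = 1, x = "allele", cov = "day", legend = TRUE)
```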
Computing the infRV essentially just computes the mean and variance of the bootstraps, and then you can plot the same way using the plotInfReps function. And I highly recommend the shiftX argument: it shifts your points for the two alleles horizontally by a small amount, so when you have two values at the same time point that are exactly equal, they won't overlap with each other, and you can still see both of them on the same plot. That's a little trick I would recommend to any researcher. Thank you. So, after exploring what your data look like, we move on to filtering out the uninformative features. In a scenario where the two alleles have the exact same sequence, you wouldn't be able to identify where the reads go, so the bootstraps will have exactly the same counts for those alleles. Those features give us no information, and we want to get them out of our dataset. The way to do that is to look at one bootstrap; here we're looking at the first bootstrap. This one will be the counts of the first bootstrap for the alternate allele, and this one for the reference allele, and we're asking: are those counts the same for all the samples? If they are the same, we filter them out. Essentially, we're keeping all the features that don't have exactly the same counts on the two alleles. And then we can see that we're filtering out about 50 genes. You might also want to check for systematic bias in your dataset. Here we're plotting a raw allelic log fold change against the raw total counts. The idea is that if you don't have systematic bias, the log fold changes are supposed to be centered at zero; if you do have bias, they will not be centered at zero. Another way of viewing it is to plot your allelic ratio, and here we're looking at whether the ratio is centered at 0.5.
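The uninformative-feature filter just described can be sketched in R as below. This is an assumption-laden sketch: it assumes the wide object `gse` has an `allele` column in its colData with levels "a1"/"a2" and an assay named "infRep1", which matches the workshop's description but should be checked against the actual object.

```r
library(SummarizedExperiment)

# first bootstrap replicate, split by allele
rep1a1 <- assay(gse, "infRep1")[, gse$allele == "a1"]  # alternate
rep1a2 <- assay(gse, "infRep1")[, gse$allele == "a2"]  # reference
n <- ncol(gse) / 2  # number of biological samples per allele

# keep features whose allelic counts differ in at least one sample;
# identical counts everywhere means the alleles are indistinguishable
someInfo <- rowSums(rep1a1 == rep1a2) < n
gse <- gse[someInfo, ]
```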
And the question for you is: what would we see in this histogram if one of the parental genomes or genotypes was poorly annotated? Well, the reads would not be able to map properly to the poorly annotated genome, so in that scenario we would see a shift in this raw ratio histogram: the red line, the mean, would not be around 0.5; it would be shifted. So that's a really good way to quickly check whether your dataset is as you expect. After we check for bias and filter out the uninformative features, we move on to testing. Here we are using swish, which was developed and is maintained by Dr. Mike Love and Anqi; I think I saw Anqi in the room as well. Swish uses Wilcoxon signed-rank tests to compare the alleles, and... I think I'm ahead of myself. Okay, first of all, let me start with this: we're switching to the TSS-level data instead of the gene-level data, because there are a few additional plots that we found particularly useful at the TSS level rather than the gene level. So we're just going to do the filtering as we did before. And why do we think TSS-level AI will be different from gene-level AI? As Mike mentioned before, in some scenarios allelic imbalance will be masked at the gene level. For example, if a gene has two different transcripts, and one is more abundant from the alternate allele while the other is more abundant from the reference, then at the gene level you will not be able to detect allelic imbalance, even though it actually exists for each transcript. Okay, so we move on to running swish; let me just get it started, because it takes a few seconds. Traditionally, running swish has three steps: first you scale the inferential replicates, then you filter out the rows with insufficient counts, and then you calculate the statistics, like the log fold change and the test statistics, et cetera.
Here we're skipping the scaling step: since we're comparing alleles, we're comparing within the same sample, and the factor you would use to correct for sequencing depth cancels out. So we don't actually need to scale when testing allelic imbalance. And here we're using labelKeep to flag all the features with a minimum count of 10 in at least three samples; that's another filtering step. Then we compute the infRV as described above. We will be performing two types of allelic imbalance testing. The first is global AI: consistent allelic imbalance across all samples. Essentially, for global AI you're pooling all the time points together and testing whether allelic imbalance exists overall. The other is dynamic AI: testing for nonzero correlation between the allelic log fold change and a continuous variable. There you're testing whether the correlation between the allelic fold change and, here, time is significant: whether the allelic ratio itself is changing over time. Okay, so here we're performing the global test. You only need to specify x = "allele", which is what you're trying to test, and pair = "day": essentially you're telling the swish function that you want to compare alleles within the same day, so it doesn't get mixed up. And here is the call used to test dynamic AI: you additionally specify cov = "day", which tells the method that you want to test the correlation of the allelic fold change against this variable, day. For the correlation we're using Spearman, but you can also use Pearson if you want. And that is the reason why we specify day twice in the dynamic analysis. We can take a quick look at the results, at how many TSS groups are significant at a 5% FDR threshold, and we can see there are a lot more global AI than dynamic AI features.
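Putting the filtering and the two tests together, the code might look roughly like this. A sketch under assumptions: `gse` is the imported TSS-level object, and the `swish` arguments shown (`x`, `pair`, `cov`, `cor`) follow what the speakers describe plus the fishpond documentation as I recall it, so verify against `?swish`.

```r
library(fishpond)
library(SummarizedExperiment)

# note: no scaleInfReps here -- depth scaling cancels out when
# comparing the two alleles within the same sample

# flag features with a count of at least 10 in at least 3 samples
gse <- labelKeep(gse, minCount = 10, minN = 3)
gse <- gse[mcols(gse)$keep, ]
gse <- computeInfRV(gse)  # per-feature bootstrap mean/variance summary

# global AI: paired comparison of the two alleles within the same day
global <- swish(gse, x = "allele", pair = "day")

# dynamic AI: correlation of the allelic fold change with day
# (day appears twice: once for pairing, once as the covariate)
dynamic <- swish(gse, x = "allele", pair = "day",
                 cov = "day", cor = "spearman")
```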
My take on this is that for dynamic AI you're testing the correlation, so you're adding an additional layer: not only must the allelic imbalance exist, it needs to change with time. That's more criteria than the global AI testing alone. And I want to show you really quickly what the results look like. Oops, I'm not really used to this. Okay, so you have a lot of information in this result. The transcript ID column, since we're testing TSS groups, tells you which transcripts are grouped in this group, and then which gene it belongs to, the transcription start site, the symbol, and then you have all the test statistics, like the mean infRV, log2 fold change, p-value, and q-value. Okay, and then after we get the test results, the most exciting step is to plot your results and see how your data is doing. We start by looking at MA plots. This is the global MA plot; it's a very similar idea to the plot we used to check for systematic bias: you're plotting the log2 fold change against the log10 mean. Here the log2 fold change is better estimated than the previous raw one, because here swish actually factors in the 30 bootstrap replicates, so the inferential uncertainty goes into the log2 fold change calculated here. In this plot, the blue dots are the significant global AI features, and you can see that features with larger log2 fold changes are usually the ones flagged as significant. We can also view the same thing for the dynamic AI, which agrees with the result we just saw, that there are a lot fewer dynamic AI than global AI features. And the reason that the extreme log2 fold changes here are not flagged as significant is, as I mentioned, that dynamic AI is actually testing the significance of a correlation.
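The MA plots just described can be drawn with fishpond's MA-plot helper. A hedged sketch: I'm assuming the `plotMASwish` function name and its `alpha` argument from the fishpond documentation as I recall it, and `global`/`dynamic` are the objects returned by the two swish calls.

```r
# MA plot of the global AI test: log2 fold change vs log10 mean,
# coloring features significant at 5% FDR
plotMASwish(global, alpha = 0.05)

# same plot for the dynamic AI test; fewer features are flagged,
# since here significance reflects the correlation with day
plotMASwish(dynamic, alpha = 0.05)
```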
Oh, and here we're using a 5% FDR threshold, but you can specify the threshold as you want. Okay, so to visualize the global and dynamic AI results, we recommend downloading an Ensembl database from AnnotationHub. I pre-downloaded it if you're using this RStudio, and when you run this AnnotationHub step it's going to ask you to create a directory; just say yes and it will download. So we're introducing this new function called plotAllelicGene in fishpond. What it does is help you visualize the test results for one feature, including the gene model. Maybe I can move this one; it takes a little longer too. Okay, here, oh, there we go. Looking at this plot, we have the gene we're trying to examine; we put in the gene ID, but you can also use the symbol if you want. And we also need the database: here we're using Ensembl, but you can also use a TxDb. So this is the gene model that we extracted from the Ensembl database. And I want to point out, because the resolution is not great, that the direction of the transcripts is from right to left. We have six different transcripts here, and transcripts 1, 3, 4, 5, and 6 are grouped together because their start sites are similar or the same, while transcript 2 is its own group. The first bar is the negative log10 q-value, and we can see that both groups are actually pretty high, pretty significant. The second bar is the log2 fold change, and we can look at the log2 fold change together with the allelic proportion. If we look at the first transcript group, we can see that CAST is more abundant than B6, so we see a positive log2 fold change here. In the second transcript group, B6 is a lot more abundant than CAST, and there we see a negative log2 fold change. And the last bar is the isoform proportion, which is also calculated from TPM.
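A sketch of that gene-model plot, with everything hypothetical except the function name: the Ensembl gene ID below is a placeholder, `edb` stands for the EnsDb retrieved from AnnotationHub, and the `gene`/`symbol`/`db` argument names are as I recall them from the fishpond documentation (`?plotAllelicGene`).

```r
# visualize one gene: model from the annotation database on top, then the
# -log10 q-value, log2 fold change, allelic proportion, and isoform
# proportion tracks per TSS group
my_gene <- "ENSMUSG00000000001"  # placeholder Ensembl gene ID
plotAllelicGene(gse, gene = my_gene, db = edb)

# the gene symbol can be used instead of the Ensembl ID, e.g.:
# plotAllelicGene(gse, symbol = "MyGeneSymbol", db = edb)
```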
And then, my guess is, since we have five different transcripts in this first transcript group, that group is more abundant over here. We can verify what we see in plotAllelicGene with plotInfReps. One thing is that plotInfReps only takes an index, so we have to grab the index from the SummarizedExperiment, and then we rename the labels so it shows B6 and CAST instead of just a1 and a2. So here, the samples along the x-axis are the different days of the CAST cross, and it's showing this transcript group here. For the solo transcript group, where in plotAllelicGene we had CAST more abundant than B6, in the plotInfReps plot you can still see that CAST is more abundant than B6, which agrees with what we saw in the previous plot. Another new plot that we're introducing today is plotAllelicHeatmap. This function calls pheatmap under the hood. As with the plotInfReps plot, you have to grab the index from the SummarizedExperiment, and the only additional work is that you have to grab the q-values from the object after running swish. Then we customize the labels here to better visualize the day. When there's more red, it means the alternate allele is more abundant, and when it's blue, the reference is more abundant. And the green bar over here: the greener, the more significant the q-value. So that is for the global AI. The same set of functions can also be used to visualize dynamic AI, and I personally like plotInfReps best for dynamic AI, because you get this really neat trend-with-time plot. So what you do is you first call plotInfReps, and again I recommend shifting the points a little bit.
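The heatmap call might be sketched as below. Heavy assumptions here: the `qvalue` column in the row metadata, the `idx` argument, and the pass-through of pheatmap arguments are all as I recall from the fishpond documentation (`?plotAllelicHeatmap`), and `global` is the object returned by the global swish call.

```r
library(SummarizedExperiment)

# pick some significant features and pull their q-values from the
# row metadata added by swish
idx <- which(mcols(global)$qvalue < 0.05)[1:20]

# allelic-ratio heatmap (red: alternate more abundant, blue: reference);
# extra arguments should pass through to pheatmap
plotAllelicHeatmap(global, idx = idx)
```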
And then you essentially line them up, which gives you a better visualization of the trend. Here you can see that B6 is changing over time but CAST is not, so the allelic ratio is definitely changing over time for this gene and this transcript group. Within the same gene, let's look at the other transcript group. Here, both CAST and B6 are increasing, but you can see that from day 6 to day 8, CAST is increasing much more quickly than B6, which also indicates a change in the allelic ratio with time. And here's the question: what do you notice about the two plotInfReps plots? Hint: is the allelic imbalance in the same direction? It's kind of hard to fit them in, but no: B6 is much more abundant in the first one, and CAST is much more abundant in the second one. plotAllelicGene can also be used to visualize time-series allelic imbalance data, but the difference for dynamic AI versus global AI is that here we need to bin the time points together, because with nine different time points we would not be able to fit them in one plot; there's just not enough room. So here we binned day 2 to day 6, day 8 to day 12, and day 14 to day 18 together. The strategy for binning the time points could either depend on your biological motivation, or you can actually look at the plotInfReps plot and bin to maximize the difference, as in this scenario. So let's run this. Okay, here we actually have just two different transcript groups, but interestingly they're going in different directions in the allelic ratio. You can see, for, let's just call it the first one, that B6 is increasing over time but CAST is decreasing, while for the other transcript group CAST is increasing over time and B6 is decreasing, which also agrees with the plotInfReps plot we were seeing.
The good thing about the plotInfReps plot is that if there are not many transcript groups in a gene, you can plot them in the same plot, which is a really good visualization for comparing two transcripts. However, some genes have multiple transcript groups; if you have four or five, it's going to be really hard to fit them all in one plot with plotInfReps. In that scenario plotAllelicGene might be a better idea, but I actually think the heatmap is the most suitable for a case like that, if you want to visualize your results across multiple samples and multiple transcripts. The difference between global AI and dynamic AI in terms of the heatmap is that we're adding this time bar over here. The way to add the time bar is quite easy: you essentially just build a data frame that extracts the time variable from your results. We also customized the label here to say "time", but if you have multiple sample annotations, you can customize it to show different things, and then it will be a lot easier to visualize the heatmap, especially if you have multiple transcripts. Here we only have two, but if you have many, it will be much easier to visualize with the heatmap. So that's it for the code demo itself. A few takeaways, or rather clarifications: first of all, we found that with SEESAW it's very hard to fit all the information in one plot, because we have many layers of information: there are isoforms, alleles, and different time points, and there are many measurements: there's uncertainty and there are test statistics. So we recommend you explore different plots, because we don't think any one plot can capture all the information.
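The time-bar version of the heatmap might be sketched as below. This is an assumption-heavy sketch: it assumes the heatmap columns correspond to the samples' column names in `gse`, that an `annotation_col` data frame passes through to pheatmap, and that `dynamic` is the object returned by the dynamic swish call; none of these names come verbatim from the workshop code.

```r
library(SummarizedExperiment)

# build the annotation data frame for the time bar, one row per
# reference-allele column, with row names matching the heatmap columns
ref_cols <- gse$allele == "a2"
anno_df <- data.frame(time = gse$day[ref_cols])
rownames(anno_df) <- colnames(gse)[ref_cols]

# dynamic-AI heatmap with the time annotation bar added on top
idx <- which(mcols(dynamic)$qvalue < 0.05)
plotAllelicHeatmap(dynamic, idx = idx, annotation_col = anno_df)
```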
We also recommend you look at the aligned reads; we looked at a few in IGV, and you can also view them in other genome browsers. That's a good way to examine the distribution of the reads. Also, as Mike mentioned, here we're primarily motivated by examining transcript groups that have allelic imbalance because of cis-regulatory elements, but the same concept can be applied to group the transcripts based on your own biological motivations. With that, I'd like to thank my amazing advisor Dr. Mike Love, our collaborators on the osteoblast dataset and on Salmon, and the financial support. I'm happy to take questions too. Thank you. Anybody in the room have questions? Okay, if you haven't been logged in, there's a really long, good conversation going on in the online chat; I'll see if I can copy it and put it somewhere, though there aren't really any unresolved questions in there. Thank you, very nice talk. Can you go back to the very first box plot figure that you showed, the one you said you really liked? Yeah. So I was wondering: it looks like there are sort of two groups of samples, ones that are really variable and then ones that have a pretty tight distribution. Do you know what's going on there? Just give it back to her, she's got an answer. Okay, Mike, correct me if I'm wrong. I actually think that corresponds to what Mike was just saying, that we have two different crosses: 129 × B6, where 129 and B6 are more closely related, and CAST × B6, which are more distinguishable. So when you see the high variability, I think we're looking at 129 × B6, because the sequences of the alleles are less likely to be distinguished, so the bootstrap replicates capture more uncertainty; whereas when you have very different sequences, as with CAST and B6, it's a lot easier, so there's less uncertainty there. Is that correct? Okay, I think it's correct.
Thank you for the question, though.

So on the very last heatmap that you presented, I think I just missed what was the significance of adding that time bar at the beginning? Because I know the previous heatmap also tracked changes over the days.

I actually think that here, since we only have the one variable, which is time, it's somewhat redundant. But if you have multiple sample annotations that capture different information, you can not only label by days, you can also add the sample information over here next to the time bar. So that's two different layers of information in there, but here, since we only have one, it's kind of repetitive. Yeah.

Yeah, thanks to both of you for this workshop. There's a lot that I think flew a little bit over my head, because I don't work in this area, but what I'm trying to understand in my mind is: you have these two groups of samples because of the two alleles. How does this compare against, let's say, computing the mean and variance and just dividing the two? Now you have a single curve for a particular isoform, and you could use maybe some of the methods we saw earlier for capturing trends, and find those isoforms where there's a significant trend, or maybe a significant trend for the difference between the two groups of samples. How would that compare against what you're doing?

I'm actually not sure if I understand it correctly. Are you referring to the dynamic AI? Or Mike, do you have any input?

I guess one thing that's unique about this analysis, versus trends in gene expression, is that the covariance structure across isoforms and alleles is really complicated. And so we're doing a nonparametric test to avoid any kind of parametric assumption on that, but there is all kinds of uncertainty across isoforms and alleles.
And it's hard to compress it to a summary statistic and just use a spline or something. So yeah, if you can collapse it to a level where there's no more uncertainty, then you could throw away the uncertainty, use a point estimate, and go do a spline with that. But at the level shown on the screen now, there's just so much uncertainty, and the distributions are not well behaved: they spike at zero and then have a long tail, they're very nasty. And so we don't make any parametric assumptions on that.

I have a question. This is Ryan. Could you use a similar statistical model to analyze the ratio between spliced and unspliced transcripts? I haven't thought about that. Obviously the null hypothesis would not be 0.5, but other than that.

That would be more complicated. I don't want to just say, yeah, use it; I think it's more complicated. The nice thing about the alleles is that there's just a single SNP difference, whereas the difference between an unspliced and a spliced transcript, and the counts you get, has all kinds of biases. There are a couple of papers right now about this; there's one from Albert Kuo with Kasper and Stephanie on bioRxiv right now about all the biases that affect the unspliced versus spliced counts. Here, we assume that there's no real bias difference between the allelic counts, because there's only a single-letter change. Yep, thanks.

Okay, I guess we'll do one last one from online, from, I'm not sure how it's pronounced, Thady or Tati. For example, on day one, gene X is expressed at 10, three from A1 and seven from A2, but on day 10, gene X is expressed at 20, 10 from A1 and 10 from A2. How could you quantify the contribution to gene expression changes from changes in A1 and A2?
So, for example, on day one gene X is expressed at 10, three from A1 and seven from A2, and on day 10 it's expressed at 20, 10 from A1 and 10 from A2, and how could you quantify the contribution of each allele to the expression change? I think at each time point, for example on day one, when you have this allelic imbalance, three from A1 and seven from A2, you have an allelic ratio at day one, and then on day 10 the allelic imbalance is gone. That change can, first of all, be captured using plotInfReps, and then if you use the dynamic AI testing, it should be able to capture the change you are referring to, if that answers your question.

All right, with that, let's give another thanks for a good presentation by Euphy and Mike. We spilled a little bit over, but we now have a break until 3:30, so see you all at the next one.
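The dynamic AI testing mentioned in that last answer can be sketched roughly as below, using fishpond's `swish`. This is my reading of the SEESAW-style workflow, not the exact demo code; check the argument names against the fishpond documentation, and `y` here stands in for a SummarizedExperiment of allelic counts with `allele`, `sample`, and `day` variables in its `colData()`:

```r
# Sketch: nonparametric test for allelic imbalance that changes over time.
# `y` is an assumed SummarizedExperiment of allelic quantifications with
# inferential replicates; variable names below are illustrative.
library(fishpond)

set.seed(1)                     # swish uses permutations, so fix the seed
y <- labelKeep(y)               # filter out low-count features

# dynamic AI: does the allelic ratio (A2 vs A1, paired within sample)
# correlate with the day variable?
y <- swish(y, x = "allele", pair = "sample",
           cov = "day", cor = "pearson")

# features with the strongest evidence of dynamic imbalance
head(mcols(y)[order(mcols(y)$qvalue), c("stat", "qvalue")])
```

The paired design is what lets the test cancel sample-level effects and focus on the within-sample allelic ratio, which is the quantity the questioner's day-1 versus day-10 example is about.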