by Andrew Jaffe. And we show a couple of things here. So this is a DEqual plot, which I'll explain more on the next slide. But the idea is that the y-axis is the t-statistic from degradation, so from the time-series data across the degradation experiment. And the x-axis here is the t-statistic from differential expression, in this case schizophrenia versus control. What you see here is that with a naive model, in panel A, we're heavily confounded: a lot of the transcripts that are highly associated with degradation are also highly associated with schizophrenia. This is a problem because that shouldn't happen. Controlling for RIN, the RNA integrity number, which is a standard in the field, fails to remove this effect. Also controlling for common clinical variables, such as age, sex, and race, still leaves a 0.18 correlation. Using our qSVA method reduces it down to 0.042. So adding the principal components from those transcripts to the model seemingly removes that effect. And so how do you know if you want to... oh, wait, wrong slide. Expanded data. So in 2017, we just used the DLPFC. A major piece of feedback we got was that anytime anyone used our brain data or our transcripts in another region, it just removed all of the signal. Nothing came back positive. So people weren't really using qSVA. And so we've expanded it to six regions, including the hippocampus, amygdala, medial prefrontal cortex, and sACC. Hopefully that makes it more broadly applicable across brain regions. Now, I haven't gotten far enough to test this in GTEx data yet, but that would be the gold standard. So how do you know if your data needs qSVs? The simple answer is that I made an R package with a DEqual function. We're calling this the differential expression quality plot. Essentially, if it looks like this, where there's a strong correlation, you should use qSVs. So this function takes the output of topTable.
And feeds the t-statistics from your differential expression into a plot against the t-statistics from my degradation experiment. If those two are correlated, as they are here, your data is confounded and you should explore using qSVA. This plot shows that the feedback we got from collaborators about results not reproducing across brain regions was fair, because, as you can see, that association isn't constant across brain regions. The y-axis here is again the degradation t-statistic, and the x-axis is case-control differential expression. If you just compare the caudate and the DLPFC, those degradation signals are not really that similar, so someone studying the caudate couldn't have used our earlier data to regress out those signals. Additionally, what I worked to do is make this computationally more accessible, also known as an R package, and move from expressed regions to transcripts, because expressed regions are really hard to work with. So this is our updated pipeline. I've talked about the expanded brain regions. In addition to the main effect model, which is just degradation over time controlling for region, we added an interaction model. There we're asking: is there a transcript that degrades in a specific region? As you can see, this one might not have been significant overall, but because it degrades so much in the DLPFC, we decided to include it in our model. Hopefully this makes our model generalize much better to region-specific signals. We moved to transcripts, which is nice because, again, they're easier to generate, with far less computational time, and you can store them in a RangedSummarizedExperiment, which is just easier in the Bioconductor world. We have that expanded data: we went from five to 10 individuals and added four brain regions. I talked about this. And then conceptually, this is what the preprint will be about.
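The DEqual check described above boils down to correlating two vectors of t-statistics matched by transcript, and the qSVA adjustment amounts to adding principal components of degradation-associated transcripts to the model as covariates. The Python below is a minimal sketch of both ideas under stated assumptions: expression is a transcripts × samples matrix, the t-statistics are plain ordinary-least-squares ones rather than limma's moderated t, and the demo data are simulated. All function names (`dequal_correlation`, `quality_svs`, `ols_t_stats`) are hypothetical and are not the API of the R package from the talk.

```python
import numpy as np


def dequal_correlation(de_t, degradation_t):
    """Pearson correlation between differential-expression t-statistics and
    degradation t-statistics, over transcript IDs present in both dicts."""
    shared = sorted(set(de_t) & set(degradation_t))
    if len(shared) < 3:
        raise ValueError("need at least 3 shared transcripts")
    x = np.array([de_t[k] for k in shared], dtype=float)
    y = np.array([degradation_t[k] for k in shared], dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def quality_svs(expr, k):
    """Top-k principal-component scores (one row per sample) of a
    transcripts x samples matrix restricted to degradation-associated transcripts."""
    centered = expr - expr.mean(axis=1, keepdims=True)  # center each transcript
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)  # samples x transcripts
    return u[:, :k] * s[:k]


def ols_t_stats(expr, design, coef):
    """Ordinary least-squares t-statistic for column `coef` of `design`,
    fit separately to every row (transcript) of `expr`. Not limma's moderated t."""
    n, p = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = expr @ design @ xtx_inv  # transcripts x p coefficient matrix
    resid = expr - beta @ design.T
    sigma2 = (resid ** 2).sum(axis=1) / (n - p)
    return beta[:, coef] / np.sqrt(sigma2 * xtx_inv[coef, coef])


if __name__ == "__main__":
    # Simulated (hypothetical) data: cases are more degraded than controls,
    # and each transcript has its own sensitivity to degradation.
    rng = np.random.default_rng(0)
    n_samples, n_tx = 40, 200
    dx = np.repeat([0.0, 1.0], n_samples // 2)
    degradation = rng.normal(size=n_samples) + dx
    sensitivity = rng.normal(size=n_tx)
    expr = np.outer(sensitivity, degradation) + rng.normal(scale=0.5, size=(n_tx, n_samples))
    ids = [f"tx{i}" for i in range(n_tx)]
    # Stand-in for t-statistics from a separate degradation experiment.
    degradation_t = dict(zip(ids, sensitivity))

    naive = np.column_stack([np.ones(n_samples), dx])
    naive_t = dict(zip(ids, ols_t_stats(expr, naive, coef=1)))

    top = np.argsort(-np.abs(sensitivity))[:50]  # most degradation-sensitive transcripts
    qsvs = quality_svs(expr[top], k=1)
    adjusted = np.column_stack([naive, qsvs])
    adjusted_t = dict(zip(ids, ols_t_stats(expr, adjusted, coef=1)))

    print("naive r:", dequal_correlation(naive_t, degradation_t))
    print("qSV-adjusted r:", dequal_correlation(adjusted_t, degradation_t))
```

In this toy setup the naive case-control t-statistics track the degradation sensitivities closely, and adding the single quality surrogate variable to the design should weaken that correlation, mirroring the kind of drop described for the real data.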
And really, this is what any of you would use. So this is the R package. These are the names of the identified transcripts that are embedded in the R package, and you would just use those. This symbolizes your data here. You would pull your expression matrix for those transcripts, generate your principal components, or qSVs, here, and plug them into your limma model. So that's really what the R package serves to do. And this is the qSVA function. It takes a RangedSummarizedExperiment of transcripts. There are a couple of models that I don't have time to explain here, but I want to get to some results. So this is all of our BrainSeq data across different experiments. By color here, you can see which models are still heavily associated with degradation, and these are our three qSVA models that effectively remove that. Additionally, we performed a replication analysis, and what you see here is that what we call our expanded and standard qSVA models perform better than the previous models and other competing models. Unfortunately, we actually included cell components from deconvolution in one, and up until this plot it looked like it was going to be the best model. But it appears that removing that cell effect also removes biological signal, and that one just doesn't perform as well. So I'm not sure what I'm going to do with that. But yes, I was really happy with the turnaround on the replication here. I'd like to thank Leo, Ran Tao, Andrew Jaffe, Thomas Hyde, Juhan, and Louise for helping me out with this, and the entire Lieber Institute.

Thanks for a wonderful talk. Does anybody in the room have questions? And Sean, if they do, can you supply them with the mic? I'll go check for online questions.

Great talk. Hi. So I guess the effects are going to be transcript-specific and tissue-specific, right? Are you planning on expanding this model outside of brain tissue, for people studying cardiovascular diseases or something like that?
So the 2017 paper had some preliminary analysis from PBMCs. But at the Lieber Institute, I probably don't foresee them doing anything but brain. You could use DEqual, though, as a justification for generating your own degradation matrix, and then use the pipeline we've built to find those transcripts as well.

Second question here: would it be possible for a user to just generate their own data? I mean, isn't that actually preferred, since then it matches their cell type and body site? So you're saying even if you had brain data, why not just generate your own degradation matrix? Yeah, because it seems to me like degradation data, if I understood correctly, is as simple as leaving a sample out and collecting different time points. Mm-hmm. So it is expensive to do that. And we've also shown that it replicates fairly well across datasets. If we look, this middle one is the CommonMind Consortium, and it still performed really well, even though that wasn't done at our institute. So I'm not sure that would be necessary, but you could. Some people might not be using one of our six brain regions, in which case you might want to.

So I think we should probably move to your lab mate. If there are further questions, please feel free to ask them online. This goes by another name, the Fed Exaspect Effect. This has been a problem for a long time, and it's fairly thrilling to see somebody take a principled approach to addressing it. So let me rack up your slides.