Okay. I'm Stefano, a postdoc at Melbourne University in the lab of Tony Papenfuss. Today I will talk to you about robust differential composition and variability analysis for multi-sample single-cell omics. We have a preprint; if you search for this title on Google, you'll find it.

We know that in biology, compositional analyses are quite important. For example, we might want to understand whether the abundance of a T cell subtype in a cancer tissue is depleted compared to healthy samples. With single-cell technologies, we can now observe tissue composition quite directly. We can gather information about DNA, protein, and RNA. We are all familiar with this 2D representation, in which we can cluster cells based on their similarity. The compositional data itself is quite simple: we count the number of cells in each cluster for each sample. On the x-axis we have 10 samples here. You can see that for some samples we have collected a lot of cells, so we have more information, and for others not so much. If we factor out these total cell counts, we can observe proportions that sum to one. We can appreciate some heterogeneity, and we might want to look for patterns of differential abundance between two categories.

The rationale for developing our method, which is called sccomp, is that I started to understand that this type of data has five main properties that the available methods do not model jointly. They compromise on some or others. First, these data are observed as counts. Second, these counts reflect underlying proportions, latent variables that are weakly negatively correlated because the data are compositional and have to sum to one by definition. Third, the variability of the proportions is specific to each cell type; this is something that is emerging. Fourth, there is an association between abundance and variability; this is something that we describe in our study.
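
To make the step from counts to proportions concrete, here is a minimal Python sketch. The count matrix and the helper name `to_proportions` are made up for illustration; they are not from the talk or from sccomp.

```python
import numpy as np

def to_proportions(counts):
    """Divide each sample's cell-type counts by that sample's total cell count."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # total cells per sample
    return counts / totals

# Hypothetical counts: rows are samples, columns are cell types (clusters).
counts = np.array([
    [120,  30,  50],   # 200 cells in total
    [900, 300, 300],   # 1500 cells in total
    [ 40,   5,   5],   # 50 cells in total
])
proportions = to_proportions(counts)
```

Each row of `proportions` sums to one. The first two samples have the same proportion of the first cell type (0.6), but the second is based on many more cells, which is the information that is lost when only proportions are kept.
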
Fifth, we observed that the data might include outliers, and that these might be prevalent. That's why we developed this method, which models all these properties jointly. I won't have time to go into the statistical model, but basically we use a sum-constrained beta-binomial distribution. It is unique in the sense that it can jointly model counts, compositionality, and group-specific variability, while also allowing for missing data. So it's quite convenient if we want to remove outliers when doing our estimation.

This is a busy schematic, but I'll guide you through it briefly. It is in our preprint. It basically describes the core model, here in the center. We start with data that include cell-type identity. Our method provides a layer of outlier identification and filtering, and then we have our estimates. We can do quite a few things with our model. We can do hypothesis testing, of course, on differential abundance, but also on differential variability. As I mentioned, we also model the relationship between abundance and variability. We use a Bayesian model, so this is all jointly modeled. This relationship is modeled hierarchically, so we can transfer information across cell types. We can also inform this association by transferring knowledge from other reference data sets, in case we have a data set with little data. And we can do other things that you can read more about in the preprint.

So let's start with this mean-variability association. The idea is that we represent proportions on a linear scale, for example by taking the logit of the proportion. Here we have three cell types. For abundant cell types, such as T cells, we observe a relatively higher variability compared to rarer cell types. This is the same plot as before, taken from a real data set. Each point is a cell type, so we have roughly 30 cell types here. And these are the estimates.
So we see the credible intervals there. If we plot the logit mean against the log variability, we see a quite striking linear association. We can use this to shrink the estimates. For example, for rarer cell types, where we don't have much data, we can estimate the variability more accurately by knowing the mean and transferring information from all cell types. One thing we have done in our study is to take 18 data sets across technologies; you can see single-cell, CyTOF, and microbiome. We modeled them initially being naive about this association, so the association was not in the model. Nonetheless, it emerged from pretty much all data sets, so it's quite ubiquitous, also across technologies. And we show in the preprint that when we embed this association in the model, we get shrinkage of the estimates, as was our goal.

Another aspect is outlier identification. Last year we developed a model for outlier identification for count distributions in RNA sequencing data, and this method is transferable to our distribution. Again, this is a representation of a real data set. We have proportions on the y-axis, we facet by cell type, and each dot is a sample, so we have roughly 20 samples here. We are comparing categories. This is very small, but I will just point out that, for example, in cluster seven we have a decrease in abundance of this cell type, except for some of the samples. Our method takes the linear model and all the uncertainty and probabilistically identifies these outliers, which we can explore further and which might even be interesting samples. But we can decide to drop them from the estimation and so have a more robust estimate. We have done analyses following the designs of seven different studies, and the bottom line is that we identify outliers ubiquitously across these data sets.
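
The idea of flagging outliers probabilistically can be sketched in Python with a simulated predictive interval. This is a simplified stand-in, not the sccomp procedure: it assumes the mean proportion and the concentration of a beta-binomial are already known, whereas the method described here estimates them together with their uncertainty. The function names and all numbers are hypothetical.

```python
import numpy as np

def beta_binomial_interval(total, mean, concentration, rng, draws=4000, level=0.95):
    """Simulate beta-binomial counts and return a central predictive interval."""
    a = mean * concentration            # beta shape parameters from mean
    b = (1.0 - mean) * concentration    # and concentration (assumed known)
    p = rng.beta(a, b, size=draws)      # latent proportions
    simulated = rng.binomial(total, p)  # counts given the sample's total
    tail = (1.0 - level) / 2.0
    low, high = np.quantile(simulated, [tail, 1.0 - tail])
    return low, high

def flag_outliers(counts, totals, mean, concentration, seed=0):
    """Flag counts that fall outside their simulated predictive interval."""
    rng = np.random.default_rng(seed)
    flags = []
    for y, n in zip(counts, totals):
        low, high = beta_binomial_interval(n, mean, concentration, rng)
        flags.append(bool(y < low or y > high))
    return flags

# Toy data for one cell type: the last sample is far above the ~10% of the rest.
totals = [1000, 1200, 800, 1500, 900]
counts = [95, 130, 70, 160, 600]
flags = flag_outliers(counts, totals, mean=0.10, concentration=50.0)
```

In this toy example only the last sample is flagged. Whether a flagged sample is then dropped or explored further is a separate decision, as described above.
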
This bar plot represents the number of outlier data points we identified, in this case 20. Some of them, in red, were present in cell types that, once we excluded the outliers, turned out to be differentially abundant. So they could have caused false negatives here. These dots represent how many cell types each study had. Here we have a study with roughly 50 cell types, and roughly one third of them include one or more outliers. So the bottom line is that outliers are present and should not be ignored; they should be taken into account.

Another aspect of our method is that we can do hypothesis testing not only on differential abundance but also on differential variability. This is important because in biology an increased variability of a variable is often an indicator of loss of homeostasis. Think about blood pressure. Younger and older individuals might have a similar average blood pressure, but among older individuals we find both hypertension and hypotension. So a test on differential variability alone can highlight this loss of homeostasis, in this case of the circulation. You can imagine this is important for cellular and molecular biology as well.

I'll show you an analysis done on a data set and guide you through these plots. This is a simple linear model where we compare healthy individuals with COVID-affected individuals. Here is the intercept, and here is the effect. Again, we are plotting the logit mean against the log variability. This is the baseline, and again we can see the striking association. Now, because mean and variability are associated, their effects will also be associated. As you can see here, if we don't take this association into account, every time we call a cell type differentially abundant, we also call it differentially variable. So we would have learned basically nothing about the variability.
But when we regress out this baseline association, some variability effects start to show up that are, this time, independent of the effect on abundance. Here I show you just three cases. For example, NK and CD4 naive cells decrease in abundance but also increase in variability. This might be important because, let's say we want to develop an immunotherapy that targets some of these cells; it is very important to know whether our cohort has a consistent presence of these cells or, as in this case, some individuals have very abundant NK cells and others do not.

That's it. Just to mention that this method, called sccomp, is available on GitHub and Bioconductor and accepts all sorts of data structures. It's pretty easy to use in a linear-model fashion, where you declare your linear model for composition and variability. I have to thank the Victorian Cancer Agency, my laboratory, and many collaborators. And I'll take this last opportunity to invite you to tomorrow's workshop on tidy transcriptomics, since we are here. Thanks a lot.
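
As a note on regressing out the baseline mean-variability association mentioned in the talk, here is a minimal Python sketch. It uses an ordinary least-squares line on point estimates, which is only an approximation of the idea; the talk describes a Bayesian model in which this is done jointly. The function names and the toy numbers (including the slope and intercept) are arbitrary and hypothetical.

```python
import numpy as np

def logit(p):
    """Map proportions in (0, 1) to the real line."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def residual_variability(mean_proportion, variability):
    """Fit log variability against logit mean and return what the line leaves unexplained."""
    x = logit(mean_proportion)
    y = np.log(np.asarray(variability, dtype=float))
    slope, intercept = np.polyfit(x, y, deg=1)  # straight-line fit
    residuals = y - (slope * x + intercept)     # variability not explained by the mean
    return slope, intercept, residuals

# Toy estimates for six cell types lying on an arbitrary line ...
mean_proportion = np.array([0.30, 0.20, 0.10, 0.05, 0.02, 0.01])
log_variability = 0.5 * logit(mean_proportion) - 1.0
# ... except the third cell type, which is made more variable than its mean predicts.
log_variability[2] += 1.0

slope, intercept, residuals = residual_variability(mean_proportion, np.exp(log_variability))
```

The third cell type has the largest residual: its variability is high beyond what its abundance would predict, which is the kind of signal that remains once the association is accounted for.
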