Hi, I'm Marco Varone, and I'm a PhD student in the lab of Professor Giovanni Ciriello at the University of Lausanne. Today I'm happy to introduce you to CellCharter, a framework to study the spatial organization of tissues from spatial omics data. I will show you how it works and how it was the basis for an interesting discovery in lung cancer. I'll also show something for the first time: a somewhat unconventional use of CellCharter to detect artifacts in immunofluorescence data.
Tissue architecture has been studied intensely with techniques like single-cell sequencing, where every individual cell in a sample is dissociated and then sequenced. However, this process loses the information about where the cells were originally located in the tissue. In recent years, spatial omics techniques have emerged as a new tool to study tissue architecture. They capture molecular information (RNA for spatial transcriptomics, proteins for spatial proteomics) while preserving how the cells were arranged in the tissue. So, in addition to the usual cell-by-gene matrix, we also have the coordinates of every cell in the dataset. These data are having huge success in many fields of biology, such as developmental biology, neuroscience, and the study of diseases like cancer.
This spatial information complements cell clustering, which identifies cell types and states, with spatial clustering, which groups cells based on the composition of cell types and states in their neighborhood. The resulting spatial domains represent cellular niches where specific cell types tend to co-localize. This pushed us to develop CellCharter, a framework to identify, characterize, and compare spatial domains. The workflow of our spatial clustering method is the following. We start from multiple slides of spatial omics data, where every point is a cell, or a spot containing multiple cells.
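Here is a generic illustration of the idea behind spatial clustering: describing each cell by the cell-type composition of its neighborhood. This is a minimal sketch, not CellCharter's own procedure. It assumes that each cell already has a cell-type label and that "neighborhood" means all other cells within a fixed radius. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def radius_neighbors(coords, radius):
    """For each cell, list the indices of other cells within `radius` (brute force, O(n^2))."""
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= radius:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def neighborhood_composition(coords, cell_types, radius):
    """Fraction of each cell type among a cell's spatial neighbors (the cell itself excluded)."""
    types = sorted(set(cell_types))
    profiles = []
    for nbrs in radius_neighbors(coords, radius):
        counts = Counter(cell_types[j] for j in nbrs)
        total = len(nbrs)
        # Isolated cells (no neighbors in range) get an all-zero profile.
        profiles.append([counts[t] / total if total else 0.0 for t in types])
    return types, profiles


# Toy example: two small groups of cells far apart from each other.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
cell_types = ["tumor", "tumor", "T cell", "neutrophil", "tumor"]
types, profiles = neighborhood_composition(coords, cell_types, radius=1.5)
```

Clustering these composition profiles, rather than the cells' own labels, groups cells that sit in similar niches.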
For every cell or spot, we have molecular information, such as RNA, protein, or even multi-omics data. We first perform classical dimensionality reduction and batch effect removal. Then we use the spatial information to perform what we call the neighborhood aggregation step, where we incorporate the molecular information of each cell's neighbors into its own features. This neighborhood aggregation is fairly simple, and that is what makes it extremely scalable. To do it, we encode the cells of the spatial omics experiment as a network, with cells connected if they are in spatial proximity. Here I use the term cell, but of course this also applies to spots, for technologies that don't have single-cell resolution.
Then, for every cell, we do the following. We take the feature vector of the cell. Then we move to its first-hop neighbors, the neighbors at distance one, and take their features. We aggregate them in some way, for example by taking the mean, and concatenate the result with the previous vector. Then we move to the two-hop neighbors, take the mean of their features, and concatenate it again. We repeat this until we reach a certain distance L, which is usually small, around three or four hops. The final features of the cell are the concatenation of these vectors. We repeat this for every cell in the dataset and obtain a matrix of cells by this new set of features, which also includes information on the neighbors of every cell. Then we can apply standard clustering to this matrix, using a Gaussian mixture model, to obtain our spatial domains.
In my PhD, we're interested in studying intratumor heterogeneity. So can we use CellCharter to characterize it? Intratumor heterogeneity can manifest itself in two ways. The first is heterogeneity of the tumor cells, meaning that subpopulations can arise from selected mutations or by adapting to different environments. The second is heterogeneity of the tumor microenvironment.
This means that, within the same tumor, cancer cells can be surrounded by different cellular niches, and this may influence how they evolve. It's important to note that these two types of heterogeneity are not independent, because different phenotypes of tumor cells can induce changes in the microenvironment, and vice versa.
We analyzed a spatial transcriptomics dataset of lung cancer patients generated with the NanoString CosMx technology, and among all the domains identified by CellCharter, some really piqued our interest. Two domains were almost exclusively composed of tumor cells: cluster 0, in pink, and cluster 12, in brown. We can see this in the cell type enrichment plot at the top right, where the size of a dot is proportional to the enrichment of a certain cell type in a certain domain. These two tumor subpopulations are molecularly different. The pink domain shows a response to hypoxia, which is the lack of oxygen, while the brown one is more proliferative. What is interesting is that the two domains are in contact with distinct spatial domains. The pink domain is in contact with the green domain, which is enriched in neutrophils and NK cells. The brown one is in contact with the light blue domain, which is enriched in CD4 memory T cells and other tumor cells.
This wasn't the only patient in which we saw this association between the hypoxic tumor niche and the neutrophil niche. We also found it in two other large-scale spatial transcriptomics and proteomics datasets of lung cancer. So this analysis showed that, with CellCharter, we were able to characterize the two types of intratumor heterogeneity that I described before. We saw two subpopulations of tumor cells with different transcriptional states, which are also surrounded by different microenvironments. We showed all these results in a publication that came out last December.
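The neighborhood aggregation step described earlier in the talk can be sketched as follows. This is a minimal pure-Python sketch, not CellCharter's actual implementation or API. It makes several assumptions:
- The input `features` are per-cell vectors that have already gone through dimensionality reduction and batch correction.
- The spatial network is given as an adjacency list.
- Hop layer k contains the cells exactly k hops away.
- Empty layers contribute a zero vector.

All function names are hypothetical.

```python
from collections import deque


def hop_layers(adjacency, source, max_hops):
    """Group nodes by shortest-path (hop) distance from `source`, for distances 0..max_hops."""
    dist = {source: 0}
    layers = [[] for _ in range(max_hops + 1)]
    layers[0].append(source)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the last layer
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                layers[dist[nbr]].append(nbr)
                queue.append(nbr)
    return layers


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def aggregate_neighborhood(features, adjacency, n_layers=3, agg=mean_vector):
    """For each cell, concatenate its own features with the aggregated features
    of its 1-hop, 2-hop, ..., n_layers-hop neighbors."""
    dim = len(features[0])
    out = []
    for cell in range(len(features)):
        layers = hop_layers(adjacency, cell, n_layers)
        row = list(features[cell])  # layer 0: the cell itself
        for layer in layers[1:]:
            row.extend(agg([features[j] for j in layer], dim))
        out.append(row)
    return out


# Toy example: four cells in a chain 0-1-2-3 with one feature each.
features = [[0.0], [1.0], [2.0], [3.0]]
adjacency = [[1], [0, 2], [1, 3], [2]]
X = aggregate_neighborhood(features, adjacency, n_layers=2)
```

Each row of `X` has `dim * (n_layers + 1)` entries. A matrix like this could then be clustered with a Gaussian mixture model, for example scikit-learn's `GaussianMixture`, to obtain spatial domains. The aggregation function is a parameter, so statistics other than the mean can be plugged in.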
For the purpose of this video, I also wanted to show an example from early in the project that didn't end up in the publication. It shows how we used CellCharter in a slightly different way from what we originally designed it for. I was analyzing some immunofluorescence data. This is, for example, a spleen, where almost all cells are positive for the marker CD45. Sometimes an immunofluorescence experiment produces artifacts where the antibody is not washed away properly. This leads to a false-positive signal that cell segmentation algorithms can mistake for real cells. One possible solution would be a better segmentation algorithm. An alternative is to find a way to identify and filter out these false-positive cells.
If you look at the image closely, you can see that areas of true tissue show considerable variability and heterogeneity in the expression of the CD45 marker, while in the artifact area the signal is very smooth and constant. So when we construct the spatial features of a cell in the true tissue, its neighbors will have heterogeneous values for that marker. For a cell in an artifact, all the neighbors have very similar values. As I mentioned before, we are not limited to the mean to aggregate the neighbors' values; for example, we can use the variance. The neighbors of a cell in the true tissue will have a moderate variance in their values. The variance for the neighbors of an artifact cell will be near zero, because they have very similar values. To show this on our sample, we can plot the mean neighbor variance for each cell. We can clearly see which cells are artifacts, because they're darker, with lower variance. On the right side, we see the distribution of these variances, which is roughly bimodal.
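The variance-based aggregation, together with a threshold between the two modes of the resulting distribution, can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used in the talk. It assumes that "mean neighbor variance" means each marker's variance over a cell and its direct neighbors, averaged across markers. With a single marker such as CD45, this is just that marker's variance. The talk sets the threshold by looking at the distribution. Here Otsu's method stands in as one possible automatic choice. Function names and toy data are hypothetical.

```python
from statistics import pvariance


def mean_neighbor_variance(markers, adjacency):
    """Per cell: population variance of each marker over the cell and its direct
    neighbors, averaged across markers."""
    scores = []
    for cell, nbrs in enumerate(adjacency):
        group = [cell, *nbrs]  # assumption: include the cell itself so the group is never empty
        n_markers = len(markers[cell])
        per_marker = [pvariance([markers[j][m] for j in group]) for m in range(n_markers)]
        scores.append(sum(per_marker) / n_markers)
    return scores


def otsu_threshold(values):
    """Otsu's method on a 1-D sample: the midpoint between consecutive distinct values
    that maximizes the between-class variance. Returns None if all values are equal."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    best_t, best_score, left_sum = None, -1.0, 0.0
    for i in range(1, n):
        left_sum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue
        mu0, mu1 = left_sum / i, (total - left_sum) / (n - i)
        score = (i / n) * ((n - i) / n) * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, (xs[i - 1] + xs[i]) / 2
    return best_t


def flag_artifacts(markers, adjacency):
    """Flag cells whose neighborhood variance falls below the Otsu threshold."""
    scores = mean_neighbor_variance(markers, adjacency)
    t = otsu_threshold(scores)
    if t is None:
        return scores, t, [False] * len(scores)
    return scores, t, [s < t for s in scores]


# Toy data: cells 0-3 mimic heterogeneous tissue, cells 4-7 a smooth artifact.
tissue, artifact = [0, 1, 2, 3], [4, 5, 6, 7]
adjacency = [[j for j in grp if j != i] for grp in (tissue, artifact) for i in grp]
markers = [[1.0], [5.0], [2.0], [6.0], [3.0], [3.0], [3.0], [3.0]]
scores, threshold, is_artifact = flag_artifacts(markers, adjacency)
```

In the toy data, the tissue cells get a variance of 4.25 and the artifact cells get 0.0, so the threshold falls between the two modes.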
We can easily set a threshold to separate the two modes, and we can see that it perfectly identifies the artifact, which we can then filter out. This is something we didn't show in the paper, but it demonstrates the flexibility of choosing the aggregation function that best suits the type of signal you want to capture.
To conclude, I gave you a brief sneak peek into CellCharter, a framework we developed to study cellular niches from spatial omics data. I showed that, in the context of lung cancer, tumors undergoing a response to hypoxia seem to be associated with the presence of neutrophils. However, we still don't know which causes which, and it's something we would like to explore in the future. You'll find more on the computational and biological side in the paper, so if you're interested, please give it a look and let me know what you think. Finally, I showed a use case demonstrating that the simplicity and flexibility of CellCharter allow it to be used for purposes different from the original one. If you want to know more, check out the paper and the repository. Of course, if you have any questions, feel free to contact me; I'll be happy to hear from you.