So what we will be looking at, as I said, is some of the analysis that you can do after cell type identification. The possible tools that you could use are trajectory analysis, so the ordering of the cells, metacells, cell-cell communication, or more in-depth methods like deep neural networks. As of today there are 166 single-cell analysis tools, and we will not be looking at all of them. First, you need an understanding of what trajectory analysis is about and when it is useful. In terms of gene expression, some of the dynamics that you see in your cells could be attributed to, for instance, the cell cycle. The dynamics could also be attributed to cell differentiation: you would have some cells that are in an earlier state, some more stem-like, and some that are very differentiated. You could also have cells that respond more or less to a stimulus, so you would like to be able to separate those that respond from those that do not. Trajectory inference orders the set of cells, or the set of clusters, along a path, which is sometimes called a trajectory and sometimes called a lineage. What you are actually doing is creating a time axis that you have inferred on your cells, which is often called a pseudotime axis. That is why it is also called pseudotime analysis. You can then project each of the cells onto that time axis and therefore know which cells are closer to the beginning of the axis and which cells are closer to the end. That is a starting point for further analysis, for example differential gene expression analysis to understand which genes are very high at the beginning of the trajectory and which genes are very high at the end. This can then lead you to more hypotheses and a better understanding of the cells and the data sets you have.
So the big question that I would like you to ask yourself is: should you run a trajectory analysis? Does it make sense for my data? I always make the analogy of my son, who now draws very well: he could draw a trajectory through your cells, but this does not mean that it is meaningful or really relevant. Be aware that you can always force cells to go along a trajectory, so they can always be forced along a pseudotime axis, but this does not mean that this pseudotime axis reveals something biologically meaningful. Therefore I would love you to first ask yourself the following questions, and if you answer one of them with "definitely", then it might make sense for you to look at trajectory analysis. This is not an exhaustive list, so there might be other questions that you could ask, but one of these will probably be most relevant for your setting. First of all, are you sure that you have some sort of developmental trajectory in your cells? Do you have cells that are in a more developed or more differentiated stage? You could also have some intermediate states, and if you have intermediate states, then it makes sense to run the trajectory, because you definitely know that these cells should be somewhere in the middle of the trajectory. This lets you form an opinion about whether the inferred trajectory is good. Do you believe that there is any branching in your trajectory? This is an important question for deciding which trajectory tool you will use. If you answer it with yes or with "I don't know", you may need a different trajectory tool than if the answer is no. Do you have a time scale on your cells? By time scale, I mean that some people actually measure cells at different time points.
You actually know that they should be part of a trajectory, and if you measured them at 24 hours, 48 hours, etc., you can put them along a time scale and check whether it fits the trajectory that you have inferred on your cells. Last question: do you have a starting state or an end state? I guess some of you might answer yes because you have stem-cell-like cells. You know these are the cells that can differentiate into other cells, so you have a starting state, and then it does make sense to do a trajectory analysis. Or you know that you have a clear end state, for instance cells that are highly differentiated, and then you know that you can, or should, try to infer a trajectory on your data. Be aware, as I said, that any data set can be forced into a trajectory without it having any biological meaning. It might not make any sense to know that B cells come before T cells or before epithelial cells in your trajectory. Maybe that is not biologically meaningful. But within B cells, knowing which cells are more differentiated than others might make sense. As you can see in this small example, there are settings where trajectory analysis is meaningful and settings where it is more difficult to understand the biological meaning behind the trajectory that you have inferred. There are many different trajectory analysis tools, and they rely on different math behind the scenes. Some of them generate graphs that look like this, so cycles. For instance, if you are looking at the cell cycle, then inferring a trajectory like this, where you can come back to the start, might actually make sense. Some tools are linear, so you only have a starting state and an end state, and that is it.
Some have a bifurcation, where at some point a cell can differentiate into several cell types, or a multifurcation, or a tree-like structure, or a connected graph, where you do have some cycles but also other structures. And some produce disconnected graphs. As you can see here, there are very different ways of generating a trajectory, so you need to be aware of which category your trajectory tool falls into, and from that know which conclusions you might be able to draw. There is an example application that I would like to go through with you, so that you can see how a trajectory method actually works. It is from a paper called "Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells". What you need to know is that they sequenced cells at different stages of development, from zygote to blastocyst. Since these are developmental stages, it really makes sense to try to order the cells according to how developed they are. Finding a trajectory here makes biological sense, and it could be of high interest to order your cells and then assess which genes change along that trajectory. Note that ordering cells according to PC1, for instance, is already a trajectory analysis method. It is linear, you only go from one end to the other, but it is a trajectory analysis. Here you can see the PCA plot. As I said, you go from zygote to blastocyst, and here you can see how the cells are ordered if you order them according to PC1. You can see that for the early stages of development it does not fit what you would like: you would like to have the red cells at the very beginning or at the very end of the trajectory, and that is not really the case here.
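To make the point about PC1 concrete, here is a minimal sketch in plain Python of ordering cells along the first principal component. The toy expression values and the function name pc1_pseudotime are made up for illustration, and the power iteration is just one simple way to get PC1; a real analysis would use the PCA already computed on the data set.

```python
import math

def pc1_pseudotime(cells):
    """Score each cell by its position along the first principal component.

    cells: list of equal-length lists (toy expression values, an assumption).
    Returns one score per cell, rescaled to the range [0, 1].
    """
    n, p = len(cells), len(cells[0])
    means = [sum(c[j] for c in cells) / n for j in range(p)]
    centered = [[c[j] - means[j] for j in range(p)] for c in cells]
    # Sample covariance matrix between genes (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalise, which converges to the leading eigenvector (PC1).
    v = [1.0] * p
    for _ in range(200):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered cell onto PC1 and rescale to [0, 1].
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

# Four toy cells measured on two genes, roughly along one line.
toy_cells = [[0.0, 0.1], [1.0, 0.9], [2.0, 2.2], [3.0, 2.9]]
pseudotime = pc1_pseudotime(toy_cells)
order = sorted(range(len(toy_cells)), key=lambda i: pseudotime[i])
```

Note that the sign of a principal component is arbitrary, so the ordering can come out reversed. Which end is the "start" has to come from biological knowledge, not from the PCA.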
Also, the early, mid, and late blastocyst cells are all quite mixed, so it is not a perfect tool for trajectory analysis, but it is already one. That is why I want to stress that you will be able to infer many trajectories, and it does not mean that they make biological sense. Now I want to tell you how many of the graph-based trajectory tools work, and then you will understand quite a range of trajectory analysis methods. What some tools do is take a weighted graph, and they may differ in the way they generate this weighted graph. Then they take what is called a spanning tree, which I will explain in a second, and they take the minimum over all spanning trees of the graph. So what is a spanning tree? Here I have made an example of a weighted graph. This could really be the case in your data set: cells are linked with certain weights, so that you know how close the points are. The weight could be, for instance, exactly what we see in an SNN graph, so how many neighbors two cells share. Some tools do rely on KNN or SNN graphs. A spanning tree is a way to link all of the points in your graph together without creating a cycle. For instance, I have drawn one possibility in dark, but another possibility would have been to put this link there instead of this link here. That would be another spanning tree. Or, instead of linking that point to the rest of the graph here, I link it with that edge, which gives yet another spanning tree. It is just a way to have all the points linked to the core of the graph. Then you look at the sum of all the weights of the edges that you have included in your spanning tree, and this gives you the weight of the spanning tree.
At the end you take the minimum over all of those trees to select the minimum spanning tree. That is why the minimum spanning tree in this small example is the one I drew in dark and not the one with this edge instead of that one: if I use that edge I add 20, whereas if I use this one I add only 5. So you prefer this link over that one for attaching that point to the spanning tree. That is the idea of the minimum spanning tree. As I said, what can differ is the way you generate the graph and the way you generate the weights. The weights can be, for instance, a distance in a dimensionality-reduced space. They can also be a correlation between cells, so a correlation score, or, as I said, the number of shared neighbors, and so on. You can guess from the way I described it that, since you take the minimum spanning tree, there are no cycles. Why? Because if I add, for instance, this edge here to create a cycle, it just adds 20 to my tree for nothing, since this point is already linked to the rest. So you will have zero cycles if you use a method that relies on a minimum spanning tree. If cycles matter in your analysis, for instance if you want to look at the cell cycle, then you should not use trajectory methods that rely on a minimum spanning tree for their calculation. One popular method of this kind is called slingshot. The way slingshot works is that it generates a graph on your data.
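The minimum spanning tree idea described above can be sketched in a few lines of Python. This is a minimal illustration using Kruskal's algorithm on a made-up four-node graph, with one edge of weight 5 and one of weight 20 competing to attach the last node, as in the spoken example. The node numbering and the other weights are assumptions, and this is not the code of any particular trajectory tool.

```python
def minimum_spanning_tree(n_nodes, edges):
    """Kruskal's algorithm.

    edges: list of (weight, u, v) tuples with nodes numbered 0..n_nodes-1.
    Returns the list of edges kept in the minimum spanning tree.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent links to the representative of x's component.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):   # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:             # skip edges that would close a cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree

# Toy weighted graph: node 3 can be attached with weight 5 or weight 20.
toy_edges = [(2, 0, 1), (3, 1, 2), (4, 0, 2), (5, 2, 3), (20, 1, 3)]
mst = minimum_spanning_tree(4, toy_edges)
mst_weight = sum(weight for weight, _, _ in mst)
```

The tree keeps the edge of weight 5 and drops the one of weight 20, and the edge of weight 4 is dropped because it would close a cycle. A tree on four nodes always has three edges, which is why no cycle can survive.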
The weights between the points are given by distances between clusters, and these distances are calculated in this way. It really resembles a distance between averages, where X_i is the center of cluster i, X_j is the center of cluster j, and here you have the variance. So this weight describes how close two clusters are. Slingshot then generates a minimum spanning tree on your data, where each of the points is a cluster and not a cell. Once it has generated this minimum spanning tree using those distances as weights, it uses a technique called principal curves. Principal curves are smooth one-dimensional curves that pass through the middle of a p-dimensional data set, providing a nonlinear summary of the data. This means that instead of this straight line here, it tries to find a smooth line that passes through the middle of your data. If you just look at the spanning tree, the path is made of straight segments, because it is a graph. What slingshot does is take that path and smooth it out using the theory of principal curves, so that the curve follows your data. If you load the slingshot library and type ?slingshot, you will find the reference for principal curves and how they are calculated. At the end, you project the cells onto the principal curves that you have generated, and from that you can infer where they are on the trajectory.
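As a rough illustration of the projection step, here is a minimal Python sketch. It is not slingshot's implementation: it uses a piecewise-linear path through hypothetical cluster centers in place of a smooth principal curve, and it takes the arc length at the closest point of the path as the pseudotime of a cell. All names and coordinates are made up.

```python
import math

def project_onto_path(point, path):
    """Project a cell onto a piecewise-linear path.

    point: coordinates of one cell in the reduced space.
    path:  ordered list of distinct points (e.g. cluster centers).
    Returns (pseudotime, distance), where pseudotime is the arc length
    along the path at the closest point and distance is how far away
    the cell is from the path.
    """
    best_distance, best_time = float("inf"), 0.0
    travelled = 0.0
    for start, end in zip(path, path[1:]):
        segment = [e - s for s, e in zip(start, end)]
        length_sq = sum(x * x for x in segment)
        length = math.sqrt(length_sq)
        # Fraction along the segment of the orthogonal projection,
        # clamped so that the closest point stays on the segment.
        frac = sum((p - s) * x for p, s, x in zip(point, start, segment)) / length_sq
        frac = max(0.0, min(1.0, frac))
        closest = [s + frac * x for s, x in zip(start, segment)]
        distance = math.dist(point, closest)
        if distance < best_distance:
            best_distance, best_time = distance, travelled + frac * length
        travelled += length
    return best_time, best_distance

# Hypothetical path through three cluster centers in a 2D reduced space.
toy_path = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
early_cell = project_onto_path((0.5, 0.3), toy_path)
late_cell = project_onto_path((2.4, 1.5), toy_path)
```

With a smooth curve instead of straight segments the principle is the same: each cell gets the position along the curve of its closest point.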
Here, for instance, in that example they have generated two curves. At the beginning they run together, but then they divide, so you have a bifurcation here. You could also have a multifurcation with this method. What is important is that some cells might be part of only one curve, and some cells, like those here, might be part of both curves. In R it is quite easy: you have the function slingshot, which takes a SingleCellExperiment object. So you first need to convert your Seurat object to a SingleCellExperiment object, but as Tanya showed, that is quite easy to do. Then you have to provide the cluster labels. This is important because, as I said, slingshot is based on finding the minimum spanning tree on your clusters. You also need to say in which reduced dimension you would like to work, which determines where the principal curves are generated. This should probably correspond to the space in which you generated the clusters, so probably PCA. But if you use PCA, the principal curves are generated in the PCA space, and they can look very weird on the UMAP. So some authors prefer to use the UMAP here to generate the principal curves. This is up to you. In the example from this paper, the cells were projected onto PC space, so slingshot was run on the principal components. Here is what the output looks like: you get information about lineages and curves, and here you can see two lineages and two curves. The lineages correspond to what you have here, so the paths through the clusters wherever you have bifurcations, and the curves correspond to the smoothing step here. For the lineages you get the ordering of your clusters in the first lineage and in the second lineage.
If you do not provide a starting cluster, it will decide by itself which end to start from. Here we did specify the starting cluster, so all the lineages start at cluster 2. The first lineage has cluster 2 first, then cluster 4, then 0, 5, and 3. The second lineage has clusters 2 and 4, and then, as you can see here, it separates and goes to cluster 1. So these are my two lineages, and then you have your two curves. You have the length of the curves, which reflects the distances between your clusters along the tree, so that you know how long each curve is. You also have the samples, which means the number of cells that you have on each curve. This looks very weird because it is not an integer, and this is because some of the cells might be part of several lineages. Here you have the picture of how the slingshot pseudotime of the first lineage, so the first curve, looks on the cells. Some cells are only part of the second curve and some cells are only part of the first curve. What you see is that it orders the early stages quite a bit better than the PC1 trajectory did, but the end of the trajectory is still not so good at separating the blastocyst stages.
Monocle is another algorithm that is quite popular and that you can practice in the exercises. It is based on an idea that was first developed in Python and was quite popular before Monocle came in, the PAGA algorithm.
It is an algorithm in Python. It first constructs a k-nearest-neighbor graph on the cells, then identifies communities in that graph. Two vertices, which are these communities, are then linked with an edge when cells in the respective communities are neighbors in the k-nearest-neighbor graph. So this gives you another way to generate lineages through communities. It does not work with principal curves, so at the end you really have a graph with straight edges on your clustering. This is how PAGA works. The Monocle 3 authors found this idea quite interesting but wanted to work at the cell level. So they first create the k-nearest-neighbor graph at the cell level in UMAP space. Then they group the cells into Louvain communities. Instead of communities, they can also work at a higher level. Then they test each pair of communities for a significant number of links between their respective cells. By significant number of links, I mean that they compute a p-value using a null hypothesis of what they call spurious linkage. This tells you for each link how significant it is and whether the link should be removed or not. Since they test so many links, they of course have to apply a multiple testing correction, so they use an FDR. They use an FDR lower than 0.01 to decide whether a link should remain in the final graph, and the final graph is what is reported. There are quite a lot of additional tools. Here I have listed some, and they work a little bit differently.
Some use k-means and some use DBSCAN for the initial clustering, and the links between the clusters are then always based on proximity in space. One method I would like to mention is psupertime, because it was specifically developed for time series data and generates a pseudotime on time series data. So if you have time series data in your hands, you should look it up, it is quite nice. I would also like to mention RNA velocity, because it is a popular tool and it works so differently that I think it is important to mention. Here I took from the paper how they describe RNA velocity: RNA velocity is a high-dimensional vector that predicts the future state of individual cells on a time scale of hours. So for each cell we try to predict where the cell is most likely going, so towards which state it is moving, and that is why it is also a trajectory analysis: it also tries to link cells along a certain scale. It aids the analysis of developmental lineages and cellular dynamics because it predicts towards which state or developmental stage the cells are going. What it does, and for this you actually need the FASTQ files, is use the relative abundance of nascent, so unspliced, and mature, so spliced, mRNA to estimate the rates of gene splicing and degradation. They mention the following: during a dynamic process, an increase in the transcription rate leads to a rapid increase in unspliced mRNA, followed by an increase in spliced mRNA until a new steady state is reached. Therefore, if you look at the ratio of unspliced to spliced mRNA, you learn whether the transcription rate is increasing or decreasing.
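The behaviour of unspliced and spliced mRNA after a change in transcription rate can be illustrated with a small simulation. This is a minimal sketch assuming the simple kinetic model commonly used for RNA velocity, du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s, where u is unspliced mRNA, s is spliced mRNA, alpha is the transcription rate, beta the splicing rate, and gamma the degradation rate. All rate values are made up, and real tools estimate these quantities per gene from the data.

```python
def simulate(alpha, u0, s0, beta=1.0, gamma=0.5, dt=0.01, steps=100):
    """Euler integration of du/dt = alpha - beta*u, ds/dt = beta*u - gamma*s.

    Returns the unspliced (u) and spliced (s) abundance after steps*dt
    time units. All rate values are illustrative assumptions.
    """
    u, s = u0, s0
    for _ in range(steps):
        du = alpha - beta * u
        ds = beta * u - gamma * s
        u += dt * du
        s += dt * ds
    return u, s

def velocity(u, s, beta=1.0, gamma=0.5):
    """Velocity of spliced mRNA: positive means s is about to increase."""
    return beta * u - gamma * s

# Steady state for alpha = 1: u = alpha/beta = 1, s = alpha/gamma = 2.
steady_velocity = velocity(1.0, 2.0)

# Induction: transcription rate jumps from 1 to 4, starting at steady state.
u_up, s_up = simulate(alpha=4.0, u0=1.0, s0=2.0)
induction_velocity = velocity(u_up, s_up)

# Repression: rate drops from 4 to 1, starting at the alpha = 4 steady state.
u_down, s_down = simulate(alpha=1.0, u0=4.0, s0=8.0)
repression_velocity = velocity(u_down, s_down)
```

In this toy model the velocity is zero at steady state, positive shortly after induction because unspliced mRNA is in excess relative to the steady-state ratio, and negative shortly after repression.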
A drop in the transcription rate leads first to a drop in unspliced mRNA, followed by a reduction in spliced mRNA. So during induction of gene expression, unspliced mRNAs are present in excess relative to the steady state, and during repression unspliced mRNAs are present in lower amounts. The balance of unspliced and spliced mRNA abundance is therefore an indicator of the future state of the mature mRNA, and with that also of the future state of the cell. That is the general idea of the algorithm and how it should work. I think, Tanya, you used it on a data set. I don't have that experience, so could you comment on how good it was and how useful it could be? Yes, I used it on one data set, but I think they did not include it in the publication because it was not so convincing. And then another colleague of ours tried to rerun it with different parameters, and it gave totally different results. So in the group we are not super convinced about RNA velocity, but I guess it all depends on your data set. Running it is not difficult, but it is computationally intensive. Yes, exactly. So more or less convincing. Okay, perfect.
The last part of what I want to show you is how you could do cell-cell communication. I did not put the overview of single-cell tools here again, but quite a few cell-cell communication tools exist. I mention some of them here because I have a bit of a sense of what they do, since I tried them. There are others that exist and that you might want to add at the end. First, there is the ligand-receptor interaction potential, which is a number: if it is bigger than one, you have a high interaction potential.
You calculate it for one cell type against all the other cell types, so that you know how strong the communication between that cell type and the others is. It works just by looking at the number of ligand-receptor pairs that are expressed in your cell type and in the other cell type. From that, it estimates the interaction potential. It then also shuffles the data randomly and checks how many times you get the same result, so how significant the result is, or whether it could also be obtained by chance. This is how you get a confidence interval and understand which communications are significant. That is how the ligand-receptor interaction potential works. CellPhoneDB should also be mentioned, and there is a clickable version. It also works with receptors and ligands being expressed, so it only looks at expression and tries to infer communication from that. NicheNet I will discuss afterwards, because it is quite different. For CellChat you also have an online version and an R version. It outputs graphs that are quite nice to look at and easy to understand. It also works only on ligand-receptor pairs. You should look it up, because it might be relevant for your work and it is visually appealing. NicheNet works quite differently, so I would like to explain how it works. You need some prior knowledge, and I will explain how that comes in. The idea of NicheNet is the following. You have a list of differentially expressed genes in a certain cell type. So you have two conditions, say healthy cells and cells with a certain disease, and inside, I don't know, your B cells, you have identified genes that are significantly different between the two conditions.
And you would like to understand: can you find a ligand-receptor pair that might be responsible for the change that you observe in that list of significant genes? So can you understand what is causing the genes to be so different between the two conditions? This is actually quite useful, because it might point biologists to possible pathways that they need to target, or to possible cell-cell communication that they might want to look at in order to stop whatever process they see. How it works is that they have prior knowledge: a table of ligands and of targets that are known to be regulated by those ligands. So they have a list of targets, and for all the ligands included in their data set they assess a regulatory potential. In this illustration I put the higher potential in dark red, and in yellow you have a lower regulatory potential. So ligand number 1 would very much influence the expression of gene number N, and ligand number M is very likely to change the expression of gene number 1. This is their table of prior knowledge, and you have your list of differentially expressed genes. What you do is turn that list into a vector of zeros and ones, so that you know which of the target genes are differentially expressed in your setting. For instance, gene number 1 was differentially expressed, gene number 2 as well, all the others were not, and gene number N was differentially expressed as well. What you then do is compute a correlation between your vector and the prior knowledge that you have.
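The correlation step just described can be sketched as follows. This is a minimal Python illustration with a made-up prior-knowledge table of three hypothetical ligands and five target genes. It only shows the idea of correlating a 0/1 vector of differentially expressed genes with each ligand's regulatory potentials, and it is not the NicheNet code or its real prior model.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical prior knowledge: one row per ligand, one regulatory
# potential per target gene (gene 1 to gene 5). Values are made up.
prior_knowledge = {
    "ligand_A": [0.9, 0.8, 0.1, 0.1, 0.7],
    "ligand_B": [0.1, 0.2, 0.9, 0.8, 0.1],
    "ligand_C": [0.5, 0.4, 0.5, 0.6, 0.5],
}

# Differential expression turned into zeros and ones:
# genes 1, 2 and 5 are differentially expressed, genes 3 and 4 are not.
de_vector = [1, 1, 0, 0, 1]

# Score every ligand by how well its potentials match the DE pattern,
# then rank ligands from most to least likely regulator.
ligand_scores = {ligand: pearson(potentials, de_vector)
                 for ligand, potentials in prior_knowledge.items()}
ligand_ranking = sorted(ligand_scores, key=ligand_scores.get, reverse=True)
```

In this toy table ligand_A has high potential exactly on the differentially expressed genes, so it ranks first, while ligand_B, which mostly targets the unchanged genes, ranks last.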
With that, it should point you towards ligands that might be interesting to look at, because they regulate the target genes that are in your differentially expressed list. So here, in this setting, probably this ligand and that ligand correlate with the pattern of differential gene expression that you have here. At the end you get a score: you get the Pearson correlation and the Spearman rank correlation for your ligands, so that you know which ligands are more likely to be regulating your list of differentially expressed genes. You can then look at the ligands that are potential candidates for regulating your list of genes and check where they are highly expressed. You can then make the assumption that, since this gene is highly expressed in fibroblasts and in this other cell type here, maybe these two cell types are communicating with another cell type to dysregulate the genes you observed. You do the same with the receptors: there is a list of ligands and their receptors, which is also provided by NicheNet. So you know which are the possible receptors for the ligands that you found to be probably responsible for your changes, and you can do the same thing and check where these receptors are expressed, for instance with a dot plot. Then you can again make the assumption that the cell types might be communicating through that ligand-receptor pair to cause the changes you see. They also have a list of ligands and receptors that they have narrowed down to a shorter list, which has been manually curated. They say that you can trust this one a little bit more, but it is a smaller list.
So you could also use those ligand-receptor pairs to try to infer cell-cell communication. NicheNet really is a little bit different, because it not only lets you look at ligand-receptor pairs but also associates those pairs with changes in gene expression. It can therefore be quite relevant for forming hypotheses, which need to be tested, about communication that might be happening in your setting, or in your dysregulated setting, so in the comparison that you did. I think that is the last slide, yes.