The work that you have generously awarded is both a concept, a conceptually new measurement that you can apply to single cells, which is called RNA velocity and is essentially the first derivative of gene expression, and also a bioinformatics toolkit that we developed and called velocyto. I will acknowledge all the authors at the end, but three people in particular contributed to this work. Gioele La Manno in my group was a PhD student at the time, and because he's brilliant, he has already started his own group at EPFL as of this year. And this was a great collaboration between our group and the group of Peter Kharchenko at Harvard, and in particular Ruslan Soldatov in his group, a very talented postdoc.
So I'm going to do a quick overview of RNA velocity and say a little bit about the velocyto tool, starting by giving you a quick background on single cell analysis, just so that we set the stage. Single cell analysis is a general approach to understanding complex tissues. When I started my group, I wanted to do single cell analysis, and in particular I wanted to use the strategy that's shown here on the top, where we take cells out at random from the tissue. So here's an example of taking cells from the somatosensory cortex, then analyzing them one by one by single cell expression profiling to get a kind of fingerprint of the identity of each cell, and then using computational tools to understand cell types by clustering, or of course trajectories and differentiation and commitment and all of these things. When we started, it was a very challenging experimental process. Single cell expression profiling in particular was difficult, but through various technology developments, this has turned into a robust and extremely scalable method.
So as you can see on the bottom there, starting with modest numbers of cells, in the tens or hundreds in the first few years, we're now at the stage where in recent publications people actually publish several hundred thousand or even a million cells, and it's become a real mainstay of biomedicine. As an example, last year we published a molecular atlas of the mouse nervous system, where we sampled about half a million cells from all parts of the mouse nervous system. This was work by Amit Zeisel and Hannah Hochgerner in my lab, and at the bottom you can see the kind of taxonomy and systematic analysis you can do of all the major cell types in the entire mammalian nervous system. You can learn a tremendous amount of biology from such atlases, and they are also very useful as reference atlases, for example when looking at disease samples.
But one major problem with all kinds of single cell analysis is that they are destructive. So we can only get static snapshots. We can get a picture of what the cell is currently, but we cannot directly measure what the cell is going to be and where it's going to end up. And this is actually very similar to a problem that faced the early histologists. Here is del Río Hortega, who discovered and named oligodendrocytes, which are the myelinating cells of the brain. His tool was the microscope, looking at static histology sections, where you can see cells in their current state, but you cannot see where they're going and what they're going to become. He was able to infer dynamics by careful observation, observing the consecutive steps of differentiation in different individual cells, of course. So his drawing there on the left, for example, is a pretty good summary of brain development, where starting at the top you have neuroepithelial stem cells that then differentiate in three major lineages to neurons, astrocytes, and oligodendrocytes.
But the missing piece is really dynamics. We don't have access to dynamics in destructive measurements. Oops, there's a black slide there, okay. Never mind. But if we use a metaphor, the metaphor of motion blur, it is actually possible to get a little bit of dynamic information even from a static snapshot. So here is a photo showing my student Gioele, who is the first author of the paper, in the foreground, and you can see he's standing still because there is no motion blur. And that's me in the background running back and forth, and even though it's a static snapshot, you can see that I'm moving. You could infer something about my speed and a little bit about my direction.
So if we take this concept and apply it to transcription, what we can do is separate the different stages of transcription. That's the key insight in RNA velocity: RNA goes through stages of maturation in its lifetime. So at the left there you have a simple model of transcription, where first RNA is transcribed at some transcription rate alpha, generating an unspliced nascent RNA, which is then spliced to form a mature spliced RNA, which is sort of the active form of RNA that generates protein. And then eventually it's going to be degraded, with some degradation rate gamma in this case. Now, the key insight here is that this creates a timeline between unspliced and spliced mRNA. So for example, if you look in the middle there, at a gene that's first being induced and then goes back to a silent state, as given by the transcription rate alpha, you see that what happens in the induction phase is that the unspliced RNA rises rapidly and then reaches a plateau. And the same thing happens with the spliced RNA at the bottom there, shown by the symbol s. It rises and then it plateaus, but there is a time lag, because of course the spliced RNA has to come from the unspliced.
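As a minimal sketch of the model just described, the two stages can be written as du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s, where u is unspliced and s is spliced RNA. The talk names only alpha and gamma; the splicing rate beta, all numeric values, and the function names below are assumptions chosen for illustration, and the simple Euler integration is not taken from the velocyto code.

```python
def simulate(alpha_on=10.0, beta=1.0, gamma=0.5,
             t_on=2.0, t_off=12.0, t_end=25.0, dt=0.01):
    """Euler-integrate du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s.

    The gene is silent, then induced at t_on, then switched off at t_off.
    Returns a list of (t, u, s) tuples, one per time step.
    All parameter values are illustrative assumptions.
    """
    u, s = 0.0, 0.0
    trace = []
    n_steps = int(round(t_end / dt))
    for i in range(n_steps + 1):
        t = i * dt
        alpha = alpha_on if t_on <= t < t_off else 0.0
        trace.append((t, u, s))
        du = alpha - beta * u          # transcription minus splicing
        ds = beta * u - gamma * s      # splicing minus degradation
        u += du * dt
        s += ds * dt
    return trace


def time_to_reach(trace, index, level):
    """First time at which column `index` (1 = u, 2 = s) reaches `level`."""
    for row in trace:
        if row[index] >= level:
            return row[0]
    return None


def state_at(trace, t, dt=0.01):
    """Look up the (t, u, s) tuple closest to time t."""
    return trace[int(round(t / dt))]


if __name__ == "__main__":
    trace = simulate()
    # Plateaus are alpha/beta for u and alpha/gamma for s; compare half-rise times.
    print("u reaches half its plateau at t =", time_to_reach(trace, 1, 5.0))
    print("s reaches half its plateau at t =", time_to_reach(trace, 2, 10.0))
```

With these assumed rates, the unspliced curve reaches half of its plateau before the spliced curve does, which is the time lag described above. Plotting u against s from the same trace gives the loop-shaped curve that is discussed next.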
And the same thing happens when the gene is switched off: the unspliced RNA drops because it's being consumed. And the same with the spliced RNA, it gets degraded eventually. But again, there is a time lag. And we can show this more clearly if we show a phase portrait, on the right there, where we plot the spliced versus the unspliced. What you can see is that a gene that's initially at zero, so starting from a state of no expression, when it's being induced, takes the upper red trajectory and reaches a steady state, which is actually given by the degradation rate, so the ratio between spliced and unspliced. But when the gene is being repressed or switched off, it will take the lower trajectory, the blue trajectory, and go back to zero. Which means that we can distinguish upregulation from downregulation by looking at the spliced to unspliced ratio and comparing it to the steady state, or to the degradation rate if we know it.
Now, this gives us the ability to predict whether a gene is going up or down. And actually the distance from the diagonal there also gives the rate at which it's going up or down. Mathematically, this is the first derivative of gene expression. So that's why we call it RNA velocity. Now, the reason we don't just call it RNA speed is that it also has direction. And I'll explain how. If you do these measurements on a single gene, on the left there, and you observe that this particular gene in some particular cell is below the diagonal, so there is less unspliced than expected, then you know it's going down. So you could plot it as an arrow on an axis, and it goes from the observed current state of the cell to a future state where the gene is a bit lower expressed. But if you have two genes, like on the right, where one is below and one is above, then one is going down and the other is going up. What you have is actually an arrow pointing to the future state of the cell, in this case in a two-dimensional space.
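The reasoning above can be sketched in a few lines: estimate the steady-state slope gamma from cells assumed to be near steady state, take the velocity of each gene as its offset from that line, and combine two genes into an arrow. This is a simplified illustration, not the fitting procedure from the paper. The plain least-squares fit through the origin, the splicing rate fixed to 1, the function names, and the toy counts are all assumptions.

```python
def fit_gamma(unspliced, spliced):
    """Least-squares slope through the origin of unspliced vs. spliced.

    Stands in for the steady-state ratio; assumes the cells passed in
    are close to steady state (an assumption of this sketch).
    """
    num = sum(u * s for u, s in zip(unspliced, spliced))
    den = sum(s * s for s in spliced)
    return num / den


def velocity(u, s, gamma):
    """ds/dt = u - gamma*s, with the splicing rate assumed to be 1."""
    return u - gamma * s


def extrapolate(s, v, dt=1.0):
    """Predicted spliced abundance a short time dt later (never negative)."""
    return max(s + v * dt, 0.0)


if __name__ == "__main__":
    # Toy steady-state cells for one gene (hypothetical counts).
    gamma = fit_gamma(unspliced=[0.0, 1.0, 2.0, 4.0],
                      spliced=[0.0, 2.0, 4.0, 8.0])

    # One cell, two genes that are assumed to share the same gamma.
    v_a = velocity(u=3.0, s=4.0, gamma=gamma)   # above the line: going up
    v_b = velocity(u=1.0, s=4.0, gamma=gamma)   # below the line: going down

    current = (4.0, 4.0)
    future = (extrapolate(4.0, v_a), extrapolate(4.0, v_b))
    print("gamma:", gamma)
    print("arrow from", current, "to", future)
```

The sign of each component says whether that gene is going up or down, and the length of the combined vector is the speed, which is why the quantity is a velocity rather than a speed.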
So you have an arrow, and the arrow has a length, so it's actually a velocity. It shows you where the cell is going in the expression of these two genes. And this of course extrapolates to all genes. So in reality, if you measure a whole transcriptome, RNA velocity is a measure of the speed and direction of the cell, of where it's going in gene expression space. Now of course it only predicts a short time into the future. We estimate that we can extrapolate about two to three hours into the future from a single cell. So that doesn't tell you very much. It tells you roughly where the cell is going, but not where it's going to end up in the end.
But in single cell experiments, we usually have access to many, many cells. We usually take a pool of cells from the same tissue. And even if for each individual cell you can only make a short prediction into the future, if you put all of them together, it again works like motion blur, but with many individual objects moving. And this can be illustrated by the photo that you're seeing here, where you have motion blur, sort of light trails, from many, many individual cars. For each individual car, you can only see a little bit about where it's going. But when you have many cars, you can actually reconstruct the whole trajectory. So you can see exactly where cars can and cannot move in this space, and where they move faster and where they move slower. So that's what we can do with RNA velocity, but in gene expression space.
So I'm going to give you an example. This is an example of the developing hippocampus in the mouse. The hippocampus is a folded structure. You have a cross-section illustrated there on the left. And it has a number of neuronal types. There are pyramidal cells of the CA1, CA2, and CA3. And then there are astrocytes and there are oligodendrocytes, which are two different kinds of glial cells.
And there are the granular neurons that sit in the little folded structure that's called the dentate gyrus. Now we did single cell RNA sequencing here using droplet-based RNA-seq. And bottom left, you see a t-SNE plot showing about 20,000 cells from this tissue. And I've labeled it. So the labels here are based on the markers that we observe in different parts of this manifold. But if you weren't given the labels and if we didn't have prior knowledge, you wouldn't know where the beginning is here and how this structure develops. It could be that the stem cells were over here, where it's labeled granular, and that all the cells move during differentiation towards these other fates. Or it could be that the beginning is some other arbitrary place. You just don't know. But if we now superimpose RNA velocity information, we can actually learn how cells move in this gene expression space during hippocampus development.
First notice that there is a state here in the middle that I've called neuroblast, and there's a state called radial glia. Somewhere there is going to be the origin, and I'll show you how we know that. If you look at the two genes that are plotted here on the right, let's start with Igfbpl1, the lower one. You can see this is a phase portrait, now from single cell data. We're plotting spliced versus unspliced. And you can see that indeed we get this nice trajectory, showing cells in which the gene is being induced in yellow and green, cells where the gene is at roughly steady state in cyan, and then cells in which the gene is being downregulated in dark blue, going back to zero. And if you look at the far right, at the expression field, this is the expression data for Igfbpl1. So it's the spliced RNA. That's what you normally would measure. You can see that there's a peak here in the middle. So this gene is high in the neuroblast stage.
But you don't actually know, as this tissue develops, where the starting point is and where the end point of cellular differentiation is. But if you now look at the velocity field in the middle, here we're showing the magnitude of the velocity, as given simply by whether the cell is above or below the diagonal. You can see that there is a region on the lower left of this field that is red, indicating positive velocity. So you know that here are cells that are inducing the gene. And on the other side of the plot, you have dark blue, which is negative velocity, which means that there are cells that are downregulating the gene. So it means that during differentiation the cells really have to go from the red part to the blue part. That's the only way that is consistent with the expression and the velocity. And actually, if you look at the colors that I used to plot the phase portrait, you can see that the cells that are orange and yellow and green have the same colors as in the t-SNE plot on the left. Those are indeed the cells down there in the bottom left corner that are labeled radial glia and nIPC. So those are the cells that are inducing the gene, and they're moving towards the neuroblast stage. And then as they differentiate into various kinds of neurons, they will downregulate Igfbpl1.
But that's just one gene. We, of course, have all the genes. And so, again, we can put arrows on the cells that give us the direction that each individual cell is going in, in gene expression space. So here is the same plot, actually, but now with arrows on it. And I hope you can see that there is a region down here in the middle that's kind of at rest, that doesn't have a lot of arrows. And all of the arrows on the lower left, actually I can zoom. Here, all the arrows on the lower left are pointing away from that zone, the sort of progenitor zone, the stem cell zone, and pointing towards the astrocyte fate.
So these are cells that decided to become astrocytes. They differentiate towards astrocytes, and then they come to a stop here because they are fully differentiated. And there are some cells that somehow make their way to the lower right here, to the violet cluster, and they are the ones that become oligodendrocytes. They will continue to differentiate. As you can see, they still have velocity, and we just don't have samples of cells that are more mature. So that's why we don't see the end of that. If we look at the neuron differentiation, you can see that cells that decided to become neurons then either go out to become granular cells, and there's a long trajectory here with still very high velocity towards the end, and indeed granular cells will continue to differentiate for about three weeks after this sample was taken. Or they decide to become pyramidal cells, and then they make a choice here whether to become pyramidal cells of the CA1 and 2 or of the CA3. So it really makes it possible for us to observe in detail how cells differentiate, and we can observe the actual time point, in sort of differentiation time or pseudotime, when cells make the choice to become one type or another.
And of course, we can summarize this. So here's a summary of the velocity field that makes it a bit clearer what's going on. On the left there you can see local average velocity arrows, and you see there is a region in the middle there, the orange and yellow region, where there is no velocity, and those are indeed the radial glia, the stem cells of this structure. As soon as they go towards the astrocyte fate or the oligodendrocyte fate, they pick up speed and they really have a unidirectional velocity. Some of the stem cells will instead make the choice to become neurons. They go through an initial neuroblast stage that's common to all the neurons, and then they differentiate towards all the major cell types here.
RNA velocity is also a second layer of information. So we basically now have two matrices. We have a matrix of gene expression, and then on top of it we have a matrix of the first derivative of gene expression. And so we can make the velocity fields that I've shown you on the left, and we can also use these mathematically. For example, we can use diffusion on this vector field to predict what are going to be the endpoints of differentiation, by diffusing forward, or what was the root of the whole differentiation process, by diffusing backwards, as you can see there on the upper right. So in the end this allows us to really reconstruct the underlying trajectories that cells take during development.
So in summary, RNA velocity is a measure of the first derivative of RNA abundance. It can be applied to any single cell RNA-seq data set that's out there. You don't need to inject anything into your animals, it just uses regular data that's already there. It applies to any organism, including human. And it's probably scalable to whole organisms, including, in the future, human. It has some powerful computational properties, primarily, I think, that it's grounded in transcriptional kinetics. So that link is very interesting, and I think it's something that we want to explore in the future, to see if we can ground the sort of descriptive observation of gene expression in actual gene regulation and underlying transcriptional kinetics. And it can reveal the whole manifold of differentiation, so the branching path that cells take during differentiation, without any genetic labeling, which again makes it applicable to humans. Now there are many assumptions and limitations, and I refer you to the paper for that.
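The diffusion on the velocity field mentioned above can be illustrated with a small Markov chain over cells. This is a sketch under stated assumptions and not the velocyto implementation: the cosine-similarity kernel, the bandwidth `sigma`, the use of all cells as neighbors, and running the process backwards by simply negating the velocities are all choices made here for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def transition_matrix(positions, velocities, sigma=0.2):
    """Row-stochastic matrix: cell i prefers cells lying along its velocity."""
    n = len(positions)
    matrix = []
    for i in range(n):
        weights = []
        for j in range(n):
            displacement = [pj - pi for pi, pj in zip(positions[i], positions[j])]
            # For j == i the displacement is zero, so the weight is exp(0) = 1,
            # which lets a cell stay where it is.
            weights.append(math.exp(cosine(displacement, velocities[i]) / sigma))
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def diffuse(matrix, start, steps):
    """Propagate a probability distribution over cells for a number of steps."""
    p = list(start)
    n = len(p)
    for _ in range(steps):
        p = [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return p


if __name__ == "__main__":
    # Five hypothetical cells on a line, all moving to the right.
    positions = [(float(i), 0.0) for i in range(5)]
    velocities = [(1.0, 0.0)] * 5
    uniform = [1.0 / 5] * 5

    forward = diffuse(transition_matrix(positions, velocities), uniform, 50)
    reversed_v = [(-vx, -vy) for vx, vy in velocities]
    backward = diffuse(transition_matrix(positions, reversed_v), uniform, 50)

    print("endpoint cell:", forward.index(max(forward)))
    print("root cell:", backward.index(max(backward)))
```

In this toy example the forward process piles probability onto the last cell along the direction of flow, and the reversed process piles it onto the first, mirroring the endpoint and root predictions described in the talk.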
There was a bit of a mystery about why this would work at all, because it's all based on distinguishing spliced from unspliced RNA, but all these methods use oligo-dT priming, which will only detect polyadenylated RNA, which will always be spliced because splicing happens before polyadenylation. But as it turns out, all of these protocols that are used for single cell RNA-seq have a prominent artifact, where there is a lot of internal priming on intronic sequences. So that's why we can detect it, and actually it's a happy coincidence that a kind of technological limitation of standard single cell RNA-seq protocols made possible a really interesting new way of studying cell behavior.
Now a couple of words before I finish about the velocyto toolkit. RNA velocity is obviously a concept. We show that it works, we explain why it works, and we show how to apply it to many different systems in the paper. You don't need any particular tools to analyze this, you just need to separate the spliced from the unspliced reads. But of course there are many things that you might want to do with this that we have developed along the way, to summarize and to smooth the data and to do quality controls and so on. And we have released this in the form of two packages, one for Python called velocyto.py and one for R called velocyto.R. These are freely available, there is very detailed documentation, we provide tutorials to reproduce all the figures in the paper, and there are some notebooks with tutorials for particular use cases and so on. It is obviously free and open source, and it's been used by many people. The paper itself has been cited, I just checked, 160 times already. We get a lot of feedback on the velocyto package, and people are really using it and getting good results with it. We put this on the bioRxiv preprint archive before we submitted to Nature, and we were amazed at how much impact that had.
The paper has been downloaded 13,000 times from bioRxiv, and it's been a tremendous pleasure to see how many people are using this tool. But as I said, you don't need to use our tool in order to study RNA velocity. There are other packages already out there: there is scVelo from Fabian Theis's lab and kallisto | bustools from Lior Pachter's lab, and I'm sure that there are others by now.
Finally, the acknowledgments. Many people contributed to this work, but again Gioele, Ruslan, and Peter Kharchenko were the main drivers of this work, and this could not have happened without this great collaboration with Peter's group. I also want to thank all the great funders who made this work possible. And with that I'm finished, and I think I can take questions, if I can hear you.