But by next week or the week after there's going to be some great new technology that we're going to want to use for single-cell sequencing, so I'll give you a little bit of the historical context and try to build a bridge for those of you who took the bulk RNA sequencing course: some of the lessons there are going to transfer to single cell, and some of them actually may not. So the point of this session is to understand the conceptual shift from bulk to single cell, to become acquainted with all the things that can go wrong and the things you have to think about when you're designing a single-cell experiment in the lab, and to see how those design decisions are going to impact what you do with the data when it comes out the other end of the sequencer. I'm a cancer genomics researcher, so I'm going to show two examples of how you can apply single-cell sequencing specifically to cancer: a brain cancer project and a multiple myeloma mouse project. I'll use those not so much for the cancer biology, but really to show some of the techniques you can use to derive cancer genome variation from single-cell data. And right at the end, I have a little clinical vignette where we used single-cell sequencing to try to understand drug response in a patient treated with a targeted therapy, and you can really see clonal selection in action. Okay, so the fundamental unit here is the single cell. A tissue is not really a bag of M&Ms, but it's kind of like that: a jumble of different cell types, sizes, colors, and labels. Where tissues are quite different is that they're much more organized than a bag of candy. But the concept is really the same. It's extraordinarily heterogeneous: some cells, like some candies, are similar, and some are not. And we now actually have the tools to start to dissect how every cell is different.
And, most recently, how cells interact spatially within a tissue; I'll try to get to that towards the end of the talk. Single-cell analysis is not new. I've talked a lot about new technology, but people have been doing single-cell genomics, in general, basically since the invention of genetics. In the upper left I've shown a karyotype of one of my own cells. That's the old-fashioned way to do single-cell work: you drop your cell on a glass slide at the right point in the cell cycle, you literally use scissors and glue, and you line up all your chromosomes side by side. Now we can do that for hundreds of thousands of cells digitally using next-generation sequencing. But the idea is the same: what is the order of the chromosome bands, and how many chromosomes do you have? We used to do it one cell at a time, and now we can do it en masse. The same goes for RNA in situ hybridization. There are still clinical tests that use a single probe; in this case, a single Epstein-Barr virus probe, and all the cancer cells lit up. Now you can use hundreds of probes, or look at the entire transcriptome, at the single-cell level, and look at them spatially as well. But the concepts discovered with single probes and the original karyotypes still totally hold. The rules of genetics haven't fundamentally changed; we just measure them in a different way, and we can measure way more cells than we've been able to in the past. The other revolution, at least in genomics, is that we're not reliant on averages. In your bulk RNA sequencing course, you took all the cells, ground them up, and got the top bar graph, which is the average of all RNA sequences in a tissue. In the bottom, you now have individual cell-level measurements from RNA-seq, but everything you learned analyzing bulk RNA-seq data is actually relevant to single cell as well.
You can take a lot of what you've done, such as differential gene expression analysis and splice isoforms; all those lessons you've learned in bulk will translate to single cells as well. I put this title in quotes because it's the understatement of 2019: the paper said there's a "wide variety" of single-cell methods. The paper is already four years old, but it's a really excellent review of the many different methods you can use to measure different compartments within the cell: you can look at the genome sequence, at cytoplasmic RNA, or at intracellular proteins. Once you've done these measurements, you can generally bin them into three broad analytic groups. Lineage: how is the cell related to its progenitor cells and daughter cells? State: for a single cell, what is it? That's almost an existential question: how do you define a cell? Is it tags, expression programs, proteins? We'll probably get into that question multiple times over the next couple of days. And trajectory: when we do an experiment, we capture a single cell at a single point in time. What could happen to it, or to cells like it, potentially in light of treatment, and what happened to the cells that gave rise to the point in time you looked at? Some of the most powerful experiments are longitudinal: for example, looking at a mouse model at multiple time points and seeing how the cellular makeup, but also the cell states, change over time. There are a lot of bioinformatics methods that operate in these three broad areas: lineage, state, and trajectory.
So I do want to highlight some of the considerations and experimental design parameters that, especially as an analyst, you really need to know, like what has happened in the lab. Ideally, this is a partnership in which the wet-lab scientist and the dry-lab scientist co-develop what is actually going to happen to the tissue or the cells before you analyze it. It's sometimes very painful to inherit a dataset without having participated in the wet-lab design, and vice versa, it's very tricky for wet-lab scientists to design experiments without knowing what's going to happen downstream in the dry lab. These two aspects of single-cell genomics essentially have to happen simultaneously. I put this title in quotes too, because it's also a bit of an understatement. The y axis is the number of cells and the x axis is time, and it really is exponential scaling. It used to be single cells handled one at a time, then cells in a plate, then microfluidic devices and robotics that let you profile hundreds of plates simultaneously. Not all, but a huge fraction of the data generated nowadays uses the 10x Genomics platform, which is a droplet-based method. Other methods flow-sort cells into individual wells. The point of the slide is to show the huge growth, which continues to ramp upward, but also the many different ways to isolate and do molecular biology on single cells. This paper does a really great job marching through all these technologies, which ultimately result in very similar data: gene counts from a single cell. That being said, there are three main steps that are going to influence the type and nature of your data as you receive it. The first is tissue digestion and cell purification. Even the very act of collecting a cell will change its state.
And if you freeze it, whether you viably freeze it or snap-freeze it, that is going to fundamentally change the transcriptional representation in all of those cells. That's okay, as long as you're collecting and sampling all your tissues using a common protocol. You don't want to mix single-nucleus data and single-cell data, or at least you want to know that it was done consciously as an experimental design consideration. You want a single protocol that treats all your samples the same way, especially if you're trying to compare them over time. The second step is library barcoding, which is sort of the secret to high-throughput sequencing of single cells. If you have a plate, say a 384-well plate with a single cell in every well, the way to sequence them all together is to give each well a different barcode. There are lots of different barcoding strategies, and you have to know which barcodes were used in your specific experiment so you can unscramble the reads after the sequencing is done. The third step is sequencer choice. Illumina sequencing is certainly the most ubiquitous platform, but there are other long-read platforms, for example, that will give you full-length transcripts, and the data look quite different from these short-read technologies; I'll show a few examples later on. So here are some of the design decisions to consider at these three steps. Do you have bulk profiling data, and is the plan to combine the single-cell data with it? Are you interested in all cells in a tissue, or do you only care about a subset? If you only care about T cells, you'll go through a specific manipulation to isolate them, such as flow cytometry or bead selection. After barcoding, are you going to do full-length transcripts, or just end reads, as is done on the 10x Genomics platform? Each of these has to be a conscious decision as you set up your experiment.
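To make the barcoding step concrete, here is a minimal Python sketch of how reads get unscrambled back to their cell or well of origin. It assumes each read has already been parsed into a hypothetical `(barcode, gene)` pair and that you have the set of barcodes used in your experiment (called `whitelist` here). Real pipelines, such as the 10x Genomics software, also correct sequencing errors in barcodes and collapse duplicate molecules using unique molecular identifiers (UMIs); this sketch skips both.

```python
from collections import defaultdict

def demultiplex(reads, whitelist):
    """Build a per-cell gene count table from barcoded reads.

    reads: iterable of (barcode, gene) pairs (hypothetical pre-parsed input).
    whitelist: set of barcodes actually used in this experiment.
    Returns ({barcode: {gene: read_count}}, number_of_unassigned_reads).
    """
    counts = defaultdict(lambda: defaultdict(int))
    unassigned = 0
    for barcode, gene in reads:
        if barcode not in whitelist:
            # Barcode not used in this experiment: the read can't be tied to a cell.
            unassigned += 1
            continue
        counts[barcode][gene] += 1
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {bc: dict(genes) for bc, genes in counts.items()}, unassigned
```

For example, `demultiplex([("AAACG", "CD3E"), ("AAACG", "CD3E"), ("TTGCA", "MS4A1"), ("GGGGG", "CD3E")], {"AAACG", "TTGCA"})` assigns two CD3E reads to one cell, one MS4A1 read to the other, and leaves one read unassigned because its barcode is not in the whitelist.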
And then there's a huge number of bioinformatics platforms for lineage, state, and trajectory. You have to think carefully about, and understand, those specific algorithms, and you'll get some experience with them this week. Think about why you're choosing one algorithm over another, or, if you're not sure, have a benchmarking plan in which two or four algorithms are combined or compared against one another. That has to be a conscious part of your analytic design strategy. This paper goes through all three steps in great detail. To go into a little more detail on those steps: there are lots of ways to isolate single cells, and this figure shows just a few of them. I mentioned plate-based methods. There's literally pipetting individual cells out of the media. There's flow cytometry: you can sort a cell population of interest, or just digest the tissue and sort all the cells into a plate. With laser capture microdissection, you can literally use a laser to cut out a single cell, pick it up, and sequence it. Microfluidics essentially encapsulates each cell in its own oil droplet as the cells march through the microfluidic device. Open single-cell chambers are essentially a field of little cell-sized pits: the cell comes along, drops in, and you do your molecular biology inside there. This last one is a little esoteric: magnetic posts coated with antibodies, so you can flow a blood sample through and the cells will stick to the posts. You can really work with biomedical engineers to get extremely creative. What you want at the end of the day are free-floating, isolated cells that aren't stuck to any others, so you can isolate each cell as a single unit. A protocol we've been using quite a bit in our cancer genomics lab is single-nucleus sequencing.
Just because you've frozen a cell doesn't mean you can't use it. One downside, especially with snap freezing, is that you will actually lyse the cytoplasm. But it turns out nuclei are quite hardy, so our solution for working with frozen tissues is a nuclei preparation: you take a frozen tissue and isolate the nuclei. But that's now a conscious design decision: you've lost all cytoplasmic transcripts, and you're going to see that in your data. You're also enriched for unspliced RNA, so you're going to have a lot of intronic reads in that data as well. That decision in the lab has fundamentally changed your data downstream, but it does open up a whole world of biobanked, fresh-frozen samples that are quite ubiquitous and relatively easy to get your hands on. Once you have the nuclei, you treat them like cells: they can go into the 10x protocol and be encapsulated in oil droplets. It has some pros and cons, which I've listed on the screen, but it fundamentally changes the sequencing reads you're going to get out at the end of the experiment. Working with fresh, never-frozen tissue versus fresh-frozen tissue is also different, because some cells are more susceptible to lysis when you freeze them. In this experiment, they took the same tissue and profiled half of it as intact cells without freezing it; from the other half, they isolated nuclei and then did single-nucleus sequencing. What this shows down here is that the cells have all been clustered into different transcriptional groups, with all the data clustered together. You can see by eye that this great big purple cluster is hardly seen at all in the nuclei, some of these more minor clusters are not necessarily seen in the cells, and you can see the difference in frequency. So there really is a cell-type-specific freezing artifact when you go down the road of fresh-frozen tissue and nuclei.
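The enrichment for unspliced RNA in nuclei is something you can check directly in your data. Here is a minimal sketch, assuming you already have per-cell counts of reads aligned to exons and to introns (the `(protocol, exonic_reads, intronic_reads)` input layout is hypothetical), that compares the mean intronic fraction between preparations:

```python
from collections import defaultdict

def intronic_fraction(exonic, intronic):
    """Fraction of a cell's (or nucleus's) reads that fall in introns."""
    total = exonic + intronic
    return intronic / total if total else 0.0

def mean_intronic_by_protocol(cells):
    """cells: iterable of (protocol, exonic_reads, intronic_reads), one per cell."""
    fractions = defaultdict(list)
    for protocol, exonic, intronic in cells:
        fractions[protocol].append(intronic_fraction(exonic, intronic))
    # Average the per-cell fractions within each protocol.
    return {p: sum(f) / len(f) for p, f in fractions.items()}
```

With made-up counts such as `[("cell", 90, 10), ("cell", 80, 20), ("nucleus", 50, 50), ("nucleus", 40, 60)]`, the nuclei come out with a much higher mean intronic fraction, which is the signature you would expect if a nuclei prep was used.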
You can appreciate that your cell fractions are going to be different, because different cell types either survive or don't survive the freezing process. If these two methods were totally equal, these frequency bars would be perfectly 50/50, and you can tell by eye they're totally not. That's really a fundamental difference between looking at intact cells, including the cytoplasm, and looking at just the nuclei. It's not that one is better than the other; it's about what kind of samples you have and what your scientific question is. You're going to hear about, and probably work with, at least one of these types of data. The two most commonly employed RNA sequencing strategies are, first, 10x Genomics end reads, which prime off either the 3' or the 5' end and sequence the ends of your transcripts. The droplet approach was originally described as Drop-seq, and it's now commercialized in the 10x Genomics Chromium device. Smart-seq2 is different: you put cells in individual wells and sequence the entire transcript. The benefit is that you get full length, but the cost is much higher and the scale is much lower: hundreds of cells, versus hundreds of thousands. It's limited by what you can manipulate into a plate in the lab, but if you're really interested in full-length transcripts, for example for mutation calling across the whole transcript, or if splice-site and splicing information is very important, Smart-seq2 is really where you want to go. If you just want to count transcripts, or look at T- and B-cell rearrangements or repertoire, the 10x approach is going to give you the number of cells you need to dissect the cellular populations, especially if you're looking at large intact tissues.
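The 50/50 expectation for the frequency bars only holds if both preparations contributed the same number of cells. Here is a minimal sketch that first rescales each protocol to equal totals and then reports each cluster's share per protocol, so a value far from 0.5 flags a cluster that one preparation over- or under-represents. The `(cluster, protocol)` per-cell input is an assumption for illustration.

```python
from collections import Counter

def protocol_share(labels):
    """For each cluster, the share contributed by each protocol after
    rescaling so that every protocol has the same total number of cells.

    labels: list of (cluster, protocol) pairs, one per cell.
    """
    totals = Counter(protocol for _, protocol in labels)
    per_cluster = Counter(labels)
    shares = {}
    for cluster in sorted({c for c, _ in labels}):
        # Fraction of each protocol's cells that landed in this cluster.
        frac = {p: per_cluster[(cluster, p)] / n for p, n in totals.items()}
        norm = sum(frac.values())
        shares[cluster] = {p: f / norm for p, f in frac.items()}
    return shares
```

If cells and nuclei were captured equally well in every cluster, each share would sit at 0.5; a cluster whose nucleus share is near zero is one that largely fails to show up in the nuclei data.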
Here's an example of what I just talked about. Smart-seq2 gives you really nice coverage across an exon, the intron spliced out, another exon, another intron spliced out, and so on. Chromium is totally different: you get very one-sided data, because you prime at one end and all your reads pile up at one end or the other of the transcript. There are differences in scale as well: you get many, many more cells for the same price using the 10x approach. Another method is especially useful for very long transcripts. The downside with short-read sequencing is that you only read 100 to 150 bases, whereas transcripts are much longer, on the order of 1,000 to several thousand bases. With short reads, you take your DNA, fragment it into these small pieces, and try to assemble them, so you get a lot of overlapping sequences. With long-read sequencing, you read across the whole fragment in one shot, so it really lets you look at some very subtle exon-usage and transcript-structure information, because a single read covers the whole thing. There is a cost difference: you get many, many more short reads using Illumina and fewer long reads, but they're long, so you can cover the whole transcript. So you have these two options.

Question from the audience: did we cover this in the bulk sequencing course? I forget why we don't just prime on the intact cDNA and read the whole thing.

It's a problem with the sequencer itself: every base you read suffers a slight decline in quality. You start out with perfect quality, then you're at 99%, then 98%, and by the time you hit 150 bases, the quality is really falling off a cliff. That's a problem with Illumina sequencing and other short-read methods specifically. Nanopore will just keep going until you reach the end, essentially.
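The per-base quality decline described above is reported as a Phred score, Q = -10 log10(P_error), so Q30 means a 1-in-1,000 chance that a base call is wrong. Here is a small sketch, assuming a standard FASTQ quality string with the common Phred+33 offset (the example string is made up), that converts qualities to error probabilities and sums them into an expected number of errors per read:

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]

def error_probability(q):
    """Phred score -> probability that the base call is wrong."""
    return 10 ** (-q / 10)

def expected_errors(quality_string, offset=33):
    """Expected number of miscalled bases across the whole read."""
    return sum(error_probability(q) for q in phred_scores(quality_string, offset))
```

For a toy read with qualities `"I5+"` (Q40, Q20, Q10), the expected error count is about 0.11, and almost all of it comes from the last, lowest-quality base, which is the same pattern you see as reads approach 150 bases.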
This is an example of what that type of data looks like side by side. If you have this gene with all these different splice isoforms, with short-read sequencing you basically have to look for missing data and the little reads that bridge between exons, whereas with long reads you get fully assembled transcripts. So that's the introduction to sequencing types and generating data. Next, I wanted to take some of your past knowledge and apply it to single cell. In the early days this was really quite surprising. I worked with a lot of immunologists, for whom many of the markers on individual cell types are extremely well worked out, and when we looked for those transcripts in our single-cell data, the expression was relatively spotty, or some markers weren't seen at all at the gene expression level. In this paper, they did a large, mostly T- and B-cell experiment and then scored each of the clusters, up here in the upper left, for individual markers. By eye, some markers look pretty good, like this cluster here that really expresses CD3, but some you can hardly see at all: you get very spotty expression. This is really a downside of single-cell sequencing, since you're reliant on that single cell. If you have a gene expressed at a very low level, you have very rare fragments, and if they don't get sampled or amplified in the reaction, you're not going to see them. So for genes expressed right at the limit of detection of single-cell methods, you'll either get what I showed up here, stochastic detection of those transcripts, or no detection at all.
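The "spotty" marker expression described above is usually summarized as the fraction of cells in each cluster with at least one count for a marker (the dot size in a typical dot plot). Here is a minimal sketch, assuming a hypothetical input of one `(cluster, {gene: count})` pair per cell:

```python
from collections import defaultdict

def detection_rate(cells, marker):
    """Fraction of cells per cluster with a nonzero count for `marker`.

    cells: iterable of (cluster, {gene: count}) pairs, one per cell.
    Genes missing from a cell's dict are treated as zero counts.
    """
    detected = defaultdict(int)
    total = defaultdict(int)
    for cluster, counts in cells:
        total[cluster] += 1
        if counts.get(marker, 0) > 0:
            detected[cluster] += 1
    return {cluster: detected[cluster] / n for cluster, n in total.items()}
```

A T-cell cluster where only half the cells show any CD3E counts is exactly the stochastic detection pattern described above: the cells almost certainly make the protein, but the transcript was only sampled in some of them.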
Where we've had to go for single cell is not to rely on these individual markers, but instead to move to gene sets that tag the individual cells, as surrogates for the individual markers, which were usually originally protein-based. There really is a difference between RNA expression and protein expression. To get around this, a protocol called CITE-seq was invented: cellular indexing of transcriptomes and epitopes by sequencing. It co-opts the droplet-based method. You have an antibody with a synthetic molecular barcode attached to it, so every single cell expressing your protein of interest will bind this antibody. The cell, coated with however many antibodies you've tagged, gets encapsulated in the oil droplet. You then burst the cell and sequence the RNA like you normally would, but all those antibody-bound tags come along for the ride. Because you know which tags you put into your experiment, you can count them, and that lets you count antibody binding events as well as do all the RNA sequencing. This is a very powerful method for integrating RNA sequencing with protein measurement, literally from the same cell. In this experiment, using the same figure I showed earlier, I've now added the CITE-seq protein data, and you can see that some of the markers that were previously quite spotty really come out clearly with the protein-based method. So you can integrate these two data types from the exact same cell and use them for cell identification.

From the audience: "I have a question." It helps to think of your gene expression levels as a distribution: you have some very highly expressed genes, most genes are in the middle, and some are expressed at a very low level. Sequencing basically grabs, or samples, from that distribution.
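Because CITE-seq gives you integer tag counts per antibody per cell, those counts need normalizing before you compare proteins across cells. One commonly used choice for antibody-tag counts is a centered log-ratio (CLR) transform; the sketch below applies it within a single cell with a pseudocount of 1. The antibody names and counts are illustrative only, and other normalizations exist.

```python
import math

def clr(counts):
    """Centered log-ratio transform of one cell's antibody-tag counts.

    counts: dict antibody -> raw tag count for a single cell.
    A pseudocount of 1 avoids log(0) for antibodies with no tags.
    """
    logs = {ab: math.log(c + 1) for ab, c in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    # Each value becomes its log-count relative to the cell's average log-count.
    return {ab: v - mean_log for ab, v in logs.items()}
```

For a made-up cell with `{"CD3": 99, "CD19": 0, "CD14": 9}`, CD3 comes out positive, CD19 negative, and CD14 near zero, and the transformed values sum to zero, which makes cells with different total tag counts easier to compare.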
You see basically all the highly expressed genes and some of the genes expressed in the middle, but if you're grabbing from this distribution, you're very unlikely to grab the genes expressed at a very, very low level. The number of sequencing reads you generate is how many samples of that population you get: if you have 50,000 reads for a single cell, you have 50,000 chances to grab a rare transcript, and sometimes that's not enough. So the limit of detection is a function of the ability to turn the original RNA into cDNA that gets sequenced, but also of the number of reads you happen to generate for a single cell. There's no perfect limit of detection; I couldn't say it's one in a million or one in 100,000, because it's quite confounded by cell type: the actual distribution of transcripts in a single cell is quite different depending on the cell type. But in general, rare transcripts are much harder to find, especially using these techniques.

Would sequencing deeper help? It would, to a point. 10x sequencing relies on poly-A priming and then an extension. If the poly-A tail on your rare transcript just didn't happen to be primed, there wasn't enough primer, or the primer just didn't find the poly-A, then you're never going to see it: you could sequence to infinity and you still wouldn't see that transcript. Any other questions? Yeah.

Question: with CITE-seq, is it possible to distinguish highly expressed protein from lowly expressed protein?

Yeah, it is quantitative, because you are counting antibody binding events. It's limited by your ability to find antibodies, so it's different from whole-transcriptome sequencing, where you sequence everything. You can find antibody panels on the order of 200 to 300, maybe 400 antibodies, that you could include. Yeah, good question. Anybody else? Right. And then, this is one of my newer slides.
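The sampling argument above can be written down. Under a deliberately simplified model (an assumption: independent draws, no amplification bias), a transcript that makes up a fraction f of a cell's library is seen at least once in N reads with probability 1 - (1 - f)^N. Capture works the same way at the molecule level: if each of a cell's m copies is independently converted to cDNA with efficiency c, at least one is captured with probability 1 - (1 - c)^m. Multiplying the two is only a rough approximation, and all parameter values below are hypothetical, but it shows why sequencing deeper helps only up to the capture limit:

```python
def p_sampled(fraction, n_reads):
    """P(at least one read) for a transcript making up `fraction` of the library."""
    return 1 - (1 - fraction) ** n_reads

def p_captured(n_molecules, capture_eff):
    """P(at least one of the cell's copies is converted to cDNA)."""
    return 1 - (1 - capture_eff) ** n_molecules

def p_detect(n_molecules, capture_eff, fraction_if_captured, n_reads):
    """Rough two-stage approximation: capture first, then sequencing.

    However large n_reads gets, the result can never exceed p_captured(...).
    """
    return p_captured(n_molecules, capture_eff) * p_sampled(fraction_if_captured, n_reads)
```

With 50,000 reads, a transcript at one part in 100,000 of the library is seen only about 39% of the time (`p_sampled(1e-5, 50_000)`). And if a cell has just two copies captured at 10% efficiency, `p_detect(2, 0.1, 1e-5, 10**8)` stays at about 0.19 no matter how many reads you add: the "sequence to infinity" ceiling.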
Everything we just talked about can now be done in relation to a cell's position relative to all other cells in a sample. Early on I talked a lot about cell isolation and purifying cells. If you aren't interested in a single cell population and just want to know what you have in your actual tissue, you can cut a thin section of your tissue, a few microns thick, and put it onto a glass slide. There are now methods to image the RNA in place on the slide, which is what the starry-night figure is showing. Each of the little dots is a cell, and the colors are different transcripts. The way this method works is you use a panel of 200 to 400 RNA probes; it's just like the original virus example I showed, except with 400 probes. Now you can analyze not just the expression of transcripts in a single cell, but you also get a matrix of all cells by all cells and how far apart they are, so it's almost like another piece of metadata. This picture just turns into a huge matrix of how far each cell is from every other cell, and you can get quite creative: I'm interested in this cell state or this cell trajectory, and I want to know how it changes as I look across a piece of tissue. So spatial mapping looks like this, but in bioinformatics it's really just a huge square table of distances. Don't let these types of figures worry you: it's all about the distance between two cells, which is just another piece of metadata that lets you draw relationships between different cells. That's what this local neighborhood idea is. If I want to know how far my T cell is from a cancer cell, I can now really do that: you go through a round of labeling to identify your T cells and your cancer cells, then a round of distance analysis, and now you know what your cells are and what their distances are.
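The "huge square table of distances" can be sketched directly. The minimal example below assumes each cell already has an `(x, y)` position (for example in microns) and a cell-type label from a prior labeling round; the coordinates and labels are made up. It builds the all-by-all distance matrix, finds each T cell's distance to the nearest cancer cell, and lists a cell's local neighborhood within a radius. Computing the full matrix scales with the square of the number of cells, so at the scale of hundreds of thousands of cells real tools typically use spatial indices such as k-d trees instead.

```python
import math

def distance_matrix(coords):
    """All-cells-by-all-cells Euclidean distances from (x, y) positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def nearest_of_type(coords, labels, source, target):
    """For each cell labelled `source`, distance to the nearest `target` cell."""
    dist = distance_matrix(coords)
    nearest = {}
    for i, label in enumerate(labels):
        if label != source:
            continue
        nearest[i] = min(
            (dist[i][j] for j, other in enumerate(labels) if other == target and j != i),
            default=math.inf,  # no target cells at all in the sample
        )
    return nearest

def neighbors_within(coords, labels, index, radius):
    """Labels of the cells within `radius` of cell `index` (its local neighborhood)."""
    dist = distance_matrix(coords)
    return [labels[j] for j in range(len(coords)) if j != index and dist[index][j] <= radius]
```

For instance, with T cells at `(0, 0)` and `(0, 1)` and cancer cells at `(3, 4)` and `(10, 0)`, the first T cell's nearest cancer cell is 5 units away, and its neighborhood within radius 5 contains one cancer cell and the other T cell.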
And a plug for the Princess Margaret Genomics Centre: we just bought one of these, so if you want to do single-cell spatial transcriptomics, we offer it as a service. I'm not going to belabor this; suffice it to say there's a huge number of bioinformatics tools. At a single-cell atlas meeting, I heard someone say there are more tools than users. You really need to read these papers carefully and work with your colleagues to figure out the right tool for the right job. This figure is already dated, from 2015, and there's since been an explosion of tools for lineage, state, and trajectory. You want a good reality check from colleagues in the field about what has worked for them on a question similar or identical to yours. Okay, I'm going to try to stay right on time. Yeah, Gary?

Gary: So, say, 15 tools kind of solve the same problem, and probably almost all of them do the same thing. There are still good new ideas out there, but a lot of them are variations on the same ideas.

So, I said I'm from a cancer background, so I'm going to put all the background I just talked about into the context of two cancer projects we worked on in the lab. The idea is the complex heterogeneity that is a tumor. A tumor is dynamic: cells come and go, there are immune responses, there are different cancer subclones, and there's a mixture of cell types, cancer cells and immune cells. And they change over time. Single-cell sequencing has totally transformed our approach to understanding cancer biology landscapes at a single time point, but also in the context of treatment: how cancers develop from the earliest stages, and how they change once they're treated.
So the key question for single cell in cancer is: how do cancer cells and immune cells change over time? This is a theoretical patient's journey, with tumor burden as a surrogate for size or tumor stage. First, the earliest cancers: I'm going to talk about single-cell RNA sequencing of a mouse model to understand the earliest stages of cancer. This is a mouse that is going to get myeloma within 70 weeks, so you have the opportunity to look at how that evolution happens. Then I'm going to talk about single-cell RNA sequencing of patients on a clinical trial. Once you're treated, what is the effect of the treatment, why do you relapse, and what next treatment might be effective based on a single-cell profile? I'll talk about the myeloma mouse model first: really starting to watch immune cells, specifically, evolve at the single-cell level as cancer develops over time. There are two papers here you can read if you want to get into the details. A little bit on multiple myeloma: it's a cancer of B cells in the bone marrow. It begins as a benign condition, which is actually quite common, especially as you age, called monoclonal gammopathy of undetermined significance (MGUS). It then transforms into what's called smoldering myeloma, which is sort of an early malignant state, and then you get a full-blown diagnosis of multiple myeloma. The goal in myeloma treatment is to debulk the cancer and get it down to minimal residual disease. Myeloma is actually not curable, so the current standard of care is monitoring minimal residual disease, usually through bone marrow aspirates, which give you access to cells from the bone marrow. If there is a relapse, you want to understand why the patient relapsed and what next treatment we could use for the now-relapsed myeloma. And we have a mouse that does exactly this.
It's called the Vk*MYC mouse. The MYC oncogene sits in the immunoglobulin kappa locus with a stop codon, and essentially you wait for those B cells to mature until the stop codon reverts, and then you have overactivation of the MYC oncogene. What happens over 70 weeks is that you start with no-disease mice. At some point the MYC gene gets turned on, so you get AMG, which is the mouse equivalent of MGUS, and then you get multiple myeloma. You can take their femurs, get at the bone marrow, and look at this specific protein, a protein marker that you can see build up over time, so you know when to take samples for single-cell analysis. We had 12 mice captured at these different time points, we did all of this on the 10x Genomics platform, and we integrated all 90,000 cells. We didn't do any cell selection, because we didn't know which cells were going to be important. We knew there were going to be cancer cells, and we knew there were going to be other cells in the bone marrow, so our laboratory strategy was comprehensive: sequence every cell and then see how they change over time. Step one was integration, so we could compare all the data across the different time points. You can see all the many different cell types we saw; some are the same cell type, but their states or trajectories change over time. The beauty of single-cell data is that you don't have to eat the whole cake at once: you can look at the specific cells you're really interested in for a specific question. We wanted to know about plasma cells at the very beginning, that is, how normal plasma cells become cancer cells, and we knew from a gene set analysis which of these clusters were from the B-cell lineage.
It was all of these clusters, so we informatically pulled out those individual cells and reclustered them, looking for more specific, higher-resolution clusters within this plasma cell population. This little plot here shows only the cells associated with the B-cell lineage. If you color them by mouse, there's some mixing of mice, and there are some clusters unique to one mouse: this huge purple population is all coming from one mouse, and some of these are admixed. If you sort all of these clusters by time point, you can see that the normal plasma cell clusters are shared by basically everybody, so even in late myeloma you still have a few normal plasma cells. There are these curious cells that are not the original B cells, but not cancer B cells either; they're only found in the no-disease mice, so we tagged these as early disease. There's an AMG cluster: these are still B cells, but only at the AMG time point. And when you get to myeloma, every single myeloma is different. Essentially you see a broadening and differentiation over time: you start with a healthy B cell that every mouse has, and as you march down this journey to cancer, the mice completely diverge, and none of the myelomas look alike, even though all these mice are genetically identical. They all have completely different transcriptional programs for their full-blown disease, which reflects what we see in patients: myeloma is extremely heterogeneous. Everyone starts out with these normal cells, which eventually become totally different from each other. So that was our lesson from looking at the cancer cells: we saw this trajectory, this movement of the B cells.
Next we were interested in the T cells. We wanted to understand why these myeloma cells get a foothold and why they even persist: why isn't the mouse's immune system clearing these cancers out? So we pulled out specifically the T cell population. One of the roles of T cells is cancer surveillance; they're supposed to be going around killing malignant cells. We scored all these T cells for multiple T cell exhaustion signatures, which we took from the literature; each one is a list of, I forget, something like 20 to 30 genes. Control mice and no-disease mice essentially have normal, non-exhausted T cells, but as you go through MG and full-blown myeloma you are losing T cell function: the exhaustion score is trending up. We then validated these cells using markers for exhausted T cells by flow cytometry, and we saw the same thing. So these Vk*MYC mice are getting increasingly exhausted T cells over time, and those T cells are not doing the cancer surveillance they should be, which allows the myeloma to fully develop. So the question was: we have this model, we know myeloma is developing, and we know T cell exhaustion is a feature of these cancerous bone marrows. Can we combat exhaustion? In this case we had anti-LAG-3 and anti-PD-1 antibodies, therapies that reinvigorate T cells, and they showed the ability to delay the onset of myeloma. They didn't turn these T cells on forever, but they essentially pushed them into a functional state that delayed the onset of myeloma. You could see that by looking at the M protein, and also by looking at mouse survival over time.
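Scoring cells for a published gene signature, as was done here for T cell exhaustion, can be illustrated with a simple approach: z-score each signature gene across cells, then average the signature genes for each cell. Treat this as a simplified stand-in, not the method used in the study. Published tools such as Seurat's `AddModuleScore` or scanpy's `score_genes` instead subtract the score of randomly drawn control genes. The gene names in the example are illustrative exhaustion markers, not the 20 to 30 gene lists mentioned in the talk.

```python
import numpy as np


def signature_score(expr, genes, signature):
    """Per-cell score for a gene signature (simplified z-score average).

    expr: cells x genes array of log-normalized expression.
    genes: gene names, one per column of expr.
    signature: gene names in the signature; genes missing from the data
    are skipped.
    """
    expr = np.asarray(expr, dtype=float)
    column = {g: i for i, g in enumerate(genes)}
    idx = [column[g] for g in signature if g in column]
    if not idx:
        raise ValueError("none of the signature genes are in the data")
    sub = expr[:, idx]
    sd = sub.std(axis=0)
    sd[sd == 0] = 1.0  # constant genes contribute 0 instead of dividing by 0
    z = (sub - sub.mean(axis=0)) / sd
    return z.mean(axis=1)


if __name__ == "__main__":
    # Illustrative markers only (mouse genes for PD-1, TIM-3 and TOX)
    genes = ["Pdcd1", "Havcr2", "Tox", "Cd3e"]
    expr = [
        [0.0, 0.0, 0.0, 5.0],  # T cell with low exhaustion-marker expression
        [2.0, 2.0, 2.0, 5.0],  # T cell with high exhaustion-marker expression
    ]
    print(signature_score(expr, genes, ["Pdcd1", "Havcr2", "Tox"]))
    # first cell scores -1.0, second cell 1.0
```

Because every score is relative to the cells passed in, you would score all time points together so that control and myeloma T cells share one scale.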
The goals of our single cell experiment were, one, to understand cancer evolution and, two, to understand T cell responses. Then there is that last step, which you plan right at the beginning of the experiment: what is the validation of your biological observation going to be? In our case it was mouse survival, and we were able to show that. The next project is one I worked on with Gary's group, looking at glioblastoma stem cells and at the subclonal cancer populations that exist in brain tumors. It's also published, and if you really want to get into the details, it has 40 pages of supplemental methods; it's like a little textbook on how to do single cell work. The idea here was all based around glioblastoma stem cells: you have these cancer stem cells that inhabit the bulk tumor. Chemotherapy clears out the bulk tumor, but the cancer stem cells are resilient to it, either because they're not proliferating or because they're just not being targeted by chemotherapy, and they're the ones that give rise to relapse of the bulk tumor. The goal of this entire project was to come up with therapies directed against cancer stem cells, to pick them out specifically; then the bulk would just die, because the glioblastoma would lose its ability to repopulate the bulk. That was the hypothesis going in, and that's why we were so interested in single cell: bulk sequencing grinds up all the signal, and you can't see the cancer stem cell signal. In this case we worked with two brain tumor experts, Peter Dirks and Sam Weiss, who have the ability to take a patient's tumor and digest it into free-floating single cells. Each lab has a different way to do this: one uses flow cytometry to put cells into wells, and then you grow up your culture in a single well.
Peter's group, on the other hand, grew cells up in a plate and used the media to select for brain tumor stem cells. At the end of the day you get expanded, enriched brain tumor stem cell cultures that we could then use for single cell analysis. This is very different from the myeloma experiment, where we were grinding up and sequencing every single cell. It's the exact opposite: we knew the cell population we wanted to go for, so we went through all this trouble to isolate the cancer stem cells specifically. The first question was: are these all genetically and transcriptionally the same? Are our brain tumor stem cells totally homogeneous, or totally heterogeneous? We had 29 patient-derived stem cell cultures and wanted to use single cell sequencing to answer this question, and the answer was both. Some of the cultures, perhaps not surprisingly, are clonal. I've sorted all these cultures by the number of subclones, so the ones up here have what appeared to be two subclones. It turned out that the first subclone was always a cycling population of cells and the second was in a different cell cycle phase. So while you see brown and green, they're actually the same cell identity at different positions in the cell cycle; even though the cultures are supposed to be synchronized, we always had a little non-proliferating population and a proliferating population. That explained the first row. In the second row you can see three or four subclones, and in this really complex IDH-mutant brain tumor you see a huge number of different subclones. So you see this distribution, and it really gave us insight into how different these cultures were, which may explain the differences we saw in some of the drug screening experiments being done in these cell lines.
One of the questions we had for all these subclones was whether they are driven by genetics: are there additional chromosomes or additional mutations that explain the subclonal structure? We took three of these samples, in which we had tumor-infiltrating immune cells, cancer cells, and the glioblastoma line. Here we used a trick: instead of doing pathway analysis with biological pathways, we did pathway analysis with chromosomes. For example, we took all the genes on chromosome one and did a differential gene expression comparison against all the other chromosomes. What we found was gain of chromosome seven: the two cancer populations showed very high upregulation of all genes on chromosome seven. We knew these two lines had a copy number gain, extra copies of that specific chromosome, and you don't have that in the normal diploid T cells. It also works for deletions: we inferred downregulation of a pseudo-pathway made up of all the genes on chromosome 10, which all showed low expression because one of the copies of chromosome 10 was just missing. So you can use a tool that already exists, throw out the biology, look only at the physical position on the chromosome, and infer copy number changes across the entire genome. Taking that lesson, you can apply it to an entire glioblastoma stem cell line. Here's one line, where we saw four subclones. The way to read this is that each tiny, one-pixel-thick line is a different single cell, and each row is a different chromosome. Every single cell had gain of chromosome seven, but only a subset of the cells had loss of 10. Every single gene on chromosome 10 showing low expression in that subset of cells is extremely unlikely to be due to gene regulation or pathway regulation; it's much more likely that the DNA just wasn't there, and that's why those genes aren't being expressed.
You can go down even further: this little pink cluster is characterized by gain of chromosome 20, and you can really go down the rabbit hole and pick out sub-sub-subclones until you run out of cells. You can build really beautiful phylogenetic trees of all the relationships purely from copy number data derived from RNA; even though copy number is a DNA feature, you can infer it from RNA expression. Just like in the myeloma mice, when you try to cluster all the glioblastoma stem cells at the global level, cells from different patients never cluster together, so there's something fundamental at both the DNA level and the transcriptome level. This was a bit of an issue. In general, lines from the same tumor are more similar to each other than they are to lines from other patients, so you don't get the blending you saw in the mice for the T cells, B cells, and other non-malignant cells. This phenomenon has been reported across virtually all cancer types: each patient's tumor really is distinct. So the question we wanted to answer was whether there is any common biology. Even if every single gene isn't beautifully correlated across patients, is there a therapeutic vulnerability you can exploit? So we looked at developmental gradients: we know the tumors are different, but, just like in a trajectory, is there a gradient of biology that explains the variation across all glioblastoma stem cells? Two signatures came to our attention, a developmental signature, shown here in red, and an injury response signature.
If you're high in development, you're really low for injury, and if you're high for injury, you're low for development. So the model we extracted from the single cell data is basically an axis, and where your cell went wrong and came off that gradient defines the biology of that tumor. In general, all the cells from a given glioblastoma stem cell culture are either high in development and low in injury, or high in injury and low in development, though there are still some in the middle where the two gradients cross. This is a better visualization, where you can see that we're not trying to put hard thresholds on biology. That was the other big finding of this paper: moving away from over-compartmentalizing cancers into subtypes, and instead tolerating some continuity of scoring across all your different cancer models. That's really what this figure shows: it's not development versus injury, it's a continuous model. We then took what we learned from the purified glioblastoma stem cells and jumped back to the primary tumor, because we still wanted to see whether we could find the signal of these little cancer stem cells in the overall bulk. And we did. The developmental and injury response gradients are super clean in the cell lines, where you get two really nice peaks, the developmental ones and the injury ones. If you look at bulk tumor cells, you can still see that signal, but it's masked by all the bulk tumor cells that don't necessarily follow this gradient. The stem cells are on the gradient, but the bulk tumor cells are not; the stem cells are basically fountaining out these abnormal cells that aren't necessarily on the gradient where the stem cells were. Some of these tumors are more pure than others, but you can definitely see the signal in there.
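The chromosome-as-pathway trick described earlier in this section can be approximated even more crudely. For each cell, you average the expression of every gene on a chromosome relative to cells assumed to be diploid. The sketch below does only that simpler averaging. It is not the pathway-enrichment analysis the talk describes, and it is not a windowed method such as inferCNV. It assumes you supply log-normalized expression, a gene-to-chromosome mapping, and a set of reference cells (for example the T cells).

```python
import numpy as np


def chromosome_scores(expr, gene_chrom, reference_mask):
    """Average expression of each chromosome's genes, relative to reference cells.

    expr: cells x genes array of log-normalized expression.
    gene_chrom: chromosome label for each gene (column of expr).
    reference_mask: True for cells assumed to be diploid, such as
    non-malignant T cells.
    Returns (chromosomes, scores), with scores shaped cells x chromosomes.
    Positive values hint at a gain and negative values at a loss.
    """
    expr = np.asarray(expr, dtype=float)
    gene_chrom = np.asarray(gene_chrom)
    reference_mask = np.asarray(reference_mask, dtype=bool)
    # Centre every gene on its mean expression in the reference cells
    relative = expr - expr[reference_mask].mean(axis=0)
    chromosomes = list(dict.fromkeys(gene_chrom.tolist()))  # first-seen order
    scores = np.column_stack(
        [relative[:, gene_chrom == c].mean(axis=1) for c in chromosomes]
    )
    return chromosomes, scores


if __name__ == "__main__":
    gene_chrom = ["7", "7", "10", "10"]
    expr = [
        [1.0, 1.0, 1.0, 1.0],  # reference T cell
        [1.0, 1.0, 1.0, 1.0],  # reference T cell
        [2.0, 2.0, 0.0, 0.0],  # tumor cell: chr7 up, chr10 down
    ]
    chroms, scores = chromosome_scores(expr, gene_chrom, [True, True, False])
    print(chroms, scores[2])  # tumor cell: chr7 +1.0, chr10 -1.0
```

Averaging over a whole chromosome is what makes the signal robust. A single gene can be up- or downregulated for many reasons, but hundreds of genes shifting together is hard to explain without a change in DNA copy number.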
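Scoring cells continuously along the developmental-to-injury axis, rather than assigning hard subtypes, can be sketched as a difference of two signature scores. This is an assumption-laden illustration, not the scoring used in the published study. It takes per-cell developmental and injury scores as given (for instance from a signature-scoring step) and uses their difference as the position on the axis. A histogram of that position is one way to look for the two peaks without drawing thresholds.

```python
import numpy as np


def gradient_position(dev_score, injury_score):
    """Continuous position of each cell on a developmental-injury axis.

    dev_score, injury_score: per-cell scores for the two signatures.
    Returns injury minus developmental score: negative values lie toward
    the developmental end, positive values toward the injury-response end,
    and values near zero fall in between. No hard subtype threshold is applied.
    """
    dev = np.asarray(dev_score, dtype=float)
    injury = np.asarray(injury_score, dtype=float)
    return injury - dev


def gradient_histogram(position, bins=20):
    """Cell counts along the axis, e.g. to look for two peaks."""
    counts, edges = np.histogram(position, bins=bins)
    return counts, edges


if __name__ == "__main__":
    # Hypothetical scores for five cells
    dev = [2.0, 1.8, 0.1, 0.0, 1.0]
    injury = [0.0, 0.2, 1.9, 2.0, 1.0]
    pos = gradient_position(dev, injury)
    print(pos)
    counts, edges = gradient_histogram(pos, bins=4)
    print(counts, edges)
```

Keeping the position as a number, rather than a label, is the design choice the talk argues for: cells in the middle of the gradient stay visible instead of being forced into one subtype.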
But it's not something you would necessarily have seen using bulk sequencing, and even with unselected single cell sequencing this biology is not necessarily apparent. This is the final model. Gary actually named it the peanut wizard hat, because we used to have it oriented this way, with the hat pointing up. The peanut is the developmental-injury response gradient, and these stem cells are fountaining out mature astrocytic tumor cells, so essentially they're pumping out the bulk tumor. This is the problem: if you combat only the bulk tumor, the black part, you still retain the stem cells. The research now is around what we do about these stem cells: how do you pick out cells that are at either end or in the middle of this gradient? So this single cell project took us into a whole new area of research. We also validated the findings using CRISPR screens, in work with Stéphane Angers. We told them we'd found this gradient, and they have this massive CRISPR library. Essentially, all the glioblastoma cell lines at the injury response end of the gradient had CRISPR vulnerabilities in injury response genes, and it was the same story at the other end of the gradient with the developmental genes. So I really encourage you to think not just about your UMAP, your "Disney plot," but about that one extra step: what is your validation experiment going to look like? Are you going to go back to the bulk tumor, do a CRISPR experiment, or use a mouse model? Plan that ahead. I know I'm running out of time, so I'll skip to my last anecdote, which is this patient. This is an individual with multiple myeloma; just like we did with the mice, we got a bone marrow aspirate.
We did single cell sequencing on the bone marrow aspirate, isolated the T cells, and were able to pick out a definite enrichment for exhausted T cells in their population. We also zoomed in specifically on the cancer cells; this big population here is the multiple myeloma cells. In this case we had two samples, one before treatment and one after treatment. The patient was on an FGFR3 inhibitor, which actually worked very badly, and that drug has since been taken off the market because all the patients were progressing. So we wanted to understand why. Here are four clusters: the top two were seen only pretreatment and the bottom two only post-treatment. When we zoomed in on the individual sequencing reads, all the cells in cluster two (I've shown four of them here) had an in-frame activating deletion within FGFR3, and that's exactly the cluster that fully melted away. Then there's this large cluster with deletion of 17p, a marker of very aggressive multiple myeloma: a subclone that had very, very few cells pretreatment essentially filled the niche. So it turned out this drug is amazingly specific and highly effective against the subclonal population with the FGFR3 deletion, but totally ineffective against this resistant population. It's actually an excellent drug, but patient selection is the problem: in this case they were choosing patients with highly heterogeneous tumors. It worked perfectly in exactly the cells that were only visible with single cell sequencing, but it's all the other cells without this mutation that explained the relapse. So, to revisit the learning objectives we had at the start: there is a huge number of single cell sequencing technologies, and you really want to tailor them to your specific scientific question.
You can measure the same biology in multiple different ways: you can measure it at the protein level with CITE-seq, you can do a CRISPR screen experiment, or you can have a mouse model and compare it to humans. These are all hammers looking for nails, so you really need to tailor these experimental techniques to your specific question. You can assay different cellular compartments: if you're really interested in nuclei, you can isolate just the nuclei; if you want cell surface proteins, you can use CITE-seq. Think about whether you need cell selection, for example if you're only interested in T cells, or whether you want agnostic sequencing like we did in the myeloma mice. Then be critical of your own data. For data quality, if you've got an outlier, is it technical or is it really exciting biology? Be critical of your integrations: your T cells should generally integrate together, but it's very rare that cancer cells integrate beautifully together. And use orthogonal techniques. Use single cell sequencing for discovery, perhaps, but think ahead to what the validation experiment needs to look like. Is it a clinical outcome? Is it a functional screen? Think all the way through, from the tissue to your final analysis. I'll leave it there. I've listed a huge number of papers; they're all in the slide deck. I'll also mention a colleague of Gary's and mine, Shamini, who runs a single cell working group that's really focused on spatial transcriptomics. If you'd like to join that group beyond this course, the Slack channel link is there; click it and it'll add you to the working group. My email is here too.