This is the very first lecture on single-cell transcriptomics, so in this lecture I will just give you a brief overview of what single-cell transcriptomics is all about. We will focus on single-cell transcriptomics only. Of course there are also other applications at the single-cell level, for example single-cell ATAC-seq. We have to make some choices during this course, so we will not be treating single-cell ATAC-seq, but a lot of the concepts that we teach during this course can also be applied to other methods at the single-cell level.

I am sure many of you have seen the comparison of bulk RNA-seq to a smoothie and single-cell RNA-seq to individual fruits. I am a bit lazy, so I will use that comparison here as well, because it is a pretty good one. Bulk RNA-seq would be the smoothie: you take a certain tissue, you grind it and extract RNA, so everything from the tissue is together in one big smoothie of RNA. You have no idea where that RNA actually came from or what the individual cell types were. What most of you probably know very well, probably better than me, is that the cell diversity inside a tissue can be huge. All of these cells can have different functions, and some genes are much more relevant in some cell types than in others. So if you see a difference in expression with bulk RNA-seq, of course that is real, that is really going on in the tissue, but we do not have a good idea where it is coming from. With single-cell RNA-seq, we can know which genes were expressed in the individual cells, so in the individual fruits, and we can investigate at the level of the cell type, at the level of the fruit let's say, instead of having everything together in this one big smoothie.
That means that you can usually answer biological questions in a much more specific way.

So why single-cell RNA-seq? Well, I hope I already explained it a little bit with the comparison between the fruit and the smoothie. Cells are of course essential for biology: they are the basic structural and functional unit of life, and they can have very diverse functions. Because of that, for many applications it is very interesting to do research at the cell level and not at the tissue level, as we are used to with bulk RNA-seq.

What you can do with single-cell RNA-seq is, first of all, annotate cells. Meaning: I have a certain expression profile, I know it came from one particular cell, and based on the expression pattern I can already tell what kind of cell type it probably was. That is already very valuable, I would say, because then you have an idea of what kinds of cells you have in your tissue. In addition to that, you can also compare treatments within a cell type. So you know what kinds of cells you have in a certain tissue, and then you apply a certain treatment, for example you give a certain drug, or you compare certain tissues, and then you can also say something about the state of the cell, or about differential gene expression between and within cell types. So those are two very important things you can do with single-cell transcriptomics. What I am saying now is very abstract, but these find their applications in very different ways, depending of course on the research question you have.

So how would you usually generate single-cell RNA-seq data? This is again a very broad overview of a typical workflow. Let's say you have a solid tissue, so not a liquid tissue like blood, but let's say a tumor. Then you first have to dissociate the tissue into individual cells, obviously.
And then we need to isolate the single cells in order to sequence all the transcripts that are in a single cell, and that can be done in many different ways. We will talk about the most frequently used methods for single-cell isolation later on. Then there is the capture of RNA. You have a lot of different molecules in a single cell, and because we are looking at transcriptomics we are interested in the RNA, so we have to get it out somehow. Very often that is done by making use of polyadenylation, so the poly(A) tail that messenger RNA typically has. Then there is the reverse transcription, so we make it into DNA, so cDNA, which we can sequence. After the reverse transcription there is the library preparation, in order to be able to sequence the cDNA, so we add some adapters there. Then the actual sequencing takes place, and by doing the sequencing you usually generate FASTQ files, and we use these FASTQ files for our further downstream analysis. So then we have a quality control, at least at the read level, and then we go on with the downstream analysis.

Most of this course, especially the exercises, will be focused on this downstream analysis. We will say quite a bit about the first part, so from the dissociation up to the sequencing, in the lectures today, but on days two and three, and also in the afternoon of day one, we will focus on the quality control and the downstream analysis.

So there are many technologies to create single-cell transcriptomics data.
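To make the read-level quality control step of the workflow a bit more concrete, here is a minimal Python sketch. It assumes plain four-line FASTQ records with Phred+33 quality encoding; the function names, the toy reads and the threshold of 20 are hypothetical choices for illustration, not part of any tool used in the course.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality_string) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().strip()
        if not header:
            return  # end of file
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line, ignored here
        quality = handle.readline().strip()
        yield header[1:], sequence, quality  # drop the leading '@'

def mean_phred(quality, offset=33):
    """Mean base quality of one read, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)

# Two toy reads: 'I' encodes Phred 40, '!' encodes Phred 0.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n")
records = list(read_fastq(example))

# Keep reads whose mean quality reaches a (hypothetical) threshold of 20.
passing = [name for name, _, qual in records if mean_phred(qual) >= 20]
print(passing)
```

In practice this kind of check is done by dedicated quality control software; the sketch only shows what "quality control at the read level" operates on.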
It started relatively recently, I would say, in 2009, when people were able to sequence the RNA of individual cells. Back then the library preparation really took place at the individual cell level, with a single cell in a single well. Over time mainly the throughput, so the number of cells you can sequence in a single study, strongly increased, and a lot of different research groups and companies tried to create methods that would enable you to do that.

There are four main methods that are used nowadays to generate single-cell transcriptomics data. I think the most basic one, and also a very frequently used one, is where you separate your individual cells with a FACS machine, so that you have individual cells in individual wells, and then you do a library preparation on each individual cell in its well. Typically the method that is used to do that is called Smart-seq. Then we have droplet-based methods, and a very famous example of that is the 10x platform, where you have your cells in individual droplets and that is how you separate the cells from each other. Then we have combinatorial indexing, and that is where you basically add barcodes to your cells, shuffle them, add new barcodes, and with that you have some kind of combinatorial index and you can figure out which read came from which cell. More about that later in other slides. And there are also microwell-based methods, where you have a microwell plate, so a plate with very tiny wells that can basically fit a bead and a cell together, and in that way you individualize the cells, or you separate the individual cells from each other. I will introduce those four frequently used technologies in the coming four slides.

First of all there is Smart-seq. With Smart-seq you use FACS to get a single cell in a single well.
You do the library preparation for a cell in an individual well, and that has quite a few advantages, because for example you can sequence the entire gene. It is very typical for single-cell transcriptomics that you only sequence a small part of the gene in order to estimate expression, but with Smart-seq you can sequence the entire gene, as you might be used to from bulk RNA-seq, where you also typically sequence the entire gene.

Then there is 10x Genomics. More about that in a completely separate presentation later on, right after this presentation. So what do you do with 10x Genomics? You isolate your single cell in a GEM, a gel bead in emulsion. So you have an oil, and in that oil you have an emulsion with your cell in there, and the cell is combined with an individual gel bead. Because a gel bead has specific properties, you know which read came from which cell. That is a bit of a broad explanation of what it actually does; more about it in the presentation after this one. So with 10x, unlike with Smart-seq, you sequence your gene only from the 3' end. Only the 3' end of your gene, or of your transcript, gets sequenced, meaning that you do not have any idea about other parts of the gene. So you cannot do, for example, differential isoform expression; you only measure gene expression with this frequently used 10x platform.
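The idea that the gel bead tells you which read came from which cell can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each read is represented as a hypothetical (bead barcode, gene) pair, as if the barcode had already been extracted and the read already assigned to a gene, and the barcodes and gene names are made up.

```python
from collections import Counter, defaultdict

# Hypothetical (bead_barcode, gene) pairs, one per sequenced read.
reads = [
    ("AAAC", "CD3E"), ("AAAC", "CD3E"), ("AAAC", "ACTB"),
    ("GGTT", "ACTB"), ("GGTT", "MS4A1"),
]

def count_by_cell(reads):
    """Group reads by bead barcode and count reads per gene within each barcode."""
    counts = defaultdict(Counter)
    for barcode, gene in reads:
        counts[barcode][gene] += 1
    return counts

counts = count_by_cell(reads)
for barcode, gene_counts in counts.items():
    print(barcode, dict(gene_counts))
```

Every barcode then stands for one bead, and so ideally for one cell, and the per-barcode gene counts are that cell's expression profile.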
So just to explain a little bit more about this gel bead. In a solution we have gel beads with oligos attached to them, and those oligos contain an adapter, which is needed for the sequencing. Then there is the barcode that is unique for the bead: let's say this bead has the light green barcode, while this bead has the pink barcode, this one the red one, and so on, so each bead has an individual barcode. Then there is a UMI, more about that later on. And then there is a poly(dT) tail, and that is of course used to capture the messenger RNA that is polyadenylated, so that has a poly(A) tail.

So what you do is move these beads and cells through some kind of cell sorter. Because you have fewer cells than beads, there is, just by chance, a big chance that a single bead ends up with a single cell at some point in this oil. Because of that you have a single cell associated with a single bead, and because you have a unique barcode for each bead, you know that everything you capture on that bead comes from an individual cell. Because you are going to sequence these barcodes later, you know which reads all came from the same cell.

If you then look at what you are actually sequencing with the 10x 3' platform compared to, for example, Smart-seq: with Smart-seq you often sequence the entire gene. So if you have the gene body over here, let's say you take all genes together and look at which part of all the genes is sequenced. For 10x, with the 3' method, you mainly sequence the 3' end of a gene, while for Smart-seq you sequence the entire gene, so you have much better coverage over the entire gene. If that is important to you, then it is relevant to take it into consideration, of course.
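The "just by chance" pairing of one cell with one bead can be illustrated with a small calculation. The sketch below assumes that the number of cells ending up in a droplet follows a Poisson distribution; that model and the loading rates are assumptions for illustration and are not taken from the lecture or from any vendor documentation.

```python
import math

def droplet_occupancy(mean_cells_per_droplet):
    """Return (P(no cell), P(exactly one cell), P(two or more cells)) under a Poisson model."""
    lam = mean_cells_per_droplet
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_multi = 1.0 - p_empty - p_single
    return p_empty, p_single, p_multi

def doublet_fraction(mean_cells_per_droplet):
    """Fraction of cell-containing droplets that hold more than one cell."""
    p_empty, _, p_multi = droplet_occupancy(mean_cells_per_droplet)
    return p_multi / (1.0 - p_empty)

# Hypothetical loading rates: sparse loading versus heavier loading.
for lam in (0.05, 0.1, 0.5):
    print(lam, round(doublet_fraction(lam), 3))
```

Under this assumption, loading few cells relative to the number of droplets keeps most occupied droplets at a single cell, and loading more cells raises the share of droplets with two or more cells.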
Usually with the 10x platform you have a much higher throughput in terms of number of cells, so you can sequence way more cells compared to Smart-seq. With Smart-seq you sequence the individual cells in individual wells, and at some point these plates become too big, so you can only go up to what you can handle in, let's say, 384-well plates. Typically people do not use more than four 384-well plates in a single experiment. Therefore you often sequence way more cells with the 10x platform. But with Smart-seq you do not only sequence the entire gene, you also usually detect more genes per cell, just because you can generate libraries more specifically for a single cell, you can optimize it more, and therefore you have a higher library complexity per cell. So you see more genes, and therefore you have a much better representation of the whole transcriptome of your single cell.

So if you compare the two, Smart-seq and 10x Genomics: both of them are based on sequencing messenger RNA, so polyadenylated RNA. With 10x Genomics there is a strong bias towards the 3' end; with Smart-seq you can sequence the entire gene. 10x Genomics you can only use for expression analysis, maybe differential polyadenylation if you would be interested in that, but that is a very specific research topic, I would say. With Smart-seq you can measure expression but also do isoform analysis, if you would be interested in that. You can measure a lower number of transcripts per cell with 10x Genomics: at some point the beads that are used for 10x Genomics just cannot capture more RNA than is already on there. With Smart-seq you can have a much higher library complexity per cell, and therefore you actually sequence more transcripts per cell. For 10x Genomics you use this cell sorter, which is quite an investment; for Smart-seq you only need a FACS machine, which in many labs is already
available. For droplet-based or 10x-based methods you go up to 100k cells per run; for Smart-seq you usually do not go up to more than a thousand cells per run. A bit of a downside of 10x Genomics can be that you have to run all your cells at once through this cell sorter, which means that a single sample is a single library is a single run. There are ways to deal with that, but by default you have one sample as one library, which means that the sample is always confounded with the library. For Smart-seq you can use more complex experimental designs, because one cell is one library, so you can for example have a block design with that.

With isolation by droplets you can have quite a few doublets in there, meaning that a single bead is associated not with one cell but, just by chance, with two cells. In that case you of course do not estimate the expression profile of one cell but of two cells at the same time. That can be a bit of an issue, for example if you load too many cells on there, then you get a lot of doublets. With FACS you can have some bias towards cell size: apparently it is easier to sort bigger cells with FACS than smaller cells, so you might actually have a lower representation of smaller cells in there. Per cell, of course, 10x Genomics is way cheaper than Smart-seq, because the throughput is much higher. So it very much depends on your application whether you want to use 10x Genomics or Smart-seq.

I have a question about that. I will ask it now, that is most convenient. So we are comparing Smart-seq with 10x. My question is, having this kind of information: which method would you use if you are mainly interested in rare cell types, so cell types that do not occur very frequently in the tissue? And which one would you choose if you are mainly interested in lowly expressed genes, so genes that are not very highly expressed in your cells? Here are the answers. Great, so most of you have answered
correctly, I would say: droplet-based for rare cell types and Smart-seq for lowly expressed genes. With droplet-based methods you can sequence a lot of cells in a single run, meaning that cells that are relatively rare, so that do not occur very frequently, are probably also among the cells you are sequencing, so you have some representation there. You might miss those with Smart-seq, because you are limited to a certain number of cells, usually up to a thousand let's say. But with Smart-seq you have much more complex libraries, because you can sequence more transcripts within a cell, and therefore you also sequence these lowly expressed genes.

There is a question from Tui.

Yeah, I would like to ask: if the cells are very rare and you use a droplet-based method, and there is another population that is very dominant, then in the end the rare cells are going to be kind of overridden by this big population, no? Is it maybe a better way to sort out this rare population for Smart-seq?

Yeah, definitely. If you can enrich your sample for these rare cell types, it is always a good idea to do that, and you could even use Smart-seq to sequence those if you have enriched them enough. But you can of course also imagine situations where you are not really sure what kind of cells they are, and therefore you cannot really sort them out. Or maybe you are in a very exploratory phase, so you are just looking for a method to sequence a large diversity of cells instead of a diversity of transcripts within the cell, and that is where you would use 10x. If you are interested in these big libraries per cell, then go for Smart-seq.

Thanks.

Good question, yeah. And cell sorting is always an option, of course, definitely.

Okay, so what we have done now is compare the droplet-based methods, mainly the 10x
platform, with a platform that is based on individual cells in individual wells, which is often the Smart-seq method. There are two relatively new methods that have quite high potential. They are not super new, but they are emerging, let's say; they are used more and more frequently, so that is why I also wanted to discuss them here.

One of them is combinatorial indexing, and there is a company that has commercialized it, which means that you can actually buy kits, which is usually very nice because it is much easier to apply and to use. It is a completely different way of individualizing single cells. What you do is: you have your cells in a suspension, let's say, and you fix them with formaldehyde. After that you ligate barcodes to your RNA, but this RNA remains inside the cells. After that you split your cells into multiple pools and you add a new barcode, then you combine those again, split them again and add a new barcode. Let's say you do that four times; then, just by chance, you have a unique combination of barcodes per cell. That means that just by sequencing this combination of barcodes you know, with a certain probability, that all the reads carrying it originated from a single cell. By doing four rounds it is very unlikely, depending of course on the number of cells you put in, that you end up with two different cells having the same combination of indexes, and therefore you can trace the reads back to the individual cell.

The nice thing about this is that it is quite flexible: you are working with fixed cells, and there are no specific devices needed. You just buy a kit, which is not super expensive, and you can do this combinatorial indexing by yourself in the lab and then do the sequencing. A bit of a disadvantage of it
is that it's quite laborious: of course the splitting and the pooling again require quite a few manual steps. But still it is a good alternative to droplet-based sequencing, like the 10x platform for example, because you do not need one of those cell sorters, and in terms of throughput you are at a similar level, so you can also go up to more than 10,000 cells per experiment or pool.

Then another alternative is based on microwells, commercialized by BD; it is called BD Rhapsody. In a way it is kind of similar to Smart-seq, but you do not use FACS to get an individual cell in an individual well. What you do is: you have microwells with beads in there, and you sparsely load cells onto the microwells. Because it is sparse, meaning that you have very few cells in there, many of the microwells do not receive a cell and a few of them do, and so, just by chance, it is very likely that you only have a single cell in a single microwell. Within that well you lyse your cell, and your messenger RNA hybridizes to a bead, again in a way similar to the droplet-based methods, and then you continue with the sequencing. Like 10x Genomics, BD Rhapsody by the way also only sequences the 3' end of a gene. One advantage of this is that you can actually see the doublets in there: if you have two cells associated with a single bead, you actually know that this is happening, so you can estimate the percentage of doublets you would expect in your dataset. You cannot trace it back to the individual barcode, by the way, so you only know that there are doublets in there, but you do not know which barcodes or which beads are associated with doublets.

Okay, so this was the question I just asked. So then a few words about experimental design. In a way,
experimental design is not different from any other biological experiment. That means that also for single-cell transcriptomics you require replication, you require randomization, and you require blocking if needed. So if you are comparing a drug, for example, treated versus untreated, you need replicates of treated and of untreated, and if you have, for example, cages with mice, make sure you include those cages in the block design.

Also, like in any other biological experiment, be aware of confounding factors. Confounding factors are factors that are confounded with your treatment: they are not your treatment, but they are associated with the same samples that did or did not receive the treatment. For example, if you have multiple people handling the different samples, make sure that not one person is doing all the treated ones and the other person all the untreated ones. That is what confounding is. This can also happen with reagents, or for example with the time of processing: if you process one treatment on one day and the other treatment on another day, that is also confounding. It can even be the sequencer itself, or the lane, or the library. Mainly the lane can also have an effect on the expression you measure, so if you have all of your treated samples on one lane and all of your untreated samples on another lane, you could just be measuring lane effects. If there are any of those factors, and they pop up during processing for example, realize that they are there. Try to randomize as much as possible and block as much as possible, and if there are any factors that could affect the expression, make sure to record them, because if it is nicely blocked you can actually correct for it later on.
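As a small illustration of randomizing while keeping the design balanced, the Python sketch below assigns samples to handlers at random, but within each treatment group, so that no handler ends up with only treated or only untreated samples. The sample names, the handler names and the function `assign_balanced` are hypothetical; the same idea applies to processing days or sequencing lanes.

```python
import random

def assign_balanced(samples, handlers, seed=0):
    """Randomly assign samples to handlers, spreading each treatment group over all handlers."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced and recorded
    by_treatment = {}
    for sample_id, treatment in samples:
        by_treatment.setdefault(treatment, []).append(sample_id)
    assignment = {}
    for treatment, ids in by_treatment.items():
        rng.shuffle(ids)  # randomize the order within the treatment group
        for position, sample_id in enumerate(ids):
            # Deal the shuffled samples out to the handlers in turn.
            assignment[sample_id] = handlers[position % len(handlers)]
    return assignment

samples = [
    ("m1", "treated"), ("m2", "treated"), ("m3", "treated"), ("m4", "treated"),
    ("m5", "untreated"), ("m6", "untreated"), ("m7", "untreated"), ("m8", "untreated"),
]
assignment = assign_balanced(samples, ["person_A", "person_B"])
for sample_id, handler in sorted(assignment.items()):
    print(sample_id, handler)
```

Writing the resulting assignment down is the "record it" part: the handler can then be included as a factor when correcting for it in the analysis.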
There are quite a few papers on experimental design in single-cell transcriptomics, and there are two links to those on this slide.

So just another example of a confounded design. Let's say you are doing Smart-seq. What would be easiest: let's say the light gray mice are untreated and the brownish mice are treated. In principle, what you can do is put all the cells of each individual mouse on a separate plate and process these plates individually. Processing takes quite a bit of time, so maybe you can do two plates in a day. And then you also put them individually on sequencing lanes. Of course what you get then is an effect of the plate, and if you cannot process all the plates on a single day you also have an effect of the day, and then you also get an effect of the sequencing lane, and they are all confounded with the plate. You can imagine that these effects can become pretty big at some point; they just build up.
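A confounded layout like this one can be spotted with a crude check before going into the lab. The Python sketch below flags every technical factor whose levels each contain only one treatment. The design tables, the factor names and the function `confounded_factors` are hypothetical, and the check is only a rough screen, not a statistical test.

```python
def confounded_factors(design, treatment_key="treatment"):
    """Return the factors for which every level contains samples of only one treatment."""
    factors = [key for key in design[0] if key != treatment_key]
    flagged = []
    for factor in factors:
        treatments_per_level = {}
        for row in design:
            treatments_per_level.setdefault(row[factor], set()).add(row[treatment_key])
        if all(len(t) == 1 for t in treatments_per_level.values()):
            flagged.append(factor)
    return flagged

# One plate, day and lane per treatment group: everything is tied to the treatment.
confounded_design = [
    {"treatment": "untreated", "plate": "P1", "day": 1, "lane": "L1"},
    {"treatment": "untreated", "plate": "P2", "day": 1, "lane": "L2"},
    {"treatment": "treated", "plate": "P3", "day": 2, "lane": "L3"},
    {"treatment": "treated", "plate": "P4", "day": 2, "lane": "L4"},
]

# Both treatments represented on every plate, day and lane.
balanced_design = [
    {"treatment": "untreated", "plate": "P1", "day": 1, "lane": "L1"},
    {"treatment": "treated", "plate": "P1", "day": 1, "lane": "L1"},
    {"treatment": "untreated", "plate": "P2", "day": 2, "lane": "L2"},
    {"treatment": "treated", "plate": "P2", "day": 2, "lane": "L2"},
]

print(confounded_factors(confounded_design))
print(confounded_factors(balanced_design))
```

In the first table the plate, day and lane effects cannot be separated from the treatment effect; in the second one they can, because every level of each factor contains both treatments.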
A better design would be a more balanced design, where you actually divide your different mice over the plates. Sometimes it would even be better to completely randomize within those plates, but that can be quite a challenge. So let's say that in all your plates you have your different mice represented, so your plates are the blocks, and then you have these different plates divided over different sequencing lanes. So it is nicely blocked, and because you know which mouse is in which individual well, you can later on even correct for the plate, for example when you are doing differential gene expression.

A question for you, there you go, related to experimental design. Let's say you want to compare two groups of mice, treated and untreated with a drug, and tissue preparation is labor-intensive, as is very often the case, obviously. So you ask a good colleague for help. What will be your plan? There are three possibilities here.

Okay, once you have answered... so, stop. There we go. Nice, very good, so the message came across, obviously. That is very nice. So of course the third one is just not correct: "I process all the treated and my colleague processes all the untreated, so we don't get mixed up." That is where you start having confounding effects, because your colleague might do the tissue preparation slightly differently from you, or completely differently from you, and then you are not just measuring the effect of the drug; you can be measuring the different ways of preparing the tissue. Usually you are not interested in that, you are interested in the effect of the drug, and you cannot separate them anymore because they are confounded. Then: "Asking a colleague is a bad idea; I will take multiple days to process." That means that you will get a day effect. Of course, sometimes, or very often, you just have to use multiple days, and then you try to process, of course,
not all treated samples on day one and all untreated samples on day two; you try to randomize that. But of course, if we can process everything in a single day and within a not too long time span: "We randomly assign the mice from both groups to each of us, and we write it down", so we can correct for it later on.

So then a few methods related to single-cell transcriptomics. What you can do in addition to measuring gene expression is quantify proteins. That is a method that is very frequently used in combination with the 10x platform, with CITE-seq for example. You have an antibody, and attached to that antibody you have the adapter sequence that is required for sequencing, an index that is specific for the antibody, and then, let's say for example, a poly(A) tail. You let these antibodies bind to your cells, you wash away the unbound antibodies, and then you proceed to the sequencing. Because it has the poly(A) tail, these sequences are also captured by the 10x platform, and because you have a specific barcode associated with an antibody, you know that that antibody actually attached to your cell. Nowadays this has evolved further: it is not a poly(A) tail anymore but a specific capture sequence, so those antibody-associated oligos do not compete with the messenger RNA for the bead. So what you can do is quantify proteins together with messenger RNA, which can be super powerful of course. You can have quite a few, I think up to three to four hundred, maybe even five hundred, different antibodies, and get a very good idea of the surface proteome of the cell in combination with the entire transcriptome.

A very similar method can be used to combine multiple samples together in a single run for 10x or
droplet-based methods. What you do there is: you have an antibody that binds to any cell, and again you have an index attached to your antibody, and of course a poly(A) tail or a capture sequence. But that index is not specific for a certain antibody or a certain protein; no, it is specific for a certain sample. So you let these antibodies attach to your cells before pooling: you incubate the individual samples with these antibodies, then you pool your cells together, and then you proceed with the regular 10x protocol. So the cells are associated with individual beads with individual indexes, and then you do the sequencing. But for each cell you then also sequence the barcode sequences of the antibodies that were associated with the cell, meaning that you can figure out from which sample that cell came, so you can demultiplex it. And if you have multiple sample barcodes within a single cell, let's say, you also know that it was a doublet, so multiple cells within a single droplet. So that can be quite powerful, and it can also reduce cost quite a bit.

I have another question for you, of course related to what I just said. If you are using antibodies to quantify proteins, and I just gave the answer in the presentation: which sequence do you use in your downstream analysis for quantification? So you want to quantify the protein, right? The more antibody binds to your cell surface proteins, the more protein there likely was, right?

Okay, I think I will give you the answer, or at least show what you have entered. Okay, very good. So most of you have really understood what we have been talking about, that is great. It is indeed the barcode linked to the antibody tagging the protein of interest. So you have all these antibodies binding to your cell surface protein, and the more of these antibodies bind, the more
of the barcode you will have in your individual cell, or let's say in the individual thing that was captured in the oil together with the same gel bead. Therefore you have more of the barcode that is specific for the antibody, and the more of the barcode you have, the more of the protein was in there. Just to show it in an image: the more of those guys bind, the more of that index you have in there, and the more you see.

Okay, last slide. We have been talking about single-cell sequencing a lot, and we will be mainly focusing on that of course, but there is also something called single-nucleus sequencing. It is kind of an alternative to single-cell RNA-seq. You get very similar data out of it: at some point you also get counts per cell, but then it is actually per nucleus. It can be very interesting, for example, for tissues that are difficult to dissociate, so where it is difficult to get the individual cells out but where it is usually easier to get the individual nuclei out. An advantage is that, because you extract the individual nuclei, you also get rid of all the ribosomes, which means there is no translation of transcription factors anymore that can affect the transcriptome again during processing. So you kind of stop this whole translation process, meaning that you might have a better representation of the actual messenger RNA that was in the nucleus in the tissue itself. A downside is of course that you miss all messenger RNA that is not in the nucleus. I am not very sure whether it is because of that, but at least what people have found is a lower representation of immune cells and surface proteins when you do single-nucleus RNA-seq. So that is a bit of a downside, meaning that if you are mainly interested in those kinds of genes, then single-nucleus RNA-seq might not be a very good idea, or maybe you might not if make me do