There we go. A lot of you, and we already expected this before we started the course, will use the 10x Genomics Chromium method to generate single-cell transcriptomics data. Because so many people are using it, we can spend a little bit of time to go a little more in depth. It's good to know what happens when you generate the data, and if you understand what's going on, that also makes things easier to troubleshoot, for example. So it's about seven or eight slides or so, just some visualization of 10x Genomics in general, or actually specifically of the Chromium method.

You have seen this image before in a previous presentation, so this is just a recap. What is the main invention that allows you to generate single-cell transcriptomic data with this Chromium device? This Chromium device is kind of a cell sorter, where the cells and gel beads move through the oil, and you hope that a gel bead and a cell are captured together in a single GEM. Because you associate a single gel bead with a single cell, and the gel bead has a barcode that is specific for that bead, you can find out later on, after sequencing, which read came from which cell, or actually which reads belong together in the same cell.

In this image it looks like a lot of cells move through the device, more than beads, but in reality the cells are moving way more sparsely through the device than the beads. Because of that, just by chance, you expect that if a bead gets associated with a cell, then it's only one single cell, simply because very few cells move through the device.

So the gel bead is a sphere that is completely packed with oligos, and those oligos all look very similar. They all have the same adapter sequence. Within a gel bead they all have the same barcode, which of course differs between gel beads. The UMIs differ between oligos; more about UMIs
later on. Then we have the poly(dT) sequence, or actually the capture sequence, and nowadays we have more capture sequences, not only poly(dT) but also other sequences that allow you to capture, for example, the indexes related to antibodies.

Okay, so in this GEM what you get is this: the cell gets lysed, you capture polyadenylated messenger RNA with this poly(dT) oligo, and that means that all of those captured transcripts from a single cell get an identical barcode, unique to that cell, because all the barcoded oligos on a single gel bead are the same. When you have that, you get a reverse transcription that still happens within the GEM. Then the GEMs are broken down and you get a fragmentation step, meaning that the cDNA, the copied messenger RNA, gets broken into smaller pieces. Then you get the ligation of the adapter, so the other end of the library construct is generated, and you get an index PCR.

What you end up with is this construct over here. You have the P5 and the P7 that are required for Illumina sequencing, in order to be able to anneal to the sequencing lane. We get Read 1 and Read 2, which are the typical Illumina adapters. You get the barcode that is specific for the gel bead, the UMI, the poly(dT), and then the part that is actually sequenced, the part of the individual gene. Then comes the other adapter, and you get a sample index.
So that's a typical index that is sequenced by Illumina in order to be able to multiplex samples, and of course the P7.

When you start sequencing this construct, what you get is of course the adapter sequence for the forward read and the adapter sequence for the reverse read. The adapter sequence for the forward read is used to sequence only the barcode that is specific for the gel bead and the UMI. Anything else is kind of irrelevant, because that will be only poly(T) or poly(A). The other adapter is used to sequence the actual sequence of the transcript, and that is of course used later on to do the alignment, for example. We also usually get a third FASTQ file per sample, so we get Read 1, Read 2 and Read 3, or I1 actually, and that contains the sample indices.

So how does it look? You have brought your library, or maybe your sample, to the sequencing provider. They did the library prep and the sequencing for you. Usually you end up, per sample or actually per sequencing lane, with three FASTQ files, R1, R2 and I1: Read 1 with only the barcode and UMI, Read 2 with the actual transcript, and I1 with the sample barcode. If you use dual indexing you sometimes even get an I2. Sometimes I1 and I2 are both missing, and then the indices are usually added to the FASTQ headers, so they are added to the FASTQ files themselves, the R1 and the R2.

So after sequencing you have the raw FASTQ files. What's next? We have to figure out which read came from which cell, or actually which reads belong together in the same cell. Then comes the alignment, and then the quantification per gene, so at some point you want to end up with counts per cell per gene. And then you also have to figure out what is a cell and what is not, because there will also be background RNA in there; more about that in a later slide. For these four steps, typically for 10x data, what you can do is use software called Cell Ranger that
is provided by 10x Genomics and runs all of those four steps. So it's a single command, relatively straightforward to use, especially for model organisms. You just run that on your FASTQ files, and what you end up with, what you're mostly interested in for downstream analysis, is this count table with cells and genes.

An alternative to using Cell Ranger is using STARsolo. Actually, STAR, the aligner behind STARsolo, is used by Cell Ranger under the hood, let's say. My apologies for the dog, he is starting to get active. Another alternative is using Alevin, which is from the people who also developed Salmon, and it has a bit of a different way of generating count tables. But I guess Cell Ranger is really the most used software program to perform all these four steps, to get from read data to a count table.

Of course, what Cell Ranger needs is a reference, because one of the steps, step two, is aligning the sequencing reads to a reference genome. So you need a reference, and it has to be relatively specific. In a way you can just use your reference genome and the GTF, but the GTF has to be in a specific format. For human and mouse you can download the reference from the 10x website, so that would be the standard reference to use. Just download it and you can get started. For other organisms you have to make a custom reference with cellranger mkref. That is also relatively straightforward to run if you have a highly standardized genome and highly standardized GTFs.

If you have an exogenous marker like GFP, so you have done experiments with GFP in there, you will want to estimate the expression of GFP.
You have to add it to your reference beforehand. If you want to quantify something, you also have to align against that transcript. Of course GFP is not part of the human or mouse genome, at least not by default, although a human cell with GFP is possible. So you have to add it to your FASTA and to your GTF and then again run cellranger mkref in order to create your reference. And then you're good to go. If you have features other than genes, for example proteins, you will need a feature barcode reference, in which you specify which barcode is associated with which protein. And then Cell Ranger can also work with that.

So then about UMIs. What are UMIs and why do we use them? The abbreviation UMI stands for unique molecular identifier, and, as is kind of in the name, it identifies each molecule uniquely, meaning it uniquely identifies each of those constructs we are creating to do the actual sequencing.

Before you do the actual sequencing on those constructs, with the barcode, the actual sequence, the UMI and everything in there, we do a PCR, because without PCR we do not have enough of those constructs to be able to do the actual sequencing. So we have to do a PCR, and after this PCR all the constructs originating from the same initial construct will have the same UMI. That means that if, let's say, you are going to sequence your entire library very deeply, at some point you will just sequence the same construct over and over again, with the same UMI. You do not get any more information out of it. What we basically want is to have sequences in there with unique UMIs, because then we know we are reading a part of a transcript that was not sequenced before. So therefore we have UMIs in there, also because usually for 10x Genomics, as you remember, the libraries you create are usually not
very complex, which means that you very quickly are sequencing the same construct over and over again. By counting these UMIs you correct for that. So you do not count the actual alignments, as you may be used to for RNA-seq; you count the number of unique UMIs per gene. That way you directly correct for sequencing the same construct over and over again.

When you have run Cell Ranger you get a report, a very nice report. I always look at it quite extensively. It gives you the main statistics that are important for the next steps in your analysis. One of them is the number of cells that are called, so that are considered by Cell Ranger to be actual cells. You have the mean number of reads per cell, which tells you how many reads you have generated per cell, and the median number of genes per cell, so how many genes you are actually measuring. That can of course differ per tissue: in some tissues you can expect more reads and more genes to be expressed than in others, and it depends very much on the cells that are in there.

Then you get information about the total number of reads you have for that run, and, for example, how many reads aligned to the genome, which is usually very important, and how many reads have mapped to the exons, so to exonic regions, to the actual genes. For human and mouse the genome is usually very well annotated, which means that many of the reads are expected to align to an exonic region, to a gene. If you work with non-model organisms, annotations are usually not so good, meaning that regions that are actually genes are not annotated as being a gene, or maybe UTRs are missing, for example. A typical problem is that three-prime UTRs are missing, which means that the position you are aligning your reads to is just missing in your reference annotation. And then you get a very low number of reads that map to an exonic
region, and therefore you do not take them into account when estimating gene expression. So these numbers over here about the mapping are very relevant, and especially if you are working with non-model organisms they are very important to check.

You get some information about the sample and about the chemistry, which is usually automatically detected by Cell Ranger. That's very convenient. And of course about the cell calling. You might have noticed I own a dog, and sometimes it barks, my apologies. So, about cell calling, a very important part of running Cell Ranger.

What happens is this. Let's say you have your droplet, and in your droplet, in the ideal case, you have both a cell and a bead. However, usually you also have free messenger RNA. You have free messenger RNA in tissue in general, and when you have been dissociating your tissue, very often cells get broken and they release messenger RNA. If that occurs very frequently, you get a lot of background messenger RNA in there, and that can be a bad thing because you do not know where this messenger RNA comes from. So very often it also happens that you capture only background RNA in the droplet, meaning that you are measuring messenger RNA, but it does not come from a single cell.

What Cell Ranger tries to do is look at the distribution of the number of UMIs per cell, order that in a plot, and then look for a steep drop, because what you expect is that droplets with a cell contain a lot of messenger RNA and droplets without a cell contain very little messenger RNA.
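The idea of ranking droplets by UMI count and looking for a steep drop can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not Cell Ranger's actual cell-calling algorithm: the function name `call_cells_by_steepest_drop` and the toy barcodes and counts are made up for this example, and the "drop" is simply the largest gap between neighbouring droplets on a log10 scale.

```python
import math


def call_cells_by_steepest_drop(umi_counts):
    """Return (threshold, cell_barcodes) from the steepest drop in the ranked UMI curve.

    umi_counts: dict mapping droplet barcode -> total UMI count (hypothetical input).
    """
    # Rank droplets from the highest to the lowest UMI count, ignoring empty ones.
    ranked = sorted(
        ((bc, n) for bc, n in umi_counts.items() if n > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    if len(ranked) < 2:
        return None, [bc for bc, _ in ranked]

    # Find the pair of neighbours with the largest drop on a log10 scale.
    best_index, best_drop = 0, -1.0
    for i in range(len(ranked) - 1):
        drop = math.log10(ranked[i][1]) - math.log10(ranked[i + 1][1])
        if drop > best_drop:
            best_index, best_drop = i, drop

    # Everything above the drop is called a cell; the rest is treated as background.
    threshold = ranked[best_index][1]
    cells = [bc for bc, _ in ranked[: best_index + 1]]
    return threshold, cells


# Toy data: four droplets with a cell, four with background RNA only.
toy = {"AAAC": 5200, "AAAG": 4800, "AACT": 4100, "ACGT": 3900,
       "CCGA": 60, "CGTT": 45, "GGAT": 30, "TTAC": 12}
threshold, cells = call_cells_by_steepest_drop(toy)
print(threshold, cells)
```

With clean data like this toy example the drop is obvious. When there is a lot of background RNA the ranked curve becomes smooth, and a single-threshold approach like this one has no clear place to cut.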
However, if things didn't go very smoothly with dissociation, for example, you expect a lot of background messenger RNA, and then it becomes very difficult to separate droplets with a cell from droplets without a cell, because you have that much background RNA in there. Then you get plots like the one at the bottom right, where you do not have a clear drop in the number of UMIs per cell, and it becomes very difficult to figure out what is a cell and what is not, what has only background in there.

Typically, to solve this, well, you just have to deal with it as a bioinformatician. Often, if I show it to a customer, they say: can you do anything with it? And I say: well, you can just try to get cells out of there, try multiple thresholds for example, and see whether you get a biologically meaningful analysis out of it. But it still becomes very difficult to know what is a cell and what is not. Maybe in a UMAP later on you can find signatures. It's not very easy to work with.

Some other parameters to look at in the Cell Ranger report: of course the number of captured cells. Usually you want quite a few cells, because the more cells you have, the better it is for the statistics, for example, and the more likely it is that rare cells also end up in your sample. But you also cannot have too many, because if you have too many cells then the chances are much higher that you have a single bead associated with multiple cells. And of course if you have a single bead associated with multiple cells, then it's not single-cell transcriptomics anymore; then it's two-cell. So there is a relation between the number of cells and the number of expected doublets. If you go up to 10,000 cells, you expect about seven percent of those beads to be associated with more than one cell. And of course you do not want too many of those, because the doublets are actually quite difficult to identify. So in principle you could filter for them, but it's not super
trivial to do. So you want to be in between, say a thousand and eight thousand cells: not too few cells, because of course you do single-cell transcriptomics with 10x to have quite a few cells in there, but also not too many, because otherwise you get doublets.

The number of reads per cell depends very much on the library complexity you have, meaning that at some point just generating more reads per cell does not give you more information, because you're sequencing the same construct over and over again. But you want enough reads to be able to sequence the entire transcriptome, as much as you can sequence from that cell. Typically it's between 30,000 and 100,000 reads per cell.

And then sequencing saturation tells you whether it could make sense to generate more reads, for example. If the sequencing saturation is low, that means that you do not sequence the same construct multiple times, which means that it can make sense to generate more reads. If the sequencing saturation is very high, that means that you have been sequencing the same construct over and over again, and then generating more reads doesn't make a lot of sense.

And of course, I already spoke about it, there is the number of reads mapping to the genome and to the transcriptome. Typically what you can do, for example, is change your transcriptome annotation. If you're working with a non-model organism, look at the annotation of, for example, the three-prime UTRs. If that's not a very good annotation, you might want to change it by, for example, extending the gene length.
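To make the UMI counting and the sequencing saturation discussed above a bit more concrete, here is a minimal Python sketch. It assumes each aligned read has already been reduced to a (cell barcode, UMI, gene) triple; the function name `count_umis` and the toy reads are invented for illustration. It leaves out things a real pipeline such as Cell Ranger does, for example barcode and UMI error correction and filtering on mapping confidence. Saturation is computed here as one minus the fraction of reads that are unique barcode/UMI/gene combinations, which is the commonly used definition.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse PCR duplicates and return (counts, saturation).

    reads: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    counts: dict mapping (cell_barcode, gene) -> number of unique UMIs.
    saturation: 1 - unique (barcode, UMI, gene) combinations / total reads.
    """
    umis_seen = defaultdict(set)  # (barcode, gene) -> set of UMIs observed
    n_reads = 0
    for barcode, umi, gene in reads:
        umis_seen[(barcode, gene)].add(umi)
        n_reads += 1

    # The expression value is the number of distinct UMIs, not the number of reads.
    counts = {key: len(umis) for key, umis in umis_seen.items()}
    n_unique = sum(counts.values())
    saturation = 1.0 - n_unique / n_reads if n_reads else 0.0
    return counts, saturation


# Toy reads: the first three are copies of the same original molecule.
reads = [
    ("CELL1", "AAAA", "GeneA"),
    ("CELL1", "AAAA", "GeneA"),
    ("CELL1", "AAAA", "GeneA"),
    ("CELL1", "CCCC", "GeneA"),
    ("CELL1", "GGGG", "GeneB"),
    ("CELL2", "AAAA", "GeneA"),
    ("CELL2", "AAAA", "GeneA"),
    ("CELL2", "TTTT", "GeneB"),
]
counts, saturation = count_umis(reads)
print(counts[("CELL1", "GeneA")], saturation)
```

In this toy example GeneA in CELL1 is supported by four reads but only two molecules, so it is counted as 2. Three of the eight reads are duplicates, which gives a saturation of 0.375; a value close to 1 would mean that extra sequencing mostly re-reads molecules that were already seen.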