Anyways, I just wanted to draw a distinction between alignment, assembly, and pseudoalignment. Everything we've been doing so far was alignment-based. We have RNA-seq reads that are something like 100 or 150 bases long, and we have a reference genome: the whole reference genome is 3 billion bases, or chromosome 22 is 54 million bases. Alignment is basically taking each of those reads and trying to say where on the reference genome this read likely came from. So that's alignment, and the alignments created in that way are what we've been using for all the expression and differential expression analysis up to this point.
But that is distinct from assembly, which compares each read not against a reference genome but against the other reads. If you do assembly without a reference genome, you're basically taking each read and trying to put reads in piles where it looks like they overlap each other, to form contigs. So you can do de novo transcript discovery with that approach and figure out what the transcript sequences actually are, without any prior knowledge about what the transcripts look like or what the sequence of the genome is, just from the raw data itself. We didn't do that, but there is a module that we don't get to that uses Trinity to do this de novo assembly.
And then the third approach, which we're going to talk about a little bit now for the last few minutes, is pseudoalignment, which is what Kallisto calls its approach. The way it works is by comparing each of your reads against the reference transcriptome, but it's not doing a computationally expensive alignment where it tries to figure out exactly how my 100-mer matches our transcript sequences.
Instead, what it does is it basically takes each transcript sequence that you think might be there and breaks it into really small pieces, little words called k-mers. You can choose the length, but the default length is 32. So these are pretty small pieces compared to the full-length transcript sequences. You take your 2,000-base gene transcript and break it into 32-base pieces: the first 32, then you move over one and take another 32, move over one more, and do that from the beginning to the end. You do that for every transcript in the transcriptome and create a database of all of these sequences.
And then you go through each read and do a similar thing. You say, basically, I'm going to take my read, break it into chunks of 32, and just look for exact matches. Looking for short exact matches is something you can do very, very fast. Computers are very, very good at efficiently checking whether this exact thing is in this other list.
That's a gross oversimplification of what Kallisto does, but that's the basic premise: it's looking for these exact word matches by building a big list of the words that could be there, based on the known transcript sequences, and then just looking for those words in your RNA sequence data.
Okay, so that's the two-minute explanation of pseudoalignment and k-mer-based analysis. And that concept of making words and searching against them is a general concept in bioinformatics that's used over and over again for all kinds of useful things.
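To make the word-matching idea concrete, here is a minimal Python sketch of a k-mer index and an exact-match lookup. It is a toy illustration of the premise described above, not Kallisto's actual implementation. The function names (`kmers`, `build_index`, `compatible_transcripts`), the transcript names, the sequences, and the small `k = 5` are all made up for the example, and intersecting the matching transcript sets is one simple way to combine the hits, chosen here as an assumption.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, sliding over one base at a time."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def compatible_transcripts(read, index, k):
    """Return the transcripts that contain every indexed k-mer of the read.

    K-mers of the read that are not in the index are skipped (an assumption
    of this sketch); the remaining hits are intersected.
    """
    compatible = None
    for kmer in kmers(read, k):
        hits = index.get(kmer)  # exact-match lookup, no alignment
        if hits is None:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


# Hypothetical toy transcriptome; real transcripts are thousands of bases long.
transcripts = {
    "tx_A": "ACGTACGTGGA",
    "tx_B": "TTGACGTGGAC",
}
k = 5  # tiny k so the toy sequences produce several k-mers
index = build_index(transcripts, k)

print(compatible_transcripts("TACGTGG", index, k))          # {'tx_A'}
print(sorted(compatible_transcripts("ACGTGGA", index, k)))  # ['tx_A', 'tx_B']
print(compatible_transcripts("GGGGGGG", index, k))          # set()
```

The dictionary lookup is what makes this fast: each k-mer of a read is checked against the index in roughly constant time, with no base-by-base alignment. The second read is compatible with both transcripts because all of its k-mers occur in both, which is the kind of ambiguity a real quantification tool still has to resolve.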