Welcome to the MOOC course on Introduction to Proteogenomics. After understanding sequence-centric proteogenomics, we will now listen to Dr. Kelly Ruggles, who will talk about how to use IGV with the example of a human gene. The prerequisite for the hands-on session is to download IGV on your system. You can see the link on your screen. If you have not downloaded IGV yet, please pause this video and download the software before moving forward. Please note that you need to have Java on your system before you start this hands-on session, or else IGV will not work. You should also have the VCF and BED files shared with you this week. Please download those files, because they will be the input files for the software we are going to use. The file names will be sequence pgsnps.vcf and sequence pg_junctions.bed. So now you have Java and IGV installed on your system. Let us welcome Dr. Kelly Ruggles for today's hands-on session.

Okay, so move the RefSeq track up, click on a specific area, hover over this area here, and you should be able to drag it down. Sometimes you have to zoom in quite a bit, and then you should be able to get the sequence information, which of course I'm not going to be able to do while I'm doing a demo. There we go. So you should be able to zoom in enough that you can drag it down and get the sequence information. If you zoom in further, what you see is the nucleotide sequence and then the amino acid three-frame translation. So everybody at least try to get to the point where you can see the sequence.

Show translation. If you're only getting the nucleotide sequence and not the amino acid sequence, left-click, or rather right-click, and it'll give you some options, one of which says Show translation. Click on Show translation and it will show you the three-frame translation. Yes, same thing.
So if you right-click, Show translation. Yeah. Oh, if you don't have Java, it will not work. Thank you. Java 8. Is that the issue you guys are having? Oh, okay. Sorry. I take these things for granted. Yeah. So again, if you only see the nucleotide sequence, right-click on the sequence and it will give you the option of opening the translation. So right-click on the sequence, Show translation. Okay, I'm going to give you two more questions.

Oh, yeah. If you can't open the sequences, you just have to zoom in really, really far and you'll see them. Okay, but you're getting something. Then you have to right-click on the sequence track right there. I'll show you. And then it should open the translation. You're good. But zoom in really far. Here, I'll show you. You can do it like this: you can just select the window. Keep going, keep going. And then it should pop up. And if it doesn't, you drag this down and play with it until it does. Thank you. There you go.

Okay. You should have your VCF and your BED files ready to go. So go to Load from File and pick your VCF file. It should be sequence pg-snps.vcf. Then you'll want to put in the example location of the second SNP, which is on chromosome 1, followed by a very long number that you should copy. I guess I could say it, but it'll be easier if you copy it. It's 155646348. Click on that and it should bring you to that location. You'll see here that I have a gray box at that exact location. It indicates that, based on the VCF file we loaded, there is a SNP at that location. If you've gotten to this point, what you'll notice with this gene that's different from the last one we looked at is that it's on the negative strand, so you have to flip your sequence or else everything's going to be wrong.
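As a rough illustration of why the flip matters, the sketch below parses one VCF data line and reports the alleles as they read on the gene's strand. Only the position (chromosome 1, 155646348) and the C-to-G change come from the session. The placeholder columns, the "chr" prefix handling, and all helper names are assumptions made for this example, and this is not a description of how IGV or QUILTS works internally.

```python
# Complement table for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_vcf_line(line: str) -> dict:
    """Pull CHROM, POS, REF and ALT out of one tab-separated VCF data line."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}


def igv_locus(record: dict, prefix: str = "chr") -> str:
    """Build a locus string to paste into the IGV search box.

    Assumption: the loaded genome names chromosomes with a "chr" prefix.
    """
    chrom = record["chrom"]
    if not chrom.startswith(prefix):
        chrom = prefix + chrom
    return f"{chrom}:{record['pos']}"


def alleles_on_gene_strand(record: dict, strand: str) -> tuple:
    """VCF alleles are on the forward strand; flip them for a minus-strand gene."""
    if strand == "-":
        return reverse_complement(record["ref"]), reverse_complement(record["alt"])
    return record["ref"], record["alt"]


# Hypothetical VCF line: ID, QUAL, FILTER and INFO are "." placeholders.
example_line = "1\t155646348\t.\tC\tG\t.\t.\t."
record = parse_vcf_line(example_line)

print(igv_locus(record))                     # chr1:155646348
print(alleles_on_gene_strand(record, "-"))   # ('G', 'C')
```

On the forward strand the file says C to G, and on the negative strand the same variant reads G to C, which matches what the flipped sequence track shows.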
So hit the arrow here and it'll flip the nucleotide sequence, and it will also flip the translation, so it will be in the correct frame and in the correct direction. So we're moving in the negative direction, right? We know that our variant is right here, and we know from our VCF file that it goes from a C to a G. In this case, it's a G to a C because we flipped it, right?

The objective here is to go through the sequence, starting from wherever you want, and figure out by hand how to create the peptide that has this SNP in it. So take some time. What you'll want to do is what I did in the last one: go from amino acid to amino acid until you hit the SNP, make sure you encode it correctly, and then keep going to get the final tryptic peptide. Does that make sense to everybody? So you're going to do an in silico translation. I chose to start at this D. You can choose to start wherever you want, and you're essentially just going to keep moving, oops, in this direction until you hit the variant, and then you figure out what this is going to look like. I already gave you the answer. So just think through it. The junction one is a lot harder, so I would rather we work through this one now. You're going to create a tryptic peptide from right to left, and when you hit the SNP, make sure you change it accordingly, because you know that this G has changed to a C.

All I wanted you to do is take a look. This is what you should essentially have in front of you at some level. If we wanted to put this SNP into our database, we would be moving from right to left and creating a peptide sequence. We know that trypsin is going to cut at R.
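The by-hand exercise (translate codon by codon, swap the variant base, cut at trypsin sites) can be sketched in a few lines. The coding sequence below is a made-up toy, not the real gene sequence from the session; only the CAG-to-CAC codon change and the cut at R mirror what is discussed. The digestion rule used here (cleave after K or R, but not before P) is the common textbook simplification, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding-strand sequence in frame 0, ignoring trailing bases."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def apply_snp(cds: str, index: int, ref: str, alt: str) -> str:
    """Substitute one base, checking that the reference base is what we expect."""
    if cds[index] != ref:
        raise ValueError(f"expected {ref} at index {index}, found {cds[index]}")
    return cds[:index] + alt + cds[index + 1:]


def tryptic_peptides(protein: str) -> list:
    """Cleave after K or R unless the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


# Toy coding-strand sequence (hypothetical): GAC AGG CTG CAG CAG GAG AAG
reference_cds = "GACAGGCTGCAGCAGGAGAAG"
# Third base of the second CAG codon, already expressed on the gene's strand (G to C).
variant_cds = apply_snp(reference_cds, 14, "G", "C")

print(translate(reference_cds))                   # DRLQQEK
print(translate(variant_cds))                     # DRLQHEK
print(tryptic_peptides(translate(variant_cds)))   # ['DR', 'LQHEK']
```

In this toy, the CAG codon (glutamine, Q) becomes CAC (histidine, H), and the tryptic peptide carrying the change is the one that would be added to the search database.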
So I guess we could start here and go L, Q, Q, and then at this Q, instead of a CAG, we're going to have a CAC. So when I was showing you guys this during the talk, okay, this guy. You should be able to figure out what that SNP is going to encode instead of what is there currently. Right now we have a CAG, but really it's changing here. So it's about figuring out what that looks like. Is this the right example? This is the right example, right? All I wanted you to think about was how replacing this nucleotide would end up giving a new amino acid, which would then change the FASTA file that we use as our database.

So here is QUILTS. You could put this into QUILTS as a VCF file; you don't have to actually do this by hand. If you were to put this one line of the VCF file into QUILTS, it would give you a new protein sequence that includes the amino acid change based on the SNP given in the VCF file. That's just what I wanted everyone to get a feel for using IGV itself. Does that sound okay? Does everyone get it? The next one is going to be a little more difficult.

The other thing we can do is look at novel splice sites. I made this junction file, so load the junction file very similarly to what you did before: go to File, Load from File, junctions.bed, and open it. I only have junctions for ERBB2 in this one. I made it easy. So in the field here, type ERBB2, and it'll bring you to the ERBB2 annotation. Once you're there, you'll see that there are a couple of junctions that should be in your junction file. I changed the color so that the purple ones are the annotated ones; you can see that they connect known exons here. The red ones are novel. We looked at one of these novel ones during the lecture. The other novel one is the one we'll look at now.
So once you have the file open and it looks like this, we're going to look at what the translation of novel junction 2 would look like if we were to put it into the sequence database. What you want to do is zoom in on this second red junction. To zoom, you can come up here and draw a window around what you want to see. It might take a second to load. There we go. You should see the end of this exon and then the beginning of this junction. The junction is showing that the RNA-seq data indicated a connection between the end of this exon and some area within this intron. That is what it's telling us.

So if you go to the end of the exon, you'll see that you can get... oh, by the way, we're looking at a gene that is on the forward strand. So change your sequence back with the arrow, or else it's not going to work. It'll be very confusing. Does that make sense to everyone? Okay.

What we'll do here is make another tryptic peptide. Again, we'll start after the arginine. So it'll be P, E, D, E, C, and then there's this extra G hanging over, like we saw in the original. We had two Gs in the last one, but now we have one G. I'm going to show you in my PowerPoint because I think that might be easier. You zoom in here and we can get the sequence because we know where there's a boundary. So there's a junction between a known exon and something new, right? The part that's known we can keep as is; we can just take the sequence from that. Then we have to figure out, from the boundary next to it, what that extra sequence is going to look like. So we take this sequence, we have this G that's hanging, and then we look at the other side of the boundary. So zoom in to the other side of this boundary here.
And this is where you'll get the nucleotide sequence that continues on from the original exon. Then you can do an in silico translation to figure out what the full peptide would look like.

So what these are showing are the boundaries. It's just showing that these two connect by splicing. This one shows that this exon connects to this exon, which we know. This one is connecting these two exons. We already know that; it's already annotated in the genome. We knew that these two exons are spliced together. Somebody else figured that out, and it's in the database.

So that is my question: why has it not come up as an exon here, then? The junctions just show the boundaries between exons. Okay. It's just showing how they connect. Okay. Yeah, this is only the junction part. Yes. The green ones are methionines and the red ones are stop codons. It's just showing starts and stops, essentially.

From the sequence data, you should be able to manually figure out what the nucleotide sequence would look like at that boundary. In the file I sent you, it actually has it in there. Then you can do an in silico translation to figure out what the amino acid sequence would look like as well. This sequence here is what you would put into your database to see if this boundary actually came up at the protein level. We have software to do this; you don't have to do it by hand. But I think by doing it by hand, you better understand what these databases actually are, and you may at some point have to do something like this by hand.

So you can see here that it's just saying these are the different junctions, right? Six junctions for ERBB2. What are junctions exactly? They're splicing. Yeah, they show how things splice together, so they're showing the connections between exons. Okay, these are the connections.
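The junction exercise (keep the known exon sequence, carry the overhanging base across the boundary, then keep translating into the new region) can be sketched as follows. All nucleotides below are invented for illustration; only the residues P, E, D, E, C and the single overhanging G come from the session. The sketch also assumes the exon-side sequence starts at a codon boundary on the forward strand, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_until_stop(seq: str) -> str:
    """Translate in frame 0 and stop at the first stop codon or incomplete codon."""
    seq = seq.upper()
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def junction_peptide(upstream_exon_end: str, downstream_start: str) -> str:
    """Join the two sides of a splice junction and translate across the boundary.

    Any bases left hanging at the exon end are completed by the first bases
    on the other side, so the boundary codon is translated correctly.
    """
    return translate_until_stop(upstream_exon_end + downstream_start)


# Hypothetical exon end: CCT GAG GAC GAG TGT + one overhanging G.
EXON_END = "CCTGAGGACGAGTGTG"
# Hypothetical sequence on the other side of the novel junction.
NOVEL_START = "GCTCCAAGTGACCT"

print(len(EXON_END) % 3)                          # 1 (one base hangs over)
print(translate_until_stop(EXON_END))             # PEDEC
print(junction_peptide(EXON_END, NOVEL_START))    # PEDECGSK
```

In this toy, the overhanging G pairs with the first two bases past the boundary to form GGC, and translation continues until the first in-frame stop codon in the new region.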
So if I'm zooming in... Yes. So here. That's just because you're too far in. So, okay, those two ends are going to join. Here, these are the stop codons. Since you have three frames, you see stops in some of the frames, and this is in an intron here, so none of this really matters. Okay, so this is the intron there. Yeah. If you zoom out, you won't see a stop codon in an exon. Okay, so these two are going to join, because this is the junction. Yes. And here there is no joining, but here it is joining, in the middle of nowhere. So what you want to do is find the nucleotide sequence here and the nucleotide sequence here, and then figure out what that would look like if it actually joins. Okay, so if it actually joins, what the amino acid sequence is. Exactly. Okay.

First, we go to the genome and load the file. Is it loading from the file? Then can you just interpret what exactly we are doing and what we will get? Please conclude. Yeah, so I just wanted to show two examples. If you have a SNP, how do we think about how that SNP would be encoded into the proteome so that we could put it in the database and find it? If you have that one SNP, how does it impact the peptide? Because of that change in the SNP, whether a new peptide would be there or not. Exactly. It will be. Yes.

And then for the junctions, the junctions are showing where two exons are connecting. How do we see that? Say that again? So this one is showing that these two exons are connecting. Okay. And the red one is showing that there's an exon connecting to the middle of an intron, which makes no sense, right? We wouldn't expect to see that, because we expect to see two exons joining, not expression in an intron.
So if it's, say, cancer, we want to see: is that intronic expression real? Is it some new isoform, never seen in normal tissue, that exists in cancer? If we want to see that, we have to be able to encode the sequence in the intron as a protein, because it would never be in a normal database; it's new expression in an area that's not normally expressed. So, yeah, that's a good question. The reason we would do this is that we would get the new peptides and then search the MS data with them to see if they come up in our data. So you just want to look at what this nucleotide was changed to in the VCF file, change the amino acid to reflect that, and that will be the new peptide you would put in your database.

Let us look at what we have learned: how to see SNPs in a gene with respect to the human reference genome. One could also look into one's own data using the reference genome of the target organism. Viewing SNPs in the genome viewer enables us to look at all three frames of translation and the possible effect of the SNPs on the translation. We also saw how one can find truncated proteins and splice junctions. We also learned how to look at junctions, which may include exons and parts of introns, to form variant peptides. I hope this hands-on session was useful and that you will now start using this for different applications. Thank you.