Welcome to the MOOC course on Introduction to Proteogenomics. Mutations in different genes lead to different cancers: BRCA1, BRCA2 and HER2 mutations lead to breast cancer, while mutations in the isocitrate dehydrogenase (IDH) gene lead to brain cancer. Therefore, to understand the modalities for efficient treatment of a cancer, it becomes very important to know which mutations are relevant to a particular tumor type. Today, Dr. Kelly Ruggles is going to talk about how to use online tools like cBioPortal to access gene mutations and expression from published datasets. Dr. Kelly Ruggles will take the case study of breast cancer and the six most common oncogenic mutations in clinical conditions. She will show how one can access the published dataset and understand the correlation between various genes and their mutations. So, let us welcome Dr. Kelly Ruggles for her lecture today.
You should have this slide. Oh, that is the about page. So, the first step is to go to cbioportal.org, so everyone go there, and then you are going to want to select a specific dataset. It is called the breast invasive carcinoma TCGA provisional dataset. There should be something that looks kind of like this; you have to scroll down a bit. Also make sure you click on the mRNA expression z-score option. There is a sub-panel at the bottom that has mutations and copy number, and we want to look at mRNA expression in addition to these. Make sure you click on the TCGA provisional study and not one of the other ones, because our data is not in the other ones, or it is not easy to pull out. Once you go down a bit, you will see select patient case set. Did everyone see that? Go down. Yes, select patient case set, and you want to select protein quantification mass spec. There should be 77 samples that you see.
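The mRNA expression z-score option mentioned here reports each sample's expression relative to a reference distribution. A minimal sketch of that idea follows, assuming the reference is simply every sample in the list (the portal's actual reference population and normalization may differ) and using made-up expression values:

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (x - mean) / sd for each value, using the population sd."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(x - mu) / sd for x in values]

# Hypothetical ERBB2 expression values for five samples (not real data).
erbb2_expression = [2.0, 4.0, 4.0, 4.0, 6.0]
erbb2_z = z_scores(erbb2_expression)
print([round(z, 3) for z in erbb2_z])
```

A sample with a z-score near 0 sits at the reference average, while large positive or negative values flag samples whose expression is unusually high or low for that gene.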
And then there are six genes you will put in the gene section: TP53, PIK3CA, GATA3, ESR1, PGR and ERBB2. Yes, you type in the gene symbols, just these. We want those six in that field.
Okay, so we are going to start over for anyone who still hasn't found it. Yes, I think we can pause and see. Is everyone able to get to the cBioPortal page? Is anyone having a problem? No, don't change anything.
Okay, so here is the page. If you go down, you click this breast invasive carcinoma study, and then there is the select genomic profiles section, where you click on mRNA expression, the second mRNA expression option, which is the RNA-seq. For the patient case set, you want to find the one that says protein quantification mass spec. These are the 77 samples we used for the CPTAC proteomics analysis. And then we are going to type in the genes. These six genes are the same genes that we focused on in Figure 1B, which is why I wanted us to look at them in a little more detail. So that is TP53, PIK3CA, GATA3, ESR1, PGR and ERBB2. You should be able to enter all of those and submit the query. Oh, the last genes: ESR1, PGR, ERBB2. They're up here now, too, if you need them. And you should get this. So how many people were able to get to this? Awesome. Killing it, guys, great.
So what this shows you is the mutation status. Each of these columns is a sample, and each row shows the mutation status, based on this legend here, for each of the different genes. We don't have mRNA on here yet, we just have our mutations. So what we want to do is go to heat map. Find this heat map menu, click on it, and choose add genes to heat map, and it'll give you the RNA expression for each of these genes below the mutation status.
One step behind, how did we get it?
Heat map: click on heat map, on this arrow here, and then click on add genes to heat map, and it will add them at the bottom. Then we just have to do one more thing, and after that it's all exploring, clicking things and looking at plots. We're going to add a clinical track. The clinical tracks add information such as ER/PR status, HER2 status or survival; anything that's in the clinical data you add with a clinical track. I'm just going to have you add one clinical track. Let's do HER2. So in the clinical track box, type in IHC HER2, and it'll come up, and you can click on it, and it'll add that track to your data here. Is anyone having issues? Okay, good, great.
What you can do here is sort. If you click on the three dots next to IHC HER2, you can sort; if you sort A to Z, it will sort the samples by their HER2 level. HER2 is one of the PAM50 genes, it's one of the genes that's used for prognosis, and it's associated with the HER2 subtype. What I wanted to show here is that, if you pull up Figure 1B from Mertins et al. that we had, you could see that ERBB2 and the HER2 subtype, which is these reds, were very much associated, which makes sense, because ERBB2 is the gene that encodes the HER2 protein. So just by using this website, you can get plots very similar to the ones that we actually showed in that manuscript.
The percentage of all those mutations, is it the same as what you have?
Yeah.
But if you see the 27%, the map is different from what you have. If you see the 27, the last...
Yeah, you have 27, right?
Yeah, but if you see it here...
Oh, it's because I sorted.
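Sorting a clinical track A to Z reorders the sample columns by that track's value, and every other row follows the same column order, which is why a sorted view looks different from an unsorted one even though the data are identical. A rough sketch of that behavior, using made-up sample IDs and values (the field names and data are illustrative assumptions, not cBioPortal's implementation):

```python
# Each sample carries a clinical value and an expression value (all hypothetical).
samples = [
    {"id": "S1", "ihc_her2": "Positive", "ERBB2_mrna": 3.1},
    {"id": "S2", "ihc_her2": "Negative", "ERBB2_mrna": -0.4},
    {"id": "S3", "ihc_her2": "Equivocal", "ERBB2_mrna": 0.2},
    {"id": "S4", "ihc_her2": "Negative", "ERBB2_mrna": -0.9},
]

def sort_by_track(samples, track):
    """Return samples ordered A to Z by the given track; ties keep input order."""
    return sorted(samples, key=lambda s: s[track])

ordered = sort_by_track(samples, "ihc_her2")
print([s["id"] for s in ordered])          # column order after sorting
print([s["ERBB2_mrna"] for s in ordered])  # expression row follows the same order
```

Because the sort is stable, samples sharing the same HER2 call stay in their original relative order; only the column arrangement changes, not the per-gene mutation percentages.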
So to sort, go to the three dots next to IHC HER2, click here, on the three dots next to it, and you can sort A to Z.
Okay, so once we have all of this, what I wanted you to look at is the plots tab, right here. There you can play around with how the copy number of one gene affects the mRNA of another gene, or how the mutation status of one gene impacts the mRNA. For example, if you look at mutation of TP53 and mRNA of TP53, you see that whether there's a missense mutation, a truncating mutation or wild type changes the expression of TP53. So if you've gotten to this point, play around with this data so you can see how one level of omics data impacts another level of omics data for the same gene, and then between genes. Just spend some time playing around with that.
So the picture which I'm getting is different from the picture which you had shown.
Oh, it's just because I sorted it, I'll show you. If you go here and add in the clinical track IHC HER2, it adds the HER2 status.
One minute. So first we selected the proteomics data with the 77-sample set, and now we are selecting only those for whom the IHC was done?
Well, this is actually just showing what the results of the IHC are, so you can see whether HER2 is up or down in those samples, and then we can sort on it. I'll show you. If you click here, sort A to Z, then it sorts it.
Okay. Another thing: what do you mean by truncated mutation? Because here I can see many.
So truncating means that a premature stop codon was introduced, so it made the protein shorter.
Okay. No, that's truncated. I know, but it is showing only in the sample. Weird.
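The mutation-versus-mRNA comparison described for the plots tab amounts to grouping samples by mutation class and summarizing expression within each group. A small sketch under stated assumptions: the class labels mirror the ones named in the lecture, and the values are invented solely to exercise the code, so they imply no biological result.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (mutation class, TP53 mRNA z-score) pairs; not real data.
observations = [
    ("wild type", 0.3), ("wild type", 0.1), ("wild type", 0.5),
    ("missense", 0.9), ("missense", 1.1),
    ("truncating", -1.4), ("truncating", -1.0),
]

def median_by_group(pairs):
    """Group values by label and return the median of each group."""
    groups = defaultdict(list)
    for label, value in pairs:
        groups[label].append(value)
    return {label: median(values) for label, values in groups.items()}

summary = median_by_group(observations)
for label, value in summary.items():
    print(f"{label}: median z-score {value:.2f}")
```

The same grouping works for copy-number categories or a clinical attribute in place of mutation class; only the labels change.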
Oh, so each column is a sample. Yeah, so there are a lot of truncating mutations associated with ERBB2.
Let's start over. Okay, so if you're starting from cBioPortal and nothing worked... oh no, it's kept my cBioPortal. Here we go. Okay. So you're starting from the home site. You're able to see select studies here, and you can scroll down and see all these different cancer types, right? So scroll down to get to breast. There should be an invasive breast carcinoma header, under which you'll see a whole bunch of different studies. At this point you can click on the breast invasive carcinoma, the TCGA provisional one. Has everyone who was not able to find that before found it now? Okay, so that's good. We found our study.
Now you can move down and look at the genomic profiles. We have mutations and copy number, and then add in the mRNA expression, the RNA-seq, which is the second one here. Don't change anything with the threshold, it's fine. For the patient case set, you'll see different patient case sets, and we're going to pick the protein quantification mass spec case set with 77 samples. Then we're going to enter certain genes, which are the following: TP53, PIK3CA, GATA3, ESR1, PGR and ERBB2. Submit the query at this point.
Okay, so now you should see this. Great. Again, each column is a patient, and it shows the mutation status for each of the genes based on the color. You can see that there's a key here. The other thing we wanted to do is add in the mRNA expression underneath as a heat map. So you go to this heat map tab and click add genes to heat map, and it should populate that. Yes. Good. Were you able to get this? You weren't. Okay. For some reason it doesn't, yeah?
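The query built in these steps has four parts: a study, a set of genomic profiles, a case set and a gene list. The sketch below records them as a plain data structure and normalizes the typed gene list. The dictionary keys and the `parse_gene_list` helper are hypothetical names chosen for illustration; they are not portal parameters or part of any cBioPortal API.

```python
import re

def parse_gene_list(text):
    """Split free-text input on whitespace or commas; uppercase and de-duplicate."""
    genes = []
    for token in re.split(r"[\s,]+", text.strip()):
        symbol = token.upper()
        if symbol and symbol not in genes:
            genes.append(symbol)
    return genes

# Hypothetical summary of the query; keys are descriptive labels only.
query = {
    "study": "breast invasive carcinoma, TCGA provisional",
    "genomic_profiles": ["mutations", "copy number", "mRNA expression (RNA-seq)"],
    "case_set": "protein quantification mass spec, 77 samples",
    "genes": parse_gene_list("TP53 PIK3CA GATA3 ESR1 PGR ERBB2"),
}
print(query["genes"])
```

Writing the selections down this way makes it easier to check that everyone in a session has submitted the same query before comparing plots.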
You can put your own data on here as well?
Yes. Okay, so now you should have your heat map. There's one other thing to do, and that's to add a clinical track. What you do here is type in IHC HER2, and that should add another row of clinical data, which indicates what level the HER2 expression was at when they did the immunohistochemistry. You can sort any of these. If you click on these three dots, you can sort the track. If you want to sort by HER2, you can do that. If you want to sort by TP53 expression, you can sort by that. If you want to sort by ERBB2, you can sort by that. So you have a lot of control here if you want to play around with the data. You can pick any of these data sets, or all of them, and play around with the data quite a bit.
The other thing I wanted to point out to everyone is the plots tab. If you hit the plots tab, you're able to do more quantitative assessments. The heat maps were more qualitative: you can just see how everything is related by looking at them. But if you want to see how one level of omics impacts another level of omics, or how one gene impacts a second gene, you can come here. Here we have the copy number of TP53 versus the mRNA of TP53. So if there is a deletion, a normal copy number or an amplification, you can see how that impacts the mRNA expression. You can change this to mutation, and that's one of the ones I wanted you all to look at.
So I guess we could look at one specific thing within this, picking one of my examples. You can change the data type, so you could put in clinical attribute, for example. We put in the HER2 status as our clinical attribute on the last one, so you can choose IHC HER2 here, right? Then you get HER2 status against every different gene.
So if you do HER2 status, let's say with ERBB2, you have the clinical attribute IHC HER2, with negative and positive, and then you can look at the different levels of, let's say, GATA3, and see how it changes with HER2 status. I don't think we're going to have time to go through specific examples here, but I wanted you all to at least get to the point where you can play with the data within the portal, know that this data is available, and know that you can ask your own questions of the data, make some nice plots and do some actual statistics within the site. You can also do survival analysis. You can play around with this all day and really find some neat ways of looking at this data without having to generate your own data, which is nice. And you know how we put in those six genes? There's also a default where you just say highly mutated genes and it will pick them for you. So it depends on what you're looking for. If you're just exploring the data, you don't have to specify anything, but if you are really interested in a set of genes, you can look at all of these data sets and see if your gene is changed at certain omics levels. I think they did a really good job. I had nothing to do with this, so I'm not patting myself on the back.
So yeah, I'd have to look and play around with it. All of the data I've ever needed they've already put on there for me, but I know that you can upload your own data. I can look into that and then we can talk about it. Yeah, I know that you can. It's probably going to be expression-level data, VCF files and the like, so really processed data, not raw data.
But the mass spec data for the samples is also connected?
The mass spec data, yeah, it should be in there as well.
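Comparing a gene's expression between HER2-negative and HER2-positive samples, as in the clinical-attribute plot discussed in the lecture, can be backed by a simple test. The sketch below runs an exact two-sided permutation test on the difference in group means using invented values. It is one possible choice of statistic, offered as an illustration rather than a description of what the portal computes, and it enumerates every relabeling, so it only suits small groups.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    extreme = 0
    # Enumerate every way to relabel n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical GATA3 mRNA z-scores by IHC HER2 status (not real data).
her2_negative = [0.8, 1.1, 0.9, 1.3]
her2_positive = [-0.5, -0.2, -0.9, -0.4]
p = permutation_p_value(her2_negative, her2_positive)
print(f"p = {p:.4f}")
```

With the 77 samples used in the lecture, full enumeration is impractical, so in practice one would sample random relabelings or use a standard rank-based test from a statistics library.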
I didn't want to go into that because we haven't done proteomics yet, but yeah.
I have samples from responders, non-responders, with survival and so on. I want to draw this graph, so I would like to know what type of data format is needed.
There are probably easier ways of doing that than uploading it to this. I can send those to you, they exist. This might be more trouble than it's worth.
So, in conclusion, today you have learned how one can study the mutation patterns in various cancer types in the global population. There are online tools and a lot of information available which can be leveraged to first get a very good idea about the possible mutations for a given tumor type. In the next lecture, we are going to have another speaker, Dr. Bing Zhang, who will talk about the correlation of variations in genotype, gene expression and phenotype. Thank you.