Welcome to the MOOC course on Introduction to Proteogenomics. In the last lecture, Dr. Bing Zhang gave you a lucid explanation of how polymorphisms are studied. eQTL studies help us understand the effect of a genetic variant on another gene located on the same or a different chromosome. GWAS studies help identify whether a particular genetic variant, out of the entire genome, can be correlated with a variation in the phenotype. In that lecture, you were also introduced to single nucleotide polymorphisms, or SNPs, and genome-wide association studies, or GWAS. Today's lecture from Dr. Bing Zhang will be another effort to explain the power of integrating expression quantitative trait loci studies, or eQTL studies, with GWAS. So, let us welcome Dr. Bing Zhang for today's lecture.
So, let us look at some examples. In the first example, this group knew that this SNP is associated with hemoglobin concentration in blood cells, but they did not know why this happens, and of course we are interested in how this could happen. So they did a cis-eQTL analysis, and through the cis-eQTL analysis they found that this exact SNP has a good association with the expression of the gene right next to the SNP. The gene is called SMIM1.
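The cis-eQTL test described here can be sketched as a simple linear regression of a gene's expression on the allele dosage of a SNP. The following is a minimal Python sketch under stated assumptions, not the method used in the study: the function name `eqtl_association` and the toy dosage and expression values are hypothetical, and a real analysis would add covariates, a p-value and multiple-testing correction.

```python
import math


def eqtl_association(dosages, expression):
    """Least-squares fit of expression on allele dosage (0, 1 or 2).

    Returns the slope (expression change per allele copy), the Pearson
    correlation r, and the t statistic for slope == 0 (n - 2 d.o.f.).
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired samples")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variation")
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) < 1:
        t_stat = r * math.sqrt((n - 2) / (1 - r * r))
    else:
        t_stat = math.copysign(math.inf, r)
    return {"slope": slope, "r": r, "t": t_stat}


# Hypothetical toy data: 9 individuals, minor-allele dosage at one SNP and
# normalized expression of the neighbouring gene (values are made up).
dosages = [0, 0, 0, 1, 1, 1, 2, 2, 2]
expression = [5.1, 4.9, 5.0, 4.1, 3.9, 4.0, 3.0, 3.2, 2.8]

result = eqtl_association(dosages, expression)
print(result)
```

In this toy data each extra copy of the allele lowers expression by about one unit, so the slope is negative and the correlation is strong. A gene near the SNP with this kind of pattern is what the lecture calls a cis-eQTL.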
So then you can generate a hypothesis that the SNP probably affects hemoglobin concentration through the expression of this gene. But that is just a hypothesis, and at that time nothing was known about this gene. What they did was some gene network analysis, which I am going to talk about in the next lecture, and they found that this gene is particularly associated with the hemoglobin metabolism genes in the network, indicating it might have a similar function in that process. Then, through functional experiments in humans and also in model organisms, including deletion experiments, they demonstrated a causal relationship between the expression of this gene and the phenotype you are interested in. So this shows the power of doing cis-eQTL analysis.
We can also think about trans-eQTL analysis. Let us say this SNP is associated with systemic lupus erythematosus, or SLE, but at that time nobody knew how this SNP is associated with the disease. Through eQTL mapping they found that this SNP is associated with multiple genes: with decreased expression of one gene, C1QB, and also with increased expression of multiple genes involved in the type 1 interferon response pathway. So this helped them think that maybe this SNP has its effect by altering the expression of these genes. And the interesting thing is that decreased C1QB and an increased type 1 interferon response are hallmarks of the disease; it was already known that the disease has this phenotype.
But now we know which genes are mediating this impact.
Similarly, through traditional GWAS analysis, multiple SNPs have been associated with type 1 diabetes, but again we do not know how these SNPs exert their effect. Through eQTL mapping, this group found that although these SNPs are located at different locations in the genome, they converge: they alter, or control, the expression of the same genes. That means different genotype alterations converge on the same gene expression alteration and eventually change the phenotype.
So these examples basically show how you can integrate GWAS analysis, by getting the DNA sequence, with, for example, RNA-seq to get the gene expression, and combine them with the phenotype. Then you will be able to not only associate the SNPs with the disease phenotype, but also know which gene expression changes are involved.
You would think that protein expression is also important, and we now know that protein abundance does not necessarily correlate perfectly with gene expression. If you do this at the protein level, you will probably also get additional new insights. But because the technologies are less mature, or lag behind RNA-seq, there are relatively few pQTL types of analysis. For this audience, I think we should think about this approach in order to integrate GWAS studies with, for example, pQTL studies, and actually some groups have started to do this. In this study, they looked at 75 lymphoblastoid cell lines, and they did genome-wide genotyping, RNA-seq, ribosome profiling through ribo-seq, and SILAC proteomics experiments. So now you have the genotype, the mRNA expression, the ribosome occupancy and the protein expression.
Then they focused only on cis-eQTLs, rQTLs and pQTLs, and they found hundreds, and sometimes thousands, of these cis-QTLs. Then they asked: if I find a cis-eQTL, what is the likelihood that I am going to find the same rQTL and pQTL, meaning the SNP controls the mRNA expression, and also the ribosome occupancy and the protein expression, so that the effect is consistent? If you look at the RNA, of course RNA to RNA is 1, but only 88 percent of the eQTLs can be replicated in the ribo-seq experiment, at the rQTL level, and only 67 percent can be replicated at the protein level, as pQTLs. So this basically indicates that not all eQTL effects are reproducible at the pQTL level. It is the same the other way: if you have a protein QTL, only 35 percent of them can be observed as eQTLs. That is consistent with our observation that mRNA and protein expression are not perfectly coordinated. But it also indicates that a SNP might have an effect only at the protein level, and not necessarily at the RNA level. This could happen, for example, if the SNP is in a region that controls translation of the protein rather than transcription of the gene.
We can look at these examples. In this case, we can see the effect is consistent at the RNA level, the ribosome occupancy level and the protein level: the three genotypes are shown in three different colors, and the difference can be observed at all three levels. But in this case, we can see the effect is only observable at the protein level, and not at the ribosome and RNA levels.
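The protein-only case just described can be illustrated by testing the same SNP against each molecular layer in turn. This is a minimal Python sketch on made-up numbers, not data or code from the study: the layer names, the values and the `CUTOFF` on the absolute Pearson correlation are all assumptions, and the cutoff is only a stand-in for a proper significance test.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


# Hypothetical allele dosages for one SNP in 9 cell lines.
genotype = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Hypothetical measurements of one gene at three layers (made-up values).
# Only the protein level changes with genotype.
layers = {
    "mRNA": [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 4.9, 5.0, 5.1],
    "ribosome_occupancy": [3.0, 3.1, 2.9, 3.1, 3.0, 2.9, 3.0, 2.9, 3.1],
    "protein": [2.0, 2.2, 1.8, 3.0, 3.1, 2.9, 4.1, 3.9, 4.0],
}

CUTOFF = 0.8  # illustrative threshold on |r|, not a value from the lecture

detected = {
    name: abs(pearson_r(genotype, values)) >= CUTOFF
    for name, values in layers.items()
}
print(detected)
```

With these toy values the SNP would be called a pQTL but neither an eQTL nor an rQTL, which is the pattern shown in the second example.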
That means this SNP is affecting the translation of this protein, or maybe the stability of this protein, without affecting its RNA and ribosome occupancy. So this indicates that if we do both eQTL analysis and pQTL analysis, it will give us more information than just doing the eQTL analysis. But in this study in particular, they only looked at cis-QTLs; they did not look at trans-QTLs because, as I said, when your sample size is small, trans-eQTLs are more difficult to observe, since the effects are relatively smaller.
Finally, I want to show one more example. What we have talked about so far is all about SNPs, which are germline alterations. Because this audience is interested in cancer, we can borrow the same idea and apply it to cancer studies. In this case we did not look at SNPs; we looked at somatic copy number alterations, because in cancer a lot of chromosome regions get amplified or deleted. We can consider that as the genotype change, and then we want to ask whether the genotype change will affect the mRNA expression and the protein expression of that gene, or maybe other genes in the genome. For this we did an analysis in 90 colorectal tumors. We calculated the copy number alteration for individual genes based on the SNP array data, and then we got the RNA-seq data.
So basically we have the mRNA abundance for each gene. We also did label-free shotgun proteomics in this study and got the protein abundance for the genes. We focused only on the genes that have both mRNA and protein measurements, which corresponded to close to 4,000 genes at that time, but for copy number we have data for 23,000 genes. Now, for each gene's copy number and all the mRNAs, we can calculate the association. In this case the genotype is also continuous, because it is a copy number measurement, so we can calculate the Pearson correlation between the genotype and the mRNA expression. If there is a significant positive correlation, we put a red dot here, which means the copy number change of this gene will affect the mRNA abundance of that gene. If there is a significant negative association, we put a green dot, and if there is no significant association we leave it black. This is the plot we get from this analysis.
The interesting thing is that we observed two types of patterns. One is at the diagonal: we see a very strong diagonal pattern. The other type of pattern is these stripes, the vertical lines. Let us use this locus as an example. If copy number amplification increases the mRNA expression of the gene in that locus, you are going to have a significant correlation between the copy number change and the mRNA expression change, and you are going to get a red dot. Because here all the genes are ordered by chromosomal location, and here they are ordered by the same chromosomal location, if it is a cis effect you are going to see a red dot here, on the diagonal. But let us say this gene is a transcription factor. DNA amplification of the transcription factor not only causes higher abundance of that gene, but the transcription factor will in turn activate or deactivate a lot of other genes. So then you also see a genome-wide effect of that DNA copy number change.
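The copy number versus mRNA correlation plot described above can be sketched as a loop over every (locus, target gene) pair, separating diagonal (cis) from off-diagonal (trans) hits. This is a minimal Python sketch under stated assumptions, not the analysis code from the study: the gene names `TF`, `GENE_B` and `GENE_C`, the toy values for 8 tumors, and the `cutoff` on the absolute correlation (standing in for a significance test with multiple-testing correction) are all hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def classify_effects(copy_number, mrna, gene_order, cutoff=0.8):
    """Return (cis, trans) lists of (locus, target, sign) tuples.

    sign is +1 for a positive correlation (red dot) and -1 for a negative
    one (green dot); pairs below the cutoff are skipped (black).
    """
    cis, trans = [], []
    for locus in gene_order:
        for target in gene_order:
            r = pearson_r(copy_number[locus], mrna[target])
            if abs(r) < cutoff:
                continue
            hit = (locus, target, 1 if r > 0 else -1)
            (cis if locus == target else trans).append(hit)
    return cis, trans


# Genes in chromosomal order; all values below are made up for illustration.
gene_order = ["TF", "GENE_B", "GENE_C"]
copy_number = {
    "TF": [2, 2, 2, 2, 3, 3, 4, 4],      # amplified in some tumors
    "GENE_B": [2, 3, 2, 3, 2, 3, 2, 3],
    "GENE_C": [2, 2, 3, 3, 2, 2, 3, 3],
}
mrna = {
    "TF": [5.0, 5.1, 4.9, 5.0, 6.0, 6.1, 7.0, 6.9],      # follows its own copy number
    "GENE_B": [4.0, 4.1, 3.9, 4.0, 3.0, 3.1, 2.0, 2.1],  # repressed when TF is amplified
    "GENE_C": [3.0, 2.9, 3.1, 3.0, 4.1, 3.9, 5.0, 5.0],  # activated when TF is amplified
}

cis_hits, trans_hits = classify_effects(copy_number, mrna, gene_order)
print("cis:", cis_hits)
print("trans:", trans_hits)
```

In this toy example the amplified transcription factor gives one hit on the diagonal and two off-diagonal hits in the same column, a small version of the diagonal pattern and the vertical stripe in the plot.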
So we call this the cis band, on the diagonal, because it is the copy number change that alters the mRNA abundance of the same gene. But we also see these genome-wide vertical bands. These are the trans bands, meaning a copy number change at this position may affect a lot of other genes in the genome.
Then we also looked at the protein data. Here is the correlation between copy number and protein. We can see both the cis and the trans bands getting weaker; we get similar patterns, but weaker. This indicates that not all of the impact at the mRNA level is carried over to the protein level. This is a kind of dampening, meaning there is a reduction of the effect. For example, if copy number amplification gives you more mRNA of a certain gene, but this gene does not give the cancer cell a growth advantage, the protein is not needed, and then the cell does not need to make that extra protein. That means there is additional regulation at that level that removes the effects that are not necessary. We also sometimes see a copy number correlation that is only observed at the protein level, and not at the corresponding mRNA level. This indicates that a copy number change might affect only the protein, especially for trans effects.
This is very helpful, because now we can look at this plot and say, OK, there are certain chromosome regions that have a particularly strong impact at the global level. For example, this 20q region seems to have a big impact genome-wide for both mRNA and protein, so there might be some interesting genes in that chromosome region. We can also look at the trans effects or the cis effects. Here, the large circle indicates all the protein-coding genes in this 20q amplification region for which we have both mRNA and protein measurements. This one indicates the genes with good copy number and mRNA correlation, and this one indicates the genes with good copy number and protein correlation.
So this helps us narrow down to the genes where the copy number increases not only the mRNA level but also the protein level, and these are very likely the cancer driver genes in the region. For example, through this approach we were able to identify SRC, which is a very well established oncogene in colon cancer, and TOMM34, which has also been reported. We were also able to identify a new candidate driver, HNF4A, which could be a new discovery to be tested in the future.
So, just to give a quick summary of the talk. We talked about genotype and phenotype, and about using association studies to understand, or test, the relationship between them. Depending on whether it is a binary trait or a quantitative trait, we have to use different types of statistical tests. If we do this at a genome-wide scale, it is called a GWAS study, and we can use a Manhattan plot to visualize the results. If we treat gene expression as a quantitative trait, then we can do eQTL, rQTL or pQTL types of analysis, and these QTLs can be divided into cis-QTLs and trans-QTLs based on the positional relationship between the genotype, the SNP, and the genes. We showed a few examples of how you can integrate GWAS with eQTL, rQTL and pQTL analysis, or copy number with mRNA and protein expression, to understand the relationship between genotype, gene expression and phenotype. Maybe we can have one or two questions, or maybe we can discuss during lunch.
Copy number analysis, please.
Yes.
In the last column, could it be the mRNA degradation? The mRNA is not getting degraded, and that is one of the reasons [inaudible].
I think, yes, it could be. But if you think about RNA-seq, it is actually measuring the steady-state mRNA abundance.
So I think the degraded RNA will not be measured by RNA-seq. If you consider that the measurement is at the steady state, that has already been taken care of. Of course, it is dynamic, so it might partially reflect that effect, but I would think that most of the mRNA measurements we have are for the steady state, so both RNA generation and degradation are already incorporated in that measurement. But also, this is not the abundance of the mRNA or protein; it is the association between the copy number and the mRNA or protein. I think the discrepancy is more likely to be caused by other things, like translation or protein degradation, the half-life of the proteins.
Today's lecture broadly explained the use of various integrative analyses to understand disease pathobiology, using examples from the published literature. Cis-eQTL and trans-eQTL mapping integrated with GWAS studies were shown to link gene expression with phenotype. Additionally, the integrative analysis showed that somatic copy number alterations can be directly correlated with mRNA abundance. In the next lecture, you will be introduced to next-generation sequencing technology and its applications by an industry application scientist. Thank you.