Welcome to the MOOC course on Introduction to Proteogenomics. In today's lecture, which is the second lecture by Dr. Fredrik Ponten, we are going to continue the discussion about the Human Protein Atlas (HPA) project. Dr. Ponten will talk about two other domains of HPA, the cell atlas and the pathology atlas. The cell atlas provides high-resolution insights into the spatiotemporal distribution of proteins within human cells, whereas the pathology atlas contains mRNA and protein expression data from 17 different forms of human cancer. He will also tell us about the integrated omics aspect of each of the 17 cancer types, with mRNA and protein expression data including the genes associated with prognosis. He will also talk about protein localization data, which is a part of the cell atlas, and how it is derived from more than 60 different cell lines. Dr. Ponten will further talk about the brain atlas project, which will give information on region-specific gene expression in human, pig and mouse. Finally, he will talk about the blood atlas and how the genes of blood cell types are related to each other. So, let us welcome Professor Ponten for today's lecture.
So the next part is the cell atlas, or the subcellular atlas, and this is based on immunofluorescence. Here we use RNA-seq to select the cell line which has the highest expression of each gene, and we also use the U2OS cell line for every gene, just to have a standard throughout all antibodies and genes. This is also manual annotation at this point, but we are working on more automated annotation. And of course, these immunofluorescence images, I think everybody agrees, are extremely beautiful, artistic images, but they are also very informative, and I will just show you two examples, transcription factor.
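The cell line selection described above can be sketched in a few lines of Python. This is only an illustration, not the HPA pipeline: the function name `pick_cell_lines` and the expression values are made up, and the only detail taken from the lecture is that the highest-expressing cell line is chosen per gene and U2OS is always included as a standard.

```python
def pick_cell_lines(expression, standard="U2OS"):
    """Choose cell lines for staining one gene.

    expression: dict mapping cell line name -> RNA-seq expression of the gene
                (hypothetical values, any consistent unit).
    Returns the highest-expressing cell line plus the fixed standard line.
    """
    best = max(expression, key=expression.get)
    # Avoid listing the standard twice when it is also the best line.
    return [best] if best == standard else [best, standard]


# Made-up expression values for a single gene.
gene_expression = {"A-431": 12.0, "U2OS": 3.5, "HEK 293": 30.1}
print(pick_cell_lines(gene_expression))  # ['HEK 293', 'U2OS']
```

Keeping one fixed cell line for every gene gives a common reference across all antibodies, while the second line is chosen where the target is most likely to be detectable.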
Also for the cell atlas, we have produced these knowledge-based chapters where you can read about the cell line transcriptomics, the different organelle proteomes, and the multi-localizing proteins, which are extremely interesting: proteins which can be in both, for instance, the Golgi and maybe the ribosomes, etc., about which a lot more research needs to be done. And also about the cell-to-cell variation, the proteins that vary during the cell cycle or show other forms of variability, also very interesting proteins. And also here we were successful and had a paper in Science and also a poster in Science, etc. And these posters, if you're interested, you can just contact me and I'll send them to you, because we have lots of them left. And the third part then is the pathology atlas, and that's what has, in a sense, been closest to me, since I am a pathologist. And here we started off with having millions and millions of images of immunohistochemically stained cancer, but not any real data, more examples of how all our different proteins are expressed in cancer. And what we had were tissue microarrays that included 12 individually different tumors for each tumor type. So 12 breast cancers, 12 colon cancers, 12 gliomas, etc., trying to include the prototypic types of tumors. So for breast cancers we had mostly ductal cancers, but we also had a few lobular cancers, etc.; for gliomas, high grade and low grade, etc. But no real data, just examples. Of course it shows you, like I showed you for SATB2, that if a protein is expressed in very many colon cancers and no other cancers, it could be a good biomarker for colon cancer. But we realized we had learned so much from the efforts using RNA sequencing. So we thought we have to do something for cancer based on RNA-seq, and you know about The Cancer Genome Atlas. So we went and took all the data from The Cancer Genome Atlas and massaged it.
And what we did then: we defined the cancer types where there were at least 100 patients with full clinical follow-up data and where we had RNA-seq data. And that turned out to be 8,000 patients. And our question was: which genes correlate to patient survival for these 17 different cancer types? And an enormous amount of bioinformatics work was done during about five or six months, headed by Adil, who's a great computer scientist, and his group of bioinformaticians. They spent lots of time at the National Supercomputer Center and had a systems biology approach. All the data was already available, but it wasn't put together. So we massaged it, or they massaged it. And I'll show you just two or three slides summarizing what we learned from that effort. And all this data is, of course, available on the protein atlas, in the pathology part. So one of the more interesting things was this: this is all the 8,000 patients, and this is all the 20,000 genes and how they are expressed, in a PCA plot. And you probably can't see the 17 different color codes. But what you can see is that the cancers form clouds which are highly overlapping. And what this means is, of course, that if I have a prostate cancer, that can be more similar to my wife's breast cancer than to another man's prostate cancer. And so we start to think that cancer might not be so organ-specific. Of course, for surgery, the organ is very important. There's a very big difference. But maybe when it comes to more biological approaches to treat cancer and so on, one has to think of other ways of classifying cancer, not just based on anatomy. One type of cancer stands out pretty much. This is hepatocellular carcinoma, liver cancer, a special type of cancer. Glioma stands out a bit, too, but not as much as one would expect. But so many of the cancers come together. If one looks at means, this is on the mean instead of looking at all the 8,000 patients, you can actually see some clusters.
Here are the gastrointestinal tract tumors. Here are the hormone-driven cancers, breast cancer, endometrial cancer, ovarian cancer, which are also pretty close to prostate cancer. Here you have glioblastoma. You see the cancers which have quite a lot of influence of squamous cell carcinoma, et cetera. So actually, the different tumors do separate. But if you look at individual tumors, they're very diffuse clouds, actually. What we also saw was that for many of our genes, a much higher proportion than thought before, the expression level did correlate to patient survival. And there were quite a lot of genes that we call favorable or good genes, where high expression was associated with prolonged survival, and many genes where high expression was associated with poor survival: about three, four, five, six hundred such favorable and unfavorable genes for all the different tumor types. And if you do these Kaplan-Meier analyses, one could find very highly statistically significant separation between high expression and low expression. What about the function of these genes? This is generalizing very much, I know that. But if you look at the unfavorable genes, what do they encode? What type of proteins? Well, they code for proteins which are involved in cell cycle regulation, cell cycle progression, cell growth. So it is very logical that that would correlate to poor prognosis. While the proteins where high expression is correlated to good prognosis are proteins involved in cellular differentiation and immune response. Also quite expected from the pathology point of view, but never before shown on this transcriptomic, global view of all our genes.
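To make the idea of favorable and unfavorable genes concrete, here is a minimal Python sketch of a Kaplan-Meier estimate and a two-group log-rank test, with patients split at the median expression. It is an illustration under assumptions, not the analysis used for the atlas: the function names, the toy data, and the significance threshold `alpha` are hypothetical.

```python
import math
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each death time.

    events[i] is 1 if patient i died at times[i], 0 if censored (still alive).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    Returns (observed minus expected deaths in group a, p-value).
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    death_times = sorted({t for t, e in zip(all_times, all_events) if e})
    excess = 0.0
    variance = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        excess += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return excess, 1.0
    chi2 = excess ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return excess, math.erfc(math.sqrt(chi2 / 2))


def classify_gene(expression, times, events, alpha=0.05):
    """Label a gene by comparing survival above and below median expression.

    alpha is a hypothetical threshold chosen for this toy example.
    """
    cutoff = median(expression)
    high = [i for i, x in enumerate(expression) if x > cutoff]
    low = [i for i, x in enumerate(expression) if x <= cutoff]
    excess, p = logrank(
        [times[i] for i in high], [events[i] for i in high],
        [times[i] for i in low], [events[i] for i in low],
    )
    if p >= alpha:
        return "not prognostic", p
    # More deaths than expected in the high-expression group: unfavorable.
    return ("unfavorable" if excess > 0 else "favorable"), p


# Toy cohort: the four patients with the highest expression die earliest.
expression = [1, 2, 3, 4, 5, 6, 7, 8]
years = [10, 9, 8, 7, 1, 2, 3, 4]
died = [1, 1, 1, 1, 1, 1, 1, 1]
print(classify_gene(expression, years, died))
```

In practice one would use an established survival analysis library and correct for testing thousands of genes; the sketch only shows how a high-versus-low split turns into a favorable or unfavorable label.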
And lastly, from this paper, which was also published in Science: what is extremely important when you do such a big discovery effort, and you all know this about false discovery rates and everything, is that you have to validate your results in totally independent cohorts, preferably with independent methods. So this is lung cancer, and this is our primary data from the TCGA, and then we had our own cohort of 400 lung cancer patients. And only the candidate proteins that stand the test in a totally independent cohort are the ones that could actually be clinically interesting and relevant. And we could also show that that was true on the protein level by using tissue microarrays from tumors in large patient cohorts. Okay, so I will just show you a few last slides from the newest version of the pathology atlas. It's now called 18.1; we did an update just two months ago, where we introduced something we call survival scatter plots. And I'll show you the views from the pathology atlas. So the pathology atlas contains about 900,000 Kaplan-Meier curves, patient survival curves, for all of these 17 cancer types and all genes where you have a highly significant difference in survival if you have high or low expression, with the median as cutoff. So these are shown for all genes. And then of course we show the TCGA data, the raw data from the RNA sequencing. And when you go further into the pathology atlas, you can see the protein data, the annotation data done here in India, and then some background on the different genes. What we then introduced was these survival scatter plots. And I'll come back to that in just a slide or two. But this is exactly the same data as this. And of course, for those who work with oncology, everybody wants to make a Kaplan-Meier curve that looks like this, that has a high level of separation and a p-value that's below 0.000000.
And of course this looks very seductive, while this just looks like a scatter plot where you can barely see that this is a highly significant gene. And this is, by the way, Ki67, the most used biomarker for proliferation in cancer. And this is in pancreatic cancer. So I'll come back to that in just a second. And we also show in the pathology atlas, of course, all the immunohistochemical data, where you can go to the full-resolution image. And this is pancreatic cancer and Ki67. This is then the Kaplan-Meier curve where the cutoff is not the median, but is set at the best, highest significance, at the lowest p-value. This is exactly the same data as this. And I'll show you what this scatter plot is. So the scatter plot has living years on the x-axis. And then here you can see the level of RNA for Ki67 as an FPKM value. And it's two-dimensional, or three-dimensional in the sense that we have color-coded each patient, so that every blue dot here is a patient that is alive at the last time of follow-up. Every red dot is a patient that has died. So you can see here that this is a patient that is still living, does it say seven years after diagnosis? That is the wrong figure, of course. Three years after diagnosis. And this red one here is a patient that died after less than one year. You understand how the whole scatter plot works. And then we've just plotted in where the best separation is. And we have now developed this further. And this is why we updated the Atlas two months ago, by creating these summation curves and making it interactive. So you can put your own cutoff where you want to. And these summation curves show you all the blue values added up here and then just smoothed out between the different levels. So here you can see that if you have low levels of Ki67, the blue ones are the surviving patients.
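As a rough sketch of the data behind such a survival scatter plot: each patient is one point with follow-up years, an FPKM value, and an alive or dead status, and a user-chosen cutoff splits the points into low and high expression. The `Patient` class, the function names, and all numbers below are made up for illustration, and the smoothing used for the summation curves on the atlas is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    years: float  # follow-up time, the x-axis of the scatter plot
    fpkm: float   # RNA expression of the gene, the y-axis
    alive: bool   # True = blue dot (alive at last follow-up), False = red dot


def split_by_cutoff(patients, cutoff):
    """Count alive and dead patients below/at and above an expression cutoff."""
    summary = {"low": {"alive": 0, "dead": 0}, "high": {"alive": 0, "dead": 0}}
    for p in patients:
        side = "high" if p.fpkm > cutoff else "low"
        summary[side]["alive" if p.alive else "dead"] += 1
    return summary


def alive_fraction(counts):
    """Fraction of patients in one group who are alive at last follow-up."""
    total = counts["alive"] + counts["dead"]
    return counts["alive"] / total if total else float("nan")


# Hypothetical cohort; the cutoff is whatever the user chooses interactively.
cohort = [
    Patient(3.0, 2.1, True),
    Patient(2.5, 3.0, True),
    Patient(1.4, 2.8, False),
    Patient(0.8, 9.5, False),
    Patient(0.6, 12.0, False),
    Patient(1.9, 8.7, True),
]
summary = split_by_cutoff(cohort, cutoff=5.0)
for side in ("low", "high"):
    print(side, summary[side], round(alive_fraction(summary[side]), 2))
```

Moving the cutoff and recomputing these counts is the simplest version of the interactivity described in the lecture.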
So low FPKM values have a higher relative proportion of surviving patients compared to the ones with high proliferation, which you can see up here. So you can see that this is actually a very unfavorable biomarker. And it's difficult to look at the scatter plot; I think this curve makes it much easier to get a feeling for both the patient cohort and the power or strength of the biomarker. And here we made a summation of all the dead patients and then looked below and above the cutoff. So here you can see that the patients that have high Ki67, above the cutoff, live 0.9 years, while the ones with low proliferation live 1.3 years. And that's a nice assessment to get in one curve instead of just showing a Kaplan-Meier curve, which is really just fooling you that something looks very good. And I'll just show you this slide to give you a comparison. The only other continuous biomarker that we have in clinical medicine is tumor thickness for melanoma. And here you can see what these summation curves can look like for tumor thickness. So patients with thin melanomas live 5.4 years. Patients with thick melanomas live 2.5 years. You can see a lot of blue here when you have thin melanomas, and a lot of red when the melanomas are thicker. OK, so what I think I've shown you is that the human protein atlas, not least the pathology atlas, gives you basic pathology data from RNA sequencing and protein profiling data. I think it's a great starting point for clinical studies and for biomarker studies. I think there's really a lot of data there that can be utilized for many different forms of research projects. I'll close by telling you a little bit about how we're doing in the protein atlas. We're doing fine. This is the curve of visits. These are unique visitors. And these are visits to the human protein atlas. And we have about 300,000 visits per month to the human protein atlas.
Most are looking into the normal tissue atlas and looking at the data produced here in India. But also the pathology atlas and cell atlas are coming along quite nicely, with a lot of visits every month. The atlas is used all over the world. Dominating, of course, is the United States, highly dominating, followed by China, the UK, and Germany. But you can see that India is up here, where also Sweden, Canada, and some of the other countries are when it comes to using the human protein atlas. So I'll end with: where are we going? So this is the status today. And this is where we are at. And we're happy with that. But what are we doing now? So what will be the biggest update of the protein atlas so far will be in about four or five months from now, where we will introduce both a brain atlas and a blood atlas. And maybe later on in the fall, there will be a microbiome and a metabolic atlas introduced as well. And I'll just show one or two slides about these two atlases, the brain and the blood atlas. So the brain atlas will be using three different species. And this is new for the protein atlas. We will use both pig data, in a collaboration with BGI, the Beijing Genomics Institute, and also mouse data that we have on our own. And we will then try to identify which genes are specifically expressed in the brain, and which are region-specific, expressed in only the hippocampus or the amygdala or some other specific brain region. But we will also try to look at the species differences when it comes to overall gene expression in the brain. This is just the setup with what we're trying to cover with our data. And this is a little bit what the data will look like. So we will be able to show both protein and quantitative RNA measures from three different species. And I think this will be a good contribution for neurobiology research.
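One simple way to think about region-specific expression is a fold-change rule: a gene counts as enriched in a region if its expression there is several times higher than in every other region. The Python sketch below illustrates that idea only; the function name, the `fold` and `min_expr` thresholds, and the expression values are assumptions for the example and are not taken from the lecture.

```python
def region_enriched(expr_by_region, fold=4.0, min_expr=1.0):
    """Return the region where a gene is enriched, or None.

    expr_by_region: dict mapping brain region -> expression of one gene.
    fold, min_expr: hypothetical thresholds chosen for illustration.
    """
    if len(expr_by_region) < 2:
        raise ValueError("need at least two regions to compare")
    ranked = sorted(expr_by_region.items(), key=lambda kv: kv[1], reverse=True)
    (top_region, top), (_, second) = ranked[0], ranked[1]
    # Enriched: detectable in the top region and well above the runner-up.
    if top >= min_expr and top >= fold * second:
        return top_region
    return None


# Made-up expression values for two genes across three regions.
gene_a = {"hippocampus": 40.0, "amygdala": 5.0, "cerebellum": 2.0}
gene_b = {"hippocampus": 10.0, "amygdala": 8.0, "cerebellum": 9.0}
print(region_enriched(gene_a))  # hippocampus
print(region_enriched(gene_b))  # None
```

The same rule could be applied per species to compare which genes are region-enriched in human, pig, and mouse.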
The other very interesting part, and it is also going to be extremely exciting to see how the world responds to it, is that we've used FACS to sort out 18 different subsets of blood cells and done full transcriptomic profiling of these, so that one can now start to understand how they relate to each other on a more global scale, showing which genes the different types of blood cells express and how they relate to each other. So I just want to end by trying to get you to come to Sweden and visit our very cold but nice country. HPA is part of sponsoring and setting up a couple of meetings: the Keystone Symposia in April, which is on proteomics. We will have an affinity meeting in the summer. And, I mean, this is a great meeting, of course, but June is also a great month if you're going to visit Sweden. And then I hope you all will come to the HPA meeting in 2020, which will be hosted in Stockholm. I also want to say that a very nice part of the protein atlas is that we've made a spinout company, Atlas Antibodies, and that we're now distributing antibodies to the whole world. And this company is going very well. And what I am more proud of than being part of this company is, of course, our data, the science: that we're being cited so much and that the data we're producing is being used all over the world. And for those of you who don't look at immunohistochemistry every day, it's always nice seeing different creatures when looking at different proteins. Sometimes you meet somebody who looks happy. Sometimes you just get a heartwarming image back. And of course, sometimes it kind of scares you to death. But there's a lot of data. So last, acknowledgments: Mathias Uhlen, you know him. He's a great driver. Takes us from new heights to even higher heights. So thank you very much.
In summary, I hope you got a good understanding of HPA-related projects. You have seen that transcriptomics data has been obtained from The Cancer Genome Atlas.
And proteomics data has been generated in-house using the same antibodies as in protein expression profiling of normal human tissues. Dr. Ponten talked to us about the human cellular and organelle proteome chapters, which are a collection of interactive pages providing conceptual overviews, compilations, and analyses of the data. Further, he talked about how survival scatter plots for 17 different cancer types can relate RNA and protein expression to each patient's survival. In the next lecture, Dr. Jochen Schwenk will talk about affinity-based proteomics. Thank you.