Welcome to the MOOC course on Introduction to Proteogenomics. Today's invited speaker is Professor Fredrik Ponten, who is currently a professor at Uppsala University in Sweden. Dr. Ponten will talk to us about the Human Protein Atlas, or HPA, which is a Swedish-based program. It started in 2003 with the aim of mapping all the human proteins in cells, tissues and organs by integrating various omics technologies such as antibody-based imaging, mass spectrometry-based proteomics, transcriptomics and systems biology. He will tell us about this mega-project and how it succeeded despite multiple challenges. He will also tell us how Indian pathologists and research collaborators played a major role in making the project a success. In today's lecture he will mainly focus on the tissue atlas of the Human Protein Atlas. He will also explain how RNA and protein expression across different tissues follow a common trend, and why this correlation needs to be considered in research if we want to obtain the bigger picture. Dr. Ponten will also talk to us about the subproteomes, organ-based proteomes and the secretome presented in the HPA, which will give you an idea of how to use this resource for your own research. So let us welcome Professor Fredrik Ponten.
What I will talk about today is the Human Protein Atlas. I will first give you a brief background about the project, then a little bit of our results and data and where we are right now, and at the end I will give you some perspectives on where we are heading in the next couple of years. This project started 15 years ago. We received very generous funding from a private non-profit research foundation, the Wallenberg Foundation, and that has kept us alive for these 15 years. Our goal was to have a first draft of the human protein atlas in 2015, and we fulfilled that goal; I will come back to that.
The project is a joint effort between the Royal Institute of Technology (KTH) and Uppsala University. The head and director of the whole project is Professor Mathias Uhlén, who is a very old friend of mine, and I am heading the Uppsala efforts. Our vision was timely. The project started to be planned during 2002, and if you remember, in 2001 the human genome sequence was published in Nature and Science by the public HUGO initiative and by Craig Venter. With the whole blueprint at hand, with all the A's, C's, T's and G's, a very logical next step was to add an information layer describing what all the proteins that our genes encode actually do. That was our vision, and the goals came down to this: let's make affinity probes, antibodies; let's use these antibodies to characterize the human proteome; and, emerging after a couple of years, if we have all the data and the reagents, let's put this into a clinical perspective and make use of it in discovery medicine, for biomarkers and diagnostics, future treatments and so on.
So we set up a multidisciplinary team, a kind of Ford-factory-style research project. We defined the different modules; each module had its own monthly goals and deliveries to the next module, and so on. We started with an upstream bioinformatics part, where we had the sequences of all the protein-coding genes and selected regions that we blasted against the rest of the proteome. I won't go into any details, since I think you have all heard about this. This is where we started to make our recombinant protein fragments. The idea behind it all was to blast the amino acid sequences against the rest of the proteome to get sequences that were as unique as possible, so as to get protein fragments that were as unique as possible and, in the end, antibodies that were as specific as possible. We outsourced the antibody production, and then in the immunotechnology module we ran everything on protein arrays. All the antibodies that bound specifically to the right protein fragment were then tested further in immunohistochemistry, immunofluorescence and western blots. What was very nice about this whole project was that all the data we produced was put out in the open for the scientific community to use. That was a requirement from the Wallenberg Foundation from the beginning, and it has felt very good that there were no restrictions: all the data we produced went out into the open. What I'll focus on is gene expression profiling. For gene expression profiling we use immunofluorescence to look at cells and organelles, immunohistochemistry to look at cells, tissues and organs, and RNA sequencing to get quantitative data for our gene expression profiles.
I'll briefly give you the background for this. I'm sure Dr. Navani has told you all about this before, but for protein profiling we use affinity-purified antibodies against all the different unique proteins that our genome encodes. We then look at how the proteins are distributed in all our different organs and tissues. We can do this comprehensively, without wasting too much tissue and too many reagents, by using tissue microarrays, which we have focused on normal tissues, cancer tissues and also cell lines. For normal tissues we have 46 different normal tissue types in triplicate from three different individuals. Tissue microarrays are made by selecting representative pieces of tissue: you look in the microscope, find representative areas, drill out a core and put it in a recipient block. From one of these blocks we can cut about 300 to 350 sequential sections, which can thus be used for about 300 to 350 different antibodies, and that makes it possible to protein-profile a large part of the human body. This was also very timely, because it was at the end of the 1990s that Olli-Pekka Kallioniemi coined the term tissue microarrays and the first instruments for this were made. Being able to use tissue microarrays was one of the things that made this whole project possible. I think this slide tells you everything about tissue microarrays: handling 700 tissue blocks for each antibody would have been impossible, while handling four blocks is absolutely possible. Immunohistochemistry is our basic method when it comes to tissues and to getting protein expression profiles. As you know, immunohistochemistry is a great method when it comes to spatial data, but it's a poor method for quantification.
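The fragment-selection idea described above can be illustrated with a minimal Python sketch. That idea is to blast candidate sequences against the rest of the proteome so as to pick the most unique stretch. The sketch is a simplification under stated assumptions: a shared k-mer count stands in for a real BLAST search. The function names, the window length and the k value are hypothetical choices, not the project's actual pipeline parameters.

```python
"""Hypothetical sketch: choose the protein window least shared with other proteins.

Shared k-mer counting is a simplified stand-in for the BLAST search described
in the lecture; the window and k defaults are illustrative assumptions.
"""


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def most_unique_window(target: str, others: list[str],
                       window: int = 50, k: int = 8) -> tuple[int, int]:
    """Return (start, shared) for the first window sharing the fewest k-mers with `others`."""
    if len(target) < window or window < k:
        raise ValueError("need len(target) >= window >= k")

    # Pool every k-mer found anywhere in the rest of the (toy) proteome.
    background: set[str] = set()
    for protein in others:
        background |= kmers(protein, k)

    best_start, best_shared = 0, len(kmers(target[:window], k) & background)
    for start in range(1, len(target) - window + 1):
        shared = len(kmers(target[start:start + window], k) & background)
        if shared < best_shared:  # strict "<" keeps the earliest best window
            best_start, best_shared = start, shared
    return best_start, best_shared


if __name__ == "__main__":
    # Toy example: a poly-alanine stretch also occurs in another "protein".
    target = "A" * 20 + "MKWVTFISLLFLFSSAYS"
    print(most_unique_window(target, ["A" * 20], window=18, k=8))  # (13, 0)
```

In the toy run, start 13 is the first 18-residue window that no longer contains the eight-residue poly-alanine run shared with the other sequence.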
It's not a method for getting quantitative data, but there's nothing like immunohistochemistry for showing which structures and which subtypes of cells actually express a certain protein. It also gives you a bit of a feeling for quantity. In a complex tissue with one population that's strongly positive and another that's weakly positive, you at least know that the first population expresses a higher level of the protein than the other. Beyond that, it doesn't give you any quantification at all. Of course, to do this project we also had to transform the glass slides into digital images. When we started in 2003, it was absolutely a challenge to handle the enormous amounts of image data, to store the data, to retrieve it and so on. And of course, the magic of the whole project was not just putting out images in a big library, but also generating data from those images. That's where our collaboration with India and with Dr. Navani started. We realized that the scientific community would not have been helped by just having images stained with immunohistochemistry. The people who can interpret immunohistochemistry and evaluate tissues (is this a cancer cell or a normal cell? is the protein strongly or weakly expressed here?) are the pathologists. We met Dr. Navani and his team of pathologists back in 2006. The first site was set up at the Indian Cancer Society in 2007 by all these talented pathologists, who started looking at images. We had to solve all the internet and IT infrastructure challenges and so on, but everything worked out very well. So we continued to collaborate; many from my team came here for months and worked together with our Indian colleagues, and we later moved the site to another venue.
We've had just great collaborations with Indian pathologists in this project, and they have produced all the data, which I'll show you on the next slide. I've summarized that as 10 pathologists sitting, looking at these images, evaluating them and putting out annotations: Is it weakly expressed? Is it strongly expressed? Is it in 25% of the cell population or more? This is not the full figure; it goes up to the beginning of 2012. You can see that they go through 2 million images per year, which I think is extremely impressive, and altogether over 12 million images have been annotated by Indian pathologists. But it's not only the workflow and the volume that are impressive; it has also been an impressive time for research collaborations. Just this morning I checked the papers where Dr. Navani and I are co-authors, and they are highly cited papers in Science and many other good journals. So it's not only been production of data, but also a very fruitful scientific collaboration, which I'm very grateful for. So that's the protein part of the tissue atlas and also of the pathology atlas, and I'll come back to the pathology atlas in a while. What we realized a couple of years ago was that spatial data is great, but it lacks quantification. I know that all of you know this, since you work with proteomics, which is to a large extent a quantitative method. So we went back to the Uppsala Biobank and looked for frozen tissue samples. We went through these in the microscope to make sure that we had normal tissue. We selected cases that were representative and had high-quality RNA, extracted the RNA and then did RNA sequencing to get transcriptomics data from normal tissues, with at least three different individuals for each tissue type.
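The annotation questions listed above can be turned into a coarse expression level. Those questions are the staining intensity and whether at least 25% of the cells stain. The Python sketch below is hypothetical: the four intensity grades and the mapping to "not detected / low / medium / high" are illustrative assumptions, not the project's documented scoring rule.

```python
from enum import IntEnum


class Intensity(IntEnum):
    """Hypothetical staining-intensity grades a pathologist might record."""
    NEGATIVE = 0
    WEAK = 1
    MODERATE = 2
    STRONG = 3


# Hypothetical lookup from combined score to a summary level.
LEVELS = {1: "not detected", 2: "low", 3: "medium", 4: "high"}


def expression_level(intensity: Intensity, fraction_positive: float) -> str:
    """Combine intensity and the fraction of stained cells into one level (assumed rule)."""
    if not 0.0 <= fraction_positive <= 1.0:
        raise ValueError("fraction_positive must be between 0 and 1")
    if intensity == Intensity.NEGATIVE:
        return "not detected"
    # Assumed rule: staining in at least 25% of cells raises the score one step.
    score = int(intensity) + (1 if fraction_positive >= 0.25 else 0)
    return LEVELS[score]


if __name__ == "__main__":
    print(expression_level(Intensity.STRONG, 0.80))  # high
    print(expression_level(Intensity.WEAK, 0.10))    # not detected
```

Under this assumed rule, a strong stain in a small minority of cells and a moderate stain in most cells both come out as "medium".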
In the end we had, or now have, 37 normal tissue types and over 200 individuals with transcriptomics data, which has empowered the Human Protein Atlas database. Now we started to learn a little bit more about the human proteome and about how our genes are actually expressed at the protein level. I won't come back to this in detail, but there has been a debate about the correlation between RNA and protein, and it depends a little on definitions. I would say that for almost all genes there's an extremely high correlation between RNA and protein. By that I mean across tissues or cell lines: if you have a high level of RNA in one cell line or tissue type and a low level of RNA in another, the protein levels will follow the RNA levels. However, each gene has a different RTP, or RNA-to-protein ratio, and that can differ by many orders of magnitude between genes. But across tissues the correlation between RNA and protein is very high, which means that you can use quantitative RNA sequencing data as a proxy for protein levels. What we learned here was that almost half of our protein-coding genes, 44%, encode housekeeping proteins that are expressed in all tissues: the proteins that build structure, drive cell division, maintain cell integrity and so on. Then there's a mixed bag. And then we have the most interesting proteins, the tissue type-specific proteins. These are proteins that are expressed in only one tissue or in very few tissues, or that are expressed at much higher levels in a certain tissue type than in other types. These are, of course, the ones responsible for the special functions of different tissues, and these are the ones that will be interesting when it comes to diseases and disease biomarkers.
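The point about a high across-tissue correlation despite very different gene-specific RNA-to-protein (RTP) ratios can be shown with a small Python sketch using made-up numbers. Each toy gene's protein level is assumed to be exactly its RNA level times a constant RTP ratio. This is an idealization; real measurements are noisy. Within each gene the correlation across tissues is perfect, yet pooling the two genes, whose ratios differ 1000-fold, destroys the correlation. The gene names, tissue values and ratios are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical RNA levels for two genes in the same four tissues.
rna = {"gene_a": [10.0, 50.0, 200.0, 5.0], "gene_b": [1.0, 4.0, 20.0, 0.5]}
# Hypothetical gene-specific RNA-to-protein (RTP) ratios, 1000-fold apart.
rtp = {"gene_a": 2.0, "gene_b": 2000.0}
# Idealized model: protein = RTP ratio x RNA, with no noise.
protein = {gene: [rtp[gene] * level for level in levels] for gene, levels in rna.items()}

if __name__ == "__main__":
    for gene in rna:
        # Across tissues, protein follows RNA exactly for each gene (prints 1.0).
        print(gene, round(pearson(rna[gene], protein[gene]), 3))
    # Pooling both genes mixes two very different RTP ratios;
    # for these toy numbers the pooled correlation is negative.
    pooled = pearson(rna["gene_a"] + rna["gene_b"], protein["gene_a"] + protein["gene_b"])
    print("pooled", round(pooled, 3))
```

This is why RNA levels work as a proxy for how one protein varies between tissues, but not for comparing absolute levels of different proteins.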
For about 9% of the genes, at the time, we couldn't find any RNA in our 37 different tissues. These could, of course, be pseudogenes. They could also be genes that are permanently turned off after development, or genes expressed in tissues that we didn't have, like the inner ear, the olfactory plate or other more remote tissue types. With this data at hand, we started to define the different human subproteomes, the different organ proteomes. We put this out on the Protein Atlas, in the part where we built the knowledge-based chapters, and I'll show you one example after this. What was nice now was that we had quantitative data from RNA sequencing and could combine it with our spatial data from the antibodies. So we could ask: where are the adipose tissue-specific proteins, and how are they expressed? What about the adrenal gland: are the proteins expressed in the adrenal medulla or in the cortex? Are they in special subtypes of cells, et cetera? Of course, spatial information together with quantitative information doesn't give you function per se, but it gives you a very good hint of function when you see a protein expressed in a certain cell type in a certain organ. These are just examples of such cell type- or tissue type-specific proteins, expressed here in either the exocrine or the endocrine pancreas, et cetera. We spent a couple of years writing papers. So if any of you are interested in a specific tissue or tissue proteome, we have probably published a paper about it, because we thought it was very interesting to go more in depth into what makes up the brain, the pancreas or whatever. Another way of dissecting the proteome is not by organ but by expression mode. I talked about the tissue-specific proteome, and of course there's a housekeeping proteome. What about the regulatory proteomes?
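The broad gene categories described above can be sketched as a simple rule over per-tissue RNA levels. Those categories are expressed in all tissues, tissue-enriched, not detected, and a mixed bag in between. This is a hypothetical Python illustration. The category names follow the lecture, but the 1 FPKM detection cutoff and the fivefold enrichment threshold are assumed defaults, not necessarily the criteria the project used.

```python
def classify_gene(levels: dict[str, float],
                  detection_cutoff: float = 1.0,
                  fold: float = 5.0) -> str:
    """Assign a gene to a broad expression category from per-tissue RNA levels.

    `detection_cutoff` (e.g. in FPKM) and `fold` are illustrative assumptions.
    """
    if not levels:
        raise ValueError("need at least one tissue")
    values = sorted(levels.values(), reverse=True)
    if values[0] < detection_cutoff:
        return "not detected"  # no tissue reaches the detection cutoff
    if len(values) > 1 and values[0] >= fold * values[1]:
        return "tissue enriched"  # top tissue is >= `fold` times the runner-up
    if all(v >= detection_cutoff for v in values):
        return "expressed in all"  # housekeeping-like profile
    return "mixed"


if __name__ == "__main__":
    # Hypothetical per-tissue levels.
    print(classify_gene({"colon": 120.0, "brain": 10.0, "liver": 0.2}))  # tissue enriched
    print(classify_gene({"colon": 30.0, "brain": 25.0, "liver": 20.0}))  # expressed in all
```

This sketch returns a single label per gene, so it checks enrichment first. A gene detected everywhere but dominated by one tissue is therefore labelled "tissue enriched".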
What about all the transcription factors? Where are they expressed? Are there differences between tissue types and cell types, et cetera? The secretome and the membrane proteome are extremely important for all the communication between cells, and also for biomarkers, of course. The isoform proteome is very complex and adds a lot of complexity to the whole biology. The cancer proteome is obvious, and the druggable proteome is of course very interesting for the drug industry. All these knowledge-based pages are in place in the Protein Atlas, so you can go there, and I'll show you one example of an organ proteome in just a second. So in 2015 we said that we now had a first draft of the human proteome. We were fortunate to publish a paper in Science, which has been very highly cited. We also had a poster in Science, and we rebuilt the whole Protein Atlas web portal to integrate the transcriptomics and the proteomics data. Today the Human Protein Atlas has three pillars. There is the tissue atlas, the normal tissue atlas, which shows you in which organs and cell types our genes are expressed. There is the cell atlas, which shows you in which organelles our proteins are expressed within the cell. And then there is the pathology atlas, which I'll come back to, which shows you how gene expression correlates with survival for patients with cancer. I'll show you just a couple of slides from each of these in the web portal, starting with the human tissue atlas. Here you can go through the organ proteomes or the other subproteomes, and you can click on any of these tissue types. Say I click on colon: that brings me to a couple of pages summarizing the gene expression profile in the colon. If I'm interested in the colon-specific proteins, I can click on those, which brings me to the hit list of the Protein Atlas.
Here I get the 165 proteins that are specifically expressed in the colon. I can choose one of these and click on it, and then I get to the summary page. In this case it is a gene called SATB2, encoding a protein that is more or less specifically expressed in the colon, in the epithelial cells of the colon and rectum; it's also expressed in the brain. For every gene, all 20,000 of them, we give a short summary. This is followed by the expression levels at the RNA level, given as FPKM, and at the protein level, which reflects how the Indian pathologists have evaluated the protein expression. Then one can look at the data in more detail: the protein data as bar diagrams, and our RNA sequencing data. We have also imported, for all genes, the data from the Broad Institute's GTEx project and from RIKEN's FANTOM5 project. As you can see, there's very good consistency between the different platforms and the different specimens used, and I think this gives a lot of validity to the expression data that we show in the Protein Atlas. Then, of course, one can look at the primary protein data, where we have three individuals for each antibody; for SATB2 we had very many antibodies. At the deepest level you can go into the high-resolution image and see for yourself where the SATB2 protein is expressed: it's in the nuclei of glandular cells in the colon, etc. And just as a little parenthesis: since this was a very highly specific colon protein, we thought it might be a biomarker for colon cancer patients. So we looked in colon cancer, and you can see it's highly expressed there. At the protein level, colon cancer was the only cancer type where we saw high expression of SATB2.
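The RNA levels on the summary page are given as FPKM (fragments per kilobase of transcript per million mapped fragments). The sketch below computes the standard definition of FPKM. It is not the project's actual RNA-seq pipeline, and the counts and gene length in the example are made up.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (gene_length_bp * total_mapped_fragments)
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and total mapped fragments must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped fragments)
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical: 500 fragments on a 2,000 bp transcript in a 20-million-fragment library.
    print(fpkm(500, 2_000, 20_000_000))  # 12.5
```

FPKM corrects for transcript length and sequencing depth. That correction is what makes the levels of one gene roughly comparable between tissue samples.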
You can also look at the full high-resolution images for the cancers, of course. We then extended the study with a clinical study including over 2,500 patients, and could actually establish that this is a good biomarker for colorectal cancer.
In today's lecture you have learned about the HPA and found that the Human Protein Atlas can be divided into the tissue atlas, the cell atlas and the pathology atlas. Dr. Ponten demonstrated the expression levels of different genes in 37 different tissue types. He showed how this information is important for understanding diseases and identifying candidate biomarkers. He also talked to us about how the Protein Atlas can provide the status of RNA and protein expression in different cancers together with patient follow-up data. I highly recommend that you visit the HPA website and explore it for yourself; it will definitely be a helpful resource for your own research. In the next lecture Dr. Ponten will talk about the cell atlas and the pathology atlas in more detail. Thank you.