Welcome to the MOOC course on Introduction to Proteogenomics. To enable proteomics researchers to interactively explore acquired data matrices of quantified proteins or post-translational modifications, and to provide an integrated set of analysis tools, the Broad Institute team has developed a software tool known as Protigy, the Proteomics Toolset for Integrative Data Analysis. Protigy was developed primarily for the proteomics platform, and it is now open access for a broader audience. Protigy streamlines the entire proteomic data analysis pipeline, provides an intuitive interface for lab researchers to analyze and explore proteomics data sets, and ensures reproducible data analysis by keeping track of workflows and their parameters. Today we have Dr. D. R. Mani, who is going to conduct a hands-on session on Protigy. In the first session he will focus primarily on the installation of Protigy and explain its various parameters. Next he will show how to load data sets and choose annotations. He will also demonstrate running the analysis and then exploring the results. So, let us welcome Dr. D. R. Mani for today's hands-on session on Protigy.
If you have installed Protigy and it went through completely, you will have a page that looks like this in your web browser. So, that is the tool. As for the history of this: I talked about t-tests, moderated t-tests, F-tests, two-component normalization, and a lot of these things that you would apply to pretty much any project that you work on. Previously in our group I had all these tools written up as R code. People would ask me to run it when they had a project, and some of the more adventurous people would take my code, run it in R themselves, and work on their project.
And then after Karsten joined, we figured that it would be a good idea to make a tool that is easy to use, encapsulates all the algorithms that our group thinks we should use on a regular basis, and is available in such a way that anybody can use it. People would need to come to us only if their experiment or project did not fall into the standard set of tests and analyses. There are quite a few of those, but a large fraction of projects in our group just use Protigy; then they have their results, which they send to their collaborators, who look at them, decide what is biologically relevant, plan follow-up experiments, and so forth. So, this is a tool that we use at the Broad. It is also freely available for anybody to use anywhere. It is available on GitHub, so you can download it and use it, which is what all of you are doing now. I will just quickly go through how to use the tool.
The first thing we want to do is load a data set. In the hands-on part on Piazza there was a data set that I had put up. If you can download it, or if you have already gotten it, then we can try to load it into Protigy. So, the first thing you need to do is load the data set and choose any annotations that you want to use for your analysis, and then you go and do the analysis. Loading the data set is the first part and a very essential part; you cannot do an analysis without it. So, do most people have what we need? Yes.
You can see that the extension of the data set is .gct. GCT stands for Gene Cluster Text. It is a format that we came up with at the Broad. The cool thing about it is that it not only has the data table, it also has sample annotations and gene or protein annotations included in the table. So, you have the data plus additional rows and columns that provide annotations. It is basically a tab-delimited text file, which you can open in Excel if you ignore the top two lines.
So, if you ask Excel to open it as a text file, you can see the table. The top two lines describe the file: the format version, and how many annotation columns and data columns there are. That is only for software that reads it; for you, you can ignore those two lines and just look at the table. If you want to take your data and create a GCT file, there is a Broad software tool called Morpheus. So, if you go to Google and search for Morpheus... it is taking a while to load, but the thing is, Morpheus can read a lot of different formats. It can read comma-separated text files and all kinds of other formats and then write out GCT 1.3. Once you read your data into Morpheus, you can add annotations if you want. You can add annotations from a separate file, or within a given file you can say that these rows or these columns are my annotations. Then you can get everything set up in Morpheus, export it as a GCT 1.3 file, and read it into Protigy. Protigy can also read other formats, but with GCT 1.3 it is easy to say which annotations you want to use, because the annotations are included in the file. Otherwise, including annotations is a little more complex, and I will not go into it today.
In the slides I have tried to display pictorially what we need to do, but I will actually do it on the screen. You click on Browse on the left side and you get a file browser; in the file browser you pick the data set that you want to use. I am going to pick the DRM hands-on data and just say Open. It will say "Upload complete", and after a couple of seconds you will get a new screen. It says what type of data file it found (here, GCT 1.3), what the samples are, and how many. So, these are the sample labels, and it says how many times it found each label. Ideally you want sample labels to be unique, so you want the frequency to be 1. Now, here on the left it is showing all the annotation columns that it found in the GCT 1.3 file.
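To make the file layout concrete, here is a minimal Python sketch of reading a GCT 1.3 file by hand and checking that the sample labels are unique. It assumes the usual GCT 1.3 layout: a `#1.3` version line, a line with four counts (data rows, data columns, row-annotation columns, column-annotation rows), a header row, then the column-annotation rows, then the data rows. The tiny inline table, the sample names, and the function names `parse_gct13`, `to_number`, and `label_frequencies` are made up for illustration; this is not how Protigy itself reads the file.

```python
from collections import Counter

# Tiny made-up GCT 1.3 example: 2 proteins x 3 samples,
# 1 row-annotation column (geneSymbol), 1 column-annotation row (PAM50).
EXAMPLE = "\n".join([
    "#1.3",
    "2\t3\t1\t1",
    "id\tgeneSymbol\tS1\tS2\tS3",
    "PAM50\tna\tBasal\tHer2\tBasal",
    "P1\tGENE1\t0.5\t-1.2\t0.1",
    "P2\tGENE2\tNA\t0.3\t2.0",
])


def parse_gct13(text):
    """Split GCT 1.3 text into sample ids, annotations, and data values."""
    lines = text.strip().split("\n")
    if lines[0].strip() != "#1.3":
        raise ValueError("expected a '#1.3' version line")
    # Second line: data rows, data columns, row-annotation columns,
    # column-annotation rows.
    n_rows, n_cols, n_rdesc, n_cdesc = (int(x) for x in lines[1].split("\t"))

    header = lines[2].split("\t")
    rdesc_names = header[1:1 + n_rdesc]
    sample_ids = header[1 + n_rdesc:]
    if len(sample_ids) != n_cols:
        raise ValueError("header does not match the declared number of samples")

    # Column-annotation rows: annotation name, placeholders, one value per sample.
    col_annot = {}
    for line in lines[3:3 + n_cdesc]:
        cells = line.split("\t")
        col_annot[cells[0]] = dict(zip(sample_ids, cells[1 + n_rdesc:]))

    # Data rows: feature id, row annotations, one value per sample.
    row_annot, data = {}, {}
    for line in lines[3 + n_cdesc:3 + n_cdesc + n_rows]:
        cells = line.split("\t")
        row_annot[cells[0]] = dict(zip(rdesc_names, cells[1:1 + n_rdesc]))
        data[cells[0]] = [to_number(v) for v in cells[1 + n_rdesc:]]
    return sample_ids, col_annot, row_annot, data


def to_number(cell):
    """Convert a cell to float; treat NA-like strings as missing (None)."""
    return None if cell.strip().lower() in ("na", "nan", "") else float(cell)


def label_frequencies(sample_ids):
    """Count how often each sample label occurs; ideally every count is 1."""
    return Counter(sample_ids)


sample_ids, col_annot, row_annot, data = parse_gct13(EXAMPLE)
duplicates = {s: n for s, n in label_frequencies(sample_ids).items() if n > 1}
print("samples:", sample_ids)
print("duplicated labels:", duplicates)
```

In this sketch the annotations are stored per sample label, so a duplicated label would silently overwrite an earlier entry. That is one practical reason to want every label frequency to be 1 before going further.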
So, you can see there is the PAM50 status for the cancer samples, ER, PR, and HER2 status, p53 mutation status, and so forth. There are also some other experimental details. We have the experiment number, which is the TMT plex in which it was run, and the channel, which is the... sorry, this is all iTRAQ data. So, it is the iTRAQ experiment and the iTRAQ channel in which it was run. All of those are included in the data set. For this hands-on, let us pick PAM50 as the annotation that we want to use. When you pick PAM50, it shows you the various levels in PAM50 and how many samples there are in each. So, it says there are 26 basal samples, 18, 19 HER2 samples, and so on. You can also see that we have 3 normal samples in there, which were actual normal breast samples that were included in the study, and it shows those also.
I will just start over again, since I think some people wanted me to repeat. This is the Protigy opening page; you click on Browse. It looks like you have to wait for the full file to download. If you do not have the full file and you try to load it, I think Protigy is going to complain that the file was not complete or that there were not enough rows, or you will get an error and Protigy will close. So, you have to wait for the file to download fully, and once it is downloaded you can load it into Protigy and you will get to the screen that asks you to pick an annotation column.
For which one? 0 means not mutated, 1 means mutated. We could also designate what type of mutation it was or which site was mutated, but that only results in subsets that are way too small to analyze. So, we just used mutated or not mutated, and I think there is also NA, which is missing. So, for some of the mutation statuses, if you look, there will be 3 groups: 0 means it is not mutated, 1 means it is mutated, and NA means we do not know.
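The per-level sample counts shown for a chosen annotation column, and the idea of NA as a third group that can be set aside before a two-group comparison, can be sketched in a few lines of Python. The sample names and values below are invented, and `level_counts` and `drop_missing` are hypothetical helper names rather than anything from Protigy.

```python
from collections import Counter

# Invented mutation-status annotation: "0" = not mutated, "1" = mutated,
# "NA" = not measured (for example, normal samples).
mutation_status = {
    "T1": "1", "T2": "0", "T3": "0", "T4": "1", "T5": "0",
    "N1": "NA", "N2": "NA",
}


def level_counts(annotation):
    """Number of samples at each level of an annotation column."""
    return Counter(annotation.values())


def drop_missing(annotation, missing=("NA", "")):
    """Keep only samples whose annotation value is known."""
    return {s: v for s, v in annotation.items() if v not in missing}


print(level_counts(mutation_status))
known = drop_missing(mutation_status)
print(sorted(known))
```

Counting levels first is a cheap way to spot groups that are too small to analyze, which is the same reason the finer mutation categories were collapsed into mutated versus not mutated.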
So, remember we have 3 normal samples. For the normal samples we did not measure whether the gene was mutated or not, so for those we basically mark it as NA. When you do an analysis, you want to exclude those samples and work only on the ones that are mutated or not mutated. I will show you how to do it, but let us first see if everyone is in reasonable shape.
There is a way to look at the data in Protigy, and once we get there I will show you. Otherwise I can show you the data separately; you can also open the GCT file in Excel.
No, that I see, but how did you arrive at that GCT file? For that we use Spectrum Mill. We used Spectrum Mill to create the log ratios, and then we went through normalization. You can do the normalization with this tool if you want, but you need at least the output from Spectrum Mill. We use Spectrum Mill; you can use anything. Does Spectrum Mill take raw mass spectra as input? Do you have any raw data, the iTRAQ data? All the data for that paper should be on the Nature website; it is all there.
So, let us actually proceed. We need to say OK here. Yeah, I will get to that.
Another file is just the Morpheus one? Yes. How do we use this one? Can CSV files and GCT files all go into it, and can we convert Excel to CSV and use that? You can, but the annotation is a little tricky, like I mentioned.
So, yeah. We use QC pass/fail to decide which samples to include and which ones to exclude. In the breast cancer paper there was a set of samples that we excluded, and those were marked as QC fail. What were the criteria? For that you have to read the paper. It is relatively complex, so I do not want to go into it now. We can talk later if you want. Yeah, I think that was because of some processing. I think it was included multiple times; it is the same thing. I think it is possible that one has 0 and 1 and the other has QC fail and QC pass, something like that.
Protigy is completely based on R programming.
So, we need to learn a basic workflow for running the scripts that bring up the web interface of Protigy. Today I would like to show you a very basic workflow: how to install R and RStudio and how to run the scripts required for Protigy. This hands-on is only for people who do not have R installed on their system. First we need to install R (the version is 3.5.1) and RStudio. Depending on whether you are using Linux, Mac, or Windows, you need to download the matching build of R. I will show all the downloading and installation on Windows. You need to download R from here and RStudio from this website. While downloading RStudio, keep in mind that you need the free version that is available here. For both R and RStudio, choose the installer that matches the operating system you are using.
After downloading both R and RStudio, we first install R. Keep in mind that we will follow the default installation rather than any kind of customized installation. The installation might take some time. After the R installation is complete, we install RStudio. Here also I recommend that you do not go for any kind of customized installation; just accept the defaults at every step and install RStudio.
While the software is getting installed, let us go to the Protigy page of the Broad Institute on GitHub. So, this is the Protigy software. From here you will get information about Protigy, and even the slides you will need to understand how to install it, how to upload data, and what kind of data format you need for analysis in Protigy. I recommend that you go through the complete web page to get the important information about the tool. The kind of data set required to run Protigy is a p × n matrix.
Here p is the number of features, which may be proteins or genes, and n is the number of samples. You will find this kind of information on the software's web page. The installation of RStudio has also completed. Protigy is R-based software, but after you give some basic commands it gives you a web interface. Most R-based tools that give you a web interface are written with Shiny, so first we need to install the shiny package. To install any package in R, keep in mind that the basic command is install.packages, with the name of the package in quotation marks. So, let us try to install shiny first. As you can see, after just typing "install", the command appears in the drop-down menu. From there you can choose install.packages, write "shiny" in quotation marks, that is install.packages("shiny"), and press Enter. As you can see, after pressing Enter it has started downloading the packages for shiny. It will take some time, and after the package is installed we will try to install the Protigy software. The installation might take some time.
So, finally, the shiny package is installed. Now we will try to install the Protigy software. Since we have already installed shiny, the command will be based on shiny. The basic command for running the Protigy software is shiny, followed by two colons, then runGitHub with "protigy" and "broadinstitute", that is shiny::runGitHub("protigy", "broadinstitute"). After that we press Enter, and it will show that Protigy is getting downloaded. It will take some time: sometimes it takes 10 minutes, and for some users it might take more than an hour.
After Protigy finishes installing, the Protigy web interface will open, and from there you can upload your data, namely the data available at the Google Drive link that has already been shared with you. The web interface will look like this; on the left-hand side there is a Browse option, which you click to upload your data set. Some users might get an error. To troubleshoot it, read the error message to see what the problem is. If it is linked to a missing package, just type install.packages with the name of the package that the error is asking for in quotation marks and press Enter. You will see that this helps you finish downloading Protigy. So, thank you.
I hope today's session was useful and that you learned more details about Protigy. You should now have a fair idea of how to go to the Broad Institute portal and explore the Protigy software. Selecting the annotation is a crucial step before running the analysis. In the next hands-on session we will learn more about the different options in Protigy, such as log2 transformation, normalization, data filtering, and test selection. Thank you.