Now we talk about data retrieval, starting with protein sequences and structures. Protein data comes in several types. First, there are sequences. These may be actual sequences obtained from proteomic data or other experimental techniques, or they may be translated sequences: sometimes we go to nucleotide databases, retrieve nucleotide sequences and translate them using some software, so these are predicted protein sequences. Second, there are structures, which may be predicted or may be real, experimentally determined structures, for example from X-ray crystallography. Third, we are sometimes interested in the functions of proteins, and these are stored as annotations. As far as resources are concerned, there are multiple resources for protein sequences, but UniProt claims to be the biggest integrated resource, whereas for structures PDB is a good resource. Here is the page for data retrieval from UniProt. We want to search for our favorite protein, p53, so we type P53 in the search box, click the search button, and get the output. Here we see that there are around 18,000 different records, and the page shows the first 25 of them. The output table can have different columns. We have Entry, which is the accession ID, and Entry name, where the suffix HUMAN tells us that this entry comes from human; other entries can come from mouse, rat, Arabidopsis and so on. The protein name is Cellular tumor antigen p53. The next column is the gene name, TP53, which stands for tumor protein p53, here obviously from human. At the end we have its length, which is 393 amino acids.
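The same search can also be scripted. The following is a minimal Python sketch, assuming the UniProt REST API search endpoint at rest.uniprot.org with its `query`, `fields`, `format` and `size` parameters. The field names chosen to mirror the web table, and the column headers in the returned TSV, are assumptions worth checking against the current UniProt documentation; the function names are hypothetical. The download itself is left out, so the sketch only builds the URL and parses the text that comes back.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed base URL of the UniProtKB REST search endpoint.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query: str, size: int = 25) -> str:
    """Build a search URL asking for the same columns as the web results
    table (accession, entry name, protein and gene names, organism,
    length) in tab-separated format."""
    params = {
        "query": query,
        "fields": "accession,id,protein_name,gene_names,organism_name,length",
        "format": "tsv",
        "size": size,  # number of rows per page, like the 25 shown on the web page
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def parse_results(tsv_text: str) -> list[dict]:
    """Parse one page of TSV results into a dict per row, keyed by the
    column headers in the first line."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def split_entry_name(entry_name: str) -> tuple[str, str]:
    """Split an entry name such as 'P53_HUMAN' into the protein mnemonic
    and the species suffix."""
    mnemonic, _, species = entry_name.partition("_")
    return mnemonic, species
```

To run the search, fetch the URL (for example with `urllib.request.urlopen`) and pass the decoded text to `parse_results`; `split_entry_name("P53_HUMAN")` then gives `("P53", "HUMAN")`, the species suffix discussed above.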
So, let us open the first record. Here we reach the record for Cellular tumor antigen p53, commonly known by its gene name TP53. Different tabs show different parts of the record: we can look into its function, its taxonomy and many other characteristics. If we look into the function, it gives us a description of what the protein does. If we scroll down, we reach a table with a column called Feature key. One feature here is labelled Site: some proteins have unique sites with specific properties, and in this case it is a single amino acid of the protein that interacts with DNA. In the same way there are metal binding sites; this protein mainly binds zinc, and the positions of the amino acids involved are shown here, so these are the residues through which it interacts with the metal. Further down we can see the DNA binding region, here amino acids 102 to 292, which is also shown in a graphical view. Next comes GO molecular function. GO stands for Gene Ontology, a set of functional annotation terms that define what gene products do. Gene Ontology has three categories: molecular function, biological process and cellular component. Here we see only the molecular functions this protein performs, for example ATP binding, p53 binding and DNA binding, among others. All the functions annotated for this protein are listed under the heading GO molecular function.
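Feature positions such as the DNA binding region 102 to 292 are 1-based and inclusive, which is easy to get wrong when cutting that region out of the sequence. A minimal Python sketch, assuming features have been read from UniProt's JSON output, where each feature is taken to have a `type` and a `location` holding `start` and `end` values. That layout is an assumption to verify against the UniProt documentation, and the helper names are hypothetical.

```python
from collections import defaultdict


def flatten_feature(raw: dict) -> dict:
    """Flatten one feature, assuming the JSON layout
    {"type": ..., "location": {"start": {"value": n}, "end": {"value": m}}}."""
    loc = raw["location"]
    return {
        "type": raw["type"],
        "start": loc["start"]["value"],
        "end": loc["end"]["value"],
    }


def group_features(features: list[dict]) -> dict[str, list[tuple[int, int]]]:
    """Group flattened features by type (e.g. 'Site', 'DNA binding')
    into lists of (start, end) positions."""
    grouped = defaultdict(list)
    for feature in features:
        grouped[feature["type"]].append((feature["start"], feature["end"]))
    return dict(grouped)


def region(sequence: str, start: int, end: int) -> str:
    """Return residues start..end of a protein sequence.

    Feature positions are 1-based and inclusive, so the Python slice
    starts at start - 1 and stops at end."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"invalid range {start}-{end} for length {len(sequence)}")
    return sequence[start - 1:end]
```

For a 393-residue sequence, `region(sequence, 102, 292)` returns the 191 residues of the DNA binding region; a single-residue feature such as a Site has equal start and end.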
Next we move on to other functions. In the biological process category we see that the protein is related to apoptosis, which is programmed cell death, and obviously also to the cell cycle and some other processes. Further down there is a section on enzyme and pathway databases. Reactome is a database in which reactions are grouped and categorized into pathways, so this is the list of Reactome reactions and pathways this protein is associated with. As we move further toward the taxonomy section, we see an entry under protein family/group databases called TCDB, the Transporter Classification Database. This is another classification, in which proteins are classified as transporter proteins, that is, proteins associated with transport across membranes. Each classified protein gets a specific classification code, the TC number, made up of five parts, and this protein carries such a code. Then we have Names and Taxonomy, which lists the protein names and the taxonomic classification of the organism from which the protein comes.
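A TC number has a fixed five-part structure: a class number, a subclass letter, then family, subfamily and transporter numbers, separated by dots. Here is a small Python sketch that splits a code into those parts. The names `TCNumber` and `parse_tc_number` are hypothetical, and the codes used with it below only illustrate the format; they are not the code assigned to p53.

```python
import re
from typing import NamedTuple


class TCNumber(NamedTuple):
    tc_class: str     # V: transporter class (a number)
    subclass: str     # W: subclass (a letter)
    family: str       # X: family or superfamily (a number)
    subfamily: str    # Y: subfamily (a number)
    transporter: str  # Z: individual transporter (a number)


# Five dot-separated parts: number, capital letter, number, number, number.
TC_PATTERN = re.compile(r"^(\d+)\.([A-Z])\.(\d+)\.(\d+)\.(\d+)$")


def parse_tc_number(code: str) -> TCNumber:
    """Split a TC number such as '1.A.1.1.1' into its five components."""
    match = TC_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a five-part TC number: {code!r}")
    return TCNumber(*match.groups())
```

For example, `parse_tc_number("2.A.3.10.4").family` is `"3"`, while a truncated code like `"1.A.1"` raises `ValueError`.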
Let us see how we reach its sequence. It is at the very end of the record, so you scroll down until you reach the bottom, where you find the sequence. Here it says Isoform 1: many proteins have different isoforms, that is, alternative splice variants, and this is isoform 1, as its identifier also shows: P04637-1. Below that we can see its sequence, starting with methionine, which is normally the first amino acid of a protein sequence because it is encoded by the start codon, and ending at amino acid 393, so this is a 393 amino acid long protein. You can click the FASTA button at the top to get this output in FASTA format; we will look into the FASTA format a few sections below.

We can also get the same protein from NCBI. There the sequence is essentially the same, but the arrangement is slightly different: the sequence starts after the word ORIGIN and ends at the two slashes (//). So NCBI also provides it, alongside the record for its gene.

PDB gives us the structures. We can go to PDB and search for the same ID, P04637, and it shows us the regions of the sequence that form specific secondary structures. The turns are shown here, the black lines mark regions where no secondary structure is formed, and the orange regions designate alpha helices. Besides this view, PDB also provides the 3D structures, which look like this; you can get these structures, explore them interactively, and export them as figures.

To conclude, UniProt is the biggest resource for protein sequences and an integrated one, run jointly by PIR, the European Bioinformatics Institute (EBI) and the Swiss Institute of Bioinformatics, and PDB is a good resource for protein structures.
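Both download formats are simple to read programmatically. A minimal Python sketch, assuming plain FASTA text (for example from https://rest.uniprot.org/uniprotkb/P04637.fasta) and an NCBI GenPept flat file whose sequence lies between the ORIGIN line and the closing //; the function names are hypothetical.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}. Header lines start with
    '>'; the sequence may be wrapped over several lines."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def genpept_sequence(text: str) -> str:
    """Extract the sequence from a GenPept flat file: it follows the
    ORIGIN line and ends at '//'. Each sequence line starts with a
    position number and splits residues into blocks, so keep only letters."""
    in_sequence = False
    residues: list[str] = []
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_sequence = True
        elif line.startswith("//"):
            break
        elif in_sequence:
            residues.append("".join(ch for ch in line if ch.isalpha()))
    # GenPept writes residues in lower case; upper-case them to match FASTA.
    return "".join(residues).upper()
```

Comparing the `parse_fasta` output with the `genpept_sequence` output is a quick way to check that the UniProt and NCBI records carry the same sequence.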