Protein databases store protein data, which may take the form of sequences or motifs. Motifs are specific patterns of amino acids that occur within a sequence. These databases can also hold structures, and because structures can be aligned, there are also databases where we can find structural alignments. The first sequences to be collected were protein sequences, before any nucleotide sequences. Scientists used the methods developed by Sanger and Tuppy, established back in 1951, and at that time much of the effort went into sequencing cytochrome molecules.
These sequences were compiled into an atlas, the Atlas of Protein Sequence and Structure, under the leadership of Margaret Dayhoff at the National Biomedical Research Foundation (NBRF) in the 1960s. These collections were later deposited into PIR, the Protein Information Resource, which was run as a collaboration between NBRF, the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID).
Swiss-Prot is a collaboration between the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI). It is released weekly from 50 servers across the world and is mainly controlled from ExPASy, the main server in Geneva. Here is the page for ExPASy, where you can find structural alignments, proteomics data, and also genomic data.
In the same way, an international partnership between PIR, EBI and SIB created UniProt. UniProt brought together PIR's Protein Sequence Database (PIR-PSD), Swiss-Prot and TrEMBL. TrEMBL holds sequences translated from DNA: the DNA is translated into protein, so these protein sequences come from the reading frames of the DNA.
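To make the idea of reading frames concrete, here is a minimal Python sketch of six-frame translation with the standard genetic code: three frames on the forward strand and three on the reverse complement. The function names and the +1..+3 / -1..-3 frame labels are our own assumptions, not part of any database's API. This is only an illustration of reading frames; TrEMBL itself is built from annotated coding sequences, not by blindly translating all six frames.

```python
# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase DNA string (other letters kept)."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate whole codons only; '*' is a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X")
        for pos in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Hypothetical helper: translate the three forward and three reverse frames."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {f"+{k + 1}": translate(dna[k:]) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rev[k:]) for k in range(3)})
    return frames


seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
for frame, protein in six_frame_translation(seq).items():
    print(frame, protein)
```

For this example sequence, frame +1 reads MAIVMGR*KGAR*. Only one frame (at most) corresponds to the real protein, which is why annotation of the correct frame matters.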
A DNA sequence has six possible reading frames, three on each strand; we will talk about reading frames in later lectures. Here is the page of PIR, one of the UniProt partners. As you can see, there are three main sections: PRO, the Protein Ontology; iProClass, where we can find the sequences; and iProLink, which tells us about the literature. Here we look into PRO, the Protein Ontology. An ontology classifies proteins on the basis of their functions, and these functions are arranged in a hierarchy: there is a major, general function at the top, and moving down the hierarchy leads to more specific functions. Here we can see a PRO (protein ontology) hierarchy in this example. As we mentioned earlier, iProClass is basically an integration of different protein resources. From it we can get sequences, protein expression data and information about protein modifications; we can also look into the ontology, and we can integrate genomic data with proteomic data. iProLink, in turn, provides literature information, and most of the relevant research papers can be found from there. PDB stands for Protein Data Bank. It is a repository of protein structures. These structures are determined experimentally in the lab by techniques such as X-ray crystallography and are then submitted to the PDB, where researchers can retrieve them and compare their predicted structures against them. So it is a good resource if you are working on structural protein bioinformatics. SCOP (Structural Classification of Proteins) is a similar effort: it uses the structural elements of proteins to classify them into a hierarchy of class, fold, superfamily, family and domain. Class is the broadest level in the SCOP hierarchy.
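Because each SCOP level nests inside the one above it, a classification can be pictured as a tree with class at the root. The minimal Python sketch below nests lineages into dictionaries and prints the hierarchy; `build_tree` and `show` are hypothetical names of our own. The example places myoglobin and the hemoglobin alpha chain in the Globins family of the all-alpha class, as we understand the classic SCOP release. Real SCOP also has protein and species levels below family, which this sketch collapses into the single "domain" level used in the lecture.

```python
# SCOP levels as listed in the lecture, broadest (class) first.
LEVELS = ("class", "fold", "superfamily", "family", "domain")


def build_tree(lineages):
    """Nest lineages (tuples ordered like LEVELS) into dicts of dicts."""
    tree = {}
    for lineage in lineages:
        if len(lineage) != len(LEVELS):
            raise ValueError(f"expected {len(LEVELS)} levels: {lineage}")
        node = tree
        for name in lineage:
            # Reuse an existing node at this level or create a new one.
            node = node.setdefault(name, {})
    return tree


def show(tree, depth=0):
    """Return the hierarchy as indented lines, one line per node."""
    lines = []
    for name, child in tree.items():
        lines.append(f"{'  ' * depth}{LEVELS[depth]}: {name}")
        lines.extend(show(child, depth + 1))
    return lines


lineages = [
    ("All alpha proteins", "Globin-like", "Globin-like", "Globins", "Myoglobin"),
    ("All alpha proteins", "Globin-like", "Globin-like", "Globins",
     "Hemoglobin, alpha-chain"),
]
print("\n".join(show(build_tree(lineages))))
```

Both entries share every level down to the family, so they appear as two leaves under a single branch; this is exactly what it means for class to be the broadest level.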
There are several major classes. For example, one class contains proteins made up of alpha helices. Secondary structures such as alpha helices form from particular arrangements of amino acids: a protein sequence is just a linear chain of amino acids, and when it folds on itself it forms these secondary structures, which are recognized as alpha (helices) and beta (strands). We are not going into the details here; you can look into a molecular biology course, or search online, for what alpha helices and beta strands are. The main idea here is that SCOP classifies proteins on the basis of these structures. For example, the all-alpha class contains proteins made up of alpha helices. Likewise, there is an all-beta class, and an alpha/beta class in which alpha helices and beta strands alternate, one after another. In the alpha+beta class, by contrast, the helices and strands occur in separate regions: the alpha helices are grouped together and the beta strands are grouped together. In conclusion, the first sequences obtained were protein sequences, and protein databases can be classified on the basis of sequences, motifs, structures and structural alignments. And, just as in nucleotide databases, the number of sequences in protein databases is growing rapidly.
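The difference between alpha/beta (alternating) and alpha+beta (segregated) is about the order of secondary-structure elements along the chain. The toy Python sketch below encodes only that idea, as described in this lecture. It assumes a per-residue string where `H` marks helix, `E` marks strand and any other character marks coil; `toy_scop_class` is a hypothetical name. This is not how SCOP assigns classes: SCOP classes are curated from the 3D structure, and real alpha/beta versus alpha+beta assignments also depend on features such as parallel versus antiparallel sheets.

```python
from itertools import groupby


def elements(ss: str) -> list[str]:
    """Collapse a per-residue H/E/coil string into an ordered list of elements."""
    return [kind for kind, _ in groupby(ss) if kind in "HE"]


def toy_scop_class(ss: str) -> str:
    """Toy rule: classify by which elements occur and in what order."""
    elems = elements(ss)
    kinds = set(elems)
    if not kinds:
        return "unclassified"
    if kinds == {"H"}:
        return "all-alpha"
    if kinds == {"E"}:
        return "all-beta"
    # Count places where a helix is followed by a strand or vice versa.
    switches = sum(1 for a, b in zip(elems, elems[1:]) if a != b)
    # One switch: all helices in one region, all strands in another.
    return "alpha+beta" if switches == 1 else "alpha/beta"


for ss in ("--HHHHH---HHHHH--", "EEEE--EEEE",
           "EEEE--HHHH--EEEE--HHHH", "HHHH--HHHH--EEEE--EEEE"):
    print(ss, toy_scop_class(ss))
```

The last two inputs contain the same number of helices and strands; only their order differs, which is the point the lecture makes about alpha/beta versus alpha+beta.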