Biological databases store biological data, and their main objectives are data storage, information retrieval, and knowledge discovery. They can be classified as primary databases, which store the primary sequences, and secondary databases, which hold those primary sequences after they have been annotated. Specialized databases, in turn, are dedicated to, say, specific organisms or to disease data. Biological databases can also be classified by the type of data they contain: for example, protein, nucleotide, RNA, genome, or gene expression databases.

The issues that are generally present in all other databases are seen in biological databases too, and they may be related to the relatively slow pace of quality assurance compared with the pace at which new data is coming in. Those issues are redundancy, inconsistency, and incompatibility.

Now we talk about nucleotide sequence databases. These are the biological databases that store nucleotide sequence data, which can be cDNA, ESTs, or genomic DNA. Here we have a diagram of a genomic DNA with its different exons. As we know, in eukaryotes we have exons and introns. The gene is transcribed, the introns are spliced out, and the exons end up in the messenger RNA. From that messenger RNA we can get cDNA through reverse transcription, and then we can store that cDNA in our databases. The ESTs (expressed sequence tags) are short subsets within those cDNAs.

Nucleotide sequence data were first assembled into GenBank. That was back in 1982 at Los Alamos National Laboratory (LANL), under the leadership of Walter Goad. GenBank now works under the umbrella of NCBI, the National Center for Biotechnology Information.
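To make the exon-to-cDNA path from the diagram concrete, here is a minimal Python sketch. The toy sequence, the exon coordinates, and the function names are all made up for illustration; real transcripts are far longer, and real pipelines handle strands, ambiguity codes, and splice variants that this ignores.

```python
# Toy illustration only: the sequence, coordinates, and helper names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def splice_exons(genomic_dna, exons):
    """Join the exon segments (0-based, end-exclusive) and drop the introns."""
    return "".join(genomic_dna[start:end] for start, end in exons)


def transcribe(coding_dna):
    """Write the mRNA in the same sense as the coding strand (T becomes U)."""
    return coding_dna.replace("T", "U")


def reverse_transcribe(mrna):
    """Return first-strand cDNA: the reverse complement of the mRNA, as DNA."""
    dna_sense = mrna.replace("U", "T")
    return "".join(COMPLEMENT[base] for base in reversed(dna_sense))


# Two made-up exons separated by one made-up intron.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTCAG", "GATTGA"
genomic = exon1 + intron + exon2
exon_coords = [(0, 6), (17, 23)]

mrna = transcribe(splice_exons(genomic, exon_coords))
cdna = reverse_transcribe(mrna)

print(mrna)  # AUGGCCGAUUGA
print(cdna)  # TCAATCGGCCAT
```

The cDNA carries only exon sequence, which is why it is a compact record of what was actually expressed. An EST would correspond to a short read taken from one end of such a cDNA.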
NCBI is a central repository that stores multiple types of biological data, including genomes, their assemblies, their sequencing data, their expression data, and more. In this diagram we see the page where you can search for any kind of data, and a drop-down list gives you the different options. Here is the page for GenBank, so if you want to look up nucleotide and genome sequences, this is the best resource.

NCBI was established in the United States. The Europeans established their own nucleotide sequence database at EMBL, the European Molecular Biology Laboratory, in 1980. In the same way, Japan established the DNA Data Bank of Japan (DDBJ) in 1984. All of them came together in an international collaboration, which we call the International Nucleotide Sequence Database Collaboration (INSDC): a collaboration between DDBJ, EMBL-EBI, and NCBI. Here in this diagram you see ENA, the European Nucleotide Archive, at EMBL-EBI. EMBL established EBI, the European Bioinformatics Institute, specifically to deal with bioinformatics, and within it they established ENA to take care of the DNA sequence data sets.

Here is the page for INSDC. As you can see, the logos of all three collaborators are there. If we look into the data, this first page covers next-generation sequence reads, capillary reads, and also information about different samples and studies. So we can get our next-generation sequence data from this archive too, and as we go further in this course we will start playing with that kind of data.

If we look at the growth of GenBank, the number of bases has grown from its start somewhere in 1982 to numbers that now run into the trillions.
In this curve, blue is the growth of GenBank, and red is the whole genome sequences we are comparing it with, which start somewhere in 2003 or 2004, after the publication of the Human Genome Project. If we look at the number of bases, it seems to double every 18 months, so the growth is exponential. In the same way, the number of sequences was somewhere in the thousands in 1982, but now there are more than 100 million sequences in GenBank.

So in the end we conclude that biological databases store biological data, and we looked at INSDC, which is a collaboration between EMBL, NCBI, and DDBJ. The growth of data in these databases is exponential: in GenBank, the data doubles roughly every 18 months.
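The doubling claim can be turned into a small calculation. The sketch below assumes a constant 18-month doubling time, which is the rule of thumb from this lecture rather than a value fitted to the real GenBank statistics. The function names are hypothetical.

```python
import math


def projected_size(initial, months_elapsed, doubling_months=18.0):
    """Size after months_elapsed, assuming steady doubling every doubling_months."""
    return initial * 2 ** (months_elapsed / doubling_months)


def months_to_grow(factor, doubling_months=18.0):
    """Months needed to grow by the given factor under the same assumption."""
    return doubling_months * math.log2(factor)


# Three years is two doubling periods, so the data grows fourfold.
print(projected_size(1.0, 36))       # 4.0

# Growing 1024-fold takes ten doublings.
print(months_to_grow(1024) / 12)     # 15.0 (years)
```

Under this assumption the yearly growth factor is 2 ** (12 / 18), which is about 1.59, so the database would gain roughly 59% per year.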