Dear students, in this module I am going to elaborate on the storage of biological sequence information. You know that biological sequence information can be of three different types. The first is DNA. At the DNA level, the sequence information consists primarily of A, T, C and G. For RNA, the alphabet changes slightly, and you have A, U, G and C. For proteins, you have 20 different amino acids, which are listed here for you. These amino acids can be represented by their one-letter symbols. For instance, alanine is represented by A and arginine by R. So a protein can be written simply as a combination of these letters.

Dear students, DNA, RNA and proteins can all be sequenced in the laboratory. When I say RNA, please note that there are multiple types of RNA, for instance mRNA, tRNA, ribosomal RNA and others. I am grouping all of them together when I say RNA here. Once you obtain the DNA, RNA and protein sequences from experiments in the wet lab, you can store them as simple alphabetic strings.

Note that a DNA sequence may contain multiple genes, and therefore it can be very long, running to hundreds of thousands of nucleotide bases, that is, hundreds of kilobases as we call them. In the case of RNA, the individual sequences are comparatively short, but because there are many kinds of RNA, the variety of sequences is large. Protein sequences can be long as well, because multiple exons can come together to code for a single protein. So the protein sequence itself can run to hundreds or even thousands of amino acids. In short, you can represent the protein sequence, the DNA sequence and the RNA sequence as simple strings.

How do we store this information? The solution to this problem lies in the publicly available databases that already exist.
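Before we look at those databases, here is a minimal Python sketch of the idea that sequences are just alphabetic strings. The names `is_valid` and `dna_to_rna` are hypothetical helpers made up for this illustration, the sequences are toy examples rather than real records, and the amino acid lookup shows only the two examples mentioned above.

```python
# The three alphabets described in the lecture.
DNA_ALPHABET = set("ATCG")
RNA_ALPHABET = set("AUCG")
# One-letter codes of the 20 standard amino acids.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")

# Partial lookup: only the two amino acids named in the lecture.
ONE_LETTER_CODE = {"alanine": "A", "arginine": "R"}


def is_valid(sequence, alphabet):
    """Return True if the sequence is non-empty and uses only letters from the alphabet."""
    return len(sequence) > 0 and set(sequence.upper()) <= alphabet


def dna_to_rna(dna):
    """Rewrite a DNA string in the RNA alphabet by replacing T with U."""
    return dna.upper().replace("T", "U")


# Toy sequences, invented for illustration only.
dna = "ATGGCTCGT"
rna = dna_to_rna(dna)
protein = "MAR"  # methionine, alanine, arginine in one-letter codes

print(dna, is_valid(dna, DNA_ALPHABET))
print(rna, is_valid(rna, RNA_ALPHABET))
print(protein, is_valid(protein, PROTEIN_ALPHABET))
```

Storing each sequence as a plain string keeps it easy to check, search and save to a text file, which is the form in which sequences are commonly exchanged.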
Among these public databases, for DNA and RNA you have GenBank, maintained by the NCBI at the NIH, and for proteins you have UniProt, including its Swiss-Prot section, maintained by the UniProt Consortium, of which EMBL-EBI is a member. Let's take a quick look at each of them.

Here is the GenBank website. You can go to the URL provided here, that is, ncbi.nlm.nih.gov/genbank. You can specify the nucleotide sequence here, and you can use this dropdown list to select options such as searching by accession ID, among others.

This is the UniProt web page, which is also a publicly available database. UniProt stores protein sequences, their post-translational modifications and other related information, and you can search it simply by providing the accession number of the protein, its sequence, or other identifiers.

So in conclusion, nucleotide sequences, which include DNA and RNA, are publicly available for several species in GenBank, while for proteins, hundreds of thousands of sequences are available in UniProt.
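Searching by accession number can also be done from a script instead of the web page. The sketch below is only an illustration under stated assumptions: it assumes the NCBI E-utilities `efetch` endpoint and the UniProt REST endpoint as the programmatic routes into these databases, the function names are hypothetical, and the FASTA text in the demo is a made-up toy record, not real data. The network call is wrapped in `fetch_fasta` and is not executed when the block runs.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed programmatic endpoints for the two databases.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
UNIPROT_URL = "https://rest.uniprot.org/uniprotkb"


def genbank_fasta_url(accession):
    """Build an efetch URL asking for a nucleotide record in FASTA format."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def uniprot_fasta_url(accession):
    """Build a UniProt URL asking for a protein record in FASTA format."""
    return f"{UNIPROT_URL}/{accession}.fasta"


def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA text."""
    header = None
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop after the first
            header = line[1:]
        elif header is not None:
            parts.append(line)
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(parts)


def fetch_fasta(url):
    """Download a FASTA record and parse it (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return parse_fasta(response.read().decode("utf-8"))


# Offline demo on a toy record (invented, not a real database entry).
toy_record = ">toy_sequence example only\nATGGCT\nCGT\n"
print(parse_fasta(toy_record))
print(genbank_fasta_url("NM_000518"))  # illustrative accession
print(uniprot_fasta_url("P68871"))     # illustrative accession
```

With network access, a call such as `fetch_fasta(uniprot_fasta_url("P68871"))` would be expected to return the header and the sequence as a single string. Please check the current NCBI and UniProt documentation for usage limits before sending many requests.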