Today, we're going to talk about genomics and proteomics. Genomics and proteomics are simply the study of the genome and the study of the proteome. The genome is the complete set of genes and all the other genetic material in an organism, and the proteome is the complete set of proteins made by an organism.
But let's start off by talking about genomics. The human genome is made up of 46 chromosomes. It consists of 22 pairs of autosomes: we've got two copies of chromosome 1, two copies of chromosome 2, two copies of chromosome 3, and on down through chromosome 22. And then we've got one pair of sex chromosomes, either two Xs for a female or an X and a Y for a male. Another thing that we knew before the Human Genome Project is that we had about three billion nucleotide base pairs. So that's three billion As, Gs, Cs, and Ts all mixed together.
So what did we still need to learn? What we did not know is what order all of those As and Gs and Cs and Ts were in. If you look at four letters, C, R, A, B, and you put them in this order, they spell "crab." If you rearrange them, they spell "brac." Rearrange them again and they spell R, A, B, C. So they all spell very different things with very different meanings. Same thing with those As, Gs, Cs, and Ts. We need to know the right order of all of those letters, kind of like reading a book. If you jumble up the words or the letters, it's going to have a completely different meaning. We've also been able to figure out that sometimes a change in a single letter can be the difference between disease and health. So that order, that sequence, is very, very important.
Another thing that we didn't know was how many genes we had. Some researchers were thinking 100,000 genes. Some researchers on the low end were thinking maybe 50,000 genes. But we had no idea how many we had, and we had no idea where in the genome they were.
Chromosome 1, chromosome 7, the X chromosome? We still had to figure that out. Before the Human Genome Project, it would take about 10 or more years to sequence and study a single gene. One gene. This was a very slow and very expensive process, and because of that, it took a very long time to develop new drugs or new therapies to treat diseases.
So with the Human Genome Project, about 20 government-funded research centers in about six countries worked together on this project, and a private company, Celera, took it on as well. Their goal was to develop cheaper and faster technology to sequence the genome, and they wanted to sequence the entire genome in about 10 years. So not one gene in 10 years, but the entire genome. They also wanted to learn the location of every single gene. They wanted to know the sequence and order of every single nucleotide, every single A, G, C, and T. And they wanted to make this information public, available over the internet for anybody to use and look at, so that people could advance science and medicine much more quickly by having all of this information at their fingertips.
Now, the old method for sequencing across a gene or across a chromosome is called chromosome walking. You start off with a probe, which I've got shown in red, and you sequence a little bit of the DNA. Then you stop. Then you generate a new probe, sequence a little bit more of the DNA, and stop again. New probe, sequence that DNA, new probe, sequence that DNA, and so on. You are constantly repeating this process. Now, the advantage is that when you're done walking across that gene or that chromosome, you know where all of those pieces go.
You know the order of all of those nucleotides all the way across that region. The disadvantage is that you have to stop each time and make a new probe, so chromosome walking can take a long time to complete.
The new method is shotgun sequencing, and this method was proposed by Craig Venter from Celera Genomics. He said, let's set that other method aside and try this new method. What he wanted to do was cut up the entire genome into little pieces. So chop it all up, all the way across those three billion nucleotides. Then he wanted to sequence all of those individual pieces. And last, he wanted to use a computer to assemble all of these puzzle pieces.
If you imagine doing a thousand-piece puzzle with no picture on the box, that's kind of difficult. Now imagine taking 46 individual 1,000-piece puzzles, mixing them all up together, and then trying to do all of those puzzles at one time. Same kind of thing with this method. So the disadvantage of shotgun sequencing is that you had no idea what order these pieces came in. But the advantage was that, even with that disadvantage, it was faster, because you could plug all of these sequences into a computer and let the computer try this piece, try this piece, try this piece, and figure the puzzle out. You can see here that this piece ends in AAT, so we're going to match it up with this piece down here that has an AAT. This piece over here has a GCCT, this piece down here has a GCCT, and so on. And the computer, while you go home and sleep or go have lunch, is constantly working, trying these different puzzle pieces out, until we were able to figure out the entire sequence across a gene, across a genome.
Now, in 2003 they finally completed the Human Genome Project, to about 99.999% accuracy. That's pretty good.
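The puzzle-matching idea behind shotgun assembly can be sketched in a few lines of Python. This is only a toy illustration under simplifying assumptions, not the software any sequencing center actually used: it assumes error-free reads from a single strand with no repeats, and the function names (`overlap`, `greedy_assemble`), the fragments, and the three-letter minimum overlap are all made up for the example. It repeatedly joins the two pieces whose ends match best, the way the AAT and GCCT pieces are matched up above.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len` letters.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: keep merging the best-overlapping pair of pieces."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        # Try every ordered pair of pieces and remember the largest overlap.
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left that fits together
        # Join the two pieces, writing the shared letters only once.
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three hypothetical overlapping reads, given out of order.
reads = ["CTGAAGT", "ATGCGTAC", "GTACCTGA"]
print(greedy_assemble(reads))  # ['ATGCGTACCTGAAGT']
```

With real data the pieces contain errors and repeated stretches, so a simple greedy match like this would often join the wrong pieces; it is meant only to show why a computer can try the combinations much faster than a person can.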
And the finished sequence again came out to that same number, about 3 billion nucleotide pairs. Now, out of that entire genome, they figured out that we have only about 21,000 genes. That's not 100,000, that's not the 50,000 that they were guessing, only 21,000 genes. That's far fewer than they expected, and those genes make up only about 1.5% of our genome. One reason we can have so few is alternative RNA splicing, where you have one gene but you can get more than one gene product, or more than one protein, made from that gene.
Now, the rest of the DNA is non-coding. It doesn't make polypeptides, it doesn't make RNA products, and for that reason scientists sometimes call it junk DNA. Again, junk DNA doesn't code for RNA or polypeptides, but some of it is quite useful. About 25% of our genome is made up of introns and gene regulatory sequences. These affect how genes are turned on, how genes are turned off, and other regulation. So that's quite important.
We also have some junk DNA made of repetitive DNA sequences that are important for chromosome structure. So we've got telomeres and centromeres. Telomeres sit on the ends of the chromosomes, and every time a cell divides, these telomeres protect the genes on the inside of that chromosome. If you didn't have any telomeres, then each time you replicated that DNA to make a new cell, the ends of that chromosome would get shorter and shorter and shorter until we started losing important parts of a gene. So having those telomeres, that repetitive DNA on the ends of the chromosome, protects the important stuff in the middle.
We also have repetitive DNA in centromeres, and this repetitive DNA is important for cell division. When you undergo cell division, it's important that you have those centromeres, where the chromosomes can attach to the mitotic spindle and separate to the proper side, into each new daughter cell.
If you don't have the centromeres, that DNA is just going to be hanging free, not knowing which direction to go, how to separate, or how to divide. So again, those centromeres, even though they're just repetitive DNA, are also very important for chromosome structure.
And then we've also got some interesting elements called transposable elements, or jumping genes. These are sequences that can move from one place in a chromosome to another place, or perhaps even to an entirely different chromosome. This is a field that people are starting to look into right now to understand the utility of that. Also, we're still learning what other sequences do. We've got some other junk DNA out there, and we have no idea what it does. Is it important, or is it simply holding the place between genes? We don't know yet.
So, future directions. We've been able to sequence the genome, so we're done, right? Not quite. One thing that we need to figure out is how genes are regulated: how they're turned on, maybe turned on a little bit, turned on a lot, full blast. We also need to understand what products they make, what RNA and what polypeptides. And again, some of those genes can make more than one product. We also need to know how those products act. What do the polypeptides do in the cell? Believe it or not, we've got some genes where we know the sequence of the gene and we can estimate what the protein product is, but we have no idea what it's doing in the cell. Is it something useful? Is it just hanging out? No idea. Also, some of these proteins and polypeptides can interact with other proteins. They can form complexes, or they can interfere with the activity of another protein. We still have to figure that out. And then we also want to know how these products are regulated over time or under various conditions.
So that brings us to the field of proteomics. We know that genes make polypeptides.
And all of the proteins that are found in an organism are called the proteome. Now, the challenge with proteomics is that it's much more complicated than genomics. With genomics, we need to know the A, G, C, T sequence across the entire genome. With proteomics, we need to know all of the individual amino acid sequences. But there's more to it than just knowing that individual sequence.
Again, we know that genes can make multiple polypeptide products. So which ones are made, and in which cells? We've got to figure that out. Different cell types can have a different set of proteins. Our liver cells make different materials than our brain cells. Our brain cells make different materials than our muscle cells. So we have to analyze each and every cell type in our body.
Also, proteins have a unique three-dimensional shape, so just knowing the individual sequence of those amino acids is not enough. We need to know how these guys fold and what domains are present in that particular protein to figure out how it's going to interact with other stuff.
Also, the amounts of proteins can vary from cell to cell, or even within the exact same cell type. As you age, which proteins that cell makes varies. Whether you've eaten a meal, whether you're under high temperature or low temperature, whether you're under stress, whether you've exercised: all of these things influence how much of a protein is made in any particular cell.
And then proteins can also be modified. A protein can be phosphorylated, where you put a phosphate group on it. It can be ubiquitinated. It can be acetylated. And all of these modifications affect the activity of a protein. So again, we might know the individual amino acid sequence. We may have figured out its three-dimensional shape. But then when you start making these modifications, it affects the activity of that particular protein.
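To get a feel for how quickly this adds up, here is a small counting sketch in Python. Everything in it is hypothetical and chosen only for illustration: the isoform names, the three modification sites, and the simplifying assumption that each site is independently either modified or not. It just lists every combination of splice product and modification state for one made-up gene.

```python
from itertools import product


def modification_states(sites: list[str]) -> list[tuple[str, ...]]:
    """Every on/off combination of the given modification sites.

    Each state is the tuple of sites that are modified in that combination.
    """
    states = []
    for mask in product([False, True], repeat=len(sites)):
        states.append(tuple(site for site, on in zip(sites, mask) if on))
    return states


# Hypothetical example: one gene with two splice products and three sites
# that can each be modified or left alone.
isoforms = ["isoform_A", "isoform_B"]
sites = ["phospho_site", "acetyl_site", "ubiquitin_site"]

forms = [(iso, state) for iso in isoforms for state in modification_states(sites)]
print(len(forms))  # 2 isoforms x 2**3 modification states = 16
```

So in this toy case a single gene with one DNA sequence could already show up as 16 distinguishable protein forms, before even considering cell type, folding, or how much of each form is made.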
So again, the field of proteomics, even though it's older than the field of genomics, since we got started with this in the 1950s and earlier, is still so new to us. We still have so much that we're trying to explore and understand.