The Human Genome Project was an ambitious project which resulted in the sequencing of the entire human genome. That is a little more than three times 10 to the power nine base pairs of genetic material, a staggering amount. How did they succeed in sequencing that many base pairs? So how was the human genome sequenced? That's what we will take a look at in this video.

So the human genome consists of about three times 10 to the power nine base pairs. And as you know, we have 23 chromosomes in our haploid set. So how many base pairs does each chromosome have? Well, different chromosomes have different sizes. The number of base pairs in a chromosome ranges roughly from 50 million to 250 million. Now the sequencing technique that we have available works only for stretches of around 500 base pairs. So what do we do then? Naturally, we have to break all these big chromosomes down into smaller pieces so they can be sequenced, right? That's pretty much what they did in the Human Genome Project.

So what they did was isolate genomic DNA from human cells. And genomic DNA has all the chromosomes and everything, right? So it has to be broken down into smaller pieces, which is what they did. How do you do that? There are various enzymes available, and these enzymes cut the DNA at different places, which gives us smaller fragments. Once they had the smaller fragments, they took each fragment and inserted it into a vector. Now, what do I mean by a vector? A vector is a circular DNA molecule which can be cut at a certain place, and at that place a fragment can be inserted. Then they joined the fragment with the vector. So each fragment was joined with a vector molecule. And then they put this vector, carrying the fragment, inside another cell, either a yeast cell or a bacterium.

Wait, why are we doing all these things? What's the point? We just want to sequence the DNA, right?
So when we obtain the genomic DNA from the human cell, it's in a very small quantity, and we can't do much sequencing with it. We need bigger quantities than what we obtain. We go through all these steps in order to amplify the DNA, that is, to obtain larger quantities.

So how does that help? When we put our DNA fragment of interest in one of these cells, yeast or bacterium, and let the cells grow, the fragments grow along with them. And by growth, in this case, I don't mean growth in size. I mean growth in population. You know that both yeast and bacterial cells are microscopic, right? A culture medium in a lab is a liquid which has a lot of nutrients, so when you put these cells in it, they can multiply very fast. So when you put a DNA of interest in a yeast cell, let's say, or a bacterial cell, and let the cells multiply in large numbers, the DNA will multiply along with them. That's what we are trying to achieve here. We have put the fragment in this circular DNA molecule, then put the circular DNA molecule inside the yeast or bacterial cell, and then let these circular DNA molecules multiply.

So why not put the fragment directly in the yeast or bacterial cell? Why go through all the trouble of inserting it first in the vector, the circular DNA molecule? Because the fragment is a small piece of DNA, and it is foreign to the yeast or the bacterial cell. The moment the cell sees that random fragment, all it will do is destroy it. But these vector molecules are DNA molecules which the cells recognize. The vector can be either a bacterial artificial chromosome or a yeast artificial chromosome. These DNA molecules are specifically designed to be able to replicate inside the yeast or the bacterial cell.
So whenever the cell sees one of these molecules, it immediately recognizes it and allows it to multiply along with the cell.

Okay, so we have put the insert in the vector, put the vector inside the yeast or the bacterial cell, and allowed the cell to multiply. So now we obtain large quantities of these fragments. And let's say we number them randomly. We number the fragments one, two, three, four, and so on. We don't know the actual position of each fragment in the genomic DNA, whether it belongs at the beginning, the middle or the end. We just assign the numbers at random. All of these are kept in different cell cultures. So one yeast or bacterial cell culture will have, let's say, fragment one, another will have fragment two, another will have fragment three, and so on. This is called a library. A library is a collection of cell cultures, each carrying a different DNA fragment.

Once we have obtained the library, we have enough of each DNA fragment. So the time is ripe now for sequencing. I'm not going to go into the process of sequencing in this video. I would recommend that you check out our video on DNA sequencing in the unit Biotechnology: Principles and Processes.

Okay, once we obtain the sequences of the fragments, what do we do next? We still don't know whether fragment one came first in the genomic DNA, or fragment five, or some other one. We don't know the order of the fragments in the original DNA sequence. So how do we figure that out?

Let's take an example here. Suppose this was the original genomic DNA sequence. You know that genomic DNA cannot be this short. I'm taking this example just to show you how sequences are ordered during this process. So let's say we have the sequence, and then we take an enzyme which cuts it at this point. When you cut this DNA sequence at this point, one of the sequences that you will obtain is this, right?
Now let's say we cut the sequence over here and over here. So the sequence from this G to this A is one of the sequences that we'll get when we cut like this, right? So we'll get this one. Now let's take another type of cut. What if we cut over here? One of the sequences that we get when we cut here is this one. It starts with GCTT here and ends with TACT, like this.

Now this is assuming we already know the original sequence to begin with. Then we know that if we cut at certain points, these are some of the sequences that we get. But in the Human Genome Project, they didn't know what the original sequence was, right? So let me remove that. Now we don't know what the original sequence is. All we know is that we have these three sequences and we have to order them. How do we order them?

If you noticed, we made some overlapping cuts. Okay, so what do I mean by overlapping cuts? Let me try to arrange these sequences, and then it will become clear to you. If you see here, this GCTG is common between this sequence and this sequence. This is what is called overlapping. So these four bases are overlapping between these two sequences. Similarly, GCTTA, these five bases, are overlapping between these two sequences. If we use a lot of different enzymes to cut the DNA, we will end up with these kinds of overlapping cuts. Looking at the overlaps, we can decide which sequence comes after which. In practice, these overlaps will be far bigger than what you see over here, so it will be much easier to figure out the order in which the sequences come.

And once you put together the sequence using the overlaps, this is what it looks like, which is the same as what we had in the beginning. So this is pretty much how they figured out the entire human genome sequence. But obviously, it's much more difficult than it looks, because I took a very simple example here of a very, very short DNA sequence.
But in reality, as you know, the genome was really, really long. That's why the project took so long to complete. And obviously, if human beings were to sit and do this by hand, it would take ages. That's why they needed to develop very good computer programs which could handle this very quickly. And those computer programs are part of the field which we know as bioinformatics. They used such a simple and easy-to-understand technique to sequence such a big genome, the human genome. Isn't this ingenious?
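To make the ordering step a little more concrete, here is a minimal sketch in Python of how a program might stitch fragments together using their overlaps. The three fragments and the function names `overlap` and `assemble` are made up for illustration. They only mimic the GCTG and GCTTA overlaps from the example, and they are not the actual sequences or software used in the Human Genome Project. Real assembly programs also have to cope with millions of fragments, sequencing errors and repeated regions, which this sketch ignores.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments, min_len: int = 3):
    """Greedily merge the pair of fragments with the largest overlap.

    Repeats until one fragment is left or no overlaps remain.
    This is a toy strategy (an assumption for this sketch), not a
    production genome assembler.
    """
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i == j:
                    continue
                k = overlap(a, b, min_len)
                if k > best_k:
                    best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join
        # Join the two fragments, writing the shared bases only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags


# Hypothetical fragments, deliberately given in the wrong order.
reads = ["GCTTACT", "AATGCTG", "GCTGACCGCTTA"]
print(assemble(reads))
```

Here the program first joins the two fragments that share GCTTA, because that is the longest overlap, and then attaches the remaining fragment through GCTG. With longer overlaps, as in real data, there is much less chance of two fragments matching by accident.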