The problem with Sanger sequencing is that it's slow, and it's also inherently limited, because as these segments grow in size they become harder and harder to tell apart. You can probably imagine how difficult it's going to be to distinguish something that's 241,049 base pairs long from something that's 241,047 or 241,048. It's not going to work, it's not going to be efficient, and you will have more and more errors. But originally, the idea behind the whole Human Genome Project, that gigantic effort, was exactly to use this kind of chain-termination sequencing to determine the full genome. It was supposed to be a thirteen-year project. But then a mathematician, Eugene Myers, came up with an idea: what if you could just do a divide-and-conquer approach, take that entire genome and cut it into much smaller pieces, so that every piece was only 100 or 200 base pairs long? Those would be trivial to sequence. We could sequence all of them in parallel and get much faster sequencing. The only problem: at some point we get the results, and now you're not going to have one result, you're going to have millions of pieces, and putting them back together is an exceptionally complicated problem. They even tried to publish this, but they were rejected by both Nature and Science. Eventually I think they got into a journal on biotechnology or something — don't trust me on that one. The only condition under which the journal would accept the paper was that it ran alongside a very critical commentary saying the method was unlikely to work: it might work for 2,000 base pairs or so, but for a human genome we would need in the ballpark of 50 million fragments, and there was no way this method would ever work in practice. The cool thing is that Gene Myers — it's a great name, given what he's working on — in fact solved the problem. It's far beyond the scope of this class to go through how it was solved; it's a very difficult problem.
Not just solving it, but proving with certainty that we get the right sequence. This is the basis of so-called shotgun sequencing — named for the way a shotgun fires a swarm of pellets. With shotgun sequencing there are actually two things we do. If I'm sequencing a brand-new genome, it's complicated, because I don't know exactly how long the genome is going to be. So I take my genetic material, I run it through PCR to amplify it, so I have lots of material, and then I use enzymes to cut it into small pieces. I determine the sequences of all these small pieces, and I'm going to need many, many copies of pieces that overlap, right? Because I can't control where the cuts happen. So I need to make sure there are enough cuts that — as I'm showing with the pens here — the pieces overlap each other, so that I can stitch the sequence back together. It's not going to be enough to have two or three overlapping pieces; in practice, to get good certainty, we might need something like 30-, 40-, or 50-fold overlap. With enough statistics, I should be able to devise a mathematical model that predicts what the entire sequence had to be. Remember, if there are repeating units of length 100 or so, this becomes exceptionally difficult. It's still very costly for a new genome, because for a new genome I literally do not know what the result is going to be, and that requires a lot of computing and probabilistic modeling. But it's possible. And once I've determined it once for, say, the human genome, it's much easier the second time. Because the second time, my genome is on average almost completely identical to yours — only something like 0.1% differs. And finding those 0.1% when I have all those small pieces is much easier, as you see here on the right, because now I have a template on which to place them all and just do some statistics. In practice, this is what virtually all sequencing labs use.
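To make the stitching idea concrete, here is a toy sketch in Python of greedy overlap-based assembly. This is not the actual algorithm Myers used — just an illustration under simplified assumptions: error-free reads, no long repeats, and deterministic overlapping cuts instead of random ones. All function names here are made up for illustration.

```python
import random

def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_overlap=10):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(set(reads))            # drop exact duplicate fragments
    while len(contigs) > 1:
        best_k, best_pair = 0, None
        for a in contigs:
            for b in contigs:
                if a is not b:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_pair = k, (a, b)
        if best_pair is None:             # no overlaps left: a coverage gap
            break
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_k:])    # stitch the two fragments together
    return max(contigs, key=len)

# A made-up 300 bp "genome", cut into overlapping 50 bp fragments.
random.seed(1)
genome = "".join(random.choice("ACGT") for _ in range(300))
starts = list(range(0, 250, 20)) + [250]  # every base covered several times
reads = [genome[s:s + 50] for s in starts]
random.shuffle(reads)                     # the sequencer returns them in no order
print(greedy_assemble(reads) == genome)
```

Note how the whole difficulty the critics pointed at lives in `overlap` and the merge order: with repeats longer than the read length, several merges look equally plausible, which is why real assemblers need far more sophisticated models than this greedy loop.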
Of course, we haven't completely given up on Sanger. The method we typically use to sequence each piece here is still based on Sanger sequencing, but we don't use long reads anymore — a typical read would be in the ballpark of maybe 100 base pairs or so. Then we let computers figure everything else out. It's completely amazing that it works, but it does.
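The second case from above — resequencing when a template already exists — can also be sketched: place each short read on the known reference, pile them up, and take a majority vote at every position to find the roughly 0.1% that differs. Again, this is a toy with hypothetical names and error-free reads, not a real read mapper.

```python
from collections import Counter, defaultdict
import random

def map_read(read, reference, max_mismatch=2):
    """Slide the read along the reference; return the best-matching position."""
    best_pos, best_mm = None, max_mismatch + 1
    for i in range(len(reference) - len(read) + 1):
        mm = sum(1 for x, y in zip(read, reference[i:i + len(read)]) if x != y)
        if mm < best_mm:
            best_pos, best_mm = i, mm
    return best_pos

def call_differences(reads, reference):
    """Pile reads up on the template; report majority bases that differ from it."""
    pile = defaultdict(Counter)
    for read in reads:
        pos = map_read(read, reference)
        if pos is None:
            continue                      # read did not fit anywhere
        for j, base in enumerate(read):
            pile[pos + j][base] += 1
    return {p: c.most_common(1)[0][0]
            for p, c in pile.items()
            if c.most_common(1)[0][0] != reference[p]}

# Made-up example: "my" genome differs from the reference at one position.
random.seed(2)
reference = "".join(random.choice("ACGT") for _ in range(200))
mine = list(reference)
mine[100] = {"A": "C", "C": "G", "G": "T", "T": "A"}[mine[100]]
mine = "".join(mine)
reads = [mine[i:i + 40] for i in range(0, 161, 10)]
print(call_differences(reads, reference))  # one difference, at position 100
```

This is why the template makes everything cheaper: each read only needs to be placed, not assembled against every other read, and the statistics reduce to counting votes per position.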