There are 3 billion bases in the haploid human genome. In 1990, the U.S. Department of Energy formally began mapping those DNA sequences for the unimaginatively named Human Genome Project. The project was expected to take 15 years, at a cost of between $5 and $10 billion. A very early rough draft, the first-round assembly, was announced as completed in 2000, but realistically the first nearly complete assembly was not finished until 2003. It had cost about $3 billion. A competing private effort, started by Craig Venter of Celera Genomics in 1998, promised to finish the project first, in a small fraction of the time and for a tenth of the cost, simply by using improved technologies and a more efficient workflow. So now we know the full sequence of the human genome, right? Well, not exactly. Actually, about 8% of the sequences haven't been collected, and probably won't be in the near future. These are the structural elements of DNA, like those at the tips and middles of chromosomes. They consist of long repetitive elements, and wouldn't be very interesting to sequence. But the really interesting parts, the actual genes that make up the proteins that make up the cell, are completely sequenced. In fact, we've begun mapping the variations in those genes. We'll take a thousand people from around the world, and deep sequence each of them to give us a representative map of variation. There are going to be a lot of innovations to come out of this project, and out of a related project, HapMap, or the haplotype mapping project. The primary goal is to identify which gene variants are involved in different diseases. Ultimately, this could lead to improved diagnostics for cancer, diabetes, and birth defects. It will also greatly advance our knowledge about genetics and biology. Imagine the Human Genome Project was a sort of Lewis and Clark exploration project, and what comes now is a series of survey teams that will produce elevation maps and identify useful resources.
Or imagine that the Human Genome Project was Apollo 11, and the current efforts are laying the groundwork for our first lunar colony. It was necessary to develop the tools on a pilot run, so that we can invest wisely in the right technologies for more extensive future undertakings. It's one of the most exciting times to be a molecular biologist since DNA was discovered. Some of you may be familiar with Moore's law. It states that the number of transistors that can be placed on an integrated circuit doubles approximately every two years. The power of exponential growth is really quite astounding. We've gone in 20 years from transistor counts in the thousands to counts in the billions. This is from incremental breakthroughs in technology, materials, and miniaturization. It's not a law really, it's just an uncanny observation. Progress has been at least that rapid in the field of DNA sequencing. Just 12 years ago, I can remember pouring slab gels, giant glass plates filled with polyacrylamide, a kind of chemical Jell-O, then using electricity to separate out DNA sequencing reactions labeled with radioactive phosphorus isotopes. Prepping the runs took most of the day, then we exposed the radioactive spots in the gel to a piece of X-ray film and developed it. Finally, a human sat at a computer and typed in the sequence, decoding it like a visual Morse code. It took me three days to produce 200 base pairs of sequence. Remember that the Human Genome Project was 3 billion base pairs. It seemed to me at the time a monumental undertaking, like describing flying to the moon to the Wright Brothers. At the time the project started, I was using state-of-the-art technology. But new technologies were developed to support the effort. Radioisotopes were replaced with safer and cheaper fluorescent dyes. Big slabs of polyacrylamide gel were replaced with very thin tubes of the same material, which is why we call this capillary sequencing.
Lasers read the fluorescent dyes as they move past the detector, replacing the tedious film exposure and development. The upstream processes of PCR, cycle sequencing, and chain termination were all vastly improved, and computer power allowed for automated reading, scoring, and assembly of sequences into their proper order. No more unpaid grad students manually decoding and typing sequences. I went to grad school at a facility that housed one of the Human Genome Project sites. It was a wonder to behold. There was automation at every level. Robots were on 24-hour shifts, picking bacteria off Petri dishes, transferring them to tubes, adding chemicals, making other important chemicals with complicated reactions. It was a symphony of noises, a ballet of science and engineering, and the energy was amazing. The capacity of one of these centers was about 10,000 bases per day. That's the technology that finished the Human Genome Project: high-tech sequencing with 24/7 automation, and a desperate race between private and public efforts. Brilliant scientists, innovators, and a public that was only slightly apathetic about the research goals. But we've moved on yet again. There is a series of commercial instruments described as next-generation sequencing technologies. They've pushed the boundaries of what is possible yet again. None of them existed when I finished grad school not so very long ago. One instrument, the 454 GS, owned by the Swiss pharma and biotech company Hoffmann-La Roche, can produce one billion bases per day, at a fraction of the cost of older-fashioned capillary sequencing. The human genome is three billion bases, so we need either three instruments or three days to sequence the whole human genome. Total costs would be in the thousands of dollars, not billions. So let's stop to think about that. We've come in 20 years from 200 bases per day to one billion bases per day. That's more than a doubling in throughput every year.
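As a rough check on that arithmetic, here is a minimal Python sketch. It uses only the round figures quoted above (200 bases per day, one billion bases per day, 20 years, a three-billion-base genome); the function names are made up for illustration, and the numbers are talk-level estimates, not measurements.

```python
import math


def doubling_time_years(start_rate, end_rate, years):
    """Average time per doubling, assuming steady exponential growth."""
    doublings = math.log2(end_rate / start_rate)
    return years / doublings


def days_to_sequence(genome_bases, bases_per_day, instruments=1):
    """Days for a single pass over the genome at the quoted throughput."""
    return genome_bases / (bases_per_day * instruments)


# Figures quoted in the talk (round numbers, not measurements).
t = doubling_time_years(start_rate=200, end_rate=1e9, years=20)
print(f"doubling time: {t:.2f} years")  # prints "doubling time: 0.90 years"

print(days_to_sequence(3e9, 1e9))                 # prints 3.0 (one instrument)
print(days_to_sequence(3e9, 1e9, instruments=3))  # prints 1.0 (three instruments)
```

Going from 200 to one billion is a little over 22 doublings, so spread across 20 years that works out to a doubling roughly every 11 months.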
Remember that Moore's law predicts a doubling in transistors, and by proxy computing power, every two years. I've already used Concordance's law to advise you not to take health advice from the Internet. Therefore, this will need to be Concordance's second law: sequencing throughput will double every year. The next stage has already started. The company that makes the 454 next-gen sequencer has announced an improvement that nearly doubles the number of DNA bases that can be read per run. So Concordance's second law is already coming true. There are several competing technologies, like the SOLiD system or Illumina bead arrays, but the biggest innovations are yet to come. I don't want to bore you with the precise details of how the 454 system works, but a friend of mine shared a used chip with me, and I broke out my kid's digital microscope to take a look at it magnified so you can see the coolest parts. The 454 genome sequencer works by a technology called pyrosequencing, which is a specialized form of sequencing by synthesis. The scientist takes bits of DNA that have been stuck on tiny beads made of a kind of starch extracted from seaweed that's called Sepharose. They put the beads into what looks like a microchip or solar panel about the size of a playing card. In the surface of this chip are very tiny grooves of a specific size that allow only a single bead in. Then additional beads are added that contain the components of the sequencing reaction. One type of DNA base, or nucleotide, is allowed to flow across the chip at a time. If that nucleotide is the correct pairing to the one from the sample, that particular hole lights up, using an enzyme system taken from fireflies called luciferase. A CCD camera, not unlike a security camera, takes pictures of those flashes of light in sequence and decodes them to read out the sequences from all the slots in the plate.
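To make that decoding step concrete, here is a toy Python sketch of how the flashes from one slot might be turned into a sequence. It is a simplified illustration, not the instrument's actual software. It assumes the four nucleotides are flowed in a fixed repeating order (the order "TACG" here is an assumption), that the brightness of a flash is roughly proportional to how many identical bases were added in a row, and that the signals are already clean enough to round. The names `FLOW_ORDER` and `decode_flowgram` are made up for this example.

```python
# Toy model of reading one slot in a pyrosequencing run.
# Assumption: nucleotides are flowed across the chip in this repeating order.
FLOW_ORDER = "TACG"


def decode_flowgram(signals, flow_order=FLOW_ORDER):
    """Turn per-flow light intensities into a base sequence.

    signals[i] is the normalized brightness seen during flow i:
    about 0 means no base was added, about 1 means one base,
    about 2 means two identical bases in a row, and so on.
    The result is the newly synthesized strand, which is the
    complement of the sample DNA stuck to the bead.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, round(signal))  # nearest whole number of bases
        bases.append(nucleotide * count)
    return "".join(bases)


# Two cycles of T, A, C, G flows with made-up intensities.
example = [1.02, 0.05, 0.97, 2.10, 0.03, 1.01, 0.0, 0.0]
print(decode_flowgram(example))  # prints TCGGA
```

Rounding is the weak spot of this simple model: the longer a run of identical bases, the harder it is to tell, say, seven from eight by brightness alone.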
The overall workflow takes the better part of a week, and the time required to read the plate is around 10 hours. Roche conveniently sells a computer array for the intensive image processing and sequence assembly required. Compared to current-generation sequencing, which runs about $4 per 500 bases, the next-gen technologies are already much, much less expensive, assuming full runs and high-efficiency reads. The applications for these technologies go far beyond simply re-sequencing the human genome. It's also possible to look at only those genes that are actually turned on, that is, being transcribed, and comparing the on-and-off gene switches may give more information about diseases like cancer. We can also use this technology to look at the full diversity of bacterial sequences in a sample. For example, we can sequence all the bacteria in the intestine of a human, which will allow us to learn a lot about what effect our flora have on our health status. I'm a big fan of new technologies for their own sake, but next-gen sequencing is one that's going to lead to a lot of very exciting discoveries, and keep us molecular biologists busy for at least another century. When people ask me why I'm always complaining about anti-science and pseudoscience, like creationism, AIDS denialism, and alternative medicine, the answer is that it cheapens what the real scientists are doing. We are accomplishing so much, moving the boundaries of knowledge further each year, and when I see people yearning for the wonders of the ancient Chinese secrets, or aspiring to be cavemen, or just complaining about evil corporate biotech, I feel very sad for them. I suspect that they fear new knowledge, and there has never been a time when more knowledge was being produced than right now. Personally, I rejoice in our advances, our modern medicine, our technological progress, our advancing knowledge. In that knowledge, I see a bright future. Thanks for watching.