In practice, this is not anything we're doing manually anymore. We're buying machines to do it. The largest company in the world selling that type of machine is called Illumina. This is one of their machines, called the HiSeq X Ten. That's sexy. We have 10 of them at SciLifeLab. Just to give you a ballpark idea, they cost roughly one million dollars each, and you have to buy 10 of them. Each of those machines is going to generate, in one run taking roughly three days, 1.6 or 1.7 trillion base pairs. I'll say that again: 1.7 trillion base pairs. Compare that to the size of the human genome: 3 billion base pairs. Again, 3 billion, not trillion. So we're talking about hundreds of human genomes' worth of information. Now, the caveat is that we're going to need 30 or 50 times overlap (coverage) to be able to say anything for certain.
It takes three days to run that, but if you have 10 of these machines, the throughput we're getting is roughly equivalent to 20,000 human genomes every year. There are certainly sites that have many more than 10 of these. The problem is that this is only a great deal if you actually need to sequence 20,000 human genomes a year. It's cheap by some definition of cheap. The cost per human genome might be 1,000 dollars, which is amazing. But again, that assumes that you are willing to pay 20,000 times that per year. Still, this means that genome sequencing is more accessible than ever.
I remember the days when we were talking about a cost of five or 10 cents per base pair. Today, it's completely ridiculous to talk about sequencing cost per base pair. We're only talking about sequencing cost per human genome. This plot might not look as impressive as it is, but (a) the scale is logarithmic, and (b) compare the green line, showing how the sequencing cost has dropped, with Moore's law, which indicates how fast transistor counts are growing.
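
As a quick sanity check on the throughput numbers quoted above (1.7 trillion base pairs per three-day run, a 3 billion base pair genome, 30 or 50 times overlap, 10 machines), here is a minimal back-of-the-envelope sketch. It assumes the machines run back to back all year with no downtime, which is an idealization that the lecture does not state, and the function name is just an illustrative choice.

```python
# Back-of-the-envelope throughput estimate using the figures from the lecture.
BASES_PER_RUN = 1.7e12   # base pairs per machine per run
GENOME_SIZE = 3e9        # base pairs in one human genome
RUN_DAYS = 3             # roughly three days per run
MACHINES = 10            # number of machines at the site


def genomes_per_year(coverage, days_per_year=365):
    """Genomes sequenced per year at a given fold coverage.

    Assumption (not from the lecture): every machine runs continuously,
    with no downtime between runs.
    """
    genomes_per_run = BASES_PER_RUN / (GENOME_SIZE * coverage)
    runs_per_year = days_per_year / RUN_DAYS
    return genomes_per_run * runs_per_year * MACHINES


# Raw genome equivalents in a single run, before accounting for coverage.
raw_genomes_per_run = BASES_PER_RUN / GENOME_SIZE

for coverage in (30, 50):
    print(f"{coverage}x coverage: about {genomes_per_year(coverage):,.0f} genomes per year")
```

Under these assumptions, one run holds a few hundred genome equivalents of raw data, and the yearly estimate comes out at roughly 23,000 genomes at 30 times coverage and roughly 14,000 at 50 times, which is consistent with the "roughly 20,000" ballpark.
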
So the cost of sequencing is going down much faster than the performance of computers is going up, which poses pretty big challenges for how to deal with all this data. I would already say that $1,000 is so cheap that within a few years, if they don't know exactly what's wrong with us when we get into the emergency room, they're going to sequence us just in case. It's not as expensive as many other tests they do anyway.
But that is the high end. There is also a low end of this. Oxford Nanopore has a device that is literally a decimeter long or so. You connect it to the USB port of your computer. Sure, you need a bit of sample material, and you need some reagents here. So per base pair, it's significantly more expensive than the Illumina machines. But if you only want to sequence a few small genomes, say a COVID genome or something in your own lab, this is simple and accessible enough that we can do it. And they're not that expensive. We could certainly afford to buy one if we wanted to.
So I predict that this plot will keep going down another order of magnitude. This is so cheap that we could do it everywhere. And somewhere here, we start to change not the laws of physics, but the laws of science. Today, there is no other method anywhere in science that is generating as much information as quickly as genome sequencing. And I'm not limiting this to life science. I think this is valid for any field of science. And sure, the information is limited. It's just letters, right? A, G, C, and T. But as you're collecting this much information, we're pretty much only limited by imagination in what we can do with it. I'll show you one example that's not at all part of this class, but just to show you the crazy ideas we can use this for. And then I'll get back to the proteins in a second. So, we've talked about protein structure, but there is actually a structure in DNA too.
So this is just a schematic model, but your DNA, my DNA, is somehow going to be curled up in strange ways in our cells. It's not completely random. But if I now want to start expressing a gene here in the cell, I'm going to need to find that start codon, right, and the TATA box needs to bind to the right place in the DNA. How do we find that? I'm going to need to uncurl this ball some way. Is this random, or is there some sort of average systematic structure here? I would like to determine the structure of DNA, but I can't really crystallize it, because it's not that rigid and repetitive.
Well, what Erez Lieberman Aiden and Eric Lander did about a decade ago is that they came up with a very smart method. Briefly, they add small chemical linkers so that if two parts of DNA are close to each other in space, we tie them up, and then we create a small break around them. And then we use shotgun sequencing. If I now sequence these small fragments around that particular break, I will get a small sequence of letters here and a small sequence of letters here. And all I know is that these had to be close to each other spatially: not in sequence, but in space. Now I can, of course, also map these letters, if there are 25 or so of them, to a particular position in a particular chromosome. And that means that I can now start to say what part of chromosome 14 is close to what other part of chromosome 14. So I'm literally using sequencing and sequence information to derive three-dimensional structural information about DNA.
The difference here, though, is that you can get billions of points like this. So you're getting a huge amount of data much, much, much faster than with X-ray crystallography or cryo-EM. And based on that, they were able to derive fairly amazing results about globules in different shapes, and whether the DNA is random or whether you have some sort of hierarchical structure. And this is beyond the class, but I need to give you the result, right?
Their argument is that DNA is quite hierarchical, as shown by the colors here. So the structure of DNA is really designed in such a way as to make it easy for me to pull out just one gene, decode that gene, and then let that gene relax back into the DNA. And that was a beautiful example on the cover of Nature about a decade ago from Leonid Mirny, when he showed this in 2D, right? The whole point is that you want to draw a curve so that things that are close in space on the curve are also reasonably close in sequence, but you can never do that perfectly. So remember, sequencing provides more information per cost and time than any other method in science.
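
To make the contact-mapping idea from the DNA structure example a bit more concrete, here is a minimal sketch of how pairs of mapped fragments could be turned into a table of "which region is close to which region". This is an illustration only, not the authors' actual pipeline: it assumes the fragments have already been mapped to chromosome positions, it skips all filtering and normalization, and the function name, bin size, and toy read pairs are all hypothetical.

```python
from collections import Counter


def contact_counts(read_pairs, bin_size):
    """Count spatial contacts between genomic bins.

    read_pairs: iterable of ((chrom_a, pos_a), (chrom_b, pos_b)), where each
    pair is two fragments that were linked because they were close in space.
    bin_size: width of each genomic bin in base pairs (an arbitrary choice).
    """
    counts = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in read_pairs:
        bin_a = (chrom_a, pos_a // bin_size)
        bin_b = (chrom_b, pos_b // bin_size)
        # Sort so that (a, b) and (b, a) are counted as the same contact.
        key = tuple(sorted((bin_a, bin_b)))
        counts[key] += 1
    return counts


# Hypothetical toy data: positions are made up for illustration.
pairs = [
    (("chr14", 1_200_000), ("chr14", 9_800_000)),
    (("chr14", 9_700_000), ("chr14", 1_450_000)),
    (("chr14", 1_100_000), ("chr14", 1_300_000)),
    (("chr14", 5_000_000), ("chr2", 3_000_000)),
]

counts = contact_counts(pairs, bin_size=1_000_000)
for (bin_a, bin_b), n in sorted(counts.items()):
    print(bin_a, bin_b, n)
```

With billions of such pairs instead of four, the counts form a contact map: bins that are often linked are, on average, close in space, and that is the kind of information one would use to reason about the three-dimensional folding.
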