 The next talk is going to be about DNA. When I think of DNA, I think of Douglas Noel Adams, the author, who wrote The Hitchhiker's Guide to the Galaxy, a fabulous book which tells you why you should always know where your towel is. And Douglas Adams was also born in the same year as the DNA was discovered or the first papers were published on DNA. So right next to me we have Bert Hubert, who will talk about DNA and the code of life and why this is for us computer guys so interesting and so easy to understand. Please give a warm round of applause to Hubert. Thank you. Thank you very much everyone. This talk, I love that I'm able to share with you what I have learned over the past 15 years about DNA because it blew me away. But every time people explain this, it's biologists explain it to other biologists or chemists explaining it to biologists. And it turns out if you want to know about DNA, it helps a lot if you think like a computer girl or boy. And this presentation, I hope to do that, it is jam-packed this presentation. There is no way it is going to fit in one hour. The good news is, tomorrow at six, there is an overflow workshop called DNA The Other Greatest Hits in the Pi Room. So let's say we don't make it to the end of this presentation. You can come back tomorrow in the Pi tent and that will continue with the missing bits. And then nothing worked. Welcome. Let's see, what can we do? Yeah, now it works. So this presentation is not controversial. So I'm trying to present DNA from the eye of a computer programmer, of a computer person. With the exception of one slide, everything in this presentation is fully conventional. So there's nothing strange here. And to prove that, I have brought along the books. So if you have any questions, you can later on come by and look it all up in the serious books. And I will clearly mark the one slide that is controversial. So what is my background? You need to know a little bit. I founded Power DNS, a DNS company in 1999. 
And that didn't go so well. It was not super successful at the time. And then I got into DNA, which I really liked. And that led to a web page called Amazing DNA: DNA as seen through the eyes of computer programmers. And that page is now 15 years old. And over the years, it got a lot of comments from real serious DNA biologists and institutes. And they corrected small things. At one point, I got a really detailed correction. And I thought, what does this guy know? And it turned out he invented the piece that I had described incorrectly. So over the years, this became quite a good site. And at one point, some venture capitalists saw that site and wanted to start a company with me and that didn't work. And that guy was disappointed. So he said, well, what else do you do, Bert? I said, well, PowerDNS, and I explained PowerDNS. And he said, look, that's all wrong. You've made an expensive piece of software for people with no money. That was the end of PowerDNS 1.0. And then we made it open source. So in a sense, this DNA page is responsible for the fact that we now have an open source name server. Then I did a lot of other things. And in 2011, I had to leave the field of security for a bit because of a non-compete. And I was also fed up with security. So I spent 18 months doing DNA research at TU Delft. And that was tremendous fun. And that has led to this presentation. Oh, it also led to this rather lovely publication in a serious journal. So I can somewhat call myself a scientist now, somewhat. Not quite, but it says Nature on top. It's nice. So what's the goal of the presentation? I want to teach you some basic DNA literacy. I want to convince you that DNA and biology and the cell together are a programming environment, which is a rather big claim. And to make sure I'm not fooling you, are there any actual real biologists in the room? Yes. You all have permission to stand up and shout the first moment you think this guy is making it up. He's full of shit.
You have my permission. Please interrupt immediately. I have tried to keep this fully correct, but I would love to hear from you if I got it wrong. But please don't shout. Use the microphones. Yeah. Please do that. And while we are at it, as I'm convincing you that the cell and DNA are a programming environment, I also want to explain life. It's this modest goal, and blow your mind, which is another modest goal. So DNA: millions or billions of things called nucleotides or bases. And these are four chemicals, which we call A, C, G and T. If you want to remember that, there's a nice sort of mnemonic, which is "the genetic code arbitrary": T, G, C, A. And these billions or millions of nucleotides are organized in chromosomes and genes. And what continues to blow me away: every living thing on the planet, even semi-living things, they all use DNA and it's atom for atom compatible with each other. So you can take some bacterial DNA, put it in a human being, it will work. It might even kill you. It works that well. And it's atom for atom compatible. Four billion years. I mean, I cannot load a Python script that was written five years ago. And it continues to blow me away. Some fun statistics. It's 0.33 nanometer per nucleotide. So that sounds pretty small. If you realize that we are talking about billions of nucleotides, that means that our genome is meters long. Yet it's packed in a cell of micrometers. That's some pretty tight curling going on there. I have a whole book about that here. 40 atoms per nucleotide. And I will get a little bit more into what a nucleotide is. Interestingly enough, 40 is also the number of electrons that are needed to store a bit in flash memory. I'm not sure if there's something fundamental going on there, but maybe you just need 40 things. The weight: one picogram per giga nucleotide. So before people started seeing DNA as a computer programming thing, they couldn't read it, but they could weigh it.
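The statistics above are easy to check with a few lines of arithmetic. What follows is only a rough sketch: the 0.33 nanometer and one picogram figures come from the talk, while the genome size of 3 billion nucleotides and the function names are assumptions picked for illustration.

```python
# Figures quoted in the talk.
NM_PER_NUCLEOTIDE = 0.33             # length of one nucleotide along the helix
PICOGRAM_PER_GIGANUCLEOTIDE = 1.0    # roughly one picogram per billion nucleotides

# Assumption for illustration only: a genome of 3 billion nucleotides.
ASSUMED_GENOME_NUCLEOTIDES = 3_000_000_000


def stretched_length_m(nucleotides: int) -> float:
    """Length in meters if the DNA were pulled out into a straight line."""
    return nucleotides * NM_PER_NUCLEOTIDE * 1e-9


def weight_pg(nucleotides: int) -> float:
    """Approximate weight in picograms, using the one-picogram rule of thumb."""
    return nucleotides / 1e9 * PICOGRAM_PER_GIGANUCLEOTIDE


if __name__ == "__main__":
    print(f"length: {stretched_length_m(ASSUMED_GENOME_NUCLEOTIDES):.2f} m")
    print(f"weight: {weight_pg(ASSUMED_GENOME_NUCLEOTIDES):.1f} pg")
```

Under that assumed size, a single copy comes out at about a meter of DNA weighing about three picograms, which is why the packing into a micrometer-sized cell is so striking.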
So they would group animals by the weight of their DNA. And it turns out that it's almost exactly one picogram per billion nucleotides. And also it's super redundant. It has to be because it has to carry all the genetic information through the ages. So what does it look like? We have these four molecules, thymine, adenine, cytosine, guanine. And when you hear biologists talk about DNA, they are very busy with these molecules. And they're right, of course. It is molecules, but we should not let all this chemistry convince us or detract us from the fact that this is real digital information. Important to note. T, A, C, G. T, A, C, G. Every T nucleotide really loves an A nucleotide. So when a T and A nucleotide see each other, they go like, boom, and they stick. But the C and G nucleotides love each other even more. So they're like, boom. That means that when the DNA gets encoded, you cannot just write a whole bunch of stuff that is all Cs and Gs because that binds so closely together that you would never open up the DNA again. And that actually also happens when you store digital information. So here is my bold claim, A, C, G, T. Every nucleotide is two bits because there are four of them, so you encode them in two bits. Is this cheating? Is this strange? Is it strange to say that these molecules are bits? Because our computers, they operate in a pure plane of mathematics where they manipulate numbers and... No, they don't. Our computers actually manipulate modulated voltages and they call it a one. Or they store a bunch of electrons and they call it a zero. Or they have a North-South transition on a hard disk and we call that a one or a zero. So it is pretty functional to just think of these things in terms of bits because if I write a file, if I write a presentation and someone asks me what are you doing, I'm saying, well, I'm writing a presentation. I'm not saying I'm manipulating giga-electron pairs. It is just information. 
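To make the "two bits per nucleotide" claim concrete, here is a minimal sketch that packs four nucleotides into one byte and unpacks them again. The particular assignment of bit patterns to letters is an arbitrary assumption (any one-to-one mapping works), and the function names `pack` and `unpack` are made up for this example.

```python
# Assumed mapping; nature does not prescribe which letter gets which bit pattern.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTER = {bits: base for base, bits in CODE.items()}


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four nucleotides per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpacking stays uniform.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` nucleotides from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(LETTER[(byte >> shift) & 0b11])
    return "".join(bases[:length])


if __name__ == "__main__":
    packed = pack("GATTACA")
    print(len(packed), "bytes for 7 nucleotides")
    print(unpack(packed, 7))
```

The length has to be stored separately because the last byte may be only partly filled; that is a design choice of this sketch, not something the talk specifies.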
And there's also an interesting analog there if you are a hard disk and you would say I want to store a whole bunch of ones all together and I would all store them as North poles. Eventually my entire hard disk becomes a super-duper North pole and doesn't rotate anymore. And that's a little bit like the C and G story that the DNA is not quite free to just put the stuff in there that it wants because eventually the DNA would not open up anymore or it would open up too easily. So it is fair to say that these ACG and T are just bits. They are as much bits as the bits in your RAM or in your flash drive or in your SSD are. So this is the sort of the official photograph of the DNA molecule and it's super lovely. And so it has 10 base pairs per 3.4 nanometers and it twists like this and super pretty. And but this is a little bit like the sort of what the McDonald's menu looks like and then you get the actual burger and it doesn't look like that. So this is what DNA actually looks like. This is an atomic force microscope where people have made pictures of really the individual atoms and after sufficient number crunching down here below you can actually see the DNA helix. So I have to admit that this is prettier but we can actually also see it and then it looks like this. This is an important slide because at the somewhere in the middle of the presentation is something that is super complicated and I already need to prepare your minds a little bit for it. So remember this DNA is has this helix it twists. The information on DNA is on both sides of the helix and what you see here is that whenever there is a T on one side you will find an A on the other side. When you see an A on one side you will find a T on the other side. When you find a C on one side there's a G on the other side. 
This is the first part of the redundancy, but the key thing to remember already is that the DNA can be read in two directions: one side is read like this and the other side is read like that. So the symmetry is not just a mirror. One side of the helix you read like this and the other one you read like that. Let that simmer in your minds a bit already. So, redundancy. Like I said, an A is a T, a C is a G. DNA has 40 atoms per nucleotide. That's not a lot. One stray cosmic ray, one stray UV photon, and it gets scrambled, and if your code gets scrambled then stuff goes wrong in your computer. It just as well goes wrong in your cell. So let's say a stray photon hits us and kills the C over here. Now the DNA is a little bit broken. It turns out that because there is a G on the other side, the repair mechanisms in our DNA know: okay, that one is broken, but there's a G on the other side, so I have to put back a C. And that works really well. And even when two of them are broken, still no problem. It is able to fix that from the other side of the helix. So the helix is in itself a RAID 1 mirror where the copy is on the other side, and it works pretty well, unless of course this happens. Now we have a problem because there is no immediate backup anymore. What do we do? We don't want to die. And so there are yet more repair mechanisms. Our chromosomes, for us as human beings, are also there twice. So there is RAID 1 on top of RAID 1. Every piece of DNA, almost every piece of DNA, has an identical copy. So first the DNA helix itself has redundancy and then we have a whole spare helix.
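The mirror-with-a-twist idea can be sketched in a few lines: the second strand holds the complementary letters and is read in the opposite direction, so a damaged base on one strand can be recomputed from the other. This is a toy model only; the `?` marker for a destroyed base and the function names are assumptions for illustration, and real repair machinery is far more involved.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written in its own reading direction."""
    return "".join(PAIR[base] for base in reversed(strand))


def repair(damaged: str, other_strand: str) -> str:
    """Fill in bases marked '?' using the opposite strand as the backup copy."""
    template = reverse_complement(other_strand)
    if len(template) != len(damaged):
        raise ValueError("strands must have the same length")
    return "".join(t if d == "?" else d for d, t in zip(damaged, template))


if __name__ == "__main__":
    strand = "GATTACA"
    other = reverse_complement(strand)
    print(other)
    print(repair("GA??ACA", other))
```

If the same position is destroyed on both strands, this sketch has nothing to copy from, which is the situation where the second chromosome copy described above comes in.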
And it turns out it is smart enough that when the DNA gets killed like this, it is able to find a copy of the broken DNA on the other chromosome and copy it back in, which is nothing short of amazing. It is sort of the equivalent of having a RAID 1 system where you accidentally kill both halves of one file and then it says, well, I happen to know that you have another copy of that file over there, so I'll just zoom that in for you. This is utterly amazing, and how well it works can be seen because not all chromosomes have this duplication feature. Most of the men among us will not have two copies of the X chromosome. Most of us have one copy of the X chromosome. Most women have two X chromosomes. So they are fine. If something breaks on the X chromosome, they have another one: I got a copy from there. Not so lucky for me, and that means that most male people have 16 times more color blindness than female people, because when the X chromosome breaks we're toast. I tried to look up hemophilia, which is when your blood doesn't thicken, and I couldn't even find what the ratio is, how much more that happens in men than in women, because it's at least a hundred times more with men than with women. So this backup system, this duplicate RAID 1 thing, works really well. Now, nothing in nature is so stupid. Nature is quite clever. So you would think, how could it be this stupid? At least for color blindness, for example, we know that it's not all bad. People that are colorblind have far better night vision, which is why stuff like this tends to stay around. I have not yet found an advantage for hemophilia, but maybe later. This is a nice table that compares various storage mechanisms: hard disks, flash drives, RAM, DNA. And DNA can store hundreds of megabytes. So it's quite nice, and it can store it for ages and ages and ages, but it is super slow. 15 bytes per second is not fast. You really have to wait for that.
Of course, what do you do when one worker is slow? You spawn a thousand more workers. That means that many cells can read DNA at a thousand places at the same time. A hard disk can do it in one place, so it's quite boring. Also, you can make bulk copies of DNA, so copying DNA in a bacterium, for example, takes 20 minutes for all of the DNA, and that happens at 250 bytes per second. But the last row of this table is the most impressive one: that is the power consumption. One picowatt. So when you thought your microwatt chip was impressive, this thing goes six orders of magnitude better than that. So the whole DNA system, in fact the whole cell, runs on 0.5 picowatts. This is true for bacteria; human cells use way more, but still it's quite effective. What does it look like? This is the smallest piece of living DNA that I could find. This is the hepatitis B virus and it will thoroughly ruin your holiday. The circle in the middle is the actual DNA. We'll explain later how this works, but once that little circle is in your cell you will likely get ill. Because it's only 800 bytes, I was able to put the entire virus on this slide. We can actually go out to a DNA store, give them this bit and say, can you make this for me? And we can turn that into a living virus again and it will make you sick. This is not recommended. Every time someone tries this we get really scared, because people have done this with a lot of harmless viruses. They have done it with bacterial viruses, which make bacteria sick, which is nice. And then someone said, let's see if I can order the polio virus, and he did and it worked. So these 800 bytes will make you quite ill, but it's okay to look at it. It's okay. It doesn't jump through your brain. This is the Ebola virus, which is super lethal. It's 5 kilobytes. It will super duper kill you. So it's fun that DNA is compatible across all domains of life, but it's compatible enough that it will kill you. Also it looks rather nasty. This is a bacterium. This is an E. coli.
Actually, bacteria are not quite as... So if someone says this is a tiger and this is a bear, we're quite sure what the tiger and the bear are. Bacteria are far more fluid than that. So if someone says this is an E. coli, then that basically says this is like an animal. It doesn't mean that much. So we have friendly E. colis, we have unfriendly E. colis, but this is sort of the bog standard bacterium that everyone likes to study. The dark secret is that 99% of bacteria do not work in a laboratory. So we study the 1%. We have no idea about the other 99%. So everything you hear is based on 1% of all bacteria. This is the one we know best, more about it later, but this whole thing, which is a very smart bacterium, runs on 750 kilobytes of code. We know that because if you put the DNA of one bacterium in another one, it starts behaving like the old one. We really know this is the whole thing. This is fun. This is our favorite fish, of course, the OpenBSD fish, and it has, at 100 megabytes, the smallest and tightest genome of all animals and plants. This is one efficient genome. I'm not sure if it's a coincidence, but OpenBSD sure picked the most efficient fish they could find. So well done, people. Here's us. We come in at 750 megabytes of DNA. I have downloaded updates for Office that have been far, far bigger than that. And to this day it humbles me to realize that apparently you can build a whole human being out of 750 megabytes. But the news gets worse. If we try to look at what the 750 megabytes does, for most of it we have a very hard time figuring out what it does and why it's there. And a lot of people think that at least half of it is just all crap that continues to be copied. There are people that say that 97% is crap. And in the end, of only 20 megabytes, actually I checked this morning, it's now 17 megabytes, of 17 megabytes we know what it does and that it's actually important.
I just lose my mind thinking that you could build a whole fish or a whole human being out of 17 megabytes. So we're still working on that. But even 750 megabytes is ridiculous. It fits on a CD-ROM. This is nice. This is a plant. For unknown reasons this plant has 50 times more DNA than you do. And it's a lovely plant. I mean, it's cool. It's an ordinary plant, but we will for now never be able to read its DNA, because it would cost like a million dollars to assemble all that, because it's so much. So the only reason we know that it has 37.5 gigabytes of DNA is because we can weigh it. It comes in at many picograms. On the right is the biggest animal genome. This is the marbled lungfish, which is a very pretty fish. It is rumored that this fish is actually two animals. So in the larva stage it's one animal and then it switches to the other one. So it might actually have a whole backup genome. So, the size of life. Between braces are the things that are actually not quite life by themselves, like the two viruses I mentioned. Typical bacterium: 750 kilobytes. Then you go a factor of a thousand up. So between most animals and plants and fungi on the one hand and bacteria on the other, there is a factor of a thousand difference. So the bacterium that might make you really ill has a thousand times less code than you do. Bacteria are the original gangsters of code. So if you read the genome of a bacterium, 97% is necessary. So there is no spare capacity in there. So they're like the demo people, the people that can make a thousand-byte JavaScript demo that's a 3D flight animation. That's how stupendously powerful bacteria are. And meanwhile with our 750 megabytes DNA we're not able to make a computer that does not indent 33 gigabytes wrong. Maybe one day. So this is called the central dogma of DNA, of the genome. Long term storage: DNA can be seen as a hard disk. And oh, by the way, every line is not true. So we love to think this is all true, but everything I say has an exception.
So welcome to four billion year old software. Every exception has an exception. So long term storage DNA DNA just sits there. It doesn't but it mostly just sits there. If it wants to do something much like code does not execute straight from your flash drive because it has to be copied into RAM first and then it will do something. DNA has to be converted into RNA and RNA is a lot like DNA except not quite. But it can be converted one to one. You can convert from DNA to RNA and back and RNA will do stuff. The RNA can be converted into proteins and the proteins are the things that do stuff and sense stuff. So you can have a protein that swims. You can have a protein that senses sugar. You can have a protein that kills you like in the puffer fish. It's very lethal fish. By the way in puffer fish the open BSD fish turns out it doesn't make its own poison because it's very poisonous fish. It has recruited a bacterium to do that for it. Nice. You see that a lot. So DNA long term storage RNA active stuff that does something RNA converts into proteins and the proteins are the things that do things sort of repeat RNA is usually shorter parts so DNA is like millions of base pairs long. RNA can be far shorter can be super short can also be 100 bases long copied from the DNA and the RNA can travel through the cell and do things. The DNA is a firewall around it. No one is supposed to touch it. I'll get around how that works later. The RNA just goes out there and swims and the RNA is a 3d printing instruction for proteins. There is no other way to describe it. There is the ultimate 3d printer in every cell and I will show you a stupid movie of that and these instructions how to build the protein are a stream of amino acids. Amino acids are the Lego blocks with which you build proteins. So I realize this is a lot to memorize but I will repeat it a lot. These are the 21 amino acids. These are the 21 Lego blocks out of which all proteins are built and most of life. 
These too are almost universal across life. This is the one area where it breaks down a little bit. Bacteria have a slightly different table that connects this, but otherwise all plants, all people are made from these 21 building blocks. And they have letters. So we have one called R and we have one called H and we have one called K. So that's how we describe them. And it's a nice thing, it's a friendly thing from nature that there are only 21 amino acids, because then we can stuff it in the alphabet. It would suck if it would be like 36. So this is the table. And this again continues to blow me away. This is the conversion table where we take three letters of DNA and look it up in this table and then we know, okay, this turns into this amino acid. And this table is, again, almost universal across life. So we can, for example, look up in the table what would happen if you have GCC. I love it that there's a codon called GCC. There are three DNA letters, the G, the C, and the C. And we can read that it turns into alanine. This is the conversion table. This is how human beings would look at the conversion table. This is how it looks in computer code. And this is fundamental. This is like the Rosetta stone of nature. This converts three DNA letters into an amino acid. And this is universal for life. You could write this on the wall. It will be true forever. It won't ever change. Yeah, this is the key to understanding life. So proteins are the things that do and sense. DNA converts to RNA. RNA converts to a string of amino acids that form proteins. Proteins are the things that do things. There is a bootstrapping issue here. If you have a 3D printer, the first question everyone asks, at least I would ask, is: I'm not taking your 3D printer seriously until it can print itself. And the current 3D printers are a joke in that respect. They will never print themselves. It's not going to happen.
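The conversion table mentioned earlier in this passage is small enough to build in a few lines. The sketch below is not the code from the slide; it uses a common compact way of writing the standard genetic code, with the 64 codons enumerated in T, C, A, G order, one-letter amino acid codes, and `*` standing for "stop". The variable names are made up for this example.

```python
from itertools import product

BASES = "TCAG"

# Standard genetic code, one letter per codon, in TCAG enumeration order.
# '*' marks a stop codon.
AMINO = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)

CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}

if __name__ == "__main__":
    print(len(CODON_TABLE), "codons")
    print("GCC ->", CODON_TABLE["GCC"])   # A is the one-letter code for alanine
    print(len(set(CODON_TABLE.values())), "distinct symbols")
```

Looking up `GCC` gives `A`, the one-letter code for alanine, matching the example above. The table holds 21 distinct symbols: 20 amino acid letters plus the stop marker.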
But even if they could, the problem is, where did you get the first 3D printer to print the second one? And I told you there's an exception to every rule. And the central dogma that DNA converts to RNA, converts to proteins, turns out that RNA code, the actual code, also functions as actual objects. So you can take the code, and the code itself will come alive. So you can take a piece of RNA letters, put it in there, and it will start doing things. And one of the things it will start doing is build a 3D printer straight out of RNA. It looks like this. And yeah, I'm always, when I see this thing, I'm like, that's impressive. What you see here is the bluish bits. So I was lying a little bit. This machine is made out of RNA, and it builds proteins. It can function without proteins, just not very well. But this is like a 3D printer that can bootstrap a really good 3D printer. And the mysterious thing is, because this thing works on pure RNA, there are people that quite reasonably assume that when life started on Earth, maybe there were no proteins. Maybe there was only RNA. Because we see that at the core of life is a thing that works almost without proteins. Next up is a movie that is so awesome that my computer has trouble playing it. Let's see if it wants to do it. Ah, it plays. This is so glorious. So what you see here, this bar, this tape that flows through it, that's the RNA. That's the actual building instructions. The things you see flying in and out, they bring in the right amino acids. And they add the amino acids to the little string you see growing on top. So the code is coming in. The amino acids are attracted to the code and out comes the protein on top. And this protein is then collected and put in all kinds of other things. This machine is operating in all of us right now in trillions of times. Every cell in our bodies is doing this a thousand times. We have trillions of cells. There are hundreds of you. 
I find this still, frankly, unbelievable: that a machine, the ultimate 3D printer, is just doing this even as I present or get drunk or whatever. They just continue working. So thank you, ribosome. See if I can actually get back now. Hmm. Otherwise, you will just have to continue looking at this. Yeah? Please don't. It was sort of a fun joke, but not anymore. Yeah. So let's see. What can we do? There is hope, people. There is hope. Yeah. Yeah, there we go. Thank you. So this is, again, a representation of what you just saw. The tape comes in. It has the RNA on it. The amino acids get attracted. Like I said, A attracts to T. RNA is a little bit different. They have a T, but they call it a U. It's too confusing. But the right amino acids get drawn in and they build the protein. So it's amazing. So what does it build? It builds this glorious thing. This is the bacterial flagellar motor. This is the engine by which bacteria swim. It is super important for bacteria to swim, because the more they swim, the more dangerous they are. If you see this wonderful device on the left, people have 3D printed it, and then it looks really shady if you build one in real life. But for every part of the flagellar motor you see, we know exactly what DNA it comes from. So we can make micrographs with electron microscopes of this. We see this thing, and then when we look at the DNA, we actually find the DNA for FliG and for MotA and MotB. We see all those components in the DNA and this glorious machine comes out and it swims. But we'll get back later to what it all does. But this is a very powerful 3D printer. Actual genomes on your computer. What does it look like? This is a DNA sequencing machine. I worked with this machine. It runs Windows XP, by the way. As you would expect, of course. On the left, you enter the DNA after your lab assistant has worked a lot on it. This is a part that people forget.
You cannot just go in there and put your finger in there and it will sequence your finger. It doesn't quite work like that. So on the left, the material goes in, and on the right, there's a USB exit and that's where the DNA comes out. The vendor of this device comes from California. So they are very cloud-minded. So when you buy this machine, it immediately uploads everything you do with it to California. It's a stupendously bad idea. But that's how it works these days, apparently. This is what the genomes look like, what the chromosomes look like under a microscope if you've done your work well, and this is what they look like on your computer. You can download them. And then you get these files. And they look like this. And you might wonder, how did these chromosomes ever get their numbers? Why do we know that chromosome number seven is chromosome number seven? Is there a small number printed on the side? There's not. So what they did is they could weigh these chromosomes. So they ordered them by size. So the biggest one is chromosome one. The next biggest one is two. And then when you finally die... Okay? Everyone okay? I assume everyone is okay. So when they actually started sequencing it, we found out that actually chromosome 11 is slightly bigger than chromosome 10. So okay, this just shows that it's real, real life. Down below is my favorite one, Homo sapiens mitochondrion. That's a little bacterium that lives inside us and that powers our energy. We took a bacterium and it actually does the most important thing for us. And it has its own tiny special DNA which is slightly incompatible with our DNA. It's like real code. Legacy. You can download it here. And actually they do patch releases of it. So this is release 38. We're up to patch level 11 and they keep finding stuff that was wrong. It is now estimated that we have only read 93% of the human genome. But they keep on releasing more releases and you can just download it.
So it's a lot of fun. And when you download it, a lot of stuff comes out. So you would expect the chromosomes to come out and then you get 71 more files. These are bits of DNA where we do not know where they go. They are part of us but where they fit. It's a little bit like when you build an IKEA cabinet and at the end you have some screws left. So we don't know. So you can look at this and see where it fits. One day we'll figure it out. So genes. Originally we thought we human beings, we are superior to everything. We should have the most genes. So they could already find for some animals how many genes they had and based on how much more stupendous we are we thought human beings will have a million genes. And then with the first measurements we're done with more like 100,000 and then it was more like 50,000 and 60,000, 30,000. It's now more less than 20,000. Which is a problem because the humble potato does quite a bit more. So instead of people saying maybe we're not as stupendous as we thought and this potato is actually a lot of special they redefined us that the gene is actually not that important. Size doesn't matter. So now you have this very strange definition of a gene which basically says a gene is a gene. Okay, live with it. You can explain a little bit. The original idea that they had is one gene is one protein is one function. You can still hear that in the way that we talk. We will hear people say this is the gene for X. And sometimes that is true. Sometimes there is really a gene for X but this is a 4 billion year old software project. Not one thing has one function. Everything has four. So the gene as a unit is technically very important because technically we can say this is the gene this is where it begins, this is where it ends. It's nice, but it's more like a .O file when you link a software project because it's not very relevant if your software project consists of a thousand .O files or 50, but we can still technically see it. 
So every time you hear someone say he has the gene for X, you go like, no, you don't. It's not how it works. This is the bit where we say, so this is the pie, the human genome. The red parts are the parts that we actually see really doing something. So that's 1.5% of the genome that ends up in actual proteins. That is not to say that the rest is not important, because the actual biologists will stand up and kill me when I use the word junk DNA. See, some of them fret already. But we don't really know what this stuff is. So LINEs and SINEs, these are long interfering patterns, short interfering patterns. And then we have miscellaneous and more miscellaneous. So there is quite a bit of work to do here to understand what this all does. But please remember I am not in any way claiming that there is only 1.5% of the genome that does everything, because that's not true. The rest does have a function, but you can really have a lot of discussion about that. So I started my story with: I will convince you that it's a computer language, that it's a computing environment, which is quite a claim, so I have to show that it actually works. This is a bit of DNA, and almost all of us, 98% in the room, will have this piece of DNA, maybe 99.9%. This is the piece of DNA that encodes the insulin gene, the insulin protein. And when you convert all these letters to amino acids, and you remember the amino acids are the Lego blocks which build the protein and they all have a letter to them, it looks like this. And actually it starts sort of with malware. It's very, very strange. I didn't make it up, you can look it up. This is the insulin protein. Insulin regulates many things, one of which is the sugar level, the glucose level in our blood. And if you don't have this functioning protein then you are diabetic. So this is the signal. This is the signal that says: please, the rest of the body, take glucose out of the bloodstream, and by all means do not put more glucose in the bloodstream.
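The step of converting DNA letters into amino acid letters can be sketched as a short translation loop. This is a minimal illustration under the standard genetic code: the function name `translate` is made up, and the twelve-letter input is a fragment chosen for this example because it spells M, A, L, W, not a verified copy of the sequence on the slide.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG enumeration order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Read DNA three letters at a time and emit one-letter amino acid codes.

    Stops at the first stop codon; trailing letters that do not fill a
    complete codon are ignored.
    """
    protein = []
    usable = len(dna) - len(dna) % 3
    for i in range(0, usable, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


if __name__ == "__main__":
    # Illustrative fragment only (assumption, see the text above).
    print(translate("ATGGCCCTGTGG"))
```

Real cells do this via an RNA copy rather than reading the DNA directly, so treating the DNA string as the input is a simplification.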
So it already does two things. It actually does 17 things, but this is the small part of it. So this is the description in text of the protein, and then when you measure it, it actually turns out to look like this. These people are very good at making rotating, pretty things. It's very nice. This is the insulin protein, and this gets released into the bloodstream as a signal to the rest of our body to stop messing with the glucose. Where does this protein end up? It goes to this one, which is bigger and prettier. That's why it doesn't rotate. This is the insulin receptor. This is where the signal goes. This is the other half of this one. This is the signal. This is the socket. And this socket sits on all your cells, and all the cells can listen to the insulin signal and start to use more glucose or not produce more glucose. This is the actual length of the protein. So it's a bit bigger, and this is the actual DNA code that goes along with it. Now, this completes the chain. This is how we know. And how have we found this out? By millions and millions of hours in the lab. People have worked stupendously hard to figure out the shape of this thing, which amino acids it consists of, where it is in the genome. Then people have done the same thing for this one. And wouldn't it be great if we could take a computer and say, look, this is the receiving end, this is the socket, this is the signal. Could you please, dear computer, calculate how what this DNA makes fits on what that DNA makes? It's a very clear-cut problem. We cannot do it in any way. We can do it for 51 amino acids and then everything explodes. We have no supercomputers powerful enough to do this. Send help, please. If you are looking for a career change, this is one of the biggest mysteries, the biggest unsolved issues that we have. We cannot take a piece of DNA and another piece of DNA and calculate that they will interact. We cannot do it. So if you want to do a miracle, this is your call to do it.
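A small aside on the earlier step, where the DNA letters were converted to amino acids: that conversion, unlike the docking problem, is easy for a computer. The sketch below assumes the standard genetic code and reads the DNA three letters at a time. The names `CODON_TABLE` and `translate` are made up for this illustration, and the 12-letter input is just a fragment picked because it spells the "malware"-like opening under the standard code; it is not copied from the slide.

```python
# Standard genetic code, packed as one string: codons are ordered with
# bases T, C, A, G in the first, second and third position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Hypothetical helper table for this sketch: maps each of the 64 codons
# to a one-letter amino acid code ("*" marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA letters into one-letter amino acid codes, three at a time."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[pos:pos + 3].upper()]
        if amino == "*":  # stop codon: the protein ends here
            break
        protein.append(amino)
    return "".join(protein)


print(translate("ATGGCCCTGTGG"))  # prints MALW
```

Getting from letters to amino acids is a table lookup. Getting from amino acids to a shape, and from two shapes to "do they fit", is the part nobody can compute.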
This is an important one. How is DNA addressed? So it's millions or billions of letters long. How does one piece know where the other is? The DNA people, they were biologists. So when they first read out DNA, they said, where does the DNA start? It starts at position one, yeah, assholes. Counting starts at zero, of course, but they messed it up. It starts at one, which means that all of the DNA computing code in the world is full of minus one. Just to make it fit. Maybe they were Pascal programmers or whatever. I don't know. Internally, DNA does not use these addresses like we do in computers, because DNA is so fluid that if you would say, okay, now you need to jump to position 1.2 billion, 355 million, by the time you do that, it's already changed. So, hello? 750 megabytes of DNA, and you cannot get a computer working. So I will now have to talk more so I can make up for the missing bit. What actual DNA does: DNA gets attracted to proteins. Proteins get attracted to DNA. So a piece of DNA does not have to know, sorry, a protein that wants to do something does not have to know where in the genome something is. It just has to be. It has a shape, the protein, and that shape connects to a part of the DNA. Routers do this as well. They call this content addressable memory, ternary content addressable memory. This is the same thing. This is where you say, I have an IP address. I want to know what to do with it. And there's a piece of memory that says, okay, I am keyed on that IP address. I know that stuff. So when DNA talks to each other, it all has to do with shapes. And that brings me to my next part: Boolean logic in DNA and how it all hangs together. Because it's all nice, you can make proteins, they can do things. But how does this lead to a working animal? It does something. This is E. coli. You can buy these things, by the way. There's a site where you can buy all these lovely, I have one here. This is actually a bacterial virus. They sell plushies.
They even sell plushies of typhus and other stuff that you might not want one of. But this is E. coli. It needs 0.5 picowatts and it runs on glucose, which we know as sugar. If the glucose runs out, it can also run on lactose. But then it must convert the lactose into glucose, which is work. Because it only has 0.5 picowatts to live on, it is not going to go around having a whole lactose converter machine if it doesn't need one. So what it does: as long as glucose is present, the bacterium will live on glucose. If there is no glucose, but there is lactose, only then will it start the lactose circuits. If there is also no lactose, we do nothing and wait, and hope to not die. This is the algorithm that operates in almost all bacteria. And this bit of code, I will now show you how that lives in DNA. First, some theory. When the glucose runs out, parts of the bacterium release a molecule called cAMP. That's the hunger signal. Sometimes you may feel like that. Bacteria go hungry as well. It releases the cAMP signal, and that binds to the CAP protein. That means that the CAP protein gets a new shape, and suddenly that new shape, like the content-addressable memory, wants to fit on the DNA. That's one part. That's the positive part. The cell says, I'm hungry, feed me. That's the first part of the if statement. The second, the final part there, is the so-called LacI protein. And that can bind to two things. It can bind to lactose or it can bind to DNA. It likes binding to the lactose more. But if there's no lactose, then it will bind to DNA. It has a choice. So these are the elements where this starts. And this is what it actually looks like in the genome of a bacterium. On the left. So you have to realize: remember this ribosome engine that gets fed with RNA? If we want to build the machinery that converts lactose into glucose, we must first make the RNA that builds that machinery. There is this thing going around that loves to convert DNA into RNA.
But, of course, it should not be going around and converting the whole genome into RNA, because then everything would happen at once and it would be bad. So this RNA conversion thing needs a promoter. It needs to get a hint that says get the copying going here. We want you. That is the CAP protein that we mentioned. So the moment the CAP protein has received the hunger signal, it raises the flag on the DNA that says we want to be transcribed into RNA. Please send in the transcribers. But that's not enough, because we said we do not want to start all this machinery if there is no lactose to convert, because then we would get the whole wonderful ribosome machine going and it would be ready to convert lactose and there is no lactose. And then the cell would die. So there's a second thing called the repressor. And the repressor is this LacI protein. And remember, the LacI protein has a choice. It can bind to lactose, which it loves to do, or it can bind to DNA, which it loves a little bit less. And the end result of this is that two conditions have to be met: the CAP protein binds and promotes and says send in the RNA transcribers, and the repressor has to have moved away. It has to have gone off the DNA and onto the lactose. And when those two conditions are met, we get this if statement. If glucose is present, just use glucose. If there is no glucose present, but there is lactose present, then it will start lacZ, lacY and lacA. Those are three genes. What they do: one of them converts lactose into glucose, one of them attempts to suck in more lactose from the outside world, and one does cleanup. It's nice that they're together. So the way to visualize this is: the RNA reading machine needs to get attracted to the DNA, and then it sits here. Then it needs to go over the DNA to turn it into RNA, but it can only do that if the repressor is gone. And the repressor is gone if there is lactose. And in this way, this is full Boolean logic.
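Written down the way a programmer would, the if statement above is a two-input gate. The sketch below is a deliberately simplified model, assuming the hunger signal and both proteins behave as clean on/off switches (real cells are leakier than that). The function name `lac_genes_transcribed` and the variable names are invented for the illustration.

```python
def lac_genes_transcribed(glucose: bool, lactose: bool) -> bool:
    """Simplified on/off model of the lactose switch described in the talk."""
    # No glucose -> the hunger signal (cAMP) is released.
    hunger_signal = not glucose
    # The hunger signal reshapes CAP so it binds the DNA and promotes transcription.
    cap_on_dna = hunger_signal
    # The repressor (LacI) prefers lactose; only without lactose does it sit on the DNA.
    repressor_on_dna = not lactose
    # The RNA reading machine gets through only if it is invited AND not blocked.
    return cap_on_dna and not repressor_on_dna


# Print the full truth table of the switch.
for glucose in (True, False):
    for lactose in (True, False):
        state = "ON" if lac_genes_transcribed(glucose, lactose) else "off"
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> lactose genes {state}")
```

Only one of the four rows switches the lactose genes on: no glucose, but lactose present. That is an AND of "hungry" and "lactose is there", built out of one promoter and one repressor.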
And with these tricks, you can build any kind of if statement you want. You can build OR statements. You can make XOR statements. You just have to align the repressors and the promoters right. And this is fundamental. This is the one we understand best. Most parts of the genome, we understand far less well than this. So okay, this shows you how a bacterium regulates its glucose levels. It's a lot of fun. Now we'll find out how to do something more complicated with DNA. This is nature's solution for finding food. We have the lovely E. coli again and it's hungry. It's super hungry. And it needs to swim to the sugar. To swim to the sugar, it needs to know where the sugar is. Remember, it has 0.5 picowatts to operate its machinery with. So the human way, how we would do it, we would say, well, we build this glucose scanner and it will go woon, woon, woon, scan for glucose, and in three dimensions we will rotate and see, when we go like this and this and this, where the maximum glucose smell is, and then we would do a ton of math on the numbers just to get there, and then okay, we will find the direction and go there. This is the sort of human way in which we would say, go find the sugar. And nature solves problems rather differently, because nature does not have that luxury: this would, by all means, use whole watt-seconds of energy to operate this rotating machinery and stuff, and the cell would also have to know where it was. So what do bacteria do? Bacteria are a lot smarter than this. Remember this swimming motor, the flagellar motor that does the swimming. It's actually a super stupid and super smart motor. It can run in two directions. It can run to the right, what they call counterclockwise here, and when it runs counterclockwise, then it works really well. Then the bacterium can swim. When you reverse the direction of the engine, it blows up, it goes wrong, it doesn't work. So you could see that as a bug, but it's a feature.
It's also a bug, by the way, but so what it does: it rotates the wrong way and the whole cell starts tumbling. So counterclockwise, the bacterium goes straight ahead. Clockwise, it goes in all directions. What does it look like? Basically, the engine, the flagellar motor, switches from one direction to the other direction every few seconds, I think, which means that it swims a bit, then it tumbles, swims a bit in another direction, tumbles, swims a bit in another direction, and it doesn't really get anywhere. Of course, eventually, this random walk will scan the whole environment. So actually, eventually, the bacterium would find the sugar, just doing a random walk. So it's not the worst algorithm ever invented, and probably it started out just like this. Probably this was all there was to it originally: if there is no food, swim around for a bit until you find food. Might work. Now nature comes in. Boom. It has a few sensors. These sensors can smell oxygen and all kinds of sugars and important stuff. And when they sense a rising concentration of these good things or bad things, depending on which it is, it releases proteins that influence the direction of the running. So as long as they smell that life is getting better, they will attempt to leave the motor running the way it was running. The moment they see the smell getting weaker, the motor will say, okay, okay, apparently this was not a good idea, and we will tumble a bit and start up again. And what does it look like? So remember, this was the fully random walk. This is the modified random walk. And the only thing it does: as long as the segment is going in the right direction, it will swim a bit longer. So this is a weighted random walk that, by extending the straight swimming as long as it's going in the right direction, will effectively find the food or the oxygen or the other stuff. So this may sound like a rather stupid, simple-minded algorithm.
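The weighted random walk above is small enough to simulate. This is a toy sketch, not a model of a real bacterium: it assumes a flat 2D world where the food smell simply gets stronger with increasing x, one unit of swimming per step, and made-up tumble probabilities (0.1 when the smell is improving, 0.5 when it is getting worse, 0.3 for the walker without sensors). The names `swim` and `mean_progress` are invented for the illustration.

```python
import math
import random


def swim(steps: int, sense_gradient: bool, rng: random.Random) -> float:
    """Run-and-tumble walk; returns how far the cell got toward the food (x)."""
    x = 0.0
    heading = rng.uniform(0, 2 * math.pi)
    for _ in range(steps):
        before = x  # assumption: smell strength simply increases with x
        x += math.cos(heading)  # swim one unit along the current heading
        if sense_gradient:
            # Life is getting better: keep the motor running. Otherwise tumble sooner.
            p_tumble = 0.1 if x > before else 0.5
        else:
            p_tumble = 0.3  # no sensors: tumble at a fixed rate
        if rng.random() < p_tumble:
            heading = rng.uniform(0, 2 * math.pi)  # tumble: pick a random new direction
    return x


def mean_progress(sense_gradient: bool, trials: int = 50,
                  steps: int = 500, seed: int = 1) -> float:
    """Average progress toward the food over several independent cells."""
    rng = random.Random(seed)
    return sum(swim(steps, sense_gradient, rng) for _ in range(trials)) / trials


print("with sensors:   ", round(mean_progress(True), 1))
print("without sensors:", round(mean_progress(False), 1))
```

The cell with sensors never computes a direction. It only decides when to stop going straight, and that alone is enough to drift steadily toward the food, while the cell without sensors hovers around where it started.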
It turns out that the bacteria that can do this trick, and most cannot, are especially lethal to us. So even this simple algorithm, tumble or swim and modulate that a little bit, is very impressive. And we know quite well how this works. Now, stupid code reuse. Remember, DNA has directions. So if you go like this or you go like this, it's identical. But if you go like this, it will change. So it's important to know that something, a protein that will work like this, will also work like this. But it will never work like this. There's a direction because of these shapes. So if you want to copy DNA, you can drag it, you can tear it apart. And like I said, an A will always attract a T, a C will always attract a G. And so to copy DNA, it's sufficient, to begin with, to just tear it apart. And then it will copy itself more or less, because the A loves the T and the C loves the G. So here we see how that is going. The DNA is going up. It has been split up a bit. And it has attracted new A's and new C's and new G's. That's nice. But remember, we first tore open the DNA. We also have to close it up again. Someone has to zip it up. And that looks like this after the copying. So you still have to imagine the DNA is going up. And we now have a new left copy and a new right copy. And it's all going up. And now we have to zip it tight. And that over there is a protein that will zip it up for us. And this is my animation skills. They're quite weak. Boom. And now it has this line. It's zipped up again. Now remember that the same protein, when it operates on the other side, it now has to go up. Because it's on the other side. Boom. It's very important to realize this one came from the top and that one comes up from the bottom. But there is a big problem with that. This one goes down, that one goes up. But remember, we were copying this DNA and both of these strands were moving up.
So you can see that the one on the left is all right, because it just sits there and everything flows past. The one on the right has a problem, because the DNA goes upwards and it would have to race extra hard to keep up with that DNA. And what you would actually want is to have a second copy of the protein that is sort of the mirror copy of the left one, and that sits there and can deal with the other side of the DNA. So then you would have two proteins that do exactly the same thing, but one does it for one half of the DNA and the other one does it for the other half. But you can see that it's not quite a mirror. It's quite complicated. Asking nature to make a second copy of a protein and keep it in sync is about as much fun as telling the Debian maintainers that you want to have your own private fork of OpenSSL. It is not going to happen. So for four billion years, nature has said, no, no, we're not going to make a second copy. I know it will make life easier, but we're not going to do it. We're going to build this monstrosity. This is what nature said. Looks like this. Sometimes this works. What you see here: DNA comes in. This is really presenting with handicaps, by the way. DNA comes in and one copy comes out here below. Just flows through it. One part is simple. That's the protein that is looking in the right way. Then on the top of the screen, a copy is coming out where you see all kinds of complicated things going on. You see, there's a loop and then it's pulling back the loop. And the reason it is doing that is that it's using the same protein to copy over here as over there, but because it's in the wrong direction, it has to let go of a whole piece of DNA and then pull it back to copy it and then let it go again and line it up. And this is the kind of thing you would not have come up with. But it's a workaround. This is a hack. And I could watch this for hours, by the way, and I frequently do. There's a YouTube version with more pixels.
This animation was made for the 50th birthday of DNA. And they consulted with a whole bunch of specialists to make sure that this is actually real-time and realistic. But this is a 4 billion-year-old hack, and it is still going on, this ridiculous process of copying a piece of DNA and then copying it in reverse and then gluing it up again. This is going on in all of us trillions of times right now and has been for billions of years. So this is the same kind of issue you get when you fight with a Debian maintainer and they tell you, no, work around the problem. So we are quite short on time, so as predicted, I am running out of it. We will skip all this glorious stuff. This is an especially glorious slide, which I will explain tomorrow at 6 o'clock at the Pi tent. This is super nice. This is the origin of life itself. Oh, I have to tell this one. Let's say you want to know what the cell is doing and you want to figure out what this cell is doing right now. I want to run top on this cell, figure out what processes are active. It turns out we can do that. So take a cell and make it do the thing you are interested in. Give it some poison and see how it counters the poison. You take all the cells, purée them, remove the DNA from it, and then you are left with all these bits of RNA that were floating around to make the cell do its thing. Then you convert the RNA back into DNA, because we have no RNA reading machines. We only have DNA reading machines. And then you read out all the RNA that was active right at that point. Then you match that up with which parts of the genome that was, and you know what the cell was doing right at that time. This is a stupendously powerful technique, and when you do this, you get two kinds of data. You get the so-called housekeeping genes, which are like the idle process in your operating system. Those genes run all the time, so you will always see them.
And then you also get the pieces of RNA that are actually involved with doing the thing that your cell was doing when you subjected it to the poison. So this is running top on the cell. Yeah, okay, so I will, oh, this is also super glorious. Malware, oh, you really have to come back. This is a bacterial virus, I have it with me. So this is when your infection has an infection. Which you will see tomorrow at 6 in the Pi tent. Thank you very much. Thank you. Sorry to interrupt you, but you landed exactly on time. So tomorrow, I think, could you repeat it one more time? Tomorrow at 6 in the Pi tent, there is more. I have this whole library of books with me where you can look it all up. And there we will be doing the missing slides, and we will also do more Q&A, because I hope that at least some of you find this stuff as fascinating as I do. But thank you very, very much. Awesome talk. Thanks a lot. This is your applause.