Thank you very much, everyone, especially Zach, for setting that up. In fact, I have my first slide, I guess. So this is my thank-you slide, and conflict of interest. Each of these icons represents some institution that is very pragmatic and practical about helping our world get better. When I talk about data privacy and risk, I inevitably have to talk about technologies and cohorts, because I see them as an important part of this. And this is roughly the narrative that we'll be talking about today. I want to introduce the question of who can contribute to breakthroughs. It is not just large institutions. I have to talk about not just three technologies, but their impact on this question of data privacy and risk, and then how we do the data integration and sharing that Zach has introduced so well, and how we get from association to causation, briefly, and then how this impacts therapies, new therapies, not just small molecules, but gene therapies, immune, microbial, and brain. Now this is actually from the last pair of talks that we had. John Sulston came in electronically, and so I came electronically to listen to him, and he had this diagram here about the threat of proprietary databases, that only people and institutions with money would get access to the human genome, and we did as much as we could to prevent this from happening by putting as much as possible into the public domain. I would generalize this very slightly. This is literally John's slide. This is not, so I took his name off. It says that dbGaP covers not just the human genome, but human genomes plus traits. It's even more useful than the human genome, and it still has this possibility that only exceptionally wealthy institutions have access. So we ask, who should get access? Here are two clerks who did quite a bit without appropriate credentials.
By the time they were 26 years old: Einstein and Ramanujan. And then here are Aaron Swartz and Richard Feynman, who almost certainly would not be given access to dbGaP, because one is a safecracker and the other is an activist who likes putting private data in the public domain. But nevertheless, they made gigantic contributions by the time they were 26 years old and, with the exception of Feynman, were generally not associated with institutions at the time of some of their breakthroughs. Here are three more examples, more up to date and more genomically oriented. They happened to all be in my lab at some point or another, but they made these contributions, in some cases, before they were 17. This is Kay Aull, who was interested in hemochromatosis because her father had it, and so she did it herself, literally in her closet. This is her closet door right there, and an eBay PCR machine. Does the FDA control Kay Aull? Not clear, but the Globe reported it. Anne West, in 11th grade a few years ago, analyzed all four members of her family, including herself. Her father had Factor V Leiden, and she set a new bar for 11th-grade science projects. And then Xiaodi Wu: I assigned his undergraduate class what I thought was a ridiculous assignment, this was in 2007, which was to take genomes in and pop interpretations out. He didn't know that I was kidding, and he actually did it, and so he got a Science and a Nature paper out of that. And we have companies that offer tiny bits of our genome that are considered to be particularly informative. You don't necessarily have to have a family history. You could be the first one in your family, because of carrier status, basically. It's the first time that you and your wife got together. I'm one-quarter Ashkenazi, so I'm at risk for the largest set of these genes, of the 23 genes in the Good Start menu.
Now that I've had my genome sequenced a few times, this is much less likely, but nevertheless it was good to know. So that's an example of partial genomes, but what about whole genomes? There are now a number of companies that offer whole genome sequencing. It's affordable, they consider, much more affordable than $3 billion anyway, and these range from CLIA/CAP-type approvals to direct-to-consumer. And this is a partial list. That's whole genome sequencing. Now how did we get to that place, and how do we predict where we're going next? The punchline of this slide is already ruined for this audience, but nevertheless we rarely talk about how long we would have predicted it would take if we had extrapolated this Moore's Law curve, this dotted line. Genomics had been on this pokey Moore's Law exponential, this green curve, since I was a student. It would have taken six decades to get here from the end of the genome project, where actually it could have flatlined after the genome project was over. So six decades with an optimistic exponential, and instead it was six years, due to the Advanced Sequencing Technology Development grants from the NHGRI, among other reasons, and it's down around $2,000, which is what we've been paying for the last couple of years in the Personal Genome Project. So just three snippets on technology and how it impacts this privacy issue. Clinical accuracy: if I had to put down some money today and get my genome sequenced, and actually I did do it this way, among other ways, it would be this Long Fragment Read method from Complete Genomics, now CGI/BGI, where they can get long haplotypes, up to sort of an N50 median of 1.5 million base pairs, so that you can tell the difference between having two mutations in one copy of your gene and still having one functional copy, or having the two mutations knocking out both your mother's and your father's copy in your genome.
That's a big deal clinically, whether you've got one functioning copy of your gene or zero functioning copies. You would think that would be present in all of the whole genome sequencing, at least the ones that are clinical, or the exomes, or the SNPs and so forth. So this haplotype phase, I think, is quite significant, and this is one of the few papers to have a practical application protocol; it was just published this summer. Now, one of the other consequences, and this is quite practical, of having a very accurate genome: you not only get accuracy in terms of haplotype phase, but we also improve the accuracy for SNPs to about 1 error in 10 million, Q70. When you have a truly accurate genome, you can throw away all the data except for the variants, and it fits into four megabytes. Instead of being a terabyte of data, you've got four megabytes of data. You could fit the entire population of the Earth on a couple-of-petabyte drive, okay? So stop worrying about data, just worry about accuracy.
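The two figures quoted here, Q70 and four megabytes per genome, can be checked with a few lines of arithmetic. The sketch below assumes Q70 is a standard Phred-scaled quality, Q = -10 log10(p), and it assumes a world population of 7 billion and decimal units (1 PB = 10^15 bytes); neither assumption comes from the talk. Only the 4 MB variants-only size and the 1-in-10-million error rate are taken from it.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Convert an error probability to a Phred-scaled quality score."""
    return -10 * math.log10(p)


# Figure from the talk: variants-only genome size, taken as 4 * 10^6 bytes.
BYTES_PER_GENOME = 4 * 10**6
# Assumption (not from the talk): roughly 7 billion people.
WORLD_POPULATION = 7 * 10**9

total_bytes = BYTES_PER_GENOME * WORLD_POPULATION
total_petabytes = total_bytes / 10**15  # decimal petabytes

if __name__ == "__main__":
    # Q70 corresponds to one error in ten million calls.
    print(f"Q70 error rate: {phred_to_error_rate(70):.0e}")
    print(f"Variants for everyone: about {total_petabytes:.0f} PB")
```

Under these assumptions the total is in the tens of petabytes, which is still within reach of ordinary storage hardware and supports the point that accuracy, not data volume, is the limiting concern.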
That was a one-dimensional genome, and we might be interested in three-dimensional genomes. This is not the day to talk about that, or multiple colors, but we are interested in how far down we can get. That was, by the way, done on ten cells by necessity, and ten cells is getting to be a small number of cells to do a genome on; all those genomes by LFR were done that way. But we can do single cells, we can do less than a cell, and so if you're worried about your privacy, you'd better protect your subcellular material. And here's just one little slide on technology, I can't avoid it. We can do next-gen sequencing in situ. A variety of different next-gen sequencing methods apply, with very light amplification. These are RNAs, but you can also do it with DNA. In each of these cells, each single dot is a single molecule of RNA, lightly amplified and then next-gen sequenced, and you can see these two adjacent cells differ by one base pair in one messenger RNA type. You can get the whole transcriptome this way, so you have effectively four to the sixtieth colors, or you can think of them as sequence tags.
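To give a feel for "four to the sixtieth colors", here is a minimal sketch of the arithmetic. It assumes the exponent refers to a tag read of 60 bases with four possible nucleotides per position; the 60-base length is inferred from the number quoted and is not stated explicitly in the talk.

```python
# Assumption: each in situ read is a tag of 60 bases, each one of A, C, G, T.
TAG_LENGTH = 60
ALPHABET_SIZE = 4

# Number of distinct sequence tags ("colors") of that length.
distinct_tags = ALPHABET_SIZE ** TAG_LENGTH

if __name__ == "__main__":
    # Python integers are arbitrary precision, so this is exact.
    print(f"{distinct_tags:.3e} distinct 60-base tags")
```

That is on the order of 10^36 possible tags, far more than the number of molecules in any sample, which is why each dot can in principle carry its own identifying sequence.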
Speaking of sequence tags, and the risk to your genome, there are these nanopore technologies, which I am embarrassed to say are my second-slowest technology to develop. No, actually, nanopore is the slowest: it started in the late 80s, and the in situ sequencing started in the 90s. Maybe they're both slow, ridiculously slow. But these have the potential of being highly accurate. This is not sequencing accuracy; this is analytical chemistry on PEG-labeled deoxynucleotide triphosphates, from my colleagues at NIST and Columbia, Jingyue Ju and John Kasianowicz, in collaboration with Genia, a nanopore company you haven't heard too much about. But this is scalable to billions of transistors, each going at 10 hertz or so, and it's potentially wearable as a consequence. You can have a battery-operated chip that's disposable and is capable of going at gigahertz rates. I'm not saying when or if this will arrive, but let's imagine that there is at some point some kind of technology that will be wearable. As you are monitoring your oral hygiene in real time, you're picking up every person and microorganism in your environment as well. Is that regulatable?
But back down to Earth, we have data leakage and re-identification issues that are constant. Zach has started the conversation on this, but there are entire databases of the ways that data leakage and re-identification of various sorts happen, gigantic databases out there that you can look at, where they document things like 26 million veterans' medical records and their Social Security numbers and the disposal of the information, not yesterday but actually quite a while ago. Part of our motivation for starting the Personal Genome Project was the knowledge that this could happen. And the case that Zach mentioned from our colleague Yaniv Erlich in 2013 was actually predated in 2005, when a 15-year-old kid did it. He didn't get a Science paper out of it, though, but he did find his anonymous sperm donor dad, first try. So this is not new stuff. As Zach would say, this is Groundhog Day: how many warnings do we need? And when we have 10 pages of consent form, of fine print written by fine lawyers, are we protecting the laboratory or the patient? The Million Veteran Program, I have to compliment them on doing a survey to see what the patients want. They found that 96% of the patients wanted to receive information about their health, and their response was, sorry, we are legally unable to return it. So why did we ask? I don't know. But nevertheless, this is in the face of a study that, at least in part, said that research results can and sometimes should be returned to the participants, and this was in 2010. To get a little whimsical, and I don't believe this applies to everybody: certainly if you're on the International Space Station, you don't have the kind of privacy that most of us have; it's a luxury. If you are caught in a disaster like a tsunami, or in this case the 1918 flu virus, you lose a little privacy. But privacy is not only a luxury, it's a symptom.
One of the Personal Genome Project volunteers, and one of the reasons we picked him, was the father of an estimated 400 kids, and that's because he was a sperm donor. The secrecy that surrounded this was intended to protect him and possibly his kids. You'll notice, by the way, that I casually mention names of patients in this talk. That's because they have given permission to do this, so throughout my entire talk, every time I mention a patient, I will mention their name. In this case Kirk Maxey has gone on the warpath, because what we actually have is 400 kids in Ann Arbor, Michigan who are half siblings and who are about marrying age now. So what are the risks of privacy loss? We were very worried about this at the beginning of the Personal Genome Project, which started around 2005, because there was not yet the Genetic Information Nondiscrimination Act, and the big bugaboo was that we could be denied our healthcare and employment by participating in this open study. That was hypothetical at the time, there were very, very few cases of it, and since 2008 it's been illegal as well, so it would probably be a bad idea for a big company to do that. Life and long-term care insurance are not covered by GINA, but those risks are still hypothetical. So that's the risk of privacy loss. What about the risk of privacy abuse? Typhoid Mary actually went to great extremes to retain her privacy while continuing to be a food handler. There are many people who like spreading STDs without having to share private information about their health status. The same goes for many perpetrators of violence, who do not want you to know that, et cetera. I don't drive, because I have publicly disclosed narcolepsy, but other people do, and there are 1.2 million people a year who die in driving accidents worldwide, and not all of them, but some of them, are due to these and other causes.
Individuals can and probably do decide whether they're going to take out long-term care insurance based on their ApoE status. This is a factor of 20, which in actuarial terms is a gigantic factor. Actuarial guys get really excited about a factor of 1%. So we stand at an important juncture for genomics, where we decide something that was decided long ago for faces and voices, which is whether we're going to reveal them or keep them very private. I would argue that our faces and voices are more revealing than our genomes right now anyway, because they not only reveal our ancestry, like the genomes, but they reveal our emotions, our age, and our health. If you really wanted to keep people from knowing whether you're asleep or angry, you would probably cover your face like Spider-Man. Omic privacy: as I've said, we've had repeated warnings in peer-reviewed papers that the genome and the phenome, either one of them, can be re-identifying. The SNPs, this Homer and Craig paper, had consequences. I understand that NIH scraped a bunch of data out of the public domain and put it back in the vault. Maybe that's an oversimplification, but it's also been pointed out for transcriptomics, GWAS studies, microbiomics, the Y chromosome. There are many things that are risky to put in the public domain even after de-identification. Nevertheless, a modern consent form, and as far as I know all the consent forms on the NHGRI website, say things to this effect: it will be very hard for anyone who looks at the databases to know which information came from you. This has not been corrected, as far as I know, since the publication of the Yaniv Erlich paper, which wasn't news. This is from the 1000 Genomes consent form. They also put in language about how it would be really easy, but they also say it would be very hard, and so this is a mixed message. So we have genomes, environments, and traits.
Wouldn't it be great if these were really fully integrated, the way that Zach and I have laid out here? Are there any barriers whatsoever to full integration? I mean, isn't it the case that the 1000 Genomes Project also had all this other stuff about epigenomes and microbiomes and so forth? No, it didn't. Wouldn't it be great if the Human Microbiome Project also had the human genomes with it, and the cell lines and so forth? Wouldn't it be great if the ENCODE Project encoded all these things? So I've put these acronyms here that you're all familiar with. We need integrated datasets, and shareable ones. I will argue that there is at least one such integrated environment. It's pretty pathetic in every way other than integration and sharing, but it is a placeholder. It's a gift that people can choose to take or not. I'd say there are zero barriers to full integration and sharing in this. I'm not saying we've fully delivered integration, but we've certainly delivered sharing, and this is now international. Canada just announced its Personal Genome Project and immediately got 500 individuals who are willing to pay for themselves and one other individual, and it integrates. As each item gets integrated, the cohort becomes more valuable, in the same sense that each article that gets written for Wikipedia makes Wikipedia that much more valuable, and more and more people participate. So this is something that is likely to snowball, although I am not claiming that it has snowballed yet. Each group can set its own goals and get its own IRB approval using the template that we provide. So we consent for re-identification. We don't put names out there, but we say you should pretend your name is out there. You may even consider putting your name out there on your own, as many of them have.
And we do that by requiring not the signing of a 10-page consent form, which they don't read; if you ask almost any principal investigator, they will admit that their research subjects do not read the consent forms. But they do read ours, because they have to get not 99% but 100% on our entrance exam. It's only 20 questions, but they have to get every one of them right. Our old exam was pretty hard, and we felt that that was okay because we were getting too many applicants, not too few. The new one is pretty easy. With the old one that was hard, at least one applicant took it 90 times. That's how excited they were to get into this project. When we first proposed this project, people were saying, no one's going to show up, George. Well, that's not the case. And stem cells are another thing that is very hard to share, because just like genomes, maybe even more than genomes, they tell you a lot about the person. Here's a practical consequence of this that we see. It's not every day that two major government institutions like NIST and FDA get together, and they did so in this case on the issue of genome standards. It's actually called Genome in a Bottle, genomeinabottle.org, if you want to look it up, if you don't already know about it. They wanted to have thousands of copies of one lot of DNA, from maybe multiple different sources, but one lot for each, that would be distributed to anyone who wants to develop a new diagnostic, a new genome service, or a new instrument, so they'd all be on the same page, same genome. They looked around for properly consented genomes, and so far there aren't a whole lot of them out there, but ours looks good so far in terms of consent for re-identification and for commercial use.
I'm not going to go through the technology here, but we have new technology, well, technology that has been in development for five years but is still fresh and new, where you can analyze your immunome and epigenomes of other types. This is not only identifying of who you are, even if you're an identical twin; you can even tell the changes from day to day. This is extremely identifying, and it tells you what you are being exposed to. People sometimes say, well, I don't think it's time to get my genome; even if it were free, I wouldn't bother, there's nothing I can learn from it. You know, nobody in my family ever had a genetic disease. But here are some examples of people who have, and they illustrate an interesting point, which is that in clinical practice we're all N equals one, and increasingly for research it is informative to deal with a coherent picture of a single individual, which is extensive; it's big data and a diverse set of data. These were typically exomes or genomes, and we should be able to recite these to our friends as examples of the cutting edge of genome sequencing. Nic Volker, you probably all know about. He was three years old, having multiple intestinal surgeries. Notice I'm naming names; all of these are publicly known names, and a couple of them are in the Personal Genome Project. The intestinal surgeries stopped when his pediatricians did the absolutely desperate act of sequencing his exome or genome and found that he had an immune problem that could be solved with a cord blood transfer. The Beery twins: cerebral palsy, and their diet was changed to include serotonin and dopamine precursors. So the list goes on.
John Lauerman was one of the PGP volunteers who wrote about his experience in Bloomberg, and he found he had a somatic variation in his blood, in JAK2, that explained why he had leg problems and a scotoma in his retina. In fact his leg pain had caused him to be hospitalized, and he got a genetic test; they thought it might be Factor V Leiden. It shows the difference between a genetic test and a genomic test, because he didn't have Factor V Leiden, but he did have JAK2. Now what we usually think about in genetics, and even gene therapy, is curing people who have rare deleterious alleles. But the alternative, and I think it's an interesting way of going forward, is focusing on something just as rare at the other end of the bell curve, rare protective alleles. We're looking at these three studies on really old people: the Wellderly, which is sort of 80 and above; the 100 over 100, which is 100 and above; and the supercentenarian study, which is people over 110. There's likely some genetics here. Tom Perls and others have pointed out that there might be a 17-fold increased risk of living too long if you're a sibling, and you can see that these people are probably making it past 110 years not because they're drinking and smoking to excess, but in spite of it, and they might have rare protective alleles. This is not meant to encourage you all to smoke and drink to excess. And here are some rare protective alleles, not from the supercentenarians, but from other studies gathered from the literature. Let me give you some idea. Oh, by the way, it is very hard to protect the identity of the supercentenarians. There's a list of 72 of them online, if you want to bounce your laser off their windows. That would be interesting. And here's just a laundry list of homozygous nulls and special heterozygotes and so forth. There's really quite a number of them, and this list is growing, and I think we're going to learn quite a bit.
And if you weren't already convinced that genetics is not your destiny because you can change your environment, you can now change your genetics as well. There are now phase two clinical trials from this company, Sangamo, and others that they collaborate and compete with. You can now not just insert a gene that covers for a rare, deleterious missing gene function, but you can remove gene function precisely or change a few base pairs. And this is due to zinc-finger nucleases. We had a very tiny role in the beginning days of Sangamo in the 1990s, last millennium, but I'm not currently affiliated with them, so this is one of my few slides that is not a direct conflict of interest. But anyway, they're in phase two clinical trials on this zinc-finger nuclease that knocks out both copies of CCR5, which is the HIV co-receptor that allows entry. And we're now doing CRISPR RNA, which I'm not going to talk about. I would love to give an entire talk on CRISPR gene therapy and CRISPR analysis of causation. But I just want to end on this last topic here, which is another kind of gene therapy that my postdoc Volker Busskamp did as a graduate student, which was introducing bacterial genes into the retinas of rodents to restore visual responses by making neurons essentially responsive to light. And you can do the reverse, where you make neurons report out their activity with light. This and various other progress in nanotechnology and synthetic biology are the basis of the Brain Activity Map, which you may have heard of. I don't know how, since it's so secret. But the Brain Activity Map in a certain sense is already baked into the Personal Genome Project. We want to know about behavioral and cognitive variation among individuals and their potential genetic and environmental components. And so for many years now, we've been collecting things like functional magnetic resonance imaging data.
This is very, very coarse-grained, but it's the state-of-the-art way of doing a brain activity map. And we've done this on many of the PGP volunteers. This is actually my brain sliced through here. Fortunately, these are virtual slices rather than actual slices. But we would like to get really high resolution, and to do that, we might have to be a little more invasive than fMRI. Here's an example of a patient of John Donoghue. John Donoghue is one of the founding members of our Brain Activity Map project clan. He and his team allowed this tetraplegic woman to, for the first time, give herself a drink, albeit through a robotic arm. You can see this little implant on the top of her head, which had been there for five years, and finally allowed this feedback circuit. It was very slow and clumsy, and it required this huge arm. John Donoghue and others on our team would love to allow her to do this with her own arm, and to do it with much finer grain and faster. That's what the Brain Activity Map is largely about: technology, reducing the costs, enabling all kinds of small science, and not only reading the brain, but being able to have the brain control various things. Now, are there any ELSI issues? And are we concerned? Yes, of course there are. You can read brains and potentially control them, and this could be as significant as genomics. So here's an example, again not news, from 2005, where a neuron was identified in an epileptic patient. You have IRB approval to put electrodes in for epileptics, looking for events, but you can also look for images that they recognize. And a Jennifer Aniston neuron was discovered, which was very specific for her face and not for a variety of faces that looked kind of like her. This was really quite remarkable, but it showed how crude it was too, because we only found one neuron. There were almost certainly big nerve circuits, and there were many other faces stored in that brain.
So we'd like to be able to do this and many other things at higher resolution. Zach and I, I think, have left quite a bit of time and produced enough provocations for discussion. But clearly data privacy is an issue in the world, and it's especially an issue in research. If we want people to participate in things that are extremely identifying and revealing about one's state, not just inherited but day-to-day state, we need to solve many of these problems. And I've presented at least one cohort, which doesn't have to be the only way of doing things, but it should be baked in, I think, as part of our portfolio of cohorts. Thank you very much.