Thank you very much. It's a video input out of range on this thing. It's a pleasure to be here, and thank you very much for having invited me. It's really nice to be able to reflect, now ten years out, on what a transformative event we're celebrating. So I'm going to reflect on where we are with respect to genetic privacy and how we're going to get to an information commons. I use the term "information commons" specifically because it's the term an IOM committee that I had the pleasure of being part of used to describe where we want to go next in measuring genomic variation and phenotypes in large populations. It was actually a pleasure to have on that same committee some of the original members of the National Academy of Sciences group that called for the Human Genome Project in the first place. They were able to share how, back at that time, they were told there was one of two reasons why they could not follow their recommendation to go forward with the Human Genome Project. One was that it was impossible to do. The other was that it was already being done. It was interesting that both were being said at the same time. Similarly, in our IOM report around precision medicine, we said we need to create this information commons, but the privacy implications of it are significant. Therefore, another way to think about this talk is that I'm going to talk about perilous privacy perspectives, which may promote parochial policies, pinching personal and public prerogatives. The ten Ps; remember them. So let's start with some publicity. A paper recently appeared out of the Whitehead Institute about how individuals from the 1000 Genomes cohort had been re-identified using publicly available data and Ancestry.com genealogies. And there were hundreds of headlines, such as: your biggest genetic secrets can now be hacked, stolen, and used for targeted marketing.
Wow, think of all the Viagra those poor 1000 Genomes people are now going to have to buy. Study highlights the risk of handing over your genome. Researchers found they could tie people's identities to supposedly anonymous genetic data. Even if you submit your genome sequence anonymously to a scientific study, that data might still be linked back to you. Sounds very worrisome. Remember this: Groundhog Day? I've been inspired by President Obama's recent visit to Israel, where he reached into the language of my ancestors to speak in Hebrew. And I'll do the same now for my colleagues in Cambridge, Massachusetts: Ein chadash tachat hashemesh. There is nothing new under the sun. Why do I say that? One of my colleagues and friends published a study in 1997, which I now reprise for you from a 2001 article. This is Latanya Sweeney, who did this work as a graduate student. As that 2001 article notes, starting with a birth date, sex, and ZIP code, computer privacy expert Latanya Sweeney, PhD, retrieved the health data of William Weld, former governor of Massachusetts, from an allegedly anonymous database of state employee health insurance claims. Knowing Weld lived in Cambridge, Mass., she crossed her data with that community's publicly available voter registration records. Only six people shared Weld's birth date. Only three were men. Of these, Weld was the only man in his five-digit ZIP code. Then she was able to mash this up against the supposedly anonymous but public insurance records of public employees. And since he had very publicly collapsed during a public presentation, she was able to track the episode of gastroenteritis for which he had been hospitalized. She could see his whole record using public data. So I think Latanya Sweeney had made the point, which is: if there's enough data out there, we can mash it together and re-identify anybody. We knew this; it had been shown multiple times. So what, then, does this latest study mean?
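The Sweeney linkage attack is simple enough to sketch in a few lines. This is a toy illustration with entirely invented records: an "anonymized" claims table and a public voter roll are joined on the quasi-identifiers (birth date, sex, ZIP), and any claim with exactly one matching voter is re-identified.

```python
# Sweeney-style re-identification sketch. All names and records are invented.
# Public voter registration records: name plus quasi-identifiers.
voter_roll = [
    {"name": "W. Weld",  "birth_date": "1945-07-31", "sex": "M", "zip": "02138"},
    {"name": "A. Smith", "birth_date": "1945-07-31", "sex": "F", "zip": "02138"},
    {"name": "B. Jones", "birth_date": "1962-01-15", "sex": "M", "zip": "02139"},
]

# "Anonymous" insurance claims: no name, but the same quasi-identifiers.
claims = [
    {"birth_date": "1945-07-31", "sex": "M", "zip": "02138",
     "diagnosis": "gastroenteritis"},
]

def reidentify(claims, voter_roll):
    """Link each claim to any voter uniquely sharing its quasi-identifiers."""
    key = lambda r: (r["birth_date"], r["sex"], r["zip"])
    matches = []
    for c in claims:
        candidates = [v["name"] for v in voter_roll if key(v) == key(c)]
        if len(candidates) == 1:  # a unique match is a re-identification
            matches.append((candidates[0], c["diagnosis"]))
    return matches

print(reidentify(claims, voter_roll))  # → [('W. Weld', 'gastroenteritis')]
```

The point generalizes: any dataset whose quasi-identifier combinations are unique for some individuals can be joined against a public roster in exactly this way.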
And what have we learned that is new? Is genomic data different from other health data? It's not the biggest: functional MRI scans are much larger in information content and storage than even the BAM files from a genome. It's not the most predictive: Vogelstein showed that, in fact, it's not very predictive if you're asymptomatic. It's actually quite useful in disease, but a lot of other things are very predictive. Family history, the fact that you're a smoker, the fact that you're sitting down all day: those are good predictors of future health status. It's not the most expensive: thanks to people like George, an MRI costs more than a genome, and will continue to cost more. It's not the most identifying for at-risk subgroups: it turns out that if someone says they're of African origin, they're of African origin. The SNPs bear it out, but it's self-evident. In fact, if you look at electronic medical records, take the stated ethnicity, and then do the genetic studies, they're right. So in that sense it's not especially disclosing. But it is the most disclosing personally. The fact is, your genome is your barcode. And not only is it your barcode, it's also the barcode of your family members. So it does have some special characteristics. And unfortunately, because a lot of criminal databases (the databases are not criminal; they're databases of criminals, or of people who were arrested and may not be criminals) contain genomes, it could actually have some disclosing capabilities. So how does that make us think about privacy? Well, what if I'm in the room and I hear Eric mutter something about that Zak Kohane? Did I invade his privacy when I overheard that? It's debatable.
On the other hand, suppose Eric is sitting in his office and I'm standing outside, down on the green, bouncing a laser beam off his window, as has been done many times, and taking the public data (this is just public data) of his vibrating window to actually hear what he's saying. I've used public data. Have I breached his privacy? It's public data, after all. Well, people would say it's probably pretty antisocial of me to do that. But it is public data. Well, Eric then decides he can go into a bunker and lock himself down, so he can have all these lonesome conversations about the sequester and no one will hear him. He can yell very quietly behind meters of concrete. By the way, this is in fact a real concrete bunker in Moscow that's been turned into a very fashionable restaurant. Do we need to do that? Do we have to go with our genomes into the concrete bunker and lock down that data so that no one could ever see it, so that no one can bounce the equivalent of a laser beam off of our public data to re-identify it? Think about that. Do we want to go down into the bunker? Well, even if you're not in the bunker, even if you're very much in the public eye, you might think that you have some reason to want privacy. The very talented and lovely actress Scarlett Johansson had the unfortunate experience of having some photographs hacked out of her cell phone. Some argued: what did she expect? She was, after all, a celebrity. She's in the public. You're in the public; therefore it's OK to look into things that are in the public. As she said, just because you're in the spotlight, or just because you're an actor or make films, doesn't mean that you're not entitled to your own personal privacy. So just because the data's out there doesn't mean that it's an invitation to breach your privacy. That's bad form. It's bad social form.
Another individual, lovely in his own way, is Richard Stallman, whom some of you may recognize as the founder of the GNU Project and the Free Software Foundation, and one of the few people I know with a beard that can compete with that of George Church. He said: there is no substitute for privacy. Fortunately, we can maintain our privacy by limiting by law what companies and the state can collect on a regular basis about everyone. For instance, instead of requiring that ISPs (internet service providers) and phone companies keep data on everyone's contacts, laws could forbid keeping this data except for people already placed on a surveillance list by court order. We must require new systems to be designed for privacy rather than to collect all possible data. It's not too late to protect privacy pretty well, but we must insist on it, which means not heeding the people who say it's hopeless. I want to highlight a few of his words: he says "pretty well." So you want to make sure that you're not allowing me to casually overhear you in the hallway. But if I'm behind the windows of my office, you really should not be trying to listen to me, and our laws should protect that. So I think one way to interpret what RMS is saying is that privacy is not dead; we just have to engineer our society and our systems toward it, and recognize that some behaviors are just not acceptable. But this is not an academic discussion, not at all. Seen here is an impressive roadmap published in Nature by a group of people including our own Eric Green, outlining the multiple stages of work in exploiting the knowledge of the human genome for the public good. Note here: understanding the structure of the genome, the basic spelling of it; understanding the biology of the genome, how its parts interact and how they're regulated. And note here: understanding the biology of disease.
And for that, we'll have to have this information commons that I was referring to, which I'll detail a little more subsequently. But if we get the privacy discussion wrong, if we cannot understand that privacy exists and that we can in fact put data out in public with the expectation and enforcement of good behavior, then we're at great risk of disrupting that roadmap, at great cost in future pain and suffering to us, the citizens of this planet. And it's ironic, because as shown in this other Institute of Medicine report, called For the Record (by the way, for those of you who don't know, IOM reports are all available for free on the web at NAP.edu), even today, when patients themselves have limited access to their own healthcare and biological data, there's very broad access to that same data by insurance companies, the government, researchers, employers, direct marketers, state bureaus of vital statistics, pharmacy benefit managers, local retail pharmacies, and attorneys. And what's interesting is that of all of those, the biggest focus in most of the debates somehow ends up being on the people who are actually trying to promote the public good. There's no debate about whether any of these other parties should have access to the data. You don't see conferences convened about whether pharmacy benefit managers should see the data; they say we need it to pay the bills, and everybody says okay. So it is ironic that we're debating this when, in fact, others don't seem to worry too much about it. So back in 2005, with my colleague Russ Altman from Stanford, I wrote a Sounding Board piece called "Health Information Altruists: A Potentially Critical Resource." And what we articulated was a cause for concern: we have to recognize there is no such thing as perfect anonymity.
We were well aware, because she's a colleague, of Latanya Sweeney's work, and so we did not need another Groundhog Day or another study to realize that there was no perfect anonymity. But we also saw that this concern was going to increase once we had large genomic research on large clinical populations, and that everyone would be worried about the risks of sharing data. Parenthetically, we were also aware that a lot of people would use these concerns as excuses not to share data. However, we also noted that there were various levels of concern: some people were very worried about disclosure, and others were so unworried that they bordered on the exhibitionist in their sharing of data. And what we recommended, we put out there in this article. We said, first, we should be honest with research subjects about the risks of re-identification, to be realistic, just as was done with the 1000 Genomes cohort, and we can outline the potential damages of a disclosure. If they then decide to go forward, the subjects presumably will elect to take the risk in the hope of helping to address human disease. We also wanted to make sure we covered the researchers who curate genetic databases; they should have protection as well, provided they follow these guidelines, as they have, in fact, for the 1000 Genomes. And most importantly, we said, and this is still quite controversial, patients should be granted explicit control over the disclosure process. Patients should get to decide, not anybody else. And those health information altruists, it turned out, were not hard to find. What a motley crew. And what was interesting about this study is that I found it helpful, because when you share data, knowledge accrues around the data that's mashed up.
So this individual, Steven Pinker, a distinguished psychology researcher at Harvard, was found to have a mutation that supposedly predisposed him to hypertrophic cardiomyopathy. It turns out he does not have hypertrophic cardiomyopathy. And for me, that was actually quite rewarding, because it fed into my growing obsession with the biggest ome of them all, the incidentalome: the ome of all incidental findings. When I pointed this out to one of the researchers involved, she said, well, he hasn't developed the hypertrophic cardiomyopathy yet. And so I started to wonder, although I didn't say this out loud, how old will he have to be before this variant starts becoming protective against HCM? But nonetheless, this was a brave and bold step forward that showed how research could be advanced through this altruistic publication of your own clinical data and your own genomic data. And this study, the PGP, sits in this hairy, or rather fuzzy, no-man's-land between clinical research and clinical practice. It's definitely not clinical practice, and it's not really clinical research unless you actually use it for clinical research. This got me thinking quite a bit about the distinction between clinical research and clinical care, because in fact, if you go to most IRBs, they'll take great exception to the idea that there is not an absolute dichotomous divide between clinical research and clinical care. And yet, let me run a few cases by you. Your pregnant daughter consents to a research study of fetal Tay-Sachs screening. Trisomy 21 was found but was not reported, because that was not part of the consent. Were they right? Hands up, those who think they were right. So either you're a bunch of meek namby-pambies, or you're all massively agreeing that in this case this genomic clinical research data should have been shared with the patient and become clinically actionable.
Okay, let me push you a little bit further. Your son contributes blood for a study of ADHD. During exome sequencing, they find your son has a variant well documented to cause familial adenomatous polyposis. Now, today the ACMG just announced their guidelines on incidental findings, and for a clinical exome they say that you actually should report this variant, which puts you at essentially super-high risk for colon cancer. But that's for clinical exomes. This is your son, and they found it in a research exome for an ADHD study. Should they report it to you or your son, or not? Hands up, who thinks they should not report it? That makes me a little bit braver, because maybe you're all zealots like me, but it looks like maybe 4% of this room thinks that you should not disclose it. So I think you implicitly agree with me that this boundary is extremely vague, and that the genome itself is therefore accelerating an era in which it can become very unclear to what degree one is participating as a patient when one is a research subject, and as a research subject when one is a patient. This led me to publish a piece in Science back in 2007 where I said: what the hell is this? Why have patients and researchers entered into a compact of mutual ignorance, where the researcher agrees not to find out the identity of the patient, and the patient agrees not to find out what they might learn about themselves from the study, so that they benefit from the study only as a member of a class of patients? And what I argued was that patients should contribute their data to an anonymized database as before, but now, if there is a finding, it should not only result in a high-impact publication; it should also result in a review that allows communication back to the patient of those results that matter. And that should become a routine part of practice.
At the time, I got a few congratulatory comments, actually, from clinicians and clinical leaders, but many in the genomics community were quite annoyed with me for suggesting this, not least because I was suggesting that they had a reporting burden where they did not feel that they did. But we continue to learn. In 2008, this individual, whom we should really be celebrating on this tenth anniversary, made his whole genome available through next-generation sequencing. And it was very interesting, because although he's a quirky individual, he's not as quirky as his genome might suggest: for example, he's homozygous for variants implicated in a number of diseases, such as Usher syndrome and Cockayne syndrome, and these are probably not sequencing errors. Again, that suggests our knowledge of the genome was inadequate, and that we really have to push forward in the annotation of the genome. Here let me make a plug for an NIH effort called ClinVar. We talk about open-access publication; what we really need, if we're going to take good care of our patients, is open-access genome annotation. And although there are companies in this space, I do think the public weal is best served if initiatives like ClinVar are maximally successful, so that every patient can have the most authoritative, up-to-date interpretation of their genome. This individual is among the many who are helping us reach that state. Now, whether these results get reported back to a patient or not is actually a very individual decision. You don't have to introspect too long to realize that some individuals want to get everything; they want to know everything about their genome, and others want to know less. As I wrote in an article with Patrick Taylor in Science Translational Medicine, it's going to be a function of your communication capabilities, your preferences, and your risk averseness. Some people just don't want to know: no, no, no, don't tell me. And others want to know everything.
And I think we have to respect that. But when I say that, people say: Zach, what incredible bureaucratic overhead are you anticipating, where we have to take care of every single wish of every individual? Well, I was going to show you; I don't think it's that far out there. For example, there's this website, mint.com, and when I first started using it, people thought it was remarkably bizarre that I was doing so, because what you do with mint.com is give it all your usernames and passwords for all your financial institutions: your credit cards, your IRA, your home mortgage. And what it does is something pretty interesting. Initially there was no standardization across these various databases, so they would write a software agent that would go and log in as if it were you, and where the data was not available in structured form, take the HTML, decode it, and put all your financial data into one central database, so that for the first time you'd have all your financial data in one place. Furthermore, it'll alert you that you have to pay your bills, or that you just got charged a fee. And, by the way (did you know this is their business model?), that you could get a credit card that's going to be less expensive. I chose to take my risks with my privacy because the benefits to me were very significant, and every tax season I'm much happier that I've done this. But not everybody has to do it, nor should they have to. So I think what we're heading toward is what we talked about in this Institute of Medicine report. We said that much like Google and others, but particularly Google, which has made an industry out of taking something that used to be incredibly boring, geography, and layering onto it added value, such as the location of the nearest pizza parlor or how to drive around: by putting multiple dimensions on top of geography, you achieve an extraordinary ramp-up in value.
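The scraping-agent idea described above, logging in as the user, decoding each institution's HTML, and pooling the results in one central store, can be sketched with the standard library. This is a toy under stated assumptions: the pages are simulated as static HTML strings, and the markup layout and account names are invented, not Mint's actual format.

```python
# Sketch of a Mint-style aggregator: scrape balances out of per-institution
# HTML and pool them in one place. Markup and accounts are invented.
from html.parser import HTMLParser

class BalanceScraper(HTMLParser):
    """Collect numeric values found inside <td class="balance"> cells."""
    def __init__(self):
        super().__init__()
        self._in_balance = False
        self.balances = []

    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "balance") in attrs:
            self._in_balance = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_balance = False

    def handle_data(self, data):
        if self._in_balance:
            # Strip currency formatting before storing.
            self.balances.append(float(data.replace("$", "").replace(",", "")))

# Pretend these pages came back after the agent logged in as the user.
pages = {
    "credit_card": '<tr><td class="balance">$1,200.50</td></tr>',
    "ira":         '<tr><td class="balance">$58,000.00</td></tr>',
}

central_db = {}
for institution, html in pages.items():
    scraper = BalanceScraper()
    scraper.feed(html)
    central_db[institution] = sum(scraper.balances)

print(central_db)  # all financial data in one place
```

Once the data is centralized like this, the alerting features (bill due, unexpected fee) are just rules run over `central_db`.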
And likewise, we're going to create an information commons by layering everything, from the exposome to signs and symptoms, the microbiome, the epigenome (these are not my omes, so forgive the omes), and lining all these data types up against actual individual patients. When a lot of us do our genomic meta-analyses or mashups, it's not always on the same patient. If we can get it down to the individual patient, we'll understand how the epigenome is actually informed by the microbiome, and so on. We do it in small ways in projects like TCGA, but we need to do this exhaustively on large populations. That's what we called for in this IOM report. Furthermore, we pointed out that if we did that, if we created this nicely stacked, multi-dimensional perspective on the genome, then in addition to the usual thing we've done quite well in academia and in commercial companies, going from big-science discoveries all the way to targets, we'd also be able to really take advantage of clinical medicine in all its messy glory: do observational studies and do clinical discovery informed by both the clinical information and the molecular characterization. And we did note in our report that this is the area that has been underserved until very recently. So when we think about bringing together all these data around patients, I think we have to think expansively. Oh, by the way, I should note that the very same concerns now raised about genomics were not articulated early on about geography, but they could have been. Several people have published papers showing that published maps, for example of individuals with HIV, in first-rate journals like JAMA, had enough geographic resolution that you could actually figure out who the patients were. Now, I didn't hear a big outcry that we have to stop doing geography. And in fact, there was a real breach when Google sent its vans through the various streets picking up Wi-Fi data, including passwords.
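The layering idea above, omic and clinical data types stacked on the same individual the way map layers sit on geography, amounts to a pivot from per-layer tables to per-patient records. A toy sketch, with invented patient IDs, layers, and values:

```python
# Toy "information commons": several data types aligned per patient.
# Patient IDs, layer names, and values are all invented for illustration.
layers = {
    "genome":     {"pt1": "variant_A", "pt2": "variant_B"},
    "microbiome": {"pt1": "profile_X", "pt2": "profile_Y"},
    "epigenome":  {"pt1": "methyl_low"},            # incomplete, as real data is
    "phenotype":  {"pt1": "asthma",    "pt2": "T2D"},
}

def commons(layers):
    """Pivot layer-keyed data into one multi-dimensional record per patient."""
    patients = {}
    for layer, values in layers.items():
        for pt, value in values.items():
            patients.setdefault(pt, {})[layer] = value
    return patients

# Cross-layer questions (how does the epigenome relate to the microbiome?)
# become per-patient joins rather than comparisons across separate cohorts.
print(commons(layers)["pt1"])
```

The payoff is exactly the one claimed for map layers: each added dimension multiplies the questions you can ask of the same underlying individuals.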
They had to apologize for that, and I think pay a lot of money subsequently. But somehow the outrage was not the same. So we're going to have to bring together, at the patient level, several kinds of data. We've talked about the research data, which comes from registries, cohorts, and pharma trials. I've alluded to electronic health records and labs on the clinical side. But increasingly, we're going to see more and more data coming from yourself. We like to remind ourselves that in a good life, much less than 1% of your time is spent near a doctor. So what you gather at home and in your everyday life is a much larger amount of important data, and it will provide important genomic correlates in the future. And of course, there are also the people who pay for healthcare, and they have a lot of data too, like what they paid for and how much they paid. That's going to be important as well. And the thing we have to think about very, very hard, in creating this four-way join, is that the genomics can come in from all angles. Certainly this has already happened with 23andMe and the direct-to-consumer genetics companies. It's already happening in the pharma trials and in the various GWAS that have been deposited in NCBI's dbGaP. And I'm predicting it will happen on the payer side as well: when you're prescribed an expensive drug, if you don't do the companion genomic diagnostic, you may not actually be reimbursed. So that's where we'll have to really think about data sharing. So, how am I doing for time? I'm doing okay, fine. I've thought about this more recently with my colleague Ken Mandl. I wrote a couple of pieces with him in the New England Journal of Medicine, the first entitled "No Small Change for the Health Information Economy." That was an article we wrote shortly after Obama was elected president, because a massive investment was about to happen in electronic health records.
And we said that this was an opportunity for us to ask: why is it that our health record systems are so monolithic? Why is it that if we don't like a laboratory system or an order-entry system, we can't rip it out and replace it with another, just as simply as we could replace an app on an iPhone? And then in our second piece we said: why is it that in our day jobs as clinicians we use electronic health records that are more or less state-of-the-art technology for the 1980s, whereas when we go home to our kids, they're using a variety of different apps across different vendors in a very coherent manner and actually getting the job done? And why do I even bring this up here? Because if we are going to have this synthesis, this mashup of genomic data and clinical care, we're going to require an electronic health record system that can support it. Guess what: 99% of the electronic health record systems out there don't support genomic data. They don't even support basic family tree data. That's going to be a gating factor. In another talk, I could tell you what we're doing to get around that factor and to build the modularity and the apps that will allow us to have that genomic data. And in fact, by coincidence, apparently, a year after we published those papers, something miraculous happened, and it's a really good thing: the market forces became aligned with the best interests of the patients. A number of these large EHR companies said, you know, we've been talking about interoperability for 30 years, but we're actually going to make it happen. Why they did it, I don't know; I'll leave that to your imagination. But even today, we are actually doing the kinds of studies that I'm hinting at, using our electronic health record data.
So under something called the NCBCs, the National Centers for Biomedical Computing, of which I'm the PI of one, something called i2b2, we've been able to extract data from electronic health record systems by disseminating open-source software that does exactly that. We have put it out there, and over 84 academic health centers across the United States have adopted the software. They use it for genomic studies, where they look at the genomic correlates of clinical findings. They use it for quality improvement. They use it for pharmacovigilance. But do they share the data? As a matter of fact, they do. We actually tested the proposition of sharing in that most difficult place to share, called the Harvard medical system, where we have five hospitals whose first inclination is not to share data among themselves, for a variety of reasons. And yet they had all installed i2b2, and we were able to have a reasonable discussion with them about running distributed queries across what we call SHRINE, the Shared Health Research Information Network, which allows us, for example, to query the data on six million patients just at Harvard. And so, for example, I was unaware, until my egocentric citation robot alerted me, of a study in Nature about peripartum cardiomyopathy. They used the system to find the small handful of cases that had this disorder, and they were only able to get that number of cases because they used SHRINE to do it. So that's five hospitals at Harvard, but the whole UT system now uses it. The president of the UC health system funded something called UC ReX, which links all the i2b2s across the 11 million patients in California. The six South Carolina health systems, all with different electronic health record systems, are sharing data using the system. And there are 12 international sites and a bunch of pharma companies using the same system.
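The SHRINE-style distributed query described above has a simple shape: each site runs the query locally behind its own firewall and returns only an aggregate count, so the hub never sees patient-level records. A toy sketch, with invented sites and diagnoses (this is an illustration of the pattern, not the actual SHRINE protocol):

```python
# Toy federated count query. Sites, records, and diagnoses are invented.
sites = {
    "hospital_A": [{"dx": "peripartum cardiomyopathy"}, {"dx": "asthma"}],
    "hospital_B": [{"dx": "asthma"}],
    "hospital_C": [{"dx": "peripartum cardiomyopathy"}],
}

def local_count(records, dx):
    """Runs inside the site's firewall; only the count leaves the site."""
    return sum(1 for r in records if r["dx"] == dx)

def federated_query(sites, dx):
    """The hub broadcasts the query and aggregates the per-site counts."""
    return {site: local_count(records, dx) for site, records in sites.items()}

result = federated_query(sites, "peripartum cardiomyopathy")
print(result, "total:", sum(result.values()))
```

This is why a rare-disease study can find its handful of cases across five hospitals without any hospital handing over its clinical database: the join is on counts, not on patients.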
So even though we have all these obstacles, we can actually do that sharing today. It's not impossible; it's just a matter of vision. And we actually have a live network for some studies we're doing of autism and type 2 diabetes, where today, today, queries are being issued across this network, and it costs hundreds of thousands of dollars, not millions, to run. So let's get back to our theme: whose property is this personal data anyway? All of you at this institute are well aware of the Henrietta Lacks story. We all feel really badly about the fact that her cells and data were used, and that she and her descendants were barely recognized, if at all, and did not profit in any way from this worldwide use of her genome. We all feel badly about it, and that's in part why this book did so well. But would we really do it differently? Would you really acknowledge and pay everybody who contributed their genome? Most of you would say, well, no, I wouldn't do that. Well, I think you would be wrong. I was reminded of this when I saw one of my former trainees, Atul Butte, and he was telling me about this great thing. He said, Zach, do you have Klout? I said, what's Klout? He says, it measures your citation index for tweeting. Klout's mission is to empower every person by unlocking their influence. What they do is monitor all the social networks, and they capture these moments and give you perks. They pay you, in effect: if you have more Klout in your tweets, you can go into airport lounges and use the fancy lounges. And they have a privacy policy which they're explicit about, which says, basically, you have no privacy. But they're explicit. So the trade-off is there, and you don't have to use it. But I was in San Francisco two days ago, and boy, was everybody really proud of their Klout. Why can't we have a system of micro-recognition, of micro-payments, in healthcare?
There is no reason. My last slide is: but is it worth the risk? All of this will come to naught if we don't fix something. Here's a bunch of studies published in 2003. Quick question for the audience: in 2003, what percentage of primary care providers were ordering a genetic test for cancer susceptibility? How many? What percentage? Five? Zero? Lower? Price Is Right rules. One? Okay, the shocking answer, in 2003, is 30%. And what was the greatest predictor of them ordering this test? That's right, that's great; very few people get that right. It's the patient asking for it. Other studies showed that the doctors were uncomfortable interpreting the test, and still others showed that they were not, in fact, competent to interpret it. And finally, another study from the CDC showed the usual thing: detailing, which means sending attractive men and women into your office to tell you why you should order this test, increased the ordering of this test. Among doctors who were neither competent nor comfortable in interpreting it, ordering went up by a factor of four. What does that say about our healthcare system? So that's 2003. This just now appeared in Oprah's magazine. You laugh, but this is clout. "Genetic testing: pass or fail?" It's an interesting story of a patient: her gastroenterologist said there was a new test that could determine if she had a gene; she could have her blood drawn. And then the receptionist told the patient she was positive for the mutation. She took the results to the doctor, who recommended she go over the test with a genetic counselor. And then the genetic counselor said, oh, by the way, it's unlikely that this is actually causative; in fact, let's test your father just to be sure, since your father had colon cancer. And in fact, she didn't have it. So the incidentalome strikes again.
But the point is, there are some highly paid professionals here who do not know what to do with a test. And with all this investment and this risk to privacy, if I'm a patient, I want to know that when I contribute, something useful is going to be done with it. We can be as successful as we want to be in discovery, but it won't matter if we cannot get that last mile of actionability. And when I talk about actionability, I mean competence on the part of clinicians to actually advise patients what to do. We don't have that yet. In fact, the least-paid person in this whole system, the genetic counselor, knows more than most doctors. And that is a real problem. So in summary: genomic science does not change our fundamental need for privacy, and not all of us have the same needs or the same sensitivities. Privacy is not protectable by technology, period, but by mores, by institutional transparency, and by the exercise of individual autonomy. Bunker legislation will only hurt we the patients, because it will prevent science from helping us. And democratized control of data as an individual, not an institutional, prerogative may be the future, although many, including, I'm sure, many in this audience, will resist that notion. So the question I have is: will the medical establishment lead in this, or are we going to use privacy as an excuse not to? Thank you very much.