All right, everybody. Why don't we go ahead and get started? My name is Bob Truog. I'm the director of the Center for Bioethics at Harvard Medical School and an intensive care physician at Children's Hospital. And I'm going to introduce Aaron, and then I'll have him introduce our panel and the topic. So welcome. This is part of our series on policy and ethics. I'll say more about that in a moment. Let me first tell you a little bit about Aaron. So Aaron Kesselheim is an associate professor of medicine here at Harvard Medical School. He's in the Division of Pharmacoepidemiology and Pharmacoeconomics in the Department of Medicine at the Brigham, and there he heads the program on regulation, therapeutics, and law, which is known as the PORTAL program. Aaron is also a faculty member of our Center for Bioethics here, where he teaches in our courses and, importantly, organizes and runs this series of seminars. And let me just say a word about what the Center's been doing around these consortia. So we've been ramping up this year, but starting in the fall, we should have something happening in this time slot, 12:30 to 2, almost every Friday. So on first Fridays of the month, we'll have the Clinical Ethics Consortium, where the hospitals come and present a difficult case from their ethics committees, with commentary and discussion about that. That's our oldest running series; we've been doing that now for more than 25 years. Second Fridays will be on research ethics. This is a new series we're starting, working with the IRBs throughout the Harvard system. Third Fridays will be the Policy and Ethics Seminar series that Aaron will be running. And on fourth Fridays, we have a series on organizational ethics. So it's all a way of saying, I would hope, that beginning in the fall, you'll just mark out these times on Fridays and come and get a free lunch and participate in our discussion about topics on ethics. So with that, Aaron, I'll turn it over to you to introduce the others and the panel today. So just as a follow-up to that, if you registered for today, we have your email address, so we can keep you informed about the topics and the speakers that we're going to have starting in September. But if you're here and you didn't register, please come see me so I can get your email address and we can make sure we alert you directly. But thanks for coming today. So our conversation today is about building a biomedical information commons. And I just want to present a few introductory slides to whet your appetite for our three distinguished guests, who will talk in more detail about it. So this shows that, of the over 15,000 patents granted in the United States that contain composition-of-matter claims to simple nucleic acid molecules, about two-thirds are assigned to private-sector or business assignees. About a quarter, on the top there, have public-sector assignees. And then there is a smaller number in the yellow, I'm sorry, in the green, where it's a combination of both. And as you can see in the two other figures, the numbers overall have been fairly consistent over the last couple of decades in terms of ownership of gene patents. And so our conversation today is going to be about ownership of genes and genetic material and gene patents, the effects that those can have on health care delivery and on research, and this idea of a commons as a solution and what some of the parameters of that solution are.
This is just again to bring home the point that these things can have a substantial effect on patient care. This was a survey from about a decade ago of US lab directors performing DNA-based genetic tests. And as you can see on the first line, they asked the lab directors whether the effects of gene patents had made testing more or less accessible to patients, and about 90% of them indicated that it had a negative effect, including reducing access to testing. 96% of them indicated that it increased the cost of the test. So these things have real and direct implications for patient care. So there's a spectrum of access that different bodies have set up in terms of the extent to which they share the patents on genes and genetic material and provide a basis for a commons. On one end of the spectrum is a fully closed access model. And one example of this that we'll talk more about today is the Myriad Genetics case. So Myriad was organized around patents related to BRCA1 and BRCA2 genetic testing for breast cancer risk. Myriad sought patent protection both for methods of detecting and comparing the DNA sequence variations and for the isolated DNA molecules, and then commercialized the BRACAnalysis test. It now claims that it has dozens of patents with 500 claims relating to the field. And there have been objections raised by public health advocates about Myriad's restrictions on uses of its genes in the context of research and its refusal to allow independent confirmatory testing of ambiguous test results. And over the last 20 years, since Myriad first obtained these patents, it's built up its own proprietary library of BRCA mutations. And we can talk about some of the implications of this library of mutations being proprietary. For example, one study found that the BRACAnalysis test missed certain high-risk breast cancer mutations in about 12% of the women tested, mutations that were not included in the test, and there was no way for others to develop tests around them. And this led to legal challenges to the patents on the genes themselves, which ended up being invalidated by the Supreme Court. Moving further along the spectrum is the Database of Genotypes and Phenotypes, dbGaP, which is a mildly open access model. It was developed to archive and distribute data on the results of studies that have investigated the interaction of genotype and phenotype in humans. So the National Center for Biotechnology Information created dbGaP as a public repository for individual-level phenotype, exposure, genotype, and sequence data, and the associations between them. And there's a public interface that allows users to browse and search the metadata, phenotype and variable summary information, and documentation. And then if you're a researcher and you want more individual-level data, you have to apply through what I perceive as a relatively rigorous process to receive authorization for access. And then on the other end of the spectrum is a very open access model. So the Personal Genome Project, or PGP, was founded in 2005, and it's dedicated to creating a public resource of genome, health, and trait data to accelerate research into human health and biology by collecting and sharing genomes and other data from people who have chosen to donate their data to the public domain.
So privacy, confidentiality, and anonymity are basically impossible to guarantee, and they work directly with participants so that participants understand the implications of sharing their data with virtually anyone who wants to see it. But on the other hand, the goal is to make a valuable and lasting contribution to science. And then there are some hybrid models out there, and I think we're also gonna talk a little bit about the 23andMe model as well. So the statement of issue for today's panel is that access to large genetic databases can advance the diagnosis and management of genetic diseases. Some large databases of variants are held by proprietary companies that control access to the data. Public databases are racing to catch up, but have been criticized as unreliable, expensive, and vulnerable to funding cuts. In addition, the infrastructure for global sharing is limited, and efforts to build a global genomic commons are relatively new. And so in this forum, we'll explore the pros and cons of open and proprietary strategies for managing genetic information. And so now I wanted to introduce our three distinguished panelists, and we're very lucky to have them with us today. First, we're gonna hear from Robert Green, who is a medical geneticist and physician-scientist who directs the G2P research program in translational genomics and health outcomes in the Division of Genetics at Brigham and Women's Hospital, the Broad Institute, and Harvard Medical School. Dr. Green currently leads and co-leads the first randomized trials to explore the implementation of medical sequencing in adults and newborns. He co-chairs the steering committee of the NIH Clinical Sequencing Exploratory Research consortium and the steering committee of the NIH consortium on Newborn Sequencing in Genomic Medicine and Public Health. He's a member of the Institute of Medicine committee on the evidence base for genetic testing and collaborates on research studies with Illumina, 23andMe, Pathway, and Google. He's board certified in neurology and medical genetics, is associate director for research of Partners HealthCare Personalized Medicine, and is a board member of the Council for Responsible Genetics. Our second speaker will be Heidi Williams. Heidi received her PhD in economics from Harvard University and has been affiliated with MIT since 2011, where she's currently the Class of 1957 Career Development Assistant Professor in the Department of Economics. She's also a faculty research fellow at the National Bureau of Economic Research. Her research agenda focuses on investigating the causes and consequences of technological change in healthcare markets. Last year, she was awarded a MacArthur Foundation Fellowship, in part for uncovering how the timing and nature of intellectual property restrictions affect subsequent innovation. And then finally, we have Bob Cook-Deegan, who trained as a physician and is currently a research professor in the Sanford School of Public Policy at Duke, with secondary appointments in internal medicine and biology. He was the founding director of the Center for Genome Ethics, Law and Policy in Duke's Institute for Genome Sciences and Policy and is the author of The Gene Wars: Science, Politics, and the Human Genome. His areas of expertise include genomics and intellectual property, the history of genomics, global health, and science and health policy.
And I would say that Bob was also my director in my very first job in health policy, when I was in law school and did a summer at the Institute of Medicine, where he worked at the time, and obviously, as you'll see, he had a substantial influence on my career and on my sartorial choices. So without further ado, I wanna introduce Dr. Green. Thank you. Well, that was a lovely setting of the stage. I think Bob Cook-Deegan is like Kevin Bacon: we're all just a few degrees away from each other through him. As you'll hear later, he had a big influence on me and on getting me into this area as well. Let's see, can we bring up the slides? Thank you very much. So I'm delighted to be here, because I don't know much about this topic and this is a great multidisciplinary opportunity for me to learn more. My charge is to lay some of the groundwork about where we are and where we're going in genomic medicine and perhaps lay some framework for the discussion that's gonna follow. It is important that you know my disclosures. So here they are, in terms of my compensated speaking and advisory roles and my uncompensated research collaborations with two direct-to-consumer genetic testing companies. Now, we're in something of a wild west here with genomics at the moment. You know, it's very exciting. There's new land to explore, there's a lot of gunfighting going on, and there's a fair amount of snake oil being sold about what exactly genomics can do. I think it's fair to say that it's a little bit exaggerated at times, and we're all prospecting for that scientific gold. So this is the framework I think we find ourselves in, and it's worth keeping in mind as we go forward. Now I'm gonna lay out in this slide a few examples of how genomic medicine is gonna be used in the future. So one of the ways people have talked about this, even Francis Collins starts his talks off this way, is the notion that every baby born will be sequenced and have other omics evaluated in order to lay the groundwork for their health for the rest of their life. We're also, of course, already in an era where genomics is being used for the diagnosis of rare conditions. That's happening every day. It would be happening more if we only had better insurance reimbursement for this. One of the low-hanging-fruit areas is preconception screening. Why is it that today, when two people have a baby, we are leaving to chance whether or not they've had preconception testing for the panoply of recessive carrier traits that they might have? I think this is actually, in my own view, one of the very potent areas for growth. And of course we've seen an extraordinary explosion in prenatal screening, particularly with NIPT, over the last few years. We are already, and have been for quite some time, doing pre-symptomatic testing in families with a genetic diagnosis, the classic dominant and recessive Mendelian conditions which have traditionally made up the bulk of what a medical geneticist and genetic counselor does. We are dipping our toe into the area of predispositional and population screening. Everybody in this room, within a couple of years, is gonna be offered some special opportunity to be sequenced, and you'll have to decide whether this is an opportunity you wanna take advantage of. Of course, we're not gonna talk a lot about it today, or at least I'm not, but there is the very fertile area of targeted therapies for cancer and, taking a step back, all of pharmacogenomics.
Do we have markers that can tell us if we're gonna have an adverse reaction to a medication before we actually get started on that medication? And of course, there are new gene variant and treatment discoveries. When you talk to a certain class of genomic scientists, they really pooh-pooh a lot of the categories that I've laid out on the screen, and they basically say that the registries we are collecting are not there for current medical needs but are there to discover genes and variants that are associated with common and rare diseases, and that will lead us to treatments in the future. Now, there are a lot of crosscurrents in the implementation of genomic medicine that are fascinating and sometimes contradictory. So one crosscurrent is the tradition in genomics of being very concerned about the impact, the negative impact, of genomic information. You could say we're the only specialty to be afraid of our own technology. And yet there is good reason for this: when you have been a genetic counselor or a medical geneticist, you've sat across the table or the bed from someone to whom you have disclosed the knowledge that they are carrying a variant for Huntington disease or BRCA, or that their baby has a particularly impactful genetic condition. These can be devastating pieces of information. So while we don't want to be cavalier about this, I've often actually been concerned that some of this thinking has held us back from the appropriate implementation of genomic medicine. Another crosscurrent is, of course, direct-to-consumer everything. It's not really just direct-to-consumer genomics. It's everything in our society that's moving toward the more empowered self, and in health that's gonna mean the more quantified self. That's x-rays, laboratory tests, and yes, genomics. It can be in the arena of genetic testing on arrays, as it is right now with 23andMe and other direct-to-consumer genetic testing, but very soon it's gonna be in the domain of direct-to-consumer sequencing, direct-to-consumer microbiomics, direct-to-consumer gene expression, and so forth and so on. The confluence of money and healthcare is something I think some of the other speakers are gonna get into with far more expertise than I, but it is an extraordinary intersection of enthusiasm around venture capital, around the start of new companies, around the resistance to reimbursement from insurance companies. At almost every level, we like to stay focused on the scientific aspects of things, but the economic aspects are driving so much of this. And there is, of course, both within the hype and within the hope, the notion that we are on the cusp of a true revolution: that somehow, someday, we're gonna change the kind of reactive medicine that we're practicing into a more proactive medicine where we are preventing disease rather than just diagnosing it and chasing it and trying to keep up with it. So all these crosscurrents are at work. And I guess the question, for me as a translational genetic scientist, is how can we gather data that will inform these questions and help us pick our path through this thicket of often conflicting themes? So I'm gonna tell you very quickly about some of the research studies we've been conducting. As far back as 2000, we used the disclosure of APOE for Alzheimer's disease as a paradigm for genetic disclosure. Now let me pause here and ask: how many of you, well, let's take the big picture first. How many of you have had your own genome sequenced?
Your entire genome sequenced? Okay, a couple of you there. How many of you have taken advantage of direct-to-consumer testing or in some other way had genotyping of some sort? How many of you have learned, through that or another mechanism, your APOE genotype for your risk of Alzheimer's disease? A couple of you. And how many people who've never done any of this would like to know your APOE, which gives you a probabilistic notion of your risk of developing Alzheimer's disease? Put your hands way up. That's great, because I have a phlebotomist in the back who will be greeting you as you leave the building. So this was the basic question we asked, starting now 16 years ago: how many people would want this, and what would be their reaction to this fairly scary piece of information, where there was then, and still is, no preventive treatment to utilize if you learn this information? And I won't go deeply into it except to say that, with the help of many collaborators over the years, including Bob Cook-Deegan, who was in on this from the beginning, we ran several randomized clinical trials where information, not a pill, was the focus of the trial. And we randomized people to learn their APOE in the initial foundational trial or not. And we learned that people who volunteered for this coped amazingly well when they learned their information. They were not more anxious or more distressed, and they didn't have more depression. We would see blips in levels of distress, but these would even out after the first six weeks or a few months. And we were able to demonstrate this in our flagship paper. They were happy they received the information, and they actually tried to do things to change their fate. They changed their vitamins. They changed their exercise. We're not sure if this worked or not, but it sort of challenged the prevailing notion that nobody does anything with genetic information. And one thing that they clearly did was report that they were going to purchase more long-term care insurance, which, of course, makes perfect sense, but which, then as now, puts insurance companies quite on the defensive. Because, of course, if you know something that the insurance company doesn't know, you can game the system a little bit and purchase more insurance if you're at increased risk. And this starts what they call a death spiral of adverse selection, whereby you, meaning all of society, can actually put an entire insurance company out of business. So this was of great concern when we published this, and it continues to be of great concern to insurance companies. We've done a lot through the years, so I'll just point out one of our recent findings: it turns out that when you add unanticipated pleiotropic risk information to that APOE result, for example, in this case, we just added the statement that your APOE also tells you something about heart disease, the amount of distress and anxiety actually flips. People become less anxious, happier, and more satisfied when they get this information, even if they learn they were at increased risk for Alzheimer's disease. It's as if you give them a second piece of information that gives them something to work on, like I can work on my diet or my exercise, and it completely nullifies the distress of the first piece of information. So this was fascinating, and we just recently published it in Annals of Internal Medicine.
On a completely different note, we've been looking at the way direct-to-consumer companies actually calculate the risk that they give you and the risk that they leave you with. And we've been uncovering some really interesting things about the paradigms of the early direct-to-consumer companies; this is Navigenics, this is deCODEme, all compared to 23andMe. If they were perfectly concordant, they'd all lie along the diagonal line in each one of these graphs. And you can see that in fact they are discordant in terms of the risk profiles that they give you. So one of the things we believe we're doing is holding companies to a high standard in terms of the ways in which they communicate risk information. We're also finding out some fascinating things about what people do when they get direct-to-consumer risk results for different kinds of cancer. It turns out that how they act after they get that information depends on their perception of their risk. And their perception of their risk is driven by a lot of things, not just what's on the screen in front of them, but whether or not they have a family history of a particular type of cancer, whether or not they themselves, for example, smoke. So there are a lot of interesting features that help define how people's self-perception of risk is changed by the genetic message that they receive. And one of the concerns, economically, that I thought I'd pull out is that there's been a great concern about robbing the medical commons: the notion that if you get a bunch of genetic risk information, you're gonna go out and order a bunch of tests, which is gonna use up society's pool of medical resources in ways that really don't make sense, and that you're essentially taking away from other, more needy areas of societal benefit. Well, one of the papers that we just submitted suggests that if you're looking at follow-up among direct-to-consumer customers, the only thing, I know this is rather small, that predicts whether they go out and tap the healthcare system for screening after they get their own risk results is not the message they receive. It's not whether they're at increased risk or decreased risk. It's not whether they're scared of the condition. It's not whether they have a family history of the condition. It is entirely dependent, in a multifactorial analysis, on whether they happened to go out and get screened in the year before their direct-to-consumer testing. In other words, people who like to get tested use direct-to-consumer testing as another excuse to get tested. And what this suggests is that there's a subset of people who are out there using a lot of medical resources, but it does not necessarily suggest that, across the board, personal genomics is going to trigger a lot of unnecessary testing. And this is the first piece of evidence that I'm aware of that really speaks to this very important criticism of personal genetics. I will just flash through a couple of slides from the REVEAL Study; in case you're interested in looking up any of these, they're all on our website. We've tried to paint a coherent picture of what it means to get this piece of scary information, what you perceive, how your perception changes, and what you do with it. And by the same token, we've had a grant for a few years to study the customers of direct-to-consumer personal genetics.
We've been publishing steadily on this: about confidence, about the impact on what people do with it, about how well they actually understand their findings, about the reasons they're doing this, about how adoptees view this differently than people who are not adoptees, and about how primary care doctors respond to this information when their patients walk in the door with direct-to-consumer genetic testing results. So obviously there's not time to go into all of this today, but I will point you to it if you're interested. Now, one of the questions I think we've got to address here is how genomics is going to be used in the practice of medicine. I think sooner rather than later, and I think we all agree, it's gonna be used in the everyday practice of medicine. It's already being used, as I mentioned, in the diagnosis of rare conditions, and in that context has arisen the question of secondary findings. So if I were to sequence every single person in this room, statistically, at least one of you would end up learning that you had a mutation for a fairly highly penetrant condition, usually a cancer predisposition condition or a predisposition to a hereditary cardiac disease like cardiomyopathy. When you scale that up to a population basis, that's a lot of people learning unanticipated information, and that's been the problem and the opportunity of what's been named incidental findings, or sometimes called secondary findings. So I'll just remind you that the ACMG put together a working group on this that I had the privilege of leading a couple of years ago, and we came up with a fairly arbitrary list of 56 genes representing 24 conditions that we recommended should be returned to anyone who's getting sequenced for any reason. And this is something we can talk about later, but it has at least put a stake in the ground that people can react to, often critically, but at least it is a starting point for professionals who are now sequencing people in clinical practice, and for researchers who are trying to decide, when you have a giant registry or a giant biobank and you sequence that registry or biobank, what, if anything, should be returned to the research participants who gave you their DNA. To study the use of genomics in the practice of medicine, we have engaged in another randomized clinical trial we call the MedSeq Project. In this, we are taking people with a genetic condition, in this case cardiomyopathy, and people who are ostensibly healthy, and we are randomizing them to receive whole genome sequencing or standard of care. And I will tell you that one of the most surprising things we learned was how frightened people still are of insurance discrimination. We found that over 20% of people declined participation based simply upon their fear that if it was put in their medical records, their insurance companies could get hold of it and differentially charge them. A second critical problem within the MedSeq Project, and really throughout the practice of genomic medicine, has been how we filter the variants. Each of us has four to five million variants. How do we filter those to find the ones that are truly pathogenic? And this is really perhaps the problem where there is the most opportunity for informatic sharing. And the solution that has been put forth for interpretation is that we need to connect, in federated databases, the work of people who are curating these variants within their individual laboratories and institutions.
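To make the filtering problem concrete, here is a minimal sketch, in Python, of the general shape of that kind of variant triage: start from millions of called variants and keep only those that are rare in reference populations and already classified as pathogenic in a shared curated resource such as ClinVar. The field names and thresholds are illustrative assumptions, not the actual MedSeq pipeline.

```python
# Illustrative sketch only: not the MedSeq pipeline, just the general shape of
# variant triage. Field names and thresholds are assumptions for illustration.

def filter_candidate_variants(variants, max_popn_freq=0.001):
    """Reduce millions of called variants to a short list worth expert review.

    `variants` is an iterable of dicts with (assumed) keys:
      'gene'      - gene symbol
      'popn_freq' - allele frequency in a reference population
      'clinvar'   - curated classification, e.g. 'pathogenic', 'benign',
                    'uncertain_significance'
    """
    candidates = []
    for v in variants:
        # Common variants are very unlikely to cause rare, highly penetrant disease.
        if v['popn_freq'] is not None and v['popn_freq'] > max_popn_freq:
            continue
        # Keep variants that a shared curated database already calls (likely)
        # pathogenic; everything else would go to a manual review queue instead.
        if v.get('clinvar') in ('pathogenic', 'likely_pathogenic'):
            candidates.append(v)
    return candidates


if __name__ == '__main__':
    example = [
        {'gene': 'MYBPC3', 'popn_freq': 0.00002, 'clinvar': 'pathogenic'},
        {'gene': 'TTN', 'popn_freq': 0.01, 'clinvar': 'uncertain_significance'},
    ]
    print(filter_candidate_variants(example))  # keeps only the MYBPC3 variant
```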
And so of course, we're very proud that Heidi Rehm is one of the leaders of the ClinGen project, which is serving as, I think, the nation's current resource for collecting and curating these very different variants from many different sources. Heidi often puts up a slide of shame for the laboratories that have promised to commit resources and share their data and the laboratories that have not. And that turns out to be very effective at national and international meetings. Now, you know, when you think about it, we've got to somehow scale genomics across all the practicing doctors in the universe. And one of the things we're trying out in the MedSeq Project is a one-page whole genome sequence report. I mean, think about that for a second. Three billion letters, four to five million variants, that entire complex translational interpretation pipeline, and we're gonna give it to your doctor in a single-page summary. I'm really proud of this, and we are testing it out in our MedSeq Project. The other thing that's really surprising us in the MedSeq Project is how many of us are walking around with a risk variant. You know, if you take the ACMG 56, only about 1% of us are walking around with one of those. But if you take 4,600 disease-associated genes, it turns out that close to 20% of us, based on this little sample, are walking around with a disease-associated risk variant. Not a recessive carrier variant, we've almost all got those, but a dominant or compound heterozygous variant that actually connotes disease risk. Now, that's pretty amazing. And when you give that back to doctors, as we're doing, you can see that they do start spending more time and they do start spending more money on the patients that come before them with this information. So there's no question that integrating genomics into the overall practice of medicine will cost more. The only question is, will the value that we glean out of that justify the increased cost? And finally, for my money, the place that's most ripe for shared informatics is the fundamental question in genomic medicine right now, which is related to the question of variant interpretation. And that is the global question of what the penetrance of all these disease-associated variants is. So if I find a cardiomyopathy disease gene variant in you and you live to 100 without ever getting the disease, how is that different from finding it in someone who's gonna get it at age 24? What are the parameters? Why has that happened? And how do we get better at predicting this? And I'll show you some unpublished data that I'm very excited about, where we took all the people we could get in the public domain who had been sequenced as part of the Framingham Heart Study and the Jackson Heart Study, and we looked at the phenotypes that had developed in them over 20 years in the Framingham Heart Study and over a much shorter period of time in the Jackson Heart Study, and we looked to see if those mutations were expressing themselves. And just as a starting point, we took the 56 ACMG genes, and we saw that in all cases there was a statistically significant increase in the clinical features associated with the respective genes. You were six times more likely in one cohort and close to five times more likely in the other; both analyses were done prospectively and independently, and both are highly significant.
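As a rough illustration of the arithmetic behind that kind of penetrance comparison, here is a hedged sketch: count how often the associated phenotype appears among variant carriers versus non-carriers followed in a cohort, and take the ratio. The numbers below are made up; a real analysis of Framingham or Jackson Heart Study data would also handle follow-up time, covariates, and formal significance testing.

```python
# Minimal, made-up illustration of a penetrance-style comparison: the relative
# risk of a phenotype among carriers of a variant versus non-carriers.

def relative_risk(cohort):
    """`cohort` is a list of (is_carrier, has_phenotype) boolean tuples."""
    carriers = [p for c, p in cohort if c]
    noncarriers = [p for c, p in cohort if not c]
    risk_carriers = sum(carriers) / len(carriers)
    risk_noncarriers = sum(noncarriers) / len(noncarriers)
    return risk_carriers / risk_noncarriers


if __name__ == '__main__':
    # Toy numbers: 3 of 10 carriers develop the phenotype, versus 50 of 1,000
    # non-carriers, over the same follow-up period.
    cohort = ([(True, True)] * 3 + [(True, False)] * 7
              + [(False, True)] * 50 + [(False, False)] * 950)
    print(round(relative_risk(cohort), 1))  # 6.0: carriers ~6x more likely
```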
So we're holding our breath as this gets reviewed and hoping that it's published soon, but it starts to speak to the question that if you wait long enough, particularly over 20 years, and you take a subset of genes that are supposed to have high penetrance in families where you've seen a strong family history, this may be useful as some sort of population screening as well. How useful it's gonna be, of course, depends on what kind of intervention we can offer as well as simply what we're finding. I'll finish up by saying that we're also doing the BabySeq Project, which is a similar randomized trial in newborn babies. We've just started recruiting for that. I'll be able to tell you more about it next time I speak to you. In doing so, I've become very aware of the conflicting and often mythical forces exerted by the FDA and the IRBs. I'm very grateful to the NIH for supporting our work in this area. And for those of you who are interested in this, we are hiring genetic counselors and project managers, and we are delighted to consider collaborations if this has sparked some interest in our data sets. Thanks very much. So we're not gonna take questions now; we'll just open it up for comments at the end. Thanks for the opportunity to present. So my background is that I'm an economist. I'm not an expert in genetics, but what I tend to do in my work is basically try to get data to shed light on policy-relevant questions in the area of science and medical innovation. And so when I think about this topic from the perspective of an economist, there are two questions of interest. The first is whether the prospect of private firms being able to hold proprietary rights over data sets encourages them to develop new data sets that otherwise wouldn't have existed. And so in the case of Myriad, which is what Aaron gave as an example of a very closed proprietary data set, the question is essentially: would anyone have come in and developed the same kind of data that Myriad has available in the absence of Myriad having this proprietary right to their data set? So that's an empirical question, and one that I don't know the answer to, as I'll tell you today. The second question of interest is, conditional on these genetic databases existing, how do the rules that govern access to those databases shape the follow-on scientific research and the follow-on commercial product development that actually make these databases have some benefit to patients? And so, in the way that was laid out before, you can imagine you have a fully proprietary model, you have a fully open access model, or you have something like a hybrid model. And essentially the second question is asking, once these databases exist, does the governance structure, which determines who's allowed to access those data sets and under what circumstances, shape what medical technologies actually come out of those data sets and benefit patients? And so, in my view, the optimal policy would be to trade off those two factors if we think that there is a conflict between them. And so what I'm gonna talk about today is some empirical evidence on the second question, and then I'll come back at the end to what I think we would want to know in order to inform this full trade-off. So to fix ideas, I'm gonna be talking about one example of a proprietary database, which was actually more of a hybrid structure, held by the firm Celera.
So for people that remember when the human genome was sequenced, the Human Genome Project, which was the publicly funded effort, existed in parallel with a private sequencing effort by a firm called Celera, which was run by Craig Venter. And so to fix ideas for what I was trying to do in this research project, suppose that Celera holds a gene as proprietary information. Some other person or firm in the economy comes up with an idea of how to develop a diagnostic test or a pharmaceutical drug that's based on that gene which Celera holds. And so the question is, do the rules and the governance structures on the Celera gene affect the likelihood that that diagnostic test or that pharmaceutical treatment will ever be developed into a product that consumers have access to? And so the context for how I collected data to try to study that empirically is that I basically traced out the history of the human genome sequencing effort, both by the Human Genome Project and by Celera, to try to isolate what you would think of as a natural experiment: something that could mimic a randomized controlled trial where some genes were always in the public domain and some genes were held with this proprietary database access set of rules that Celera had. And so to give you a sense for how I came up with the structure of that study, it's useful to review the timeline of what happened when Celera and the Human Genome Project were doing their sequencing efforts. So the Human Genome Project launched first, around 1990. Celera was founded later, in 1998. And both of them had the same goal, which is that they both wanted to sequence all of the genes in the human genome. So Celera began its sequencing in September of 1999. And then in 2001, the sequencing of the human genome was declared complete. And so what that means is that Celera and the Human Genome Project both published a version of their sequenced genome, Celera in Science and the Human Genome Project in Nature. There was a big press conference announcing that the genome had been sequenced. In practice, neither of those efforts was complete as of that date. So that 2001 date was essentially a convenient date for a press conference between Tony Blair and President Clinton, and it put a public face on saying the sequencing of the genome had been completed. But the fact that neither of them was complete means the following. When the Human Genome Project sequenced its data, it put all of its data in the public domain within 24 hours. That was under something called the Bermuda Rules. So it was a very aggressively open access stance, which was not without controversy. Some people were concerned that requiring all of the sequence data to be put in the public domain within 24 hours meant that there was not a lot of time for error checking, for example. And so when you think about what open access means, I think there is sometimes a tension between quick access to raw data and error and quality checking of that data. And so that's just to say that this was a very aggressive form of open access, which valued speed, potentially over quality checking. What did Celera do? So Celera was essentially a private company that was trying to realize some return on its effort to sequence the genome. So the first thing it did is it tried to file for gene patents.
So it filed a lot of patent applications, but at the time when most of their data was sequenced, it wasn't sufficient to just have a sequenced gene in order to get a gene patent. You actually needed to know something about what that gene did. And that was because there was a change around 2000 or 2001, where the US Patent and Trademark Office increased the utility requirement for gene patents, so that you needed to be able to say something about what that gene did in the human body, at least speculatively, in order for your patent application to be granted. So it turns out Celera didn't have any idea of what its genes did. It just had the sequence data that it had produced. And so almost all of their patent applications were not granted. It's not that the Patent Office rejected all of them outright, but basically none of them were granted. And so what Celera did is they hired a really smart lawyer, and they essentially said, we're not gonna be able to get these gene patents; can you structure something for us that will allow us to capture some return on our investment in the absence of being able to have patent protection? And so what this lawyer came up with is something that is sort of complicated to explain, but it's essentially a combination of a shrink-wrap license, like you would get on a piece of software, and some contract law tools which outline various restrictions on how people could use their data. So Celera published their draft genome in Science. As part of that deal for publishing in Science, they had to make their data openly available. And the way that they did that is you could go to a website which was linked from the Science article, and you could basically look at the genome online. If you actually wanted to get access to the data that underlay their Science article, you needed to send in a request to the company saying, I would like to get access to your data, and they would send you a data DVD free of charge, with no restrictions on whether you could publish papers based on their data. And so in that sense, this was actually perceived by many people as quite a lenient policy. Academics could get access to the data for free with no restrictions on their publications. What you had to sign is an agreement that said, I understand that I can't redistribute this data. So that's similar to the software license where you're not supposed to buy Microsoft Office and then copy it and resell it to other people. And, I understand that if I want to develop a commercial product using that data, I need to come back and negotiate a licensing agreement with Celera, okay? So there were restrictions both on redistribution and on commercial development, but not on academic publications. And so that was basically what Celera's intellectual property rights were. They didn't have patents, but they had this form of proprietary contracts that they used to try to make some money off of their database. And so Celera decided at that point in 2001, when they published their data, that it wasn't worth it to them to finish sequencing the human genome. What they decided is they wanted to just move on to trying to develop new products based on the genes that they had sequenced and to realize the returns from people that would come back and negotiate these licensing agreements with them. And so the Human Genome Project, on the other hand, wanted to complete a finished version of the genome that would be documented in its entirety.
And so they kept going on their sequencing effort until they had sequenced all of the DNA in the human genome. And so what that means is essentially that in 2001, there were some genes that had only been sequenced by the Human Genome Project. Those were in the public domain. There were some that had been sequenced both by the Human Genome Project and by Celera. And there were some that had only been sequenced by Celera, but that would get resequenced by the Human Genome Project as the Human Genome Project continued its efforts. And so what you have is this group of genes that were temporarily held with Celera's proprietary data rights, but which eventually got transferred into the public domain. And the reason why that happened is exactly because Celera didn't have a patent. So normally if you have a patent and someone redevelops the same idea, you can still exclude them from entering the market, because that's the right that a patent gives you. But this hybrid form of contract law and contractual obligations that Celera had wasn't robust to rediscovery. So as soon as the Human Genome Project resequenced the genes that were held with Celera's intellectual property, those were immediately transferred into the public domain. And so as of 2003, all of Celera's genes were in the public domain. And that was actually completely expected. So you can get quotes from Craig Venter, the head of Celera, in 2001, when they were selling their data to pharmaceutical firms and biotech companies and scientific labs, saying, everyone knows our data is gonna be in the public domain in two years; they just don't wanna have to wait for it. So it's not that it was a surprise to people that Celera's data was all in the public domain in two years; everyone knew that. And Celera was still able to have a somewhat successful business model selling their data during that time period. So that's the institutional context for how I thought about trying to design a study to understand what the impact of Celera's intellectual property was on whether genes ended up getting used in scientific research and product development. And so in order to match that institutional context with some data, I just wanted to give you a sense of what kinds of data sets I put together to try to answer that question. And this is just one example, which is a gene called RAX2. So if you use the NCBI databases or any of the NIH databases, you'll notice that genes and RNA sequences all have the equivalent of a social security number. They have unique identifiers, which are things that basically make it possible to link across databases, so that you know this gene was studied in this scientific paper and was also the same gene that was used in this diagnostic test, and, you know, when was it sequenced by the Human Genome Project? And so basically I structure my data set around using those identification numbers to try to trace genes throughout different parts of this process. So under the Bermuda Rules, when the Human Genome Project required that all of the data be put in the public domain within 24 hours, it turns out that you can get a record of when specific genes were uploaded to this open access database based on the history of the website. And so that was very useful for me, because it basically gave me a record of when the Human Genome Project sequenced all of these genes. And it turns out this RAX2 gene was sequenced by the Human Genome Project in 2001.
So then I needed to know: was this gene initially held with Celera's intellectual property, or was it first sequenced by the Human Genome Project? So essentially what you'd want to do to know the answer to that is take a version of Celera's sequenced genome, compare it with the Human Genome Project's sequenced genome in 2001, and ask: was this gene only in Celera's data and not yet sequenced by the Human Genome Project? So that's actually, for someone with expertise, probably everyone in this room except for me, not that hard of a thing to do; you would do a BLAST search and compare them, and then you would know. It turns out that for an economist that's not an easy thing to do. And so, luckily for me, I wrote to some of the scientists that were at Celera and said, oh, this is what I'd like to do; do you have an idea of how I might do that? And what they pointed me to is a paper that was published that did exactly that comparison, because essentially Celera's scientists were interested in who won the race. So they wanted to know who sequenced more data, and the appendix tables to this paper just detailed a list of: here are the genes that were only in Celera's data, and here are the genes that were only in the Human Genome Project data. So even as an economist, I can take their appendix tables and put them into my data set. That's just to say it's not a huge scientific advance to be able to do that stuff, but for me it was not possible on my own. So, okay, then I want to look at all of the genes that were either sequenced by Celera or sequenced by the Human Genome Project to try to ask how they got used by scientists and how they got used by either academics or commercial firms that were trying to develop genetic tests for consumers. And this gene, it turns out, was studied in several publications. One of them was published in Human Molecular Genetics in 2004, and these papers were arguing that that gene had links to at least two different conditions: one is age-related macular degeneration and the second is a different vision condition called cone-rod dystrophy. And so what I did is I tried to track both the scientific publications and the number of phenotypes that each gene was linked to, as a measure of how much effort scientists were putting in to try to study these genes and figure out what kinds of phenotypes those genes were related to. And then I used a non-mandatory but widely used genetic testing registry called GeneTests.org, where you can basically just search these ID numbers and ask, is there a genetic test being offered based on the RAX2 gene ID number? And it turns out that there was an age-related macular degeneration test available as of 2009. And so then I can say this gene had a test available based on that phenotype. So all of that data is descriptively interesting, at least to me, because it gives a timeline of the important sequence of discoveries for this gene. When was it sequenced? When was it studied by scientists? When did we learn something about it? And when was it actually used in a product that consumers had access to?
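Here is a minimal sketch of the kind of record linkage described here: use each gene's stable identifier as the join key and merge sequencing dates, publication counts, and test availability into one per-gene timeline. The tables, file contents, and column names below are invented for illustration; the point is only that shared identifiers make the linkage mechanical.

```python
# Hypothetical sketch of linking per-gene records across data sets on a shared
# gene identifier (as with NCBI gene IDs). The tables and columns are invented
# for illustration; only the join-on-identifier idea matters.
import pandas as pd

sequenced = pd.DataFrame({   # when each gene was first sequenced, and by whom
    'gene_id': [101, 102],
    'year_sequenced': [2001, 2001],
    'initially_celera_only': [True, False]})
papers = pd.DataFrame({      # per-gene, per-year publication counts
    'gene_id': [101, 101, 102],
    'pub_year': [2004, 2006, 2003],
    'n_pubs': [1, 2, 4]})
tests = pd.DataFrame({       # genetic-test availability, e.g. from a registry
    'gene_id': [101],
    'test_available_by_2009': [True]})

per_gene = (papers.groupby('gene_id', as_index=False)['n_pubs'].sum()
            .merge(sequenced, on='gene_id', how='right')
            .merge(tests, on='gene_id', how='left')
            .fillna({'n_pubs': 0, 'test_available_by_2009': False}))
print(per_gene)   # one row per gene: sequencing history, publications, test status
```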
And so given that data, you can do various tabulations to try to ask the question: did this proprietary form of intellectual property that Celera had over its genes discourage follow-on scientific research and follow-on commercial investment, relative to if those genes had been in the public domain with the other Human Genome Project genes the entire time? And so the most basic cut of the data that you can do for that is this: essentially all of Celera's genes were sequenced in 2001, and what I did is I just tabulated the follow-on innovation outcomes for Celera's genes relative to genes that were sequenced by the Human Genome Project in 2001. So if you think of sequencing as sort of the start of a new ability to do research on a gene, all of these genes were sequenced at the same time, but one group was held with Celera's proprietary rights and one was in the public domain. And so then I wanna look over the subsequent nine or ten years at what happened to those genes. And it turns out that Celera's genes had many fewer publications. They were much less likely to be linked to a phenotype, either one that was uncertain or one that was relatively certain as coded by a National Institutes of Health database. And Celera's genes were also much less likely to be used in medical diagnostic tests relative to these genes that were in the public domain. And so that's a comparison that on its own is potentially not that informative. What's the concern? The concern is just that we don't have a randomized experiment where some genes were Celera's genes and some genes were the Human Genome Project's genes. And so in the one-and-a-half-hour version of this talk, or the 87-page paper, which I'm happy to send you if you would like to read it, essentially what I try to do is ask whether there was selection into which genes were Celera's genes and which genes were the Human Genome Project's genes, once you condition on some things like the year that the genes were sequenced. And it turns out that there is some evidence of selection, but you can isolate comparisons in the data that look like they circumvent those selection concerns, and you basically get the same bottom line as what you get from this basic comparison, where it looks like Celera's genes have about 20 to 30% lower scientific research and lower commercial development relative to comparable genes that were sequenced at the same time but were always in the public domain. And so, just to give you some sense of the kinds of comparisons that I put together for that: these are two sets of Celera genes. All of Celera's genes were sequenced in 2001. Some of them were resequenced by the public sector in 2002, and others of them were resequenced by the public sector in 2003. And so if you look at this graph, in 2001 both of those cohorts were held by Celera. And so you'd wanna see that in 2001, when they didn't differ, they had the same number of publications on those genes. And in fact that number is quite similar across the two groups. Then some of those Celera genes became public in 2002, and consistent with Celera's proprietary rights interfering with scientific research, you see an uptick in the number of publications on those genes in the year that they moved into the public domain. Similarly, with the second cohort of Celera genes that came into the public domain in 2003, you see an uptick in publications for those genes when they moved into the public domain in 2003.
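To make that cohort comparison concrete, here is a hedged sketch of the tabulation behind this kind of event-study figure, using entirely made-up numbers: mean publications per gene by calendar year for the cohort that entered the public domain in 2002 versus the cohort that entered in 2003, with the "uptick on entering the public domain" expressed as a simple difference-in-differences.

```python
# Made-up illustration of the cohort comparison: mean publications per gene by
# calendar year for Celera genes that moved into the public domain in 2002
# versus 2003. None of these numbers are real; only the structure matters.
means = {
    # (cohort that went public in ..., calendar year): mean publications per gene
    (2002, 2001): 0.10, (2002, 2002): 0.19,
    (2003, 2001): 0.10, (2003, 2002): 0.12,
}

# Change from 2001 to 2002 for the cohort that went public in 2002, minus the
# same change for the cohort still under Celera's rights through 2002.
did = ((means[(2002, 2002)] - means[(2002, 2001)])
       - (means[(2003, 2002)] - means[(2003, 2001)]))
print(f"difference-in-differences: {did:.2f} extra publications per gene")
```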
What's notable about this graph is that eventually these two groups of genes converged in the rate at which scientists were studying them. So it looks essentially like, when you're held with this proprietary form of database protection, you see an uptick in research when you move into the public domain, but then eventually you get studied about the same amount as other genes. That's sort of a flow measure of research effort. This next one is trying to get at a different question, which is: what's the stock of knowledge that we've accumulated about Celera's genes relative to similar genes that were held with this proprietary database protection for a shorter period of time? And so the outcome variable here is just: is there a known but uncertain phenotype? So this is one measure of whether we have seen a connection between this gene and some phenotype or disease that we think it's related to. So again, in 2001 both cohorts of genes were held with Celera's intellectual property, and you see a similar likelihood that those two sets of genes were part of a known phenotype. In 2002 the first cohort went public, and you actually see an uptick in the likelihood that there's a conjectured phenotype for those genes even in the first year that they're in the public domain. The most important thing to point out about this graph is that there's no catch-up over time through the end of the data, which is 2009 for this study. And so what that means is that the difference between these two cohorts of genes is that one was held with Celera's intellectual property for one additional year, and it looks like, through the end of the data that I had when I completed the study, that resulted in a persistent difference in the likelihood that a gene had a known phenotype. And so one way of quantifying this is to say that I can't reject the hypothesis that holding a gene with proprietary data rights is equivalent to a lost year of research that never gets caught up once that gene moves into the public domain. And what I would stress again is that this is not an incredibly closed model like the Myriad model. This is a relatively open hybrid model, where people did have access to the data and could publish without restrictions. But we're still seeing these very persistent effects of this form of proprietary database protection on subsequent follow-on research. And so this is my last slide, but what I wanna come back to is what I started with at the beginning, which is to say that this is not answering the question of whether proprietary database protection is good or bad, because essentially we're not answering the question of whether society was better off because Celera came in and tried to do its sequencing effort for the human genome. So in this particular case study, you may just think, well, we had a very good substitute to Celera sequencing the data, which is that the Human Genome Project was doing the same thing. And so would that data have been provided anyway? The answer is yes. And so here the question that you would need to think about is: did Celera's entry spur the Human Genome Project to complete its work more quickly? Did society get access to all of the sequence data more quickly than if Celera had never entered? The Human Genome Project is on record explicitly saying that they didn't speed up their effort. So you may think that in this case study we know that answer.
But I think what's more of interest, in general and not just for this case study, is whether the prospect of having proprietary rights over databases encourages firms to make investments that wouldn't happen otherwise. And that's something I don't know the answer to. I haven't been able to come up with a way of using the existing institutions and the existing data to shed light on that question. But I do conjecture that there often is a trade-off: proprietary databases probably do provide some incentives for private firms to develop new databases, and, as in this example, those proprietary databases can have negative effects on follow-on innovation. And so optimal policy essentially requires having empirical estimates of the magnitude of both of those effects, so that you can think about how to trade them off in a way that results in the best benefit to society. But that's, like I said, not something that I have an overall answer on. It's more just to say, I think this is a very important topic, and I think it's one where we need data and empirical research to try to inform how we think about optimal design of policies and institutions, okay? Thank you. Or am I supposed to sit down? Come on up, guys, because I don't have any slides. I don't have any slides, and I will talk from here, and then we'll just very gently segue into a panel discussion. The whole idea of creating an information commons is relevant to the new technologies of genomics, and actually it's not just genomics, it's also medical imaging and laboratory results of various kinds. I want to put this in a bit of context and set the stage for the subsequent discussion here. In the last two years, the President of the United States has made two announcements in his State of the Union address that bear directly on the discussion that we're having today. Last year he announced the formation of the Precision Medicine Initiative, which you all have undoubtedly heard of. It's partly a new infusion of enthusiasm and funding streams for taking advantage of the new technologies that are available, trying to translate the new knowledge that's being created out of genomics and other fields and incorporate it into medical practice, and taking advantage of the precision that can be added by having information about what's going on within cells, all the way down to the level of human genes. And then this year, the announcement was about the so-called National Cancer Moonshot, and we can debate whether that was a smart framework for pitching this, but the idea is that one of the places where the wave is breaking for incorporating genomic information into clinical practice is oncology. And that's partly because we've always cared a whole lot about cancer, and there have always been many resources devoted to cancer. It's by far the most dreaded disease, way more than even Alzheimer's disease, mental disorders, or cardiovascular disease. That may or may not be an accurate perception on the part of the public, but it's real and it drives policy.
So that's partly it, but it's also the fact that oncology is the place where understanding cellular and molecular biology has begun to have clinical impact, so that instead of just trying to figure out which tissue was the source of the tumor growing in somebody's body, we are actually now beginning to look at which genes are turned on in the cells we know are cancerous in that patient's body, and then directing therapy at the mutations that are identified using the new genomic technologies. So the Cancer Moonshot is building on the fact that that's where the wave is breaking in genomics. These two new programs have three characteristics that I think are distinctive. One is that they rely on the new whiz-bang, high-tech things we can do in biology that we didn't used to be able to do. We can generate imaging information and individual genomic information in a way that we simply couldn't before, the main reason being that the cost of sequencing has dropped by six orders of magnitude in the last decade. I couldn't see in the back of the room how many of you raised your hands about having had your genome sequenced, but that would have been completely unthinkable when it cost hundreds of millions of dollars to sequence a genome, which it did. When Heidi's slides began with the Human Genome Project, it was a big deal, a really big deal with a capital B and D, to sequence a reference genome. Now we can do it in a matter of a week, sometimes even a day, at a high-throughput lab, at a cost on the order of $15,000 just to do the raw sequencing; if you do the interpretation of it, maybe another two and a half to three times that cost. So what that means is we're doing a lot of it. And we're doing a lot of it in oncology because the treatments for cancer are really expensive and harrowing, and it is worth trying to figure out what's going on biologically before you initiate the therapies; therefore, the diagnostics are high value. So it's high-tech, whiz-bang. Second, it depends on transfer of information through the digital networks that have emerged. Think about it: when the Human Genome Project started in 1990, there was no World Wide Web. The way people contributed data from their laboratories into GenBank and the DNA sequence databases was usually that they published a paper, and the people at the databases would type the data from the papers into the databases by hand. So the first genome sequencing projects were actually grad students looking at gels and typing things in by hand. That has clearly changed. The digital technologies are every bit as important as the biological technologies for managing large amounts of DNA data. So we've got these two distinctive features: digitally intensive, computationally intensive, and generating lots of data. And the third feature that I think is really interesting is that there is a serious premium being put on engaging the people who have concerns about the use of the data. Patient advocacy organizations and disease advocacy organizations are being taken very seriously, and these two initiatives are trying to incorporate them into the process of planning, deciding how to deploy resources, and structuring things. So now, what does all of that have to do with the medical information commons?
Well, let me see if I can begin to tease out some of the policy questions that are gonna emerge if we take seriously the fact that we want these new technologies to actually make a difference in improving and lengthening the lives of the human beings on this planet. In order to achieve the goals of any sort of precision medicine, the scale of the research has gotten to the point where no institution can capture and exploit all the data it is going to be generating. That means data have to be shared in order to capture the benefits of these new initiatives. And yet, if you think about it, the structures we have in place to make sure we can do that, I don't wanna say they're primitive, because they aren't, they're very sophisticated, but they were designed for the purposes for which they were built. And the purpose for which most of the data structures were built is to help scientists make new discoveries, publish papers, and enhance their careers in a relatively constrained environment. But where is the data about human genes and human biology coming from now? Most of it is not gonna show up in scientific publications, because most genetic tests are being done on genes that have already been discovered. The data are created in clinical laboratories or in high-throughput research laboratories, but most of it goes into a collection somewhere and does not actually flow into any of the data structures that are available for public inspection. The most extreme model of this, of course, is the Myriad model, where, in fact, Myriad had the proprietary rights to these two genes, BRCA1 and BRCA2. From around 1998 until the lawsuit was filed in 2009, and actually until the lawsuit was decided in June of 2013, pretty much all of us thought that the patents meant that Myriad was the only place in the United States where you could get testing for these two genes. And the patents were being enforced. What did that mean? It meant that for everybody in the United States who was getting genetically tested, the samples were sent to Myriad Genetics, because the patent rights covered the use and interpretation of the genes. In effect, all of the tests in the United States were driven to one purveyor, one service operation, one service monopoly. That meant they saw all the new mutations that nobody else was seeing, because they were doing all of the testing. And that is the origin of the database that was created at Myriad. They've done two million tests at Myriad; they've got the largest database on these two genes on the planet. And it's because of their patent monopoly that they've created a data structure that is proprietary, and it's a competitive advantage for them, because nobody else has those data with which to interpret the variants that might be found. Now, 98% of the time it doesn't matter, because 98% of the time, if you do a test, you're gonna find a variant that is well known, and it's either well known not to be disease-causing or well known to be disease-causing. But two to three percent of the time you're gonna have a variant that either we haven't seen before or we don't know how to interpret, because we don't have enough information. Those are the so-called variants of unknown significance. And in those two to three percent of cases, Myriad has a big advantage, because they've done two million tests and nobody else has.
So that's an extreme example of the kind of situation we find ourselves in. Now let's generalize it. These are two genes that have probably been studied about as thoroughly as any in the human genome, with the possible exception of the cystic fibrosis and hemoglobin genes and a few others that have been thoroughly studied over the decades. If you add it all up across the whole globe, there have probably been about as many tests administered for BRCA mutations in the rest of the world as there have been at Myriad Genetics. But the difference is that at Myriad, because of their monopoly, they've built a really good database and kept track of all the information, because it flowed to them, and they can actually make interpretations of those variants. For the whole rest of the world, the data structures are fragmented. If you go to the National Health Service in the UK, it depends on which region of the UK you live in whether you get the test at all and whether it's gonna be paid for; if it is done, the result will go to a database based in the UK. If you go to Germany, those data will go to a database housed in Germany. If you go to Iceland, you're gonna go either to deCODE or to one of the health system hospitals in Reykjavik. So what we've got is pockets of databases all over the world that have information based on the populations they've been testing, but no way of sharing the information about the variants that have been discovered. And until very recently, the only way we had of collecting that was basically the published literature. Two years ago that began to change, and a database called ClinVar, which Robert alluded to in his talk, was created at the National Center for Biotechnology Information, part of the National Library of Medicine. The idea of ClinVar is to pool these data from all over and put them in a place where clinicians who are trying to interpret these variants can actually use the best information available. The problem is that most of the information being created in laboratories never gets there. So let me give you two examples of what's going on in the real world right now and why this medical information commons is much more an aspiration than an achievement at this point. I'm involved in a case associated with a laboratory here in the Boston area, and here's the situation. A child is born in 2006, and he's progressing fine for about four months, and then he has a seizure the day after he gets a vaccine. There's actually a whole cluster of disorders that are provoked by getting a fever when you're a very young infant, and this one is severe myoclonic epilepsy of infancy. There are many genes that can be associated with it, through ion channel defects: the proteins in the cell membrane that allow negative or positive ions to flow through the membrane can be affected by mutations, and when they aren't functioning right they can lead to epilepsy. There is a cluster of these, and there are different channels that can cause the same general phenomenon of seizures. Some of them are sodium channels, some handle other kinds of ions, and you need to know which flavor of the problem it is in order to figure out which treatment to apply. So this child had a seizure, and a few months later they decided to examine whether it was caused by one of these mutations.
A sample was sent off to the laboratory that had patent rights to that particular gene, and the result was interpreted as a variant of unknown significance and sent back to the clinician with a recommendation that the parents should get tested. Unfortunately, there had been one case reported in the literature, actually in two publications, but one case, of a child with that same mutation who had this horrible syndrome of infantile seizures. And to this day, it's now 2016, that case reported in 2006 is the only one in the literature connecting this mutation in a person with this disease. So what we have is a situation where the child had that mutation and did have the syndrome. The mother was never told that; she actually never saw the report until 2014. The child died at age two, in January of 2008, and that's what has led to a lawsuit. I'm an unpaid consultant on the case, and I got pulled into it because I'm really, really interested in the phenomenon I'm describing for you all, which is that we've got a problem: this was a test done quite appropriately in a clinical laboratory, but the information generated by that clinical test was, number one, compared to a database that is completely inadequate to the task of doing global interpretations of what these mutations mean, and, number two, to this day that child's death is not recorded in the medical literature, and the correlation with that clinical syndrome is not reported in the clinical literature to corroborate the fact that it is in fact a disastrous mutation in this protein. So we have a serious problem of looping the information from clinical testing back into the databases that are needed to interpret the results of the genetic tests we're doing, and we're gonna keep finding this. BRCA1 and BRCA2 face the same problem. Every week Myriad is coming across mutations that it has never seen. BRCA1 and BRCA2 together have about 165,000 base pairs, and theoretical calculations suggest that if you sampled all the people on planet Earth you would find a mutation at almost every position in both of these genes, and we know the clinical significance of only a small minority of them. Now, almost all of those mutations are gonna be really, really rare. But if you think about it as a system, we need to be able to capture the information from all of the tests being done all over the world and pool it into some sort of centralized data structure so we can use that information. That is the central intuition behind the need to construct a medical information commons. So I'll finish with that and just open it up for discussion. Just observe the complicated stuff I've just talked through and the number of kinds of policy issues that are connected to what we have been discussing here. Number one, does it involve scientific structures and how we discover things? Yes, it absolutely does. Does it relate to the incentives that the people who do research face in their own careers? Absolutely, right? Why hasn't the information about this child's mutation shown up in the clinical literature? Well, because you can't publish it, because we already know that that gene is associated with the clinical syndrome. You're not gonna get a publication out of finding 15 cases that say something we already know. So our incentive structure is not to capture that.
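As a thought experiment to illustrate the pooling intuition just described, here is a short, purely hypothetical Python sketch of a shared variant commons: several labs each contribute the variant observations from their own testing, and anyone querying the pooled structure sees every report of a given variant rather than one lab's private history. The lab names, the variant notation, and the classifications are all invented for illustration and are not drawn from ClinVar or any real submission format.

```python
# Illustrative sketch of pooling variant reports from many labs; all data invented.
from collections import defaultdict

# Each lab reports (gene, variant, classification) tuples from its own testing.
lab_reports = {
    "lab_uk":      [("SCN1A", "c.2589G>A", "pathogenic")],
    "lab_germany": [("SCN1A", "c.2589G>A", "uncertain"),
                    ("BRCA1", "c.68_69del", "pathogenic")],
    "lab_boston":  [("SCN1A", "c.2589G>A", "pathogenic")],
}

# Pool everything into one shared structure keyed by (gene, variant).
commons = defaultdict(list)
for lab, reports in lab_reports.items():
    for gene, variant, classification in reports:
        commons[(gene, variant)].append((lab, classification))

# A clinician facing a "variant of unknown significance" can now see every
# observation of that variant worldwide, not just one lab's private records.
for (gene, variant), observations in commons.items():
    print(gene, variant, observations)
```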
We're not gonna capture it through the medical literature or through grants. We have to capture it by having a system that collects the information and makes sure it gets channeled into public databases. We don't have any incentives for doing that. And that's irrespective of the intricacies of this field that is now growing up. Think about it. When the sequencing phase of the Human Genome Project began in 1996, and Heidi alluded to this, the rule of releasing data within 24 hours, this radical open-science policy of sharing data every 24 hours, was put in place in part because everybody wanted this to be an open-science initiative. And the leaders of the project were particularly the people who had worked on the nematode worm, whose scientific community became the sociological structure that was used to drive the Human Genome Project. As opposed to, for example, the field that I grew up in, which was cancer genetics and the human genetics of Alzheimer's disease. If we had left it to the human geneticists of Alzheimer's disease, we'd still be waiting for the publications to come out. There was a very strong norm of not sharing data in human genetics, because you would construct a pedigree and you would mine it for the rest of your career. So the norm of sharing data came from a community of folks that had gotten used to spending a lot of money to create maps and sequences and sharing them with labs all over the world as a public works project. There was an element of that in the Bermuda decision, but it was also a practical question: okay, so we're gonna sequence the genome; somebody has to do chromosome one, somebody has to do chromosome two, and there needed to be a way to allocate the work. And in order to allocate the work, people had to report what they were actually doing in their labs and also put it out there in public, so that everybody would know they were playing by the rules and that the sequence they were producing was accurate. Otherwise everybody's gonna say, oh, I wanna do all of chromosome 21, I own it, and then not necessarily produce. So the open sharing was a combination of the spiritual idea of open science and some very practical considerations of allocating the work and making sure it was done to high quality standards, and the only way to do that was to make sure everybody could see the data. In 1996, when that happened, what were the structures in place? Well, for ten years, starting in 1984, the places where the data would be deposited, GenBank, the DNA Data Bank of Japan, and the European database at EMBL, had already had a decade of sharing data daily among themselves. So the data structures were already there to share the data. And there was really only one company that mattered: the company that was manufacturing the instruments, which was feeding both sides of the arms race. Celera was a spin-out of the company that produced the sequencing instrument used both by the public genome project and by the Celera team that was sequencing the genome at the same time. That was really the only company that mattered at that time. There were spin-out companies for finding genes and things like that, a half dozen of them at the time, but in fact the only company that was deeply ingrained in the science was the one that produced the instruments both efforts relied on.
Fast forward to 2016: what's the difference between the norms of sharing agreed in Bermuda in 1996 and the situation today? Well, number one, in 1996 the United States and the United Kingdom paid for 90% of the Human Genome Project, and what NIH, the Department of Energy, and the Wellcome Trust said became policy de facto. You could get 50 people in a room to decide a set of rules about how they were gonna share data, and you would have all the people who were gonna be contributing data represented in that room. Now we have literally hundreds of companies: some of them are software companies, some are instrument companies, some are DNA sequencing service companies. We've got a half dozen varieties of companies, and hundreds of them, pursuing different lines of research. It's a way more complicated commercial landscape out there. More countries are involved. The Broad Institute is probably the largest genome institute on the planet, but what used to be the Beijing Genomics Institute, now based in Shenzhen in China, is probably number two in sequencing capacity and may even be number one, I don't know. So things have gotten much more global, much more international. We've got national genome projects growing up in almost every country, at various stages of development. It's a much more international data-sharing problem that we face. So we've got commercial complications and international complications, and we've got a huge problem in that the data are now flowing not just from a handful of research labs; most of the data that's gonna be generated about human genomic variation is gonna be flowing through labs that are doing their work either for consumer genomics and people interested in their ancestry, or for some other reason, or, more commonly, it's gonna be connected to clinical sequencing going to a laboratory that reports its results back to a doctor. And we're back to the problem that we don't have the loops for sharing information that would allow us to interpret genomic variation over time, because we haven't built the infrastructure and the sharing norms needed to make sure we can interpret that information over time. We have not built a learning system. That is the central intuition of the report that came out in 2011 from the National Research Council, the report that became the template for President Obama's announcement last year. That's probably enough to get started with a discussion and open it up for questions. Yeah. So I think we should do that, actually. We've got about five minutes left in our scheduled time, although I'm happy to stay over a few extra minutes if there's substantial discussion, but I just wanted to see if anybody wanted to bring up a question, and maybe I'll bring up Michael. Thank you. I was wondering, you mentioned incentives to share things, and I know with ClinicalTrials.gov and the under-reporting going on there, that's another closely related area. Do there need to be more incentives there? I would be interested in hearing more about that. Why don't we pick up a couple more questions that we can collectively answer; that'll be a little more efficient, I think. Yeah, I had two questions. First, what do you think the likelihood is of reconciling national policies on data sharing? I work at the Broad and we're finding it even harder now to get European data into dbGaP. So what are the chances that there'll be progress across international lines on that front?
And then also, what do you think about the notion of patients requesting data that's been clinically sequenced, using the CLIA rules to request their own data, and then making that data available via a public database? Is that something that sounds feasible? We've heard about this before, and I just don't know the likelihood of a patient-driven effort to get that data out of private clinical sequencing labs and into the public domain, whether that's something you see as likely or feasible. Okay, and then we'll do one more and then we'll get some responses. My question is: why are we incentivizing genetic discoveries by giving rights to exclude others from using the findings? There are other ways to incentivize economic activity: give them money, give them intellectual property rights over things that matter less to us, things of that sort. So I'll take a stab at those and then we'll just open it up. On the international question, I think it's gonna be really complicated, because, as I mentioned, the Bermuda rules were imposed, well, they were imposed by the US and the UK because the leaders of those three funding institutions basically agreed it was good policy. Even at the time, it actually conflicted with national policy in Germany and Japan, and they had to write some nasty letters saying, you know what, you don't get to call yourself part of the Human Genome Project unless you play by our rules. What was going on in Germany and Japan was that the companies in those countries were supposed to get privileged access to the data generated in those countries. And I think we're seeing that biotech is associated quite strongly with genomics, and every nation is developing a policy structure for capturing the value of genomics under the rubric of biotechnology. In addition, because of the international treaty regimes, we're creating obligations, and many countries have now passed laws about transferring samples or information across national borders that are gonna make this way more complicated than it would have been in 1996. So I think it is gonna be a real challenge. To the question about whether there are other incentives: there absolutely are. If you ask scientists, the premium for discovery is actually on reputation and being first and all that. The fact is, though, I think we've got a relatively unsophisticated way of thinking about intellectual property and patents. My conclusion after studying this stuff for about 12 years is that the debate has tended to center on one question, and it is an important question: what sort of thing is patentable at all? Is patenting a good idea or not? And my sense, after studying this for about 12 years, is that that's actually not the most important question most of the time. The most important question is much more: what do you do with the rights that you've got? How exactly do you decide, in the legal system, to grant those exclusive rights, and how broad should those rights reach? That's the debate that I think matters, and let me explain why. I think patents actually do serve a social end, that's why we have them, in that they induce private investment on top of a layer of public investment in R&D. They do that in a very specific context, where it requires extra investment to develop the thing you've got into something that's actually gonna turn into a commercial product or service.
It's an inducement to additional private investment, and it becomes the way of preventing free riders from benefiting from your research expenditures in the private context. Sometimes that's gonna prevail even in diagnostics, and certainly in therapeutics, and I think that patents are one way to do that, that exclusivity is another way, but there are all sorts of tools we have for creating incentives. What I don't think we have is a very robust theory of how and under what conditions you should grant rights of what breadth, and which form of intellectual property is most appropriate for a given kind of situation. It's not very nuanced, it's not very granular; it's a very broad brush we're painting with when we talk about patent rights, and I think that's actually the conceptual flaw in our system that those of us who study this stuff need to be doing most of our hard work on. And Heidi's work is actually a beautiful example of that kind of nuanced analysis, I think. Can I just follow up and ask you and Heidi what you think about the Genomics England model? As I understand it, they have created teams of academics, which they have pre-vetted, and those teams can get sort of unfettered, unpaid access to the data, and then if you are a company, you pay a much more substantial fee. Is that kind of hybrid model a good model for the future? That's pretty close to what the Celera model was, and it's very useful for certain contexts. But for every set of eyes that you shut out of the data, you're paying a price in future exploitation. So this is what I mean when I say I don't think we have the theories that allow us to make these trade-offs in a nuanced way. That's why I love, Heidi, the fact that you're trying to examine the questions that help us think about what the trade-offs are, because those trade-offs are very real. But yeah, there are many, many models, and I actually think we're gonna kind of stumble our way in; there are gonna be lots of models out there in the real world that are gonna be tried. So will PMI be wide open? No, there's no way that's gonna happen. PMI is gonna be a hybrid of 45 zillion kinds of ways of sharing data and different levels of consortia and all that. We do have this aspirational model of creating a medical information commons that truly is open and on which everybody can draw, and there will be an element of that, and my own bias is that the more of that we create, the more productive and fast progress will be. But there's just no way that this is all gonna be in that space. So... Because the culture is the constraint? Because of the culture, and also because the data are about human beings, and we have real, not just virtual, concerns about privacy and contractual obligations, not just informed consent but things that we've agreed to that we have to abide by further down the line. So it can't be completely open. These data are about real human beings, and we're just not gonna have something fully open. The Bermuda rules worked in part because the data were not supposed to be identifiable and we were creating a reference sequence. In fact, there was a fateful choice made in the mid-1990s not to make the reference sequence that of a particular human being who had actually answered an ad for the group that was studying chromosome 19 for the Department of Energy.
We would have known exactly whom the sequence came from if we had used the clones that were originally produced for the mapping efforts, and they hit a reset button saying, ooh, we don't wanna do that. We would have had another Henrietta Lacks situation, where we would have known the very person on whom the reference genome was sequenced, if we hadn't actually made a policy intervention. That was Francis Collins and a bunch of people saying, oops, we'd better not go there, and realizing that error. So the privacy considerations are very real and the informed consent issues are very real. Yeah, I'll just follow up on the first question, about incentives for sharing. My read of the historical evidence is that compliance with requests to share data is gonna be quite low in the absence of requirements by regulators or incentives for firms to actually do it in practice. And how does that usually happen? It can happen through NIH funding being conditional on submitting data to certain repositories. It can happen through firms that wanna sell their products to consumers going through FDA regulation, with the FDA imposing requirements at the point of regulating access to new diagnostic tests. I'm not advocating for those as the right thing to do, but just based on the historical record, voluntary sharing oftentimes does not result in the level of compliance that we would need. Yeah, so the fundamental problem is that sharing costs money, time, and energy, and that's the biggest reason it doesn't happen. There are actually a lot of tools for encouraging sharing once you get to clinical sequencing. For scientific purposes, the sharing norms would work if we actually abided by Mertonian science and all the journals enforced the rules that already exist, which basically say that if you're publishing something and your conclusions are based on data, you have to share data sufficient for others to replicate what you've said. Now that's abided by only spottily, but if we actually took it seriously, that's a solution to the scientific problem of constructing Mertonian science. When it comes to clinical testing, there actually are several levers. Payers could say: we're not gonna pay for this test unless we can build a system in which, over time, we know the test can be independently verified; you can't just tell us you've got a proprietary database and to trust you that you're giving us the right answer. You could say we're not gonna pay for the test unless it can be clinically validated. You could accredit the laboratories and say you only get accredited if you're sharing data in such a way that your results can be verified by other laboratories. You could certify them under CLIA and say this is a condition of getting CLIA certification so you can get paid by Medicare and Medicaid. And you could make it a regulatory condition. The FDA is thinking about this. I don't think they know where they're gonna come out on it, but the fact is the FDA is gonna be approving all sorts of genetic tests that are measurements of genetic variation. They can't approve one test at a time for all 22,000 genes in the genome. They're gonna have to have a regulatory-grade data structure out there that they can rely on for interpretation of those variants.
And one of the things they could do, as a condition of regulation, is to say: you either show us the data, period, at FDA, or you transmit your data into a public database in such a way that we can be sure it's being curated and validated under the eyes of many other experts. So those are levers that are out there that haven't been pulled yet. It appears the FDA is trying to crawl its way in that direction with regard to ClinVar. For example, in the BabySeq project, they decided to practice regulating on us through an IDE, an Investigational Device Exemption, which had never been done before for any clinical sequencing project. And one of the things they asked us was: how are you going to interpret your variants? What they wanted was a structure, some sort of resource in place to guide the interpretation of our variants. So we were able, at a certain point, to point to ClinVar and to the standards for curation and agreement on curation in ClinVar as a principled umbrella under which they were able to go forward and accept that component of our IDE submission. So I think that aligns very nicely with what you're suggesting as one of the mechanisms for incentivizing sharing. But the thing to observe is that this is all a work in progress. These are norms that have not been constructed or even very much articulated, and the infrastructure for doing all this simply does not exist. And it's very uncomfortable to be in the middle of it when you're trying to run a clinical research study from a scientific perspective. I mean, it essentially paralyzed our effort on BabySeq for close to a year and pushed us into a regulatory region we had no experience or expertise in whatsoever. So it was very, very uncomfortable for us, for our institution, for NHGRI and the child health institute, for pretty much everybody involved. And as you said, nobody funded us to struggle with that component. So actually, I wanted to ask Heidi about the alternatives, picking up the very insightful question from the back: exclusive rights don't have to be the only way we set up incentives for making discoveries and translating them into practice. As an economist, how can you help us think about the alternatives to this classic patent system of creating exclusive rights for an invention that meets certain criteria? I think what the historical record suggests is that firms will use the existing set of incentives and regulations to the best of their advantage to get a private return on their investment. The Celera example I talked about is actually a really nice case of that: they tried to get patents, their patents were rejected, and then they pulled together whatever other levers were available to try to get a profitable return on their investment. So there are many, many alternatives to the patent system and many alternatives to awarding firms exclusivity. In practice, it seems like the ones firms default to when we limit their ability to patent may not produce the social gains that we would want. And I think this comes back to what Dean Lohan was talking about with whether things are patentable. A lot of the recent Supreme Court cases around gene patents have declared that certain kinds of genes should not be patentable, that certain kinds of genetic data should not be patentable.
And I think what the Supreme Court has in mind when it says that is that the data will be in the public domain if it's not patentable. I don't think that's the right counterfactual. The right counterfactual is: if firms can't get patents, how are they going to try to protect their investments in the absence of patents? In the Celera case, it looks like they did something which, based on some unpublished research that I have, looks worse than if we had just let them get patents in the first place. It looks like patents actually have less of a deterrent effect on follow-on innovation than does the package of proprietary contractual rights that Celera used when it could not get patents. So I think we need a much more nuanced empirical understanding of the set of options available to firms that they'll use if we put limitations on their ability to get patent protection or other forms of monopoly rights. It's just not clear a priori that we're gonna be better off as a society by limiting firms' ability to patent. It could be that in other contexts we are, but it's not obvious, I think, in the absence of some data on that. So maybe we'll take one more set of questions and then close up; does anybody else wanna ask any? Thank you. I was wondering, none of you has mentioned the WHO as a sort of arbiter of a database or as having any role in this process. I know that with ClinicalTrials.gov it ultimately just ended up being an NLM effort, and you had mentioned that the NLM is now potentially going to be our primary repository. I was just wondering if you had any thoughts on the role of the WHO or other, more international, collaborations. Yeah, it's funny, I was working at WHO the summer of the press conference announcing the completion of the Human Genome Project, the kind of pseudo-deadline that Heidi alluded to in her talk. I think WHO is conspicuous for its absence in a lot of this. WHO did do a report on genomics and world health in 2000, and it's a very good report, quite skeptical about patents, so they were trying to play a role, but the fact is WHO is a thoroughly under-resourced international organization. And if I were identifying a flaw in our system, I would say the number one flaw is the lack of infrastructure at the national level. We just haven't developed the structures that will capture the data, because we are really, really good at pursuing innovative science and peer-reviewed, investigator-initiated R01 research, and we suck, excuse me for the term, but I mean it, we suck at building infrastructure that's gonna be robust and long-term, with a few singular exceptions; think of the world without Index Medicus and now PubMed. That's the world we're sailing into with genomics: we don't have that stuff. Maybe the structures will build out, but observe that ClinVar is funded by a single national institution, the National Institutes of Health, which in and of itself doesn't particularly like supporting infrastructure.
And it just doesn't feel to me like that's a stable long-term solution, and I don't think people in China, or in Korea or Japan, are gonna be completely happy being told that all the information about human genomic variation is gonna end up in ClinVar, which is owned and operated by the US federal government, which Edward Snowden told us maybe isn't so trustworthy. Through a grant mechanism, by the way. Yeah. Through a competitive grant mechanism. So I think we've got some structural problems. It would be totally lovely if we had robust structures at the international level to pull this stuff together. The best thing we have in this field right now, I think, is the Global Alliance for Genomics and Health, and they're working really, really hard to pull this together, but it's on a shoestring, and almost all of the solid resources are either in industry or in national funding structures of one kind or another. That's the way science is organized, and I just don't see it changing. It would be lovely if WHO had enough money to actually do its job, but they don't now, and I don't see them filling the gap here when they still have to deal with Ebola and Zika and all the other stuff that is frontline for them. All right. Well, thank you very much. On that very positive note. Yeah, really. Go have fun.