Hello. My name is Sue Pepin. I serve as the director of health and clinical partnerships at Arizona State University. I want to welcome you all to the third of four events in our biomedical innovation series. Today's topic is bridging the gap between data science and medicine. I want to thank the Arizona Biomedical Research Center for sponsoring the series and Arizona State University Knowledge Enterprise for the work they do in fostering innovation through research and discovery. The structure today will be about 40 minutes of a fireside chat, starting out with some opening comments from our speakers, followed by 15 to 20 minutes of questions, answers, and discussion. If you have questions, please write them throughout the webinar in the question chat box and we will try to get to them. So it is my great pleasure to introduce two real leaders in this space of data science and health. Dr. Melissa Handel is the director of the Center for Data to Health at Oregon Health and Science University and the director of translational data science at Oregon State University. Dr. Handel's vision is to weave together healthcare systems, basic science research, and patient-generated data through the development of data integration technologies and innovative data capture strategies. She currently does integrative research in environmental health science, public health, and bioinformatics. She also plays a leadership role in informatics consortia dedicated to data sharing and semantic engineering. Dr. Handel's research has focused on the integration of genotype-phenotype data to improve rare disease diagnosis and mechanism discovery. She received her PhD in neuroscience from the University of Wisconsin-Madison, with postdoctoral fellowships at the University of Oregon and Oregon State University. As an undergraduate, she studied chemistry at Reed College. Dr. 
Anthony Philopakis is the chief data officer at the Broad Institute of MIT and Harvard, where he is also an institute scientist. Dr. Philopakis is committed to bridging the gap between data sciences and medicine. He's a cardiologist at Brigham and Women's Hospital, where his primary focus is caring for patients with rare genetic cardiovascular diseases. At the Broad Institute, he directs the data sciences platform, an organization of over 100 software engineers and computational biologists that develops software for analyzing genomic and clinical data. In addition to his roles at the Broad Institute and Brigham and Women's Hospital, he is a venture partner at GV, focusing on machine learning, distributed computing, and genomics. He received his MD from Harvard Medical School and completed a PhD in biophysics at Harvard. As an undergrad, he studied mathematics at Yale University and later studied mathematics at Cambridge University. So we have quite the duo today to lead us in this discussion, and I'm going to start out with Melissa to tell us a bit about her work. Fantastic. Thank you so much, Sue. Well, it's really my great pleasure to be here today, so thank you so much for the kind invitation. I'll tell a little bit of the background of my scientific trajectory in this space. As Sue mentioned, my background is in developmental genetics and neuroscience, and I have been slowly marching my way toward the clinical enterprise and toward patients throughout my career. During the time that I worked as a zebrafish genome coordinator, I was tasked with coordinating the structuring of genotype, phenotype, and gene expression data, and I recognized the incredible power of these data. 
These data were underused, and at the time we developed data models and infrastructure to support what I refer to as a phenotype-BLAST approach, where we took structured phenotype descriptions of zebrafish and were able to retrieve candidate genes for human diseases, to help understand the molecular mechanisms of diseases for which we did not yet know the genetic basis. We published our seminal paper on these technologies back in 2009, and marching forward to today's world, these technologies are now in wide use in diagnostic tools for rare disease, where we are leveraging model organism data directly in a clinical setting to help support diagnostics. Our group is part of the Monarch Initiative, a large international consortium that helps develop these resources and algorithms and also develops the Human Phenotype Ontology, a medical terminology that is structured to describe the patient as a biological subject rather than a billing subject or a clinical encounter subject, as is so often the case with most medical terminologies. By leveraging infrastructure that treats the patient as a biological subject, we can compare humans to other organisms much more readily using semantic engineering technologies, and these can be combined with genomic and other technologies to help support clinical decision making and diagnostic tools. And so it's a really exciting time in health science research to be able to leverage so many different kinds of data together in pursuit of improved patient classification, for diagnostics, for prognosis, for mechanism discovery, and hopefully for treatment discovery. I'm really excited to be here today to talk with Anthony more about his views on some of these same topics, so thank you so much. Over to you, Anthony. So thank you so much. You know, I really can't express how honored I am and how excited I am to do this. 
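[Editor's note] A minimal sketch of the cross-species phenotype-matching idea described above, using toy HPO-style term sets. The term IDs, gene names, and the plain set-overlap score are all illustrative; the real Monarch tooling uses semantic similarity over the ontology graph, not simple set intersection.

```python
# Toy sketch of "phenotype BLAST": score candidate genes by the
# overlap between a patient's structured phenotype terms and the
# phenotypes observed for each gene in a model organism.
# All term IDs and gene names below are illustrative placeholders.

def jaccard(a: set, b: set) -> float:
    """Set-overlap similarity between two phenotype profiles."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical structured phenotype profile for one patient.
patient_profile = {"HP:0000001", "HP:0000002", "HP:0000003"}

# Hypothetical per-gene phenotype profiles from a model organism.
model_organism_profiles = {
    "geneA": {"HP:0000001", "HP:0000002"},  # strong overlap
    "geneB": {"HP:0000003", "HP:0000099"},  # partial overlap
    "geneC": {"HP:0000042"},                # no overlap
}

# Rank candidate genes by similarity to the patient's profile.
ranked = sorted(
    model_organism_profiles.items(),
    key=lambda item: jaccard(patient_profile, item[1]),
    reverse=True,
)
for gene, profile in ranked:
    print(gene, round(jaccard(patient_profile, profile), 2))
```

Swapping in a graph-aware similarity measure over the ontology hierarchy would preserve this ranking structure while capturing that related (not identical) terms should also count as evidence.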
As Susan knows, my dad was actually a professor at Arizona State for over 30 years. I grew up on the corner of Rural and Baseline in Tempe, and I'm still a Sun Devil at heart. So really, it's just a real pleasure to be here today. You know, as Susan mentioned, my path in life was that I was a math nerd, for all sorts of reasons, who went to med school. And I have always been really passionate about the opportunity to bring better use of data into the practice of medicine. One of the things that has happened in recent years is that I've veered much more toward what I'll call the applied or engineering side, focusing on actually building production-grade software systems to store, share, and analyze genomic and clinical data for the betterment of science and medicine. If I were to summarize one of the motivations that led me in this direction, it goes back to my time as a cardiology fellow. I remember one evening I was on call, and a patient was admitted who had had a cardiac arrest at the gym. He was a 45-year-old, otherwise healthy man who was on the treadmill and arrested. Fortunately, someone at the gym knew CPR, and he was brought to the hospital in critical condition. So that night I stayed up with him trying to keep him alive, and in the coming weeks we did a lot of very elaborate tests on him to try and determine what had caused such a seemingly healthy person to suddenly arrest. We got a cardiac MRI, a cardiac catheterization, and an electrophysiology study. But it turned out that the single most valuable test we did, at some point embarrassingly far along, was to get in touch with his brother and take a rigorous family history. And when we did that, you ask: is there a history of sudden cardiac death in your family? Oh, no, nobody's had any problems. Are your mother and father alive? No, mom died when she was 25. It was strange. 
She drowned, and she was a champion swimmer, and nobody ever figured out what happened. What about her brothers and sisters? Well, actually, two of her brothers also died in their 40s. One was in a car accident and one fell down the stairs. And sure enough, as you take a family history throughout the family, it certainly seems like something that looks like a genetic cause of sudden cardiac death. So we ordered some genetic tests to try and confirm it. And sure enough, there was a variant in a gene known to cause sudden cardiac death, a variant that had never been described before in the literature. So the gene was known, but not this variant. And as many of you know, you can have either benign variants in a gene or really bad ones. So we were in kind of a difficult situation, because we didn't know whether or not this was the cause of his arrest. Now, you might say, why does it matter? The guy's in critical condition. What do you care about making an elegant diagnosis? Well, it matters a lot for his family members. He had five kids, and two of them carried that mutation. One was a rifleman in Afghanistan, and the other was a college student. And at least for the older son who was in the army, exercise is a big risk factor. You'd have to tell him to quit his job and find a new career, and you wouldn't be at all sure that you were giving him the right advice. And you can put him on medicines, but they're often very toxic. So sure enough, I ended up calling all of the other cardiologists I know who care about genetic diseases. Eventually I found another case of a patient somewhere else who had the same variant and had also had a cardiac arrest. And while that is far from conclusive, it is the kind of thing that smells good enough that you say: until proven otherwise, we believe that this is the cause. 
So when you think about all of the data technologies we have and all of the social networks, the fact that something this important really came down to just making phone calls and happening to have other friends who are interested in this is really a tragedy. You should really have databases that are queryable, where you can search by genotype, find other patients, and figure this out in a much more automated and rapid way. And so this kind of caused me to pivot: instead of going into more fundamental academic research, I really focused on building a team that is dedicated to the sharing of genomic and clinical data, and to applying that knowledge to important problems in the care of patients. What a wonderful, powerful story, and it leads us really into our topic. You know, we often don't get a full history before an appointment, and we aren't necessarily able to track health outcomes after an appointment. We don't get a holistic view of people, which leads to incomplete, or at least less than optimal, care. It can also drive health disparities. Talk to us about how we can leverage health data to see people as a whole, to have a more comprehensive view and the ability to take care of them over time and space, and to empower them in their health. Do you want me to take that? Maybe let's have Melissa go first, please. Okay. You know, I really love this topic, because we're in such an emergent time of the patient becoming the center of their health universe, rather than this piecemeal set of records that live out in the world about them. There's obviously a lot of technology needs that are yet to be met, and there are a lot of regulatory and process issues that still exist. 
And furthermore, as you said, Susan, the fact of the matter is that we get such a brief snapshot of the patient in their clinical encounter that we really don't know what's going on in their real world on a day-to-day basis. And even in the context of that clinical encounter, we don't have the full picture. We ask the same questions on a piece of paper every time a patient comes in, about their family history for example, and when their family history doesn't agree with the last time they were there, or with some other clinic, we don't even notice, because they've not reported it in a consistent way. So it's very hard to track things like which drugs they are taking, or family history issues like Anthony talked about, but also other kinds of data. Think about the fact that when a patient comes into the clinic we do some medical imaging, we do genomics; these data then might, if we're lucky, go to research databases where they can be used, but they are separated from the clinical data, so it's hard to know what the relevancy of any patterns we might see in those data really is. And then from the perspective of the patient, think about all the other kinds of data that might exist out there, like their dental data or their vision data, or other kinds of public health data that might be really informative: do they live near, say, a fuel plant that might produce a certain kind of exposure? This kind of information can be critically important for diagnostics and for managing care, but clinicians rarely have access to those kinds of data for their patients either. 
And I really think that one of the key ways we're going to overcome this challenge is to move to a system where the patient becomes the center of their data universe, rather than this scattered collection of information that lives out in the clinical and public world about them. So I think it's an exciting time, especially now with COVID really pushing on some of these technologies rapidly, and I think we can expect to see some changes actually moving forward. Yeah, so many thoughts on this topic. I think one of the things that's unfortunately true is that the way we have approached the construction of electronic medical records has been centered first on billing, rather than on the n-of-1 patient, and second, it has not been set up for utilizing predictive analytics across the n-of-many patients. When you think about just the way that we organize the data on a patient, it's very much set up around each encounter that happens in the healthcare system. And if you get your care in different healthcare systems, it can be very difficult to move your records from one place to another. One of the examples I find most amazing is that my hospital, Brigham and Women's Hospital, is physically connected to Boston Children's Hospital; you can walk from one to the other without actually going outside. I stopped practicing about a year ago, but until then it was amazing to me that I had a lot of patients who would be taken care of at Children's Hospital until they were 18 years old, and then, when it came time to transfer their care to the Brigham, they could walk from one building to the next, but when it came to their medical records, they would have to print them out and carry them over. There was no way to transfer them. And then, of course, I would receive something like 1,000 pages of notes about a patient. 
And in the course of a single clinic visit, I would have to try to make my way through them, and then I would write a single note summarizing everything that had happened to them from the age of zero to 18, which would then essentially become the source of record for their care going forward in this hospital. Oh my goodness, what a terrible situation. Obviously, it's a lot of pressure on the clinician, and a huge amount of information is lost along the way. And this is not even beginning to scratch the surface of all sorts of other data types that are incredibly important to a patient but don't even exist inside a medical record. For example, your grocery bill probably contains a lot of very important information about your future health outcomes, as do lots of things about your environmental situation, who you live with, and who you're caring for. And so when it comes to caring for patients, even the data we have access to from the electronic medical records is very poor, very low fidelity, and often wrong. And then there's a whole world of additional data types that we would really like to utilize and just don't have access to. So I think, honestly, this is the data challenge of our time, and I'm sure during the course of this discussion we'll talk about various ways the situation might be improved going forward. You know, I want to expand on that and personalize it a little bit: today you have to be part of a medical school or a university medical center or a hospital system in order to access electronic health record data. That's problematic for schools such as Arizona State University and Northern Arizona University that don't have medical schools or university hospitals. You've both been pioneers in exchange efforts that aim to increase data sharing between hospitals and researchers. I'd love you to talk about your experiences with this, both in regard to what you see as successes 
and what the obstacles are that you're hoping to overcome. I think, you know, we definitely need to talk about the Matchmaker Exchange, which Anthony and I have both worked on together. This is an initiative that's part of the Global Alliance for Genomics and Health, a global consortium dedicated to bringing genomic health technologies to everyone around the world. It's a very technical community, really focused on data sharing and data interoperability, but also on regulatory differences across countries and on making it so that we can overcome some of those barriers. And one of its initiatives is the Matchmaker Exchange, which really aims to support, as Anthony was talking about, finding patients who have similar phenotypes and similar genotypes across the world. It's been very successful at helping to find the n of 2 and confirm candidate variants, which in some cases has led to the identification of treatments and care that have helped improve the lives of those n-of-1 patients. And so that's very exciting and a truly wonderful success, and it really takes a village to have built that kind of community around the world. Some of the challenges, though, are that not every participating site shares very much of the phenotype information. So we've been working very hard to think about how we can share structured phenotype information, also within the Global Alliance for Genomics and Health, with a resource called Phenopackets, which can enhance the way we share variant information with structured phenotype information that's de-identified in such a way that you could just post it on Facebook or put it into your journal articles, and in that way make the phenotypic information about an individual much more computable. 
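[Editor's note] A rough sketch of what such a de-identified, computable phenotype record might look like, in the spirit of the GA4GH Phenopacket idea described above. The field names here are simplified approximations rather than the official schema, and every identifier, term, and variant is a fabricated placeholder.

```python
import json

# Simplified, illustrative phenotype record: pseudonymous subject,
# structured ontology-coded phenotypes, and a candidate variant.
# Field names approximate (but are not) the official GA4GH schema.
phenopacket = {
    "id": "case-0001",          # pseudonymous case ID, not an MRN
    "subject": {
        "id": "anon-subject-1", # de-identified subject reference
        "age": "P45Y",          # ISO 8601 duration (45 years)
    },
    "phenotypicFeatures": [
        {"type": {"id": "HP:0000001", "label": "example term A"}},
        {"type": {"id": "HP:0000002", "label": "example term B"}},
    ],
    "interpretations": [
        # Hypothetical gene and variant, purely for illustration.
        {"id": "interp-1", "gene": "GENE_X", "variant": "c.123A>G"},
    ],
}

# Because the record contains no direct identifiers, the serialized
# form can travel alongside a variant submission or a publication.
serialized = json.dumps(phenopacket, indent=2)
print(serialized)
```

The point of the structure is that a matching service can compare the `phenotypicFeatures` term sets of two such records computationally, rather than relying on free-text case descriptions.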
So we have these wonderful knowledge bases that help us understand, at a population level, that patients who have this variant tend to have this set of phenotypes, but we don't really understand specific individuals and the variability of those phenotypes. When you're talking about rare genetic diseases, you really want to understand that heterogeneity of phenotypic information. And that's really one of the areas we need to head toward: making it so that information about both the variants and the phenotypes can be more broadly shared in a de-identified manner, as well as in a more closed environment like the one the Matchmaker Exchange has established. And then also really focusing more on longitudinal phenotypic information, so that we can understand individual-level disease courses for patients with these variants in a way that's secure and completely de-identified, but where the knowledge of individuals is actually shared. Well, maybe I'll let Anthony talk about that, and then I think we should talk about EHR data and sharing EHR data after that. Sure, and thank you. I totally agree with Melissa; it's been a real pleasure. I was very much one of the people who got the Matchmaker Exchange going along with Melissa. I've been less involved with it lately, honestly because it's doing so well: there are other problems to fix, whereas this one was so functional that I don't know that I had anything left to offer. Another effort that I think has been quite successful, and this gets into sharing of EMR data, is an effort that Arizona as a state is actually very involved in, called the All of Us cohort program. As you know, this is an effort to recruit a million or more Americans and get full medical records and full genome sequences on them, followed longitudinally for decades. 
You know, Banner Health in Arizona is one of the key recruiting sites, and honestly, from my perspective, it's been one of the most successful and highest-performing ones, which is great to see. That's another effort where there are already more than 250,000 people recruited, and all of the recruiting sites are sending their medical records to a place where researchers can access them. Importantly, the way this is being done has some notable features. Any qualified researcher can access the data; every quarter there's a new release of it; and it's not just the researchers funded by the consortium who can access it. So it's a very different model, one that is much more conducive to broad data sharing than we've seen in the past. There's a phrase that epidemiologists like to be buried with their data; I very much hope that becomes a relic of the past in terms of the way medical research will go forward. One of the things Susan also asked about is areas where there's need for improvement, and I think we need to reconceptualize how we think about data ownership in healthcare. This is true on two levels. One is that I think we should get to a mindset where the patient is the owner of their data, not the medical center or the insurance company that paid for it. Again, I think we'll talk about this more, so I won't say too much about it now. But there's another area that I think is very important, and it will sound kind of wonky and technical, though I actually think a lot of sociologic change will follow from it: I really hope that healthcare starts to embrace cloud computing, especially public clouds, much more. 
And the reason I say this is not because I'm trying to send more money to Alphabet or Amazon or Microsoft, but rather because right now, when we talk about data ownership, we often conflate three things. One is owning the computers that the data is stored on. Another is being able to use the data for some research or commercial purpose. And the third is being able to control who else can access the data. When you put data in a data center at a hospital, all three of those things end up being conflated with one another, and it's very, very difficult to have real data sharing happen beyond the boundaries of the hospital. Whereas when you put data in public clouds, you're no longer owning the computers the data is stored on, and it's much harder for researchers to hide behind saying, I can't share the data because I can't give you access to my data center, or because my security department blocks it, and all sorts of things like that. So I do see cloud computing as a real force for change in how we think about the accessibility of medical data. And I'd like to follow up on that. We've been doing a lot of work recently for COVID on a project called the National COVID Cohort Collaborative, which is extracting data from now upwards of 70 institutions nationwide, taking those electronic health record data and leveraging the work that was done in the communities that Anthony already mentioned, like All of Us and the OHDSI community, where individual institutions have been putting their EHR data into a common clinical data model so that it can be queried in a distributed fashion. That allows, for example, any of the participating sites to run a query like: how many patients are on ventilators longer than a week? 
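[Editor's note] The distributed query pattern described here can be sketched in a few lines: each institution evaluates the query against its own records behind its firewall, and only the aggregate count crosses the institutional boundary. The site names, record layout, and threshold below are fabricated for illustration.

```python
# Sketch of a federated count query over sites that share a common
# data model: the query logic runs locally at each site, and only
# an aggregate count leaves each institution. Data is fabricated.

from typing import Dict, List


def local_count(records: List[Dict], min_vent_days: int = 7) -> int:
    """Run inside one institution: count patients matching the query
    'on a ventilator for at least min_vent_days days'."""
    return sum(1 for r in records if r.get("ventilator_days", 0) >= min_vent_days)


# Each "site" holds its own patient-level records; in a real network
# these never leave the institution.
site_records = {
    "hospital_a": [{"ventilator_days": 10}, {"ventilator_days": 2}],
    "hospital_b": [{"ventilator_days": 8}],
    "hospital_c": [{"ventilator_days": 3}, {"ventilator_days": 14}],
}

# Only these per-site aggregates cross institutional boundaries.
per_site = {site: local_count(recs) for site, recs in site_records.items()}
total = sum(per_site.values())
print(per_site)
print("total:", total)
```

The limitation discussed next in the transcript follows directly from this shape: because only aggregates move, the network supports counts and simple statistics well, but not patient-level machine learning across sites.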
And an answer comes back from each hospital that participates in the OHDSI community. That's been an absolutely fantastic resource. There are multiple research networks like this, PCORnet, ACT, OHDSI, and also a commercial vendor, TriNetX, that support the same kinds of things, so there are a lot of efforts to support this distributed approach. But there are two key problems that I think we're starting to understand how to overcome. One is that you can't do machine learning and large data set applications in these networks; you can do some statistical and some machine learning types of applications, but you can't really do the big-data AI type applications on these queries. They're really just distributed query networks for the most part, and you can then perform statistics on those query results if you have enough hospitals participating. But what you can do is take the data that have already been formatted in a common data model, align those common data models, and merge all those data together into one giant data set. This is what we've done for COVID, and it's been a very interesting process, both from a regulatory perspective and from a data management and data harmonization perspective. Even data coming from the same data model gets loaded differently at different places; we can't even represent things like weight with the same units, and it is astonishingly different across institutions. And so we can now look back at how those distributed queries are working in terms of the degree of quality assurance and harmonization that has to happen at the query level. 
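[Editor's note] The weight-units problem mentioned above is a concrete instance of the harmonization work N3C-style merges require. A minimal sketch, assuming a hand-built conversion table and fabricated site rows (real pipelines map standardized unit concept codes rather than bare strings):

```python
# Sketch of the unit-harmonization problem: the "same" weight field
# arrives from different institutions in pounds, grams, or kilograms
# and must be normalized before the merged data set is analyzable.
# The unit strings and conversion table here are illustrative.

UNIT_TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def normalize_weight_kg(value: float, unit: str) -> float:
    """Convert a weight measurement to kilograms, failing loudly on
    any unit code that is not in the mapping table."""
    try:
        return value * UNIT_TO_KG[unit.lower()]
    except KeyError:
        raise ValueError(f"Unmapped unit code: {unit!r}")


# Fabricated rows, one per contributing institution.
rows = [
    {"site": "A", "weight": 70.0, "unit": "kg"},
    {"site": "B", "weight": 154.0, "unit": "lb"},
    {"site": "C", "weight": 70000.0, "unit": "g"},
]
for row in rows:
    row["weight_kg"] = round(normalize_weight_kg(row["weight"], row["unit"]), 1)
print([r["weight_kg"] for r in rows])  # all three sites now comparable
```

Failing loudly on unmapped units, rather than silently passing values through, is the design choice that surfaces exactly the site-to-site inconsistencies the speaker describes.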
When we look at all of the data at once, we actually have the opportunity to help each of those individual institutions improve their overall connection with each other, whether we're using the distributed model or this harmonized aggregate model. And the nice thing about the aggregate model is that not only can we deploy some really much more modern AI and machine learning types of applications, but it's in the cloud, and we can provision access to it for a fairly broad set of researchers, so long as they are working on COVID and have the proper access permissions. So the pandemic has really created this unique opportunity to grow our capacity to share electronic health record data for research purposes. There's a lot there to expand upon, and I'd love you to talk more, because the National COVID Cohort Collaborative, N3C, is a big deal. And on the open science side, we know that research and medical management will progress faster if data and knowledge are openly shared. Can you expand on that concept and tell us more about how it's going with the National COVID Cohort Collaborative and what's being produced from it? Sure. I guess there are a lot of questions in there. I think it's a very interesting social experiment, frankly, to understand how to best support the participants who are working on the data contribution side as well as the folks who are working on the analysis side. There's this whole pipeline of data quality, data processing, and data harmonization that has to happen. And then, even just to analyze the data, we need the people who were helping at the source of the data to also help analyze it, because they understand a lot of its nuances. 
It really takes a village in terms of the analysis as well: bringing together clinical experts, statistics experts, people who are experts in machine learning, people who are experts in terminologies, data model harmonization, and the individual sources, and then deploying standard informatics techniques in a reproducible way. For folks in the audience who might be familiar, there were a couple of really important retractions in the New England Journal of Medicine and the Lancet over observational studies on EHR data that were fully non-transparent and non-reproducible. Some of the authors on those published articles hadn't actually even viewed the data, and this is really the problem: as you were just saying, Susan, it's hard to get your hands on data, even if you're an expert and an author on a paper. And so one of the really exciting things about the N3C is that the entire workflows for the analytics, and the data sets, and all of their provenance, versioning, and sources, are fully reproducible and transparent to any reviewers of any manuscript who want to go and reproduce those workflows. That doesn't mean that you can just go in there and look at everybody's data. We've stripped out as many of the HIPAA identifiers as we can and kept only the two that are most important for studying pandemic response, which are service dates and geocodes, because obviously those are important for understanding the pandemic. So there's a kind of onion-like layering of access permissions, and there's a data access committee that's responsible for judging whether or not the research you're proposing requires those HIPAA identifiers. We're also generating synthetic data so that a much larger community can use the data much more broadly. 
But at the end of the day, whether you're using the synthetic data or the tier-one data that has those HIPAA identifiers in it, the full workflows will be reproducible, so that when we publish science on these sensitive data, it can actually be transparent and reproducible, and we can evaluate the science based on the true data rather than on hearsay. I think that's one of the most exciting things about the N3C, along with giving people attribution for the many hands it takes to do this work; part of reproducibility is also knowing everyone who helped and tracking their contributions. So, there's a question from a premed student that's related to this and expands on it. They ask: what concerns about health information privacy may come up with the improvements in data sharing? Are there risks patients would rightfully want to avoid that would make them opt out, and how can we convince people to opt in? Sure, I can take this. I think it's a great question, and I have many thoughts on it. First, one of the things I've come to believe is that information security in some ways is not so different from security in the physical world. There are things that we prevent by policy and by retrospectively punishing, and there are things that we try to prospectively prevent. Concretely, we put cameras in a convenience store to deter and catch shoplifting, but we put metal detectors at the airport because the ramifications of something going wrong there are so much greater. And when you try to prospectively prevent, you often create a lot of friction in the system, but it can often be worth it. A lot of the things we've been talking about today are friction points that don't actually improve security; they're much more about incentive structures that are perverse in various ways, honestly. 
But the risks you point to are real: this is often very sensitive, very private data that should not be seen by everyone in the world. It really should only be seen by bona fide researchers who are trying to use it to improve science and medicine, and who will only publish aggregate-level insights that don't betray anything you wouldn't want generally known about you. Setting up both the policies and the technologies that minimize friction while protecting research participants is a very challenging thing, and in some ways I think getting this right is the data challenge of our time. For example, I've seen people go back and forth on this, especially over the events of the last two years with Cambridge Analytica. When all of us first got going, there was a real push toward empowering so-called citizen scientists, scientists from outside of medical centers who might be able to make great discoveries, and I think everyone agrees that's a laudable and appropriate thing to do. But then there's the reverse: do you also agree that we don't want a foreign cyber terrorist accessing your data, re-identifying you by joining it to external data sets, and then publishing it in a way that can't be pulled back? Well, of course, that is also very important. And it turns out that telling apart the brilliant, altruistic 18-year-old researcher in Russia who's trying to do great things from the cyber terrorist in Russia who's trying to do malfeasance can be very difficult. One of the things I think the world should really get to is an accreditation system for researchers. This is what we do in healthcare, where in order to deliver care of almost any kind, you have to have some kind of accreditation.
And if you do wrong, there are legal repercussions for it. We should get to something very similar for researchers, with formal ethics training. To some extent we have that today with CITI courses and the like, but we need something more formalized and regulated, with some legal teeth behind it when there's malfeasance. I think this is a very important thing for us to get to. I just want to add on to what Anthony said: I've long thought that we need a Hippocratic oath for research, and we do not have that today. That's a great point. So back to what you said about data being owned by the individual: how do we create systems that really empower patients to take control of their health information, what are your thoughts on what that system should look like, and how can doing that improve individual and population health outcomes? Yeah, I have a few thoughts here. One thing I find interesting is that you can donate a lot of things to medicine: you can donate your organs, you can donate your blood, but you can't actually donate your data to medical research. And your data is often what medical research most needs. I could imagine an interesting confluence of things that might lead to a sea change on this issue. More and more, rare disease and cancer groups especially are saying very forcefully, "I demand to be able to access my data and donate it." That's a rallying cry a lot of people are starting to get behind. Moreover, as part of healthcare reform, the so-called Meaningful Use Stage 3 rules create a legal requirement that anybody be allowed to ask for their data in a standardized format and have it given to them.
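In practice, the "standardized format" that this kind of export points to is typically HL7 FHIR, where a record travels as structured JSON that any receiving system can parse. A rough, hand-written illustration (this minimal Patient resource is invented for the example, not drawn from any real record):

```python
import json

# A minimal, hand-written FHIR-style Patient resource (illustrative only).
patient_json = """
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"family": "Doe", "given": ["Jane"]}],
  "birthDate": "1970-01-01"
}
"""

# Because the structure is standardized, a researcher-facing system
# can pick out fields without any site-specific parsing logic.
patient = json.loads(patient_json)
print(patient["resourceType"])        # Patient
print(patient["name"][0]["family"])   # Doe
```

The value of the standard is exactly this interoperability: an organization "ready to catch" donated records can handle exports from any compliant health system with the same code.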
So far that has not really been tested, and most medical systems are just not technically set up to export a medical record in a standardized format. But I could imagine a day when you start to see the rise of a real social movement, where patients, especially patients with rare disease or cancer, start demanding access to their data from medical centers and having it exported in standardized form. Then you'll also see the rise of organizations that are ready to catch that data and make it available to researchers. This doesn't seem so far-fetched to me, and I could actually see it being the tipping point in a reframing of how we think about health data: who owns it, who can access it, and for what purpose. It's funny, because I have this test every time I go to the doctor and they make me sign the form that says they won't share my data. I say, "No, I want to sign the form that says I can donate my data to medical research," and they say, "We don't have that form." I'm waiting for the day when that form will appear. That really should be the patient's choice: I should be able to decide when and where and with whom I share my data. And maybe we start a little smaller and just say, "OK, I would like my data in health system X, or in state X, to be accessible to researchers for purpose Y," and have that sort of fine-tuning, so you can make those decisions yourself and do so in an informed way. That's the other thing: even with the deposition of other kinds of samples, the consent forms are often written in small print, and the ramifications of the use of those samples are not necessarily clear to patients.
With data, it's even more imperative that we really teach people what the ramifications are, so that they can make the best decisions for themselves and, where genetic data is concerned, for their families, because when you share your own genetic data you're also sharing your family's data. So there are those implications as well, but I completely agree that the patient needs to become the center of that permission-giving universe, and that we need to evolve the medical systems to support it. And back to that data harmonization and standardization issue: the processes of data capture within any given clinical setting are still so different from setting to setting that even when we're able to exchange data in the same format, we still have the very large challenge that data isn't captured the same way and isn't very interoperable. So there's a lot to do on how data is encoded at the point of source as well as after it gets shared. So we're back somewhat to what COVID has done to our world. In some ways there have been successes, with improved and increased speed of data sharing between institutions and sectors, for instance how quickly the virus was sequenced and its genome published. What do you see as the successes, and how can we learn from them so that we don't go back to old ways of doing things? Sure. I think one of the big successes, honestly, and maybe one not noticed enough, is clinical trials and showing the scope of the possible. The RECOVERY trial in the UK was approved, designed, and enrolled in record time, and within a matter of months of the outbreak it was yielding results, for example that dexamethasone has a mortality benefit. Many people I know who are seasoned trialists thought that moving that quickly would be impossible.
And it has become an exemplar: it actually can be done if you have sufficient motivation, and I think it's an existence proof that many aspects of our clinical trial apparatus should be rethought. Then again, I think we're all very excited about the results that came out on Monday from Pfizer on the vaccine. Being able to go from not knowing the virus existed in January to actually having a successful vaccine trial is really a sight to behold, and I think we as a species should feel very proud of how far we've come. This is one way in which the current year is very different from the Spanish flu long ago. Yeah, and I think another area we're thinking about is reconnecting data. If you think about how fast the genome was sequenced, the amount of data sharing happening for both host and viral genomes in the context of COVID is just outstanding. But again, it's not connected to the clinical data: we can share all this genomic data over here and all this clinical data over there, but how do we interpret differences in both viral and host genomes against clinical outcomes? So there are some really exciting pilots we're starting now using hashing of data, where we can re-associate patients' host and viral sequences from within an institution back with their clinical records, so that we can do larger-scale multimodal analytics with sensitive data, but only when the institutions have the regulatory control for doing that. I think that's a really exciting notion; it's about putting the patient back together again.
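One simple way to picture the hashing idea: each institution applies the same keyed hash to a normalized patient identifier, so that matching tokens let records be re-associated without the raw identifier ever being exchanged. The sketch below (the shared secret, function name, and identifiers are all invented for illustration) is far simpler than real privacy-preserving record linkage systems, but it shows the core mechanism:

```python
import hashlib
import hmac

# In practice this key would be managed under the institutions'
# regulatory controls; it is a placeholder here.
SHARED_SECRET = b"demo-secret"

def link_token(identifier: str) -> str:
    """Keyed hash of a normalized identifier; the raw value never leaves the site."""
    normalized = identifier.strip().lower()
    return hmac.new(SHARED_SECRET, normalized.encode(), hashlib.sha256).hexdigest()

# Two data sets compute tokens independently...
token_clinical = link_token("MRN-0042")
token_genomic  = link_token(" mrn-0042 ")  # formatting differs, identity does not

# ...and matching tokens re-associate the clinical and genomic records.
print(token_clinical == token_genomic)  # True
```

Using a keyed hash (HMAC) rather than a plain hash matters: without the secret key, an outsider cannot simply hash a dictionary of known identifiers and reverse the tokens.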
Similar to the imaging data, we're doing a pilot there as well, thinking about how to leverage AI applications on the images together with the clinical data simultaneously by using these types of hashing technologies. And on the point of clinical trials: common disease clinical trials, although very much needed, are often less prioritized than clinical trials for rare diseases and cancers. Why are they important for improving population health outcomes, and how can we leverage partnerships with medical centers to increase access to common disease clinical trials? I think this is a wonderful topic and one that I'm very passionate about. I started med school in 2001, which is the year that Gleevec was approved and the trials for Avastin, an anti-angiogenesis drug, were getting going. I remember everyone betting on Avastin as the cure for many cancers, while people saw Gleevec and said, "This is crazy, are we going to make a different drug for every mutation in every cancer? We'll need hundreds of drugs." And sure enough, what's happened over the last 20 years is that drug development has become focused almost exclusively on rare diseases and cancer, mainly because the cost of running a trial for a common disease is so large by comparison. Don't get me wrong, I'm very much in favor of drug development for rare diseases and cancer. But I am quite dismayed that when you look at the top 10 causes of death in the world, the number of people still trying to make drugs for heart attack, stroke, and diabetes is dwindling. Here's an example of why: a typical cardiology trial for myocardial infarction would take about 20,000 patients run for about five years.
And the event rate, which is to say the fraction of people who have death or MI or stroke, is often about 10%, which means that 90% of your trial is wasted. Such a trial usually costs around a billion dollars. So a few innovations have happened in the last few years that I think are very exciting. The first is using genomic and clinical data both to better select patients who will respond to a drug and to increase the event rate, so that you can now run much smaller trials. The second is a lot of the work that's come out of the UK; I mentioned the RECOVERY trial, and there are other trials where pharmaceutical companies have been able to partner with the UK government to access health data, identify patients who meet inclusion and exclusion criteria, and therefore run a trial at a much smaller cost than would typically be expected. So I think there's a huge opportunity for partnership between medical centers and the drug development industry to leverage health data to run much more efficient, streamlined trials, where we find the people who will most benefit from the drug, recruit them rapidly, and follow them longitudinally. One of the things I'm really excited about in what you just said, Anthony, is the inclusion and exclusion criteria: how do we support more rapid classification of patients, really building computable strategies to identify those patients, so that we can run smaller trials and iterate in a more nimble manner in these trial contexts for more common diseases? I think that gets after a more general problem of patient classification. Really, every patient is an n of 1.
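The event-rate arithmetic Anthony described can be put in back-of-the-envelope form: if a trial needs a roughly fixed number of events to be statistically powered, then enriching the cohort so the event rate rises shrinks the required enrollment proportionally. The numbers below are illustrative only, not a real power calculation:

```python
import math

def patients_needed(required_events: int, event_rate: float) -> int:
    """Enrollment needed to observe a target number of outcome events."""
    return math.ceil(required_events / event_rate)

# At a 10% event rate, observing 2,000 events takes 20,000 patients...
print(patients_needed(2000, 0.10))  # 20000
# ...but enriching the cohort to a 20% event rate halves enrollment.
print(patients_needed(2000, 0.20))  # 10000
```

This is why computable selection of higher-risk patients is so valuable: doubling the event rate through better inclusion criteria cuts trial size, and hence cost, roughly in half.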
The question is, what are the useful classifications that help us identify the correct treatment or the correct diagnosis? In the case of trials for common diseases, that becomes critically important for reducing cost and creating the nimbleness that is needed. So I think that as those strategies for classification, for inclusion and exclusion criteria as well as many other aspects of classifying patients, become more robust, we're going to start to see improvements in that area. I want to personalize it again and get your thoughts on how to help Arizona. We have a pretty dynamic zone in Phoenix, the Phoenix biomedical corridor zone (we're still working on the name). All three state universities have a health presence there: the School of Nursing and the College of Health Solutions, the University of Arizona College of Medicine Phoenix, and Northern Arizona University, which has fabulous programs in physician assistant studies, OT, PT, and so on. Both of you, in Oregon and certainly across the MIT, Broad, and Harvard hospital systems, have to work together across institutions. How do we foster and leverage that, particularly when it comes to infrastructure in the data space? What would be your pearls of wisdom? I think Anthony already said it: cloud. Cloud, plus really excellent access and security management. It makes management so much easier, but it's not magic. Just because you have it doesn't make it all happen; it takes really good governance, and there are still security issues and the like. But technology-wise, it does make it much easier to share sensitive data across institutions. I definitely agree with Melissa, and there are further areas where I think there's real room for opportunity.
I think the analogy of how MIT works with a lot of hospitals in Boston is an interesting one. MIT doesn't have a medical school, but it is unquestionably one of the great centers of medical innovation, and it was quite aggressive about building partnerships with various hospitals, who were eager to tap into the intellectual resources of the institution. In particular, I think there are a few things Arizona State really has going for it. First, like MIT, there's an outstanding tradition of engineering of all forms, as well as computer science and computer information systems; that's a real resource. Second, some of the truly great healthcare systems in the world are in Arizona. The Barrow Neurological Institute is world class in neurosurgery, and the Mayo Clinic, which in my opinion is maybe the greatest American healthcare system, has one of its three outposts in Arizona. So one piece of advice I would have is to really try to build deeper relationships with those groups, and to make software engineering and data science core to them. Arizona State is a great home for software engineers, maybe more so than a lot of traditional medical institutes, and so the lack of a medical school can actually be more of an opportunity than a challenge in some ways. What I will say, as someone whose dad was very involved in academic administration, is that I can't count the number of times (I hope I don't offend anyone) he would say, "The goddamn Board of Regents still won't let ASU have a med school." So I'm sorry about that. Well, I agree that there's opportunity without having a med school.
We just have a few more minutes, and I wonder if you both want to offer, I wouldn't say a summary, but closing comments on data science and health as we move forward. Well, first of all, I'd like to thank everyone for joining us today, and please don't hesitate to reach out; if you're interested in COVID, join our COVID project, we need all the help we can get. But maybe my closing comment would really just be: imagine big. There's a lot of data out there that is not being leveraged to maximize health, health research, or health outcomes. It really does take a village; it takes those computer scientists in Arizona working together with clinicians, basic science researchers, caregivers, and public health epidemiologists, just an army of different areas of expertise. The more we can connect our data, and the more we can think big about how we move data around, the more value we're going to get out of it. I would agree very much with Melissa. A lot of what we talked about today is things that are hard or things we wish were different, but one thing I do hope everyone here feels is that this is a time to be tremendously optimistic. One of my passions in life is the history of science and mathematics, and I often imagine what people 200 years from now will think about this era: "I wish I was alive then, because this exciting thing was going on." What our era will be marked by is two intellectual revolutions that grew up in parallel. One is the information technology revolution, which really accelerated in the 90s with the internet and has now carried us into the machine learning era. In parallel, there's the biotechnology revolution, with the rise of genomics, genome editing, recombinant DNA, and all these other things.
And what there really is now is the opportunity for these two fields, which are independently incredible, to actually come together. I assume that because you spent an hour listening to Melissa and me, you're interested in that intersection, and I do feel it really is chocolate and peanut butter to be someone working in this space at this time. Chocolate and peanut butter is a good place to leave off. Let me make sure there are no more questions. I really want to thank you both for your participation in our series; it's been really interesting and stimulating, and wonderful to get the two of you together, from different sides of the country but both working in such an important space. I know you already knew each other, but this was really fruitful for us. So thank you both, Melissa and Anthony, and thank you to all of our participants. We did record this, and it will be available later, and we hope you'll join us next week for the last event in our biomedical innovation speaker series. So thank you all, and stay safe and well. Thank you, Susan.