Live from Cambridge, Massachusetts, extracting the signal from the noise, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium. Now your hosts, Dave Vellante and Paul Gillin.

Welcome back to Cambridge, Massachusetts, everybody. This is Dave Vellante with Paul Gillin. We're here at the MIT IQ, Information Quality. It's the CDO conference that's, I think, in its ninth year. Paul, it's not brain surgery, or is it? Nick Marko is here at Sigma Horizon. He's the Chief Data Officer of Geisinger Health System. What do you think about that, Paul?

We're having a fascinating discussion off-camera, because Nick is a brain surgeon who is also a data scientist and a CDO, which naturally begs the question: how does that happen? How do you get from one to the other?

So it is an interesting story. My initial training was in clinical medicine, and I'm a clinical neurosurgeon. I still practice. I do neurosurgical oncology, so I operate on brain and spine tumors on a regular basis. I had always been sort of a math, science, technology sort of guy. When I was in undergrad and medical school, my research focus was in molecular biology. So this was back in 2003, toward the end of the human genome time, and I spent a year doing a Howard Hughes Fellowship at the Institute for Genomic Research, where they were finishing up human genome sequencing at the time. I worked with the microarray gene expression group, and it was an interesting mix, because I went there for the molecular biology part of things and the laboratory-based side of things, but the other half of the group that I was working with wrote software and did the analytics for those large volumes of data.
So those were really the first big data, or large data sets, that we saw in biomedical sectors: these expression arrays or these sequence data from the genome. And so the process of how you analyze those, both from a computing standpoint and from an algorithmic standpoint, was just starting to be fleshed out. So I actually fell in love with the work that the computing group was doing also, and the math behind all of that analytics. I continued to work on that throughout med school, working on sort of the interface between cancer genomics and clinical medicine. I did my residency training at Cleveland Clinic in neurosurgery, and once that was finished, it wasn't just molecular biology type data or computational biology work. We were starting to see larger volumes of clinical data become available, and it was just sort of a natural progression of things, saying, well, if we have these techniques for computing on big amounts of data, it doesn't so much matter what the data is; we're going to be able to put more and more together. That was when the term big data was starting to get popularized, so it was just sort of a logical extension of that. And then, because I didn't have formal training in the computational side of things or the mathematical side of things, most of what I learned was kind of self-taught to do the work I needed to do.
So I took a year at the end of my residency, went to the UK, and studied at Cambridge for a year with their applied math department and with Cancer Research UK to bring myself up to speed on that. Because I think I'll never be as good of a mathematician as somebody who spent their life doing it, but if you're a clinician who can speak that language and know how to find good mathematicians and good computationalists, and know how to pose questions well to them, then they can take over from there and do what they do really well, and it makes a good synergy. But as physicians, we're not really used to talking to data people or math people, and vice versa. So I think that's kind of the niche that I occupy: that piece in the middle that helps bring the compute side together with the clinical side.

So after that I went to Geisinger Health System, where I'm at now, and started doing computational neuroscience research in addition to clinical neurosurgery, and that kind of grew as our informatics division expanded and we took it institution-wide. And so now I do a couple of things. I do clinical neurosurgery. I'm our organization's chief data officer; that's a role that's just emerged for us over the last year or so, from the idea that we want to make a more strategic commitment to how we leverage the data that we've had. Geisinger's had electronic medical records for 20-plus years, one of the oldest systems in the country to have that, and so we have a lot of data sitting around. We were doing some really good stuff with it, but we realized there was a lot more we could do if we're a little more strategic about how we build our infrastructures, how we make it accessible, that kind of thing. So that's where the CDO role came from at Geisinger. And then the other thing that I did was take my computational neuroscience group and scale it up to an organization-wide data science group, and we focus on sort of the large computational problems. So basically
three functions. But it's great, because on any given day I spend part of my time in the hospital, part of my time talking about our data strategy, and part of my time looking at actual data-related problems. In every one of those spheres I get to interact with people who are really expert in each of those subsections, and it's a great place to kind of be in the middle of all of that.

So how was the decision made to put a practitioner in the chief data officer role?

So it's interesting. I'm just lucky, I guess. We talked about it for a little while, and I was actually one of the people that proposed that we create a chief data officer role, and so I think by default I won. But yeah, that sounds like a great idea. It's interesting, because as with any organizational change or with any sort of creation of a new role, there are some people that say, wow, that's a great idea; there are other people that say, what do we need that for, I don't even know what that is; and there are some people that are sort of in the middle. And so we finally collected a critical mass of people saying we should define some role to focus on data strategy and how that integrates with the patient side of things. I mean, the business of health care and the goal of health care is really to provide good care to patients, so it's a very patient-focused thing, it's a very clinician-focused thing. And so if you're going to make a strategy about your data, I think it makes good sense to say, how do we focus that strategy on the people who really need it, our doctors and our patients? Putting a clinician in that role, I think, is kind of a neat and forward-thinking angle on that, and I'm just lucky that Geisinger's willing to have a clinician spend part of their time doing that kind of thing, because they recognize the potential value of leveraging the information for the patients and for the delivery of care.

We hear so much about the potential of big data in medicine to
revolutionize health care. What are some potential breakthroughs, or some short-term deliverables, short-term outcomes, that particularly excite you about the potential of combining clinical data with big data?

Right, so I think there's a couple of interesting parts to that question. The first part of that question is, when we're talking about combining types of data, clinical data with big data and other such things, the question is where does one stop and the other begin. In many sectors they'll have one or two key types of data that they use. Maybe it's sensor data from some sort of engineering process or something like this, and just volumes of this stuff are created, hundreds of terabytes of information, and that's truly a large data problem, because you've just got a lot of data to manage and comb through. In health care our big data challenge is a little bit different. We have plenty of data sets that are small in terms of size. You can fit someone's entire clinical chart into maybe 50 megabytes worth of storage space, because it's just a bunch of text-based notes and binary variables. But then you've got something like the genome for those patients, which may occupy one or two terabytes of storage space, and then you've got something in the middle like clinical imaging data, which may occupy a few gigabytes, or 10 or 20 or 30 gigabytes of raw image. So now what you've got is some very discrete clinical information in one format; you've got a little bit larger free-text type of information, all the notes that doctors write and such; you've got a larger set of data in imaging data and other such things, with a very different structure; and then you've got a much larger genomic set of data. So the question for us has really been, how do you combine those things to do something interesting and important for the patient? And so that, I think, speaks to the first part of your question, which is how do you combine
clinical data with big data. For me, I don't really know where one starts and the other one stops, because how big does it have to be to be big data? How small does it have to be to just be clinical data? We view it as a spectrum of data sizes and structures and types. So the first challenge is really how do you put those things together. That's a data engineering challenge, to put it all together in the same place; that's an infrastructural challenge, to figure out how you're going to be able to move that data and analyze and compute on it; and then it's a data science challenge, to figure out what are the right algorithms that can take those different kinds of data, and how do you put it together and answer questions from it. So that's kind of the background answer to the question.

And then the second part was, what are the specific interesting things that you think can happen from that, short term and long term? I think there's a lot of low-lying fruit that we can get at in the short term by simply combining a couple of simple data types and data sets together in ways that we haven't been able to do before. This can be anything from patient-specific predictive modeling. A good example of this is, everyone's talking about, first it was personalized medicine and now it's precision medicine and other such things, but the idea is, how do we take this set of data that we have about an individual and make a prediction for them about what sort of medication should I give you, what drug are you likely to respond to, which of these three treatment options should I pick? The challenge with that has always been that you need a little bit more than what we always looked at for clinical trial data, which is a couple of thousand patients with 20 or 30 variables. Now what I need to do is switch that to a patient-centric view, so I'm looking at an individual patient with lots of variables spread out over time, and then to really be able to compare and compute on those I need to
look at a couple of thousand or a couple hundred thousand patients like that, simply because the data structure is different. So now, because we're starting to be able to use bigger and bigger data, we can look at this from a patient-centric standpoint instead of a population-centric standpoint. When you do that, you start to be able to identify predictive models that are much better at predicting individual survival in response to therapy, and the medical literature is starting to see that already. They're not terribly computationally difficult questions, but because we can now deal with the data structure, those are really what in my mind represents a lot of the low-lying fruit, where we can get some early wins from this and make it very patient-centric and patient-specific.

The other part of that is operational or procedural things: the process of running a hospital, of moving patients from one place to another, of how you schedule or how you time things, all the logistics of being treated as a patient. This is really important to patients, because it's part of their experience of going through the health care system. Everybody knows that at some point they've been stuck in a doctor's office waiting for a long time, or they've been in the hospital and had three or four different people who don't seem to be completely talking to each other and don't know exactly what's going on. I think from a process standpoint we can now start to integrate the data about what happens to a patient as a result of being in the hospital, and also how the patients respond to that, and try to optimize those processes so that the whole experience of health care delivery is improved, in addition to just the picking of a drug or the picking of a therapy.

And then in the long term, I think we're looking at sort of bigger problems, both in those areas and beyond. One of the big projects going on at Geisinger and in a lot of other places is capturing a lot of genomic
and molecular information. And so now we're saying, in the long term, how can we take somebody's entire genome sequence and combine that with all of the other pieces, and start to get some sense of how the individual is going to respond to treatments and what the individual's risk profile looks like over the long term? That's not going to be something that we solve in the next year or two. It's going to take a lot of iterating and observing and looking at new data types and computing in new ways. So I think there are going to be longer-term wins that come out of, particularly, the genomic side of things, because that's a data type that is very different in some ways than what we've been working with for a while, and the process of integrating it with clinical data still has a lot of development to go. But that's where I think a lot of our longer-term wins are.

Looking at technologies like machine learning and like IBM Watson that are really targeting these kinds of problems, potentially dealing with much larger data sets than humans could ever comprehend, does that excite you? Do you see a lot of opportunity in applying these?

Absolutely, absolutely. It's not entirely clear to me exactly which technologies will prove to be the most valuable and which won't. I think we're still very much in an exploratory phase. We know that there are some algorithms, some machine learning technologies and approaches, that work very well on this kind of data, with the limitations that we have and the structures they have, and some that are easily tricked by that. So I think it's an exciting time to be in this phase, because my data science team, for instance, looks at these fundamental questions: which algorithms should be picked to work on which kinds of data? How do we know that, when we put it into a black-box machine learning kind of algorithm, what we're getting out is a reflection of reality and not some way that the machine learning algorithm is easily tricked,
so there's plenty of work to do in that domain. But I think it only stands to reason that if we can put a lot more information in, and we can compute on it in a way that the true patterns are really spotted, then there's got to be something more that we can do than just what we've been able to traditionally get by observing a few hundred patients over a couple of months or a year or two. So I think it's a great opportunity. There's a lot of great challenges in it, both for physicians and for computationalists and mathematicians right now, and I think as we get these things sorted out we're going to see a lot of differences at the point of care delivery for the patient.

Somebody made the comment today, in the Q&A in one of the keynotes, that HIPAA has so many outdated pieces to it. As you think about patient-specific prediction, what has to change from a sort of policy perspective?

So I think that there's always the balance between protecting patient privacy and wanting to get as much data as possible. We know that when we're using these sort of machine learning algorithms, the more that we put into them, the more it can learn about the differences and the more that it can get out. The flip side to that coin is, if you're a patient, you want to be careful about where all of your information goes, who has access to it, what they use it for, and how they compute on it. So I think there's always a balance between privacy and availability of data. That said, as a clinician, from working with patients, I've found that most patients are pretty willing to share their information, particularly when it comes to research-type questions and things where you say, we're really trying to do something better for the group as a whole, do you mind if we include your information with it? Very rarely do I encounter anybody who says, no, I have a real problem with that. So I think the HIPAA privacy rules are well-meaning, well-intentioned, because the
idea is you don't want your personal health information floating around. But I think we have to make it a little bit easier for people who want to share their data to do so without having to jump through a lot of hoops. There's a lot of artificial concern about HIPAA, and there's a lot of liability that's coded into that law that makes physicians and researchers say, you know what, maybe we won't do this, or maybe we won't tackle that, because you can be individually liable for things, and it's just very concerning. But I think as we get around some of that, we're going to find that what's really important is that the patients really want to share their information to help the general cause. And I think, the doctors want it, the patients want it, it'll sort itself out.

HIPAA is almost 20 years old, and it was invented at a time when none of the technology existed to do with this data what we can do now, and it's actually maybe holding us back. Sort of in that realm, we've been talking about clinical applications of big data, but there are also applications to the business and to reducing health care costs, and in just attacking that whole problem. What do you see there? What are the big opportunities you see in tackling the cost problem?

So I think there are huge opportunities in tackling the cost problem. I think any clinician that you talk to, or anybody who spends time in the hospital, will tell you that if you walk around on a daily basis you see areas of waste, you see areas of opportunity for process improvement and other such things. The challenge is, how do you convert all of those anecdotes, where you observe one event here and you say, oh, look how this process is taking up extra time, how do you convert all of that into some sort of unified structure or larger framework in which you can study those? Well, it's been traditionally very difficult, because sure, there may be a bunch of time stamps
recorded for how patients move through certain areas, or there may be a variety of pieces of a paperwork trail as a claim moves through the processing pathway, but unless you have a real context for studying those things, and a real way to pull all that information together, those little one-off things that everyone sees and observes on a daily basis never really get converted to a large pattern or a large set of common observations that can be acted upon. So I think the ability to combine a lot of these kind of nontraditional data types, things like log file data, things like time stamps on the way that patients move, and to look at environmental variables and other such things, is really where we're going to be able to target at least some important strategies for cost reduction and efficiency improvement. And those are the exact kind of data that we never really paid attention to before as clinicians. It's not uncommon, for instance, in a healthcare system to have an electronic medical record where you pay careful attention to all of the patient-specific data that's collected, but then as a byproduct of that there's just scores of log files that are created that people just ignore, or put on the shelf in case they ever need to query them for something. I think there's a lot of great value in those log files and those time stamps and other such things, but it was just always very difficult to get your arms around those and do anything with them. And I think infrastructures like Hadoop and big data architectures, and the ability to analyze those things now and en masse, create a lot of opportunity for process improvement that leads to cost reduction.

How about adverse drug interactions? It's one of the biggest killers of patients in hospitals, I understand. Do you see promise in the near term for getting a better picture of how patients will respond to different medication?

Yeah, absolutely. So the initial problem of adverse drug reactions
that come from, say, mixing drugs or cross-reactivities or whatever is a problem that's nicely addressed by sort of the traditional electronic medical record, relational database thing, because it's easy for the computer to look at every medication you're on, compute every possible interaction, and then warn the clinician, without the physician having to think about it. But when it comes to patient-specific reactions to drugs, it's often difficult to predict if you're going to be the one in a thousand patients who has this one terrible reaction to this drug, or something like this. So that's not easily solved by traditional EMR structures. However, if you've got instances of millions of patients nationwide receiving certain drugs, statistically there's going to be a handful of patients, or 10 or 20 or 100, who've had this adverse reaction, and now that we can go back and look at their entire medical record and can capture their data, we can really start to spot patterns that we probably never would have recognized as clinicians, that can raise alarm bells or can make us say, hey, when you see this and this and this co-occurring, that really should raise your suspicion for this. But as an individual physician, who sees maybe one or two of these really weird adverse reactions over the course of your career, you never have the opportunity to spot a pattern like that. So the larger data infrastructures really do hold a lot of potential for that.

I think we're just about out of time, Nicholas, but I wanted to ask you about your CDO role. You're establishing it at Geisinger, and I presume at some point it becomes a full-time job for somebody. So what's your vision as to where you take this thing?

Yeah, that's a great question. This is something that we talk about from time to time: how much clinical work should someone do versus how much administrative work should someone do, and everybody's
got a different perspective on this. My take is, I think there's real value to having somebody who's in any executive leadership role in health care, whether it's chief data officer, chief operating officer, chief medical officer, or whatever, having some clinical presence that they maintain. Because the biggest value that I get in doing my job as CDO is from walking in the same shoes as the clinicians, practicing medicine and seeing patients every day. I know what's important to the clinicians, I know what's important to the patients, and I know the problems that people are having, because I live it every day of the week in some way or another. I think that doesn't mean that you have to at some point scale back and say, well, it's a big job, I can't do this anymore. I think the key is building a good team around you. So I don't try to do every part of the job myself. We've got people who are really great at data governance, at data infrastructure, at data integration, et cetera, and my role as CDO, I think, is to coordinate these pieces and focus them on the real-world problems that I see every day in the clinical setting. So, in my opinion, you almost don't want to scale back to be a full-time administrator or full-time executive if you're in health care, because it takes you away from the point of care delivery and from the daily patient interaction that helps guide these things. The trick is to just focus on the strategic parts, focus on the putting-the-pieces-together part, and then build a good team around you that does it. So that's how I think we're trying to do it, and I'll let you know next year how it goes.

Excellent. Nicholas Marko, thanks very much for coming to theCUBE. It was a great story. Thanks, everyone. All right, everybody, we'll be back with our next guest right after this short break. This is theCUBE. We're live from MIT IQ in Cambridge, Massachusetts.