From the SiliconANGLE Media Office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Dave Vellante. Welcome everybody, this is SiliconANGLE's theCUBE. We have a special CUBE conversation around transforming healthcare through data science. Brent Richter is here, he's the director of Enterprise Research Infrastructure Services at Partners Healthcare. Brent, welcome to theCUBE, thanks for coming on. Thank you. Of course, Partners is a combination, a partnership between Mass General and Brigham and Women's, two of the greatest institutions in the United States, but talk a little bit about Partners and your role. Sure, so as you mentioned, those are the top two research organizations within the healthcare space in the U.S., Massachusetts General Hospital and Brigham and Women's Hospital, plus other AMCs: McLean Hospital, around behavioral science, and Spaulding, for rehab. Those are the main academic medical centers, and basically the hospitals of Partners that I support. There are other clinical hospitals there as well. But what my group does is support the research and clinical innovation of these hospitals by providing technology and tools. I mean, essentially, in our discussions, my takeaway is you're providing cloud data services to your organization so that people can create better patient outcomes through data. Maybe talk about how that vision started and your journey. Well, it started with high-performance computing, actually, 10 years ago, when Partners found it strategically important to support the research community across Partners, across the hospitals. From there, we've built general services around storage, but then more specific services around data and electronic data capture tools. And as we were building out high-performance computing, lots of the workloads were in genetics, and then clinical genomic pipelines and related areas.
And then most recently, as the data volumes got bigger, people started to bring more and more data to it. For strategic purposes, we needed to build this data science platform for investigators and clinical research groups to leverage for their analytics as they grow out of their ability to manage all this data. You know, the internet of things, as you know, is a big hype right now. A hospital is like this internet of things: these connected devices, sometimes they're connected, but they've increasingly become more connected, and 10 years ago they were not too functional, but now they're incredibly functional. So is that where all the data came from? And you just saw this huge increase in... Yeah, so we were seeing this trend, again, 10 years ago, with the transformation of these instruments. On the genetic side, sequencers went from, you know, 48 lanes, so 48 individual patients basically, but only about 1,000 base pairs at a time, to providing tens of thousands of base pairs, if not a megabase, at a time and doing whole genome sequencing within a week or so, which is where we're at now. On the other side, there are clinical devices, EEGs, EKGs, that are developing a lot more sensitivity and capturing a lot more breadth of data. And we were seeing that explosion over the last 10 years, until today, where quite a lot of the instruments are imaging-based. And on the monitoring side, that data has a lot more sensitivity, a lot more channels, going from discrete monitoring to continuous monitoring. Some patients are in the hospital for a week and they're monitored the entire time, for events in epilepsy, for example, or something of that sort. So that data is all getting captured.
Some of it, up until today, basically this year, has been kept for clinical use, where you keep it around for maybe 30 days. But now, as the utility of that data becomes apparent, and investigators are beginning to leverage that data for their analytics and doing their proofs of concept today, they want to start to capture that data and build it up, basically. So, you decided, okay, let's not throw this data away, let's capture it. There's a ton of data, so you created some kind of repository to put the data into. This was a few years ago, so you probably weren't using the term data lake back then, but did you essentially create a data lake, is that right? Yeah, basically. I mean, there are data warehouses, right, and they're around. We have a clinical enterprise data warehouse. On the research side, we have what's called the Research Patient Data Registry, RPDR, as it's called within Partners, which is based on i2b2. Or actually, it's the reverse: i2b2 is based on our RPDR instance. Lots of other organizations have an i2b2 instance, but that's the data warehouse, and that's within these SQL databases. What we were trying to do is capture the data that's not in those warehouses, because it wasn't thought to have utility at that time. We're starting to do that now, across both the Brigham and the General, in the emergency departments, capturing what's called waveform data on EKGs for heart monitoring, and in ICUs with EEGs for monitoring electrical impulses from the brain, et cetera. Again, that went in for clinical use, kept for 30 days, and then it just falls off of that server, basically. We've set it up to go into the data lake that we've now built within Partners. That data has been starting to accumulate over the last three months within the data lake.
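To make that concrete, here is a minimal, hypothetical sketch of the kind of landing step such a data lake needs: each monitoring record is appended to a date- and device-partitioned layout with provenance metadata, instead of being discarded after 30 days. The function name, directory layout, and record fields are all illustrative assumptions, not Partners' actual pipeline.

```python
import json
import os
from datetime import datetime, timezone

def land_waveform_record(root, device_type, patient_id, samples, sample_rate_hz):
    """Append one monitoring record to a date/device-partitioned lake layout.

    Returns the path written, so downstream jobs can discover new data.
    """
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    part_dir = os.path.join(root, f"device={device_type}", f"date={day}")
    os.makedirs(part_dir, exist_ok=True)
    path = os.path.join(part_dir, f"{patient_id}.jsonl")
    record = {
        "patient_id": patient_id,
        "device_type": device_type,
        "sample_rate_hz": sample_rate_hz,
        "samples": samples,
        # Provenance: when the record entered the lake.
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return path
```

The partition-by-date layout is a common convention so that retention and reprocessing can operate on whole directories rather than individual records.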
Okay, so you came to the realization that, okay, this data has value, it's going to help patient outcomes, and then you said, okay, we've got to put the data into a place where we're making it accessible. We can't do that in our traditional SQL data warehouses and data marts. So what did you specifically do from an infrastructure and software standpoint? Can you give us some details on the steps that you took and what was different about the new infrastructure? Well, it's fairly simple once you have the infrastructure. We've based what we call the IDEA platform, the Integrated Data Environment for Analytics within Partners, on these Dell EMC technologies. It used to be called the Federation Business Data Lake, now AIM. Part of that is quite a bit of storage, petabytes of storage, a virtual environment as well, based on Vblocks and VMware, and then tools: data science tools and database tools, Mongo, Hadoop, Hortonworks, Spark, all of this soup of acronyms and names. Once we had that set up, and that's what we worked on over the last couple of years, the ingesting is basically providing a VM to the emergency department and the data scientists in the research group that's looking at this. I should say that they're not looking at it specifically for research; they're looking at it for clinical decision support going forward, and that's where they're starting from. They already did their proof-of-concept research using an extract of this data, proved out the utility, and wrote some papers, and now they're scaling up. That was on their laptop, obviously. If they want to scale up, it has to be within this kind of environment. So anyway, we provide them a VM. We put R and Python on the VM; that's part of the service catalog that comes with our IDEA platform.
And then they capture the data via the APIs provided from where this data gets put into, which is a GE Carescape server. They're collecting EKGs first, and then, once that data builds up, they will be joining it with the patient records contained within our RPDR database, basically. So you built essentially an internal cloud with a service catalog, and you're providing these services. You said AIM, I always forget the names. It's called the Analytics Insights Module, which is the new thing. We've talked about this on theCUBE before, that Dell EMC announced. It's a combination of Dell EMC and Pivotal technologies, an end-to-end platform. Okay, so who is accessing these services? Is it data scientists, application developers? Can you talk about the consumers of this service? It's all of those, basically. So our first customers, I call them customers, but our first internal users were clinical departments, the MGH Cancer Center. That is a joint project between the MGH Cancer Center and a pathology group called the Center for Integrative Diagnostics at MGH. They have top-tier investigators and researchers, but they're not developing this workflow for research; they're developing it for clinical care of cancer patients. So they have hired a bunch of data scientists, together with Long Le and John Iafrate, who are leading this effort to develop this clinical decision support system. So that's one. We're also working with the Center for Connected Health. Again, not research, but they are developing applications, so these are application developers. They have a bunch of projects; they're collecting data from activity trackers for the most part. This is more externally facing, connecting the patient to this kind of technology.
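As a hypothetical illustration of that joining step, attaching accumulated waveform captures to registry records by patient ID, here is a minimal in-memory sketch. The field names (`patient_id`, `diagnoses`, `age`) are assumptions for illustration; the real join would run against RPDR data at scale, not Python lists.

```python
def join_waveforms_with_registry(waveforms, registry):
    """Left-join captured EKG records onto patient-registry rows by patient ID.

    Both inputs are lists of dicts. Waveforms with no matching registry row
    are kept, with empty clinical fields, so no capture is silently dropped.
    """
    by_patient = {row["patient_id"]: row for row in registry}
    joined = []
    for wf in waveforms:
        clinical = by_patient.get(wf["patient_id"], {})
        joined.append({
            **wf,
            "diagnoses": clinical.get("diagnoses", []),
            "age": clinical.get("age"),
        })
    return joined
```

Keeping unmatched waveforms (rather than an inner join) mirrors the point made above: data whose utility is not yet apparent is retained rather than thrown away.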
So those are a set of application developers, and then we're working with research groups on the research side who started out as statisticians or biometricians, in what's called the dry lab area, and now they're getting into the new paradigm of data science, which is a lot different from a statistical approach, and beginning to try to understand and train themselves on the tools, the technology, how to think like a data scientist rather than a statistician about the data. And the platform allows them to do that as well. So it's quite a bit of all three of those areas. So again, those sequential steps: you come to the conclusion that data can help the organization and patient outcomes, you invest, you get data into this location where it's accessible. Now you've got to trust the data; you've got to have data quality. Are you using the tooling inside of AIM and this platform to improve the quality of the data? People joke about data lake, data swamp. What are you doing in terms of improving data quality? Obviously security is part of that, but that's second nature, or first nature, in your world. But what are you doing in terms of data quality and things like provenance and the like? So we're inheriting a lot of data from, as I mentioned, the clinical data warehouse, which is fully structured. The data maintains its provenance. There's a master patient index as part of that. So the same patient could be seen at the Brigham and could walk into MGH and get different medical record numbers, but our master patient index reconciles that and joins them both up. You do that today? Yeah. Okay. That's done today within our clinical group. And in addition, that could be a community hospital as well, so it reconciles all of those records. So one person seen 30 years ago could walk into a different hospital in the system today, and we'll reconcile that.
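To illustrate the master-patient-index idea (not Partners' actual implementation, which would use probabilistic matching over many demographic fields), here is a toy sketch that assigns one master ID to registrations sharing a normalized name and date of birth, even when each hospital issued its own medical record number. All names and field choices are hypothetical.

```python
def build_master_index(registrations):
    """Map each (hospital, MRN) pair to one master patient ID.

    `registrations` is a list of dicts with hospital, mrn, name, dob.
    Real MPI matching is probabilistic; this sketch keys on an exact
    (normalized name, dob) pair purely for illustration.
    """
    master_of = {}   # identity key -> master id
    index = {}       # (hospital, mrn) -> master id
    next_id = 1
    for rec in registrations:
        key = (rec["name"].strip().lower(), rec["dob"])
        if key not in master_of:
            master_of[key] = f"MPI-{next_id:06d}"
            next_id += 1
        index[(rec["hospital"], rec["mrn"])] = master_of[key]
    return index
```

The point of the index is exactly the scenario described above: two hospital-local MRNs resolve to the same master ID, so records from the Brigham and MGH join up.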
So we pull from that data, from the enterprise data warehouse; that's also done within the Research Patient Data Registry, so we maintain that. Investigators can bring their own data as well, and so can the application developers. There's also the Partners Biobank, which is collecting tens of thousands of genomes, essentially; that's also part of this. So we're pulling data from these sources, where the data provenance and integrity are guaranteed. Then people using the platform can bring their own data, and they have to ensure that their data is clean. So that's where we're at today. It's sort of an inheritance plus bring-your-own-data type of approach. As we start to place additional data into the system from the enterprise, like this waveform data, for example, or medical images as well, one of our challenges for this year is the data governance and cleanliness for the most part. So we're beginning to work with internal enterprise development groups to pull this data, join it together, and put it into an index within the data lake, basically. So that's the next step in that whole data quality mission. So what have been some of the outcomes so far? And maybe you could give us some examples, and we'll talk about where you want to take this thing. So some of the outcomes: like the first customer, the MGH Cancer Center, they've begun to sequence every tumor from every cancer patient that comes through their door. They are building up that data so that they can understand what worked in terms of a treatment for a cancer patient in the past, and then apply what had the best efficacy to the cancer patient who walks in the door today. So based on that genetic profile, based on the patient history and other diagnostics and factors, they'll be able to apply the best protocol for that cancer treatment.
They've shown very good results in doing that, because cancer treatment in the past has always been: if you come in with lung cancer, you start with this course of chemotherapy and radiation treatment; if that doesn't work, you go to the next course, change the chemotherapy treatment, and continue. For everyone. For every single patient that has lung cancer. What's different now? Well, what's different now is that they can look at the genetic profile of that patient. Before they start on a course of treatment, they can see if a specific drug works, or more importantly doesn't work, with that genetic profile. If it doesn't work, then they'll choose a chemotherapy treatment that does work with that genetic profile. And some of these things come out of the blue. They've shown that one cancer patient who came through their door had a very far advanced small cell carcinoma; it was metastatic throughout their body. Doing the genetic sequencing, they realized that some of the tumor had the same genetic profile as a different tumor type that reacts very well to crizotinib, a drug which is not used, not prescribed basically, for this patient's cancer, the small cell carcinoma. So, and this was in their research, what they did was provide crizotinib to this cancer patient, and in two weeks their tumors basically shrank to almost nothing. Two weeks. In two weeks. So that's a very good result, and that shows you the power of what this can provide. And that never would have happened without that sort of data, other than just by dumb luck. Right, right. Some of the treatments now take into account that you can cross cancer profiles. Some of this has to get through the FDA, but these things are happening; in the past, these things may have taken 10 years to do.
And what they're doing now is shortening that cycle to two years, and going forward getting it even shorter, through all that regulation and also treatment. So as these things continue to speed up, it's going to have a profound effect, not just on cancer treatment here, but on cancer treatment across the United States. So in the last three years, you're obviously able to deal with more data, you're able to wrangle data more quickly, and the data scientists are doing more precise analysis, I'm hearing. Are you able to operationalize the analytics and put them into the hands of more people? In the data warehouse days, it was insights for a few that took too long to get to. Are you beginning, you know, people use that term, citizen analyst, are you beginning to operationalize these analytics? Yeah, so if what you mean by that is providing the tools and platform to the data scientists and developers, yes, absolutely. The researchers, on demand. Yes, that's the way we started. We've been in development of this platform for the last two and a half years or so, and the initial part of that was basically to develop the platform and provide a catalog, so that as users want to use the platform, they can just go and interact with the catalog and get a virtual machine set up the way they want it set up. It might have Mongo on it. It might have Hortonworks Hadoop. It might have RStudio for development, or Python. And then also access to specific data sets. We have lots of public data sets that are in there, and as I mentioned, access to RPDR data. If they want to bring their own data, there are data ingestion tools they can use. So that provides them with the environment. Through this catalog, we can get them set up within the day, basically, within hours, provisioned and up and running and doing their development.
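As a hypothetical sketch of how such a self-service catalog request might be resolved into a provisioning spec, here is a toy mapping from requested tools to a single VM specification. The catalog entries, package names, and memory sizes below are invented for illustration and are not the IDEA platform's actual catalog.

```python
# Illustrative catalog: each tool maps to packages and a RAM requirement.
CATALOG = {
    "mongo":   {"ram_gb": 16, "packages": ["mongodb"]},
    "hadoop":  {"ram_gb": 32, "packages": ["hortonworks-hdp"]},
    "rstudio": {"ram_gb": 8,  "packages": ["r-base", "rstudio-server"]},
    "python":  {"ram_gb": 8,  "packages": ["python3", "jupyter"]},
}

def build_vm_spec(requested_tools):
    """Resolve a catalog request into one VM spec: union the package lists
    and size RAM to the largest requirement among the requested tools."""
    packages, ram = [], 4  # 4 GB baseline for a bare VM
    for tool in requested_tools:
        entry = CATALOG[tool]
        ram = max(ram, entry["ram_gb"])
        packages.extend(p for p in entry["packages"] if p not in packages)
    return {"ram_gb": ram, "packages": packages}
```

A resolver like this is what lets users "mix and match and pick" from a menu rather than requesting a hand-built server.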
So instead of, as in the past, having to request a server and figure out what tools they need, here they can mix and match and pick. So in the last two, now closer to three, years, you've made the investments and put in this infrastructure. Now you're delivering these services, and that's working. Now you've got a new analytics capability platform that's going to help you take the next steps. What are those next steps? It feels like you're just getting to the steep part of the S curve. The infrastructure was a lot of heavy lifting, and you had, I'm sure, skills and training and things of that nature. Where do you want to go from here, and what's the potential of this platform? So where we want to go, well, there are a few things that we need to get done this year, 2017. One is that data ingestion and bringing in more enterprise data, so we have to build data governance and data provenance around that. That's an outstanding challenge, as I mentioned in the beginning. We also need to continue training across the board, basically. So, as I said, thinking like a data scientist and not thinking like a statistician. Understanding the tools and the power and how to manipulate the data; again, it can be a steep learning curve. Understanding the development, like with Pivotal or Docker containerization, that is also part of the learning curve. So part of our mission is not just to provide the platform but to provide the training of this next generation of data scientists, of scientists, of developers, so that they can really start to take advantage of this new era in data science and data wrangling and munging. Diffuse that data innovation throughout the organization to create better patient outcomes. I've got to ask you a couple of personal questions here. So we were surprised this morning when Brent brought in his bike. You rode your bike here from Lexington to Marlboro.
You're going to ride back to Charlestown. So you're an athlete of sorts, sometimes known as old men in Lycra, but you go pretty fast. So you're up to almost 6,000 miles this year, which is pretty impressive. Getting close, yep. You'll get there, right? Yep. If the weather holds. If the weather holds, yeah, exactly. If there are no early snowstorms. How long have you been a biker? I've been a biker for many years, I guess the last 20, 30 years. I started out as a swimmer, and that's what got me through college, but I've been biking since then, basically. I dropped off in the middle years, but over the last few years, I've committed to biking a lot more. I do all my commuting by bike, so I try to avoid the wet streets and snow and wet weather, and then I have to drive, but other than that, it's keeping me sane. So, a Cincinnati boy transplanted to Boston, Harvard educated. For those of you who are interested, a lot of times we get questions like, ah, you know, what's the background of these folks? So your background is in biology. Yeah, biology, genetics, and also philosophy, with some ethics courses as well, but then you also have a computer science degree. So it's a lot of hard work to get to where you are today, but maybe talk a little bit about how you got to this position and some of the background. Well, it's basically following, leaving doors open, right? And taking the opportunities as they come. When I started college I was very interested in pre-med, but then I wanted a more diverse education, and so I dropped pre-med and just stuck with the biology and picked up philosophy. I started working in a more theoretical area of biology, in population genetics and evolutionary genetics, which got me into the analytics, the algorithms within population genetics.
But then, you know, who would have known? This is what the person I worked for at Harvard, Richard Lewontin, describes today: who would have known that population genetics would become so important? When I was doing it, it was just engaging the mind and understanding the evolutionary processes of organisms. When he started, he was just looking for a place where he could apply his mind. Today, what we talk about within precision medicine is all population genetics, basically. So, just keeping that door open as the data sets began to grow, as the Human Genome Project evolved, and genetics became more and more important, computers became more and more important, storage and data analysis became more and more important. Keeping those doors open and following that path is what led me to where I am today. Well, it's an awesome story, and thank you very much for sharing it with us, and thanks for the great work that you and your organization are doing. If you live in this Boston area, you can't help but be touched by MGH and the Brigham. I've got personal experiences with both of them; amazing institutions. So again, thanks for coming on theCUBE. Appreciate your insights. Thank you. All right, thanks for watching everybody. This has been a special CUBE Conversation with Partners Healthcare. We'll see you next time.