Live from New York, it's theCUBE covering Big Data NYC 2015. Brought to you by Hortonworks, IBM, EMC, and Pivotal. Okay, welcome back everyone. We are live in New York City here at the Big Data NYC event put on by SiliconANGLE and Wikibon on theCUBE. In conjunction with Strata and Hadoop, we're 100 yards from the Javits Center up West 37th Street, right next to the Javits, where all the big data action is happening. This is theCUBE, SiliconANGLE's flagship program. We go out to the events and extract the signal from the noise. I'm excited for this next segment. It has to do with healthcare and we have EMC here. We have Chris Harold, Global Field CTO from EMC, and John Jackson, Scientific Computing Services Manager, Partners Healthcare in Boston, which is a very innovative organization. All the hospitals kind of pulled together under one big organization. Guys, welcome to theCUBE. Thanks, thank you. Appreciate it. Partners Healthcare, really known for its innovation, specifically in getting operational efficiencies around all the different distinct hospitals, expertise, which is a challenge I know in the Boston area, but to pull it off from an IT standpoint, and then the value, is a really big deal. And I'm excited to hear what you guys are talking about with respect to big data, just with making things real time. So first, John, I got to ask you, what are some of the innovation kind of concepts that you guys are working under? Because big data is one of those things where everyone just waves their hand. Yeah, we need big data, we're healthcare. We're going to have real time information. It's like the alerts going off on the devices. You've got people's health at risk. What are some of the technical things that you guys are doing with the innovation? So the Partners Healthcare Research Computing Group has been providing services for many, many years to that community to really make it excel, allow them to excel in research.
And we've been providing computational services, platform, software, but also data. Taking that medical records data, packaging it in an easy to consume way so you can discover your cohort and go away and do great research. But it's been somewhat siloed, and the trend today, with the increasing amounts of data, is that bringing it all into one place is critical to leverage the value. I just want to say to folks watching, just a disclaimer: John's comments reflect his own opinions, not those of Partners Healthcare. Get that out of the way. But I'm sure it is a great proxy, in my opinion, for what you're seeing out there. So I'll get your expert opinion. Just in general, you see the keynotes. Oh, healthcare, it seems to be a nice glimmer and a nice gimmick to throw on keynotes because people can relate to healthcare. But there's some serious issues going on with healthcare because there's compliance, you got HIPAA, you got privacy, but speed of data and integrating data sets is a huge issue on the services side. Then on the research side, from genome to other really cutting edge research, you need horsepower, you need cloud, you need computation. So talk about that, Damon. What are some of the things that you guys are doing that's exciting that you'd like to share and the things that you're interested in? So exactly, so people are coming to us with all kinds of interesting projects and they've been hitting the limits of performance in kind of the traditional, either HPC or a large machine. And they're wanting an analytics platform that can scale not just to like 5,000 records but to a million records. So whether it's predicting what someone's going to need when they arrive at the emergency room or whether it's predicting what is the best treatment for this individual based on all of the records and the information that we have in our system going back 20 years.
It's all about, so what we're looking at today is really the Hadoop platform, Pivotal HAWQ, these big data technologies, Spark, that can really facilitate analytics on that massive scale and bringing the information into one place. So huge for us right now is the genome information because it's got to the point where it's economical to acquire your DNA when you come through the hospital and Mass General is doing that. And so we have DNA information, we have medical record information, and all of the other research data sources that can be queried and people are building models to try to use that for the individual when they present in the doctor's office. And that's really exciting. But you need the speed to deliver that analytics quickly because it's real time. Chris, you work for EMC, which is a storage vendor. So you have the big iron storage which stores everything. You have a part of EMC now that's cutting edge, but your role is as a data scientist and you're on the analytics side. This is not the cutting edge or bleeding edge. That's where the value is and that we're seeing here on theCUBE, but EMC is obviously positioned for that. What do you see in terms of this and what are these guys doing that's unique? What would you share? Yeah, I mean, fundamentally, I think, I spent some time with John last week just catching up before we came out here and he hit on something that I think is pretty consistent across all of our customers right now, which is we understand the value of big data has to happen. We get that. We need it to be faster. We need it to be simpler, right? And we need it to be secured. And I think that was a point that John made to me last week that really resonates with everybody that we talk to. It's why we're building the solution set that we are around sort of the federation of technologies of EMC and our federation platform. For a long time, I think people had this impression that EMC equals big giant boxes of disks, right?
Like that was our, that was what we did. And we've really, as an organization, kind of gotten out in front of this big data wave now with a solution set and a grouping of technologies and products through certainly Pivotal Labs but our ETD division with Isilon and ECS and those types of technologies that enable that massive scale rapid ingest, scalable storage, scalable compute, sort of exactly the things that John hit on that we're trying to address in the broader market. And I think the piece that we were missing that we're really focused on now is wrapping that all up in a platform that allows you to deliver in a governed and managed way, right? And so we hear a lot about the, sort of the Wild West that is the analytics space, right? And we're really trying to tame that in with a set of solutions that wraps that all up. And I think that's where all of our customers get it, all of our customers understand the importance of it. It's really, the gap is: I understand, make this reality for me. Show me how to actually stand this up and do this thing because there's just so many options and so many technologies and so much complexity. I got to ask you guys, I want you to comment on a concept that we've been kicking around theCUBE this week and just kind of an accumulation of most of our interviews. And that is that the systems of record, systems of engagement, we've heard that stuff and we see the systems of intelligence coming up from Wikibon now, which takes us to a whole other realm. Systems of record, pretty straightforward. Database stuff, you can implement it with that. That's all the stuff, structured databases. Systems of engagement really is the interactive piece. Applications using data, we're seeing that heavily with SQL on Hadoop, getting the data into memory with flash. So under the hood there's some stuff there. But when you start to get into systems of intelligence it opens up algorithmic and machine learning concepts.
I got to ask you to comment on the following concept. Humans, whether they're touching devices or working at a certain speed, call it 100 milliseconds, your brain, see something on the screen, you're a doctor, you hit this, give me that. So kind of the recommendation engine side of data science. That's the speed of a human. When you get into systems of intelligence, the speed of machines is 10x or more faster than any human can think, right? So now we're in that wearables, internet of things. So I wanted you guys to comment and add some color and vision around if machine learning and other things are happening, how do the machines do the work for the humans? If the speed is greater, then it's going to put pressure on the software and the analytics, because the analytics is where the value is. This new systems of intelligence becomes critical. So what's your thoughts on that? Can you just share what you hear from customers? Is this too out on the fringe? Is it too bleeding edge? Are you seeing the systems of intelligence coming in? Obviously visualization helps see it for humans, but like when machines are processing streams at massive velocity, there's work that can be done there, there's opportunities. At the same time, what the hell is the technology that does it? Is it just machine learning? Is it streaming? Is it flow in the flow? It probably won't even hit a database. So a concept that gets thrown around is clinical decision support. So we don't see the machines replacing the doctor, but to be able to parse a whole load of information and give color to all the various options available, using sort of all of the records going back is a tremendous value. And digital pathology is an area where it's really exciting. So there's been research studies that have shown greater specificity with machine learning applied to detecting what is going on in those path slides.
And that was a really exciting sort of concept, but what if we apply that to all of the other diseases? Can we get greater specificity through machine learning? But more so, can we just reduce the amount of time that it takes to process? It's kind of like when you walk into an emergency room, you hear all these beeps going off. If there's so many beeps and notifications going off at some point, it's like, what do you pay attention to? Do the humans have the ability to know that that little nuanced beep is different? Someone's dying? So like, that's the notification economy problem that we see with intelligence. Do you agree? Exactly. In the emergency room, things are moving so quickly. As soon as you can predict what medication that person's going to need or how much time they'll need to have care in the emergency room, all of these things can make care more streamlined and improve care for the patients. So we think there's a huge opportunity. Okay, so if we buy the thesis then, the intelligence systems, a lot of machines and algorithm software is going to augment the human component which has some finite millisecond time situation. So now there's a user interface question. So back to Chris, your thoughts on analytics, dashboarding, so assume that they're working together and they don't replace each other. I mean, one doesn't replace the other. Yeah, I mean, you can argue that some jobs may go away, but they'll shift, they don't want to touch with us all the time, new values we create. So what is going to happen on the analytics side? What do you expect to see? Because again, on the dashboard side, I talk to customers all the time and I hear, I need another dashboard like a hole in the head. Right. It's like, I need a better dashboard. Yeah, I think, and I think really where people are trying to take this, and it's something that we focus on quite a bit within our solutions organization, just EMC as a whole.
Dashboards, PowerPoints, I generate these great insights. I find a result. Copying and pasting in PowerPoint is where insight goes to die, right? So I need to actually drive an action with it. I need to create a result. I need to do something. And dashboards aren't good enough anymore because the expectation is that there's literally somebody sitting there watching the dashboard all the time and that's just not the information economy of today. Right, I need something to pop up on my phone. I need real time. I need alerting. I need whatever it is. And so really having those data driven applications directly integrated into the platform, that's where people are taking this. I got my iWatch here, so the notification could be. Exactly. The wearables. Exactly right. This is where the smartphone morphs into IoT, right? It's exactly right. It is a feedback loop. And when we talk about internet of things, the connotation of thing is a watch or a sensor in a car or a pacemaker monitor or something else that's a digital thing. But we're things, right? And we create our own sets of data as well. And my interaction with a patient as a doctor, to John's point earlier, that's real time records. And if I make a note or I update a field in that patient's record and suddenly it invalidates an entire course of treatment because of an allergic potential or something like that, that's that real time data that we need to really influence that outcome. And that interaction has to be happening at the point of entry, not later on in a dashboard when I pull up a report about this patient and realize, oh no, I sent them home with a prescription that's no good for them. Literally when I click that button, I need to know right now. So innovation on dashboarding clearly is not going away. That's going to be a trend. We envision reports of the same format that a doctor is accustomed to being embedded in the medical record system. Exactly. So they can go to an application that's embedded.
They can call up a report and the analytics will take place on the back end, on the data lake. The results get returned and it's right there presented to the doctor in that framework that he's working in. He or she's working in. So I want to shift gears to some of the things that we've seen in Hadoop. So the theme we've been seeing is Hadoop next, which thank God we finally got there. Abstraction layer on top of Hadoop. Certainly we're bullish on it too. We've always said it's relevant, but it should be invisible, but the ecosystem is filling in around it. You see the big whales coming in like EMC, IBM, HP, among others. So the vendors are coming in because they have big solution sets that are not just exclusive. So the shift to Hadoop is great because now you can have a pile of data, move some compute to the data. In that example, storage is the value, right? So, because you're moving compute, you got to store it in storage. And you access it, pull it out. When you talk about real time streaming, there's no value in storage, your value's in real time. So that means you're in a flow model. So now you're in kind of like flow theory. How do you guys look at that? Because that's where we see the innovation happening. Is that being talked about at the show here? Are people talking about specific technology that's saying, okay, I got it on disk, disk is covered. Certainly flash and stuff helps that, with Spark, I get that. But we're talking about real time, people on the table, patients, scientific research where you have massive amounts of processing going on from time series to holistic data modeling. The machine's churning this out. What is that new technology at that level? What do you see? Is there anything out there? I mean, is this too early to talk about this? Well, I think that a lot of innovation that's taking place in big data companies like Facebook and Twitter is enabling the research community and the medical community to do great things.
And I think that's a fantastic transfer of ideas and technology. And you guys, at EMC, what are you guys doing? So that you can draw parallels, absolute parallels with the use case, a Twitter feed or a stream from your heartbeat, all your monitors, same kind of data. And the potential is enormous of what can be done with that. Absolutely. Yeah, I tend to agree. I think we talk a lot to all of our customers and I mean, you guys are right on this boat too. Hadoop gets you so far, right? And it is very good for what it does. But okay, now I need to do real time. Now I need to do stream analysis. Now not only do I need to do stream analysis, I need to actually real time insert other values into that stream and then do an analysis and then output and translate that into the result. And I need microsecond level latency. I'm not doing that on disk, it just doesn't work that way. Spark is a great bridge. We've got other open source projects like Flink that's coming right along behind that. And this is the standard of this industry, right? This will kill this, which killed this, which killed this other thing. And that's just progress, I mean that's how it works. But I think what also people are realizing, and in fact I saw it in the report you guys released yesterday morning, the top two cloud providers for analytics, right? You guys put out who's doing what in the cloud. And I actually posed this question to a bunch of our internal guys that morning and I said who's the number one, and of course everybody said Amazon, Amazon, Amazon. No, the number, the co-number ones are actually Google and Microsoft. And I dug a little deeper into that and on the Microsoft side especially, what are you using to do your data analytics? SQL. Overwhelmingly. Yeah, clearly this show validates 100% SQL, full on Hadoop. At blazing speeds, it's the lingua franca. Might not be the only one, but it is now, it's standard. Yeah, exactly right. De facto, those things, or whatever you want to call it.
And having that ecosystem of tools beyond Hadoop that give you the real-time tie-ins, MPP, multiprocessing, parallel processing databases. All of that technology needs to work together, it needs to fit. And I think that's really what I'm seeing at the show is that people are really pushing the boundary of what Hadoop can do just because of its technology. Let's take, let's jump into that. So if you believe that SQL on Hadoop works, we've also got a validation that columnar stores are valid. Cloudera throws it in there: the Kudu announcement, which seems to kind of sit between HBase and some other stuff. But Vertica's been around for a while, we've seen successful HP Vertica. SQL on Hadoop, the table stakes seem to be, you got to have those two things to do large, fast scale. Okay, but that's just one element. Now, if you look around that, now you have diversity around use cases. I need cloud. I need on demand, yeah. This is now the new normal. So that's not a startup game. That's going to be hard for a Cloudera or someone to do because that's a lot to bite off and with all the value that's on the table. So, okay, I guess I'm just kind of riffing on this. The diversity of use cases is what we see in research all the time. So whether it's a researcher needs Mongo or they need Spark or they need Hadoop or they need HAWQ. And that's really where the FBDL, the Federation Business Data Lake, is fantastic because I can carve out a different environment and it's isolated, segregated, but it's also utilizing the same data layer underneath. And I can provide you a custom environment to do your analytics and that's really flexible. It gives me a lot of flexibility. It gives peace of mind around security, you know. It's just great. Well, let's take a step back.
I want you guys to comment as experts because you can take your hats off for Partners and EMC because you both work for stakeholders, you know, you both have to serve up to the business owners who want value. So they're confused. So if I'm a CIO or CXO and I'm looking out over the Hadoop landscape this week, I'm just as confused now as ever before. I've been vendor hopping for the past few years, poking and trying. Now I got to write a check and I need to have a team in place to do this. So like, I need to move now. So how do you guys talk to that stakeholder and what is available today and what are some best practices? And you can just clear that up because that seems to be the number one confusing thing we hear. Yeah, from our side, I think that question is probably a little easier than it is from John's side. He has to defend a technology decision up. For us, it's more about this is an ecosystem, right? It's not a thing. You don't buy it. You don't buy a data lake. You buy parts that make up a data lake. And we've taken the engineering effort and time to stitch those together to solve some of these multi-use case functionalities. But the key for us when we think about what's the tool, what's the piece, what's the... I don't know. There's two guys in a garage right now writing the next big thing. That is happening right now, right? So I can't build you a platform that restricts you from innovating down the road because I made a technology decision today, right? And that's really what EMC is focused on with our solution set is we're working with the best of breed. So we obviously, we have the Pivotal Big Data Suite. Partners is a customer of that. From a technology perspective, genuinely taking my EMC Federation guy hat off. I think they have the most comprehensive suite of tools that enable all of that technology built around that Hadoop ecosystem. But, Cloudera, Hortonworks, there are other tools out there that are doing the same thing.
There's a market basket of things you could use. Absolutely. But this comes back down to my whole point, which is if it's an outcome-generating conversation, you don't really have to get dogma, have a lot of dogma around one or the other. You can say, hey, I got EMC, they're a great partner on storage, and I'm going to use XYZ over here. And although EMC has something, maybe I'll use this or that. So the customer ultimately is architecting. It's like hybrid cloud. Exactly. It's not a product. It's an engineer now. So, cloud's easy. You've got public, private, and on-prem, and hybrid. So you got at least some swim lanes in cloud, right? And Amazon's helped there. Are there swim lanes in big data? Yeah, so for us, we're looking at the fabric, the applications, and then the infrastructure and plumbing. Those are kind of your swim lanes. The data fabric, that's genuinely the wild west. There's a million tools to do those things. Really what it's about is those are just parts. I need to know what I want to do with it. When we talked to John, it was what do you want to do? I have a research organization. They use a bunch of different tools because this guy likes Mongo and this guy likes Hadoop and this guy likes graph databases. Well, John, let's go to John. So let me ask you the question directly then based on that. Your job is to deliver the outcome. And engineer and architect and have a team working on some stuff. The last thing you want to do is get foreclosed to a future benefit. So you have to think, okay, I got to deliver today, but I don't want to put myself into a bad situation where I have to unwind it. So given that's the scenario. I'm sure you'd agree. That's a mindset, right? That's the mindset. What do you do? I mean, what are you doing right now? I mean, with the fabric kind of up in the air, you guys say, okay, I want to have flexibility. Let's get back, we're back to that. So what's your answer to the CXO saying:
Hey, John, I want best in class, high performance solution today. Price somewhat okay. I can maybe go up in price, maybe pay a little more, but I don't want to get locked out of the downstream benefit. Absolutely. I want large scale, real time, save lives, do our great research. What's your architecture? So the nice part about the research IT is we're sitting in between the clinical IT and the research folks. And our mentality is somewhere in the middle. So the research folks are super agile, always trying new things. Clinical IT is more conservative, has to pick a solid platform. We have the freedom to do POCs, to try things out and to take ideas from the research community. And that's somewhat what we do, but also take that enterprise mindset. So we need a supportable platform. We need a platform that will scale. So we have partnered with EMC on the architecture side, and that's kind of really what we're doing. We have these use cases that come to us and we're running POCs and we'll figure out how it works. And if it works, then we'll scale out that model. But always, with the mindset. So you just shorten your mile post in terms of your milestones. You got to take it piece by piece. Take it piece by piece. Yeah, prove it out at a small scale before we, and then prove the value and then roll it out as a service to the whole community. So price probably isn't a big issue for you guys. I mean, Partners, not saying you're going to just, you know, write drunken sailor type checks to the vendor. But for the most part, your risk is mostly on the architecture deployment, less on price sensitivity. Most vendors, I mean, most customers have that risk, right? Yeah, absolutely. I mean, the long-term supportability is key to be able to roll out that service. So how are you guys doing on POCs now, just kind of in general? Have you moved out of POCs? Are you in production with big data? Is it mindset-wise working? How are the teams?
Give us a taste of some of the dynamics involved in being on the front lines like you are. So again, Partners is a huge organization, 50,000 people. There's big data taking place all over. But in the research space, we're in POCs. So like they say, we have these great ideas bringing together medical records, genomics, all of the research data sets that are out there. And we're running POCs in several different verticals. And they're all really exciting. And we have a whole, yeah. What is EMC doing for you guys? Because obviously they have analytics, they have Pivotal, they have EMC. What are some of the things you guys can share that you're working on together? Yeah, sure. Yeah, so EMC have been great partners. They come along with me, they meet the researchers and we architect out something with the EMC consultants. What are we working on? Specifically cancer genomes, DNA, medical records. And like you say, all the data sources that go with that. We're working on, so on kind of the more business side, sort of taking the medical records, can we predict trends? Can we identify what's taking place now? Sort of in patterns. Is it a service catalog? Is it an application? I mean, is it analytics? Yeah, so the service catalog is storage, compute and data services. And we're putting that together on the Business Data Lake platform. So that the researcher or the end user can come along; we're aiming for self-service, self-provisioning. You can request your storage, your compute, your analytics platform that you need. And data services, whatever data set that is that we have, we want to be able to provide that to you via that portal. And that's the vision that we're heading for. EMC shares our vision with us really, and so it's been great working with them. Chris, any color on that? Yeah, no, I think Partners represented for us, from a solution set, kind of the ideal customer, right?
They had the same vision that we do, which is, and it kind of ties into your earlier question about sort of the trends at the show. One of the things that we're focused on is how do you consume analytics even for a non-data scientist, right? And a lot of our customers, Partners is kind of an anomaly in that space. You've got a ton of data scientists. Most of our customers might have one or two business intelligence guys or database type guys, right? Yeah, scale that operation up. Scale that operation up, and really it's about being able to consume data science without a data scientist. And so our data and analytics catalog that's built on top of the platform is that consumption model, and it just tied in really nicely with the vision that John laid out for Partners. So John, I got to ask you a final question. What is your take and summary of Big Data NYC this year? This whole week is all about big data. You've got our event going on and Strata and Hadoop. What's your takeaway? I mean, what's the aroma? What's the vibe? Where's the state of the industry? Just what's your quick take on? What's the summary this year? How would you log this into the file in terms of what's happened? I think there's great things going on around Spark, for sure. It was very exciting this morning to hear the emphasis on precision medicine right up at the national government level. I thought it was fantastic, and that's what Partners is doing and committed to. And security last year was, security was a big piece. Like what was taking place around security was really going to enable us to do what we need to do. The conference is just getting going. So I'm excited to find out what's going to happen in the next couple of days. Chris, what's your take this year finalized? Big Data NYC, what's your take this year in terms of what's happened? What's the walk away message? Yeah, absolutely.
I think this is the year that big data turns the corner, stops being something that people talk about, starts being something that people do. And we've really seen a huge uptick in adoption. And I think the big takeaway from the show, I agree with what John said. People are, it's about beyond Hadoop, right? We are, that's the platform, we got it. Now let's take that to the next level. So I'm really excited to see the development that's going on. It's awesome, yeah. As Jimmy Fallon was saying, it's huge. That's done, I'll jump back to it. But that was a good skit. Anyway, guys, thanks for coming on. Really appreciate you taking the time to share your insights. EMC and Partners Healthcare here inside theCUBE, sharing the future of healthcare, big data and technologies. We'll be right back with more after this short break.