 Live from New York, it's theCUBE covering Big Data NYC 2015. Brought to you by Hortonworks, IBM, EMC, and Pivotal. Okay, welcome back everyone. We are live in New York City. This is SiliconANGLE Media's theCUBE, our flagship program where we go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE, and we are here covering Big Data NYC, our event that encapsulates and it's in conjunction with Strata Hadoop and all the Big Data actions here. It's all about Hadoop, Spark, where Big Data is going. We got it covered here on theCUBE. Our next two guests is Bill Schmarzo, the Dean of Big Data, CUBE alumni. Great to see him again. And Aiden O'Brien, general manager of Big Data at EMC. Guys, welcome to theCUBE. Aiden, welcome to the first time on theCUBE. Appreciate it. Thanks John. He's a rookie, he's got to break him in. Bill, we knew you when, when you were just a smart guy. Now you're the Dean of Big Data, teaching MBA classes. So much has happened over the past six years of theCUBE. You've been on every year. I want to thank you personally for coming on. It's been great. Now you're like the star analyst. Appreciate it. So give us the update real quick. MBA program, what are you teaching? How has this evolved? Second book, what's the status of what's going on in your world? Yeah, so the MBA class, as you know, I teach at the University of San Francisco School of Management and actually they've officially announced me as a fellow of the program now. So I've officially have been anointed something which means you don't make any money but you get a fancy title. 
And what we've learned from the MBA class is that the challenge we're seeing with our customers, and you can replicate it in the class, is that while organizations are trying to hire data scientists, they're missing an opportunity to teach the business users to think like data scientists, to really create a more collaborative environment where the business people and the data scientists work together to try to identify those variables and metrics that might be better predictors of performance. Yeah, and we have Aiden O'Brien here, who was a professional football player, football being soccer in the UK. Great to have you on as well. Good to see you. Thank you. Teamwork is critical in soccer, or football, as you call it. Like any sport, you'll mention teamwork. This is what's happening right now in the big data world. The word integration is big. And even today and yesterday when we were on theCUBE keynote today, it's official. Hadoop is now under the hood. It's relevant. It's one of many pieces of the ecosystem. Now, we're talking about outcomes. We've heard this on theCUBE: outcomes, all about outcomes. But when you start hearing outcomes, that means checks are being written. That means stuff has to work together. So talk about what you're seeing in that regard, the teamwork involved in regard to technologies and how you guys are approaching this. So yeah, for the last few years now we've recognized, if we look back to where it started, everybody was thinking big data is about the data scientist. But then if you look at one of the reasons why so many data scientists went off and got sick of the job, it was because they're spending 80% of their time doing non-math, non-statistics work. And so we started to recognize that big data is a team sport. It's not just the data scientist. There's a data engineer. There's a data architect. There's the IT operator. There's the business analyst. There's the executives who sponsor this. 
So until you actually address all of those different skill sets and bring all of them together, you're not going to be able to actually take advantage of the opportunity that big data presents to companies. So big data team sport, what examples can you share with us where you've seen it play out in a positive way? Ah, so you see examples in all different industries. So the most recent one that springs to mind is around customer satisfaction with an airline. And what was happening was they thought there was a particular reason why their customers were dissatisfied compared to what other airlines were seeing. But it was only when you actually managed to bring all the data together from a variety of different sources, then actually did the analysis on that with the data scientists, and then actually built the application on top, which allowed those front line workers to change how they're receiving customers when they get on the plane, that they could improve that customer experience. You need all of those different skill sets all the way through to the front line of the business to actually be able to have that positive outcome. Bill, you're teaching, obviously, students, but you also talk to a lot of customers. What's their view? Right now, they're looking at what's going on at Strata Hadoop here at Big Data NYC. They say, hey, I'm getting a lot of signal from theCUBE and all the guests, and I see all these keynotes, all this metadata is coming at me, data. I'm getting more confused. I'm getting more full of all this talk. They've then got to go back and figure it out. What are you hearing? What's the mindset today of that CXO in the enterprise, the practitioners who want to get the ball rolling and have these big projects? John, I think you summarized really well that they want outcomes, and standing up technology is not an outcome, right? Building a data lake is not an outcome. 
What they want is to basically improve the decisions they're making. They want to improve customer retention. They want to improve predictive maintenance. They want to reduce hospital-acquired infections. So we're seeing organizations very much focus on not the technology, but on the business ramifications and the outcomes. And one thing I'll add to the conversation regarding collaboration. Innovation and creativity is a team sport. What I mean by that is that the best ideas, the most innovative ideas, aren't held captive by the senior executives of an organization. We're finding that the best ideas sometimes come from the lower ranks. The people who on a daily basis are touching customers, are working with patients, are teaching students, right? This collaboration is required because big data, this availability of data, is challenging organizations to leverage all the people assets of the organization to try to identify and uncover those variables and metrics that might be better predictors of performance. I want to get your thoughts on that point. Let's double-click on that. Because the drill down on that conversation is, in the old way, data was a competitive advantage. You hoarded the data. I put it in a stovepipe or a silo. Even today, you can argue that data's a competitive advantage. You see companies like Twitter, Facebook, and the social networks hold their data. And it's in a company, that's power. Not sharing is power in the old model. Now, take your thesis, which is sharing is power, sharing economy, whatever you want to call it. But the data model that we're seeing from successes is that when you integrate data together, more outcomes, more visibility comes into insights. So that insightfulness of sharing and integrating data actually is the new power base. So, how do you, well one, do you see that still being a problem? 
And two, how do companies get over from hoarding data, being a data silo, to letting it go? Now, there might be compliance reasons why they might want to have stuff parked in the silo for whatever reasons. But the thesis is share, integrate. So how do you see that market and what are companies doing to overcome that? So, I'll take that first one, because I know you've got ideas and thoughts on it as well. One of the things is that when we get involved with customers, we know almost immediately whether that customer's going to be successful at big data or not. And the first thing that jumps out is the willingness to share data. If you have people who have siloed data and don't want to share it, you know that that is doomed. Let me give you an example. Let's say you're trying to calculate the customer lifetime value for a customer. And in your financial services organization, you've got credit cards, you have mortgages, you have savings accounts, you have brokerage accounts, you have all these accounts and they're all hoarding data. And a customer who's not important in four of those five different business units might be very valuable to the fifth one, and in fact across all five of those may be a very valuable customer. But when you think about your business as a silo versus what we call strategic nouns, right? What's the strategic noun of a financial services business? A customer, right? So you want to know as much detail as possible about the customer's interactions with the entire organization. And if organizations can't broaden their perspective and share that data, they're going to be outwitted by the companies that can share data. Is that a technical issue or a policy issue or a personality issue? It's culture as much as anything else. I mean, I think until companies start to adopt some type of outside-in thinking, which is, well, what does it feel like from a customer perspective? A customer doesn't... 
There's all these different divisions in an enterprise. They care about the experience they have with that company. So for as long as that client can't actually understand and assimilate what that experience is for the customer, then they're on a rocky road. I want to build on that point. I think what's interesting, and we're going to talk about it, I'm sure, is this idea behind a data lake. And what we're finding is that the impediments to a data lake are not technology. It's cultural, right? And the ability for an organization to be willing to bring all different data together into one location. So the organization has a holistic view of their customers, their employees, their outlets, their trucks, their ATMs, their jet engines, right? Whatever their strategic nouns might be. So the data lake is really forcing organizations to confront those cultural issues of how am I going to share data in order to make better decisions and to uncover new monetization opportunities. Yeah. So talk further about that, because I want to get more into the mindset of the customer because, again, back to the power base of sharing. If people are sharing, what technologies are out there? Because data lake, let me back up. Yeah, let me back up. Answer this question. We've heard this on theCUBE yesterday. Data lakes are becoming data swamps. So, meaning, I've been storing it on Hadoop, been pouring it out. Is that a question or a statement? Yeah. That's a statement that happened on theCUBE, but I want you to answer why that's happening. It kind of pollutes the data lake vision, but I don't mean to be anti-EMC, and I want to point out that people are forming data lakes, or I think they're more like data oceans, that's my personal opinion, but you have a data lake, but if they don't take care of the data or the process, that came up on theCUBE. How do you guys talk about that? How do you have a clean data lake? So, I'll chip in and then I'll let you go. 
Yeah, all right. Who's going to go first? I mean, the first fight on theCUBE, here we go. All right, exactly. We are in violent agreement on this topic. I think it's interesting coming here this year. I think there's an increased level of sophistication around what people perceive a data lake to be. Traditionally, it's always been just about the storage and everybody's given it, but what about this data swamp? It's impossible to govern. Just throwing it in there isn't the answer. Obviously at EMC, for a number of years now, we've understood the importance of data governance and managing that, the security of data, being able to publish it and provision it, and that's what we're obviously doing with our platform. But I think the rest of the market is catching up now and understanding that it's more than just throwing it in one single repository. Yeah, I mean, I'm not surprised that organizations have these data swamps, because what is the basis for deciding what data goes in it? It's random. What we do is we take a very thoughtful process for working with clients to understand from a business perspective what data is most important to them, what decisions they're trying to drive, what problems they're trying to address, or a hypothesis they're trying to test. When you take that approach and you say, okay, given the hypothesis I want to test or the decisions I want to make, what data do I need to drive that? And if you have that kind of a thoughtful process, the data that goes in will be there to support that. So it won't be a dump. And by the way, what Aidan said is, you've also got to contemplate things like data governance and data security and data lineage and data catalog. All the hard stuff around the edges is not fun but is necessary for organizations to trust the data in the data lake in order to support the decisions they're going to make. Okay, it's great. This leads me to my next level of question, which is where the value is, right? 
The value's in the analytics. We've heard that on theCUBE. That's a fact. We forgot that it happened on theCUBE. So let's talk about that. So data lake implies, okay, I'm going to store data in a lake and I'm going to put it on disk or some form of media. I can put compute to it. I can do all kinds of things. That's the systems of record kind of world. You move into systems of engagement where you have much more interactivity. That could be the mobile app. You brought up the financial services value of the customer example. The analytics price point and the architecture involved should be related to the value and the price charged for that service. Meaning, if I can give you real-time information about the value of a customer happening in real-time, you'll pay more for that, if I'm a customer to a vendor, versus storing stuff in a data lake that might be stored on storage. So the analytics markets are looking at business model innovation, saying, hey, what is the tiered pricing for value? So talk about the value tiers in analytics from the data lake. So a couple of thoughts based on that, John. I think for me, I'd like to extend that idea, because just doing the analysis and just doing the analytics and getting the insight, that's not actually enough. You've actually got to do something useful with that insight. And what's very much in fashion and in vogue at the minute is building these data-driven analytical apps that we've all got on our smartphones. Just doing the analysis isn't actually enough. And then afterwards when we're talking about the pricing models around real-time being more valuable, I think fundamentally it's the right time around data. Just because you have data in real-time doesn't mean you actually need it in real-time. And therefore that undermines the value type conversation. So for me, it's much more about, you know, getting data at the right time, and that's how you should drive the value conversation. 
Well, I know this conversation is going well because my iWatch is going crazy, because I can tell when I'm getting the tweets. Twitter's blowing up from the conversation because my watch is going off. It's like a little signal, keep the conversation going. I thought your hand was just doing that all the time naturally. So I want to go back to this point because again, I believe, and I've said this on Facebook one time, that with analytics, the beauty's in the eye of the beholder. The customer can decide what the value is at the end and what they will pay for that value if they're buying it from, say, a service or vendor. So the question comes in, the different architectures. For instance, if you're in real-time financial services, you might want millisecond access to certain data because that's going to give customer value, whether it's fraud detection, could be medical, could be health. That value is so valuable that I might pay for certain architectures versus saying, hey, I have access to information on records where it might take, run a report, get that a day later. So this is where we start to see the analytics market kind of waffling back and forth: how do I enable that enablement? So here's the approach that we take, John. When we work with clients, we identify the business problem they're going after or a key business initiative. And then what we do is we take that key business initiative and we decompose the data events that comprise that. And what we're looking for are data events that, if we could identify them sooner, maybe minutes sooner, maybe hours sooner, maybe days sooner, provide an opportunity for us to act and monetize. Let me give you an example. We're doing a project for a hospital, and the hospital's key business initiative is how do I reduce hospital-acquired infections, staph infections, right? And so what we've done is we've created an HAI score on every one of their patients. 
When a patient comes into admissions, we figure we've got between four to five minutes to have the nurses check out the patient looking for two things. One, do they have any open wounds? And two, is their treatment going to require a catheter? If those two, in combination with the score, put the patient over a certain level, the nurses are authorized to take them directly out of admissions and directly into a room, right? Don't want to get sick. Admissions is the worst place to be because everybody's sick there, right? So some decisions have to be made in sub-second, but some decisions you literally make in minutes or hours, if I can identify that sooner so that I can act on it sooner. So a lot of what we do in trying to figure out, from an architecture perspective, how fast does it need to be, is based on how fast does that decision need to be. We don't need sub-second for that decision to decide what their HAI score is going to be. We've got four or five minutes. And what's your take on the marketplace with respect to that? Because what he's bringing up is real time, right? You've got to get good analysis. This event is Strata Hadoop. It was Hadoop World, then called Strata plus Hadoop World. Now it's called Strata Hadoop. Should it be called Hadoop, I mean, Spark World? I mean, there's a lot of buzz around Spark, right? So that teases out the notion of faster analytics, custom analytics, cloud analytics. I think for me, it's all part of a maturation which we're seeing in the market. Before, obviously, as you say, it was much more Hadoop focused. Now we're seeing there's other types of data processing frameworks, and people are, you know, able to have a more sophisticated conversation in that space. And as a consequence, people are also seeing that other parts of the architecture are more important. 
I think, you know, especially you see with Cloudera this week, with Kudu, you know, and other types of things like Tachyon, people are starting to say the storage substrate is more interesting now as well. I think just generally, people are becoming more mature. Best validation for EMC. Storage is now important. Again, that's kind of right. We've got to put the data somewhere. I mean, Cloudera is literally saying, okay, storage, Hadoop's the storage layer. It's not even, I think, especially as it moves into the enterprise, people understand that, you know, the science for Hadoop of yore is fantastic for those, you know, Googles and Facebooks of the world, but we need something different for enterprise. And obviously that sort of plays a very nice place. So Hadoop, I mean, we heard that today, yesterday, it's a storage layer and that Kudu is just a way to say, HBase is too hard to work with, custom suit, the IN stuff, HDFS is important, Hadoop's going to be there. What are you guys doing to enable Hadoop? What specifically are you guys doing to say, okay, because you're going to store it on EMC drives. EMC is the leader in storage. So what's the storage layer story for EMC and Hadoop? So it's really interesting. Obviously each of our individual product divisions, you know, are working in that space and there's a number of different innovations in those areas designed specifically at helping enterprises, you know, whether it be from an ease of deployment, security or whatever that might be. You know, within the solutions part of the organization, we're actually looking to bring all of those different platforms together, because we understand that different big data workloads require different platforms underneath. So what we're seeking to do is to provide a single platform. So regardless of what outcome you're seeking, regardless of what use case you're going after, you have a single platform that can support all of those different use cases. 
So make it frictionless for the buyer. Absolutely, so if you go back to the team sport, you want a seamless experience for all of those different players in your team, data scientist, app developer, et cetera, et cetera. So I'm the CIO or CXO, you know, Aiden, Bill, sold me. I'm going to use big data. I heard this Hadoop thing's pretty big and it's invisible, but you guys abstracted away the complexity. How do I consume this? What do I do? What am I buying? Drives? Am I buying software? What am I doing? They're going to consume the big data. Well, there's obviously- There's products that you're selling to this most of their outcome. Products, what we tell clients is you're buying better decisions, right? I mean, and then what you're trying to do is you're trying to optimize key processes and monetize your data to find new revenue sources. So they engage with you guys, professional services, on the front end? I mean- We do a lot of consulting up front. We run this concept called a vision workshop with our customers, which really is a two to three week engagement where we make sure we understand and identify where and how they're going to make money, what parts of the business. We go from there, we go through a proof of value, not a proof of concept, but a proof of value to show them what the analytic lift is, show them the ROI, and then we operationalize it, right? Which- I mean, that process takes what? You said two, two months? The vision workshop's a two to three week engagement. The proof of value is probably a 12 to 14 week engagement. And then the operationalization with our data lake is probably six months or so. Did you do a promotion? You could do a data lake in a day. What was that? I saw a promotion. So that's the Federation Business Data Lake. So that's the platform. You stand up that platform, and the empirical evidence is showing that most enterprises are taking six to nine months to stand up their environment. 
With this pre-integrated solution, you can stand it up and you're starting your analytics within a couple of days, not six to nine months. Yeah, that's a game changer. Yeah, a game changer. I mean, when you show them in the vision workshop how much money they can make, time to value becomes everything. More cost, too. I mean, ridiculous. How fast can you do this? How fast can you do this? All right, so let's go back to sports to end the segment here. Let's talk about sports and data. Are customers in shape to handle the big data tsunami that's now here, that was coming for many years? But now the reality here at this show is, Hadoop is beyond Hadoop now. It's a done deal. It's well done. But people want real reality. People are writing checks. As Merv Adrian from Gartner said, there's not a lot of production, it's still ramping up. So the POC, we've checked that category. Now we are in the reality phase. Are the customers in shape to run, to play that soccer game? Or are they in shape to handle that fastball called outcomes? In fact, John, the thing I would challenge on, you said the POC, our customers aren't doing POCs. They're doing POVs. They're doing proof of value. They want to prove that the analytics can prove. They want to prove the ROI. And we're seeing ROIs, 700,000 percent. Okay, let me rephrase it. Are they prepared to operationalize this? Because again, you're back down to the reality, which is, okay, I'm convinced, I've got to write a big, fat check. I've got to operationalize the big data, which is not that easy. You've got a multitude of point solutions. It's under one holistic view. So my take on it is like, if I'm the manager of a soccer team, the situation I'm in is, I've got to go and get the right players on the pitch. And I think a lot of enterprises are now saying, okay, fine, we've got some of the players, but we don't have all of the players. How do we actually, you know, get the right skill sets? 
And then how do we actually put them in the right formation to go after this? And then afterwards, you know, the critical thing coming down the line will be, will those players trust each other, for each player to go and do their job? But that's going to be for six weeks. Dave and I always use football metaphors, but we'll stay with soccer here. Who's the star player on the big data team? Who's the striker? Who puts the ball in the net? Is that the CIO? Is that the data scientist? Or is it to be determined? So for me, it's, you know, it's that visionary executive, whether it be on the IT or the business side, who knows how to change the business, who knows the outcome he's going after. He's the guy who's going to get the limelight. And his support is his team. Exactly. And it's a team sport. Big data is a team sport here in theCUBE. We're talking about that here at Strata Hadoop with Big Data NYC. I'm John Furrier, with EMC inside theCUBE, live in New York City. We'll be back with more after this short break.