Live from Las Vegas, extracting the signal from the noise. It's theCUBE, covering IBM Insight 2015, brought to you by IBM.

Paul Gillin here at IBM Insight 2015, Mandalay Bay in Las Vegas. We're wrapping up the day here; it's getting toward the end of the day. It's kind of emptied out here in the hall, but we're still hopping on theCUBE. Got plenty to talk about still. I'm here with my colleague George Gilbert, a Wikibon analyst, and joining us is Sean Pooley, who's VP of Product and Offering Management with IBM Analytics. Is that a correct title for you, Sean?

That is correct.

And Sean spoke today at the afternoon keynote about complexity and the cloud and simplifying things, and the cloud being the next point. Why don't you just give us the 30-second overview of what you talked about?

Yeah, sure. Thanks for inviting me, Paul and George, and I guess you're saving the best till last.

Absolutely.

So, look, there was a time when it was really nice and simple, wasn't it? I mean, you had this good old structured data, and it lived in one place, and now it's radically transformed. It's hybrid in just about every way. It's on-premise and it's off-premise. It's structured and it's unstructured. It's easily accessible from wherever you happen to be, and you can interrogate it in any particular way you want from wherever you want. And so we've gone from this very simple environment to one that has become necessarily more diverse, but at the same time, more complex.

Gartner made some waves earlier this year with a report that said that Hadoop was basically floundering, that while awareness was high, actual usage was low because of complexity and just the skill sets required, the learning curve. Are you seeing in your customer base that this complexity is a barrier to adoption?

Without a shadow of a doubt, there is absolutely tons of complexity.
If you think of what Hadoop is, it's really just a series of different packages, 15 or 20 different packages, sometimes integrated, sometimes not. I think we're into a couple or three dozen or something now. You lose count, right? They've all got funky names as well. And so that's a degree of complexity in and of itself, and just even having the open source itself integrate is a challenge. That's in fact why we initiated the Open Data Platform Initiative. It now has something like 30 or 40 companies, and the whole idea was to create a standardized set of Hadoop packages so that our clients could build analytical applications on Hadoop knowing that they could move them from one distribution to another, because open source doesn't mean standards. Open source can be just as locked in and proprietary as any other kind of capability. So that's one form of complexity that's holding our customers back, and there are many others we could talk about.

You mentioned something interesting with the word platform, and ISVs. How is the definition of a platform, an analytics platform, changing? Or do we even have one that ISVs can target? What might that look like?

That is a great, great question. So again, when it was simple, you'd maybe just have a relational data warehouse and maybe a business intelligence tool over the top of that, and that was your analytics platform. Now you've got a variety of different data sources, some structured, some unstructured. You want to bring in that exogenous data that you've been hearing about, from the weather and from tweets and what have you, into that environment on which you'd want to build some kind of intelligence. And the analytics layer itself is changing at the same time. So not only do you have transformation happening in the data layer, there's transformation happening in the analytics layer. If there's one layer that we would want everyone to be building to, it's Spark.
We're investing very heavily in Spark. We see it as, if you like, the analytic operating system, and the reason we see that is it has the means to abstract away from the underlying data infrastructure. So as the data infrastructure proliferates, having one platform on which you can build your analytics is a great thing to do.

I was just going to say, when you guys did the Spark announcement, we were actually there at Galvanize that day, and we were broadcasting, and we were very excited. In fact, we thought it was as big an announcement as the commitment to Linux 16 years before. But then people are already saying, well, there are new things coming along, like real, real time for the internet of things, where we don't have to do these micro-batches, whether it's Flink or whoever. How long can Spark take us into the future?

I think quite a long time. So, how do I think of Spark? Yes, there are streaming analytics engines with lower latency than, say, Spark Streaming, but they come with a higher degree of skill requirement. The value of Spark to me is not its complexity or its simplicity. The value is its abstraction layer across multiple data sources. Think of it another way. When the telephone was first invented, how would it be if every time you wanted to phone someone, you all had to be on exactly the same infrastructure, so it would just be point to point? You can imagine what a mess that would be between you and me, Paul, and you and me, George, and you and Paul. So the beauty of the Spark layer for us is it allows you to build one common analytics layer, or set of analytics, that can interrogate the data wherever it happens to be, rather than the old world of building your analytics specifically for a relational model, for IBM, for Oracle, for someone else, and then building it again for somebody else in the NoSQL world.
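The abstraction idea described here can be sketched in miniature. This is not Spark code; it's a toy Python illustration, with made-up source names, of writing one analytic routine that runs unchanged over rows from different kinds of stores, which is what Spark's DataFrame API provides for real and at scale.

```python
# Toy illustration (not Spark itself) of one analytics layer over many
# data stores: the analytic is written once and never cares which
# backend the rows came from. Source names and data are hypothetical.

def rows_from_relational():
    # Stand-in for a relational source, e.g. a warehouse table.
    return [{"city": "Vegas", "sales": 120}, {"city": "NYC", "sales": 80}]

def rows_from_nosql():
    # Stand-in for a NoSQL/JSON source.
    return [{"city": "Vegas", "sales": 40}, {"city": "NYC", "sales": 60}]

def total_sales_by_city(*sources):
    """One analytic routine, written once, that runs over any source
    yielding dict rows -- the point of an abstraction layer."""
    totals = {}
    for source in sources:
        for row in source():
            totals[row["city"]] = totals.get(row["city"], 0) + row["sales"]
    return totals

print(total_sales_by_city(rows_from_relational, rows_from_nosql))
# -> {'Vegas': 160, 'NYC': 140}
```

Swapping in a third store means adding one more row generator; the analytic itself never changes, which is the lock-in avoidance being argued for.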
That's highly ineffective as far as we're concerned. So this common layer, and the ability to interact with all of the different data stores in an abstract way, is a big, big deal.

This is sort of like what the Group 3 fax standard did for the fragmented fax industry many years ago. Who are you talking to these days? Who is the customer you're trying to reach? Who's the key contact making decisions right now in the enterprise related to the big data and analytics products that you sell?

That's yet another fantastic question. I'm going to run out; I think I get three fantastic questions, then I'll have to do something different.

No, you can have as many as you want.

Look, the chief analytics officer or chief data officer is redefining the landscape of what's happening in most companies. That's one kind of buyer. Clearly there's a set of new developers building a new set of analytic applications; that's another. And then the data scientists themselves. Three very related constituencies. If I look to the chief data officer, there is no chief data officer or chief analytics officer that isn't looking to the future, not next year, but five and ten years out. If you look at what's causing that transformation of the data layer, it's really the access to connectivity. So when we had dumb screens, you had a certain level of scale. When you went to client-server, you got the next level of scale, putting more pressure on the data infrastructure. Then we went to kind of web scale, and that brought the next. And then mobility again brought more pressure. And now we have connected devices, and there's no industry that's not looking at how to bring in connected-device, or as some people would describe it, IoT capability. That's just blown the doors off the back-end infrastructure. And everyone is talking about the scaling of this back-end data and the dark data that's coming into the enterprise.
So there isn't a chief data officer that isn't looking five and ten years ahead and thinking that the scale of data we think of as large today, hundreds of terabytes and petabytes, will be something very small and trivial at some stage in the future. So every one of those execs is looking for this next generation of data architecture. Now, what do they want to get out of it? They want to get new insight out of that new data. We've been talking about, for example, marketing to the market of one. But how do you do that if you can't pull together all of the entity data around Paul? You must have, I don't know, you probably don't even know how many digital footprints you've got out there. But in order for people to know who you are, to be able to sell or market to you, they need to bring all of those things together. So one of the reasons the chief data officer or chief analytics officer is looking at this new scale of data infrastructure is because they want combinations of the new data with the existing transactional data to get new insights, to be able to personalize offers, to know when to make an offer to you and when not to. Now, we heard a great example, I think, from the Twitter guys today: on Monday and Tuesday, people don't ask about ice cream or talk about it. On Wednesday, it kind of builds up toward Saturday. Who knew? Not in my house, obviously, but who knew? So anyway, the chief data officer is a primary buyer, and there are two others we could talk about.

Well, we'll come back to the other buyers, but I just have a question on this immense scale of data, and this analytic pipeline that works in two directions: one is scoring the best offer to make, and the other is building the model that decides which offers to make. When you're dealing with tens of petabytes, perhaps more, how difficult will it be to keep that model up to date?
Okay, so first of all, let's just talk about the scale of the data. It's funny, actually, when people talk about big data, they say they hate the term, but how big is big, right? So it's reckoned that in the world today there's something like seven or eight zettabytes of data. Whatever that is. Who knows, right? And by 2020, it's reckoned to be 35 or 40. The funny thing is, if you actually ask most human beings what a zettabyte is, they can't tell you. So just to give you some context, a zettabyte is enough paper to cover the continental United States and Alaska seven feet deep. That's one zettabyte, and we will probably have 35 or 40. Now, the big data challenge is like a needle in a haystack. And so we're going to see a new set of technologies that can actually resolve entities automatically, because the idea that you're going to be able to figure out all of the connections between all the exogenous data yourself just isn't realistic. So what does that mean? Well, if you were a small company, you might have had a small data set. And if you were a big company in the old days, you would have had a big data set. Well, the reality is, any piece of data can be relevant to any company no matter what its size. So what that means is, whether you're small or whether you're large, you're going to want as much access to as large a corpus of data as possible. That's one. Two is you're going to need technologies that actually find the relationships in the data itself, and then have that data find a relationship to you. That's how it's going to change in the future. And we have that technology operating today. We use it, for example, in anti-money-laundering solutions. It's all about finding, in intelligence terms, the weak signal in the noise. And it's the same when I'm trying to find the right thing to market to Paul or to you.

Isn't that what Watson is all about, finding signals in the noise?

Precisely.
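The entity-resolution idea mentioned here, pulling together a person's scattered digital footprints into one entity, can be sketched very simply. Real systems, such as IBM's entity analytics, use far richer probabilistic matching; the fields, names, and matching rule below are invented for illustration.

```python
# Minimal sketch of entity resolution: link records that describe the
# same person across data sets by normalizing an identifying field.
# A production matcher would weigh many noisy attributes, not one key.

def normalize_email(record):
    # Case and whitespace differences shouldn't split one person in two.
    return record.get("email", "").strip().lower()

def resolve(records):
    """Group records sharing a normalized email into one entity."""
    entities = {}
    for rec in records:
        entities.setdefault(normalize_email(rec), []).append(rec)
    return entities

# Hypothetical digital footprints for two people across three systems.
footprints = [
    {"name": "Paul G.",    "email": "PAUL@example.com",   "source": "twitter"},
    {"name": "Paul",       "email": "paul@example.com ",  "source": "crm"},
    {"name": "G. Gilbert", "email": "george@example.com", "source": "web"},
]

entities = resolve(footprints)
print(len(entities["paul@example.com"]))   # -> 2: both Paul records linked
print(len(entities["george@example.com"])) # -> 1
```

The point of the toy: the two Paul records only become "the market of one" after normalization joins them, which no human would do by hand across zettabytes.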
So you're going to find that this entity resolution, entity analytics technology, is going to become quite pervasive, because the chance of us as human beings finding the relationships in the data is going to become infinitesimally small as the data becomes unimaginably large.

I want you to finish answering the earlier question about who the customer is. You talked about the chief data officer. You said there are a couple of others. Who are they?

So this relates very much, in fact, to George's question about the model. We have this growing profession of data science and data scientists. Some would say that machine learning and artificial intelligence will obviate that as a profession. We don't think so. At least if it does, we don't see it happening anytime soon. And so that community of individuals are the people that we see building those clever models. A good example would be, say, Uber. You have a set of developers at Uber that build the application, but you have a set of data scientists that build the model, which tells Uber how to do that price increase when it starts raining. Very clever data model, which does the surge pricing when it's raining in New York. Normally if I get a car from Battery Park to Grand Central, that might cost me $6. The other day I did that and it was raining; it cost me $30. That's really clever data-modeling technology in application. So there are many industries that can take advantage of that kind of cleverness. That's the data scientist profession. Secondly, there's a whole new realm of analytical application developers, people building analytical applications in their own right. We went through the transactional and operational application era, then we went into more of the engagement apps. Now there's a whole set of new analytical applications.
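The surge-pricing anecdote can be made concrete with a bare-bones sketch: a fare multiplier driven by demand versus supply, amplified by weather. The formula and every number here are illustrative assumptions, not Uber's actual model.

```python
# Hypothetical surge-pricing sketch: multiplier rises with the ratio of
# ride requests to available cars, and rain amplifies it further.

def surge_multiplier(ride_requests, available_cars, raining=False):
    demand_ratio = ride_requests / max(available_cars, 1)
    multiplier = max(1.0, demand_ratio)   # never discount below base fare
    if raining:                           # bad weather amplifies demand pressure
        multiplier *= 1.5
    return round(multiplier, 2)

base_fare = 6.00  # e.g. Battery Park to Grand Central on a quiet day
rainy_fare = base_fare * surge_multiplier(30, 10, raining=True)
print(rainy_fare)  # -> 27.0, in the ballpark of the $6-to-$30 anecdote
```

The data scientist's real job is fitting the shape of that multiplier to observed demand, which is the modeling work being described.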
Ones like the ones you saw today, like the me.com application, where it can intelligently understand, in natural language, what you mean when you say a fruity red wine. And that's a whole new class.

Yeah, and just to touch on that, this is an app that records your wine preferences and then presents you with personalized recommendations based upon the terms that you use, and it can actually be integrated with the in-store inventory system, which was demonstrated this morning.

Exactly, and that's a good consumer example of the insight economy. Another class of application would be the kinds of predictive applications that we have built with people like Pratt & Whitney. We analyze all of the data streaming off of Pratt & Whitney engines as they fly. We do predictive analytics on that data while the engines are on the wing, so that we can predict what maintenance is gonna be required and all the right parts are in place when they land at the airport they're gonna land at. That's not an operational application in a sense, but it is purely based on analytics and predictive technologies. So that's another example.

Two great stories you told there, but how capable do you find your customers are to build those kinds of applications? I'm not talking about Pratt & Whitney specifically, but you talk about predictive maintenance, a huge cost-saving opportunity. How well prepared are the enterprises you work with to take advantage of these opportunities?

Not as well as they could be. My observation is that what I'll describe as round one of the big data and analytics transformation has been very technology- and tool-centric. Round two is gonna be more about the application of the technology.
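The predictive-maintenance pattern in the engine example can be sketched as scoring each sensor reading in-stream against a rolling baseline and flagging drift worth pre-positioning parts for. The thresholds, window size, and readings below are invented for illustration, not Pratt & Whitney's actual analytics.

```python
# Sketch of in-stream predictive maintenance: flag readings that drift
# beyond `tolerance` times the rolling mean of recent readings.

from collections import deque

def maintenance_alerts(readings, window=5, tolerance=1.5):
    """Return (index, value) pairs for readings exceeding the rolling
    baseline by the tolerance factor."""
    recent = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if value > baseline * tolerance:
                alerts.append((i, value))
        recent.append(value)
    return alerts

# Simulated vibration readings: steady flight, then a spike that would
# trigger ordering parts before the aircraft lands.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 1.1, 2.4, 1.0]
print(maintenance_alerts(vibration))  # -> [(6, 2.4)]
```

The same shape, a rolling baseline plus an anomaly rule evaluated as data streams in, is the core of most "fix it before it breaks" applications.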
So we know, for example, that building any of these applications requires data engineers to wrangle data, and more than 80% of any such project can just be about trying to find the data, integrating all the different tools, et cetera, et cetera. Well, as the old adage goes, time is money. Saving $100,000 on a tool when you're spending tons of money on building and integrating doesn't make sense. I'll give you a good example. One of my clients, I can't tell you who, is about to hire 500 data scientists. 500 data scientists probably cost something like $200,000 a year each, something of that nature. And some technology people would be debating: do they buy a tool for $2 million or $4 million? What difference does that make when you've got 500 data scientists? In fact, if one tool is two, three, four times as productive as the other and you're haggling over the price, you're looking at the wrong metrics. So in the world of the insight economy, hyperspeed is what matters. And so to your point, Paul, having integrated technologies that allow you to seamlessly integrate open source with existing systems, which is necessary, whether on cloud or on premise, and for you as the end user or the business analyst to be able to ask questions of the data without even knowing where that data is, that's where the market is going to go in round two. And that's where IBM has a distinct and clear competency over all players.

We're out of time. Sean Pooley, VP of Product and Offering Management at IBM Analytics, I'd like to thank you for being here. It seems we haven't run out of topics to talk about in a full day of talking analytics here. Some great examples you've shared, and perspective on who's buying analytics. So thanks for joining us.

My pleasure. Thank you very much for having me.

We'll be right back with our wrap-up of day one here at IBM Insight 2015.