Okay, we're back. This is Dave Vellante of Wikibon.org, and this is theCUBE, SiliconANGLE's continuous coverage of IBM IOD. theCUBE is our live social media studio. We go out to all the events, we extract the signal from the noise, and we bring you the smartest people we can find. We try to get practitioners, people who work with practitioners, domain experts, bloggers, pundits, and of course we bring our own analysts and journalists as well. Go to Wikibon.org for the research, go to SiliconANGLE.com for all the news of the day. Check out SiliconANGLE.tv, check out Kristen Folletti's news desk; we're in the process of launching our 24/7 network. We've got a lot going on. You've got questions, hopefully we've got answers. And if you're a big data practitioner, or you want to be one, this is going to be a great segment to boost your big data IQ. This is Dave Vellante, and I'm here with my co-host, Jeff Kelly, also from Wikibon.org. We're here with Paul Zikopoulos, Director of Client Technical Professionals in the Information Management Group at IBM. Welcome to theCUBE.

Thank you.

So tell the audience a little bit about your role, and then let's dig right in and talk about what you're seeing on the ground in terms of big data practitioners.

Yeah, you bet. I'll tell you, it's been 20 years at IBM, which I think is rare for a guy my age, I guess.

You started at 15?

It feels that way. I spent 10 years in development and then moved out to the field. I started in database technology, pretty deep in there, wrote a number of books, and then moved into the big data world exclusively about a year and a half, two years ago. I really saw that growing market momentum, the chatter about big data, and probably a lack of understanding of what big data was.
And so now I run our organization of Client Technical Professionals. There are about 800 of us worldwide, and I have responsibility for those folks. We help implement our solutions and help teach our customers how to flatten the time-to-analytics curve with our solutions.

So you said you've written a number of books.

Yeah.

On which topics? Take us through some of them.

I'll take you through the whole biography... bibliography, rather. There aren't that many books.

What are some of your favorites?

DB2 for Dummies, DB2 certification guides, a book on governance, one on XML, the new book here at the conference, Harness the Power of Big Data, and last year it was Understanding Big Data. That was the book I wrote.

So what do you make of Oracle's Larry Ellison saying that Oracle is the first multi-tenant database? Do you agree with that?

I think if you look at Oracle 12c... so I used to lead the competitive team, so I know Oracle pretty well. I think anytime Larry says anything, there's a grain of salt to it. With 12c, I'd say the c stands for catch-up. If I look at the multi-tenancy, and I look at separation of concerns, separation of duties, those kinds of things, they've been in DB2 for a long, long time. There's a lot of catch-up there. There are a couple of nice things in the pluggable architecture that I'm looking into, but by and large, I don't think there's a lot there. It's mostly catch-up. I'm biased, but that's what I'll tell you.

Okay, so we had to get that in. All right, so tell us what's going on with customers. What are they talking to you about? Where are we at? Take us through that a little bit.

Yeah, in the context of big data?

Oh, yes.

Yeah, so look, what can I tell you? Big data is the hot, hot word, right?
So I was looking around LinkedIn the other day, and there's been something like a 60% increase in big data listed as a skill on profiles. Unfortunately, what that means is it's becoming so ubiquitous that it's going to dilute or fragment the understanding of it. So one of the things I find customers doing is putting their arms around what big data is. And I have to tell you, the term big data, I hate it. It's like the worst term ever, but apparently the industry has rallied around it. The problem is it implies that big data is only about volume, and big data is so much more than volumes of data. We look at things like its veracity, its trustworthiness so to speak, its variety, the speed at which it arrives at my organization. So that's the first thing I start with clients: let's level-set on what big data is, and I take them through that.

And then I have to tackle the client who's looking at big data as a science project, because that is a road to failure. I'm going to tell you that right now, folks. You have to attach it to a business need. So we see this struggle over what the business case for big data is, where we see the science project coming up. We want to change those kinds of things, obviously. But that's the struggle: give me the business case, give me the payoff, and how is it going to be different from what I'm doing today? Those are the key things.

So I'm inferring from your comments that a lot of people say, okay, we've got to do something with Hadoop, so let's go out and try it. And you're saying that's not the place to start.

Well, you know, it could be the place to start. The first thing I'll say is the biggest myth is that big data is Hadoop. Hadoop is part of the big data ecosystem, right? And so we're going to look at different types of technologies that are purpose-built or suited for different things. Hadoop is an at-rest engine, the same way a Netezza or a Teradata or an Exadata would be an at-rest engine.
I think big data requires in-motion as well. So look at our InfoSphere Streams, or look at Twitter's Storm, for example. So we have to have that discussion. But speaking of Hadoop, a lot of people are jumping on that: how do we get going on it? There are some good purpose-built things there that can solve some problems if that's your issue to start with. I want to analyze raw data, or I have data that has a very short shelf life. When I bring data into my warehouse, it's expensive no matter what the target is. I'll propose to you that we have the lowest cost-per-terabyte platform around for what you get out of it, but I still have to enrich the data, cleanse the data, document the data, understand its lineage. That's expensive, right? So if I'm in a discovery phase, it's great to go into Hadoop and try to discover those kinds of things. So there are some use cases to get started there, like analyzing data with a very short shelf life, but I could have other big data problems.

And I'll finish your question with this: the most common big data problem I see is clients who are guilty of not knowing what they could already know. They actually already have big data assets; they just don't know they exist. So think back, for the Windows people, because you're all Mac'd up up here: you download Google Desktop. Why? Because I find files all the time I didn't know I had. And you can imagine, working at a large organization, even at IBM, I can't find stuff to save my life. We have some of this information. So we've gone from finding a needle in a haystack to finding a needle in a stack of needles. And that's a big data effort, and it has nothing to do with Hadoop. So the big data story depends on what your pain point is.

So where would you say we are in the maturity model of big data adoption? Are we still in the kicking-the-tires phase? Talk about where we're at specifically.

So IBM just released a study.
We did it jointly with Oxford University, on big data. We talked to a number of leading companies and professionals around the world to figure out where they were. I think you're looking at around 50% in the education and planning stage. We're in the low double digits, maybe 16 to 22%, that are actually doing something with it. And some of the problem there is around consumability. We're not all Facebooks and LinkedIns and Yahoos. We don't have thousands of developers who have Java in their DNA and are programming. We have a lot of existing investment in SQL skills and those kinds of things, and people are struggling with: how do I get there, and how do I get the skills? So what we have to do is bring consumability to it. That's what's stopping adoption from moving from the learning phase to the execution phase.

So one of the things about the platform I'd like to talk about is how we flatten the time-to-analytics curve, and you do that with various features in the platform. That's what we do in the big data platform. For example, we've put a declarative language around text analytics. So listen, here's Hadoop. You go download it, you get it from Cloudera, you can get it from us, a non-forked version of Hadoop. And then I say, go build some text extraction; you're going to do some social sentiment analysis. Where do you go from there, right? In our platform, the way we're trying to address that is to say: we understand that SQL democratized the relational database. I could write in this declarative language without having to know the underpinnings. There was an optimizer that said, go get it from this table, or perform these algebraic expressions, and gave the result back to you. Then we gave it a tool set, whether it's Excel or a query builder in a tool like Cognos. Now everyone can write SQL, for good or bad. For text extraction, we've created in our platform AQL, the Annotation Query Language.
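To make the declarative idea concrete, here is a minimal sketch in Python of rule-based text extraction, where the caller supplies only rules and an engine decides how to apply them. This is purely an illustration of the pattern; the rule table and function names are hypothetical, and this is not IBM's actual AQL syntax.

```python
import re

# Hypothetical rule table standing in for a declarative extractor spec:
# each rule names an entity type and gives a pattern. The caller never
# writes the scanning logic, only the rules; the "engine" below decides
# how to evaluate them.
RULES = {
    "phone": r"\b\d{3}-\d{3}-\d{4}\b",
    "hashtag": r"#\w+",
    "mention": r"@\w+",
}

def extract(text, rules=RULES):
    """Return {entity_type: [matches]} for every rule that fires."""
    return {name: re.findall(pattern, text)
            for name, pattern in rules.items()
            if re.search(pattern, text)}

print(extract("Call 555-123-4567 or ping @support about #bigdata"))
```

The point of the analogy: just as a SQL optimizer hides table access paths, a declarative extraction language hides the scanning and matching machinery, so an analyst writes rules instead of low-level code.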
It looks like SQL, it has a built-in optimizer that understands that CPU is critical in that kind of work, and it has an integrated development environment. So, consumability.

Paul, how are customers looking at their big data investments on a portfolio basis? In other words, you've got the traditional data warehouse space, the BI piece, which in many respects failed to live up to its promise of a 360-degree view and predictive analytics. In a lot of ways the Enron debacle helped boost that business, with the reporting and the compliance, and that was a real win in that space. But a lot of spending went on there, and then you've got this new stuff, a lot of it experimental, but clearly people are adding some business value. How are customers managing that portfolio between the new and the old?

Yeah, some are struggling, right? I mean, we're not asking clients, and no one should be asking clients, to give up their SQL investment and their warehouse investment, because it's very purpose-built for specific tasks and it does those tasks very well. Per your comment on the Enron debacle, we had to have that trusted reporting data source. If you look at some of the NoSQL databases, some will kind of be lacking in consistency; they'll trade the ACID properties of a database for the BASE properties of NoSQL, and worry about consistency later. So I'll answer your question with an analogy, and then you'll see that in that area we kind of go forward with both. Because don't you find it interesting that the biggest movement in NoSQL is bringing the SQL API to the NoSQL database world? It's hot right now.

I'm hearing a lot about that this week.

Absolutely, absolutely. So here's the analogy I like to tell clients. Think of gold mining, okay?
So think back to the gold rush, a hundred-whatever years ago. You had this gold miner, and he would pan for gold, and he could spot a big chunk of gold that was visible to the naked eye. So that data had value to the naked eye, and while we didn't go and extract massive amounts of earth, it would spark a gold rush and investment in towns around where gold was found. If I look at gold mining today, we have this new capital equipment, and it's capable of moving millions of tons of dirt, the low-value-per-byte data, and spotting near-invisible strands of gold. As it turns out, gold needs to be more than 30 parts per million to be visible to the naked eye, so most gold that's mined today is invisible. Well, guess what? Hadoop is there to find these near-invisible strands of gold, extract them in a cost-efficient way, and then build that into our processes, perhaps into the warehouse or into side-by-side processes.

And I'll finish the analogy with this. I was watching a documentary on gold mining the other day, and a company is under some heat because they're not reconstituting the land they've dug up into parklands, and they say, we're working on a chemical wash; in five years' time we'll be able to find finer granularities of gold. And I thought, this is perfect. Think about Hadoop as a low-cost platform to store a corpus of data that I'd like to discover against. I bet you, just like in mining, analytics three years from today assures you'll find more gold: you'll find more data, more insights, more signal in the noise. So that's the analogy I'd use to answer that.

I want to carry that metaphor through a little further and get your take on this.
So one of the things people often say about the gold rush is that the guys who made the real money were the railroads that got the people there, the guys who supplied the picks and the axes and all the infrastructure around that. But there's a premise in big data that says, and Peter Goldmacher from Cowen is the first I ever heard say this, that big data practitioners are actually going to create and extract more value than the big data suppliers. Do you buy that? And are you actually seeing that in the customer base?

Yeah, you know, I think the practitioner is where we execute, and what the practitioner needs is a consumable platform that allows them to execute. So I view it as a partnership, and through the ebbs and flows of that partnership there are times when the practitioner is going to carry us forward. Today, to get started, the data scientist is the hottest topic going. You look at the starting wages for some of these kids coming out of university and it's awesome, right? And it's around math and Java. So they're getting things started, but then as we democratize access to big data, it will be us as the providers that help. Why? Because I'll take your average knowledge worker and we'll boost their big data IQ. And then the data scientists will come up with some new technology, and someone else is going to push that. So it's going to be a partnership, and that's how I see it going forward.

So you talk a little bit in your book about end-to-end. What does that mean, and why is it important to a customer?
Yeah, so one of the books I wrote was around governance, and we talk about information lifecycle management. One thing I'm a little worried about when I see it, and obviously I'm not going to name the customers here on air, is people doing Hadoop, and I'm like, doesn't that data have to be protected, just like in the database? If I store it in an RDBMS, you have to protect it, but for some reason you think that because it's in HDFS it no longer needs to be protected. So when I talk about information lifecycle management, I want to talk about how the data arrives at my organization. First, am I taking the opportunity to apply analytics the moment it arrives at the organization? Because of the opportunity cost, and it's a term I use in the book, return on data. Your return on data, what you're going to get out of it, what I could do with that data, starts to dilute the moment it hits your enterprise. So as I look at lifecycle management, I'd like a plan that says: the data has entered the organization, I apply some analytics, and perhaps as it makes its way to rest I attach governance policies to it. Is it immutable? Can I store it? Should I get rid of it? Should I enrich it? And deciding not to care about it is a governance policy too; I need to make a decision. And then as it sits there, you know, to me, is Hadoop not the new tape? If it sits in my Netezza database and I want to move it into a low-cost at-rest engine, maybe I need a policy to move it there. So cradle to grave, that's what we talk about in lifecycle management.

So I want to ask about something you just mentioned, Hadoop as the new tape. Certainly in the open source Hadoop community, a lot of the players we're going to see tomorrow and Thursday at Strata would take issue with that.
And I think we're seeing players like Hadapt, for instance, build out their vision of a unified platform based on a Hadoop foundation that includes both SQL and NoSQL-type functionality. And I guess the knock on the more traditional approach we're seeing now is that it's often HDFS plus a lot of connectors between databases. Is that a long-term, effective way to go about big data, and what about these new vendors like Hadapt and others who have this vision of a unified platform really built on Hadoop at the foundation?

So, I mean, you guys ask good, tough, insightful questions, right? I like it. You ask the same questions my customers ask me. If I look at Hadapt, I'm not sure how unified the engine is, as opposed to unified at the API level, at the top, while underneath the covers it's kind of separate. They have some Postgres in there, they have some Hadoop in there. Here's the notion I go after when I talk about that. At IBM we believe in purpose-built, optimized engines, so I won't take shots at our competitors, but I don't think one size fits all when we look at the kinds of workloads that we have, and I think IBM does a terrific job at that. Where I think the economies of scale come from, which is what you're referring to, is the portability of the development and power-user skills, because developers outnumber DBAs anywhere from six to ten to one. If I have a transportable API or programming method across the engines, then you've got pretty much the Hadapt model, but for us, we're bringing enterprise-class, proven solutions. If I look at the IBM big data platform, I'll give you a couple of examples. If I sit in Hadoop and BigInsights, and by the way, I could slot Cloudera in as the Hadoop engine there, I can use the IBM technology to build a text extraction dictionary.
Once I've identified that text extraction, this kind of harvested artifact, I can move it to the in-motion engine at the drop of a dime. There's nothing left to do, I just move it over. Why? It's a transportable skill, it's a transportable artifact. If I build a MapReduce job in Hadoop, but maybe I've got some structured data that I want to keep in the warehouse, I can take most MapReduce jobs and run them in-database in Netezza, which supports in-database MapReduce processing. So now, instead of making people run to the engine, I want people who are great at what they do, text extraction, MapReduce programming, machine learning, to be able to walk across the portfolio and use the engine that's best suited for the task at hand. A final example: I think we've been doing this a long time. If I program to DB2 for z/OS and I want to take that application and move it across the DB2 for Linux, UNIX, and Windows family, the SQL API is 98% portable. And we took that concept when we introduced our Oracle compatibility layer, so we natively, not emulated, natively support PL/SQL in DB2, because we saw a lot of clients choosing Oracle not because they thought it was a better choice, but because they had a skill set in PL/SQL. Take your smartest people, task them with the business problems, but let them go across the technology that helps implement it.

I want to follow up on something you said, and I wonder if customers are asking you this when you talk about purpose-built optimization. A lot of the industry discourse, Paul, has been around, hey, we've got these silos, applications with purpose-built infrastructure, and we have to break those silos down, and the cloud is all about a general-purpose infrastructure on which you can run any application across the portfolio.
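The MapReduce-portability point above can be illustrated with a minimal word-count sketch: the same map and reduce logic that a Hadoop job or an in-database MapReduce engine would run, written here in plain Python purely to show that the skill of writing map and reduce functions is what travels between engines. The function names and sample data are hypothetical.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (word, 1) pairs: the 'map' step of the pattern."""
    for line in records:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Group by key and sum counts: the 'shuffle + reduce' step."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

data = ["big data big insights", "data in motion"]
print(reduce_phase(map_phase(data)))
```

In a real engine the map and reduce functions are distributed across nodes, but the programmer-facing contract is the same two functions, which is why the skill is portable.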
And then you see what you guys are doing, certainly Oracle with Exadata, and many others, building these purpose-built appliances. Help me rationalize the dissonance there, where we're talking about a general-purpose, flexible infrastructure versus that very focused block of infrastructure.

So I think there's not as much dissonance as maybe you're suggesting, and I think IBM has that flexibility in it. If you look at our PureSystems architecture, we start, ironically, with PureFlex, which is about exactly that elastic compute model: apply whatever applications I want in there, be it ERP, be it an application server, be it database provisioning. As we go into what I'll call tier one, for example in transactions, I need a tier-one database that has higher characteristics of availability, higher characteristics of scalability, without changing the application, and for that we have these other systems; in that case we have the PureData System for Transactions. You can think of that as an on-premise cloud model: it has the elasticity of cloud, the provisioning of cloud, the monitoring of cloud. I can add capacity and take capacity in and out by the hour if I want, seamlessly. So we have that concept, and I think those cloud characteristics find their way into our usage patterns. I think the cloud is terrific, I'm all over the cloud, I love the cloud. But do I think every enterprise is rushing to run their entire transaction system on a cloud?
Absolutely not. There are security concerns there. I think the first step to cloud, without question, is around development, and I think Hadoop is boosting the cloud, because I can get something like a hundred-node Hadoop cluster for 34 bucks an hour. I can do a lot of work with that. But that option depends on the data I'm crunching: that data needs to be protected, I can't just send it out there, I have some provisioning rules around that. So I think we've got some of those cloud concepts in there, and they'll come together as we move forward.

So you can accommodate, with Pure, with expert integrated systems, that use case I was talking about, cloud applications across the portfolio, and you can pick the horse for the course in the transactional model. Do you see, in the client base, roll-your-own as essentially a dead model?

Yeah, I hear a lot about that. I was just talking to a customer today that loves a certain storage vendor, but they love the appliance form factor. Here's the bottom line: if I look at the percentage of IT budgets and spending, and if I forecast out to 2015, I'd say maybe only 20% is on new server and storage spending, and about 25% is on heating and cooling associated with storage, which has double-digit compound annual growth rates. We've got to do something about that, and we're investing in that.

What's the rest?

People. Labor. Okay, so I'm going to ask a question to the audience here, okay? And I'm going to go on record and say this isn't happening for me at IBM, and I can't speak for others. How many people have experienced double-digit income raises year over year for the last four or five years, right?
I mean, they just haven't, and this is in a world of BRIC outsourcing, Brazil, Russia, India; I can outsource to cheaper labor than ever before. Yet the cost of managing these systems, from a personnel perspective, as a percentage of the IT budget, is going up. That answers the question: we've got to get to the appliance, because the business can't afford to keep investing in us three cats up here.

So another way to say that is, if you're going to roll your own, you'd better have a damn good business case to do so.

I'd say so. Or live on the West Coast.

I can't. That's fantastic. All right, Paul, great perspectives. I really appreciate the candid answers and the insights that you're bringing from your customers. Thanks very much for coming on theCUBE.

Yeah, it's been a pleasure.

All right, good. Well, keep it right there, folks, we'll be right back with our next guest, live from IBM IOD. This is theCUBE.