Live from the San Jose Convention Center, extracting the signal from the noise, it's theCUBE, covering Hadoop Summit 2015, brought to you by headline sponsor Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, and by Attunity. Now your hosts, John Furrier and George Gilbert. Okay, welcome back everyone. We are live in Silicon Valley in San Jose for Hadoop Summit 2015. This is theCUBE, our flagship program. We go out to the events, extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE. I'm joined by co-host George Gilbert, Wikibon's new big data analyst. Check out his work, it's pretty amazing. Really interesting thesis, taking it to a whole nother level, literally, and it's great stuff, go to wikibon.com and check that out. Our next guest is Joel Horwitz, Director of Product Marketing, Big Data Analytics at IBM. So great to see you again. Thanks guys, great to be here. We were just back together at InterConnect Go. We did that InterConnect event. We also had the Big Data SV event which was matching success around Hadoop World. What's new, tell us what's going on. Man, it's busy, look around you. Hadoop is alive and well. We're doing some amazing things, as we announced at Strata Hadoop World in March earlier this year. We talked about the ODP and what's going on there. We're really leading this event with our message open inside anywhere. Really, we're putting a lot of emphasis on the anywhere part of that statement. As you can tell, Hadoop has been gaining traction over the years and now I think more and more people want to get value out of their data using Hadoop. So we're making a few announcements at the event today. Predominantly, we're talking about Hadoop that's going to be available as a service on our cloud, particularly Bluemix, and we also support a number of systems platforms for Hadoop to work on. Love the line, open inside anywhere. What does that mean for customers?
Open source, I'm using an open source product, open in terms of any system, just tease that out a little bit. Yeah, I'm using a little bit of a double entendre there with the word open. Clearly, it's about open source. I mean, that's what Hadoop is. But also we talk about the ability to open up the value that's really in Hadoop as well as in data. A lot of our clients today are actually struggling to take that amazing work that they've built on Hadoop and integrate it with the rest of the enterprise. So we're also talking about how people can integrate the value, really connect Hadoop and make it an extension of their data platform as opposed to, say, a business unit off to the side that's doing kind of a proof of concept with a small cluster. So we're really talking about opening up Hadoop to the rest of the enterprise, to integrate Hadoop and make it secure for more people to get value. You know, Rob Bearden, the CEO of Hortonworks, had a similar message, but take us down into the, you know, an executive in IT who might be listening who says, okay, so if IBM is my supplier, how would they help me make that happen? Tying, you know, my transactional systems to the data I'm accumulating in Hadoop. Yeah, you know, that's a great question. The way I talk about and the way we talk about, you know, those systems, so we talk about systems of record, so things like, you know, client records, things like financial records. On the other hand, we talk about systems of engagement, so anything from mobile to, you know, web applications. And in the middle, we really are starting to talk about, and I think more people are starting to talk about, this system of insight. And so when we talk to line of business owners, when we talk to IT, you know, folks, we're talking about how do you create that system of insight that actually bridges the two.
So as more data is coming in off of, say, Twitter, you know, we have a relationship with Twitter, as well as, say, you know, other external internet of things like the Weather Channel and what we did with them, how do we take that, you know, data and flow it into the enterprise in a useful way? And so for that, there's a lot of things that we're doing. And so IBM, you know, we actually are doing some pretty amazing things to create these hybrid solutions that can not only take what you have on premise and get the most ROI out of that, but also, you know, talk about the cloud. So how about like analytics for a second? I got to ask you the provocative question. Analytics, is it a process or product? What do customers think? That's a really good question. I think if you were to ask like DJ Patil, he would say, you know, data product, and he would say at LinkedIn, and they coined that term data product at LinkedIn, and People You May Know is the classic example. Well, data can be a product, but analytics. Yeah. Is analytics a process or product? Well, I think- You guys have an analytics group. Yeah, I think a lot of people try to productize the process, but I think what we're finding out is that it's more involved than that. I think you really, you know, when we talk about the data scientists, you talk about, you know, the application developer, you talk about the data engineer. It's really a collaborative effort. It's not a single individual, nor is it a one solution fits all kind of thing. So I would answer, it's more of a process, although process has a little bit of a, you know, a certain connotation with it. Well, the process is being re-engineered. Rob Bearden uses the electricity example. Yeah, great example. That changes the game inside the house. New processes develop, hey, I got a refrigerator. Yeah. And that's a place to store all my beer, you know? So, I mean, this is the systems of intelligence.
I mean, you guys talk about systems of insight, and that's cognitive. Your vision in your research is systems of intelligence. It's really kind of the same thing. I mean, cognitive is an outgrowth of systems of engagement and insight. Yeah, I mean, there's really, you know, it's really building, right? There's nothing, I would say, net new here, frankly. I think we're building on a lot of these skills that have been growing over time. I mean, if you look at our Big SQL capability, we're really able to take a lot of those processes, in terms of querying, and offload them to Hadoop without changing a single line of code because we have such a great engine. So, that process has always existed for, say, database analysts, but now we're just kind of transferring a lot of that over to Hadoop to work with even larger scale data. I think what's interesting is what you just described, the new intelligence or insight we're talking about there. That process, I think, hasn't been fully defined. I think you'll probably see next week at Spark Summit. They've been talking about the data science workflow and things of this nature. They've been talking about machine learning pipelines and things of that nature. So, I think it's early days for the intelligence workflow or process, if you want to call it that. So, it'll be very interesting to see how that gets kind of defined. So, when you go to a customer, I just want to press a little more on the, you've got the systems of record. You could put an API on it, so developers can pull the data out. And you've got the systems of engagement. So, what would be an example mechanism to pull those together into a system of insight, as you call it? Yeah, I mean, I personally see looking at how do you, so Hadoop to me has been a very great environment for creating data access, right? No longer do you have to sit around and wait for your BI department to do the ETL process, create the cube, and then be able to have access to that data.
As you put it, APIs, I call it OLAP cubes and other things that have been around for years. The problem is that, generally speaking, they'll create a dashboard, they'll create an insight, and then they'll hand that over to, say, a product manager who will then go and take the insight from this dashboard and start building an application. I think where, you know, say machine learning and some of these newer capabilities are coming in is around the ability to not only do data discovery and create the insight, but actually turn insight to action and create an artifact from the data, right, that captures that insight and push it into a place of engagement, so. Where the application sort of, it operationalizes that, it's in production. So I constantly use the example of, maybe it's not so constant, but lately it's been constant. I talk about like a hotel, you know, applications, very simple, everyone uses hotels. In many cases, you know, a simple application would be, you have a reservation system, you go to the website, you know, you book the times that you're available and you book the room. And then you show up at the hotel and there's a concierge, hopefully, that'll tell you what shows are playing, where to eat, all of that. You know, we're trying to take that level of cognition, right, of cognitive ability and put that into the app, right? I mean, why wouldn't it, as you're booking the, you know, as you're booking your reservation for your hotel room, be recommending and be suggesting places to eat, things to do, based on what's going on at that time, right? And your interest. And your interest, exactly. So, you know, today we have all of this data that's pouring in from mobile, like I said, from social, from, you know, other external factors that should very well be in the moment and be in the application, as opposed to saying, you know, here's a dashboard, wasn't that interesting?
Joel, I got to ask you, on your Twitter feed, you have a picture of the security here at the Hadoop Summit. And there are now security metal detectors, so I want to bring this up into a thread, because it's interesting, right? Security is a big part of big data. And also, you mentioned Spark, with the voucher this morning. Right. So, two things. With all the big data, actually storing it is a no-brainer. We see people rushing, building out, store, store, store, hence the word data lake kind of flies, although I don't like the term, I like data ocean better. But I think data lake's fine. Get the lake on, get some proof points, move on, get more cash, reinvest, do more. But the security thing is interesting. We've seen that all through the enterprise. Security is a huge issue. Yeah, yeah. What are you guys sharing there? You mentioned Weather Channel, some other examples. Are there any around security you can talk about? Yeah, I mean, clearly, security is a big deal for everyone. I mean, I think especially, you know, the reason why data had been so locked down, in my opinion, over the years, is because everyone treats data as a liability, right? So, you know, having the wrong data, making the wrong decision, leaking information to your competitors, all of this is very bad news, right? So clearly, as we're starting to dump data into the lake, I hope we're not, but we are. You know, security does become an issue. I think what's actually more interesting is the governance piece. I know that many, you know, Hadoop vendors are already talking about Kerberos security and doing kind of the technical bits. I think we'll get there, that's kind of old news. But as we know, when we look at hacks and break-ins, it usually has to do with the weakest link, like usually an individual, right? So I think a lot of the stuff that we're seeing and a lot of the things that we're working on is the governance piece.
So ensuring that people can still access that data so we're not inhibiting that access, but also making sure that only the right people have the right information. You guys have a huge expertise in governance. IOD, Information on Demand, which I know they've sunset as an event now. I forget, is that InterConnect now? Is that, so that's InterConnect. So what is the challenge? So I mean, you have a good view of the challenges of governance. And it's not easy. No, it's not, it's not. What's hard about it, what does it need to do better? Is it the ecosystem? Yeah, I mean, frankly, like you said, I mean, we have a lot of great products and technology to conquer that. I think a lot of it right now is just working with our clients to inform them. I mean, today I was just, for example, I was at a client briefing over at SVL, the Silicon Valley Lab at IBM. And all of our German clients were there and they were asking me about security. And it's not security as you would think. It's really about how do you basically scrub data so there's no personal information in there. So a lot of it, it comes back to your kind of process question, John, about that's why analytics is a process and it's not a product, because a single product, in my opinion, can't solve all of the human factors that are going on. And so when you talk about IBM and what we're doing to solve that, it's not necessarily going to be solved by technology. It's going to be solved also by our expertise and our heritage of working with data for a long time. Well, if you believe what you said earlier about data should not be restricted. Yeah. And the things we were just talking about, I mean, that points to the, if you believe that, then this has to be a free and always robust environment. So you need the ecosystem to step up. Big data has to be out there free and the governance has to be flexible. So the question is, that's counterintuitive to the old way. I'm trapping it down to make it locked in. Yeah.
Yeah, there's certainly a lot of- How do you explain that to customers? Like, well, hey, you don't want to put everything behind a castle with a moat and a front door. Yeah. There's still a lot of learnings for everyone. I think that we're still figuring out what the right solution is. I think it's really a case by case basis. I mean, if I'm talking to a bank versus, you know, say an internet startup, right in the valley, they have different opinions of what security means and levels of governance. So, you know, I would posit that we're actually coming at this in a very constructive way and on a very kind of case by case basis and working with our clients, you know, through all of these incredible innovations that are coming out of open source technology. Just like stepping up sort of beyond the technology, does IBM want to engage with customers as like a trusted advisor on how to secure your information assets? Because that's like, no one else has stepped into that role. That's a really good question. I mean, frankly, I think that's where we're at. I think now, you know, IBM certainly has been working with clients over the years in that exact role. So I think the only thing that's really changing here is just being able to work with our clients and work with Hadoop. I mean, we're at Hadoop Summit, and work with open source technologies to ensure that, you know, the, that's why IBM, we have our own, you know, in a sense why we have an IBM Hadoop distribution. I mean, we run that through a lot of tests. We run that through our own QA to ensure that it works the way it should. So I think there's, you know, that's part of our overall, you know, development process. But that as a platform, I mean, and this has been a theme we've talked about throughout the day that, you know, as a platform that's incredible innovation, but it's also remarkable in its sort of fragmentation, which is the other side of innovation.
And, you know, anytime you want to worry about security, the number of seams you have in a product or family of products is, you know, opening yourself to compromise. Yeah, I mean, honestly, I mean, we haven't talked about this, frankly, in the context of the ODP, but I actually see that as one area where we could actually talk about it. We had an ODP meetup last night. It was great. I mean, we had, you know, Hortonworks there, Pivotal there, Altiscale there. What time, where was it? It was yesterday at six o'clock. Where were you, John? I was going to try to drive down. No one replied to my tweet. I guess no one's answered. No engagement on my Twitter stream last night. No, I couldn't get down. You should have been at the meetup. That's where we were all at. How long did you guys stay out? We were out for a good couple of hours. Surprisingly, there was, I mean, I shouldn't be that surprised, but, I mean, frankly, you know, when you talk about industry standards and you talk about, you know, security and stuff, I mean, it's not your usual suspects there. It's not the most exciting topic. Yeah, it's good to get a meetup going. It's good conversation, collaboration. But these are exactly the questions that people are starting to ask about Hadoop. It's like, and that, frankly, I mean, you know, we're huge supporters of the Apache Software Foundation. No question, right? But as you say, you know, innovation can breed fragmentation. And then questions like this, that when you get out into the real world and you start testing in real environments like Verizon, like these companies, you start questioning, you say, well, what is the open source community going to do to solve this problem? Like security, it's like, and should they be solving it, right? Should we be taking their attention away, necessarily from, you know, innovating and solving these case-by-case industry? And that's where I think IBM and the ODP come in.
Because then it's more, it starts as process the way John says, but it needs a product manifestation because someone's got to plug those gaps. Yeah, I mean, and there are, you know, there are open source projects like Knox and a few others who are, you know, looking at the technical side of things. But I think this becomes much more of a people issue. I mean, that's really where I sit on this subject, is the fact that I think it is people in the end who ultimately are the ones who are working with data in a very intimate way. And so I think for that, it's a matter of, you know, informing, you know, organizations, creating a culture that thinks, you know, through now that there's data everywhere. You know, what, it's the same as, you know, the internet, right? When that was born, like, there were people tweeting things they probably shouldn't be tweeting. So I see it more as a cultural thing than really a technical thing. I got to ask the question because George brought this up really in another segment, is that, you know, IBM, oh, IBM, other companies have taken advantage of open source and it's paid dividends in a synergistic way. IBM has had a huge run with open source. Linux, I mean, the heritage of open source is pretty solid. I mean, not solid, pretty solid. It's very solid. So I mean, you have a DNA of open source at IBM. What is the view now with analytics? Obviously you're pro open source, but yeah, as ODP comes out, and Inhi Cho Suh told me on theCUBE, ODP is good because their customers need stability. Yeah, and that was her point because she's an exec, but as you invest more in ODP, what does that look like? Analytics has to play a role in it. Cloud's coming over the top and a big tsunami of growth. Yeah. What's going on? How do you put that together? How do you go to market? How do you bring the open source investment to the table? Yeah, I mean. I mean, you're in the analytics group but share your perspective.
Yeah, I mean, IBM is an innovative company. I mean, no question, right? I mean, I don't, goes without saying, right? And open source breeds innovation, frankly. And so I think our strategy with open source, frankly, to speak for myself, is really about looking for where innovation is coming from. And so it's not to say that we're going to support every open source project, we're not, right? It's, you can't do it, but we are looking at where innovation is coming from and IBM I think wants to be there whenever that arises. I think the challenge though with open source and with innovative technology is the fact that we need to harmonize it with the business, right? These are not always in sync, frankly. And so I think what we're doing with the ODP is not congruent to what we're doing with analytics or open source or anything. I think it's just the natural step in terms of maturing Hadoop, frankly, to allow people who aren't experts at MapReduce and experts, because I'm certainly not, to get value out of it, right? And we can't all read blogs all day long to learn how to do this stuff, frankly. So we need to create a way and an outlet for people to better understand how to get value. So IBM is on a big push. Love the social mojo, you guys are a big part of CrowdChat. You know, I love working with IBM. It's the best CrowdChat customer we have. You get social, you get mobile, cloud is booming with Bluemix. What's the future for the analytics team? What's your go-to-market? What's your key messaging? I know you're a big part of a lot of the events you're doing. Just share with the folks out there what you're up to. Got Spark Summit next week. We'll be there. I mean, you guys are pumping on all cylinders. The messaging, two years ago, now it's starting to execute. So share it. Well, thanks for saying that. No, I mean, we, you know, cloud data engagement, that's our thing. And we're focused, right? And we're going to nail it. And there's a lot of areas that we're looking into.
Clearly, you've described a lot of them. Next week, you know, Spark is a big deal. I mean, 700 plus committers that are contributors at this point. I mean, if that isn't innovation, I don't know what is. And three years ago, it was just an idea at AMPLab. So, you know, I mean, I got to say, Bob Picciano said on theCUBE, first of all the guests we've ever had, systems of engagement. Right. Okay, that was really, that came from IBM. I think everyone's kind of co-opting that now. Yeah. And it makes a lot of sense. It's a no-brainer. Okay, record, engagement. George brings out the systems of intelligence. Right. Which teases out like your insight message and cognitive. What's next? Guys, you're both, you're an analyst. You're the expert at IBM. Duke it out. Who wins? Systems of intelligence or systems of insight? I mean, we use insight. I think that's, you know. I'm only kidding, you know, about the duke it out, but you know, talk about it. Yeah, no, I mean, it's fair. I mean, it's, we're talking about, you know, wordsmithing a little bit, maybe in my mind. No, no, talk about where the connections are. Cause this is really the action, the actions, automation, intelligence, cognitive. I actually would, I would have to say that. I actually like, frankly, because of Joel speaking, I like intelligence just because I think when I think about insight versus intelligence, I feel like intelligence is more actionable. And so for me, I think where I'm headed and where I would like to bring IBM to is talking about not just how you create an aha moment, which is what everyone here is talking about, but actually how you create an artifact, right? And how you say, okay, I've created an artifact of intelligence and I can now apply that in your system of engagement. Anticipate and influence. Yeah, leave and put a dent in the universe with a real tangible asset. Yeah, we talked about the cube, we do the cube. Where's the output? Where's the gold nugget, you know what I mean?
The one thing that I would add as what's next is taking that design pattern and sort of flipping it, maybe 90 degrees, for internet of things, where instead of sort of human generated transactions, we have machine generated ones, so there are far more streams and you push some of the intelligence towards the edge. And so like taking, for example, a telco network, you could do a self-healing network like that. We're definitely seeing that, we're definitely seeing. It's the same, it's a different topology, same architecture. Right, well what's also, I mean extremely, I don't know if you read this fellow named Koomey, he put out a really interesting article on MIT Tech Review like the last couple of weeks and he's talking about, we talk about what are the drivers of all of this, right? So when you think about what's driving all of these innovations, it's really interesting. It started with Moore's Law, which we all know was increasing the speed of processing in the square inch, whatever. But then you talk about the declining costs of storage, the declining costs of processing, the declining costs of bandwidth. They decline at different rates. And that's when we get big advances. But the latest one is the broadband piece, right? Is the actual transfer speed. And that's actually, that's following. And so that means when you talk about internet of things, there's one other trend, is actually the amount of processing you can do per energy is actually going up. So it means that you can actually do more in a certain thing, right? And so that's driving, to me, exactly what you're saying, is actually having smarter and more intelligent applications, devices and really things around us. So just to add to that, on the client device, we're trading for better power efficiency. In the data center, where real estate isn't as critical, we're trading for speed, you know, and because we have 3D, you know, 3D sort of topology. So having it as dense isn't as critical.
And you know, the interconnects can be high bandwidth. Anyway, I mean, I'm geeking out, but the point is. That's okay, it's theCUBE, so we do. But the point is those are, you know, those are internet of things apps. Yeah, I think what's interesting, so what you describe with the internet of things is one thing. I think what actually is very interesting, I saw a great talk at one of my meetups, fortunately, where one of my speakers talked about what shape is your data, right? So I think what's actually really interesting is talking about the different types of databases that are coming out, like you have graph databases, you know, you have all these NoSQL databases, you have streaming, you know, data environments. So it's really interesting. So when you use the analogy of like kind of, you have a lot of space and topologies, it gets me thinking about the fact that, you know, just your, it used to just be rows and tables or whatever, right? But now it's becoming this very kind of three dimensional architecture. Polyglot. Yeah, so you hear these types of things, and you know, I think we'll see a lot more of that for sure. Yeah, so I think one of the things that's key is, what I see you're getting at here is when you start to get intelligence and cognitive, it's transactional, on the fly transactions. So sensors coming in, you want to be able to act on the data, so that means you have data aware issues, right? So I think that, to me, is where the action will come from, and I think we're just too early. I mean, it's early, but it's setup time now. It's setup time, it's time. It's at the table. But the problem is, I have to say, so one of the biggest challenges that we have as a community is frankly, you know, ensuring that innovation gets adopted. I think that we all pat ourselves on the back for creating a great, you know, data environment with Hadoop. But frankly, it hasn't been adopted as much as it probably should have.
That's the dirty little secret of this conference. Right, and so we all are responsible for that. We all need to, we can. And absorption. Yeah, but the problem and the challenge is actually also the skills component, right? So data science, as you know, is not a trivial thing. So I think. But the operational complexity of the cluster, yeah. So we need to continue investing, right, as a community into teaching people how to work with data. I mean, if you were to, I think it was Eric Schmidt who said, you know. Eric Schmidt from Google, not Eric Schmidt, the product manager who was going to come on theCUBE on Thursday. Yeah, exactly. Just clarifying for our audience, you might be confused out there. Yeah, I know. So I think he said, and I'm paraphrasing, he said that, you know, we've created more data in the last few years than in all of mankind. I mean, that was like basically what he said, like, I think it was like a month or two ago. Which, you think about it, so that means everyone who's been working with data like even five years ago is like totally out of their element. Like we have to think completely differently about how we interact with data, how we create value from data, how do we push it to the edge as you described into devices, so it's a whole new world. And it's great, it's a great opportunity for everyone here. Okay, Joel, we got to put it there. We're getting the hook here from Leonard. Leonard's our guy, he was on track. What kind of name is that, Leonard? I mean. There we go. There you go. Got to bring John Cleese in here somehow. Look at the water, he can't be thrown with bottles. Joel Horwitz, thanks so much for coming. It's great to see you. We'll see you next week at Spark Summit. This is theCUBE, we're here live in Silicon Valley. We'll be right back after this short break.