Live from Las Vegas, extracting the signal from the noise. It's theCUBE, covering IBM Insight 2015. Brought to you by IBM. Now your hosts, Dave Vellante and George Gilbert. Welcome back to IBM Insight everybody, this is theCUBE. This is day two at IBM Insight. Check out ibmgo.com, it's a digital social experience. Rob Thomas is here from IBM, CUBE alum. Rob, it's great to see you again. We just saw you last month at Strata and Hadoop World. We've seen a constant cadence from IBM this year, so it's great to have you on again. Yeah, great to be here, Dave. George, good to see you as well. Thanks for having me back. You're welcome, big show for you guys. Focused on analytics, it's one of your biggest. I don't know if Insight is the biggest show you do, but this is the biggest analytics show I think in the business. I don't think there's a bigger one. It's like four times the size of Strata, and there's a lot of excitement here. Yeah, it's a very different crowd. So I was kind of reflecting on all the events that I've been to this year. It started with Spark Summit, where obviously we had a big presence, and that is very much purely a data science type crowd. Strata to me was different this year. It was more of what I would describe as a biz dev crowd for whatever reason, which tells me maybe that Hadoop has matured a bit in that respect. And I think what you see here is a little bit of a combination of those two, plus you see a lot of the Fortune 500 here. These are enterprises that are deciding we have to do something. And it's a really good mix of folks. And I think a lot of people forget that less than 30% of companies have adopted things like Hadoop. Spark is way below that right now. So what you see here is the Fortune 500 that's trying to figure out where they go from here. No, it's true. I was commenting on theCUBE that I was riding down the elevator on Monday. It's like, oh, JP Morgan Chase. Oh, Citi, oh, AIG, oh, South Africa National Bank. 
All your big customers are here. You don't get that at the other shows. Like you said, you get the hoodies at Spark Summit. Strata really has become a biz dev show. A lot of vendors, you know, showing their wares. Vendors talking to vendors. Well, I mean, there are still some practitioners, but remember it used to be the data science show early on. Yeah, exactly. So what does that tell us about how the market is shifting towards Spark? You guys had some Spark announcements recently. Maybe fill us in on those. Yeah, we've been busy since we made the announcements back in June. We continue to build up the Spark Technology Center in San Francisco. We announced this week that we've now enabled 15 products in IBM on Spark, ranging from commerce applications to some cloud applications to things like SPSS, you know, advanced analytic algorithms running natively on Spark. So we've been very busy. You think back to the strategy that we described to the world on Spark. One was we're going to invest in the open source through the Spark Technology Center. We've certainly been doing that aggressively. Two was we talked about getting IBM running on Spark, which is, you know, an example of what I just shared. And then third was around how are we going to educate the world and help drive the data science revolution? So we've been very busy on that aspect as well. So yeah, I was saying off camera, we had Mike Tamir on, talking about that imbalance in supply and demand. And he was saying he's not at all worried about there ever being a glut in data science. There's so much demand right now. I wonder if you could comment on that, what IBM is seeing. So it's been crazy. So we committed in June, we said we're going to educate a million data scientists. We are at 310,000 as of this week. And that's through avenues like Big Data University. It's through offline courses we did. You know, we did a two-day course at Strata. Think about it. 
That's pretty amazing in a four-month timeframe. The summer, I thought people were just going to take the summer off. I guess that doesn't happen anymore. That's how much interest there is in this topic. And I think part of that is because traditional IT skills are under threat, is my view. Between cloud and data, that puts a lot of pressure on traditional IT operations, that type of thing. And so what I see is these kind of folks coming to the table saying, it's time for me to do something. I'm going to reinvent my skills, and data science is the place to do that. So one follow-up on something that you said earlier. You said you enabled a number of products on Spark. You mentioned SPSS as an example. Just taking that example, what does it mean to enable products on Spark? So think about it this way. Historically, if you were doing advanced analytics, whether it was with something like SPSS or something like SAS, step one was move the data into SPSS or move the data into SAS so that the programmer can actually then work within that environment. What we've done is we've taken all the intelligence of SPSS, the machine learning, the algorithms. We've pulled them out of SPSS and we've built them directly on Spark using things like MLlib. So now you can run those same algorithms, and there are a lot of people in the world that know those, and you don't even need to use the SPSS engine, if you will. You can run them right there on Spark. And the reason that's valuable is, think about it. We view Spark as the analytics operating system: pull data from anywhere, whether it's a mainframe, a warehouse, a database. You can now use your SPSS algorithm in one place and access all those data sets. So you're extracting sort of capability and function from the app, essentially, in that example, embedding it directly into the workflow. That's right. So it can scale? It can scale and it's much faster. 
You think about it, you no longer have to work with a data engineer to say we need to move the data from here to there. You just go and you run. If Spark was, I'm sorry, if SPSS was sort of one of the first, so it's a tool, an enabling tool, then we go sort of up the hierarchy, the 15 apps that are enabled and then the next ones that come after that, what do they look like? We want to enable all of IBM on Spark. We used the analogy back in June that Spark is exactly what Linux was back in the day for operating systems. We think it will be that fundamental to how it plays a role going forward in every enterprise. So I'm not sure you're going to be relevant in the future if you're not building your portfolio on Spark. Let me give you one example of why. We enabled DataWorks, which is our cloud data prep service, on Spark in the last six months. When we announced DataWorks last year at this show, it was 40 million lines of code. Enabled on Spark, it's now five million lines of code. Think about that. Was that 87%? 90%? It's so much simpler and it's so much faster to work with data-rich applications on Spark. So it's changing how we build products and it certainly will change how clients use products. So when IBM makes big moves like that, you've got to pay attention. When you talk to the hyperscale guys, the big, you know, Googles, for instance, they'll say MapReduce, we did that. We knew that wasn't going to sustain. So we moved on to in-memory and now we're on to something else. We won't tell you what we're doing. But so relate that to what's going on with IBM, because you guys played in Hadoop, but it wasn't like, okay, we're all in. There's a billion-dollar investment in Hadoop. You're doing that with Spark, right? I don't think you've quantified that investment, that's sort of my quantification. But what do you see as sort of the next thing? I mean, you guys obviously have insight on that. 
Does Spark, does the investment you make in Spark, is that leverageable into whatever's next, you know, Spark 2.0 or whatever we call it? Can you give us some glimpse of what you guys are thinking there? I believe it is. And the reason that we've organized the way that we have for Spark is we built the Spark Technology Center. And the best way to think of that is separation of church and state. We do nothing in there that's not open source. Everything we do in the Spark Technology Center goes back into the open source. We have other parts of IBM that will do enabling of IBM products, value-add around that. But the point of doing that with that team is, one, it gives us the ability to attract some great talent, which we've been doing from outside. And two, it gives us the focus of skills to take on whatever comes next. And there will be something next. Probably pretty soon. You know, you see different projects popping up every month. And so that will certainly evolve over time. So it's the framework, the model that says we're going to do everything open source. We'll innovate with that open source community. Whatever it is, we'll lead. Yes, and I don't think any other company our size has made this significant a bet on open source. Oh, but this is why the open source community, Rob, says, oh boy, here comes IBM, because you guys are putting your money where your mouth is. You're bringing development resources, not just marketing. A lot of companies come in and say, okay, we're going to make Hadoop enterprise ready. You heard that a lot. And then what you guys are doing is really moving the industry and shaping the industry. That scares a lot of people, you know? But you're not making apologies about that, are you? Yeah, I'm a little surprised that it scares people, in that I think our reputation's pretty good in this space. I think Linux would not be where it is today if it weren't for IBM. 
Yeah, I don't think people are afraid that you're going to fork the code or go proprietary. No, you've got a track record there. I just think it's cultural: hey, this is our little sandbox. And now all of a sudden there's this big adult playing in it. We're actually much more immature than we appear. But IBM was... Case in point. John Thomas in the Q and A. I'm the best evidence of that. But someone like IBM was necessary to be the sort of commercial sponsor, the big commercial sponsor, for what's still seen as a little bit on the fringe. But my question to you is, if I understood you correctly, Spark is not your future-proof, forever layer. Spark is your current analytic operating system. And you can see, if I understood you right, other projects coming along, maybe Flink or something that works in near real time. But even abstracting away from sort of naming projects, if you want to take a look at the apps that you've created so far, that IBM's created so far, what is that analytic framework going to look like? So I think obviously it will continue to evolve over time. I think Spark will be here for a long time. That's why we made it. But I'll keep coming back to the Linux analogy. We supported Linux, and as Linux rolled out, it formed a lot of ecosystems off of that, right? And I think the same thing will happen here. I think, you know, right now we're focused on applications and data-rich applications. I think a very logical extension for Spark is going to be around IoT, Internet of Things. And you've already seen some of those, but that probably explodes. I talked about it in one of the keynotes this week. 50 billion devices by 2020 is our estimate for connected devices. That's going to require a new approach to technology and a new layer. Does that mean just a new layer on Spark, a new abstraction, or does that mean reworking Spark itself? I don't know. 
Every client I meet with, as I walk out of the room, I get a list of, here's all the things that we want in Spark. And I hand that to the guys in San Francisco and we go make progress on it. The biggest ones right now have been around Spark SQL and the back end, and a little bit on Spark Streaming. Those are the most common requests. Those will change over time. And I think, you know, my next list will probably be, you know, IoT type applications. It will continue to evolve. Okay, the requirements are changing so quickly. So the question is, you know, can the Spark ecosystem respond without making a mess? And I think we're seeing kind of a mess in Hadoop, the Hadoop ecosystem, right now. You're drawing parallels to Linux. We, Furrier and I, for years have been saying, will there be a Red Hat of Hadoop? You know, and it looks like Hadoop is more like Unix. You know? Yeah. There's really a fragmented ecosystem, or worse, a lot of people tugging at each other. So, you know, will there be a Red Hat of Spark? Is that what IBM will be? I don't know. I think one thing that's been unique about our approach here for Spark, at least, is we've started with community, and by community I mean not only open source, but also partnerships. We're partnering closely with Typesafe, closely with Databricks, closely with Galvanize, who I think was on before. And the whole thought here is, this is going to be done as a group. This is not IBM's thing. We're not confused about that. But we need a group to work with. The guys at Typesafe, you know, inventors of Scala, incredible guys to work with. Databricks has been an incredible partner, and together we kind of sit down and say, so what does the enterprise need? And it's a great collaboration that we have going. But what is IBM's thing is making money on top of open source, which so many people are struggling to do. 
One of the themes that we talked about at Strata Hadoop this year is that the market's overcrowded, it's overfunded, and it's profitless. That's not your model for open source, right? It's like the exact opposite of that. And right now you've got companies that are spending two to make one, you know? And it looks like they're pointed up, but they're losing altitude. And so if the funding dries up, it gets really interesting for you because you can pick up companies cheap. But you've got a business model around open source. Can you talk about that a little bit? Yeah, I think, first of all, Red Hat is a successful company. At the same time, if you look at Red Hat's market cap versus every other software company that was born around that time, it actually doesn't really look like much of a success. Yeah, it's amazing because it's open source, but that's really the only one. Because there's really no way to drive investor returns. And Whitehurst is going through a TAM expansion right now. He's got to expand the TAM, and it's pissing some people off, frankly, but it's the right thing to do. And so that's difficult. So I do think there are very profitable businesses on open source, but they don't look like what we're used to. Facebook is built on open source. And for what they deliver and for their customers, open source is irrelevant to them, because it's just a means to an end, right? And I think that's the kind of model you will see IBM end up in, where it's not about us trying to monetize the open source. That's really not an objective. It's about open source providing speed, open source providing innovation, and then we can build creative business models off of that. We've talked about partnerships with The Weather Company as an example here that's been covered a lot. You sit that on top of something like Spark, and it changes the access to data and the use cases. And so it's not about us saying, hey, you've got to pay us for this open source. 
It's about, we've got these great data-rich applications that ingest weather data. That's how you make money. So it's a totally different approach. Well, and one of the first guys who ever pointed this out was Peter Goldmacher, who way back when said the guys who are going to make money in Hadoop slash open source big data are the practitioners that are applying these technologies. So is IBM a technology company, or are you a practitioner in analytics? Or an enabler of the practitioners? Yeah, you're a practitioner in the sense that you're enabling the ecosystem. So that's your sort of practitioner play, I guess. But it's a different model than what we were used to, which was, oh, I make widgets and hardware and software and services and I go sell that. Your approach is to apply analytics to change industries. It is. So maybe we're a little bit all of the above there. I think the most interesting part, at least to me, is we can deliver very unique outcomes working in different industries. One example that became public just at the conference this week is the work we've been doing with American Airlines, where we've really modernized and moved their flight operations work to real time. And that's using things like streaming. It's also using other technologies like we've discussed here. And those are the use cases that excite us. And it's not because we're charging them for open source. It's because we're saying together we can build this data-rich application based on IBM technology, which, by the way, leverages some open source. And we wouldn't be able to move those projects as fast if we didn't have a role in the community. And they'll pay for that value, absolutely. So it really is, I mean, it's almost like we get enamored with Spark because it's a shiny new toy, but it's just this analysis layer. And you have these analytic data feeds, one of which might be The Weather Company, and it can be applied. And you have other data-rich services and perhaps software components. 
And then the whole thing is composable by your industry experts. And that sounds like the solution. That's right. And I think there's a lot of services work to be had in the next five years in this area. Traditional consulting, or a lot of the consulting business in the past, has been IT integration. That's probably going to be a less interesting service going forward, but there are a lot of services kind of in the vein that you're describing, George, around composable analytics. How do you actually get to these different types of use cases? The examples that we talk about typically involve a decent bit of services, sometimes from IBM, sometimes from our partners, because it's not necessarily out of the box. You don't get landmark analytic outcomes out of the box. It's funny when you said American Airlines, because that was the first commercial online app that was built in conjunction with American. The first one was SAGE, air defense. And the reason I bring it up is it was a joint venture, it was a huge amount of money, but over time it became a packaged app. Do you see these data-rich analytic apps migrating from being fairly service-heavy to being rather packaged? I don't know, and the reason I say that is the minute that they're packaged and it's that easy to buy them and use them, there's no longer a competitive advantage in doing it. Last year at this conference, ConocoPhillips was talking about the work that we've done with them on oil exploration. If you package that up and then every oil company could buy that same exact outcome, then suddenly ConocoPhillips would not have an advantage. So then you have to move on to the next thing. So there's always going to be a next, and so thinking that there's going to be real packaged apps here that drive great outcomes, I don't know, I'm skeptical. 
So, okay, it's not that there won't be packaged apps, it's that IBM's interest will be in the next wave of hard apps, of hard problems to solve that add competitive advantage to the companies. Yeah, so certainly I think that space will grow. There will be a lot of applications. We're going to do a lot with solutions now. I guess my main point was that the use cases that excite us are always going to be a step ahead, and it's going to be hard to package that up, I believe. Understood. So, then that brings up the question, how do you scale your business? I mean, it seems like the solutions business is repeatable. Right. But you're saying your customers want you to preserve their competitive advantage. So it's not just speed, there's some unique IP. Is that IP developed by them, enabled by you? Add some color to that. I think the way you scale is through cloud delivery, because as you build data assets over time, you start to monetize through data assets. As an example, we kind of do that through some of the partnerships that we have today. And that set of data services is something that you can monetize over time. So my point is we deliver the outcome, but then we've got a scalable platform that we're building everything on, and that's how we get leverage. But in talking to people today, it seems like IBM traditionally, in the last 15 years, has been very services-led, and it seems like you're extracting knowledge from your industry services expertise, and you're actually putting that into software. I mean, that's a very clear trend. Yet you're saying that you're still preserving that competitive advantage for your customers. So am I correct that the services-led component is shrinking in favor of software that scales? Is that right, or am I misinterpreting that? I think that's fair. 
I think what you end up with, though, for some of these use cases is something that's probably, I'll call it, 80-20 in terms of software to services. As opposed to 20-80. As opposed to 20-80 or 100-0, which is, you know, all software, which is the ideal business. It's just, I think it's hard to get extraordinary outcomes without having some service assistance. It's not traditional service, it's not systems integration. It's data science work. We've got the largest private research staff in the world still. You give us a set of data, we can find things that you don't know about, and we can build interesting applications. So data science as a service. Yes, okay. But continuing on that thought, if you're working deep within an oil and gas firm on upstream exploration, and you build this expertise about how to really find this effectively or with great efficiency, that model is something that's largely your IP, because your data scientists are working in conjunction with the customer to do it. How much of that model stays with you? How much stays with the customer? Because if it stays with you, it's again, you know, no longer such a competitive advantage for that customer. Well, it's three aspects. It's models, it's methodologies, and then it's the data. The data always stays with the customer, and it's really hard to get value out of the models or the methodology without the data. What we carry forward is we've got the models to do this, so it can become repeatable, back to Dave's point. We've got the methodology, so we actually know how to drive the outcomes, but data has to be part of the equation, which the clients retain for themselves. That's why I think it becomes a pretty good relationship where they do get a unique benefit and then we can go work with others as well. Okay. All right, so what's next for you? We've seen you several times this year. Yes. You're going to keep this cadence up? You're going to accelerate that? 
Is there a flywheel effect going on? We have Datapalooza coming to San Francisco, November 10th through 12th. We're pretty excited about this. This is three days, build a data product, and it's almost sold out already, remarkably. And it's going to be a huge turnout. It's really a first of a kind. And we're doing this with a bunch of different partners, some of the folks that I mentioned before. And then we have a world tour starting next year where we're going to hit 11 different cities. This is all about data science training, build a data product in three days. I don't think we've ever seen anything like it in the industry actually, so really excited by that. And that's in November? Yeah, November 10th through 12th in San Francisco is the first event. At Galvanize. At Galvanize, that's right. Exactly. And then we hit the road worldwide starting in January. What's '16 look like for you guys? What should we be watching for? I think we'll just take vacation that year, right? We should be all done. We done? Yeah, let's take, I think, awesome. It'll be game over at that point. Yeah, I'm just wondering, wait. Look, 2016 for us is taking Spark to the next level. I talked about some of the things that we've been working on in terms of it being enterprise sufficient, enterprise grade. It's come a long way. 2016 is the breakout, I would say. I've talked a lot this week about a big data maturity curve, where clients spent the last five years basically using things like Hadoop to reduce cost. And we are quickly moving to the exponential part of that curve, which is self-service analytics, how you make data available to everybody in an organization, and ultimately building new business models on the basis of data. Spark is going to be the fuel for those two phases, and that's going to be the big thing. IBM on the steep part of the S-curve, that's exciting. Hang on. It's escalating, quick. One quick question then on that. 
Are we moving to Spark because of greater development and administrative simplicity? With Hadoop, as someone explained it to me, with all the different components you have these failure domains, security, admin, all these different things that can go wrong across the components, whereas with Spark it's somewhat more unified. Is that partly why you see the maturity curve going towards Spark? There's two reasons for Spark. One is the unified programming model, to your point. That's how you can reduce the time it takes to build stuff. And the second big piece is machine learning. Spark is built for machine learning. It's why we contributed SystemML, it's how we're maturing that part of the ecosystem fast. Spark is about machine learning, and it's hard to get to self-service analytics or new business models if everything requires people. So you need the automation that comes with machine learning. So those are the two reasons in my mind. All right Rob, thanks for stopping by again. Always a pleasure having you in theCUBE. Great to see you guys. Great stuff. Thank you. All right, keep it right there everybody, we'll be back with our next guest. This is day two, we're winding down here at IBM Insight. We'll be right back.