Live from Union Square in the heart of San Francisco, it's theCUBE, covering Spark Summit 2016, brought to you by Databricks and IBM. Now here are your hosts, John Walls and Peter Burris. And welcome back to Spark Summit 2016. We're here at the Hilton in San Francisco for the first of two days of coverage here on theCUBE. I'm John Walls and I'm joined by Peter Burris, who is the Chief Research Officer at Wikibon, on theCUBE. Peter, good to have you with me here. It's great to be here, John. You guys have been doing a wonderful job. I know we've got a great new guest. Absolutely. Joel Horwitz is with us, Director of Strategy and Business Development at IBM Analytics. Hi guys. Good to see you, sir. Good to be here. Yeah, first off, give me your take on what's happening here. I mean, this show's come a long way in a very short period of time. Yeah, I mean, Spark continues to, you know, maintain its momentum. And I would say, you know, you see it here, there's a lot of interest. I think there are more booths. The booths got smaller, I feel like, but there's more of them in here, which I think is a good thing. I think, you know, we were talking about the event that we had last night, our Spark event at Galvanize. And we were talking about like, geez, you know, we grew it like three times. We're like, so should we get a bigger place? I'm like, no, you know what? Keep it in a smallish space and just, you know, keep this concentrated community in here. And I think that's what, you know, the Databricks team and the Spark Summit team have done really well: not to try to, you know, start inviting Hadoop ecosystem folks here and try to bring in more and more ecosystem folks. They've really kept it, you know, focused on this really dedicated Spark community. And I think you can see that in the interest of the people, that it's just, you know, it's about Spark.
So I think it's a good thing. But John, I'd actually say that we should define something that we might call theCUBE heat index, which is: the temperature around theCUBE is the constant. The enthusiasm, the energy index. The constant is the enthusiasm, the number of people milling around. It's about 99.9 right now. It's been like that all day. And certainly you feel it. You felt it at the keynotes, I thought, this morning, and a lot of deep learning discussion. But, you know, to hear from Matei Zaharia talking about 2.0 and what's happening with that. So I hear all these things. I mean, in your opinion, why is Spark thriving? Why has it taken off? So Spark has really become the true, you know, analytic operating system. We said that last year, and we're starting to see it more and more. You know, you bring up Spark 2.0. You know, our team, we launched the Spark Technology Center last year around this time. And we've really put our money where our mouth is. You know, we've contributed, you know, a ton of code to Spark SQL. I mean, clearly we have a background in SQL. You know, it's cool to hear where Matei is, you know, taking the project, kind of starting to lead into deep learning, with folks like Andrew Ng, you know, talking about what he's doing at Baidu, and Jeff Dean at Google. So, you know, it's exciting to see that the leading edge is still going, right? That they're not stopping. But at the same time, I mean, we're focused on truly making this an operating system so that, you know, people from many backgrounds can use this, whether you're an R programmer or a SQL, you know, analyst. Like, you know, it's really accessible. So, ultimately, I mean, it's always about the user at the end of the day. You know, the value, whether it's for an individual customer or an enterprise. So, I mean, what does Spark do, you think, from a unique standpoint, that creates that one-of-a-kind value or that special value right now?
So, again, it's all about the people, right? So, IBM Design, you know, came on strong the last couple of years. You can see it across our portfolio, across analytics, with Watson Analytics for, you know, the citizen analysts, all the way up to what we're doing in our cognitive businesses. And we're doing the exact same thing as we think through Spark and analytics as an analytic operating system. And so, you know, this week, we announced the Data Science Experience. So, instead of trying to be, you know, the one notebook to rule them all, we really focused in and we did a lot of research. We interviewed hundreds of data scientists to really figure out what makes them tick and what they're really looking for in terms of an experience and an environment for them to really be successful. Last year, you put your money where your mouth is, and stepped up in a huge way in terms of your commitment to the Spark community. What have you seen? What's the ROI been for IBM in that year? And even beyond that, what do you think that contribution has done to take Spark to the next level? So, when we think about ROI, I think most folks just think about like, okay, what is the incremental revenue that we added to the top line? Spark is now being used across IBM: Watson, cloud, commerce, you know, everywhere, right? And that's telling, right? And so, you know, we eat our own dog food, so the ROI here isn't just, like I said, incremental. It's actually that our development is faster. So like, in the example of DataWorks, we went from 50 million lines of code down to five million, right? That's a huge drop in terms of, you know, how fast we can bring products to market. The Data Science Experience we built very, very quickly on Spark. That's why it was built Spark native. And we're able to iterate extremely fast.
And so the ROI for us is really about bringing products to market faster and also iterating faster because look, this is an emerging market. You know, we're not sure exactly where this data is going to take us, right? Is it AI? Is it, you know, this or that? And so, you know, if you're not able to work agile and think, you know, quickly and be able to put out products quickly, then you're going to be left behind. So without speaking for IBM. Yeah, I'll answer the question slightly differently and then ask you a question. So probably six or seven years ago, I attended one of the IBM PartnerWorld conferences. Yep. And Ginni Rometty, the CEO, was there and she stood in front of everybody and she said, IBM is going to serve the CMO. Get to know the CMO. They're going to be an incredibly important person in your lives, to all of these partners. Yep. And a whole bunch of partners scratched their heads. IBM since then has, you know, not just limited itself certainly to serving the CMO, but has been active at bringing out technologies that really can help businesses improve the engagement that they have with their markets. Yep. And the technologies that are very important to IBM in proving that are the ones that provide greater cachet for IBM as an innovator. IBM has always had to walk that fine line between following for the enterprise and innovating for those kind of core problems. Yep. And so here's a question. Shoot. It seems as though one of the ROIs, one of the big sources of value to IBM, is that Spark and your relationship to Spark and your investment with Spark are a proof point of your commitment to going after that new class of problem that's associated with engaging markets and doing a better job of operations. Yep. What are you doing? Are you making any announcements that are really intended to increase the size of the community, increase the engagement with the community, and open up new avenues for new problems?
Yep, so those are great points. And so last year, just to be clear, I mean, we were the first, you know, large corporation, right, to invest in Apache Spark, while everyone was still talking Hadoop. Even a lot of the Hadoop vendors. So we were willing to kind of, you know, disrupt ourselves, frankly, because we have a Hadoop distribution, and we were willing to do that with Spark because there were still a lot of questions, like, well, is Spark a replacement for Hadoop, is it not? You know, we don't have to go there. But we were the first through the door that said, we see this as a strategic, you know, opportunity, and we invested. We open sourced SystemML, which is essentially what, you know, our cognitive business is built on. You know, we were the first to do that for machine learning. Then you saw Microsoft and Google and others come after us. So I think in a lot of ways, you know, we were first through the door in terms of recognizing the value of Spark. And so this year, we're also, you know, some of the first folks who are looking at, all right, now that we have this analytic operating system, what does every operating system need? Well, you need a, you know, integrated development environment. And a lot of users. And a lot of users. And where are you going to find those users? Are they going to go out and teach them all, you know, Scala and these things? I hope so. Scala is a great language, but there's a lot of folks who are, you know... You don't hope that they're all going to learn Scala? I do. We partner with Lightbend. We're very excited to be supporting Scala. It's a very nice language. But look, there's a lot of R, you know, programmers out there, over two million by our estimate. There's a ton of Python users out there, you know, a lot of SQL folks, you know, that are out there. And look, you know, they're looking at job security, frankly. And so that's what they know.
That's how they, you know, interact with data. And so we're not going to, you know... And they want to create new value for the businesses that they're in, whether they're software developers or enterprise developers. But here's my point. This is where I'd like to push you on this point. Here's my point. So many, many years ago, as a young IT person, I was in a room, one of the first times I was invited to a management meeting, and it was a long time ago. And we were sitting around a table and we were talking about how end user computing was going to unfold. This is the mid-1980s. And one of the most senior guys, developers, in the room said, well, they're all going to learn to do programming in C. And we're all looking at each other saying, our executive team is going to learn to do programming in C? And the guy said, if they can't learn to program in C, they should be taken out and shot. And I remember thinking, that's not going to work. The point is that there are large numbers of individuals that still want to look at a computer as nothing but a tool that is an extension of how they think. And so, as we bring in those new developers, we're all just going to have to come up with new experiences, new ways of bringing new classes of decision makers and problem solvers into the community. Yeah, so there's a number of ways to look at this. So you have to think holistically about what you just described. So that's why we have things like Watson, right? And you think about, in a lot of other ways, how different people consume data and analytics, right? And so, whether it's through the command line interface, whether, you might be a visual person, or maybe it's question and answer back and forth, you're going to see a ton of different ways that IBM is basically opening data and analytics to a broad set of individuals.
And coming back to the data science piece, which is where our focus is today, we also joined the R Consortium. So outside of a lot of proprietary stacks, the R community has continued to grow because there are 9,000 packages now, right? That exist there for very specific domain problems, right? And so we recognize that, look, there's this community there, they're no different than any other open source community. And they needed the support of someone like IBM to come in and actually bring that into the enterprise and make it reliable, and link that tightly to Spark so that they can actually work with any type of data. People think like, oh, Spark, it's another big data technology. It's not. I mean, it is, but it's not only that, it's really a translator for working with all types of data. So the complexity piece, that's really the part that we are seeing our clients struggle with. Not the volume piece. I mean, that's important, but I actually think there's more value if you can start working with all types of data: audio, visual, log, whatever. So, yeah, it's exciting. So do you think Spark is going to be that platform where we bring all these formats in under the umbrella of analytics? Yeah, I think Spark has all of the exact ingredients we need. It has distributed computing, right? It has in-memory, it has really elegant APIs that anyone can use and build on. It has a great streaming service. It has machine learning, and we're gonna do a lot more to make that better. So it has all of these core components that not only make it useful, it's also very portable. So one of the things that Hadoop struggled with is, I would ask you to go and talk to some of these folks: how many of you have Hadoop installed on your laptop? No one is gonna raise their hand. It's hard, it doesn't even make sense. But how many have Spark installed? I guarantee you all of them will, right? And that's exactly what Linux did for the Unix world, right?
No one installed Unix on a laptop. Linus Torvalds went off, bought a PC off the shelf, and he's like, I can't install Unix on this thing. So he reinvented it, he scaled it down, took the key components he needed, and then he was able to put it on his laptop. And that's where most folks start today, on their laptop. Before you take off, I know we talked about this past year. Yeah. Let's go forward. Look, if the three of us are sitting here in 2017, knock on wood, what would you consider to be kind of benchmark activity? What would you say, yeah, that was a pretty good year? This is what I expect to see. I really hope that we can expand the community. I really hope that we can get more people involved. I think there's still a lot of fragmentation in the data analytics community. So in 2017, I would love to see people rally around Spark as this operating system. There are other competing, similar stacks out there, right? Which is great, I appreciate a thriving community. But for this market truly to grow in the way that Linux kind of lifted up the whole computer science era, so to speak, I think we're on the verge of the data science era, and we just need all of these folks to come together and kind of get behind the community and drive a single kind of way for us to think and work with data. So for me, I would love to have data scientists be involved, data analysts be involved, data engineers be involved, chief data officers be involved, and connect that back to application development and forward to the line of business. I think that'll be really exciting. Well, now we've got you on tape. A year from now we'll replay, and we'll see how good. It's been a great year certainly, and looking forward to an even better year in 2017. Joel, thanks for joining us here. Thanks for having me. Awesome. theCUBE continues here from San Francisco, Spark Summit 2016.