Live from Galvanize, San Francisco, extracting signal from the noise. It's theCUBE, covering the Apache Spark community event, brought to you by IBM. Now your hosts, John Furrier and George Gilbert. Okay, welcome back everyone. We are live in San Francisco. This is SiliconANGLE's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. But we are at the special presentation of the IBM Spark community event here in San Francisco, at the Galvanize incubator workspace, with a lot of startups here. We are right across the street, literally across town, if you will, from Union Square in San Francisco, where the Spark Summit's happening. We're covering that event with folks there and also special presenters at IBM. I'm John Furrier, with analyst George Gilbert. Our next guest is Joel Horwitz, Worldwide Director of Marketing for IBM's analytics platform. Welcome back to theCUBE. Thanks, guys. So we were just at Hadoop Summit last week, and really the teaser was, we were kind of like, you know, wink, wink. I kind of heard about the news then and confirmed it, but we were under embargo. It was released as of nine o'clock Pacific last night, midnight Eastern. And then it's just been gangbusters all over the world. Wall Street Journal, two New York Times articles. One, your picture's in it. At Spark Summit, huge press. Spark has sparked a huge trajectory of change in the industry. What's going on? You're on the front lines. Tell us what the hell's going on. Well, I'd be happy to give you my perspective. I mean, it's exciting stuff. I mean, what can I say? It has been a phenomenal day. There have been some phenomenal announcements. You know, open sourcing SystemML to me is like a landmark point in time. We're writing history today with this and it's just, it's amazing, right? I mean, when you think about how far we've come with big data and analytics, and now we're talking about an analytics operating system that anyone can use.
It doesn't matter what your background is. That's opening the market to a whole number of new applications, a whole number of industries, a whole number of professions. I mean, just this weekend we had a hackathon with 40-odd people here in Galvanize, and there were people who had never touched Spark before. Literally had never even touched it, and they were building facial recognition applications. They were building algorithms to predict crime in San Francisco. I mean, this is incredible stuff. It's an enabling technology. Kid in a candy store kind of thing, for tech. It really is kind of a big deal. So I've got to get your frontline perspective. I mean, obviously last week at Hadoop Summit, the developer sessions were packed with Spark. Why is everyone jazzed about it? I mean, you're there, you're talking to people in the hallways, what is the big deal? What is Spark? Why is it so important? Because now that you guys have shined a huge light on this, the whole world's now seeing it and asking questions: what is the Spark thing and what does it mean for me? Well, I think it's a new generation thing. I mean, if you look at the founders of Apache Spark, Matei and Patrick and these guys over at Databricks, I mean, this is a new generation. It's a new way of thinking about data. It's awesome, right? And so to me, I think we're making data fun again, right? When I was working as an analytics professional in my former role, I mean, the BI world and this type of thing was, frankly, a little bit boring and stale. So I think what we're seeing is the fact that Spark is truly, you know, no pun intended, sparking a new generation of analytics. Well, the relevance: not only is it exciting from a technology perspective, but from a business relevance standpoint, it's at the center of the action. The guys doing that work with the data are the most valuable enablers for top-line revenue, not just automation.
Yeah, the more we can shorten that cycle time. And when we talk about agile at IBM, we talk about application development and we talk about engaging with people. So the more that you can shorten that time between insight and action, the better off everyone will be. So I think that's what we're seeing really with Spark: just this incredible ability to shrink that time almost down to zero. It's awesome. Keying on that, Joel, why? You talk about Spark as an analytic operating system. We've had analytic technologies, you know, available broadly in the Hadoop ecosystem and in other areas. Why are you picking Spark and calling that the analytic operating system? And why is that shrinking the latency between the operational data and the point of interaction? Yeah, I mean, I would love to take credit for what's happening, but clearly it's not me. It's the market, right? We've reached this inflection point with the cost of storage, the cost of processing, the cost of bandwidth, and open source software is enabling, right? Anyone can go and use this technology now. And to me, that's what's actually the driving factor and the driving enabler for anybody, like I said. And there's a whole, like I said, generation of creative people who are able to pick this stuff up now and use it without, frankly, any sort of barrier to entry. I've got to ask you a question, because again, you talked about the young generation. I'm the old generation. I've been in this software business for 30 years. And, you know, back when open source was radical, we were doing some, you know, Unix. They had called their Unix, you know, Xenix, because you couldn't actually use the word Unix, because it was the AT&T trademark. Now you've got Linux, and then the whole thing changes. But I want to ask you specifically about the open source business model now. In the new young generation, as you talk to the folks on the front lines, Joel, what's going on? I mean, it seems to work now. People don't care if you work for a company.
And the bigger companies are donating more and more IP. Facebook donated a ton of IP to Open Compute. I mean, in the data center, so did Microsoft. So there's a collaborative business model that's working. It used to be, in the 80s and 90s: big company, don't screw up our agenda, stay away. Now it seems to be working. We're seeing it working in Hadoop, and now again in Spark. I mean, is that true? Do you see that? People don't really care where the contribution comes from? I mean, what's the vibe? Tell us the vibe. Yeah, I mean, certainly, you know, contribution matters, that's for sure. You know, put your money where your mouth is. So I think we did that today with SystemML. We're not just talking about supporting Spark. We're not just talking about investing dollars. We're actually committing some serious IP to this project. So that's a huge thing. When I talk about the business model and what that looks like, I think what we're seeing today is more about community. So to actually be successful in any technology today, I think it's really about creating this community. And I think that's why we're opening this Spark Technology Center here in San Francisco. So it's not necessarily about cranking on algorithms and cranking on development; it's actually about putting our foot firmly in the entrepreneurship ecosystem that's going on here. It's all about the community. It's like I said, the old expression: bring something to the party. You know, you guys just brought some good beer to the party and you're donating it. Of course, you'll participate in drinking that beer, but in this case, it's Spark. I mean, that's really the world now: hey, look, we'll bring something to the table. Yeah, I mean, we really brought a lot to the party. And you know, it's exciting because when you think about what we brought, it wasn't like a superficial thing.
It was really this kind of deep, enabling technology, SystemML, that is gonna make the lives of data scientists, like, super easy. Let me ask about another angle that you'd mentioned before when we were talking. We've seen over the last year or so, really while Spark was still incubating, not technically, but you know, de facto, tons of these ISVs, these big data ISV startups, beginning to do work with Spark as their underlying engine. Why did that happen? Why did they pick Spark? Well, I think the simple answer, in my opinion, is really the ease of use. I honestly think so. I think a lot of people will say, oh no, it's because it enables real time and it makes things go faster, and they talk about that type of thing. But there are enough dashboards in the world, frankly, right? I mean, between you and me. And, you know, insight is only as good as the decision and the action you can take from it, right? And so to me, it's more of this operating system in the sense that you can take Spark, you can actually build an application with it that'll infuse that insight into the front line. And I think that's- I hear it all the time, Joel, from customers: if someone comes in and sells me another dashboard, you know, I'm gonna jump off a bridge. I mean, that's the mentality. I don't need another platform to give me a dashboard. They want results. They want more insight. So what we're getting out of machine learning is the beginning of the building blocks. So, you know, that being said, if you believe that, then what's the next step? Is it the apps? We're hearing this end-to-end messaging. What is the end? Where's the end? Where does it start and where does it end? Yeah, I talk a lot about this idea of a data science last mile. I think that there are plenty of data scientists who are building amazing algorithms that are able to demonstrate amazing things.
But I think it's this last mile that we're actually starting to see, where you can take this algorithm and this intelligent thing that you've built and push it to the edge, right? So we see things like the Internet of Things, mobile, web, anywhere that you engage, right? I mean, I think it was even Ralph Lauren who had a shirt that's now smart and could tell you how fat you are, I guess. But in any case, it's pushing all the way to the edge. So it's exciting. I mean, anywhere you can engage digitally, you can use Spark. So if we were to sort of summarize: we've got existing operational applications that we're not entirely going to chuck; we're going to build on them. And Spark as the analytic OS helps shorten the time between the data that they're capturing and the interaction with the consumer, with the machine, or with the wearable piece of technology. Is that how we should think about it? Well, yeah, I mean, there are plenty of excellent applications today, right? And so the beauty of this is, we're not saying, hey, we're going to start a whole new, you know, Hadoop 2.0 or whatever people have been saying. It's really about taking what you already have and enhancing it, right? This is a complementary technology that's going to allow us to leap ahead. Not just iterate ahead, but actually leap. Complementary to, sorry to interrupt. No, that's okay. This is really important. Complementary to existing operational applications, or complementary to existing Hadoop deployments? I would say both, right? I mean, Hadoop is a phenomenal platform for storing large-scale data. It's a phenomenal platform that, you know, has a very robust ecosystem. That's not, you know, disappearing anytime soon. In fact, it's just integrating more and more with the enterprise. If anything, Spark comes in and it's going to drive that demand even more.
Yeah, I've always said that. You know, Dave and I always talk about this: Hadoop is great, but Hadoop creates more of an appetite. Correct. For big data. That's what you're basically saying. Hey, Hadoop's not going away. It's going to grow and grow, but it's not the only game in town. Yeah, you know, there needs to be demand to go with the supply, right? And so I think that there have been plenty of companies who have been kind of creating their data lakes, as we like to call them, or maybe a data ocean as you would say, you know, and they're becoming kind of a data marsh, because the data just sits there. So then in walks Spark, and all of a sudden, you know, take one of our clients, Independence Blue Cross: they're not just able to do one killer application. They're able to do like a dozen killer applications, right? And very, very quickly. And not only that, they're able to go back to those apps and continue to make them better and better. And that's the essence of agile to me. So we had Beth Smith on earlier, and it was funny, we got to the Wild West reference because you're in San Francisco with the office. This is where the event is, where Berkeley is. And then even the New York Times has the frontier, the big data real-time frontier. Explain to the folks who are watching who might not be inside baseball in the industry: why is it the Wild West? And again, why is it so important? If you had to explain it to, you know, the CIO or CEO of a big company who's like, hey, I just consolidated my data. It seems like yesterday I just consolidated my servers in my data center, and that was like 10 years ago. Okay, now today, why this move? Why Spark? I mean, I'm from Seattle originally. So, you know, it's pretty rainy in Seattle. I don't know if you guys are familiar with the town. But I think part of it is the weather down here is gorgeous.
I think people just have this, I don't know, there's just this feeling of doing good in the world. And I think that's what a lot of it is about. I think most people are very comfortable with, you know, openly sharing information. I mean, when we were at this hackathon this weekend, you had people from Ericsson and, you know, Hortonworks, all of these companies, and they didn't care. They didn't, you know, have a flag and say, this is my affiliation. I can't talk to you, IBM. I can't talk to you, Cloudera. I can't talk to you, whoever. Because they're just interested in technology. Technology cuts through all of that. You know, innovation cuts through all of that. So I think that's one of the biggest things that people, you know, respect in the Valley here: frankly, just innovation and innovative thinking, and they respect technology. So technology cuts through the noise, there's open sharing, people are collaborative. Yeah. That's kind of the perfect storm for open source. What are your plans? As you go out and market, you have to balance the growth of the big data industry community with the profit objectives for IBM. How do you balance that? How do you talk about that? And how do you share externally? Yeah, I mean, the thing that gets me up in the morning excited about going to work is just enabling people who may have never even thought they could use data in a certain way. It's about creating insights that they never thought possible. It's about this whole new insight economy and how, man, you know, people are going to be using algorithms in whole new ways. You can't throw a rock without hitting a new startup that's pitching, you know, a company that's based on machine learning. So to me, you know, when you say the Wild West, I think of it more as a gold rush than the Wild West, and California is known for that.
So that's what it feels like to me with machine learning and AI and all of this. So it's really exciting to see what people are going to create next. You know, as you were saying about Hadoop, a couple of years ago, or up until recently, people talked about the data lake. That was sort of like, put your Hadoop cluster in and accumulate data, and that's your first app. And that sort of smacked of, you know, I need to kick the tires and get a sense for what it is. As you go around talking to IBM customers, and you want to talk about something that has a budget associated with it, you know, with an IT or line-of-business manager, what are those first apps going to be? Man, that's a hard question. So when I think about, you know, this whole idea of the data lake and not throwing data away, I think what machine learning is going to do is enable us to actually take a very critical eye to that data. I think we're finally going to be able to look at it and say, I can extract the essence of this information and store it in a compact way. I think that's what we've been missing, because we've been thinking only about, how do I keep all this, because I never know what question I'm going to ask or what query I'm going to run. But in my brain, just thinking, you know, long distance, I think, okay, there's probably a lot of duplicate data, there's probably a lot of, you know, data that multiple people have. And so I think you probably won't need to store everything if you can just extract the insight, and then you're going to have data flowing in that can, frankly, just update that model and update that insight. So I think we're going to get far, far more efficient with what we're doing here. So just a follow-up on that. It's interesting because, for the customers who got past that data lake, the next step was typically the offload of ETL from the data warehouse, but that's still a batch process.
So what I'm hearing you say, and correct me if I'm wrong, is that now we're going to take that data lake and, instead of operationalizing the repeatable stuff and putting it in the data warehouse, we're actually going to extend it back into the real-time operational application with machine learning. Yeah, I think it's a mix of both, frankly. I think there's always going to be a need for the data warehouse. There's always going to be a need for the data lake. And the reason why I say that is because you have to look at it on a case-by-case basis. There are certain things that the data warehouse was built to do very, very well: create repeatable processes that drive your business, these systems of record, as we like to call them. Whereas Hadoop and Spark, I think, are making up this new area that we call a system of insight. And so for that, you need a much more agile, elastic environment to work in. But like I said, they're very complementary. How that pushes into an operational situation, I think, will be expedited and be done a lot faster with machine learning, because you can actually extract the essence much more easily. So what's next now, after today? I asked Beth and she's like, well, let's get through today. So this event's happening, Spark Summit. After this closes down, I know you're active on social media, and there's this comprehensive plan. Can you share any insights on what the plan's going to be? I see more trainings, more things at Galvanize, other partners. And for folks that want to get involved with you in an open way, how do they get involved? Yeah, thanks for bringing that up. I mean, this isn't the end, this is the start, right? This is the start of a new beginning. This is the start of a whole new push that we're making. We have to train a million-plus data scientists. That's a huge undertaking, and we're taking it on, and we're the first to commit to that.
I haven't seen anyone else step up. They recognize the problem, but they don't actually offer a solution. And I think we're the first. I think the second thing is, we're going to be driving hard at getting Spark out there and adopted by the enterprise. I think there's a lot of value locked into this project, and we need to get it into the hands of the line-of-business owners, and that's what we're planning on doing. And as these solutions mature, you guys will be hardening them at IBM, integrating them into your portfolio. That's right. I mean, we have a whole breadth of analytics that we're going to bring to the table to make it far, far easier to, you know, ingest the value that's coming out of this. All right, final question is a more personal one. I'd like you to share some color, your personal opinion. Share with the folks, what's going on internally at IBM? I mean, are people jazzed? I talked last week to Bob Picciano, Rob Thomas, and other folks, and to you yourself last week as well. They're excited. I mean, what's it like at IBM? What's going on inside IBM? Because again, I'll be straight up: you guys are executing this vision. This is not a new strategy, this is execution. But what's the vibe like? What are the employees like? Are people jazzed behind this? Is this kind of like a Hail Mary, or more of a well thought out execution? There is a very well thought out execution going on right now. And I only joined IBM a few months ago, right? And I have to say, to give you kind of my status report after joining IBM, I love it here. It's a blast. I mean, who else but IBM can have this type of an impact on the market, right? In my opinion, there are very few. And so to me, having set the strategy and working with the team, everyone is just energized, is ecstatic about where we're taking this next. And it's a lot of things. Yeah, IBM, you've got Google. You guys are the big industry giants.
I mean, this is a huge thing. I think Spark should be throwing a parade for IBM for all the awareness you guys are creating for it. No, really, it looks like it's out of left field. What's coming out of left field? IBM is dive-bombing on this Spark thing. This is amazing. What is it about? It's a collaborative effort. I mean, I would love to say, oh, it's all us, but it's not. I mean, we're being very conscientious about who we're going to market with. We're thinking it through exactly. It was a long thought process to decide if we wanted to open source SystemML. It's a huge amount of research that we're making available to a growing community. And so no, it wasn't just kind of an afterthought, oh, we should do this too. We've been meeting with the team over at Databricks and others and the broader community to align and understand how IBM could bring the best of its world. It's great to collaborate with you, Joel. Again, we at theCUBE are proud to work with you guys. And again, we were there at the beginning of Hadoop. Dave Vellante and I, back in 2009 and 2010, were like, Hadoop, and no one even knew what Hadoop was. We also were first at Spark, at the first Spark Summit. And again, being on the ground floor of such an important part of the industry's history, this is not going away. So we're super excited to continue the journey, jump in, and break into new markets with you guys and the community. Again, this is theCUBE, bringing you all the action. We'll go wherever the action is and we'll bring it to you live. We're going to unpack more of this and what it means for IBM's portfolio of products. We're going to talk to experts. This is a special presentation here with theCUBE at the IBM Spark Community Event at Galvanize. We'll be right back. Thanks.