Live from Galvanize, San Francisco. Extracting signal from the noise. It's theCUBE, covering the Apache Spark community event brought to you by IBM. Now your hosts, John Furrier and George Fieldman. Okay, welcome back everyone. We are live in San Francisco for this special CUBE presentation with the IBM Spark community event here live at Galvanize in San Francisco. Workspace incubator, great place for caliber education. IBM's big announcement today. Their commitment to Spark. They didn't say any numbers, but I'm counting in the hundreds of millions of years to quote Bob Picciano on my call with him on Friday with Rob Thomas. You mean dollars. Dollars, yeah. It's gonna last for hundreds of millions of years. Yeah, hundreds of millions of dollars. Getting late in the day, got the beer coming. Rod Smith's our next guest. Rod, welcome to theCUBE. Thank you very much. You were the catalyst behind Spark at IBM. Worked hard on it. Yeah, you guys, tell a story. What's the story? Well, we worked on big data and I have a group of folks that go out and work with customers all the time. And when we were doing Hadoop, we would do these cool applications that sometimes, you know, small clusters, 20 minutes you get a result and the customer would say, can you do that in a couple seconds? Kind of look around and go, what changed? I mean, did the business problem change? And they couldn't tell us, but it was one of those data points in your head that goes, something's not quite right. You know, what's changing or what are they trying to tell me that they can't? And that's when we started learning, you know, customers were looking for technology that they could iterate on quickly, you know, open-ended questions. It wasn't the, you know, give me a problem, do the compute piece, output, I'm done. This was, oh gee, there's the journey. I now see some interesting insights. I have other questions. Was something not right?
The data that they got didn't match their hypothesis or was it the expectation that if I can do it fast on Google and find a Thai restaurant down the block, why can't it work that way? Something that didn't seem right stuck with me. That said, why can't you tell me what you're really trying to accomplish? What I learned is that as we go through these kind of digital transformations, real-time, they were thinking about how their business is gonna change so fast. And so the problem's always been, for technologists and vendors like IBM: tell us the problem, we pick out the technology, and you're pretty well stuck with it. It stays that way. And they wanted more flexibility. Open-ended questions, lots of different data sources on demand when they had to have it on this. They wanted to see results along the way. And they would rather have analytics be an approximation that they could use quickly rather than after the fact and more accurate. So when you went through that, it wasn't that they couldn't find a BI person to talk to and they couldn't find a data person. So it was fun to try to put puzzle pieces together and that's where Spark came into this. So obviously a lot of other trends are kind of vectoring into that convergence, which is in-memory databases. Absolutely. The flash for persistent store on the storage side. So this, you guys are close to all that action. What was the aha moment within IBM for saying, hey, you know what? This Spark thing is the next Linux. We got to get out in front of this and help the community go faster and then kind of rising tide floats all boats. What was that flash point for you? We had two of them. One was that in our commerce group, there's ways that they work on online pricing. And there's a vendor standard which takes about a week: you get data off of a site or retail site, they analyze it, they correct the analytics, they put it back up again, takes about a week.
But we showed them with Spark, we could do it in about four hours, a week down to four hours. And now they started to think, oh, what do we offer customers? Now we have ways to have not just one product, many products, let's bring in other data, location data, traffic data, weather data, social data. So that kind of exploded internally on, this is a big change. This is something that we can relate to customers. So multiple data sources and the need for unification and speed, or? And speed, speed first. Speed kills, right? Speed first. And it's like, hey, I got all the speed, I want to bring other data sets in. And it's time to value. I mean, if you're going to be a digital business and look at real time where it's going, Netflix, others have really set the standard on this. Okay, so then I'm gonna, so let's take it to the next level. So, Rod, you're crazy, we can't do that. Don't disrupt all these other businesses we have. So how does that conversation happen within IBM? The way that happens in IBM is, Rod, you are crazy and you're gonna cause me agita. Please go away. And I don't go away easily. But you keep pushing on this. And part of my job is to work with customers. And I show value so I can take it to the product team, saying, you need to take this more seriously. I've got currency now. And then, as you just said, the marketplace starts to light up. Spark is on the front pages. People are talking about how they're using it. Well, Hadoop is growing too at the same time. So what Hadoop does is seeds the market. Seeds the market? You see, you're playing in Hadoop. And you see the customer challenges and then you're like, you guys just connected dots? And then it's back to the customers talking about what problems they want to solve or the solutions they're looking for. So yeah, it takes time because it's risky. I mean, all of us have quarterlies in what we're doing, but how do we now make it safer for people in IBM jumping in the water?
So that eventually they don't hate me anymore. So what's your comment when a friend says, hey, Rod, Linux was great, but it's a different era. Oh yeah. You know, here with cloud and mobile, open source with Apache has evolved to the point where it's very manageable for vendors to be contributors as well with non-company contributors. How do you guys see the difference between those two worlds? Because really this is a Linux moment, but there's no big, bad mini-computer companies, mainframes out there, but they're specialized for like the Z systems are great, but like this is scale out commodity hardware, Hadoop. Now that's growing. How do you describe that? Because there is a Linux correlation. What Linux was for open source then was operating systems. Now this is kind of distributed analytics. Well, I think part of this is kind of real-time digital business transformations. And while there's not a bad company out there, Amazon and others have shown how they can be online businesses and use analytics and be very effective, but I'm a brick and mortar company and an online business, how do I do the same thing? And Spark starts to really show that they don't have a corner on the market, we can compete. So that's the big factor on this is while it's not one company doing this, it's I need to be able to compete at the speed of the businesses that don't have the legacy that I do. Amazon started kind of post recession, or you know, dot com bubble bursting. 10 years. You know, web services was just kind of kicking through if we remember our history lessons, and what happened was they really had no traction. They built some building blocks. They made a good decision to integrate two core building blocks, compute and storage, and they built from there. So in a way you guys can enable companies to have their own Amazon-like experience, because it's a fresh clean sheet of paper, right? It is, and I think where Spark gets interesting is like you said, into verticals.
What do I do in retail? What do I do in healthcare? What do I do in finance? Right, very specialized, like we're showing in Watson. You can do Watson for cancer research. You can do Watson for cooking, right? But they're very vertical now, so. And specialized, domain expertise becomes really interesting, right? That's the big part. And that's the part I really liked about Spark. The community really thought about solution developers. They stayed away kind of the middle ground. Now you don't have to be a deep data person or a deep analytics, a BI person. What's the problem you want to solve? How can I help you do that? I think that's a- You know what's interesting is that most people go, gee, this is speeds and feeds software. When you look at the solution, it's more holistic. But then you're really talking about customer problems, right? That's right. The so-called outcomes that people want. Well that's what, and I think that's the part that I've enjoyed is I want to talk to you about what your problem is. I don't want to talk technology. I don't want to have to make a technology choice from day one. Spark helps me with that. A unified programming model. All those things come together so I can concentrate, we can concentrate on talking to the customer and learn from them. What are you trying to accomplish? What's the next things on your list? Go ahead. I was just gonna say, looking at your LinkedIn page, I love this, that VP Emerging Technologies for 20 some odd years. So you've seen a lot of technologies come, a lot of emerging technologies. And the acceleration of these technologies is only going more, right? You have a whole lot more in your portfolio you have to look at today than you did yesterday or five years ago. Why is Spark so special in the cornucopia of technologies that you've seen come and go over the years? It's a good question.
And as I've done Emerging Technologies, I've learned that I have to listen to customers very carefully on it. And when I hear those kind of repeatable business patterns, do I see an economic change, a transformation that really sticks with me? And sometimes things start really big, they start out good and then they fade away. But I always look for technologies that seem to have lots of dimensions to them from a business value standpoint. That's what attracted me to Spark. And my team working with some customers on POCs, we can do them quickly. I really like to get to the point where in industry, with notebooks and others, we can do solutions in less than four hours for a customer. And what better thing to take your employee to lunch and pat them on the back for something that you didn't expect for weeks? Well, one of the exciting things that you guys have done is you shine the spotlight on Spark and you opened up the conversation globally around IBM making a big move. Spark was a little bit of an outlier in the mainstream press. I mean, the press were picking up Spark. Oh yeah, Berkeley, some credibility, the great people behind it. But now it's like, wow, it's going to get the attention of CXOs out there. And they're going to be like, hmm, if IBM's looking at it, it must be relevant because of the history you guys have with innovation, but they're going to ask you the question and I'm going to ask you which is, it's not baked out yet. Where are we with this? What are you guys going to do? How does IBM work with the community to continue to bake out Spark? Because a lot of people are using it, bringing it in, but it's evolving super fast and that's going to be the question, is it baked and how does it get baked faster? So I think there's lots of areas that we just talked about. If I'm doing retail or healthcare or finance, there's going to be lots of specialized analytics because that's what Spark to me is, is enabling custom analytics on this.
Second part is, as you think about how you want to look at bigger problems, I think that many times our learning is to try to, once we got a technology, let's make everything fit it, rather than starting to separate it by business problems. And I think we can do that now or we can bring to the table technology, learning best practices around this, and solutions. I think at the end of the day, it's how Spark can be integrated into business solutions for our customers very quickly, and hopefully those customers see it broadly from an interoperability standpoint of what they're going to do. So the final question I have for you is, what was the biggest learning that you've taken away from this process that was magnified through this whole journey of taking IBM from being a participant as a citizen in the community early on as a founding member of Spark. This is back in 2009, so it wasn't like no one knew anything was going on. And we've been covering Hadoop from the beginning, so we love to watch these ecosystems grow, but from the early days to now today, what was the biggest thing that you learned that was magnified out of all the reactions, all the feedback, all the customers, what can you share? I think for me, when we did a Spark hack, our hackathon piece, when 28,000 IBMers showed up with ideas, that told us a lot. 28,000. Yeah, 28,000. So now you stopped, and 28,000 people who were focused on the customer. So they had a thought of how this could be relevant. This is great. I mean, this isn't, like back talking before, this isn't one little vein, one little stream, it's big. And big was what we can do for our customers. When was that? About two months ago. How did you pull that off? Just an email blast, all the IBMers put on the message board, do a crowd chat, what did you do?
Well, first you put out an email blast. The second one is, you put on a web conf to explain to people what you're going to do with it and what you'd like them to do, and how we're setting it up. And then you step back, and you kind of cross your fingers and hope people show up. And then when you invite 10,000 and 28,000 show up, you kind of know that we're turning a corner as a company on understanding how we can use that for. This also highlights this whole connectedness. Absolutely. And people are things too. So they're a mobile device. When you have that kind of people close to the action, the creativity is there, right? They're on the front lines and they don't feel like the work they do is going to be taken by the machinery in the old days. I got to go back, all these hurdles, I got to jump. Now they can instantly be there with some solutions. So that's super compelling. The next question is security. And how do you see that weaving in? Because now one of the things that came up, well, first of all, let me back up before I get to security. Let me think about the security question for a second. Last week at Hadoop Summit, we were talking with the Hadoop ecosystem, Hortonworks, ODP conversations, et cetera. But when you looked at kind of like reading the tea leaves, it was Spark that was kind of stealing the show. The subtext was Spark. All the Spark sessions were packed. The developers were salivating over Spark. So, why is that? Why are the Hadoop developers salivating over Spark? Is it because they want it to go faster? Do they see extensions? Any thoughts? I think that, I'll say it two ways. One is, I think there was, and since I did Hadoop for quite a while, I think people thought for a while Hadoop was gonna be an analytics platform. And it kind of went down the path of being a more generalized platform so you could do more than MapReduce jobs. So there's been this pent up demand for a real analytics focus. And Spark offered that focus.
And the performance side. I think that's the part that- So Hadoop sold kind of a false dream or it didn't materialize fast enough. I don't think it materialized fast enough. Not a false dream. I think I'm saying it. They promised the moon. Yeah, well, and people set those internally. Well, the press maybe. Yeah, I don't think the vendors were. I think it was more of the- Well, vendors, you know, it did too. Well, unstructured data does that. Unstructured data does that. Storing data and being able to act on it creates some interesting dynamics. I mean, I've worked with customers who started to put data in Hadoop, put data in Hadoop. We're only gonna do a year's worth of data and then putting three years of data because they want to do Monte Carlo simulations again. Don't say Monty Python. John, please. He threw water on us. And we love it. 20 years old. Those are my days. We have them on the queue. But the point, as we were talking about before, like our internal use, is we can produce interesting innovations in days. That's gonna attract audiences because now they can show their business people what they can do for them. That's what's really driving this. I mean, if you got a CXO, CMO says, show me what you can do. Do segmentation on my population for these products. They want it in minutes, not going to run it in different jobs and over a certain period of time. I was just talking with the CEOs of DocuSign, Box 18, well, I think COAT was like an executive director, and then EVP of platform at Salesforce. The common thread amongst those executives was the new digital transformation has such a dynamic or impactful economic impact. Yes. DocuSign was giving examples of how, literally, Deutsche Telekom saved $230 million on one process. One process with analytics and process improvement. It sounds funny, but it's extremely low-hanging fruit. We haven't had the technology and the economics to be able to support it.
Now we do, and now you're seeing the solution developer go, I think I can make a business result faster. And if they can show it, then businesses react immediately. I think that's the beautiful thing about what Hadoop has done. I mean, I brought that up earlier trying to tease that out, but the reality we're seeing is that that market's continuing to grow. Absolutely. But there's a world beyond Hadoop. I mean, Hortonworks is a public company. I mean, IBM is massive. So you got Hadoop, and then Spark's a beautiful extension to that that enables so much more. Well, I think Spark will go further because, to me, it's another dimension. It's an integration technology. So I can have Spark hooked up to legacy systems without Hadoop in there doing analytics, in there being an avenue for doing joins on data, doing analytics on unstructured and transactional data, weather data, pulling it all together. And I think that's the, again, talking about multi-dimensional, that's what's going on. And that was hard even five years ago. Absolutely. So with any relational database, that's a nightmare. Yeah, and you're asking about security, so I do want to touch on it. Yeah, okay. Go ahead. So part of the things that I like about Spark is the technology called resilient distributed datasets. It's RDDs. So I read data from a source and I make it into this RDD, I can work on it. That gives me a great data point or a great interaction with Cassandra. DataStax did a really great job on a Spark driver. So you think about this in businesses for a DB2 or something. Now I know where I can put my security and my governance. I can put those at certain endpoints now as I'm reading in my application and writing these things out. So again, back to my point of an integration, it's not something that I'm trying to get around a business. I'm actually integrating. Extending their life and/or capabilities. That's right. So I got to ask you the internal IBM question.
My last question is what's the vibe like at IBM? Because I've worked at IBM way back in the day, back in the 80s, and the culture's changed so much. But there's still a huge technical group of people at IBM. So I got to ask you the question. With all this new cloud innovation, all these new capabilities to do stuff differently, what's it like for all the technical guys at IBM right now? Because they got to be like, hey, we can now do this. So new capabilities are emerging. What's the vibe like? And what are some of the things that are low hanging fruit that are game changing? Because low hanging fruit is game changing today. Oh yes. So what's the vibe internally at IBM? Vibe internally is very hot. I mean, the guys and gals at this, you look at cloud computing, look what we've done with Bluemix. It's getting great press. It's getting great results with customers back to this time to value piece. It's new to us. I mean, there's only a small group that started that. So now the rest of the IBMers are going, this is really cool. How do we do it? Now you've got analytics that we're starting. You have competencies around this. Now you can think of the real time aspect. So yeah, the vibe is really hot. All those silos, identity system here. I got to build all this software. Now you got to go horizontal. So that's kind of a new thing. That's kind of exciting. Things can be fun to watch. My final question, and I guess it's my final, final question is... Have you been keeping track? 456, put some spark on that. This is the sixth final question. Well, Rod, it's great to have you on theCUBE. You're awesome. Great commentary, great insight. Spark in the cloud is what Databricks announced. What about on-premise? I'm a customer. I want on-prem. I don't necessarily want to do. What's next? CoreOS or other stuff? Oh, I think you're going to see, like hybrid models for cloud where Spark as a service is there, on-prem.
I think one of the really exciting parts to me is, one, the unified programming model, and two, the portability of the analytic models. So let's say I start on-prem because I'm worried about security and other things and then I want to move it to a cloud service. Well, I don't have to go rewrite it. I can just move the analytics over from a model standpoint. So I think you're going to see this evolve very fast as people want to do either on-prem or hybrid or dedicated. Because of the integration capabilities and the distributed nature of it. That's the point, yep. Well, I'll let you get the last word on the segment. Share with the folks who are not watching. What is this all about today? Why in San Francisco today, IBM's announcement? What's so groundbreaking about it? I know you're a part of it and you're a little bit biased, but share with the folks why now? What's this all about? What's going on here? Well, we think that the kind of epicenter for Spark innovation is here in San Francisco, with what AMPLab and Databricks and others are doing here. And we want to be a part of that. And I think the Spark Technology Center we're setting up is about how we can contribute and learn and help the community grow. We think this is going to flow possible. You brought some food to the party. I mean, as I said earlier, beer, right? You're bringing up, you know, the ML? Yeah. You got the wine, Napa Valley, of course, you got to go with the wine. Well, craft beer is good in North Bay. Yeah. Thanks so much for coming on theCUBE. Really appreciate the insight. This is great color from an expert at IBM here. We're on the ground. This is theCUBE's special presentation, live in San Francisco. We'll be back with more with live coverage of the breakouts and the event tonight. IBM Spark Community Event here in San Francisco at the Galvanize Workspace Education Center. We'll be right back.