From Galvanize, San Francisco, extracting the signal from the noise, it's theCUBE, covering the Apache Spark community event, brought to you by IBM. Now your hosts, John Furrier and George Gilbert.
Okay, welcome back everyone. We are here live in San Francisco for a special presentation of SiliconANGLE's theCUBE with IBM's big data announcement around the Spark community. I'm John Furrier with Wikibon analyst George Gilbert. Our next guest is Harriet Fryman, Vice President of Portfolio Marketing at IBM Analytics. Welcome back to theCUBE, great to see you here in San Francisco at the Galvanize workspace, incubator, whatever they're calling it, but it's developers everywhere.
I know, it's the first time I think we've even spoken in the city where I live. It's kind of nice, no travel.
And we're always in Vegas. We've had some great conversations over the past year or so on theCUBE about analytics at IBM events. Something different at IBM right now. You guys are in the community, in the trenches, here in San Francisco where all the action is. You've got Berkeley right across the street as part of this announcement. IBM's huge commitment to Spark, which is now opening up to the world, seeing what's under the hood of all this magic called big data. Obviously Hadoop, Spark, a lot of stuff going on. Take us through what that means from your perspective.
Yeah, we always talked about analytics being in the hands of IT, right? And the business being delivered capabilities from the IT world. And obviously we've seen a shift there, where the business is getting involved in self-sourcing their own data and their own analytics. Well, there was a whole community that we really feel we can reach and be engaged with, and that's the community of the developers, the data scientists, the data engineers, the people who are coding stuff and coding insight into applications.
So this is really all about the promise we've always had, which is let's get analytics into the hands of as many people as we can and get some value out of the data we have.
The other news I want to get your thoughts on here, because again we've talked about this before, is we've always said, and Dave and I have always talked about this on theCUBE, pedal faster. The industry wants to go faster. At Hadoop Summit last week, which Hortonworks put on, there was a huge packed house at all the Spark summits, all the Spark sessions I mean, leading up to Spark Summit. So there's an aroma of innovation around Spark. Okay, that's because of the engine and the data science push. Does this make it go faster, and what specific things can you point to, underlying use cases, that make big data and this engine fuel faster innovation?
Yeah, I think there's a huge appetite for going faster and getting to outcome. It's no longer just a dashboard for evidence. It's actually gonna be about getting to outcome, and that's where Spark can help. So when we look at the use cases we really see, I guess, three types of DNA of a use case, if you will. One is highly interactive analysis on very large data sets, so it's got great speed and performance there. Another is when you're running very compute-intensive algorithms across large data sets, so Spark is great there too. And then, I think as you were speaking with Joel in the prior interview, it's about getting that insight to drive applications, processes, the things all around us, and that's really where Spark can have a great role.
The beautiful thing I want to share with you, an observation from last week, was we had users and big companies tell us they don't really care about the big vendors versus open source. At the end of the day they like the fact that everything vectors into an open solution where there's choice and no lock-in. Lock-in now is the contribution.
So I want to get your thoughts on how you guys are collaborating from a product portfolio into the open source world, because now you guys have to differentiate, you guys have to make money. It's not like you're just gonna give away free candy to the kids, if you will, or in this case SystemML, to kickstart this revolution. So you guys have to make money, so you have to integrate your stuff and differentiate. So how does that all work? Explain that, and what's in it for IBM too.
Yeah, yeah, so we talk about Spark as really providing the analytics operating system. As with all operating systems, it's there for a purpose, which is to be a foundation on top of which things can be built, value can be built. So we're looking to integrate Spark into our technology so we can build Spark into the core of the analytics platform as well as into our IBM Commerce solutions. And so we see that the tools and the solutions we build can have the value of having Spark inside them and Spark integrated into them.
Talk about Bluemix, because Bluemix is kind of like powered in here. It's a subtext of the main announcement, and it's in the main announcement. So IBM will offer Spark as a cloud service on IBM Bluemix to make it possible for app developers to quickly load data, model it, and derive the predictive artifact to use in their app. Okay, what does that mean? Does that mean Bluemix is catching up to Amazon now with this, or is it, because Amazon has a lot of the stuff built into the cloud, so what's the Bluemix angle on this? Obviously DevOps and analytics are like a really beautiful combination right now.
Yeah, yeah, so not many people are either aware of Bluemix or the power of what we have up there. So we have a whole range of data and analytics services out on Bluemix, which is our developer cloud.
So any developer can go out there, use one or many of those services in combination, and we're looking to have Apache Spark as a service up on Bluemix so people can mix and match those services together to build what they need.
Is it free?
It is available. Well, it's actually closed beta now, soon to be open beta, and then people can use it for free, try it out, yeah.
So it's developers, you guys are gonna offer Bluemix for the developers?
Mm-hmm, that's it.
That's the main target for the service, not like to end users yet.
Yeah, exactly. And then I always get the questions, you know, it's almost like we want to have a choice, either/or. It's either Spark or Hadoop, it's either open source or it's a tool, and I always think there's an "and" in there. You can use them both together, and that's really what we're doing when we integrate Spark with our platform.
I think you're being politically correct when you say that. We were calling it out last week at Hadoop Summit. There is a world past Hadoop. Hadoop is gonna grow and continue to be important. There's no dissing Hadoop, and Hadoop is mainstream. It's got some work to do, DevOps could fit in there, but there's gonna be huge growth in Hadoop, but that's not the only game in town.
Exactly.
There's other stuff.
Exactly. It's like when people tell me that old data warehouse, that's so 20 years ago, and actually I remember building data warehouses, and if you look at the renewal that's happening with data warehousing, there's still data stores that need to serve a purpose. I need to store history, I need to be able to run analytics on it, I need to be able to reconcile data sources together. You just named a data warehouse, but you call it a data lake, a data repository.
Yeah, I mean Joel basically called out the trend we'll see with millennials, which is that the data warehouse world, I mean when I got my degree in database back in the 80s, just like, you weren't telling everyone, hey, I'm a database guy, like rah, rah.
No, today you're a rock star because the data is so important. The developers that are working on data, it's exciting, right, it's energizing. Versus the old days of run the TPS report, that Office Space line there. So that's the old world, right? That's kind of like what it was, that's boring and slow. Spark kind of brings that energy. You seeing the same thing? I mean, I agree with the Office Space comment, but I mean Joel kind of pointed it out.
Oh yeah, it's tremendously exciting. I think I overheard a partner of ours go, there'll never be a poor mathematician again, because now the world of data scientists has come along and there's actually an appetite and a desire to have people that understand the logic, understand the maths that it's gonna take to find the value in the data. And so now data scientists are at a premium and there's not gonna be any poor mathematicians anymore.
I wanna go back to a comment you made about the DNA of the use cases. So when IBM is selling into the organization with the analytics solutions, tell us the roles of the people you're selling to and how it's different now that Spark is part of the solution, even if they don't see it. Does that mean you get time to value faster, or does that mean the application is more effective because it can get insight to the point of interaction faster? What does that look like?
Oh yeah, sure, so we have a very broad portfolio. So we see Spark as enabling that developer, data engineer, data scientist to rapidly build insight out of data and deploy it out into applications like the Internet of Things, and so we really see it as providing the Swiss Army knife for that group of people, self-sufficient builders. For enterprise IT, it's gonna be part of their hybrid architecture. I think we always thought hybrid is a word only reserved for cloud, but there's hybrid data architectures, and enterprise IT has to look to mix open source technologies with the existing standards they have internally.
And then for solutions, you mentioned getting time to value for solutions. Well, if Spark can deliver better insight from the algorithms, and it can deliver it faster, then actual industry solutions will benefit from that as we talk to the line-of-business side of the house.
So it's both building the solutions faster and, within the solution, getting the time to insight faster as well?
Yeah, so for example in our IBM Open Platform with Apache Hadoop, that's our free-for-production-use Hadoop distribution that has the Spark project in it, so people can play with Spark right there with Hadoop.
So I got to ask you about the San Francisco office and how that relates to some of the things you guys are going to be doing in marketing. You and I spoke a couple of months ago, like you guys are going heavy into big data in the industry, and now I see how that all played out. What's next? What else is gonna be on the ground, and what target audience of developers are you guys specifically targeting? Is it the DevOps guy? Is it the app developer? Is it a couple of targets? Is it the new people coming in that want to learn, and the millennials? Can you share some of that data?
Oh sure, so I actually will use a phrase from Steve Lohr in his book Data-ism, in which he called them dopes, which is a great term for the data-oriented people. So we're looking for the particular developers that are data savvy. They want to deliver insights, so we're really targeting the data scientists, and the application developer that's looking to build more sophisticated data-based content into the application, so that particular audience. And then at our San Francisco office, I'm delighted that we're really building out the team there, who are focused on understanding Spark, innovating and contributing to the core of Spark, and then working with clients that come in on realizing or imagining the solutions that are possible with Spark.
So I got to ask about the Databricks relationship. Why Databricks?
And they obviously announced today their availability, their GA, general availability of the cloud platform. Yeah, but not everyone's in cloud. I mean, everyone still has on-premise. Is there a distinction between on-premise and cloud? How does that work? I mean, you guys, how do you reconcile that question?
So Databricks is just a great partner to have. We see they have the inventors there, the great contributions out of that company. They're really driving the community to understand the value of Spark and to use Spark. So they're a great partner to have, and we see the relationship is actually growing from here. As Joel said, we're right at the start right now in our journey and it will only increase, and we see them as a partner in crime to get out to the million data scientists and data engineers that we really feel are needed to fuel the world with insight. So they will be partnering with us in all aspects, from the technology, the community, as well as the skills in the marketplace.
So what about the visualization market? How is this going to change that? Because this seems to be really the holy grail with Spark, there's so much speed coming in. The visualization will probably be a big impact on that. You see a rising tide, certainly, on visualization.
Well yes, I know people will say that the time of another dashboard in the world is passé, and I think dashboards serve a great purpose, which is when people want to see the data and look at evidence. But as we all know, unless you take that evidence to action, it's not really useful. It's interesting, but it's not really providing benefit. So the visualizations that we can do on big data are just so diverse that we believe there'll actually be new visualizations happening. Maybe we'll still have a dashboard, but there'll be new types of visualizations, because the human eye can't sift through all those masses of data unless we help it. Is that the end goal of Spark? I don't think so.
The end goal of Spark is to put the analytics into the processes, into the applications, and actually drive applications based on the analytics, not based on rules or pre-programmed, pre-configured logic.
So I had a tweet last night, it said this is a real big move from IBM, upping their game, looking good, power move with a capital P, kind of teasing out where this end-to-end thing goes. As you look at it, you're starting out with machine learning. I'm going to ask Rob Thomas these questions so I wouldn't go deep with him on this whole architecture, but you're really just starting, seeds being planted with ML. The machine learning with developers, getting some of the tooling, you're bringing a big shortcut for developers with the IP you guys are donating into the community, but that's not the end game, that's just the beginning of a journey. When you get to it, the next evolution is going to be in-processor analytics, in my mind. I'm just connecting the dots and not speaking for IBM. So that means the Power Systems and the processors.
Yeah, Power Systems, System z, yeah.
I mean, am I kind of like going on the right dot-connecting there? I mean, is that kind of where you see it happening?
Yeah, we want to make sure, when the data is so big you can't move it to the analytics, you've got to move the analytics to the data, and that's why we have such a healthy business on System z with operational analytics. So where people have the majority of their data residing on a System z box, we want to place the analytics there so that they can get a reward from gaining insight and putting that insight into the operational apps that happen to be running even on System z.
That's correct, I'm just tweeting this out, that was a good one, bring the analytics to the data. And again, Hadoop has the same philosophy, bring the processing power to the data and vice versa.
So going next, so for the CIO out there, they're going to be like, wow, this is kind of a move out of left field for IBM, what's this mean? I mean, how relevant is it? Speak to that kind of conversation, this curiosity that you guys have now kind of created in the global landscape, which is, why?
Well, if we go back to the theory that it's not either/or, and that hybrid is a word available to data architectures as much as it is to cloud or to environments, the CIO of today faces ever-increasing heterogeneity that they have to deal with. And it's the architecture that can be put in place that leverages cloud and on-premise, leverages bespoke or specific tools and solutions as well as open source. It's those CIOs that manage to get to the perfect blend of all those technologies for their business that will really get the return. I really do believe it is not an either/or sucker's choice anymore. It is a matter of using open source where we can benefit, and using the tools and the solutions together with it to get to the outcome.
Okay, great. Final comment, I'll give you the closing statement here: what does it mean for the people out there? This whole San Francisco vibe, Spark Summit, what's the big story going on at Spark Summit besides the IBM announcement? What's orbiting around this gravity that you guys have created?
Well, having lived in San Francisco now for 20-odd years, showing my age here, I really feel it's a melting pot of artists and scientists. So the tech industry as well as all the creative people that live in San Francisco. So if anything, Spark, bringing together the data engineers and the data scientists, is going to bring together the art of storytelling with the science of data. And that's what it's called and that's why it's San Francisco.
Yeah, and I tell you, it's a really unique place and it's booming and this is where the action is. Obviously Berkeley's right there.
I think Stanford, not to leave Stanford in the shadow of Berkeley, even though the AMPLab's in Berkeley. Stanford's still important, don't you think?
Yeah, it's only a short drive away.
It's my town, Palo Alto. Harriet, thanks for coming back on theCUBE here, a special presentation with IBM's community event. We're at Galvanize, which is an incubator, a shared workspace, a lot of startups here, a lot of developers, downtown San Francisco. We are here live for the Spark Summit. We'll be covering it all day today and tomorrow. This is theCUBE. We'll be right back after this short break.