It's theCUBE. Here is your host, Jeff Frick.

Hi, Jeff Frick here with theCUBE. We are on the ground in downtown San Francisco at the Spark Technology Center at 425 Market. I'm really excited to be joined in this next segment by our guest, Rob Thomas, who's VP of Product Development at IBM Analytics. Welcome, Rob.

Thanks, Jeff, great to be here.

Thanks for joining us. Absolutely, so the last time we saw you personally, I think, was at Big Data SV in San Jose around Big Data Week and Strata. So a lot's been going on there, and then we had Hadoop Summit, and then we had Spark Summit, and we had the IBM Apache Spark event at Galvanize. So you guys have been busy.

We have been, it's been a crazy time. Yeah, I think we sat down right before Hadoop Summit. We talked a lot about how Hadoop is changing the landscape in enterprises, and a lot has happened since then. We had some big announcements during the week of Spark Summit, and I thought today we could talk about some of the updates, what we've done since then, but we're really moving faster.

Yeah, absolutely. Well, George Gilbert, our analyst from Wikibon, says this is the biggest change in software in 50 years, really moving from systems of record to systems of engagement and really moving closer to real time. And as much as we've seen with Hadoop, it seemed to really get ignited with all the activity around Spark.

Yeah, absolutely. I mean, we view Spark as the analytics operating system. So we've had all these things that we've wanted to do with data forever, but we were either limited by access to different sources or we were limited by the speed of how you could access it or how you could turn it into something actionable. Spark is the answer to that. That's why we call it the analytics OS, and we think that message is starting to resonate as clients think about how they can use Spark in their environments.

So why don't you give us a couple updates?
What's happened in the last couple of months since all the announcements?

Yeah, first, let me tell you a story. I want to tell you about a guy named Fred Reiss, and he's one of the first employees here at the Spark Technology Center. Fred was a long time at Berkeley, did his PhD at Berkeley, and as he came into IBM and became part of our Spark Technology Center, he started working on the backend of Spark, how we optimize that around what we're doing with SystemML, which is the technology that we're contributing. And when you sit down with Fred, it's amazing the passion that he has: Spark is not about solving just a few problems for large enterprises. This is about how we change how anybody can leverage data at any scale. And it's amazing to see guys like Fred here at the Spark Technology Center, and he's representative of the kind of people that we're bringing in, who see this not as an opportunity just to go support big enterprises, but as how we bring Spark into the mainstream, really to become the backbone of data and how we're using it.

Let's dig down into the developments in a little bit, which is why we wanted to come up here and follow up from Spark Summit. We're here in downtown San Francisco, right on Market Street. Obviously IBM's a big company, been around for a long time, you've got facilities all over the place. Why the Technology Center here?

You have to be in San Francisco. That was actually a fairly easy decision because we wanted to be at the epicenter of where Spark is taking off in the market, plus you've got obviously a number of great clients and innovative companies in the area. So it's a great place to bring in a team of people to work. We've got IBM Research in Almaden, we've got our lab in Silicon Valley as well, so you've got this great corpus of people who are kind of converging on what we think is a defining technology for the next 10 or 20 years. So that's why we're here, and that's paid off.
We've got, as you saw downstairs, 20 students that are here with us this summer from Lagos in Africa who are working on Spark to solve traffic problems in Lagos. So we've got all these innovative projects going on with students, with PhDs, with people that we're hiring, and it's a unique time for us here in IBM.

Yeah, we couldn't film downstairs because the room was so packed. There's stuff all over the place, all the development boards and Post-it notes, and there's a lot going on. And then there's a workshop going on too. When we checked in downstairs, a lot of people were coming in for a workshop, so you're really making the most of this location.

We are. So we do all the engineering here, and we also do design here. If you remember, as part of our Spark announcement, this is about where data meets design as well, which is why you saw a lot of the graphics and that stuff. We also do a lot of briefings here, and we're starting to host community events here as well. We really want this to be a place where anybody can come that's interested in Spark. If you want to work here, that's great, but if you just want to come work with us in the community, this center is all about open source for us. Our only focus here is on open source and what we're doing in the community, so we're welcoming anybody that wants to participate in that way.

So that's a great segue into open source. The open source movement continues to grow, and we're actually going to be at LinuxCon in a couple of weeks. Linux was probably the first kind of enterprise-adopted open source project, but now we see it obviously with Hadoop, you see it with Spark. There are so many open source projects out there. Talk a little bit about IBM's desire to work with open source, how you guys have adopted open source not only for building your own things but really in supporting customers' engagement with open source-based products.
Look, we have got to be a key part of what's happening in open source, which is why we're doing what we're doing here. It's why we did what we did with Linux a number of years back and really kind of led the charge there, because at the time it was about how we can build a core technology and knowledge around Linux and then use that to help our clients solve problems. We think we're at a similar juncture, but now on the data side, and that's the interest in the community. So one of the big announcements that we made back in June was around contributing our SystemML technology to open source, and we've been working closely with our partner Databricks on that, and they're helping us through that. I can announce today that we're actually going to put that technology up on GitHub, and it'll be by the end of August. So anybody in the community can go out there and play with it. We've been taking a couple of months just to get it right for the community, but we're going to put it up on GitHub at the end of August. Anybody can go out there and use it, and then we'll continue to work in the community with companies and partners like Databricks to continue to move that forward. I think ultimately our vision is that it starts to merge with MLlib, and what you have then is a tremendous programming model with MLlib combined with the optimizer and engine of SystemML. That is going to change the face of machine learning in enterprises, and that's where we want to get to.

And then that begs the question. We've got all these cool posters behind us of different solution sets based on industry, and you guys again have been around for a long time. So obviously you're probably building industry-based solutions on top of these open source tools that leverage your knowledge as well as experience in these particular industries to come up with solutions as opposed to just software.

Yeah, ultimately, for the clients we serve, it is about solutions and business impact.
I'll give you one example. We've done work for the last few years with NICE Systems, which we've talked about before. They are a software company focused on call centers, so how they make call centers and customer interaction more effective. The first project we did with them was actually moving them off of a traditional database onto a Hadoop platform, the IBM Open Platform with BigInsights. And you look at what they're doing in terms of how they automate and make call centers more intelligent. And then you bring in something like Spark. It totally changes the game for customer service interaction. Suddenly they can access all the data and get real-time insights. So you're talking about being on the phone with somebody and making a real-time decision in the moment in terms of what you should do next with that client. That is how Spark really changes the game, because before they had a great solution. Now they've taken that solution and they're delivering huge business impact. So when you ask about solutions, that's the kind of thing that comes to mind for me for how Spark drives a different level of solution in the enterprise.

Are there any kind of cutting-edge use cases that you see out here from clients? Because part of the problem of being in Silicon Valley is we're so into it that we forget that there are other parts of the world, and sometimes we're a little bit too far ahead, or maybe we're not as far ahead as we think. But what are some of the use cases that you see customers putting in place, beyond the one you just mentioned, where they see the benefit? Where they say, this is something we really want to get behind because we know there's huge business benefit.

So the first one is certainly what I mentioned around entity, knowing your customer, the 360-degree view. I'd say that's probably the major one. The second one is around fraud and how you look at whether you know who your business partners are, who your customers are.
It's this whole idea where security meets trust, and we see a lot of those. We're doing some work with Ernst & Young that we've talked about publicly, where we're helping them through their FIDS practice, their financial investigations practice, look at broad data sets and identify patterns of fraud or suspicious behavior. And it requires, I'd say, the flexibility of a Hadoop-type system. But again, you bring Spark, you bring machine learning to an environment like that, and suddenly you're able to automate what used to be very manual. And so it starts to change the outcome around fraud, as an example.

What's interesting, we were talking a little bit before we came on air, and you said that there are still some customers out there that just don't get it. They don't see the benefit of streaming data. They're just kind of locked into their old paradigm. And I hate to overuse Uber, but it's such a powerful example: somebody digitally transformed the taxi industry, which none of us thought was possible before it happened. How do practitioners start to think in a new way, that there is value, there are new applications, there are new ways to approach things that this enables that they just couldn't do before?

You know, it's been a big surprise to us. Streaming has been slower to take off than we may have assumed. Early adopters were things like telcos, who said call detail record analytics has to be done with streaming, or oil and gas companies doing exploration. Those were logical use cases. But when you start to get to a retailer, a bank, an insurance company, it's funny. I mean, honestly, their first reaction is that streaming doesn't apply to them. And so what it takes is a lot of coaching and working with them to say, well, you get these insights today. What if you continuously got these insights? What if you weren't relying on somebody to go bring a different data set?
So to us, it's really a consultative process to show them that there are people doing this in other industries, and that a way they can lead their industry is to be the first to figure that out for their industry. And people are starting to get there, but there's still a long way to go.

And the other piece is leveraging your proprietary data that you have that no one else has, but then contextualizing it with new third-party data and structured data. So are you seeing the uptake, and the customers figuring it out? Maybe that's where they get started: why don't we integrate public weather data, or Twitter data, or some other type of data, to start to see, let me get out of my silo with what I've been doing forever in batch on my own proprietary transactional stuff.

Yeah, what's interesting is the IT industry, and it's funny how these things evolve, has always viewed structured data here, unstructured data there. That is not the future. And we've started envisioning this idea of a next-generation platform that accesses unstructured and structured data together at the same time, using Spark as a basis. And it really kind of breaks down, I'd say, the traditional silos for how data has been managed in an organization. And I think as companies start to see that and apply those use cases, they'll realize that it's a lot more powerful than some of those traditional approaches, because unstructured data right now, I mean still, my estimate is 90% is unused in an organization from an analytics perspective. But when you start to integrate that into their normal business processes, you can change that.

Yeah, well, the other interesting thing David Floyer from Wikibon talks about a lot is almost like a nuclear half-life in terms of the value of data. When it's live, when it's real, you've got X value, and it falls off precipitously over time.
And again, if they're not even grabbing it in real time and they're just getting it in batch, they're missing out on potentially tremendous value in that data, because they never even use it during that timeframe.

You're right. And I'd say the one challenge in organizations that kind of underpins that point is a skills challenge. Think about how much technology has changed in the last five years. If the skills haven't changed that fast in an organization, then the organization struggles to adopt these types of use cases or to come up with them. So the other part of our commitment back in June was we said we're going to educate one million data scientists. And our view is that by doing our part in education, we can help organizations be more aggressive about how they think about use cases and then apply them. We did a hackathon and education session through Big Data University in Brazil last week. Tremendous turnout. I mean, companies and their employees are dying for this education. The key thing is how we get it into their hands, and our Spark Fundamentals course on Big Data University has taken off. And it's because people are dying for this. And I think the use cases, back to your question a moment ago, will be more obvious as the skills get up to date with what's happening in the market and in the open source community.

And you're working both sides of the equation, because you're not only working the data science side, you're also trying to get the line-of-business people much more engaged and active in using this data in running their businesses, and kind of get it out of the data science side a little bit.

Absolutely. And I think one thing that the line of business has always been gated by is what they believe is possible based on what IT has told them, which is sometimes somewhat limited. And so we are encouraging the line of business to be more aggressive. Think about a streaming use case and how that would apply to you.
Think about if unstructured and structured data were all together in one place and you could seamlessly access that data. And if you do that with a line-of-business person who understands that data is important and that there are data assets that can be tapped into, you can start to have that impact. But yeah, we're focused on data scientists, line of business, data engineers, application developers. They all play an important role in delivering these use cases, and we've got to be a part of each of those.

So we're running out of time, and I appreciate you taking some time out of your day. I know you're traveling all over the place. It's been a busy six months since we first were at the IBM System z event, which was fun. Mainframe also has a play here. What are we going to be talking about a year from now?

Machine learning is still in its infancy, and it's one reason why we're so interested in the technology that we contributed. As you and I have talked about before, machine learning is the only competitive moat left in business. Economies of scale are no longer there in many cases. Distribution is no longer an advantage. It's about machine learning. So we do partnerships with people like The Weather Company and Twitter. Access to those data sets, with machine learning applied on top, can change the face of a business. And we do that with our own data that we collect in IBM. And you mentioned the mainframe event. There are some interesting things with Spark on the mainframe happening right now, and I think you'll start to see technologies like the mainframe that are mission critical evolve into the Spark world and deliver a different type of insight.

Rob, thanks for taking a few minutes of your day. Appreciate it.

Yeah, Jeff, great to see you.

Absolutely. So I'm with Rob Thomas, I'm Jeff Frick, you're watching theCUBE, and we are at the IBM Spark Technology Center. Got it. Thanks for watching.