Live from Boston, Massachusetts. It's theCUBE, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. Welcome back to theCUBE, everybody. We're here in Boston. theCUBE is the worldwide leader in live tech coverage. And this is Spark Summit, hashtag Spark Summit. And Robbie Strickland is here. He's the vice president of engines and pipelines, I love that title, for the Watson Data Platform at IBM Analytics, formerly with The Weather Company, which was acquired by IBM. Welcome to theCUBE. Good to see you. It's always my standing tongue-in-cheek line as the industry's changing: Dell buys EMC, IBM buys The Weather Company. Wow, that sort of says it all, right? But it was kind of this really interesting blockbuster acquisition. Great for the folks at The Weather Company. Great for IBM. So give us the update. Where are we at today? So it's been an interesting first year. You know, actually we just hit our first anniversary of the acquisition and a lot has changed. You know, part of my role, my new role at IBM having come from The Weather Company, is a byproduct of the two companies bringing our best, excuse me, our best analytics work and kind of pulling those together. I don't know if we have some water, but that would be great. So, excuse me. That's all right, let me chat for a bit. Thanks. Feel free to clear your throat. So we were at IBM, the conference at the time was called IBM Insight. It was the day before the acquisition was announced and we had David Kenny on. David Kenny was the CEO of The Weather Company. And I remember, we were talking, I was like, wow, you've got such an interesting business model, and off camera I was like, what do you want to do with this company? You guys like prime, you going public? You going to sell this thing? I know you have an M&A background. And he goes, oh yeah, we're having fun. Next day was the announcement that I read about The Weather Company.
I saw him and I'm like, ha ha. And now he's the leader of the Watson Group. That's right. The Weather Company joined the Watson Group. And the cloud and analytics groups have come together in recognition that analytics and cloud are peanut butter and jelly. That's absolutely right. And David's running that organization, right? So. That is absolutely right. So it's been an exciting year. It's been an interesting year, a lot of challenges, but I think where we are now with the Watson Data Platform is a real recognition that this is the use case where we want to try to make data and analytics and machine learning and operationalizing all of those. You know, that's not easy for people and we need to make that easy. And our experience doing that at The Weather Company and all the challenges we ran into have informed the organization, have informed the roadmap and the technologies that we're using to kind of move forward on that path. And the Watson Data Platform was announced in, I believe, October. You guys had a big announcement in New York City and you took many sort of components that were viewed as individual discrete functions and brought them together in a single data pipeline. Is that right? That's right. So maybe describe that a little bit for our audience. So the vision is, one of the things that's missing in the market today is the ability to easily grab data from some source, whether it's a database or a Kafka stream or some sort of streaming data feed, which is actually something that's often overlooked. Usually you have platforms that are oriented around streaming data feeds or oriented around data at rest, batch data. One of the things we really wanted to do was sort of combine those two together because we think that's really important. So to be able to easily acquire data at scale, bring it into a platform, orchestrate complex workflows around that.
So with the objective of course of data enrichment, ultimately what you want to be able to do is take those raw signals, whatever they are, and turn that into some sort of enriched data for your organization. And so for example, we may take signals in from a mobile app, things like beacons, usage beacons on a mobile app, and turn that into a recommendation engine so we can feed real-time content decisions back into a mobile platform. Well, that's really hard right now. It requires lots of custom development. It requires you to essentially stitch together your pipeline end to end. It might involve a machine learning pipeline that runs a training pipeline. It might involve, it's all batch oriented. So you land your data somewhere, you run this machine learning pipeline maybe in Spark or Hadoop or whatever you've got. And then the results of that get fed back into some data store that gets merged with your online application. And then you need to have a RESTful API or something for your application to consume that and make decisions. So our objective was to take all of the manual work of standing up those individual pieces and build a platform where that is just, that's what it's designed to do. It's designed to orchestrate those multiple combinations of real-time and batch flows. And then with a click of a button and a few configuration options, stand up a RESTful service on top of whatever the results are, either at an interim stage or at the end of the line. And you guys gave an example. You actually showed a demo at the announcement. I think it was a retail example and you showed a lot of what would traditionally be batch processing, and in real time a recommendation came up and completed the purchase, and the inference was this is an out-of-the-box software solution. And that's really what you're saying you've developed.
It's not, I mean, a lot of people would say, oh, it's IBM that cobbled together a bunch of their old products, stuck it together, put an abstraction layer on and wrapped a bunch of services around it. I mean, I'm hearing- That's exactly right. It's just WebSphere, just WebSphere repackaged. No, it's not that. So one of the things that we're trying to do is, if you look at our cloud strategy, I mean, this is really part and parcel. I mean, the nexus of our cloud strategy is the Watson Data Platform. And what we could have done is we could have said, let's build a fantastic cloud and compete with Amazon or Google or Microsoft. But what we realized is that there is a certain niche there of people who want to take individual services and compose them together and build an application, mostly on top of just raw VMs with some additional, let's stitch together something with Lambda or stitch together something with SQS or whatever it may be. Our objective was to sort of elevate that a bit, not try to compete on that level, and say, how do we bring enterprise-grade capabilities to that space? Enterprise-grade data management capabilities, end-to-end application development, machine learning as a first-class citizen in a cohesive experience, so that the collaboration is key. We want to be able to collaborate with business users, data scientists, data engineers, developers, API developers, the consumers of the end results of that, whether they be mobile developers or whatever. One of the things that is sort of key, I think, to the vision is that these roles that we've traditionally looked at, if you look at the way the tool sets are built, they're very targeted to specific roles. The data engineer has a tool. The data scientist has a tool. And what's been the difficult part is the boundaries between those have been very firm and the collaboration has been difficult.
And so we draw the personas as a Venn diagram because it's very difficult, especially if you look at a smaller company and even sometimes larger companies, the data engineer is the data scientist. The developer who builds the mobile application is the data scientist. You just, and in some larger organizations, you have very large teams of data scientists that have these artificial barriers between the data scientist and the data engineer. So how do we solve both cases? And I think the answer was, for us, a platform that allows for seamless collaboration where there are not these clean lines between the personas, that the tool sets easily move from one to the other. And if you're one of those hybrid people that works across lines, that the tool feels like it's one tool to you. But if you're two different teams working together, that you can easily hand off. And so that was one of the key objectives we're trying to achieve. Definitely an innovative component of the announcement, for sure. Go ahead, George. That's right. So help us sort of bracket how mature this end-to-end tool suite is in terms of how much of the pipeline it addresses, from the data origin all the way to a trained model and deploying that model, sort of what's there now, what's left to do. So there are a few things we've brought to market. Probably the most significant is the Data Science Experience. The Data Science Experience is oriented around data science and has, as its sort of central interface, Jupyter notebooks, as well as, we brought in RStudio and those sorts of things. The idea there being that we'll start with the collaboration around data scientists. So data scientists can use their language of choice, collaborate around data sets, save out the results of their work and have it consumed either publicly by some other group of data scientists. But the collaboration among data scientists, that was sort of step one.
There's a lot of work going on that's sort of ongoing, not ready to bring to market, around how do we simplify machine learning pipelines specifically? How do we bring governance and lineage and catalog services and those sorts of things? And then the ingest, one of the things we're working on that we have brought to market is our product called Lift, which connects as well. And that's bringing large amounts of data easily into the platform. We have, there are a few components that have sort of been brought to market. dashDB, of course, is a key source of data, Cloudant. So one of the things that we're working on is some of these existing technologies that actually really play well into the ecosystem, trying to tie them well together, and then add the additional glue pieces. And some of your information management and governance components as well. Now maybe that is a little bit more legacy, but they're proven. And I don't know if the exits and entries into those systems are as open, I don't know, but there's some capabilities there. Speaking of openness, that's actually a great point. So if you look at the IIG suite, it's a great on-premise suite. And one of the challenges that we've had with sort of past IBM Cloud offerings is a lot of what has been the MO in the past is take a great on-prem solution and just try to stand it up as a service in the cloud. Which in some cases has been successful, in other cases less so. One of the things we're trying to look at with this platform is, how do we leverage open source so that whatever you may already be running, open source on-prem or in some other provider, it's very easy to move your workloads. So we want to be able to say, if you've got 10,000 lines of fraud detection code in Hadoop MapReduce, you don't need to rewrite that in anything. You can just move it. And the other thing is where our existing legacy tech doesn't necessarily translate well to the cloud.
Our first strategy is to see if there's any traction around an existing open source project that satisfies that need and try to see if we can build on that. Where there's not, we go cloud first and we build something that's tailor-made for the cloud. So who's the first one or two customers for this platform? Is it like IBM Global Business Services where they're building the semi-custom industry apps? Or is it the very, very big and sophisticated like banks and telcos who are doing the same? Or have you gotten to the point where you can push it out to a much wider audience? So that's a great question. And it's actually one that is a source of lots of conversation internally for us. If you look at where the Data Science Experience is right now, it's a lot of individual data scientists, small companies, those sorts of things coming together. And a lot of that is because some of the sophistication that we expect for enterprise customers is not quite there yet. So we wouldn't expect enterprise customers to necessarily be onboarded as quickly at the moment. But if we look at sort of the, so I guess there's maybe a medium-term answer and a long-term answer. I think the long-term answer is definitely the enterprise customers, leveraging IBM's huge entry point into all of those customers today. There's definitely a play to be made there. And one of the things that we're differentiating on, we think, over an AWS or Google is that we're trying to answer that use case in a way that they really aren't even trying to answer it right now. And so that's one thing. The other is going beta with a launch customer that's a healthcare provider or a bank where they have all sorts of regulatory requirements, that's more complicated.
And so we are looking at, in some cases, we're looking at those banks or healthcare providers and trying to carve off a small niche use case that doesn't actually fall into the category of all of those regulatory requirements so that we can get our feet wet, get some, you know, the tires kicked, those sorts of things. And in some cases, we're looking for sort of less traditional enterprise customers to try to launch with. So that's an active area of discussion. And one of the other key ones is The Weather Company, trying to take The Weather Company workloads and move The Weather Company workloads. I wanted to come back to The Weather Company. When you did that deal, I was talking to one of your executives and he said, why do you think we did the deal? I said, we got 1,500 data scientists, you get all this data, it's the future. He goes, yeah. And it's also going to be a platform for IoT, for IBM. And I was like, hmm, I get the IoT piece. How does it become a platform for IBM's IoT strategy? Can you, is that really the case? Is that transpiring and how so? So it's interesting because that was definitely one of the key tenets behind the acquisition. And what we've been working on so hard over the last year, as I'm sure you know, sometimes boxes and arrows on an architecture diagram and reality are more challenging. Don't do that. And so what we've had to do is reconcile a lot of what we built at The Weather Company, existing IBM tech and the new things that were in flight, and try to figure out how can we fit all those pieces together. And so it's been complicated but also good because there are, in some cases, it's just people and expertise, and bringing those people and expertise and leaving some of the software behind. In other cases, it's actually bringing software. So the story is obviously, where the rubber meets the road, more complicated than what it sounds like in the press release.
But the reality is we've combined those teams and they are all moving in the same direction together with various bits and pieces from the different teams. Okay, so this vision and then a roadmap to execute on that, and it's going to unfold over several years. That's right. Okay, good. Stuff at the event here. I mean, what are you seeing? What's hot? What's going on with Spark? I think one of the interesting things about what's going on with Spark right now is a lot of the optimizations, especially things around GPUs and that. And we're pretty excited about that, being a hardware manufacturer. That's something that is interesting to us. We run our own cloud. We can, where some people may not be able to immediately leverage those capabilities, we're pretty excited about that. And also we're looking at some of the, taking Spark and running it on Power and those sorts of things to try to leverage the hardware improvements. So that's one of the things we're doing. All right, we have to leave it there, Robbie. Thanks very much for coming on theCUBE. Really appreciate it. Thank you. You're welcome. All right, keep it right there, everybody. We'll be back with our next guest. This is theCUBE, we're live from Spark Summit East. Hashtag Spark Summit, right back.