From New York, it's theCUBE. Covering Big Data New York City 2016. Brought to you by headline sponsors Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Jeff Frick.

Welcome back to New York City, everybody. This is theCUBE, the worldwide leader in live tech coverage. Jeff Frick and I are pleased to have Dinesh Nirmal on. He's the Vice President of Development for Next Generation Applications and for Analytics, Spark, and Hadoop at IBM. Dinesh, welcome to theCUBE.

Thank you.

Big week for you. You gave birth last night, congratulations. DataWorks, it's here. How do you feel?

Great, wonderful. I mean, it has been a project in progress, and like you said, we have given birth, and the reception has been tremendous, wonderful. We already have about 10,000 users up and running on DataWorks, and the feedback has been, like I said, nothing short of tremendous.

So take us back to when the project started. When did it start, how long did it take, and we'll get into it.

So to go back, we started the whole Data Science Experience project early this year. There are different personas if you look at it: there's the data scientist persona, there's the data engineer persona, there's the CDO, the chief data officer, and then obviously there's the business analyst, or the citizen analyst. The plan was, how do we make sure that when you take every single persona out there, we build a platform that solves the challenges each persona faces? That's how the Data Science Experience was born. And today, as a data scientist, you can bring in the data, you can ingest the data; once the data is cleansed or transformed by the data engineer, the data scientist can build the model, deploy the model, and score in real time; and then as a citizen analyst or a business analyst, you can visualize the data using Watson Analytics.
So you have a complete solution in that platform. It has been about eight months of work in progress, and like you said, yesterday we gave birth and the feedback was awesome.

Relatively short timeframe. So what was the problem you were trying to solve? Was it to package all this complexity and make it simple? Do people complain frequently about the challenges of big data analytics, the difficulty in actually getting a project up and running and getting value out of it?

So, three things. The first thing is collaboration: how do we make sure that all these personas can collaborate with each other? The second piece, like you mentioned, is simplicity: how do we make this so simple that someone who doesn't know anything about data science can also come use it? The third is convergence. If you look at IBM, we had a lot of platforms, we had a lot of products. How do we converge them into one and have Spark as the execution engine behind it?

Now if I take an example: machine learning. If you take just that piece, we have been doing predictive analytics for a very long time; we have SPSS in our portfolio, with the SPSS algorithms. And if you look at prescriptive, we have ILOG in our portfolio, which does the prescriptive side. But now we have taken it a step further whereby, as a data scientist, you can not only use the SPSS algorithms that we provide, but you can use R, you can use Python, and you can use Spark ML. So you have a variety of choices as a data scientist to use the different algorithms and the different execution engines that we provide.

Just before this meeting, I was at lunch with a customer, a retail customer. Their challenge is that the data is coming in real time. So how do we make sure there's a deployment option, whether it's scheduled like a batch, or it's real time, or it's streaming?
So when you look at our Data Science Experience platform, as a user, as a data scientist, you get all three choices. You can deploy it real time, you can deploy it batch, or you can deploy it on streams. Those are the kinds of experiences and functions that will differentiate us in the marketplace, and customers really like it; they love it.

I was curious, with the different personas and trying to build the experience across the personas, any surprises on what some of those personas are actually doing, relative to where you thought the lines of tasks and activities were?

Right, that's a good point. We say data scientist, but in a lot of shops what we see is that a data scientist wears multiple hats. So for example, the data scientist could also be the data engineer, who brings in the data, shapes the data, transforms the data, cleanses the data, and builds the model. So although we say different personas, we have seen in some shops it's the same persona doing multiple tasks. While not surprising, that has been the case; we are seeing more and more that the same person, or the same persona, is doing multiple jobs. Now if you look at the CDO persona, the chief data officer, that one seems to be unique, because once you build that data lake or the data hub, you want somebody to set the policies and rules and all those things. So that seems to be a persona that's unique.

So you're like a chef, an application developer, right? And you have all these ingredients, many of which were part of the existing IBM portfolio, some of which no doubt you had to develop organically. Take us through the stack, take us through the components of DataWorks.

Right. So as you probably know, IBM acquired The Weather Company assets this year, and that gave us an ingest mechanism for IoT kind of data that comes in really fast, lightning fast.
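As an aside for readers, the three deployment choices described above — real time, batch, and streaming — can be sketched with a toy scorer in plain Python. This is purely an illustration under invented names (the model and functions are made up here), not IBM's actual API; the point is that the same model can serve all three modes.

```python
from typing import Iterable, Iterator


def score(record: dict) -> float:
    """Real-time deployment: score a single record as it arrives."""
    # Invented linear model, for illustration only.
    return 0.5 * record.get("x", 0.0) + 0.1


def score_batch(records: list) -> list:
    """Batch deployment: score a whole dataset in one scheduled run."""
    return [score(r) for r in records]


def score_stream(records: Iterable) -> Iterator:
    """Streaming deployment: score records lazily from a live feed."""
    for r in records:
        yield score(r)


# The same model, deployed three ways:
print(score({"x": 2.0}))                       # real time -> 1.1
print(score_batch([{"x": 0.0}, {"x": 2.0}]))   # batch -> [0.1, 1.1]
print(list(score_stream(iter([{"x": 4.0}]))))  # streaming -> [2.1]
```

In a real system the three modes differ mainly in how records reach the scorer (request/response, scheduled job, or message stream), not in the model itself.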
So for the ingest, we have that; for streaming, we have Spark Streaming, and we also have InfoSphere Streams. We can bring the data in either as a batch or streaming in real time. So when you say the components: how we ingest the data is the first component, how we make sure we can bring the data in. The second piece is that once you bring in the data, a lot of the time the data is dirty, so you need to cleanse it or shape it. That's the next piece: we have Forge and Keystone, which are our two shapers, and the canvas piece that's there. Once you have that data, your next job as a data scientist is obviously to build a model based on that data, to build a pipeline. For that, you can use our canvas to build a pipeline or a model. And then once you build a model, obviously you want to evaluate it, you want to train it, and you want to run it in your test system and make sure it's scoring at the level you want. That's where our MLaaS comes in.

You have the model built, and now you want to deploy the model, and at that point we give you choices. Like I said, you can deploy it real time, batch, or streaming. And then we also have a mechanism for the following: today, once you build the model, the data keeps coming in and the model degrades over time, because the data is also changing. So you take the model offline and retrain it, and that can be a cumbersome process, because you're taking the model offline, then you have to go test it again, retrain it, put it back. What we are doing is a feedback loop whereby you can train the model in real time, so the model pretty much never needs to come offline, which is a huge, huge benefit for customers. So the model is getting trained in real time, and then once the model is running, you can monitor it. So today, how do you monitor to see how well your models are doing?
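The real-time feedback loop described above — updating a model in place instead of taking it offline to retrain — can be illustrated with a tiny online-learning toy. This is an assumed sketch (a running-mean predictor, invented for this note), not the actual mechanism in IBM's service.

```python
class OnlineMeanModel:
    """Toy online model: predicts the running mean of observed targets.

    Each new labeled observation updates the model in place, so the
    model never has to come offline for a full retrain.
    """

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def predict(self) -> float:
        return self.mean

    def feedback(self, y: float) -> None:
        # Incremental mean update: mean <- mean + (y - mean) / n
        self.n += 1
        self.mean += (y - self.mean) / self.n


model = OnlineMeanModel()
for y in [10.0, 12.0, 14.0]:  # labels arriving in real time
    model.feedback(y)
print(model.predict())         # -> 12.0
```

The same shape generalizes to real online learners (for example, stochastic gradient updates): each feedback call nudges the parameters, so serving and training can run concurrently.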
If you have thousands of models, which version of each model is doing well? You want some model management, so that's also built into our MLaaS; monitoring is built in, the real-time feedback loop is built in. And then if you go one more level: let's say I am a statistician or a data scientist who really doesn't know much about algorithms and all that. We have something built into our MLaaS called CADS, the Cognitive Assistant for Data Scientists. What it does is, you throw a data set at it and it will pick the best algorithms for you, which makes your job a lot easier: you don't have to know about random forests or any of those things.

So it picks the algorithm for you.

Right, and it will give you the score that it sees. So let's say as a real estate agent, I have five years' worth of data from a particular zip code, but I don't know how to build a model; I don't know which algorithms to pick. All I have to do is bring the data in, throw it at IBM MLaaS, and hit one-click deploy, and under the covers we use CADS to pick the right algorithms. We will provide the score and say, here it is, you have the model built. So that's building the model and deploying the model.

And the last piece in this whole stack is visualization. How do we visualize it, how do we do the reporting? That's where the Watson Analytics piece comes in. Now as a citizen analyst or a business analyst, you want to visualize the data. You just click visualize, and we will push it to Watson Analytics. And the good thing about Watson Analytics is that it's NLP-based. So you could just say, for example, "tell me more about energy consumption," and it will tell you: okay, square footage of the house, how tall it is, how many pluggable devices are there.
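Circling back to the CADS idea described above — "throw a data set at it and it picks the best algorithm" — the core of automatic algorithm selection can be sketched as a simple model-selection loop: fit several candidates and keep the one with the best score. This is a pure-toy illustration with two invented candidate models; the real CADS is far more sophisticated, and a real system would score on held-out validation data rather than the training data.

```python
def fit_mean(xs, ys):
    """Candidate 1: constant predictor (mean of y)."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Candidate 2: least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    denom = sum((x - mx) ** 2 for x in xs) or 1.0
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / denom
    b = my - a * mx
    return lambda x: a * x + b


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def auto_select(xs, ys):
    """Fit every candidate and return the name and model with best score."""
    candidates = {"mean": fit_mean, "linear": fit_linear}
    models = {name: fit(xs, ys) for name, fit in candidates.items()}
    best = min(models, key=lambda name: mse(models[name], xs, ys))
    return best, models[best]


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]  # perfectly linear data
name, model = auto_select(xs, ys)
print(name)        # -> linear
print(model(5.0))  # -> 10.0
```

The user-facing value is exactly what the interview describes: the caller supplies only the data, and the selection loop hides the algorithm choice.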
So Watson Analytics will give you the parameters that you want to use, and you could use that to come back and enhance your model. So end to end, we give a really good story that today nobody else has.

Yeah, we saw the demo last night. The use case was: I'm going to go on a camping trip, I'm going to go to Acadia, this is my family, this is when I'm going to go, et cetera. And it showed me, very fast, these are the packages, the tents. It made me an offer, 10% off this little gas cooker or whatever it was. Super fast. I mean, if that wasn't just a canned demo, that was real. That was lightning. I've never seen anything like that. As consumers we've seen many examples where you just see the spinning logo.

No, it was real time. We had prepared for it and we had done enough testing, but yeah, it was real time.

And that's the kind of speed and performance you're getting, right? It was awesome. I'm sorry, we're out of time, but that was great. Thank you for taking us through those five major components and the machine learning piece; there's a lot there. It sounds like you had to not only stitch those together, but also develop a friendly user interface, interaction, and experience.

Right. So like I said, the three things: how do we make it simple, the simplicity piece is one; how do we make sure our platform is available for collaboration between all these personas, which is another one; and the last piece is convergence, how do we make sure this all converges together into one, so it's very clear to the end user what they need to use.

Exciting times for IBM. I tweeted out last night, IBM really has to focus on simplicity and shift its model from just being services-led to one that's really software-led, and that's exactly what you guys are doing here. So congratulations on getting the product out. We'll be watching; really excited for you guys.
Well, thank you so much. Appreciate the time.

You're welcome. All right, keep it right there, everybody. We'll be back with our next guest right after this short break.