From New York City, it's theCUBE. Covering IBM Data Science for All, brought to you by IBM.

Welcome back to New York here on theCUBE. Along with Dave Vellante, I'm John Walls. We're at Data Science for All, IBM's two-day event, and we'll be here all day long, wrapping up again with that panel discussion from four to five Eastern time. So be sure to stick around all day here on theCUBE. Joining us now is Vikram Murali, a program director at IBM. Vikram, thank you for joining us here on theCUBE. Good to see you.

Good to see you too, thanks for having me.

You bet. So among your primary responsibilities is the Data Science Experience. So first off, if you would, share with our viewers a little bit about that, the primary mission. And you've had two fairly significant announcements, updates if you will, here over the past month or so. So share some information about that too, if you would.

Sure. So my team builds the Data Science Experience, and our goal is to enable data scientists in their path to gain insights into data using data science techniques, machine learning, the latest and greatest, open source especially, and to be able to collaborate with fellow data scientists, data engineers, and business analysts. And it's all about freedom: giving data scientists the freedom to pick the tool of their choice and to program and code in the language of their choice. So that's the mission of Data Science Experience when we started this. The two releases that you mentioned that we had in the last 45 days, there was one in September and then one on October 30th. Both these releases are very significant in the machine learning space especially. We now support scikit-learn, XGBoost, and TensorFlow libraries in Data Science Experience. We have deep integration with the Hortonworks Data Platform, which came out of our partnership with Hortonworks, something that we announced back in the summer.
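The libraries named here (scikit-learn, XGBoost, TensorFlow) all expose a similar fit/predict convention, which is part of what makes them easy to swap inside a notebook platform like Data Science Experience. A minimal sketch of that convention, using a toy hand-rolled classifier rather than the real libraries so it stays self-contained (this is illustrative, not Data Science Experience code):

```python
# Toy estimator following the fit/predict convention shared by
# scikit-learn, XGBoost's sklearn wrapper, and similar libraries.
# Purely illustrative; not code from Data Science Experience.

class MajorityClassifier:
    """Predicts whichever label was most common in the training data."""

    def fit(self, X, y):
        counts = {}
        for label in y:
            counts[label] = counts.get(label, 0) + 1
        self.majority_ = max(counts, key=counts.get)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


model = MajorityClassifier()
model.fit([[0], [1], [2]], ["spam", "ham", "spam"])
print(model.predict([[3], [4]]))  # -> ['spam', 'spam']
```

Because the real libraries share this shape, a notebook can switch from one framework to another with minimal changes to the surrounding workflow.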
And this last release of Data Science Experience, two days back, can specifically do authentication with secure Knox on Hadoop. So now our Hadoop customers, our Hortonworks Data Platform customers, can leverage all the goodies that we have in Data Science Experience. It's more deeply integrated with our Hadoop-based environments.

A lot of people ask me, okay, when IBM announces a product like Data Science Experience, you know, IBM has a lot of products in its portfolio. Are they just sort of cobbling together older products and putting a skin on them, or are they developing it from scratch? How can you help us understand that?

That's a great question, and I hear that a lot from our customers as well. Data Science Experience started off with a design-first methodology, and what I mean by that is we are using IBM Design to lead the charge here, along with product and development, and we are actually talking to customers, to data scientists, to data engineers, to enterprises, and we are trying to find out what problems they have in data science today and how we can best address them. So it's not about taking older products and just re-skinning them. Data Science Experience, for example, started off as a brand new product, a completely clean slate with completely new code. Now, IBM has done data science and machine learning for a very long time. We have a lot of assets like SPSS Modeler and Stats and Decision Optimization, and we are reinvesting in those products, doing product refreshes in such a way, not to make the old fit with the new, but in a way where it fits into the realm of collaboration: how can data scientists leverage our existing products with open source, and how can we enable collaboration. So it's not just re-skinning, it's building from the ground up.
So this is really important, because they say architecturally it's built from the ground up, because given enough time and enough money and smart people, you can make anything work. So the reason why this is important is, you mentioned for instance TensorFlow. You know that down the road there's going to be some other tooling, some other open source project that's going to take hold of your customers, and they'll say, I want that. You've got to then integrate that, or you have to choose whether or not to; if it's a super heavy lift, you might not be able to do it, or do it in time to hit the market. If you've architected your system to be able to accommodate that, future-proof is the term everybody uses. So have you done that? How have you done that? I'm sure APIs are involved, but maybe you could add some color.

Sure. So Data Science Experience and Machine Learning, it is a microservices-based architecture. We are completely Dockerized, and we use Kubernetes under the covers for Docker container orchestration. All these are tools that are used in the valley across different companies and also in products across IBM as well. So some of these legacy products that you mentioned, we are actually using some of these newer methodologies to re-architect them, and we are Dockerizing them. The microservices architecture actually helps us address issues that we have today as well as stay open to newer methodologies and frameworks that may not exist today. So with the microservices architecture, for example, TensorFlow is something that you brought in: we can just spin up a Docker container just for TensorFlow and attach it to our existing Data Science Experience, and it just works. Same thing with other frameworks like XGBoost and Keras and scikit-learn. All these are frameworks and libraries that have come up in open source within the last one to three years.
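The pattern being described, attaching a new framework without re-architecting the platform, resembles a plugin registry: each framework sits behind a common interface, and adding a new one is a registration rather than a rewrite. A hypothetical pure-Python sketch of that idea (the names and behavior here are illustrative, not Data Science Experience internals):

```python
# Hypothetical plugin registry: each "backend" registers under a name,
# the way a microservices platform might attach a new framework container
# without touching the rest of the system. Names are illustrative only.

BACKENDS = {}

def register_backend(name):
    """Decorator that adds a training backend to the registry."""
    def wrapper(fn):
        BACKENDS[name] = fn
        return fn
    return wrapper

@register_backend("baseline")
def train_baseline(data):
    return f"baseline model trained on {len(data)} rows"

# Adding a new framework later is just another registration --
# no changes to the dispatch code below are required.
@register_backend("tensorflow")
def train_tensorflow(data):
    return f"tensorflow model trained on {len(data)} rows"

def train(backend, data):
    """Dispatch training to whichever backend was requested."""
    return BACKENDS[backend](data)

print(train("tensorflow", [[1], [2], [3]]))  # -> tensorflow model trained on 3 rows
```

In a containerized system, each registered backend would be its own Docker image orchestrated by Kubernetes, but the dispatch principle is the same.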
Previously, integrating them into our product would have been a nightmare. We would have had to re-architect our product every time something new came along, but now, with the microservices architecture, it is very easy for us to consume them.

We were just talking to Daniel Hernandez a little bit about the Hortonworks relationship at a high level. One of the things that, I mean, I've been following Hortonworks since day one, when Yahoo kind of spun them out, and I know those guys pretty well, and they always make a big deal out of, when they do partnerships, it's deep engineering integration. And so they're very proud of that. So I want to test that a little bit. Can you share with our audience the kind of integrations that you've done, what you brought to the table, what Hortonworks brought to the table?

Yes. So Data Science Experience today can work side by side with the Hortonworks Data Platform, HDP, and we could have actually made that work about two or three months back. But as part of our partnership that was announced back in June, we set up joint engineering teams. We have multiple touch points every day. We call it co-development; they have put resources in, we have put resources in. And today, especially with the release that came out on October 30th, Data Science Experience can authenticate using secure Knox, which I previously mentioned, and that was a direct example of our partnership with Hortonworks. So that is phase one. Phase two and phase three are going to be deeper integration. We are planning on making Data Science Experience an Ambari management pack. So for a Hortonworks customer, if you have HDP already installed, you don't have to install DSX separately; it's going to be a management pack, you just spin it up. And the third phase is going to be using YARN for resource management. YARN is very good at resource management, and for infrastructure as a service for data scientists, we can actually delegate that work to YARN.
So Hortonworks, they are putting resources into YARN, doubling down actually, and they are making changes to YARN where it will act as the resource manager not only for Hadoop and Spark workloads but also for data science workloads. So that is the level of deep engineering that we are engaged in with Hortonworks.

YARN stands for Yet Another Resource Negotiator, there you go, for...

Thank you. The trivia of the day.

Okay, but of course Hortonworks are big on committers, obviously a big committer to YARN; we probably wouldn't have YARN without Hortonworks. So you mentioned that's kind of what they're bringing to the table. And you guys primarily are focused on the integration as well as some other IBM IP?

That is true, as well as the Knox piece that I mentioned. We have multiple Knox committers on our side, and that helps us as well. Knox is part of the HDP package, and we need that knowledge on our side to work with Hortonworks developers, to make sure that we are contributing and making inroads into Data Science Experience; that way the integration becomes a lot easier. And from an IBM IP perspective, Data Science Experience already comes with a lot of packages and libraries that are open source, but IBM has worked on, IBM Research has worked on, a lot of these libraries. I'll give you a few examples: Brunel and PixieDust are something that our developers love. These are visualization libraries that were actually cooked up by IBM Research and then open sourced, and they come pre-packaged in Data Science Experience. So there is IBM IP involved, and there are a lot of machine learning algorithms that we put in there, so that comes right out of the package.

And you guys, the development teams, are really both in the valley, is that right? Or are you distributed around the world?

Yeah, so the data science development team is in North America, between the valley and Toronto.
The Hortonworks team is situated about eight miles from where we are in the valley. So there's a lot of synergy; we work very closely with them, and that's what we see in the product.

I mean, what impact does that have? A lot of companies today say, oh yeah, we're a virtual organization, we have people all over the world, East Coast, Europe, Brazil. How much of an impact is it to have people so physically proximate?

I think it has a major impact. I mean, IBM is a global organization, so we do have teams around the world, and with the advent of, you know, IP telephony and screen shares and so on, yes, that works. But it really helps being in the same time zone, especially working with a partner just eight or ten miles away. We have a lot of interaction with them, and that really helps.

Body language.

Yeah. You talked about problems, you talked about issues, you know, customers. What are they now? Before it was, you know, first off, I want to get more data. Now they've got more data. Is it figuring out what to do with it, finding it, having it available, having it accessible, making sense of it? I mean, what's the barrier right now?

The barrier, I think, for data scientists, the number one barrier continues to be data. There's a lot of data out there, a lot of data being generated, and the data is dirty, it's not clean. So the number one problem that data scientists have is: how do I get to clean data, and how do I access data? There are so many data repositories, data lakes, and data swamps out there. Data scientists don't want to be in the business of figuring out how to access data; they want instant access to data.

Well, if you would, when you say it's dirty...

Let me give you an example. So it's not structured data. So data scientists...

Unstructured versus structured?

Unstructured versus structured.
And if you look at all the social media feeds being generated, the amount of data being generated, it's all unstructured data. So we need to clean up that data, because the algorithms need structured data, or data in a particular format. And data scientists don't want to spend too much time cleaning up data and getting access to data, as I mentioned. And that's where Data Science Experience comes in. Out of the box we have so many connectors available; it's very easy for customers to bring in their own connectors as well, and you have instant access to data. And as part of our partnership with Hortonworks, you don't have to bring data into Data Science Experience. The data is becoming so big, you want to leave it where it is and instead push analytics down to where it lives. And you can do that: we can connect to remote Spark, we can push analytics down to remote Spark, all of that is possible today with Data Science Experience.

The second thing that I hear from data scientists is all the open source libraries; every day there's a new one. It's a boon and a bane as well. The open source community is very vibrant, and there are a lot of data science competitions, machine learning competitions, that are helping move this community forward, and that's a good thing. The bad thing is data scientists like to work in silos on their laptops. How do you, from an enterprise perspective, take that and move it, scale it, to an enterprise level? And that's where Data Science Experience comes in, because now we provide all the tools. The tools of your choice, open source or proprietary, you have them in here, and you can easily collaborate. You can do all the work that you need with open source packages and libraries, bring your own, and collaborate with other data scientists in the enterprise.

So you're talking about dirty, dirty data. I mean, with Hadoop and no schema on write, we kind of knew this problem was coming.
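To make the "dirty data" point concrete, here is a small, hypothetical pure-Python sketch of the kind of cleanup step being described: turning messy, semi-structured records into rows an algorithm can consume. The field names and validation rules are illustrative assumptions, not part of Data Science Experience:

```python
# Hypothetical cleanup step: normalize messy records into structured rows.
# Field names and rules are illustrative only.

raw_records = [
    "  alice , 34 , NY ",
    "bob,not-a-number,SF",   # bad age: should be dropped
    "carol,29,",             # missing city: should be dropped
    "dave,41,Austin",
]

def clean(records):
    """Strip whitespace, validate fields, and drop malformed rows."""
    rows = []
    for line in records:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # wrong number of fields
        name, age, city = parts
        if not (name and age.isdigit() and city):
            continue  # incomplete or non-numeric age
        rows.append({"name": name, "age": int(age), "city": city})
    return rows

print(clean(raw_records))
# -> [{'name': 'alice', 'age': 34, 'city': 'NY'}, {'name': 'dave', 'age': 41, 'city': 'Austin'}]
```

In practice this kind of wrangling is done with pandas or Spark against the data where it lives, but the shape of the work, parse, validate, discard, normalize, is the same.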
So technology sort of got us into this problem. Can technology help us get out of it? I mean, from an architectural standpoint, when you think about dirty data, can you architect things that help?

Yes. So if you look at the machine learning pipeline, the pipeline starts with ingesting data, then cleansing or cleaning that data, and then you go into creating a model, training it, picking a classifier, and so on. We have tools built into Data Science Experience, and we are working on tools coming down the roadmap, which will help data scientists do that themselves. They don't have to be really in-depth coders or developers to do it. Python is very powerful; you can do a lot of data wrangling in Python itself. So we are enabling data scientists to do that within the platform, within Data Science Experience.

If I look at the sort of demographics of the development teams, we're talking about Hortonworks and you guys collaborating, what are they like? People picture IBM, this 100-plus-year-old company; what's the persona of the developers on your team?

The persona, I would say, we have a very young, agile development team, and by that I mean we've had six releases this year in Data Science Experience just for the on-premises side of the product, and on the cloud side of the product it's continuous delivery. We have releases coming out faster than we can count, and it's not about re-architecting it every time, it's about adding features, giving our customers the features they're asking for and not making them wait three months, six months, a year. So our releases are becoming a lot more frequent, and customers are loving it, and that is in part because of the team. The team is able to evolve; we are very agile and we have an awesome team. That's all, you know, it's an amazing team.

But six releases in...
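The pipeline stages listed above (ingest, clean, train a model, pick a classifier) can be sketched end to end in plain Python. This is a toy illustration under simplified assumptions, using a hand-rolled nearest-centroid classifier in place of a real library:

```python
# Toy end-to-end pipeline: ingest -> clean -> train -> predict.
# A hand-rolled nearest-centroid classifier stands in for a real library.

raw = [
    ("1.0,1.2", "a"),
    ("0.9,1.1", "a"),
    ("bad,row", "a"),   # dirty record: dropped during cleaning
    ("5.0,5.2", "b"),
    ("5.1,4.9", "b"),
]

# Clean: keep only rows whose features parse as floats.
def parse(row):
    try:
        return [float(v) for v in row.split(",")]
    except ValueError:
        return None

data = [(parse(x), y) for x, y in raw]
data = [(x, y) for x, y in data if x is not None]

# Train: compute one centroid (per-feature mean) per label.
grouped = {}
for x, y in data:
    grouped.setdefault(y, []).append(x)
centroids = {
    y: [sum(col) / len(xs) for col in zip(*xs)]
    for y, xs in grouped.items()
}

# Predict: assign the label of the nearest centroid.
def predict(point):
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(point, c))
    return min(centroids, key=lambda y: dist(centroids[y]))

print(predict([5.0, 5.0]))  # -> b
print(predict([1.0, 1.0]))  # -> a
```

Each stage here corresponds to one step in the pipeline described in the interview; in a real deployment, ingestion would hit a connector, cleaning would use pandas or Spark, and training would call scikit-learn, XGBoost, or TensorFlow.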
Yes, we had a major release in April, and since then we've had about five revisions of that release where we add a lot more features, packages, libraries, functionality, and so on.

So you know what monster you're creating now, don't you? I mean, you know.

I know, we are setting expectations.

Because you still have two months left in 2017.

We do.

They're not mainframe release cycles.

They're not, they're not. And that's the advantage of the microservices architecture. When a customer upgrades, they don't have to bring the entire system down. You can target one particular part, one particular microservice; you componentize it and just upgrade that particular microservice. It's become very simple.

Sure, sure. Well, some of those microservices aren't so micro.

They're not, yeah. So it's a balance you have to keep: making sure that you componentize it in such a way that when you're doing an upgrade, it affects just one small piece and you don't have to take everything down. So yeah, I agree with you.

Well, it's been a busy year for you, to say the least, and I'm sure 2017 and 2018 are not going to slow down. So continued success. Wish you well with that. Vikram, thanks for being with us here on theCUBE.

Thank you. Thanks for having me.

You bet. Back with Data Science for All here in New York City, IBM, coming up here on theCUBE, right after this.