Live from New York, it's theCUBE. Covering Big Data New York City 2016. Brought to you by headline sponsors Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now here are your hosts, Dave Vellante and George Gilbert.

Welcome back to New York City, everybody. This is theCUBE, the worldwide leader in live tech coverage. We're here as part of Strata + Hadoop World. This is what we call Big Data NYC. This is our seventh year of covering this show in New York. We're seeing Hadoop evolve; it's not the shiny new toy anymore. Ritika Gunnar is here. She's the vice president of data and analytics at IBM. Good to see you again.

Good to see you too.

So yeah, big week for IBM. You've got a big announcement today, this evening. You've got a customer event. You must be excited.

It's absolutely exciting. We believe that organizations and clients are using data in so many new ways, and they need new ways to put data to work. And so tonight's launch is all about how organizations, how clients, how everyone in society puts data first and how they're building their businesses and the outcomes for their organizations.

So you're calling this DataWorks, is that right?

Yes. I always get to tell you guys before it actually happens. We're going to announce tonight the IBM Watson DataWorks platform. And the core capabilities of this platform are the ability to access all data and to drive AI-powered decision-making across organizations.

Pretty phenomenal and foundational. We've been covering Hadoop World since the early days, 2009, 2010. And IBM never really rushed out to do a Hadoop distro; you had your own little deal there, but you really thought about, all right, where is the market going? How do we add value beyond all this plumbing?
Not that you don't do plumbing, you do it okay, but it's really that cognitive business that you've been creating. So talk about that a little bit. Where did the vision come from, the strategy? It's all starting to come together now.

You know, we saw and recognized major shifts happening in the environment today across the numerous clients and organizations that we talked to. There's a substantial shift happening in how data and analytics are treated within organizations, and we intend to lead that shift. Fundamentally, what we saw is that it's no longer about the persistent stores. It's no longer about data in Hadoop, data in an operational database, or even data in real-time streaming. What's even more important is how that data is accessed with trust and how organizations actually use that data. What's also important is to allow users to leverage machine learning and cognitive capabilities that automatically recommend where they should start to gain the most insight in their organization, and effectively to help organizations future-proof their investments by leveraging open standards and open source capabilities.

One of the complaints you often hear at events like this is the complexity of the Hadoop ecosystem. Does this simplify it, or am I now adding more tools that add to that complexity?

Well, I think there's simplicity in the capabilities we're able to provide in the platform, especially in how users interact with it. We have user-specific tooling that is fit for purpose, letting each individual user interact with the platform through the skill set they're most comfortable with, and allowing collaboration across the different teams. So that simplicity is absolutely there, but it's an equal balance of that simplicity with the choice and flexibility which we know today's organizations want.
Today's organizations want the most innovative technologies on the market. And so having an open ecosystem and an open platform was imperative to the platform that we're announcing tonight.

So, Ritika, it sounds like most of the ecosystem is still stuck in, well, my persistent store is better than yours and I've got some amount of polyglot, and you're reframing the debate. Talk about the life cycle of the data, from ingest to the semi-organization of a data lake and then the analytics that take place after that, and how you are adding value today with AI or cognitive capabilities, and what we might expect going forward.

That's a great question. We intend to infuse analytics and cognitive capabilities at almost every point imaginable in the platform. For example, at the point of ingest, we have capabilities that automatically classify the data types of the data you have and the relationships between that data, and catalog that data in the metadata store. Then as you access that data, you can perform deeper machine learning on it. You can figure out what types of models you want to create from it. We can even recommend what model or blend of models to use, and you can quickly iterate on and evaluate those models in a pre-production-like scenario. Then we provide the capabilities for you to easily deploy those into production. That is the power of what we're doing, because we're effectively shrinking the time it takes you to find insights in that discovery phase, and then you use that same platform to deploy those insights into production.

Is that auto-classification capability new, or...

No, you might have heard of capabilities from us before called Entity Analytics, and we're using our Entity Analytics capabilities on the platform. And of course, we're modernizing it, right?
Because we're actually leveraging new metadata techniques, even founded on open source capabilities like Apache Atlas, to help us with how we store and catalog. But the actual classification of types and finding those relationships uses our Entity Analytics technologies.

At the point of creation or use?

At the point of creation.

So you can scale.

Absolutely, so data finds data. And what makes it cognitive is that as the data changes, so do those relationships. That's what is extremely powerful, because as you learn more and are exposed to more data, as most machine learning projects grow in the amount of data they operate on, those relationships can also change.

So what's the business impact of that? That provides flexibility as things change.

You know, one of the number one things I hear from clients is that they don't even know what data they have, how to access it, where it's stored, or what other data may have some sort of resemblance that they want to augment it with to find those insights. That's the power of Entity Analytics, because you now know where data is, how to find it, what the relationships are, and what else you may want to augment it with to find the right relationships.

So if I were to ask you to talk to the chief data officer, even the CIO, or even the CFO who's been budgeting money for these data lakes or data as a service or just big data, it sounds like what you're saying is, for all those stories about talent being really scarce and data scientists being expensive, you're making them far more productive and you're making the likelihood of success far greater.

Absolutely. I mean, we intend, especially in the area of data science and the data science profession, to democratize what it means to be a data science professional.
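The ingest-time classification described here, inferring column types and candidate relationships before cataloging them, could be sketched very roughly as follows. This is an illustrative stand-in, not IBM's Entity Analytics or Apache Atlas: the tables, column names, and value-overlap heuristic are all invented for the example.

```python
# Hypothetical sketch of auto-classification and relationship discovery
# at ingest time. The heuristics here are simplistic placeholders for
# the metadata classification described in the interview.
import pandas as pd

def classify_columns(df):
    """Assign a coarse semantic type to each column."""
    types = {}
    for col in df.columns:
        s = df[col]
        if pd.api.types.is_numeric_dtype(s):
            types[col] = "numeric"
        elif s.nunique() == len(s):
            types[col] = "identifier"  # unique per row: likely a key
        else:
            types[col] = "categorical"
    return types

def find_relationships(left, right, threshold=0.9):
    """Flag column pairs whose value sets overlap heavily (candidate joins)."""
    rels = []
    for lc in left.columns:
        for rc in right.columns:
            lv, rv = set(left[lc]), set(right[rc])
            if lv and len(lv & rv) / len(lv) >= threshold:
                rels.append((lc, rc))
    return rels

# Invented example tables standing in for newly ingested data.
customers = pd.DataFrame({"cust_id": [1, 2, 3], "region": ["east", "west", "east"]})
orders = pd.DataFrame({"order_id": [10, 11], "cust_id": [1, 3]})
print(classify_columns(customers))
print(find_relationships(orders, customers))
```

A real catalog would persist these inferred types and join candidates as metadata, and, as described above, re-run the inference as the data changes so the relationships change with it.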
If you recall our June announcement, I remember talking to you guys in June when we launched the Data Science Experience, which is an experience built on this platform. We talked about how our number one goal is to help data science professionals learn their craft: to learn what it means to code in new languages like R, Python, or Scala, to create and build in any language, and then to collaborate. That notion of collaboration, learning from each other, and that network effect is absolutely important. So we're lowering the barrier to driving insights from all data, regardless of whether that data sits within the platform or outside it. As a data point, we know that in the next three years each application will have probably 10 to 15 external data sources associated with it, and that means what you know about how you treat data today is going to change dramatically. We intend to reinvent how organizations think about that and act on it.

So I wanted to bring up the issue of applications. I spoke to Nancy Hensley, who works for you, and the current thinking is that roughly 70% of these sorts of pipelines will be there to enhance the machine learning within existing apps. How mechanically might that work? And give us maybe some examples of how those applications could change when you have this capability.

Well, one of the great things we're doing on the platform is being able not only to discover and iterate on the insights you have on data in the same model, but to deploy those into production. If those applications, for example, are built in a framework, you can easily take what you have built and operationalize it into applications.
We're actually going to show an example tonight in the demonstration where we take insights that we've developed through an R model and infuse them into an application built within Bluemix, seamlessly, at the snap of a button. These types of things are real even for existing applications. The collaboration that needs to happen between the data science professional who is finding those insights and the application developer who is codifying them in their web and mobile applications needs to happen in real time, so that you can realize the benefits within your applications quickly.

So that's interesting. You're affecting the relationship between the data science team and the application development team. Is that a sort of gray area? Is there a fine line there?

Well, it always overlaps, and we've actually had a lot of internal discussions back and forth on whether they are tasks or personas, because the reality is that in many organizations the data scientist may be playing an application developer role. But the fact remains that the skill sets for data engineering, data analysis, data science, and application development are absolutely necessary for core critical projects in any organization, and collaboration across those skill sets is essential to bring value from that data. And you need to service that independent of the organizational structure or where the skill sets lie, in one camp or the other.

So would it be fair, and I'm being tactical here, that the application developer might take that model, if it's R, and put it on an R server, and then the application could call that for predictions?

Distributed applications, right?

Right. It could even be, let's throw in some more terminology, a microservice.
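The pattern being sketched here, a trained model hosted behind a service endpoint that an application calls for predictions, can be illustrated with a minimal, hypothetical HTTP prediction service. Everything in it is invented for illustration (the linear scorer, the `/predict` route, the JSON shape); it is not IBM's or Bluemix's actual API, just the general model-as-microservice idea using only the Python standard library.

```python
# Hypothetical sketch: a model exposed as a tiny prediction microservice
# that an application calls over HTTP. The "model" is a stand-in linear
# scorer, e.g. coefficients exported from a trained R or Python model.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

def predict(features):
    # Stand-in for a trained model: a fixed linear combination.
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        payload = json.dumps({"score": predict(body["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Serve on an ephemeral port in a background thread.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The application side: call the service for a prediction.
req = Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"features": [2.0, 4.0]}).encode(),
    headers={"Content-Type": "application/json"},
)
result = json.loads(urlopen(req).read())
server.shutdown()
print(result)
```

In practice the model behind the endpoint would be versioned and redeployed independently of the calling application, which is what makes the microservice framing useful here.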
Well, the platform is built and founded on that core value of microservices and the most comprehensive set of data and analytics services. So it is absolutely microservice-architecture based.

Okay, so then essentially we've taken the DevOps culture and processes from the traditional application world, or the microservice world, and now we're adding an analogous set for machine learning.

That is a wonderful observation. And in fact, we call that insight ops: the ability to take insights that you find in the discovery phase, where your data science professionals are finding new golden nuggets, and operationalize them into production-like environments seamlessly. It's about insight operations. Some call it data ops; we call it insight ops.

Insight ops, that's scale.

Absolutely.

Simple enough that you don't need a rock star data scientist to make it happen, okay. There's one more step, which is not necessarily in DevOps, which is the data feedback loops you get from machine learning models, where they have to keep getting smarter, preferably automatically. Is that something that you've tooled up for yet?

Absolutely. When we take a look at the models that you build, you want to prepare your data, you want to train on that data, and you want to continually evaluate. That evaluation process is an extremely iterative process before you then deploy into production. And even in production, you're going to continue to evaluate and train those models.

All right, we've got to go. Last word.

We are so excited about what we're announcing with IBM Watson DataWorks, and we're going to talk about a method as well, the data-first method, that really helps our clients through this journey, because this is not just a technological journey. This is a journey that involves a transformation in technology, culture, processes, and organizational change. So stay tuned.
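The prepare-train-evaluate cycle described here, with models continually retrained and re-evaluated as new feedback data arrives, can be sketched minimally. The running-mean "model", the batches, and the holdout set are all invented for illustration; a real pipeline would use an actual learning algorithm and production telemetry as the feedback source.

```python
# Minimal sketch of the evaluate-and-retrain feedback loop: the model is
# updated with each new batch of labeled feedback, and its error on a
# fixed holdout set is re-measured after every update.
class MeanModel:
    """Toy model that predicts the running mean of all labels seen."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def train(self, labels):
        for y in labels:
            self.total += y
            self.count += 1

    def predict(self):
        return self.total / self.count if self.count else 0.0

def evaluate(model, holdout):
    """Mean absolute error against held-out labels."""
    return sum(abs(model.predict() - y) for y in holdout) / len(holdout)

model = MeanModel()
holdout = [10.0, 12.0]
errors = []
# Each batch stands in for labeled feedback arriving from production.
for batch in [[2.0, 4.0], [10.0, 12.0], [11.0, 11.0]]:
    model.train(batch)
    errors.append(evaluate(model, holdout))
print(errors)
```

The point of the loop is visible in the error trace: as feedback batches resembling the holdout data arrive, retraining drives the evaluation error down without any manual intervention.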
We're looking forward to the journey that we're on.

Love it. We'll see you tonight. We'll be covering that event with theCUBE. So thanks very much, Ritika, for coming on theCUBE.

Thank you.

All right, keep it right there, everybody. We'll be back with our next guest. This is theCUBE, live from New York City. Right back.