Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks.

Welcome to theCUBE, we are live in San Jose in the heart of Silicon Valley at the DataWorks Summit, day one. I'm Lisa Martin with my co-host George Gilbert, and we're very excited to be talking to two Robs. We have Rob Squared on the program this morning: Rob Bearden, the CEO of Hortonworks. Welcome, Rob.

Thank you for having us.

And Rob Thomas, the GM of IBM Analytics. So guys, we just came from this really exciting, high-energy keynote. The laser show was fantastic, but one of the great things, Rob, that you kicked off with was really showing the journey that Hortonworks has been on in a really pretty short period of time. Tremendous momentum. And you talked about the four megatrends that are really driving enterprises to modernize their data architecture: cloud, IoT, streaming data, and the fourth, the next leg of this, is data science. Data science, you said, will be the transformational next leg in the journey. Tell our viewers a little bit more about that. What does that mean for Hortonworks and your partnership with IBM?

I think what IBM and Hortonworks now have the ability to do is to bring all the data together across a connected data platform. So the data in motion and the data at rest are now in one common platform, irrespective of the deployment architecture, whether it's on-prem across multiple data centers or deployed in the cloud. Now that we have access to that large volume of data, we can begin to drive analytics end to end, as the data moves through each phase of its lifecycle. And because we now have visibility into and access to the full lifecycle of the data, we can put a data science framework over it to really understand and learn the patterns: what's the data telling us? What's the pattern behind it?
And we can bring simplification to data science and turn it into a team sport: allow people to collaborate, give them access to the data, and take the black magic out of doing data science, with the framework of the tooling and the power of DSX on top of the connected data platform. Now we can rapidly advance insights end to end across the data, and that drives value back to the customer quickly. Then we can begin to bring smart applications, via the data science, back into the enterprise. So we can now do things like connected car in real time, and have the connected car learn the patterns as it's moving. From a retail standpoint, we can get really smart and accurate about inventory placement and inventory management. From an industrial standpoint, we know in real time, down to the component, what's happening with a machine and any failures that may occur, and we can eliminate downtime. Agriculture, same kind of thing. Healthcare, every industry. Financial services, with the fraud detection and anti-money-laundering advances we have. But it's all going to be attributable to how machine learning is applied in the DSX platform; it's the best platform in the world to do that with.

One of the things that I thought was really interesting was that as we saw enterprises start to embrace Hadoop and big data, they said, you know what, this needs to coexist and interoperate with our traditional applications, our traditional technologies. Now you're saying, and seeing, that data science is going to be a strategic business differentiator. You mentioned a number of industries, and there were several of them on stage today. Give us maybe one of your favorite examples of a customer leveraging data science and driving a pretty significant advantage for their business.

Sure, yeah. Well, step back a little bit for a little context. Only 10 companies have outperformed the S&P 500 in each of the last five years.
When we look at what they're doing, those are companies that have decided data science and machine learning are critical. They've made a big bet on it, and every company needs to be doing that. So a big part of our message today was, I'd say, to open everybody's eyes and say there is something happening in the market right now, and it can make a huge difference in how you're applying data and analytics to improve your business. We announced our first focus on this back in February, and one of the clients that spoke at that event is a company called Argus Healthcare. Argus has massive amounts of data sitting on the mainframe, and they were looking for how to unleash that for better care of patients and better care for their hospital networks, and they did that with the data they had in their mainframe. So they brought Data Science Experience and machine learning to their mainframe; that's what they talked about. What Rob and I have announced today is that there's another great trove of data in every organization, which is the data inside Hadoop, with HDP the leading distribution for that. It's a great place to start, and so the use case I just shared, which is on the mainframe, is going to apply anywhere there are large amounts of data. And right now there's not a great answer for data science on Hadoop, until today: Data Science Experience plus HDP brings, I'd say, an elegant approach to it. It makes it a team sport. You can collaborate, you can interact, you can get education right in the platform. So we have the opportunity to create a next generation of data scientists working with data in HDP; that's why we're excited.

I'd like to build on that question and your answer.
In terms of the Data Science Experience as this next major building block to build on the value from the data lake: your two companies have somewhat different go-to-markets. IBM, with its industry solutions and Global Business Services, can actually build semi-custom solutions around this platform, both the data and the Data Science Experience. With Hortonworks, what's your go-to-market motion going to look like, and what are the offerings going to look like to the customer?

There'll be several; you just described a great example. With IBM Professional Services, they have the ability to take those industry templates and those data science models and instantly bring them to the data. And so as part of our joint go-to-market motion, we'll be able to partner and bring those templates and those models not only to our customer base, but also, as part of the new sales go-to-market motion, into white space and new customer opportunities. The whole point is that now we can use the enterprise data platforms to bring the data under management in a mission-critical way, and then bring value to it through these kinds of use cases and templates that drive the smart applications, and really accelerate time to value for the customers.

So how would you look at the mix changing over time between, on the one hand, data scientists working with the data to experiment on model development, and the two hard parts that you talked about, data prep and operationalization — in other words, custom models, and the issue of deploying a model 11 months later because there's no real packaged process for that — and, on the other hand, packaged enterprise apps that are going to bake these models in as part of their functionality, the way that Salesforce and Workday are starting to do? How does that change over time?

It'll be a layering effect.
Today, through the connected data platforms, we have the ability to bring all the data under management in a mission-critical manner, from the point of origination through the entire stream until it comes to rest. Now, with data science through DSX, we have a data science framework where, instead of it being a black art — how you do data access, build the models, determine the algorithms, and how that yields a result — the analogy is that you don't have to be a mechanic to drive a car anymore, right? The common person can drive a car. So now we really open up the community of business analysts who can participate in and enable data science through collaboration. Then we can take those models, build the smart apps, and evolve those smart apps very rapidly, and we can accelerate that process through the partnership with IBM by bringing the core domain expertise and value drivers they've already built and dropping those into the DSX environment. So I think we can accelerate time to value much faster and more efficiently than we've ever been able to do before.

You mentioned teamwork a number of times, and I'm curious: you also talked about the business analysts, so what's the governance like to facilitate business analysts and different lines of business that have particular access, and what is that team composed of?

Yeah, well, let's look at what's happening in the big enterprises of the world right now. There are two major things going on. One is that everybody's recognizing this is a multi-cloud world. There are multiple public cloud options, and most clients are building a private cloud. They need a way to manage data as a strategic asset across all those cloud environments.
The second piece is that we are moving towards what I would call the next-generation data fabric, which is your warehousing capabilities and your database capabilities married with Hadoop and other open-source data repositories, in a seamless fashion. So you need a governance strategy for all of that. The way I describe governance is with a simple analogy: we do for data what libraries do for books. Libraries create a catalog of books. They know they have different copies of books; some they archive. But they can access all the intelligence in the library. That's what we do for data. So when we talk about governance and working together, we're both big supporters of the Atlas project, and that will continue. The other piece, pointing toward the enterprise data fabric, is what we're doing with Big SQL. Big SQL is the only 100% ANSI-SQL-compliant SQL engine for data across Hadoop and other repositories. So we'll be working closely together to help enterprises evolve in a multi-cloud world toward this enterprise data fabric, and Big SQL is a big capability for that.

And an immediate example of that is in our EDW optimization suite that we have today: we'll be leveraging Big SQL as the platform for the complex-query segment of that, which we'll go to market with almost immediately.

A follow-up question on the governance. To what extent is it end-to-end governance, meaning from the point of origin through the last mile, where the last mile might be some specialized analytic engine, versus having all the data management capabilities in that fabric? You mentioned operational, I think, and analytic. Are customers going to be looking for a provider who can give them end-to-end capabilities on both the governance side and all the data management capabilities? Is that a critical decision?

I believe so. I think there are really two use cases for governance: it's either insights or it's compliance.
If your focus is on compliance — something like GDPR as an example — that's really about the lifecycle of data, from when it starts to when it can be disposed of. So for the compliance use case, absolutely. When I say insights as a governance use case, that's really about self-service. The ideal world is that you can make your data available to anybody in your organization, knowing that they have the right permissions and can access it in a protected way. Most companies don't have that advantage today. Part of the idea around data science on HDP is that if you've got the right governance framework in place, suddenly you can enable self-service, where any data scientist or any business analyst can go find and access the data they need. So this governance piece is a really key part of delivering on data science. When I talk to clients, I try to understand where they're going: is this about compliance, or is this about insights? Because there's probably a different starting point, even though the end game is similar.

Curious about your target markets; we talked about the go-to-market model a minute ago. Are you targeting customers that are on mainframes? You said in your keynote, I think, that 90% of transactional data is in a mainframe. Is that one of the targets? Or is the target, like you mentioned, Rob, the EDW optimization solution — are you working with customers who have an existing enterprise data warehouse that needs to be modernized? Is it both?

The good news is it's both. Really, the opportunity and the mission are about enabling the next-generation data architecture, and within that, again, back to the layering approach, it's being able to bring the data under management from the point of origination through the point where it comes to rest. And if we look at it, probably 90% of transactional data, at least, sits in the mainframe.
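[Editor's note: the governed self-service idea described above — a shared catalog where anyone can discover datasets, but access is gated by permissions — can be sketched as a toy in Python. This is purely an illustration of the concept; it assumes nothing about the actual APIs of Apache Atlas, DSX, or HDP, and all names in it are made up.]

```python
# Toy data catalog: discovery is open to everyone (the "library catalog"),
# but access is governed by role-based permissions (the "self-service with
# the right governance framework" idea). Not any real product's API.

class DataCatalog:
    def __init__(self):
        self._datasets = {}  # name -> (description, required_role)

    def register(self, name, description, required_role):
        """Catalog a dataset, the way a library catalogs a book."""
        self._datasets[name] = (description, required_role)

    def search(self, keyword):
        """Discovery is open: any analyst can find matching datasets."""
        return sorted(
            name for name, (desc, _) in self._datasets.items()
            if keyword.lower() in desc.lower()
        )

    def access(self, name, user_roles):
        """Access is governed: only permitted roles get a data handle."""
        _description, required = self._datasets[name]
        if required not in user_roles:
            raise PermissionError(f"role '{required}' required for {name}")
        return f"handle:{name}"  # stand-in for a real data connection


catalog = DataCatalog()
catalog.register("claims_2017", "patient claims records", "phi_reader")
catalog.register("inventory", "retail inventory levels", "analyst")

# Any business analyst can discover both datasets...
print(catalog.search("records"))  # finds claims_2017
# ...but can only open the ones their roles allow.
print(catalog.access("inventory", {"analyst"}))
```

The point of the sketch is the separation Thomas describes: making data findable by everyone is safe only once the permission check sits between discovery and access.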
So you have to be able to span all data sets and all deployment architectures: on-prem, multi-data center, as well as public cloud. That's the opportunity. But for that to ultimately drive value, you've got to have the simplification of a data science framework and toolset, so you have the proper insights and a basis on which to bring the new smart applications, drive the insights, and drive the governance through the entire lifecycle.

On the value front, you and Hortonworks talk about the fact that this technology can really help a business unlock transformational value across the organization, across lines of business. We just talked about a couple of the customer segments. Is this a conversation that you're having at the C-suite initially? Where are the business leaders in terms of understanding that there's more value here, that they can probably open up new business opportunities? Or are you talking more at the data science level?

Look, it's at different levels. Data science and machine learning, that is a C-suite topic. A lot of times I'm not sure that audience knows what they're asking for, but they know it's important and they know they need to be doing something. When you go to things like a data architecture, the C-suite's discussion there is, I just want to become more productive in how I'm deploying and using technology, because my IT budget's probably not going up; if anything, it may be going down. So I've got to become a lot more productive and efficient. So it depends on who you're talking to; there are different levels of dialogue. But there's no question in my mind — just look at the major press, the Financial Times, The Wall Street Journal, over the last year. CEOs are talking about AI, machine learning, and using data as a competitive weapon. It is happening, and it's happening right now.
What we're doing together is saying, how do we make data simple and accessible? How do we make getting there really easy? Because right now it's pretty hard, but we think, with the combination of what we're bringing, we make it pretty darn easy.

One quick question following up on that, and then I think we're getting close to the end. When the data lakes started out, it seemed like, for many customers, a mandate from on high: we need a big data strategy. That translated into standing up a Hadoop cluster, and that resulted in people realizing there's a lot to manage there. It sounds like right now people know machine learning is hot, so they need to get data science tools in place. But is there a business capability, sort of like the ETL offload was for the initial Hadoop use cases, where you would go to a customer and recommend, do this, bite this off as something concrete?

I'll start, then Rob can comment. Look, the issue is not Hadoop. A lot of clients have started with it. The reason there haven't been, in some cases, the outcomes they wanted is that just putting data in Hadoop doesn't drive an outcome. What drives an outcome is what you do with it: how do you change your business process? How do you change what the company's doing with the data? And that's what this is about; it's the next step in the evolution of Hadoop. That's starting to happen now. It's not happening everywhere, but we think this will start to propel that discussion. Any thoughts, Rob?

Spot on. The data lake was about releasing the constraints of all the silos, being able to bring those together and aggregate that data. It was the first basis for having a 360-degree, holistic, centralized insight about something, or a pattern.
But what data science then does is accelerate those patterns and lessons learned, and give you a much more detailed, higher-velocity insight that you can react to much faster, so you can accelerate the business models around those aggregates. So Hadoop is the foundational approach, and then, as I mentioned in the keynote, the data science platforms, machine learning, and AI are the things that transformationally open up and accelerate those insights, so that new models, patterns, and applications get built to accelerate value.

Well, speaking of transformation, thank you both so much for taking the time to share your transformation and the big news and announcements from Hortonworks and IBM this morning. Thank you, Rob Bearden, CEO of Hortonworks, and Rob Thomas, General Manager of IBM Analytics. I'm Lisa Martin with my co-host, George Gilbert. Stick around, we are live from day one at DataWorks Summit in the heart of Silicon Valley. We'll be right back.