Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks.

Welcome back to theCUBE, we are live at day one of the DataWorks Summit in the heart of Silicon Valley. I'm Lisa Martin with theCUBE, with my co-host George Gilbert. We're very excited to be joined by our next two guests, who are going to be talking about a lot of the passion and the energy that came from the keynote this morning, and some big announcements. Please welcome Madhu Kochar, VP of Analytics Product Development and Client Success at IBM, and Jamie Engesser, VP of Product Management at Hortonworks. Welcome, guys.

Thank you.

First time on theCUBE, George and I are thrilled to have you. So in the last six, eight months of doing my research, there have been announcements between IBM and Hortonworks. You guys have been partners for a very long time, with announcements on technology partnerships with servers and storage, and presumably all of that gives Hortonworks, Jamie, a great opportunity to tap into IBM's enterprise install base. But boy, today, socks blown off with this big announcement between IBM and Hortonworks. Jamie, kind of walk us through that. What is, sorry, Madhu, I'm going to ask you first. Walk us through this announcement today. What does it mean for the IBM Hortonworks partnership?

Oh my God, what an exciting, exciting day, right? We've been working towards this one. So three main things come out of the announcement today. The first is really the adoption by Hortonworks of IBM's data science and machine learning, right? As you heard in the announcement, we brought machine learning to our mainframe, where the most trusted data is, and are now bringing that to open source, to big data on Hadoop. Great, right? Amazing.
Number two is obviously the whole aspect around our Big SQL, which brings the complex query analytics, bringing all the data together from various sources, and making that happen on HDP and Hadoop, with Hortonworks really adopting that. Amazing announcement. Number three, what we gain out of this is humongous. The obvious, from an IBM perspective, is the whole platform, right? We've been on this journey together with Hortonworks since 2015 with ODPi, and we've been, you know, champions in the open source, delivering a lot of that. And as we start to look at it, it makes sense to merge that as a platform and give our clients what's most needed out there, as we take our journey towards machine learning, AI, and enhancing the enterprise data warehouse strategy.

Awesome. Jamie, from your perspective on the product management side, what's the impact and the potential downstream implications for Hortonworks?

I think there's two things. I think Hortonworks has always been very committed to the open source community. And I think with Hortonworks and IBM partnering on this, number one, it brings a much bigger community to bear to really push innovation on top of Hadoop. So I think that innovation is gonna come through the community, and I think that partnership drives two of the biggest contributors to the community to do more together. So I think that's number one, the community interest. The second thing is, when you look at Hadoop adoption, we're seeing that people wanna get more and more value out of it, and they wanna access more and more data sets. To get more and more value, we're seeing the data science platform become really fundamental to that. And we're also seeing the extension to say, not only do I need data science to get new insights, but I need to aggregate more data.
And so we're also seeing the notion of, how do I use Big SQL on top of Hadoop, but then federate data from my mainframe, which has got some very valuable data on it, from DB2 instances, and from the rest of the data repositories out there. So now we get a better federation model to allow our customers to access more of the data, so that they can make better business decisions on it, and use data science on top of that to get net new learnings from that data.

Let me build on that. Let's say that I'm a telco customer, and the two of you come to me and say, we don't want to talk to you about Hadoop. We want to talk to you about solving a problem where you've got data in applications in many places, including inaccessible stuff. You have a limited number of data scientists and the problem of cleaning all the data. And even if you build models, the challenge of integrating them with operational applications. So what do the two of you tell that telco customer?

Yeah, so maybe I'll go first. For a telco, the main use case, the main application, as I've been talking to many of the largest telco companies here in the US and even outside the US, it's all about their churn rate, right? They want to know when the calls are dropping, why they are dropping, why the clients are going to the competition, and such. And there's so much data, right? The data is just streaming, and they want to understand it. So I think if you bring the data science experience and machine learning to that data, it doesn't matter now where the data resides, right? Hadoop, mainframes, wherever. We can bring that data, you can do a sort of transformation on it, clean up the data so the quality of the data is there, so that you can start feeding that data into the models, and that's when the models learn. The more data there is, the better they train, and then you can really drive the insights out of it. And now data science, the framework which is available, it's like a team sport.
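The federation model described above, one SQL layer querying Hadoop-resident data alongside DB2, can be sketched with a stand-in. Below, two SQLite databases play the roles of the Hadoop and DB2 sources, and a single query joins across them. This is illustrative only: real Big SQL federation uses its own connectors, and the table and column names here are invented.

```python
import sqlite3

# Stand-in for the Hadoop-side store (e.g. call-detail records on HDP).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS db2")  # stand-in for a DB2 source

# "Hadoop" side: high-volume call events.
conn.execute("CREATE TABLE call_events (customer_id INT, dropped INT)")
conn.executemany("INSERT INTO call_events VALUES (?, ?)",
                 [(1, 1), (1, 1), (2, 0), (2, 1)])

# "DB2" side: trusted customer master data.
conn.execute("CREATE TABLE db2.customers (customer_id INT, plan TEXT)")
conn.executemany("INSERT INTO db2.customers VALUES (?, ?)",
                 [(1, "premium"), (2, "basic")])

# One query spanning both sources -- the essence of federation.
rows = conn.execute("""
    SELECT c.customer_id, c.plan, SUM(e.dropped) AS dropped_calls
    FROM call_events e
    JOIN db2.customers c ON c.customer_id = e.customer_id
    GROUP BY c.customer_id, c.plan
    ORDER BY c.customer_id
""").fetchall()
print(rows)  # [(1, 'premium', 2), (2, 'basic', 1)]
```

The point is the single query surface: the application never needs to know which system each table lives in.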
You can bring in many other data scientists in the organization, who could have different analyst reports to go render for, right, or provide results into. So it being a team sport, being a collaboration, bringing it together with that clean data, I think it's going to change the world, right? I think the business side can get instant value out of the data they're going to see.

Let me just sort of test the edge conditions on that. So some of that data is streaming, and you might apply the analytics in real time. Some of it is, I think as you were telling us before, sort of locked up as dark data. So the question is, how much of that data, the streaming stuff and the dark data, do you have to land in a Hadoop repository, versus how much do you just push the analytics out to it and have it inform a decision?

Maybe I can take a first shot at that. So I think there's a couple of things in that. There's the learnings, and then how do I execute the learnings. So I think the first step of it is, I tend to land the data. Going to the telecom churn model, I want to see all the touch points. So I want to see the person that came through the website, went into the store, called in to us. So I need to aggregate all that data to get a better view of what's the chain of steps that happened for somebody to churn. Once I end up diagnosing that, I go through the data science of it to learn the models that get executed on that data, and that's on the data at rest. What I want to do is build the model out so that I can take that model and prescriptively run it in the stream of data. So I know that that customer just hung up the phone. Now he's walked into the store, and we can sense that he's in the store because he just registered that he's asking about his billing details, and the system can now dynamically diagnose, from those two activities, that this is a high churn risk, and notify that teller in the store that there's a chance of him rolling out.
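The train-on-data-at-rest, score-in-the-stream flow described here can be sketched in a few lines. This is a toy model: the event names, weights, and alert threshold are invented for illustration, not taken from any IBM or Hortonworks product, and in practice the weights would come out of the data-at-rest modeling step.

```python
import math

# "Trained" weights -- in reality learned from the landed, aggregated data.
WEIGHTS = {"dropped_call": 1.8, "billing_inquiry": 1.4, "store_visit": 0.8}
BIAS = -2.0
THRESHOLD = 0.7  # invented alert threshold

def churn_risk(events):
    """Score a customer's recent touchpoint events with a logistic model."""
    z = BIAS + sum(WEIGHTS.get(e, 0.0) for e in events)
    return 1.0 / (1.0 + math.exp(-z))  # squash to a [0, 1] risk score

def handle_stream(event_stream):
    """Consume (customer_id, event) pairs in motion; alert on high risk."""
    recent = {}
    alerts = []
    for customer_id, event in event_stream:
        recent.setdefault(customer_id, []).append(event)
        if churn_risk(recent[customer_id]) > THRESHOLD:
            alerts.append(customer_id)  # e.g. notify the teller in the store
    return alerts

# The scenario above: a dropped call, then a billing inquiry in the store.
stream = [(42, "dropped_call"), (7, "store_visit"), (42, "billing_inquiry")]
print(handle_stream(stream))  # [42]
```

Customer 42 only crosses the threshold once the second signal arrives, which is exactly the "diagnose from those two activities" behavior described.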
If you look at that, it required, I'll say, the machine learning and data science side to build the analytical model, and it required the data flow management and streaming analytics to consume that model, to make a real-time insight out of it, to ultimately stop the churn from happening. Let's just give the customer a discount at the end of the day, that type of stuff. So you need to marry those two.

It's interesting that you articulated that very clearly, although the question I have now is not on the technical side, but on the go-to-market side. You guys have to work very, very closely, and this is selling at a level that I assume is not very normal for Hortonworks, and it's something that is a natural sales motion for IBM.

So maybe I'll speak first and then I'll let you add some color. When I look at it, I think there's a lot of natural synergies. IBM and Hortonworks have been partnered since day one. We've always continued on that path. If you look at it, and I'll bring up community and open source again, we've worked very well in the community. I think that's incubated and fostered a really strong relationship. And I think at the end of the day, we both look at what's going to be the outcome for the customer and work back from that, and we tend to really engage at that level. So what's the outcome, and then how do we make a better product to get to that outcome? So I think there are a lot of natural synergies in that. I think, to your point, there are lots of pieces that we need to integrate better together, and we will join those up over time. And I think we're already starting with the data science experience, with a bunch of integration touch points there. I think you're going to see it in the information governance space, with Atlas being a key underpinning and Information Governance Catalog on top of that, ultimately moving up to IBM's unified governance. We'll start getting more synergies there as well, and on the Big SQL side.
So I think when you look at the different parts, there's a lot of synergies that our customers will be driving, and that's, I think, the driving factor, along with the organizations being very well aligned.

And being VP of engineering, there are a lot of integration points which we've already identified, and Big SQL is already working really well on the Hortonworks HDP platform, right? We've got good integration going, but I think more and more on the data science side. And I think, at the end of the day, we end up talking to very similar clients, right? So going with a joint go-to-market strategy, it's a win-win. Jamie and I were talking earlier: in this type of a partnership, our community is winning, and our clients, right? And so really good solutions out there. And that's what it's all about.

Speaking of clients, you gave a great example with telco. When we were talking to Rob Thomas and Rob Bearden earlier on the program today, they talked about the data science conversation being at the C-suite. So walk us through an example, whether it's a telco or maybe a healthcare organization. What is that conversation that you're having? How is a telco helping foster what was announced today and this partnership?

Maybe I'll start. When we look at a telco, I think there's a natural evolution. When we start looking at that problem of how does a telco consume and operate data science at a larger scale, at the C-suite it becomes a people-and-process discussion. There's not a lot of tools currently that really help with the people and process side of it. It's kind of an art today in the data science space. So what we're trying to do is, I think you mentioned team sport, but also give them the tooling to say: step one, we need to start learning and training the right teams in the right approach. Step two, start giving them access to the right data, et cetera, to work through that. And step three, give them all the tooling to support that.
And tooling becomes things like TensorFlow, et cetera, things like Zeppelin, Jupyter, a bunch of the capabilities the open source community has evolved. So first, learning and training. The second step is, give them access to the right data to consume. And then third, give them the right tooling. And I think those three things are kind of helping us drive the right capabilities out of it. But, to your point, elevating up to the C-suite, they really think people and process. And I think it's giving them the right tooling for their people and the right process to get them there.

Kind of moving data science from an art to a science.

I would argue that, at a top level.

Madhu, on the client success side, how instrumental, though, are your clients, maybe on the telco side, in actually fostering the development of the technologies, or helping IBM make the decision to standardize on HDP as its big data platform?

Oh, huge, huge. A lot of our clients, especially as they're looking at big data, many of them are actually helping us. They're committers on the code, right? They're adding, providing; if we can't move fast enough in engineering, they come up and say, hey, we're going to help and code it up and do some code development with you. They've been really pushing our limits. A lot of clients I actually end up working with on the Hadoop side, like, for example, my entire information integration suite, are very much running on top of HDP today. And so they're saying, okay, what's next? We want to see better integration. So as I called a few clients yesterday saying, hey, you know, under embargo, this is something that's going to get announced: amazing, amazing reactions, right? They're just very excited about this. So we are starting to get a lot of push. And actually, the clients who do have a large development community as well, like a lot of banks today, write a lot of their own applications.
We're starting to see them, you know, code-developing stuff with us and becoming committers.

You have a question?

Well, if I just were to jump in: how do you see, over time, the mix of apps starting to move away from completely custom developed, sort of the way the original big data applications were all written, you know, down to the metal in MapReduce? And for shops that don't have a lot of data scientists, how are we going to see applications become sort of more self-service, more prepackaged?

So maybe I'll give a little bit of perspective on that. Right now I think IBM's got really good synergies on what I'll call vertical solutions for organizations, financials, et cetera. I would say Hortonworks has taken a more horizontal approach; we're more of a platform solution. An example of one where it's kind of marrying the two: if you move up the stack from Hortonworks as a platform to the next level up, which is Hortonworks as a solution, one of the areas that we've invested heavily in is cybersecurity, and an Apache project called Metron. It's less about Metron, more about cybersecurity: people want to solve a problem. They want to defend against an attacker immediately. What that means is we need to give them out-of-the-box models to detect a lot of common patterns. So what we're doing there is investing in some of the data science and prepackaged models to identify attack vectors, and then try to resolve that, or at least notify you that there's a concern. It's an example where we're prepackaging the data science to solve a specific problem. That's in the cybersecurity space, and in that case it happens to be horizontal, where Hortonworks' strength is. I think in the IBM case, there are a lot more vertical apps we can apply this to: fraud, adjudication, et cetera.

So it sounds like we're really just hitting the tip of the iceberg here with the potential.
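An out-of-the-box detection model of the kind described here can be as simple as a packaged rule that ships ready to run. The sketch below flags one common pattern, a burst of failed logins from a single source. The field names and threshold are invented for illustration; real Metron models and enrichments are far richer than this.

```python
from collections import defaultdict

# Invented threshold: this many failed logins from one source triggers an alert.
FAILED_LOGIN_LIMIT = 3

def detect_bruteforce(events):
    """Flag source IPs with too many failed logins (a prepackaged pattern)."""
    failures = defaultdict(int)
    flagged = []
    for event in events:  # each event: {"src_ip": ..., "outcome": ...}
        if event["outcome"] == "login_failed":
            failures[event["src_ip"]] += 1
            if failures[event["src_ip"]] == FAILED_LOGIN_LIMIT:
                flagged.append(event["src_ip"])  # notify the analyst once
    return flagged

events = [
    {"src_ip": "10.0.0.5", "outcome": "login_failed"},
    {"src_ip": "10.0.0.5", "outcome": "login_failed"},
    {"src_ip": "10.0.0.9", "outcome": "login_ok"},
    {"src_ip": "10.0.0.5", "outcome": "login_failed"},
]
print(detect_bruteforce(events))  # ['10.0.0.5']
```

The value of "prepackaged" is that the customer runs this on day one instead of hiring data scientists to rediscover the pattern themselves.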
We want to thank you both for joining us on theCUBE today and sharing your excitement about this deepening, expanding partnership between Hortonworks and IBM. Madhu and Jamie, thank you so much for joining George and me today on theCUBE.

Thank you, Lisa and George. Appreciate it.

For my co-host, George Gilbert, I'm Lisa Martin. You're watching us live on theCUBE from day one of the DataWorks Summit in Silicon Valley. Stick around, we'll be right back.