From Midtown Manhattan, it's theCUBE. Covering Big Data, New York City, 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. Okay, welcome back everyone. Live in New York City for Big Data NYC, our annual event with SiliconANGLE Media, theCUBE and Wikibon in conjunction with Strata Hadoop, which is now called Strata Data as that show evolves. I'm John Furrier, co-host of theCUBE with Peter Burris, head of research for SiliconANGLE Media and general manager of Wikibon. Our next two guests are two legends in the big data industry. Rob Bearden, the CEO of Hortonworks, really one of the founders of the big data movement. You know, Cloudera and Hortonworks really kind of built that out. And Rob Thomas, general manager of IBM Analytics. Big-time investments have been made by both of them. Congratulations for your success, guys. Welcome back to theCUBE. Great to see you guys. Great to see you guys. You've got an exciting partnership to talk about as well. Yeah, absolutely. So, but let's do a little history. Obviously we're going to get to that, and clarify the news, in a second. But you guys have been there from the beginning, kind of watching the market develop almost from the embryonic state to now. I mean, what a changeover. Give a quick comparison of where we've come from and what the current landscape is now, because it's evolved into so much more. You've got IoT, you've got AI, you've got a lot of things in the enterprise. You've got cloud computing, a lot of tailwinds for this industry. It's gotten bigger, and now it's huge. Your thoughts, guys? So you look at the arc since really all this started with Hadoop, and Rob and I met early in the days of that. The first few years were about optimizing operations. Hadoop is a great way for a company to become more efficient, take out cost in their data infrastructure. And so that put huge momentum into this area.
And now we've fast-forwarded to the point where it's about, how am I going to actually extract insight? So instead of just getting operational advantages, how am I going to get competitive advantage? And that's about bringing in the world of data science and machine learning, running it natively on Hadoop. That's the next chapter, and that's what Rob and I are working closely together on. Rob, your thoughts too. We've been talking about data in motion. You guys were early on that, seeing that trend. Real time is still hot, data is still the core asset, and people are trying to figure out how to move from wrangling to actually enabling that data, right? You know, in the early days of big data, to Rob's point, it was very much about bringing operational leverage and efficiency, and being able to aggregate very siloed data sets, unlocking that data and bringing it into a central platform. In the early days, the resources in Hadoop went to making Hadoop an enterprise-viable data platform with the security, governance, operations and management capability that mirrored any of the proprietary transactional or EDW platforms. And the lesson learned in that work is that by bringing all that data together in a central data set, we can now understand what's happening with our customers and with our other assets pre-transaction, and so become very prescriptive in engaging in new business models. And what we've learned now is that the further upstream we can get in the world of IoT, bringing that data under management from the point of origination and managing it all the way through its life cycle, the more we can create new business models with higher velocity of engagement and a lot more rapid value creation. That, though, creates a number of new challenges in all the areas of how you secure that data and how you bring governance across that entire life cycle from a common stream set. Well, let's talk about the news you guys have.
You've obviously got the partnership. Partnerships have become the new normal in the open source era we're living in. You're seeing open source software grow exponentially; the forecasts for the next five and ten years show exponential growth in new code. Just new people coming on board, new developers; DevOps is mainstream. Partnerships are key for communities. I mean, 90% of the code is going to be open source. 100%, as they say; that's the code sandwich that Jim Zemlin, the executive director at the Linux Foundation, points to. And you're seeing that work. You guys have worked together with Apache Atlas. What's the news? What's the relationship between Hortonworks and IBM? Share the news. So a lot of great work has been happening, generally in the open source community around Apache Atlas, in making sure that we're bringing mission-critical governance capabilities across the big data sets and environments. And as we then get into the complexity of multiple data lakes and multiple tiers of data coming from multiple sources, that brings a higher level of requirement in both the security and governance aspects. And that's where the partnership with IBM is continuing to drive Apache Atlas into mission-critical enterprise viability. But then when we get into the distributed models and enterprise requirements, the IBM platforms leveraging Atlas, and the work we're doing together, take that into the mission-critical enterprise. Yeah, the open source angle. And on the enterprise, we've talked many times about the enterprise as a hard environment to crack for, say, a startup, but even now they're becoming reliant on open source. Yet they have a lot of operational challenges. How does this relate to the challenges of the CIO and his staff, with new personas coming in? You're seeing the data science role, you're seeing an expansion from analytics to DevOps. Yep. And they have challenges. Look, enterprises are getting better at this.
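Since the conversation turns on Apache Atlas as the governance layer, here's a minimal sketch, not from the interview itself, of what tagging data for governance might look like. The payload shape follows the style of Atlas's v2 REST classification structures, but the classification name, its attributes, and the endpoint path in the comment are illustrative assumptions; check them against the Atlas version you actually run.

```python
import json

def make_classification_payload(type_name, attributes=None):
    """Build the JSON body for attaching a classification to an entity
    (for example, marking a column as PII) in the style of Atlas's v2 API."""
    return [{"typeName": type_name, "attributes": attributes or {}}]

# In a live cluster, a body like this would be POSTed to an endpoint such as
# /api/atlas/v2/entity/guid/{guid}/classifications (path shown for
# illustration only).
payload = make_classification_payload("PII", {"level": "high"})
print(json.dumps(payload))
```

The point of centralizing tags like this is exactly what's described above: once the classification travels with the data, compliance and self-service access rules can be enforced from one place across multiple lakes and tiers.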
Clearly we've seen progress in the last five years on that, but to go back and link the points, there's a phrase I heard that I like: there's no AI without IA, meaning information architecture. Fundamentally, what our partnership is about is delivering the right information architecture. So it's Hadoop federated with whatever you have in terms of warehouses and databases; we partner around IBM Common SQL for that. It's metadata for your core governance, because without governance you don't have compliance and you can't offer self-service analytics. So we are forming what I would call the fluid data layer for an enterprise that enables them to get to this future of AI. And my view is there's a stop in between, which is data science and machine learning: applications that are ready today, that clients can put into production to improve the outcomes they're getting. That's what we're focused on right now: how do we take the information architecture we've been able to establish and then help clients on this journey? That's what enterprises want, because that's how they're going to build differentiation in their businesses. An information architecture is clearest closest to the applications, and maybe this informs your perspective: it's clear close to the applications that the business is running on. That goes back to your observation about how we used to be focused on optimizing operations. As you move away from those applications, your information architecture becomes increasingly diffuse; it's not as crystal clear. How do you drive that clarity as the data moves to new, derived applications? So, and Rob and I have talked about this, I think we're at the dawn of probably a new era in application development: much more agile, flexible applications that take advantage of data wherever it resides. We are really early in that. So right now we're in the phase of, let's actually put machine learning and data science into practice. Let's extract value from the data we've got.
That will then inform a new set of applications. Which is related to the announcements that Hortonworks made this week around DataPlane, which is looking at multi-cloud environments and how you would manage applications and data across them. Rob, you can speak to that better than I can, I think. Well, the DataPlane thing, I want to get to that in a second. This information architecture, I think you're 100% right on. What we're hearing from customers in the enterprises is they see the IoT buzz. Of course they're going to connect IoT devices down the road. But then they see the security challenges, right? They see the operational challenges around hiring people to actually run the DevOps, and they have to re-architect. So there's certainly a conversation we see on what the architecture for the data is, but also a little bit bigger than that, the holistic architecture of, say, cloud. So a lot of people are trying to clean up their house, if you will, to be ready for this new era. And I think Wikibon's private cloud report that you guys put out really amplifies that by saying, yeah, they see these trends, but they've got to get their act together, right? They've got to look at who the staff is, what the data architecture is going to be, what apps are being developed. So they're doing a lot more retrenching. And given that, if we agree, what does that mean for DataPlane, and for your vision of a data architecture that makes this a solid foundational transition? So I think we all hit on the same point, which is that it's about enabling a next-generation IT architecture, of which the X and Y axes are network and data. And generally what big data, and Hadoop specifically, has been able to do over the last five years is enable the existing applications as architected, and I like the term you've coined: they were known processes with known technology.
And that's how applications in the last 20 years have been enabled. Big data and Hadoop generally have unlocked the ability to now move all the way out to the edge and incorporate IoT, data at rest, data in motion, on-prem and cloud, for a hybrid architecture. What that's done is say, now we know how to build an application that takes advantage of an event or an occurrence and then can drive the outcome in a variety of ways. We don't have to wait for a static programming model to automate a function. And in fact, if we wait, we're going to fail. I think that's one of the biggest challenges. I mean, IBM, I will tell you guys, I'll tell you, Rob, that one of the craziest days I've ever spent was when I flew from Japan to New York City for the IBM Information Architecture announcement back in 1994, and it was the most painful two days I've ever experienced in my entire life. That's a long time ago; ancient history. We can't use information architecture as a way of slowing things down. What we need to do is introduce technology that, again, allows the clarity of information architecture close to these core applications to move. And that may involve things like machine learning itself being embedded directly into how we envision data being moved, how we envision optimization, how we envision the data plane working. So as you guys think about this data plane, everybody ends up asking themselves: is there a natural place for data to be? What's going to be centralized? What's going to be decentralized? And I'm asking you: is the data increasingly going to be decentralized, while the governance and security policies we put in place are going to be centralized, and is that what's going to inform the operation of the data plane? What do you guys think?
It's our view, very specifically from the Hortonworks perspective, that we want to give the ability for the data to exist and reside wherever the physics dictate, whether that be on-prem or in the cloud. And we want to give the ability to process, take action on an event or an occurrence, or drive an outcome, as early in the cycle as possible. Define what you mean by early in the cycle. So, as we see conditions emerge: a machine part breaking down, a customer taking an action, a supply chain inventory outage. So as close as possible to the event that's generating the data? As it's being generated, or as the processes are leading up to the natural outcome, where we can maybe disintermediate for a better outcome. And that means we have to be able to engage with the data irrespective of where it is in its cycle. And that's where, with DataPlane, we've enabled the ability to abstract out the requirement of where that data is, and to have a common plane, pun intended, for the operations, management and provisioning of the environment, and for governing and securing it, which are increasingly becoming intertwined, because you have to deal with it from point of origin through point of rest. But this is where the benefit is. There's a phrase, the single pane of glass. Guys, as the customer, all joking aside, I want to just get your thoughts on this. Rob too. Customers: what's in it for me? I'm the customer. Right now, I have a couple of challenges; that's what we hear from the market. I have data. I need data consistency, because things are happening in real time. Whatever events are going on, we know more data is coming from the edge and everywhere else, faster and in more volume. So I need consistency of my data, and I don't want to have multiple data silos. And then I've got to integrate the data.
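The "act as early in the cycle as possible" idea above, catching a machine part that's breaking down at the point of origination rather than after a batch load, can be pictured with a toy sketch. Everything here is invented for illustration: the field names, the threshold, and the action label are not from any product discussed in the interview.

```python
def act_on_event(event, threshold=90.0):
    """Decide at ingest time, not after a batch load, whether an
    event needs action (e.g. a part running hot, trending toward failure)."""
    if event["temperature"] > threshold:
        return {"part": event["part"], "action": "schedule_maintenance"}
    return None

# Illustrative event stream arriving from the edge.
stream = [
    {"part": "pump-1", "temperature": 71.0},
    {"part": "pump-2", "temperature": 96.5},  # condition emerging
]
actions = [a for e in stream if (a := act_on_event(e)) is not None]
print(actions)
```

The contrast with the "known processes, known technology" model mentioned earlier is that the decision runs per event as it's generated, so the outcome can be disintermediated before the failure happens instead of discovered afterward.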
So on the application developer side, a DevOps-like ethos is emerging where, hey, as this data is being produced, I need to integrate it into my app in real time. So those are two challenges. Does DataPlane address that concern for customers? That's the question. It's evolving toward the DevOps world; today it enables the ops world. So I can integrate my apps into the data plane? And your other data assets, irrespective of where they reside: on-prem, cloud, or out at the edge, and all points in between. Rob, for the enterprise, is this going to be the single pane of glass for data governance? Is that the vision you guys see? Because that's a benefit, if it could happen, right? I mean, that's essentially one step toward the promised land, if you will, for more data flowing through apps and app developers. So let me reshape that a little bit. There are two main problems that collectively we have to address for enterprises. One is they want to apply machine learning and data science at scale, and they're struggling with that. And two is they want to get to cloud. It's not talked about nearly enough, but most clients are really struggling with that. And then you fast-forward on that one: we are moving to a multi-cloud world. Absolutely; I don't think any enterprise is going to standardize on a single cloud. That's pretty clear. So you need things like DataPlane that acknowledge it's a multi-cloud world. And even as you move to multiple clouds, you want a single focus for your data governance, a single strategy for your data governance. And then what we're doing together, with IBM Data Science Experience and Hortonworks, is saying: whatever data you have in there, you can now do your machine learning right where that data is. You don't need to move it around. You can if you want, but you don't have to, because it's built in and integrated right into the Hadoop ecosystem.
That solves the two main enterprise pain points: help me get to the cloud, and help me apply data science and machine learning. Well, we'll have to follow up; I'd love to do a segment just on that. I think multi-cloud is clearly the direction, but what the hell does that mean? If I run 365 on Azure, that's one app. If I run something else on Amazon, that's multiple clouds, but there's no integration; you're not necessarily moving workloads across. So the question I want to ask here is, it's clear from customers that they want single code bases that run on all clouds seamlessly, so I don't have to skill up separately on Amazon versus Azure versus Google. Not all clouds are created equal in how they do things, right? Storage, throughput, all the data factories of how they process. That's a challenge. How do you guys see that playing out? You have on-premise activities that have been bootstrapped, and now you have multiple clouds with different ways of doing things, from pipelining to ingestion, processing and learning. How do you see that playing out? Will clouds just kind of standardize around a data plane? I don't know. And there's also the complexity that, even within multi-cloud, you're going to have multiple tiers within the clouds: maybe you're running in one data center in Asia versus another one in Latin America, and a couple across the Americas. But as a customer, do I need to know the cloud internals of Amazon, Azure and Google? Today you do. In a standalone world, yes you do. And that's where we have to abstract that complexity out. That's the goal of DataPlane: to abstract out which tier the data is in, on-prem or cloud, irrespective of which cloud platform. But Rob Thomas, I really like the way you put it. There may be some other issues that users have to worry about; certainly there are some that we think about. But the two questions of, where am I going to run the machine learning?
And how am I going to get that to the cloud appropriately? I really like the way you put that. At the end of the day, what users need to focus on is less where the application code is and more where the data is, so that they can move the application code, or move the work, to the data. That's fundamentally the perspective. We think that businesses don't take their business to the cloud; they bring the cloud to their business. And so when you think about this notion of increasingly looking at a set of work that needs to be performed where the data exists, and what action you're going to take on that data, it does suggest that data is going to become more of a centerpiece asset within the business. How do some of the things you guys are doing lead customers to start to acknowledge data as an asset, so they're making the appropriate investments in their data as their business evolves, partly in response to data as an asset? What do you think? So we have to do our job to build to common denominators, and that's what we're doing to make this easy for clients. So today we announced the IBM Integrated Analytics System: the same code base on private cloud as on a hardware system as on public cloud. All of it federates to Hortonworks through Common SQL. That's what clients need, because it solves their problem. At the click of a button, they can get to the cloud. And by the way, the private cloud version is based on Kubernetes, which is in line with what we have on public cloud. We're working with Hortonworks to optimize YARN and Kubernetes working together. These are the meaty issues that, if we don't solve them, clients have to deal with a bag of bolts. And so that's the kind of stuff we're solving together. So think about it: one single code base for managing your data, it federates to Hadoop, machine learning is built into the system, and it's based on Kubernetes. I mean, that's what clients want. Yeah, and containers are just great too, Kubernetes.
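The "move the work to the data" point above can be pictured with a toy federation sketch: each site runs the computation where its records live, and only small per-site summaries cross the wire, which is the spirit of what SQL federation does at scale. The site names and numbers below are invented purely for illustration.

```python
# Toy picture of bringing the work to the data: records never leave
# their site; only compact (count, sum) summaries are federated.
sites = {
    "on_prem": [12.0, 7.5, 3.1],
    "cloud_a": [4.4, 9.9],
    "cloud_b": [1.0, 2.0, 3.0, 4.0],
}

def run_at_site(records):
    """The 'shipped' computation: runs where the data resides and
    returns only a small summary instead of the raw rows."""
    return len(records), sum(records)

partials = [run_at_site(r) for r in sites.values()]
total_count = sum(c for c, _ in partials)
total_sum = sum(s for _, s in partials)
print(total_count, round(total_sum, 2))  # 9 46.9
```

The design choice being illustrated is the one both guests describe: the raw data stays wherever the physics dictate, while the query layer stitches the partial results into one answer.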
A great cloud-native trend, and you guys have been very active in there. Congratulations to both of you. Final question, and you guys get the last word: how does the relationship between Hortonworks and IBM evolve? How do you see this playing out? More of the same, keep integrating on code? Are there any new things on the horizon that you're going to be knocking down in the future? I'll take the first shot. The goal is to continue to make it simple and easy for the customer to get to the cloud, bring those machine learning and data science models to the data, and make it easy to consume the next generation of applications. And to continue to make our customers successful and drive value, but to do it by transparently enabling the technology platforms together. And I think we've acknowledged the things that IBM is extraordinarily good at and the things that Hortonworks is good at, and we're bringing those together with virtually no overlap. And you've been very partner-centric. Your thoughts on this partnership? Look, it's what clients want. Since we announced this, the response has been fantastic. And I think it's for one simple reason. Hortonworks' mission, we all know, is open source and delivering in the community, and they do a fantastic job of that. We also know that sometimes clients need a little bit more. And when you bring those two things together, that's what clients want. That's very different from what other people in the industry do, who say, we're going to create a proprietary wrapper around your Hadoop environment and lock your data in. That's the opposite of what we're doing. We're saying, we're giving you full freedom of open source, but we're enabling you to augment that with machine learning and data science capabilities. This is what clients want. That's why the partnership's working; that's why we've gotten the response we have.
And you guys have been multiple years into the new operating model of being much more aggressive in the big data community, which has now morphed into a much larger landscape. Are you pleased with some of the results you're seeing on the IBM side, with more coding and more involvement in these projects on your end? Yeah, I mean, look, we were certainly early on Spark and created a lot of momentum there. I think it's actually ended up helping both of our interests in the market. We've built a huge community of developers in IBM, which is not something IBM had even a few years ago, and it's great to have a relationship like this where we can continue to augment our skills. We make each other better. And I think what you'll see in the future is more on the governance side. I think that's the piece that's still not quite been figured out by most enterprises yet. The need is understood; the implementation is slow. So you'll see more from us collectively there. Well, congratulations on the community work that you guys have done. I think the community model is evolving into the mainstream as well. Open source community growth, congratulations. Rob Bearden and Rob Thomas here inside theCUBE. More coverage here at Big Data NYC with theCUBE after this short break.