Live from the San Jose Convention Center, extracting the signal from the noise, it's theCUBE, covering Hadoop Summit 2015, brought to you by headline sponsor Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, and by Attunity and Cisco. Now your hosts, John Furrier and George Gilbert. Okay, welcome back everyone. We are live in Silicon Valley in San Jose. This is theCUBE, our flagship program, where we go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE. My co-host for this week is George Gilbert, our Wikibon big data analyst, and our next guest is Tim Hall, VP of Product Management at Hortonworks. Welcome to theCUBE. Thanks, John. So you guys are doing a great job. Rob Bearden up there looks like John Chambers. He's confident, he's smiling. I think last year he had the flu or something. I remember interviewing him on theCUBE, he was sneezing away, but now he's got a spring in his step. Business is good, the Hadoop ecosystem is thriving, although we were speculating on the open around some of the shifting natures of the technologies and analytics, so I want to get your take. Hortonworks is now a public company. You're out in the open everywhere: you're developing code in Apache, you're out with customers, and you're a public company. What's changed? What's the update on Hortonworks? Product-wise, are there any new things that we should know about? Sure, I think one of the big things that we're doing product-wise is a real focus on ease of use and simplicity.
A lot of the early adopters that originally started this conference and attended it are familiar with command line interfaces and being down and dirty with a lot of the technical details, but I think we're approaching a whole new market segment in terms of Hadoop adoption and how it's being used for analytics, and that crowd wants a simpler experience. So we've been promising to put a new face on Hadoop, and we've been working on that new face within the community. We started by offering a views framework as part of Apache Ambari that we built last December, and what we're showing off at the conference this week is the initial user experience and design work that we've been up to over the last six months to try to accelerate the community forward in terms of that simplicity and ease of use. So the industry's reaction to ODP, for instance, has been really amazing. George and I were talking about the impact of pure open source, we'll call it the Red Hat model, Hortonworks, if you will, and now industry consolidation around these kinds of de facto standards. ODP, you could call it a de facto standard. Some people are playing, some are not. How has that changed your interaction since the announcement? What specifically has come out of that in the industry? I think a few things. First and foremost, we invested in these extensibility points at various parts within the Apache ecosystem projects, and we did that not for us, but for the partners. And I think what you're seeing in terms of ODP and developing around a common core is that those partners can take advantage of those extensibility points to mount their own technology and unique capabilities on top of Hadoop. So one of the things that- What are some of those examples? So for example, there are three extensibility points that we built within Apache Ambari, which is one of the key components of ODP. The first one is called Ambari Stacks.
This is the idea of moving the definition of what components Ambari can monitor, manage, and deploy out of code and into configuration. So essentially there's an XML description of what components Ambari will deploy, manage, and monitor. Partners can now take advantage of that on top of the common core, with Hadoop at the bottom and Ambari managing along the side, and mount their own unique capabilities on top and have that common experience for deployment, management, and configuration. We're actually taking a page out of the Ambari book in the upcoming releases of Apache Ranger and Apache Knox and providing stack definitions for those security products, such that you can now do declarative authorization and declarative API management and gateways through Ranger and Knox respectively. I've got to ask you about the data operating system, which for the record was coined on theCUBE by myself and Abhi Mehta in 2010. We have the videotape, so if there's any debate, like when Arun was arguing with Doug Cutting about that, let's go to the replay. Let's go to the replay. Definitely us. The debate can stop, we have the tape. It's really an important concept, so I want to ask you a couple of questions on the state of operations, which by the way we totally believe in, as you know, we're religious about it. But it's complicated: an operating system has to operate, it has to have subsystems and other elements. So the cloud is a huge enabler, so I've got to ask you, you look at ODP, which I brought up earlier, and the people involved, Pivotal, IBM, and a whole bunch of industry leaders, right? So are we stalled right now in analytics, waiting for the cloud to catch up? Because there are a lot of great apps out there that, when they try to plug into the infrastructure, hit a lot of weird details, like the systems management layer. So for the app developer, a lot more software has to be written.
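The XML description mentioned here is Ambari's stack definition mechanism. As a hedged illustration only, a minimal service descriptor might look like the sketch below; the service and component names are hypothetical, not taken from the interview:

```xml
<!-- Hypothetical metainfo.xml for a partner service mounted on the common core.
     Ambari reads this descriptor to learn how to deploy, manage, and monitor
     the component, without changes to Ambari's own code. -->
<metainfo>
  <schemaVersion>2.0</schemaVersion>
  <services>
    <service>
      <name>PARTNER_ANALYTICS</name>
      <displayName>Partner Analytics Engine</displayName>
      <version>1.0.0</version>
      <components>
        <component>
          <name>PARTNER_ANALYTICS_MASTER</name>
          <category>MASTER</category>
          <cardinality>1</cardinality>
          <commandScript>
            <!-- lifecycle hooks: install, start, stop, status -->
            <script>scripts/master.py</script>
            <scriptType>PYTHON</scriptType>
          </commandScript>
        </component>
      </components>
    </service>
  </services>
</metainfo>
```

This is the sense in which the definition moves "out of code and into configuration": the partner ships a descriptor plus lifecycle scripts, and Ambari handles deployment, management, and monitoring from there.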
Arun Murthy was talking about YARN up on stage. So orchestration, DevOps, containers. So cloud is kind of catching up now and becoming that engine. So what's going on? Is cloud catching up? Has analytics slowed down? What's the dynamic there, in your opinion? And not only speaking for Hortonworks, but your personal opinion as well. No, I think it's all continuing to accelerate, to be honest. Back in April, we acquired a company called SequenceIQ. Now, why did we do that? A number of reasons. First and foremost, an engineering beachhead in Budapest with a very talented engineering team. But second, they had invested in a couple of very critical technologies, particularly for cloud-based deployment of Hadoop. And so one of the things we wanted to do was provide that to our customers. And really what we're looking at is the automated stand-up of Hadoop clusters in cloud-based environments. So to your point, we want to provide that experience to our customers and to the development community, to allow them to rapidly stand up clusters and remove some of those goofy nuances that you mentioned so they can get started and build their next-generation applications. What was the name of that company? SequenceIQ, acquired in April. And now we're working through three pieces of their technology. One that we were already working on together with them in the community is for DevOps, actually, called Ambari Shell. They essentially combined the Ambari REST APIs with Spring Shell to provide a nice, easy-to-use DevOps console so that you can manage and monitor your cluster, add nodes, update configuration parameters, et cetera. They then extended that further and said, hang on a second, not only do we need a DevOps console, but we want an automated way to stand up these clusters in the cloud.
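Ambari Shell, as he describes it, wraps the Ambari REST APIs in an interactive console. As a rough sketch of the kind of call it issues underneath, here is a hedged Python example; the endpoint path and headers follow Ambari's REST conventions, but the host name, cluster name, and credentials are placeholders, not values from the interview:

```python
# Sketch: build (but don't send) an Ambari REST call that asks the server
# to move a service to a new state -- the sort of operation Ambari Shell
# exposes as a one-line console command.
import base64
import json
import urllib.request

def build_ambari_request(host, cluster, service, state,
                         user="admin", password="admin"):
    """Build a PUT request against Ambari's service endpoint."""
    url = f"http://{host}:8080/api/v1/clusters/{cluster}/services/{service}"
    body = json.dumps({"ServiceInfo": {"state": state}}).encode()
    req = urllib.request.Request(url, data=body, method="PUT")
    # Ambari uses HTTP basic auth plus a CSRF-protection header.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("X-Requested-By", "ambari")
    return req

# Example: request that HDFS be started on a hypothetical cluster.
req = build_ambari_request("ambari-host", "mycluster", "HDFS", "STARTED")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would require a live Ambari server; the sketch stops at constructing the call.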
So they used the second extensibility point of Ambari that we invested in last year, called Blueprints, which essentially allows you to define the configuration and deployment of your cluster and stand it up in a repeatable fashion using that same Blueprint in a variety of environments. So the SequenceIQ team built something called Cloudbreak, which effectively uses the Blueprints API to launch clusters in the cloud. You essentially provide your credentials, you pick the cloud provider, and based on those credentials, you pick the Blueprint and you launch HDP in the cloud. So in the data operating system model, what are the big building blocks, and where are the white spaces that are being filled in, either by the ecosystem, yourselves, or others? So the building blocks, fundamentally, are around a variety of things. We already talked about ease of use and simplification. As we hit this next wave of customers, it's all about governance, security, and operations. Operations has been there for a while, with Apache Ambari again being a key element of that strategy. On security, we did an acquisition last year of a company called XA Secure to bring a common authorization model to all the components within the Hadoop ecosystem and HDP, and we're continuing to extend that investment over time. Now we're up to, I think, seven different components that Apache Ranger, which is where the technology landed within the ASF, can protect. So Ranger can now support authorization around HDFS, Hive, HBase, Storm, and Knox, and we're just completing the work to extend that to Kafka as well as Solr. And then last but not least, on the governance front, around the data operating system, the big concern is that as customers land more and more data sets, they want to make sure they know what they landed in the lake. And so the idea is, how can we provide a classification scheme and discovery system on top of that so they can understand what's going on?
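The Blueprints mechanism described above is a JSON document defining cluster topology and configuration. A minimal sketch of such a blueprint, with an illustrative name and topology rather than anything from the interview, might look like:

```json
{
  "Blueprints": {
    "blueprint_name": "single-node-hdp",
    "stack_name": "HDP",
    "stack_version": "2.2"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [
        { "name": "NAMENODE" },
        { "name": "RESOURCEMANAGER" },
        { "name": "HIVE_SERVER" }
      ]
    }
  ]
}
```

Registering a blueprint like this with Ambari and then pairing it with cloud credentials is, as described, the workflow Cloudbreak automates: same blueprint, repeatable stand-up across providers.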
Okay, so let's wrap this up in a little burrito of insight if we can. So the cloud is catching up, or analytics is slowing down. It seems to me it has to work together. Obviously cloud is an engine. You see that, and your acquisitions speak volumes. So they're coming together pretty well. Do you see it accelerating? I think so. And I think if you look at what Microsoft has done around HDInsight, it's a cloud-based service with tremendous tools, backed by Hortonworks technology at the core, providing that analytics environment for a wide variety of customers. It's a great example of exactly what you're talking about. With HDInsight, they built their tooling, I assume, separately, because they had to do it before your governance, security, and operations technology was there. And I assume it's tailored towards fitting into the Azure tooling. That's correct. So yours will be more cloud-independent, and then you can plug in other ecosystem components. Yeah, I think there are two different dimensions to it. The first dimension is they're offering HDInsight as a whole service offering to their end customers. They also offer Azure infrastructure as a service, which is one of the targets that Cloudbreak can deploy HDP into. Talk about partnering. Talk about partnering, right? Because again, you run product management, which is a tough job when you're in the open: public stock, open source, and a business model that requires a lot of partnering. That's right. So you're juggling a lot of balls in the air. What's going on? What's the thesis? What's the guiding principle for Hortonworks? And what's going on in the ecosystem around partnering? Because partnerships are key right now. You guys have announced a slew of them. Give us the update. So partners are absolutely critical to the success of not only Hortonworks, but of the whole ecosystem we're trying to fuel.
So the idea is, what can we do to work with partners to make sure that their technology can marry up and link to the big data sources that we have access to? Speaking on the governance front in particular, this is one of the challenges. Governance is not unique to Hadoop. It's a challenge across anything where you're trying to manage large sets of data. We had this back in the days of traditional databases and even flat files, when those were the storage mechanisms. And so one of the things we're trying to do is, A, solve the governance problem for Hadoop in particular, and make sure that we're open to integrating with our partners who need to solve these challenges for the data center as a whole, for enterprises as a whole. And so one of the mindsets that we have around doing that is making sure we have open APIs and we're working with that partner community to ensure that we're not building a silo. So as a pure open source vendor in Hadoop, which you guys are at Hortonworks, give some examples of where that's worked. For the folks out there, George and I were speculating there could be a Red Hat of this world. You guys have gone public. You guys are the first ones to go public in this world. What does success look like? What can you point to specifically to show that you're enabling people to create wealth and create value for customers? So, well, one of the things to look at is the rate of customer adoption. We're really fueling, or feeding, the hunger, if you will, for Hadoop. Over the last two quarters, as we went public, we announced our customer acquisition numbers. How are we doing sales-wise? We closed 99 new customer logos in Q4. And normally Q4 is a typical high point for software companies. You've got your pipeline, you close those deals. Q1 tends to be a little bit of a lull. You've drained it. You've drained it and you've got to build that pipeline back up. We closed 105 new customer logos in Q1. So, 200-plus customers in two quarters is a pretty big leap.
And so we're starting to see customer success in terms of using the technology and leveraging it in a wide variety of ways. I'm curious. Talk about the startups out there. You guys are agnostic and you're enabling them through partnering, but a lot of them are scared to death that Amazon, Oracle, IBM, or HP will integrate and build their feature. Because a lot of the startups grow by saying, hey, I'm doing something with Spark, I'm Databricks or whatever, and all of a sudden the market can literally change overnight. Yes. I use Databricks as a random example because they're on the tip of my tongue. But there are a lot of other startups here that really uniquely specialize in one thing and then try to expand out. But now the winds are shifting pretty fast, with cloud coming on the scene really fast. What are you guys doing to kind of bear-hug those startups and give them more juice, if you will? Yeah. So, a couple of things. One of the examples that we were talking about before the show started was Apache Zeppelin. Consistent with what we're trying to do in terms of putting a new face on Hadoop and this ease-of-use push, a company out of Korea, NFLabs, built an open source project called Zeppelin, Apache Zeppelin, contributed it to the ASF, and they're driving ease of use for Spark. So linking up to your Databricks suggestion, what they're trying to do is bring this concept of how can I explore and access data from a data scientist's desktop and leverage Spark effectively. Again, moving out of the command line and up into the browser. And it's simple to use and actually a tremendous user experience: graphs, graphical representation of data, being able to change parameters on the fly. It's very, very compelling. So one of the things we're trying to do is encourage that kind of development and work with those companies that are contributing in ways that are consistent with where we think we need the entire Hadoop ecosystem to go.
So that sounds like you're now putting a GUI over what was a command line interface, sounds like for developers or data scientists. Is there something similar that you could do for administrators, so that this tool chain looks more comprehensively integrated? Yeah, absolutely. So that's one of the things that we're trying to do with Apache Ambari for the operators. That's the original landing zone for the operations team. And now we're extending that out and saying, how can we serve specific personas through the browser so that you can manage through one central console? And so we're bringing those things together using the Ambari Views framework, which is the third extensibility point of Ambari, to drive that going forward. And even if you've built a user experience outside of Ambari, you can actually wrap it in the Views framework and provide a consistent deployment and lifecycle management experience. And we've done that such that we've got community users here that are using things in a standalone way. For example, Tez as an execution engine has an extremely rich debugging experience that we put together over the last year, but not everybody deploys it using Ambari. So there's a standalone mode that you can use for debugging, and then there's also an integrated mode that allows you to deploy it alongside a new SQL builder for Hive. Tim, talk quickly about Zeppelin. What is that about? We have one minute, and I want to get to a couple of key technologies that are hot out there. Yeah, so Zeppelin is the user experience for Spark that's been put together by NFLabs. What are some of the hot things out there besides Zeppelin that you've seen, that you're excited about, that are popping up? Again, you know, you look at what's coming out of the Berkeley AMPLab, which spawned Databricks, and a bunch of other innovations. Spark is kicking ass, taking names. But that's, again, pushing the envelope. What are the new things going to come out? What's next?
So, Flink is another popular toolkit that we see emerging for the in-memory space. So streaming, that's Flink? Yeah, there's a whole variety of use cases that it supports, but it's targeted at in-memory as well. I think one of the things is looking at the innovation that's occurring within the projects themselves, so looking at what's happening in Hadoop core in terms of being able to take advantage of additional resources. So as we look at things like non-volatile memory becoming cheaper and easier to use, we're looking at innovations from HP and EMC around flash-backed RAM. I think that's going to change the dynamics a little bit in terms of how the software is going to ride on top of that and take advantage of those things. So while we can look at individual new components that are emerging in the ecosystem, it's also important to look at the original dynamics around hardware, data center, power and cooling, these sorts of things. Well, you tied two things together, which is potentially very important: Flink, which is a more memory-centric model, and then the hardware, where you've got the speed of memory and the capacity of storage. Tell us how that changes the applications. Right, well, that's one of the goals: if you can innovate at the core and make sure that you can take advantage of the spectrum, or the smear, if you will, of resources, from slow spinning disk to high-performance spinning disk to memory, and make that totally transparent to the application, that's the goal. Awesome. Tim, we've got to break in about 10 seconds. Describe the value of this event and what's going on in product at Hortonworks. Well, this is huge for us, for our partners, and for the community. We appreciate the opportunity to interact with the other members. You know, a lot of times you're getting the one-to-one correspondence.
Here, we get everybody together and we get to hear about what's going on, their projects, their successes, and I get to share that with everyone. And the product focus for you guys on the roadmap, what's the big guiding principle? Again, ease of use, simplification, enterprise readiness. That's what we're focused on for all of 2015. Tim Hall, the Vice President of Product Management at Hortonworks, here inside theCUBE, live in Silicon Valley, on the ground for three days of wall-to-wall coverage at Hadoop Summit 2015. We'll be right back after this short break.