The Cube at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks. We do Hadoop. And headline sponsor WANdisco. We make Hadoop invincible.

And welcome back, everybody. I'm Jeff Kelly with Wikibon. You're watching The Cube. We're here live at Hadoop Summit in San Jose, and I've got a really interesting segment coming up here. We've got kind of a heavyweight of tech, Microsoft, joining us, along with Hortonworks, of course, close partners. We've got Eron Kelly, who's a GM of data platforms at Microsoft, joining us along with longtime Cube alum John Kreisa, VP of Strategic Marketing at Hortonworks. Welcome, guys. Thank you, Jeff. Welcome for the first time, Eron, and welcome back, John. Thanks.

So John, let me start with you. Obviously, big day today. We watched the keynote and saw some good stuff from our friends Merv Adrian and Rob Bearden. What's your feeling this morning? It's great. Pretty excited, and maybe a little tired from all the planning, but we're super excited and we've got plenty of energy to keep going. It's a great start for the show. As we said, over 3,200 registered, so fantastic growth, and support from the ecosystem, from the community, from folks like Microsoft in terms of supporting the event. So we're very happy with how things are going, and I think generally the community and everybody should be happy with how people are coming together with a lot of energy.

Yeah, I mean, the energy here is great. Obviously, the show is growing. What were we at last year? About 2,500. About 2,500, so growth continues. Some really good stuff. Let's dive into the relationship between Microsoft and Hortonworks. We've covered this in the past, and we know that Hortonworks and Microsoft work very closely to bring Hadoop to Windows. Eron, talk to us a little bit: what does that mean for Microsoft customers, and why was it so important? Well, what's really important for us is that we looked at the data landscape that was evolving.
Hadoop was going to be a cornerstone of many enterprise implementations. Being able to give customers the option of running HDP on Windows in their environment was important; Windows is the predominant operating system for IT teams, and so being able to use Windows tools to manage the environment mattered. The other thing we were very focused on is how do we make it really easy for customers to start using Hadoop in the cloud? And so again, this is where our partnership comes into play: with HDInsight, which is our Windows Azure solution for Hadoop, it's very, very easy for customers to spin up clusters and get started on Hadoop. We're seeing patterns very similar to what Merv talked about today in his keynote, where you see a few clusters coming up for dev/test, then moving to pilot projects and then full-on production as customers really start to take advantage of that easy on-ramp to the cloud. But what was critical is that we wanted to ensure it wasn't some sort of proprietary distro of Hadoop. We really wanted it to be consistent with what Hortonworks was doing, 100% Apache. So customers' applications that may have run on Linux, or run on Windows on-prem or in Azure, can be consistent, and that's very, very important. That's what we believe long-term, and it's also why we've invested a lot back into the community.

We'll talk in a little bit about why that is so important. I think some of our viewers may know Microsoft had its own kind of competing big data framework, which you abandoned to put all your effort into Hadoop when it comes to big data. Talk about why that openness is so important, and how that fits with your relationship with Hortonworks in particular. Yeah, it's a great example if you go back to Dryad, which was the Microsoft technology.
When we looked at it at the time, about 24 months ago, it was, hey, do we want to go down the path of a proprietary solution and try to build an ecosystem around it, or do we want to embrace Hadoop? We decided to go in the direction of Hadoop, in partnership with Hortonworks, so that more people can use the technology. Because ultimately we see it as a building block, and giving customers who are familiar with, say, SQL Server and like writing T-SQL the ability to access data in Hadoop through our tool set, we thought, was really important. When it comes to the user experience, and Ranga talked about this in the keynote, we think of the data platform, we think of analytics, but we also think a lot about those users. It was very, very important to us that every Excel user on the planet have the ability to pull information from Hadoop so they can start to interact with it in a richer way. And going in our own direction would have made it a lot more difficult to build that center of gravity. Now, because of the partnership, it's very easy for customers to pull the results of a MapReduce job right into Excel.

Well, you've got a billion-plus Excel users out there, so that obviously could be a gateway for Hadoop, to bring it into the enterprise. John, talk a little bit about that and the importance of the relationship with Microsoft, and how Excel essentially opens up Hortonworks to this wide base of users. Yep, it's been a great partnership. To go back to your original comment as well,
I think at this stage it's one of our longest partnerships at Hortonworks, and certainly one of our deepest, if not the deepest, in terms of the work we've done together in engineering and product management, first to bring Hadoop to the Windows platform, but then ultimately to open it up to all of the different tools that Microsoft has, because there are so many users for whom that's their traditional way of accessing data. And we know that for Hadoop to be successful, you have to make it accessible through the traditional tools and skills people already have. So that's what makes this a great partnership: there are so many skills and tools out there, in SQL Server, in Windows technologies, and in Excel and the more front-end user technologies. It's super important for us to make sure we enable usage of Hadoop in the mode, and with the skills, that people already have. So that's really where there's great synergy between the strategies of what Microsoft's working on and what we're working on.

So speaking of using tools and techniques that people are familiar with, SQL of course being one, talk a little bit about the Stinger Initiative. You completed that within the last couple of months. For those out there who aren't familiar with it, tell us a little bit about the Stinger Initiative. Sure, and then I'll ask Eron to talk about the commitment, the contributions I should say, that Microsoft made to it. Sure. Absolutely. Briefly, Stinger was a 13-month initiative, which closed about three months ago, to make Hive, the de facto SQL access engine inside of Hadoop, run a hundred times faster.
It had a bunch of different components in terms of how we were going to improve Hive: replace the underlying execution engine, make improvements across the board that would make Hive much faster, and also improve the SQL semantics that Hive provides to the end user. A lot of different community members and ecosystem participants took part, and Microsoft played a substantial part in helping us make Hive deliver on that. The Stinger Initiative is now delivered, and Hive has achieved those goals we set in terms of making it faster.

So Microsoft is a major contributor. You want to talk a little bit about that? Yeah. I mean, again, it's been a cornerstone of our strategy: how do we serve customers who are familiar with the existing paradigms? We've talked about Excel, that's sort of the obvious one, but T-SQL as well. There are a lot of DBAs out there who know how to write T-SQL, and allowing them to have access to data in Hadoop is a really important part of the strategy. It's very congruent with what we're trying to do. And we realized that the best way to get scale out of that was to push it right back into the community as contributions. So since we started working with Hortonworks, we've contributed over 30,000 lines of code and 10,000 engineering hours to make this happen, including the core query engine work that led to the performance John was talking about, about 100x. So that's been a very, very important part of the strategy. And ultimately, we believe we can help make Hadoop more mainstream, not only for end users, as most people think about it, but also for all the DBAs out there who know and love SQL.
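For readers who want to see what Stinger means in practice, here is a brief sketch of the kinds of Hive 0.13-era features the initiative delivered: the Tez execution engine replacing MapReduce, vectorized execution, the ORC columnar format, and richer SQL semantics such as windowing. The table and column names below are hypothetical, and exact property defaults vary by Hive version.

```sql
-- Run Hive queries on Tez instead of MapReduce (a core Stinger deliverable)
SET hive.execution.engine=tez;
-- Vectorized execution processes rows in batches for large speedups
SET hive.vectorized.execution.enabled=true;

-- Store data in the ORC columnar format introduced during Stinger
CREATE TABLE page_views_orc (
  user_id BIGINT,
  url     STRING,
  ts      TIMESTAMP
)
STORED AS ORC;

-- Richer SQL semantics were another Stinger goal, e.g. windowing functions
SELECT user_id,
       COUNT(*) OVER (PARTITION BY user_id) AS views_per_user
FROM page_views_orc;
```

These statements assume a Hive 0.13-or-later cluster with Tez installed; they are illustrative rather than a tuning recipe.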
So as Hadoop becomes more mainstream, talk a little bit about how Hadoop fits with the rest of the Microsoft portfolio of data management technologies. You've got SQL Server, you've got Excel, Power BI, you've got other tools out there. Where does it fit in your hierarchy of data management tools from a Microsoft perspective? Yeah, so we think of it as a great complement to some of the technology you've already described. If you start with the user, we think of Excel and our new Power BI service as a way of delivering great value and bringing BI to every user; we like to say, bring BI, or big data, to a billion users. So that's the user stack: being able to interact with data both through the tool they know, Excel, and through app-type experiences. Then there's an analytics layer, of which Hadoop plays a big part, alongside SQL Server, a classic column store for data warehousing, and other pieces. And then there's the platform where the data is stored, and this is where we're focused on a hybrid solution: you can run this infrastructure in a Windows environment on-premises, or you can take advantage of it in Microsoft Azure in the cloud. So having that hybrid core platform to run the infrastructure is the third layer of the stack. We very much see Hadoop as, again, a great complement to SQL Server as part of that analytics layer.

So I'm curious, there's been a lot of talk out there about the relationship between the data warehouse and Hadoop. Do you see Hadoop encroaching on the data warehouse in terms of some of the workloads? I think it's pretty definitively been decided it's not going to replace the data warehouse, but there's certainly some overlap, and in some cases you're going to be fighting for some of the same dollars with a partner like Hortonworks. How do you live with that tension?
Yeah, well, we see it as a great complement. And again, I'm stealing from Merv this morning: he talked about how in 2012 a certain percentage of people thought Hadoop would replace the data warehouse, and that number has actually been cut in half over the last year. We see the same thing, where customers originally looked at it and thought, hey, this might replace the data warehouse, and then they've worked through their scenarios and come to realize it's a great complement to the traditional data warehouse. If we look at the latest release, SQL Server 2014, as an example, with our updatable in-memory column store, the kind of performance and compression you get in that environment is so high that it's a great complement to the large amounts of data where I'm still waiting to see what the value is, but I'm glad I have it. So we see it as a great complement. And yeah, some scenarios may go one way or another. But ultimately, if we're going to bring big data to a billion users, then we've got to make sure we can connect to all kinds of technologies and support customer choice in how they're going to consume it: on-prem, in the cloud, structured, unstructured. It's important to support it all.

So let's get right to some use cases. Who are some joint customers? What kind of traction are you getting in the market, going to market together? Could you give us some examples of those customers and what they're doing with the combined Hortonworks and Microsoft portfolios of tools and technologies? Sure, I think one of the neat ones we talked about today in the keynote, with the video, was Virginia Tech. It's just incredibly exciting when you think about some of the statistics. A few years ago, it was $100 million to sequence a genome; now it's $6,000.
And so it's amazing how Hadoop and this access to low-cost compute and storage has accelerated their ability to work toward cures for cancer; that's really what they're focused on. They were describing how, before they started using Hadoop in Azure, it would take them two weeks to sequence one genome. Now they can do 100 in a day. And think about how that changes the way you not only think about your research, but in fact how you might, in a life-and-death situation, react to a particular patient who's coming in. Hey, if I can sequence your genome here in an hour or so while you're being tested, I can get a whole new set of data on how to better treat you as a patient. So it's amazing how the partnership, bringing Hadoop delivered in the cloud so it's easy to access, has transformed things for Virginia Tech. Very cool story.

Yeah, I think another one is a company called ZirMed, based in Kentucky. They do medical billing for healthcare providers. So it's a great example, and there'll be some case studies and things coming out very soon on this. A great story where they're collecting all that billing data, using Hadoop to analyze it, and then providing it back to users who can use it in a familiar interface. And one of the reasons they're using Hadoop on Windows and providing it back to those users is skills; it's to leverage the skills they already have. It's a great story of being able to collect data that they really weren't able to efficiently collect, utilize, and analyze before. Now they can do it inside of Hadoop because of the scales and volumes that they're looking at and the nature of the billing data they're collecting, in terms of searching through it for billing codes and all the different things you have to search for. So it's fairly unstructured data.
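As a purely hypothetical sketch of the kind of billing-code search John describes (none of these table names, column names, or code formats come from the customer; they are illustrative assumptions), a Hive query scanning semi-structured claim text for five-digit procedure-style codes might look like this:

```sql
-- Hypothetical raw claim records landed in Hadoop as plain text
CREATE TABLE claims_raw (
  claim_id   STRING,
  claim_text STRING
)
STORED AS TEXTFILE;

-- Pull the first 5-digit code out of each claim's free-form text
SELECT claim_id,
       regexp_extract(claim_text, '\\b([0-9]{5})\\b', 1) AS billing_code
FROM claims_raw
WHERE claim_text RLIKE '\\b[0-9]{5}\\b';
```

This is where Hadoop's schema-on-read model helps: the raw text is loaded as-is, and structure like billing codes is extracted at query time.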
And Hadoop is providing the platform to build an analytics system that they can offer back to their customers, so customers get insight into what the billing patterns are and get some additional information. A great use case, and it's all about the skills they have available and how they wanted to leverage them; they were a Windows shop, and they wanted to make sure they stayed on that platform.

So tell us a little bit about the support arrangement between Microsoft and Hortonworks. For those customers you just mentioned, is it single-source support? Do they go to Microsoft, or do they go to Hortonworks? How does that work? So it depends on what they purchase. If they go with HDInsight in Azure, they come to Microsoft as the one throat to choke, if you will, or the single line of support, and then we escalate tier-three issues back to Hortonworks. If it's HDP on Windows, they go to Hortonworks, because they've bought the HDP product.

And what have you done to incent the Microsoft sales force to go out and resell Hortonworks or HDInsight? Was there a lot of work that had to go into that? Because, as we talked about a little bit, the price dynamics between something like Hadoop and a more traditional data warehouse are significant. How does Microsoft actually incent the sales teams? Yeah, sure. So again, in most cases we see it as a complement or a complementary installation. If anything, it's growing deal size for some of our sellers, where they're able to, hey, sell in SQL Server as a data warehouse or as an OLTP engine, and then complement that with an HDInsight subscription in Azure; or when they land HDP on Windows, of course it drives Windows Server licenses. So that's a bonus for our sellers. For the most part, our sellers look at this as an augment, something to add on to the deal, as opposed to a replacement.
Because again, in many cases customers have a data warehouse today; they're continuing to use it, and they're bringing relational data into that data warehouse. But what they want to do is augment that with unstructured data that can be born and analyzed in Hadoop, and that connection is actually really important. In fact, about a month ago we released a new appliance called the Analytics Platform System, which brings together a Hadoop region and an in-memory column store in a single piece of iron. We call it big data in a box. And it's a great example where field sellers love it, because they can land the world-record-holding in-memory column store in one region, and in an adjacent region there's basically HDP on Windows, and it runs together. I can query across both using PolyBase. It's a very, very powerful solution that brings together the best of both worlds, so we're excited about it. I think it's significant too, because of the combination of the data. That's what you really see: organizations are getting additional value by combining some of the data they were storing in their traditional system with the new types of data they're collecting in Hadoop. That ability to query across those things and join that information to get new insights is actually a pretty powerful capability.

So dig into that a little bit more. We've got time for just one more question, but I'm curious, how deep is the technical integration, and how closely do the engineering teams of Microsoft and Hortonworks work together? I think our engineering teams probably have, is it a weekly call? Yeah, so it's very deep. They work very, very closely together. We have a quarterly all-day sync-up in Redmond where the teams get together, but I think everyone's on a first-name basis, probably texting back and forth. It's a very important relationship.
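As a rough illustration of the PolyBase scenario described above, a T-SQL sketch along the following lines lets a single query join warehouse rows with files sitting in the appliance's Hadoop region. The table names, HDFS path, and DDL options here are hypothetical, and the exact external-table syntax differs across PolyBase releases, so treat this as a shape rather than exact DDL.

```sql
-- Expose a directory of delimited files in the Hadoop region as an external table
-- (illustrative syntax; option names vary by PolyBase/APS version)
CREATE EXTERNAL TABLE ClickStream (
    UserId  INT,
    Url     VARCHAR(200),
    EventTs DATETIME2
)
WITH (
    LOCATION = '/data/clickstream/'  -- hypothetical HDFS directory
);

-- One T-SQL statement joins relational and Hadoop-resident data
SELECT c.CustomerName, COUNT(*) AS Clicks
FROM dbo.Customers AS c
JOIN ClickStream AS cs
  ON cs.UserId = c.CustomerId
GROUP BY c.CustomerName;
```

The point of the example is the second statement: a DBA writes ordinary T-SQL, and the engine pushes work out to the Hadoop region as needed.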
And again, it's because the strategies are so complementary: Hortonworks wants to bring 100% Apache Hadoop to every user, and we want every user to be able to take advantage of big data analytics. So it's a great partnership. I think that's a great place to wrap it up. John, Eron, thanks so much for joining us on theCUBE today. Appreciate it. Stay tuned, you're watching theCUBE at Hadoop Summit. We'll be here all day today and for the next couple of days. Stay tuned, we'll be right back. Great. Thanks, Jeff.