theCUBE at Big Data SV 2014 is brought to you by headline sponsors WANdisco, "We make Hadoop invincible," and Actian, "Accelerating Big Data 2.0."
Okay, welcome back everyone. This is Big Data SV. This is SiliconANGLE's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE, and we are live in Silicon Valley for the Silicon Valley Big Data event that we're putting on. We're also covering the Strata conference, which is going on right across the street, covering all the news. This is now the fourth year, I think, we're in now. John, it just seems like a decade. My special guest I'm proud to have on is John Kreisa, who's the CMO or VP of Strategic Marketing now at Hortonworks, the Big Data player. They're part of the trio, the Big Three making it happen on the distro side, and he's been with theCUBE ever since we started, going back to the Cloudera days. Happy to work with you guys. Great to have you on. And our guest is Eron Kelly, who's the general manager of the SQL Server Product Marketing Group from Microsoft, which has been in the news. Big spring in the step there. Great leadership change there. Gates is coming back, got the founders still around, and the company's pumped up. We've covered that extensively at SiliconANGLE. Welcome to theCUBE.
Great, thanks a lot, John. Thanks, John.
So Microsoft's got a spring in their step. Obviously, that's been the buzz in the tech business. Huge change. A great breath of fresh air and relief, a CEO who's got his act together and knows what's going on. You guys have been doing great work. So we were going to talk about Big Data. So talk about your role with Hortonworks and Big Data. What's new? Tell us what's happening, and some of the things, and we'll get into some of the announcements coming down the pike.
Yeah, sure.
So really, the partnership with Hortonworks goes back about 18 months, and it's been a great partnership where Hortonworks has really helped us focus on Hadoop as a great distribution and a great platform for many of our Big Data solutions. And together, we've contributed a tremendous amount back into the community. And so it's one of the first examples where Microsoft is putting IP back into the open source community. Over the last 18 months or so, we've contributed 25,000 lines of code and over 6,000 engineering hours back into Hadoop. And it's manifested itself in HDP on Windows. It's manifested itself in HDInsight, which is our Azure solution with Hadoop. And the reason why the partnership works so well from our perspective is Hortonworks really focuses on Hadoop and making Hadoop great. What we're focused on is how do we help customers get data out of Hadoop and deliver it to their end users. And so Monday was a really big day for us, frankly, when we announced the general availability of Power BI, which is a great BI service that allows end users to easily grab data off of Hadoop, manipulate it, and create great visualizations, which is part of our vision of how do we bring, and help the community bring, big data to a billion users.
Well, one of the things that you mentioned is open source. I give Microsoft a lot of props: obviously big developer focus, big innovation focus. You guys own the enterprise, your earnings were fantastic. But at the Open Compute Summit just two weeks ago, we were there live with theCUBE. Microsoft announced a huge donation of IP and server specifications in the cloud. So this is the trend. This is not a one-off thing. We're seeing Microsoft getting that DNA. You feel the same way? Is that true? Is that a true statement?
No, it's definitely part of the strategy. In fact, tomorrow, we're kind of doing this early, but tomorrow we're gonna announce some new contributions to the open source community.
For one, we're gonna announce the availability of Hadoop 2.2 in Windows Azure as part of HDInsight. We're gonna have a new preview of that service. So it's a great way of getting a platform-as-a-service version of Hadoop. Hadoop 2.2 supports YARN and some of the new initiatives there.
Just to get technical. Just be specific on the announcements. So you're announcing tomorrow 2.2 for Azure?
For Azure, yes.
Azure, okay.
And so that was in partnership with Hortonworks, bringing that to market. What we're also gonna announce is Stinger Phase 2, which is a great example of where Microsoft's taking its IP and pushing it back into the open source community. And so what Stinger Phase 2 does is, basically, we went into our data warehousing technology that is part of SQL Server today and pulled out some of the query optimization engine work and some of the compression technology and are now making that available. And John and the Hortonworks team have been testing Stinger Phase 2 in their labs and they're seeing 40X performance improvements on queries with Hive. It's pretty staggering.
What are some of the implications of that announcement? So platform as a service is really a hotly contested area in the cloud. That's going to be kind of a middle layer battleground, if you will. And some people are gonna take more of a proprietary approach, say the bigger legacy vendors, but with open source, what is the angle there? Because there's pressure from both sides. I want to have some differentiation, I want some openness for innovation. Can you talk more about what that's gonna enable?
Yeah, well, so the way we look at it, and again this is very consistent with how we've been working with Hortonworks, is we're very focused on Hadoop as 100% Apache. And all of the innovation that we build, we want to put back into the community so that it's consistent. What Azure represents is the ability to consume that as a platform service. So you don't have to worry about patching and maintenance.
You just turn it on and it works. And then you can focus on the higher level elements of building your applications or doing your analysis. And so there's a whole notion of, a few clicks and I have a Hadoop cluster running. That's the value of Windows Azure and that's the value of HDInsight. And because, again, it's based foundationally on the same technology that's core to Apache Hadoop, you know what it is, right? And you know you can use it. We haven't forked the tree, we haven't done something different. It is the core technology.
It's a good, you know, I love the open source quotes. John, I want to ask you a question because, you know, we've been talking, you've been on theCUBE many times. Obviously you guys run Hadoop Summit, which is a very technical developers conference. You've been going on and on for days, well or versus that. This is a fruition of some hard work. Can you give some insight, you know, pun intended, into where this has come from and kind of how it's flowered up?
Sure. Yeah. So, I mean, our strategy remains the same. It's one of the places where we've been very well aligned with Microsoft in terms of always contributing everything back and making a 100% open source distribution. So, we've been focused on that and that's been one of the great things when we endeavored to bring Hadoop onto Windows. We've been aligned around pushing everything back into the open source community. And that's continued to build in terms of momentum. It's been a great way to align with Microsoft in terms of their overall strategy, bringing Hadoop first just to Windows, but then to the cloud and Azure. And I think that's something that customers are really starting to kind of recognize and rally around and build momentum in terms of, you know, this is the right strategy. This is the one that we believe in.
You know, we know that it needs to stay open source, really, for Hadoop to be successful and we continue to work with that. And Microsoft's been a great partner supporting us there.
So, Stinger, YARN, I see integration. That's kind of key, right?
That's right. That's right. And YARN is the data operating system for Hadoop. It's been the thing that's really enabling the next generation of applications, multiple workloads on a single Hadoop cluster. It's really going to be the next phase of Hadoop and really expand and explode the way that Hadoop can be used for different workloads within the enterprise.
YARN's a big topic this show. You're hearing a lot about YARN. You know, what's your take on that? What's the state of YARN right now?
YARN is a maturing technology. It's out in Hadoop 2.0. Now it's in Hadoop 2.2 that Microsoft's bringing in, and of course in Hortonworks Data Platform, really driving the next generation. It allows different technologies to really integrate natively and use the resources within the cluster more effectively. And Eron talked about the fact that we're seeing 40, 50% higher performance on things like just queries, which is related to the Stinger project, but also overall platform and cluster utilization. We're seeing big enterprises being able to reduce, in some cases, the number of nodes that they have to use to run the same workload. So it's a very efficient framework within Hadoop.
So 104 days ago you said the following quote, this is our big data, our Power BI. Yeah. We go back to 104 days ago. You were quoted by Gene Deems out there, Gene Deems, Microsoft plans to bring big data to a billion users. That's right. Okay, so I mean, how do you feel about that? I mean, we're now at the next Strata, 104, five days later, this is one step in the direction. Has there been a little bit of shift in the vector? Are you guys vectoring right on plan? Has there been any shifting of winds at all for you guys? Have you adjusted, made any changes or tweaks?
No, I think the strategy and the vision statement still hold and in fact we're just really building momentum towards that. With the release of Power BI on Monday, it does make it really, really easy for any user to get access to data on Hadoop and start to do analysis. We've got great customer examples now of customers that are doing just that. The city of Barcelona is one of my favorites, where they're collecting Twitter sentiment and they're measuring and connecting and correlating their Twitter sentiment for citizens based on festivals with the availability of different resources, like, are the buses on time? They'd have a restaurant they ate at, then look at the sentiment and then look at the festival. All that stuff, right? How late they stayed out in Barcelona. Exactly, you know, well, no, no. At 2 a.m. There's actually a great example of where a concert ended, it was 2 a.m., people went to the bus stops, the buses were gone, they started tweeting about it. They saw that the buses were not available and they rerouted buses back to different stations so they could pick up citizens who were complaining over Twitter. And so it was a great example of using Hadoop and HDInsight in the cloud with Power BI to connect that data back to decision makers who were deciding, where do I route buses, and let me add some extra routes.
Well, you guys bring up a good point, and this brings up a lot of the key conversations that are on the strategic roadmap. John, you probably run into this all the time with the customers you talk to. And I'm going to get your take on it, and that is really looking at data differently, right? Having a platform that enables discovery, maybe some structured discovery in terms of whether it's some sort of other database or governed discovery, as we were just talking about earlier. But I want to get your perspective on the concept of data fusion.
Data fusion is happening as a big part of mashing up data together, doing it either on a directed basis with advanced analytics or dealing with a massive ingestion. So given that this is the environment, it's never going to stop. How do you guys look at data fusion? What does that mean to you guys? How does that relate to, say, businesses and their opportunity?
Yeah, I mean, I'll answer and then certainly Eron will weigh in. I think one of the kind of killer applications of Hadoop is the ability for it to ingest just very broad sets of data across a wide range of sources, new sources of data that organizations weren't able to take advantage of before. And so this enables a whole new generation of analytic applications that they couldn't do before, where they might do some of that analysis in Power BI or other tools like that. But it's simply an area where they didn't have that capability, it wasn't cost effective for them to really bring that data together and it was considered, in many cases, exhaust data. Like, I'm just going to throw this away, let it drop on the floor, and now, hey, there's gold in those hills. They're looking and able to really exploit that. I think the city of Barcelona is a great example of that kind of application of this data and you're going to see many, many more.
Yeah, it's really an interesting time. When we sit down and talk to customers about big data and BI in general, the real value comes with, let me take the data that I'm already familiar with and I own in my relational systems, let me combine it with external data, let me combine it with data that normally was left on the floor, and that's where the new insights come from that can drive real change or new business opportunities. We even have a fun example that we've talked about in the past, but it's just fun to bring up, even within our own team with Halo, the ability to now use Hadoop to analyze gameplay.
There are these giant Halo contests now that are happening and we're able to identify cheaters when it happens by interrogating the logs and doing analysis on Hadoop. So there are whole new things that you weren't able to do before that you can do now, and I think that our vision at Microsoft is how do we bring that to everybody, and the way we bring that to everybody is through services like Power BI and Excel, because there's a billion users with Excel today.
Well, you guys are the big gorilla in the SQL market. I got to ask you this question. Obviously we're seeing startups, kind of like Dave and I were going back to gen one of Hadoop, right? Now we're down at the point where you're going to start to see people dropping out, tapping out of the race. Some startups are going to get that B round, C round, some maturity in the ecosystem, certainly on the innovation side, but SQL on Hadoop has obviously proven itself to be legitimate, right? You're seeing the use cases out there. So what are the challenges for the startups trying to do SQL on Hadoop, and what are some of the trends that they could go into if they're not going to make it?
Yeah, so I guess the way we think about it, and this is maybe taking a step back for a second, we generally believe that where data is born, that's where you're going to store it most efficiently at lowest cost. And so some data will be born in a relational environment, part of a line of business system, some data is going to be born in Hadoop, some will be born in Blob storage, some will be born in JSON, but ultimately where we see the market going is having the ability to leave data where it was born and then interrogate it and query over all of it. And so from our perspective, Hadoop is a critical component of all that, but it's not the only piece of technology that you're going to have in your environment.
And so if I look at our data warehousing appliances, as an example, we have technology called PolyBase that allows you to use a T-SQL query over both the relational store and Hadoop. And so it's an example of using technology to go after data where it's born, it's lower cost for the customer, and it creates these new insights that John was talking about. And that's why, again, I think the partnership works so well with Hortonworks: we're laser focused on one element of making Hadoop as good as it can be, but then we're not confused that there's going to be lots of other data technologies around it, and customers that can embrace and tie those together are the ones that are going to create new insights most efficiently, and that's where we're focused.
Yeah, you know, I wrote a blog post in 2008 before I started SiliconANGLE, and it was on my other blog, and it was called Data is the New Development Kit. And it was kind of a radical post at the time, but the thesis was data is what people will develop on. And if you believed in open source, then data is freely available. And we're kind of seeing that play out, and you guys, essentially, this announcement talks about having data available and letting any app work on it. So, you know, that's kind of nice, but I'm trying to connect the dots now going forward. We're now fundamentally seeing data mashups and structural change in the infrastructure, and the open source community now enabling rapid, rapid iteration on data. What's the next trend that you guys see to take it to the next level? Is there a mega trend you're watching that's emerging, the next wave that you possibly see kind of forming, that you guys see yourselves riding the surfboards on?
Well, I think there's still some legs in SQL access to data that's stored in Hadoop. And I think that the more that we can enable the existing skill sets that are out there, which is partially what that's about, right?
You've got educational.
Yeah, and you've got millions and millions of users who have been trained on SQL skills and know how to access data in that paradigm. So, the more that we can make that the same paradigm to access these new kinds of data, the more benefit they'll get out of it. So, there's still plenty of legs on that. There's still lots of work to go on. There's more work we'll do on the Stinger Initiative with Microsoft and others. And so, there are definitely phases to go there. But going forward, I think, and I'll go back to the kind of YARN initiative, I'd say that multiple workloads, so whether it's plugging in other kinds of predictive engines or graph-based engines or what have you into that framework and utilizing that data as a big pool of resource, whether it's through PolyBase or other technologies which will enable that, that, I think, can be one of the things that's really going to continue to carry Hadoop forward as we go through the next wave.
So, workload agility you see is a big part of the mix.
Absolutely, because they're getting the data and they're starting to collect it in a data lake or whatever term you want to use to call it. I call it ocean. Data ocean, data lake, you know. Tomato, tomahto. That's a big data race. Right. But it'll be in some form or fashion, but we see them wanting to get multiple access points into that. I'm streaming data in using some technology. At the same time, I want to query that, do some predictive analysis, feed it into a recommendation engine. All of that needs to be in kind of one integrated system and that's really what YARN kind of enables. So I think really we're right at the precipice of a whole new future of utilizing Hadoop and the data in Hadoop.
What I would add to that, I agree with that, and what I would add to that is, well, we're seeing a lot of examples of these hybrid environments where some of the data, again, is born on-prem and it's more efficient to stay there.
Some of the data is going to be born in the cloud and it's more efficient to keep it there, but I want to do analysis over all of it with one query. And so that's an area where I think we see a lot of innovation, a lot of investment, and a lot of interest from certainly our joint customers who say, yeah, for some scenarios, I want to run HDP on-prem on Windows from Hortonworks. In other cases, I want to use HDInsight in Windows Azure, but the data is part of an integrated solution.
It's chaotic yet available to be worked on.
Well, the data has gravity, right? So I mean, as you said, people don't want to necessarily move it back and forth between those infrastructures, so where it's born tends to be where it lives, but that doesn't mean, just because data is born in the cloud, I don't want to merge that with some data that's on-prem. So the hybrid environment, I think, making that a stronger, tighter integration will be one of the keys, I think, going forward as well.
I know we got to wrap up, we're getting the hook here, but I want to let you guys go down and share some of the things you're excited about. Obviously, you guys announced a strategic extension to your partnership with Red Hat. You guys got an announcement coming tomorrow. What else are you guys excited about, just in your business, the landscape? What's exciting you right here?
I mean, I'll start, but I think it's just the incredible interest. The explosion of interest in Hadoop and the technology and the exciting new ways that customers are using it. We just see that as a major kind of initiative going into 2014. The activity that we're doing with our partners, they're deeper and stronger than ever. It's such a core part of our strategy that we're super excited about the potential that we have with them out there. So, 2014 is going to be a fantastic year for us and for our partners.
Yeah, just echoing that, 2014 really, really looks good in the sense that not only is the market hot, but customers are really getting very, very interested in putting this technology to work. I mean, that's-
We had one guest say there's a lot of budget being allocated at the show. First time at Strata they've seen budget being allocated, here at Big Data SV.
Yeah, no, no, the budgets, when I talk to our salespeople, you know, the momentum we have right now in the SQL Server business is phenomenal. The momentum we have on new services like Power BI in Azure is just phenomenal. And 2014 is going to be an amazing year because customers have dollars now, they have interest, and they're starting to see, how do I use this technology to derive real business value? And that's great for everyone in the ecosystem.
Hortonworks and Microsoft, great partnership. You know, flowering out and growing. Beautiful fruit coming off the tree, as we say. Meat on the bone, whatever you want to call it. It's good stuff happening, good structural change in the business, great growth. Big Data is still smoking hot this year. Congratulations. And thanks for coming on theCUBE, really appreciate it. This is theCUBE, we'll be right back with our wrap up here. Big Data SV, budgets are being allocated. Big Iron to Big Money, Big Scale, Cloud, Mobile, Big Data, we'll be right back.