Okay, we're back here live in New York City for SiliconANGLE.com's coverage of theCUBE at Strata Plus Hadoop World. This is big data week, and there's a lot of action, so we're going to get right into the interview. This is theCUBE, our flagship program where we go out to the events, extract the signal from the noise, and share that with you. I'm John Furrier, the founder of SiliconANGLE.com, and I'm joined by my co-host.
I'm Dave Vellante of Wikibon.org: open research, free research, go there and check it out. And we're here with Saptak Sen, who is the senior product manager at Microsoft for big data. Saptak, welcome.
Thank you so much for having me.
Yeah, you're welcome. Good to see you again. Everybody wants to know what's going on with Microsoft. You guys were sort of the last of the big whales to indicate your intentions in Hadoop, and you've done that. You cleared up a lot of the questions last year, and we're happy about that, so thanks for doing that. Why don't you give us an update on what's new with Microsoft and what's happening?
Absolutely. First of all, we made some announcements yesterday. We announced the preview of Microsoft HDInsight Server, which is the on-prem version of Hadoop on Windows, optimized for Windows. It's 100% Apache Hadoop, so whatever anybody is running on Hadoop today should work straight out of the gate. We also announced the HDInsight service for Windows Azure. With that, you can set up an HDInsight cluster pretty easily on Windows Azure, mount Azure storage, pull in data from there, and do the analytics you need to do, and that's also 100% Apache Hadoop.
And you're using Hortonworks HDP, right? You have a deal with Hortonworks.
Yes, we are working very closely with Hortonworks and partnering on the engineering front. We are doing a lot of the optimization for Windows and Azure ourselves, and we are submitting those patches back into the Apache tree.
So tell me about Microsoft's role in Hadoop; I want to just clear the air here. Obviously Microsoft is taking a lot of bashing over Windows 8 right now, and everyone likes to kick Microsoft when they're down, but you guys are doing some pretty good work. The Excel demo we saw was the talk of the show yesterday. We're going to have those guys come on; we've made a slot for them at 10:30, so if you want to come back at 10:30, check that out. But talk about the history. Microsoft is no stranger to big data. Yeah, you had Dryad. So just clear up for the record Microsoft's role with open source and big data, specifically. Big data is kind of an ambiguous term, but that's why we love it. It's great for our media business, and great for our research too.
If you think about it, we have been doing big data forever. We have some of the largest web properties on the planet: Bing, and Xbox Live, which you can think of as a supercharged social network. We have Microsoft.com, which I think is one of the top 10 websites on the planet. So you'd expect that we need a lot of scale-out infrastructure to process a lot of data. And we do have that infrastructure; like you rightly said, we have Cosmos and Dryad around that. So we also have the experience of building scale-out infrastructure.
That's what is so cool.
On the other hand, we have deep experience with enterprises. It's hard to name a...
Yeah, you guys own the enterprise. Whether it's Office on the end-user side or SQL Server in the data center, you guys certainly rock and roll on that. No problem. But bridge the gap here. Microsoft is known for its developer community, and developers are now shifting to open source. You've run web scale, and Hadoop started at Yahoo, where, as we've talked about, the web-scale guys were the first generation to understand and build Hadoop, except for the NSA, which we had Sqrrl on to talk about.
They had their own little private version of big data. But just talk about how you bridge the gap from the new order, the modern world of Hadoop, to the old-school Microsoft developer community. Essentially HDInsight fits that; you guys have that product. So explain how that fits in and what it means to developers.
HDInsight, or Hadoop in general, provides something great. It's a very robust distributed storage platform, with a computation framework, MapReduce, on top of it, for the time being at least. There are other computation frameworks out there for distributed computing, and we have already been doing that in our HPC space and in some other areas, such as our MPP database, Parallel Data Warehouse, and so on. We think Hadoop is a very important part of the overall big data platform, so to speak, but it's not the be-all and end-all. Even today, 80% or somewhere around that range of data sits in relational databases, or people need to build cubes to get the interactivity they want for analytics from their end-user BI tools. They have to deal with streaming data, and that's where complex event processing engines become important. Those are the sources of the data, but at some level, across all these different data stores, you need to do enrichment. You want to do discovery and recommendation. That's where I think the greatest value is, and that's the missing piece in the industry: a layer that works well with the data source, or data store, layer. We are super focused on that. HDInsight and Hadoop provide, of course, the raw storage and the MapReduce computation layer on top of it. But beyond that, how do you enrich that data and do discovery and recommendation?
So let's break this down.
I just want to understand this, because obviously the big thing about open source is that you can bring data to the table, and data mashups are really popular. So you've got Windows Server out there and you've got Azure. How do I bring data into the cloud without paying through the nose? Because right now, most developers are doing bare metal, since it's a lot less expensive. You may or may not have an answer for this yet, but I want to ask anyway. One, I'm provisioning my own hardware on bare metal; I'm standing those servers up, commodity gear, scale-out, all great. How do you guys provide that insight? One of your elements is data enrichment and insight, and HDInsight. So how can I leverage and develop on top of the stack without using Azure or anything else?
In the first part of your question, you mentioned that, yeah, it's expensive to bring data in, and so...
You're smiling. Why are you smiling? It is true, right?
It is true, and I completely agree. And that's why data locality is so important, because in today's day and age, as I'm carrying around my phone and walking around, I'm generating data. And maybe the best place to store that data is in the cloud, because it's being collected from what today we call the Internet of Things.
By the way, that's the next hype. I can see it coming already.
Yeah, exactly. It's happening. All these endpoints that we have now, that's where the boom is. We never had a dearth of data. In fact, a very famous physicist, John Archibald Wheeler, who worked very closely with Einstein and was a mentor of Richard Feynman, said that what we call the past is built on bits. If all our past is built on bits, then by the same analogy the present is bits and the future is bits. We always had a lot of bits, but what has changed is our capability to capture that data with these endpoints and sensors, and then the need to process that data and provide analytics on it.
And then, of course, there's our ability to store that data, because storage cost has plummeted, and that's all goodness for us.
Saptak, I wanted to follow up on something you said about the percentage of data that sits in SQL today. 80%, nobody would disagree with that. What's interesting, though, is the conversation about what it's going to look like five or 10 years from now. I love the Hortonworks manifesto, where they say that within five years (now, who knows if that's the right timeframe) 50% of the world's data will be on Apache Hadoop. I don't know if it's five years, 10 years, or 15 years, but is Microsoft's view that it's heading in that direction? Would you agree with that?
Yeah, absolutely, because of what I said: data is coming in so fast and furious that often we don't have time to think, when the data comes in, about what to do with it. I can't determine a schema when the data comes to me; I just dump it. So naturally the best place for that kind of data is HDFS. We often call it the digital shoebox: I'm just stuffing things into the digital shoebox.
Okay, so now I think back on my history. I've watched this industry for a long time, and I watched Microsoft essentially turn the mainframe into, well, it hasn't gone away, but it's become a dinosaur. Will the traditional data warehouse as we know it today become a dinosaur?
I don't think so, because when I speak with customers today, and a lot of our large customers are already using Hadoop, the way they use Hadoop is that they have this giant drive in the cloud, especially in the enterprise. This drive is HDFS, with Hadoop MapReduce on top of it. They are putting all this data there, but for their hot data warehousing requirements, they have a data warehouse where they keep the last six months of relational data that they want to do dashboarding on.
So the years are in Hadoop, and the last six months are in the EDW.
Absolutely, and that's a great balance to have, especially as we evolve toward tools that are agnostic of the data store. That's what the demo you will see shows: you can keep using the tools you are used to. Everyone knows Excel. If you give people that ease of use and the ability to pull in data from Hadoop, from relational data stores, and from data markets all over the internet, that's power. And you can go a step further: for example, we have a Data Explorer plugin, and if you try it out, it actually recommends different data sets that you can really...
Awesome. Saptak, we're going to wrap up, and I want to ask you one final question. Obviously it's exciting to have Microsoft on theCUBE. It's great to have the big whales come in and show a little bit of product direction and insight; we appreciate that. It's HDInsight from Microsoft; check it out, it's in preview, I believe. So my final question: share with the folks what's next. What's coming around the corner, and what can they expect from Microsoft in this area, from you and your team?
One of the main areas of focus is to make sure people can get insights irrespective of where their data is, whether it's in HDInsight or another version of Hadoop; it doesn't matter. They can get the insights from the tools they're familiar with. That's number one. The second is enterprises being able to give SLAs to their internal customers around Hadoop, just like any other infrastructure. That's key, so that they get a single pane of glass to manage, monitor, deploy, and do all that fun stuff. That's the other big area. And the third big area is that most big data projects actually start small, sometimes with an Access database or an Excel spreadsheet or even a Notepad list, right?
But over time, as people generate IP, it starts to take off. We want to ensure that as it goes from small to really big data, from a few kilobytes to a petabyte, they don't have to dramatically change how they think about that data or how they interact with it. It doesn't break their march toward bigness.
Okay, great to have Microsoft on theCUBE. We're going to be tracking you guys; we're watching you guys. Great to collaborate and extract the signal from all the noise out there, especially around Microsoft these days. Congratulations on the great work. We'll be right back with our next guest after this short break here on theCUBE, SiliconANGLE.tv's exclusive coverage of Strata Plus Hadoop World in New York City for big data week.
Thank you so much.
Thank you very much.