Live from Midtown Manhattan. theCUBE's live coverage of Big Data NYC. A SiliconANGLE Wikibon production. Made possible by Hortonworks: We do Hadoop. And WANdisco: Hadoop made invincible. And now your co-hosts, John Furrier and Dave Vellante. Okay, we're back here live in New York City. This is theCUBE, our flagship program. We go out to the events, extract the signal from the noise. We're actually creating an event called Big Data NYC, which is sitting right outside of Hadoop World and Strata Conference, Big Data Week here in New York City. I'm John Furrier with my co-host Dave Vellante. We're joined by Quentin Clark, Corporate Vice President at Microsoft. Welcome to theCUBE. Thank you. Yesterday, you guys were really leading the news on the trending alerts on our CrowdSpots program. We monitor all the chatter, and the conversation, and the thought, and the relationship with HDInsight got a lot of buzz. Yesterday it really led the news cycle, even in today's keynote. That's great to hear. So congratulations on that. Good call getting that out early. You get the pre-show buzz. And then obviously today, the big conversation is the data platforms, right? So talk about what you guys are doing here real quick, and then we'll jump into some of the specific questions we have. Yeah, certainly. I mean, so just to start, and set some higher level context, we're here because we're super committed to Hadoop, of course, and we're working on product offerings in the cloud and our partnership with Hortonworks to ensure that Hadoop is really available for all of our customers, right? We almost have, I would call it a responsibility to help the Windows ecosystem benefit from everything that Hadoop has to offer. And so we've been working tremendously hard on that. And then as you noted, we announced just today the general availability of our HDInsight product offering as part of Azure, which is Hadoop in the cloud. So honestly, you guys have a huge install base.
In fact, we were joking the other day, Microsoft's earnings came out, and it's always the negative press under the whole Ballmer thing, and people are like, oh boy, Microsoft's really hurting, billions and billions of dollars in cash flow. So you guys are obviously not hurting from a business standpoint, huge install base, certainly the enterprise, right? Which really is looking to retool, and we were just talking about the software-defined data center, and with Azure, you guys are positioned perfectly to extend out your leadership from the existing enterprise into the cloud, which then enables a lot of the big data applications. So the question on everyone's mind is, okay, on enterprise-grade Hadoop, what does that data platform look like? What's your vision around that? And what do customers do with you guys to get started? How are they road mapping out to the cloud and big data? Yeah, well, it's a great question. And Hadoop is a cornerstone of big data, but it's not the entire structure of big data, right? If you think of a cornerstone as one part of a greater structure, the work we're doing to make sure Hadoop is enterprise grade, it's around security, and Windows integration, and tools integration, and we're contributing a lot. I mean, we've put in thousands of hours of development and tens of thousands of lines of code into Hadoop, not just to make it work great on Windows, but just to make Hadoop great, right? And that is a foundational piece where customers will get started and then grow from there. And as you noted, we have Microsoft SQL Server, the most widely deployed database in the world. And helping those professionals also start to embrace the big data phenomenon and embrace Hadoop is also a big part of the journey for us. So Quentin, you've got Microsoft customers, they're enterprise ready. They've been in the business for decades.
So they have an expectation for availability, reliability, performance, they have an experience in their mind as to what they're going to get when they adopt a Microsoft product. Compare that traditional Microsoft product with where you're at with the Microsoft products for Hadoop. Is it parity? Have we still got a little ways to go? Still got a long ways to go? Can you describe that? Well, I mean, a big part of our effort to get HDInsight to GA was to get it to that enterprise expectation: security integration, tools integration. The work we've done to make Hadoop work great on Windows is a lot about making sure that it has that enterprise sort of ability. There's another piece of it which is integration with the rest of the platform, right? One of the things I was talking about this morning at the keynote was about reaching a billion people with the value of big data, right? And it's kind of a provoking thought in a way because what you really want at the end of the day is for anyone who can benefit from having the information available to them and being able to understand what it may mean and have that conversation and dialogue with the data to be able to do that. And so, whether the data is in Hadoop or it's in our relational database or it's coming from sources like OData, we're trying to make sure we funnel all that data in a way that allows people to start to build the kinds of BI and analytics models that'll let them understand what it means. So I think I told you I was listening to the keynote but I wasn't watching, I was doing some other stuff and I heard the billion and I got my, usually when you hear numbers like that you think okay, it's a phone, right? It's the data on my phone. That's how I'm going to get to a billion but then, you know, I listened and said, okay, no, they're not talking about that. Talking about Excel, essentially, and Office. So there's a billion Office users, right?
Like most of them, I would presume, use Excel in some way, shape, or form, right? Maybe half, I don't know, maybe more than half, maybe a little bit less than half but a lot, you know? So is the idea that you'll put the power of big data into Excel and then people like me can just use it, I'm gonna, the reason would be a provision. In the office, and there's, you make a great observation, like we're gonna, there's more than a billion people that are gonna participate in big data in a passive way. You're participating in big data when you're driving your car, you're participating in your carrier on your phone, right? What I'm talking about is the more active participation where you're curious as a human about something and something's able to help make you smarter in that space. One of the things that we're looking at, even for example, is in the sports arena, right? We're actually talking to a couple of the large US-based sports leagues and saying, would you like a Power Q&A model, which is the thing we can do, natural language interrogation of big data sources? Would you like a Power Q&A model over your dataset for the fantasy league fans and district fans and general and like, and there's a sort of pause and then you walk them through what that could mean and of course they do, right? So they'll be consumers, if you will, of big data in that arena, but are they using Excel? No, not really, they're just participating, right? So okay, so what's that experience gonna be like? I'm gonna be able to ask a question. Can I do that today?
We actually have some models with some of these datasets and I won't go into the details, which are not sort of that far along yet with these customers, but let's just say I have fantasy teams that I'm doing pretty well with in part because of my ability to really have the data, have a conversation with the depth of data, to really understand what these players' capabilities are and what conditions and who's on the bench at any given time, given weather conditions and which day in the run over history and all that, and it's been good. So I can play Moneyball in fantasy sports, is really what's going on. Yeah, and you know, but instead of having an analyst have to sit and understand which columns mean what, you just walk up and you just ask questions, right? Like so that consumer side, I mean you're grinning ear to ear, it's interesting, right, it's fun. On the business side, it's transformational, right? We have, of course, as you might imagine, models that we use for our own business. I have it for my own engineering work, I have it for the business that I run and our ability to gain insight in real time and really interact with the information to understand what may be done with it is very different using these tools. So why did you guys decide not to do your own Hadoop distro? Take us back to that decision. Yeah, it's a good question. And primarily our focus has been on improving the Apache core, right? And by doing that, all of the great work that we've been doing to make Hadoop work great on Windows and to make Hadoop great is finding its way into all of the distributions as a result. And our partnership with Hortonworks was just such that it just made sense to let them lead HDP onto Windows and our customers were happy with that and so we're kind of in a win-win-win situation where they're able to continue to push and have that cross-platform distribution capability. We're able to give our customers a solution and keep up with the cadence of the industry.
How about the data portion of big data? I mean, you guys obviously have robust apps, tools, middleware, database, you also have a lot of data. At Microsoft? Yes, at Microsoft. How are you making that data accessible to your community? Are, how are you monetizing that? Are you monetizing that? Do you have plans to monetize that? There's a lot of valuable information that's public or quasi-public or you own that doesn't violate privacy. I'm sure there's lots that does and you can't put that out in the public domain but there's plenty that doesn't, whether it's traffic information or this is an endless list of things. What are you doing? What's the data strategy? Yeah, that's a great question. I'll just give a couple of examples of the kinds of things that we've done and some direction that we're taking. A couple examples are we have a synonym service that's out of Bing. So you call an API and it hands you back a bunch of synonyms, which doesn't sound like a big deal until you realize that synonyms in terms of understanding language for search and query and all that stuff is like super, super, super important. So as an example, we have that database which is actually derived out of our Bing crawls, right? And so we're able to drive that data set with machine learning, open that up as a database basically and offer that out to people. There are other examples that are like that in terms of things that are emerging but whether it's data that's being pulled together by Bing or data that we have because of our phone assets or Windows or Xbox Live or Office 365, as you know there is a lot of data and we are working at an industry level to figure out what data is relevant for verticals across various industries, I'll give you an example. There is tremendous value in package shipping company data to banks. It's like, well, that's a little odd, why is that? 
Well, because if you're doing small loans in retail it turns out how many packages are coming in and out the front door is a really interesting data signal as to whether one of those loans is going to be valuable in the future, et cetera. One of these connections that wasn't obvious at first but once you dig into the details it turns out it makes a ton of sense. So we're doing a bunch of work like that and trying to understand the patterns that are emerging on an industry basis. Another example is we have a catalog of public data sources and so you can go into Power Query, which is one of the Power BI tools that's integrated into Excel. You can type in things, one of my favorites is King County, which is the county where Seattle is in Washington state, and King County health inspection data, and because of the work we do in conjunction with Bing to find interesting data sources we have a public data source catalog that includes an entry for where to find King County health records, and then you find all this interesting information on restaurants in terms of how they were inspected, which is really interesting when you put it up on a map. It just teaches you a lot as a consumer but you can get from being curious about that to seeing a globe with stacked views on how many inspections and how many of them were of the interesting kind as a consumer versus not. Inspections of interest. Yeah, inspections of interest. You can get to that result in under a minute with the tools and before that, how would you even begin to figure this out? Yeah, Quentin, talk about the, you mentioned Bing so let's talk about, as you guys have the huge back end, big cloud, obviously a lot of things going on with that which will be subject to our cloud show when we go to AWS re:Invent and OpenStack and other cloud events will be up, but this is the data show, but you mentioned Bing.
Azure services Bing, so there's a ton of infrastructure that's servicing Bing I assume, and I've heard, I'm not sure what the official statement is but that's well known in the industry, that's a lot of stuff you're doing with Bing. Internally, what have you guys learned and what are you guys taking out of your experience in dealing with that massive engine in Bing because there's a lot of different, it's all these are there on the search engine side. State data sources, diversity, size of data, you mentioned metadata management with the crawling and taking those synonyms and macros, these kinds of concepts. And you take that into an environment where it's much smaller scale, you've got to kind of package some big concepts that Bing has done and bring it down to the big data world. Is it easy to do? Are you guys doing that? Can you share some insight into what you've learned and some of the best practices being a supplier to Bing, which could be a proxy for essentially large enterprises? Yeah, well, I mean the first thing we're really doing with that learning is applying it to Azure and having that whole cycle really work. Managing large scale systems at cloud scale, as you can imagine, it's a big engineering challenge, big operational challenge. And so that cycle of learning between the work that we're doing across all of Azure and our online properties is a virtuous cycle. And as you may imagine, in the data space they also have needs for analytics and visualization tools and reaching everyone in every business, whether it's the Xbox business or the Office business or the Bing business, with insights that ultimately help them shape what they do. And so we're sort of very fortunate to have this very strong first-party muscle that lets us shape the platform and get use of the platform to help inform how to really scale. What do you think are blind spots for the industry out there?
Looking back, I'm not saying anything bad about Microsoft, but if you could share advice to folks out there, knowing what you guys have known and the experience of Bing, out to the rest of the market around social data, because that's a big topic here, which is essentially consumer data, what they're touching on their mobile phones, how that's coming back into a data center. What would you share with folks out there about potential blind spots that they need to be aware of? Yeah, well, I'll give you just one quick story about this. I literally met with two very large hotel conglomerates in the course of a business week. I think there was an intervening weekend but across several days I met with two in our customer briefing centers back in Redmond. I walked out of those two meetings realizing just what was really at stake. One customer had plumbed their hotels with telemetry, with RFIDs, so the hotel key you get, it's not the magnetic stripe, instead it's the proximity thing, which as a consumer feels really cool and high tech, right. But what they don't tell you is that there's actually a lot of proximity sensors in the hotel so they kinda know who's going to the gym, who's not. There's a lot of information that they can collect as a result of having these things plumbed this way. They've also embraced social in a very deep way and so they understand what the sentiment is, what people are saying, what people's hobbies are and so they're able to really customize and tailor the experience for everybody at scale programmatically and I'm like wow, these people are really headed somewhere. A few days later I met with another one of these conglomerates and they were nowhere with this and I'm sitting there and I'm watching them and I'm like you're kind of in trouble.
I mean competitively you're not gonna be able to differentiate, you're not gonna be able to compete because you're not already embracing all of this telemetry, and they don't think of it. I mean like the swiping data from a hotel key, like they think it was operational data, they didn't think of it as an advance for them and personalization of experience for their guests, and these are two worlds apart, and so the thing I would just tell people is look at what the big companies are doing and embracing and look at what's behind that, and eventually what seems high scale to the few will end up being pedestrian to the many over time. So talk about the data center. We were talking with WANdisco earlier about their position, which resonates well with some of the enterprise customers we've talked to about the data center, whether it's software-defined data center, whether it's a future cloud, around uptime. We've heard the stories about Netflix going down and people, that's an obvious consumer example, but there are issues around big data being up and running, having high availability, having discontinuous operations. Where is the state of the market in that area? In general cloud availability? It's just in the big data world, I mean you know they have a unique value proposition, WANdisco, around hey, data center fails, we have complete failover. That's unique, I mean not a lot of people talking about that, what's your take on that? Well, look, a commercial cloud property like Azure has these capabilities built into it. I mean you have to. You don't get to run a business like Bing where if you had a data center fail you're out of business that day, right? You don't get to do that. It's a zero fail, it's zero tolerance. You want to talk about a rambunctious group of users, you can't take Xbox Live down, like ever. Ever nice. In fact that would cause a riot in my own house. Bing you go down for a little bit but not Xbox Live.
Little slow query, but Xbox, no lagging, no lagging. You don't earn the reputation that we have and you don't get the results that we have in those businesses without having solved that problem. So yes, we've had to of course embrace the global scale and availability and data center management and it's a lot of engineering work, but it's a tremendous power once you get to that scale, when things actually work turnkey in creating a new business. So like when we did Power Q&A we weren't worried about data centers, where are we gonna find those, and it's all just turnkey for us, right? And you get a lot of engineering power and agility out of that. Quentin, on that note we're gonna take a break here but I want to get you to get the final word. Maybe we'll do some crowd chats with you guys in the future because this is a really hot topic, this enterprise grade. This is really important. I want to get your final thoughts on kind of what you're gonna see come out of the show and beyond. What's gonna happen next? How do you see this evolve? Now obviously people are starting to settle into these, starting to see some tech get some solid ground, still some areas to work on, a lot of white spaces, still opportunities, but how do you see it kind of settling out and going forward? Yeah, and even just listening to the talk this morning I would say that there's a sort of consistent meme of how is this really being embraced by business for results, right? Sometimes early in the evolution of technology everyone's exploring what the tech is capable of and I think there's now a focus shifting to what it's gonna impact and how it's gonna change businesses operationally on a day to day basis going forward, and so there's a lot of interest in real time and finishing the last mile of analytics and intelligence and all that stuff to the users, because I think there is a shift now on to what greater purpose?
Okay, this is theCUBE, we're live in New York City, I'm John Furrier with Dave Vellante, we're live from Big Data NYC, Hadoop World, Strata Conference, all the detailed coverage here, we'll be right back with our next guest after this break.