from Dublin, Ireland. It's theCUBE, covering Hadoop Summit Europe 2016, brought to you by Hortonworks. Now, your hosts, John Furrier and Dave Vellante. Okay, welcome back everyone. We are here live in Dublin, Ireland for theCUBE, SiliconANGLE's flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, co-CEO of SiliconANGLE, with my co-CEO, Dave Vellante. This is theCUBE. Our next guest is Raghu Ramakrishnan, who's the CTO for Microsoft's data group, part of Azure. Welcome to theCUBE. Thank you. So, great to have you on. Thanks for spending the time with us. My pleasure. I want to really look at the large-scale big data problem as it meets the growing sea of developers. You've seen the large-scale folks; certainly you guys are big, large scale with Azure and inside Microsoft. And then you have all the cloud players building out large scale. Then you've got the on-prem hybrid cloud. All of this is really fueling the application market around big data, which in some cases is just companies and their apps. But the nuance here is that new types of data sources are being enabled: the Internet of Things at the edge of the network. This is creating a new integration concept around data: a data layer, more glue software, and a mix of the large-scale distributed computing paradigm with software. Which is really exciting for the geeks out there like us. So, share with us your thoughts on the market. Put some color around that. You hit all the key points there, right? I think this trend towards many diverse kinds of data from many diverse sources, and then wanting to do many different things with it, has led to both a technical revolution and a potential business revolution. Technically, we've been talking about RISC-style database architectures for a long time. Well, now HDFS and YARN really are common sockets into which you can plug everything from Hive to Spark to U-SQL to whatever.
You're seeing a very dramatic realization of that RISC concept, right? And strictly in terms of scale, database systems used to talk about scaling to hundreds of nodes; now we talk about hundreds of thousands of nodes, literally, right? We have individual clusters of tens of thousands of nodes today at Microsoft. Individual queries that run on thousands of nodes, right? Business-wise, what I think is happening is the complete transformation of what we think of as data warehousing. It's no longer just ETL from operational relational stores. I mean, it's becoming the primary place where data rests for, say, IoT data, right? Data is born here in addition to being aggregated from all of the usual places. You're storing everything from multimedia to tables to web logs to sensor data. You're processing it as it arrives in real time. You're doing BI with it, you're doing machine learning with it, and of course you're running batch SQL queries with it, right? There are ad hoc jobs, there are production jobs. So you have all the requirements of the traditional data warehousing market and a whole lot more. And the result is really an information production factory, if you will, not just a traditional warehouse. Super exciting. When I was back in my youth in college, I remember graduating and getting my first programming jobs, and "information processing" was the name of a department. I remember that from back in the day, and with the things that you're talking about, this is about information processing now. So the database equation used to be: get the schema nailed down and you're all set, and your constraints were schema-based. Now you have mixing and matching between schema-based, large-scale databases, pick your flavor, plus the unstructured stuff in real time. Add in Spark and the other things going on in real time, and it creates the perfect opportunity for innovation. Where do you see that innovation?
Or even invention and innovation, to separate the two. You've got some invention, new kinds of approaches, and then innovations, which you guys are doing. Obviously Microsoft has an install base, you have a pre-existing business, but you're also innovating. And we've seen Azure do things with Open Compute and the reference architectures. We see what you're doing here in the community. Where are those new innovation and invention opportunities? From a technical perspective, I just spoke on the work we are doing with YARN, right? This is incredibly exciting, right? We are talking about, for example, a federation where a single query can run distributed across the world. A single query, processing incoming data in real time and combining it with petabytes of historical data, running on thousands of machines. How do you schedule these, right? How do you make sure that when some part fails, you seamlessly pick up and continue? And how do you do this while maintaining traditional goals like SLOs? Keep in mind a challenging thing here. When data gets to be of the scales we are talking about, you'll need to share. You can't just create copies of petabyte files. And that sharing means multi-tenancy is a fact of life. Multi-tenancy means you stomp on each other's toes. How do you take care of that? Unless you have deep, deep work on how to do this in a principled way, you can't make SLOs. You can't. So I've got to ask you the question. I love the federation concept, but unification is another word that we're hearing in kind of a new way. Dave and I were just talking last night, analyzing and squinting through Mark Zuckerberg's 10-year roadmap for Facebook. And you're seeing companies like Facebook, with billions of users, promoting their identity system. Correct.
Again, it's the same concept of this universal need for identity at a user level, which is also data-driven, because you've got organic data coming from users: gesture data, interaction data, transactional data, as well as the IoT and other machine data. So you have this confluence of vectors coming into one thing, and it just seems that open needs to be the way. And no one vendor like a Facebook should be owning any universal anything. What are your thoughts? Well, de facto, the common way people refer to something becomes a standard, right? To that extent, the game is on in terms of identity for various kinds of entities and who owns that identity. LinkedIn wants to own your professional identity. Facebook wants to own your personal identity. And if you get into the business world, for various other specialized areas, companies are competing, okay? Around context, whatever that is. Around context, exactly. On the platform side, a similar story. If you look at HDFS, it's bidding fair to be the universal API for all your data, no matter where it resides, right? Whether it's in main memory, connected with RDMA, whether it's on archival storage, whether it's on-prem, whether it's in the cloud, you need a uniform set of APIs to provide access to all of these myriad processing apps that are emerging. That's the HDFS play, right? The key part of HDFS, by the way, is the fact that it can tell you where to find any given part of your data. When you try to run a job, the first thing you do is look up the zip code of your data and then try to co-locate your computation in that same zip code, right? So HDFS is one half of the zipper and YARN is the other half. You can then say, hey, I know where my data is. Take this computation and stick it there. YARN. Exactly. So that's the how. It's the stack. That's exactly it. And standardization is important. Absolutely, absolutely. Several times today you've said, well, how do you do that?
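Editor's sketch: the "zip code" idea described above (look up where the data lives, then try to run the computation in the same place) can be illustrated with a toy placement routine. This is only an illustration under stated assumptions, not how HDFS or YARN are implemented: the block-location table, the node names, and the functions `place_task` and `schedule` are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy model: which nodes hold a replica of each data block.
# In the interview's analogy, this table is the "zip code" lookup.
BLOCK_LOCATIONS: Dict[str, List[str]] = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-b", "node-c"],
    "block-3": ["node-c", "node-a"],
}


def place_task(block: str, free_slots: Dict[str, int]) -> Optional[str]:
    """Pick a node for the task that reads `block`, preferring data-local nodes."""
    # First choice: a node that already stores the block and has a free slot.
    for node in BLOCK_LOCATIONS.get(block, []):
        if free_slots.get(node, 0) > 0:
            free_slots[node] -= 1
            return node
    # Fallback: any node with capacity (the data must then cross the network).
    for node, slots in free_slots.items():
        if slots > 0:
            free_slots[node] -= 1
            return node
    return None  # no capacity anywhere


def schedule(blocks: List[str], free_slots: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Place one task per block; works on a copy so the caller's dict is untouched."""
    slots = dict(free_slots)
    return {block: place_task(block, slots) for block in blocks}
```

With one free slot on each of node-a, node-b and node-c, every task in this toy setup lands on a node that already holds its block. When the local nodes are full, the task falls back to a remote node, which is the case a real scheduler tries to keep rare.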
How do you do that? In your talk, you talked about literally hundreds of petabytes per day that you're processing, that kind of scale. So on the how, you said you must architect for failure. You've got to have massive parallelism, you've got to be able to move the compute to the data, and then you've got to have tooling like the zipper with YARN. So are you able to deliver that experience to customers today? You're part of the Azure group. Yes. So what's driving your customers to cloud, and are you able to deliver that experience today? So the bottom line is, let's be honest: all of this is getting complex. As you add security, as you add data governance, audit trails, the whole ball of wax, the IT overhead of managing these systems is non-trivial, right? And part of the attraction of the cloud is that it's so much easier for a vendor like Microsoft to do the heavy lifting and then amortize that investment across many, many customers. From customers, the one thing we are hearing loud and clear is: make it simple, right? People want to be able to come in and say, create a file, store this data, and oh, it's going to keep coming at you at 10 terabytes a day. Don't drop it, okay? And the first week, I want it in main memory. After that, stage it. And when I need it again, make sure it's there, but make sure I don't get too hefty a bill. And oh, by the way, I want to be able to do this and that and that: interact in real time, machine learning, SQL. I just want a REST endpoint where I can submit my job and you return my answers to me, right? Yeah, I will give you a bill too. But they don't want to be bothered with any more platform responsibility than they can possibly avoid. So they're looking at total cost of ownership. They're looking at simplicity, and they're looking at one more thing. As the life cycle grows richer, data and compute need to scale very differently at different stages in the life cycle.
So for example, when data is hot, you're willing to pay the premium of keeping it in memory. Later, you want the COGS efficiency of cold storage. At the same time, when you need to run that daily batch update, you're going to burst up in terms of your CPU consumption. At other times, not really. So a regime where you tell them, I want a cluster that has a hundred machines, and I calculated a hundred because that's the max of my storage and my compute needs all up, leaves people increasingly unhappy. So you're talking about customers changing their operating model as they move to cloud. And that's really an economic consideration. Simplicity drives those economics. So many people say, oh, the cloud's more expensive. What they're missing is, as you said about TCO, that so much money is spent on IT labor. My question is, where are your customers putting that spend? Are they dropping it to the bottom line? Are they using it to develop apps? Are they moving it into business process? All of the above, all of the above. And I would also say that we are in the early stages of this revolution. Businesses at this point clearly grok the value. But in terms of the ability to move and to shift their internal processes to take full advantage, while there are some who are ahead of the others, this is very much a journey that's happening now. So I would say a lot of the energy right now is in that cultural shift within their organizations, the bridge between their existing and the new worlds. What's the philosophy around pricing, specifically as it relates to cloud offerings? Microsoft's always been about simplicity, simple tools, lowering costs; the market's elastic. You guys understand that very well. You've got probably $10 billion in operating profit in your cloud and enterprise business. So you have the financial wherewithal to make solutions very attractive. What's your philosophy with regard to that, specifically the elasticity of demand in the marketplace?
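Editor's sketch: the sizing complaint above (a fixed cluster sized to the maximum of storage and compute needs) can be made concrete with a little arithmetic. The numbers below are made up for illustration, and the function names are hypothetical; the point is only to compare a cluster that is always provisioned for the peak with compute that is scaled hour by hour.

```python
import math
from typing import List


def fixed_cluster_size(storage_nodes: int, peak_compute_nodes: int) -> int:
    """A coupled cluster must be big enough for whichever need is larger."""
    return max(storage_nodes, peak_compute_nodes)


def fixed_node_hours(cluster_size: int, hours: int) -> int:
    """Node-hours paid for when the whole cluster runs for the full period."""
    return cluster_size * hours


def elastic_compute_node_hours(hourly_cores: List[int], cores_per_node: int) -> int:
    """Node-hours if compute is resized every hour to match demand."""
    return sum(math.ceil(cores / cores_per_node) for cores in hourly_cores)


# Hypothetical day: a 2-hour batch burst needing 1600 cores, 160 cores otherwise,
# on 16-core nodes, with storage that would fit on 40 nodes.
CORES_PER_NODE = 16
demand = [1600] * 2 + [160] * 22
size = fixed_cluster_size(
    storage_nodes=40,
    peak_compute_nodes=math.ceil(max(demand) / CORES_PER_NODE),
)

print(size)                                                # 100 nodes, set by the compute peak
print(fixed_node_hours(size, len(demand)))                 # 2400 node-hours per day
print(elastic_compute_node_hours(demand, CORES_PER_NODE))  # 420 node-hours per day
```

In this made-up day the fixed cluster pays for 2400 compute node-hours while the elastic one uses 420, with storage provisioned separately. The gap depends entirely on how bursty the workload is.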
Let's take a step back. And let me give you my answer through the lens of data. In SQL Server, we have a market leader. It's a multi-billion dollar annual business for us. I mean, that alone makes more money than entire companies. In light of that, where do we see the future? We see two tremendous changes. One is the shift to cloud. And the other is the enormous increase in the data centricity of our customers, in how they want to not just use these systems as a transactional backend, but to glean insight and to operationalize that insight in their business processes. For us, that move to the cloud is going to disrupt us. But we see that as an opportunity because we are disrupting ourselves, right? We are in the cloud. SQL Server 2016, our flagship on-prem product, was baking for several months in the cloud, right? We are cloud-first in our thinking. And that strength on-prem allows us to craft a hybrid strategy, which virtually all our customers want. The other part: historically, we have enjoyed tremendous success on the OLTP transactional side. We see this as an opportunity to redefine the warehousing side and to compete there with renewed vigor. So for us, we are super excited. We see these changes, the cloud and the totally changed nature of analytics, as something we are all in on, right? And you hit the nail on the head when you said simplicity. We think the place where we, Microsoft, can establish an identity and deliver unique value is by simplifying. Yes, we have these obscene scale badges we can flaunt. Kind of cool. I won't deny that. But at the end of the day, I think it's by making it simple that we will succeed. We had a chance to interview Satya Nadella on theCUBE before he got promoted, at an Accel Partners event in Silicon Valley. And you could see the gleam in his eye. He had his finger on the pulse. And we knew that all along. So when he got promoted, we saw the shift. Everyone did. The mainstream press were writing about it.
But there were some little things I mentioned earlier. Open Compute, out of the blue: reference architectures from Azure came in and laid down some offerings for the open source community. Obviously Facebook is involved; now it's a forum. Your contribution here, that is a new Microsoft. It's an open Microsoft. Very much so. Talk about that, and the YARN contribution you guys are involved in. Share the culture now at Microsoft. I came to Microsoft from Yahoo, right? I was there at Yahoo from 2006 to 2012. I was chief scientist for cloud. With my team, I was both a producer of parts of Hadoop, things like ZooKeeper and Pig, and a huge consumer, right? And so I was a big believer in open. And I spoke to Satya at length before I moved to Microsoft. And he said, yeah, we're all in on open. I can tell you... He's not just saying that either. He's not just saying that. He's walking the talk. That's absolutely true. Inside Microsoft, we have green fields for patents, meaning it's super easy to open source in certain areas where the lawyers have reviewed that there's no infringement of potentially sensitive Microsoft IP. That's how far we are going to reduce the friction. So the due diligence internally has already moved down the path. Absolutely, right? We contribute extensively. The president of the Apache Hadoop Council, Chris Douglas, works on my team. We have many, many committers across the board. Linux: SQL Server and HDInsight, both are available on Linux. Can you believe that, right? A big, big fraction of cycles on Azure is in Linux, right? This is mind-blowing. And it's not just Apache. If you take R, we acquired Revolution R. We're a big backer of the R open source community. Jupyter notebooks, again, we are all in. We work very closely with the Jupyter open source community for notebooks; in particular, we're doing a lot of work with them in the area of large-scale data and its integration with Jupyter, right? Across the board, Microsoft today has a very simple policy.
Meet the customer where they are. If they want open, we give them open. They want Linux, they want Hadoop, we give them that. We will also give them some unique tools of our own, but the beauty of the Hadoop ecosystem in particular is the plug-and-play sockets, right? In a very principled way, we can say you can take it or leave it, you can mix and match, and that's working really well for us. And you said before that's disruptive. Yeah. This is part of the disruption; it necessitates that you change the way in which, in some respects, you make money. Can you describe that change? Does it shift to services, to infrastructure services, tooling? That shift has actually been very helpful in thinking about open. Because at some level, what we care about is getting our customers to be successful on Azure, right? Because if that works, at the end of the day, the revenues will take care of themselves, right? So that's disruption: eat your own children before someone else does, as they say. Partly, but here's the other part. Those children become part of an ecosystem, a community, right? Here's a very, very important insight that we have taken to heart. You don't need to do everything for everyone. If someone wants open source tools, they don't need to go anywhere other than Azure. They will find the very best Spark service, the very best Hadoop service, the very best Storm service right there on Azure. But at the same time, they will also get it with all the benefit of our scale, our simplification, our deep integration. For example, if you're running Active Directory on-prem, out of the box, it's integrated with our Active Directory on the cloud. So these are only-on-Microsoft-Azure kinds of enhancements to your open source tools, right? But at the same time, you can run exactly the same code you're running on-prem on, say, Storm.
But at the same time, if you want to come to the cloud and take advantage of some of the unique tools we are providing there, say U-SQL, which is the same tool we use internally for our big data processing, you have the optionality without making a new copy of your data. In fact, you can simultaneously be running two different tools for two different scenarios. And that, we think, is the beauty. We get paid all the same. It's all Azure compute, Azure resource consumption. So God bless you. We get paid. You segment it out nicely. You give people the choice and they can get scale; they can turn the scale knobs with Microsoft if they have Microsoft or need to grow. It's just good business for you. It's good for us. And if there's some part of the world where our tools give them a unique edge, they can get that while consuming whatever else they like side by side. It's not give this up to get that. You said in your keynote that Microsoft is a data-driven company. What does that mean, and has Microsoft always been a data-driven company, or is that new? You know, when I came here from Yahoo, remember, I was part of the Hadoop lifecycle there. I saw how it grew like a hockey stick. And I thought I was used to big data. I came to Microsoft and had to reset my calibration. Right? Big, big data. I mean, at this company, virtually everything is about: okay, and what does the data say, right? Every business, right? It gathers data and is informed by that data. The notion of data warehousing, if I use the word, is a pale imitation, right? What's really going on here is information production as an integral part of running the business, right? You release Windows and you gather data on how it's being used and where the issues are, and in near real time, you push out updates, right? That's not something Microsoft was able to do 10 years ago. Today, that's the way we do business. And that's one example. Bing, search ranking, ads, Skype.
Every single one of these, Xbox Live, every single one of these: the nature of our business relationships, the customer experience, everything is transformed by the ability to bank the data and then look into it deeply. It's amazing. Can I ask my security question? What should CIOs be telling corporate boards about security? What's changed? What is on the need-to-know list that the CIO should communicate to boards of directors? Compared to your typical web company, the enterprise CIO is a very, very well-informed person when it comes to security. So, frankly, they know an awful lot. Here's the one thing I'd say is changing. If you take the cloud historically, one of the arguments against the cloud was, do I really trust this core, core business data to someone else? Well, actually you do already. Your data centers are probably operated by people you're contracting out to. If you look at the public clouds, any issue or vulnerability is magnified 100-fold. So we take this seriously. We really invest here. So the bar for everything, not just security but data governance in general, the whole ball of wax, is going to get better and better, exponentially better. So that's an area where, if I were a CIO today, I would take a long, hard look at what my long-term strategy ought to be: how I can best leverage the combination of what needs to be on-prem, and there are many reasons for that, with what needs to be on cloud. And security is no longer a reason for not really thinking about the cloud. If anything, it's a reason for doing so. Yeah. You mentioned multi-tenancy. That's something that they'll deal with. Quick clarification. I noticed on Twitter that Arun Murthy, Cube alum, great, great friend of ours, said that you have 100,000 YARN servers, or 10,000. What's the number? I saw two numbers. So, right now we are playing around with Federation. We are starting to push it out.
We are right now at the tens-of-thousands scale. That's what we have tested, what we are working with. But we have no reason to believe it won't scale to the full clusters. That'll happen over the course of the next several months. When all is said and done, we'll be running YARN to manage hundreds of thousands of nodes. Okay, so up to 100,000, 10,000, okay. No, hundreds. Hundreds, yeah, 100K plus. Okay, so, another question, just to kind of change gears back to Satya Nadella. Talk about the conversations you have with him. Just share some color on what he's like as a person and the kind of leadership style he has, because I know he's really been all about the cloud. He sees the future. It's really obvious from the conversations we've had with him and also his public statements. I know he's coached as a CEO now and doesn't really give those forward-looking statements, but you mentioned you've got a green field of projects. Can you share some of the conversations that you've had with him, and can you give us a little bit of a telegraph of the kinds of projects that you'll be releasing? So, in terms of the types of projects, we have covered a lot of that, right? If you take open source of all stripes, Apache, Linux, things like Mesos, Docker, Jupyter, R, across the board, we are all in, right? We are consuming, we are contributing, we are integrating with some of our own offerings. If you take all of this, it's part and parcel of open in an even bigger sense. Almost immediately after Satya took over, we released Office on Apple and Android devices, right? It's all part of the same story. On Azure, we have support for Linux side by side with Windows. At the same time, Windows itself is a crown jewel for us, right? We are baking in the ability to run Linux as part of Windows. Things like Xamarin allow for cross-platform, right? Bottom line is, it's not an either/or for us.
We want ultimately to meet our customers wherever they are, whether it be on our platforms, other platforms, or a blend, right? That's the single biggest change I see with Satya, right? His ultimate goal is to make customers successful, because that means we'll be successful. We're not doing it because we are altruistic. Well, we hope to sit down with him sometime soon. Satya, if you're watching, we'll be seeing you soon. It's a transformation, and not just at IBM; Microsoft too. All the big companies are transforming. Even Oracle is all in on the cloud. You guys have absolutely transformed. So I've got to ask you a question. When I was a developer, I was part of the Microsoft Developer Network. Microsoft had really amazing success with developers in the old Microsoft. Now you have a new transformation going on. You mentioned a few of the highlights: open source, go open. How has Microsoft changed on the developer front? What have they kept in terms of that working formula? And what are they adapting to with the new open source world? It's obviously here. Share with the folks: what's the strategy? Obviously there are a lot of good things to keep, but some of the things are kind of old hat. What's the new stuff, and what do they keep from the old stuff? Bottom line here is, we want to make sure we are front and center with the developer ecosystem again. And if that means supporting the new world, the non-Microsoft world, as part of traditional Microsoft developer favorites like Visual Studio, yes, we'll do it. If it means making it easier to write once and deploy across these tools, yup. I mean, the Xamarin acquisition, which we are giving away free, speaks volumes there, right? That in a nutshell is the story. We don't care what you want to develop on, right? We want you to develop with us. And we will make it as easy and as cross-platform as we possibly can. Raghu, my final question. First of all, thanks for spending the time on theCUBE here. Share your insights.
They're in high definition here on theCUBE. Thank you. HD insights on theCUBE. Unintended. Final question. Yes. What is the future of data from your perspective and from Microsoft's perspective? Wear your personal hat and then throw your Microsoft hat back on. What is the future of data? What is this all about? Data is the future, my friend. I think virtually every single thing that goes on, every breath you take, every little thing that happens under a streetlight, the Rolls-Royce engine in your plane, your tractor as it goes by a line of plants, the thermostats in your house: at the end of the day, the thing that makes them be what they are is not the hardware, it's the data. The data they emit and the insights from data that they can use to tailor their responses. And that's playing out at every dimension: the individual devices, the larger business processes. People are beginning to realize that data is the pulse of everything they do. And the moment something becomes observable, you can also optimize it and do it better. Agile, data's the new developer kit. Okay, we are here live in Dublin, Ireland. Raghu, thanks so much. Raghu, CTO for Microsoft's data group. We'll be back with more after this short break. We're live here, this is theCUBE in Dublin, Ireland. We'll be right back after this short break.