Live from Midtown Manhattan, theCUBE's live coverage of Big Data NYC, a SiliconANGLE Wikibon production, made possible by Hortonworks, We Do Hadoop, and WANdisco, Hadoop Made Invincible. And now your co-hosts, John Furrier and Dave Vellante. Okay, we're back here live in New York City for the Big Data NYC event. We're here live. We're getting all the action in New York City, at Hadoop World, Strata Conference, all happening this week, a lot of action. This is theCUBE, our flagship program. We go out to the events, extract the signal from the noise, and talk to the thought leaders, talk to the pioneers, talk to the entrepreneurs, CEOs, customers. Get that data, share that with you, and want to thank all of our supporters out there for theCUBE. We really appreciate it. I'm John Furrier, with Dave Vellante, here in New York City. Our next guest is one of my favorite guests of all time on theCUBE, one of our top tech athletes, as we say: Amr Awadallah, co-founder of Cloudera. Welcome back. You've been on theCUBE so many times. You're up there with Pat Gelsinger and all the other tech athletes, but we did an interview years ago, three years ago, when you were kind enough to let SiliconANGLE sit in the Cloudera space when you had like 30 employees. You said a quote to me, I'll never forget, it was a really epic quote. You said, I started Cloudera because when I was at Yahoo, when I was doing my work as an entrepreneur, I saw the future, and I wanted to create the future. My first question to you is, you were right, the future is here, and the validation here at Hadoop World this year is really complete validation for Hadoop. You've seen the maturation, the ground's hardening around certain areas, certainly the data platform, the data hub, as you guys are announcing, is the modern platform. It's happening, customers are adopting, it's not just POCs, it's scaling.
So I got to ask you, how do you feel about that on a personal level, and then your observations here this week, given the history? Yeah, so obviously I feel great. I mean, absolutely right. I mean, in our pitch to investors back then, again, all of us, I'm one of the founders of Cloudera, there's a number of us. So yeah, we would go in to the investors and we'd say, it's Back to the Future. I mean, we're coming from the future, being at companies like Yahoo and Facebook and Google, we saw the future, we saw what it looks like. And we came back with, what was the name of the car from Back to the Future, the DeLorean? The DeLorean. We came back with the DeLorean, and we're telling you, this is how it's going to be five years from now for everybody, not just for web companies. Web companies just happened to be at the tip of the iceberg. They saw it first because of their businesses all being done on the web. But everybody, all of us across industries, we are moving bigger chunks of our business to be running online, right? We're moving more of our business to be running within servers. And we're instrumenting bigger parts of our business with new sensor networks, with GPS equipment, with RFID tags, et cetera, et cetera. So we are more sophisticated than ever in terms of data collection. And it doesn't take a lot of genius to figure out that that means you're going to have a mountain of data. And that means you're going to have many different types of data, which would require this kind of system. Yeah, and it's interesting. We also, years ago, when you were actually at Yahoo, when I last started a podcast, we did a podcast. You were one of my first guests there too. And one of the things that I noticed about your background, and your unique insight with the Facebook and the Google co-founders and Yahoo, was you also have a background in virtualization, and obviously you're gaming this.
But there's also a lot of physical infrastructure behind the scenes of big data that you had a window into. Hence, the virtualization trend has now exploded. So that also plays into it. So you have this confluence of big trends, right, that have happened at the infrastructure level. And then certainly, obviously, with Hadoop and storing data and making it accessible. How does that relate in this? Because we're hearing enterprise ready as a key theme. You guys have an enterprise edition. Absolutely. So talk about the confluence of this software-defined data center concept that's being promoted heavily out there. And what does that mean to the folks out there in the big data community? Because that's under the hood, right? So there's a lot of action on top of the convergence. What's your take on that infrastructure for customers and the industry? Yeah, we're starting to see the intersection of big data and virtualization take place. So we're starting to see the first signs of that. In fact, a story that we also discussed back then when we talked the first time is how Cloudera, we changed our business model, actually, of what the company does. Our name is Cloudera because the initial pitch, the initial business model that we were going to do, is build a cloud service where our customers can upload their data to the cloud, process the data, and then download the results. So we were going to be a cloud business, hence the name Cloudera. But then very quickly, we started realizing that, no, what most companies want to do, they want to be able to deploy these technologies within their enterprise. They don't want to move their data outside of their body. Actually, we would say that data is like blood. Data is like the blood that flows within the veins of the enterprise. And just like humans, nobody wants their blood to be outside of their body. We want our blood inside of our veins.
Now, and that's why we didn't see big data moving into the cloud or moving into virtual environments per se as well. That's starting to shift. We're seeing a shift happen now, where enterprises are getting more comfortable moving their data into virtual environments and into the cloud. It's just the beginning, just beginning to happen. And hence why we announced all these partnerships during this conference with companies like Verizon Enterprise, IBM SoftLayer, T-Systems. And there's another key one I'm forgetting right now, but that's OK. Pretty much everybody. Yeah, pretty much everybody. Yes, we already work with Amazon. Yes, Savvis, CenturyLink. Thank you. That was good. We're paying attention. We're paying attention. So you don't have to change your name to On-Premesa. Yeah, exactly. On-Premesa. So we're not going to do On-Premesa. We're going to stay with Cloudera. And essentially what we're saying is, yeah, we're essentially selling the tools that enable you to have a big data platform running on-premises, on-premises on virtual infrastructure with VMware, or OpenStack is the other standard we're supporting, and in the cloud with a number of vendors. So that's how we see the future. I know you were in customer meetings, so you didn't see all of Mike's keynote, but I think it was he who made the point that way back when we started Hadoop World, we talked about the sort of fundamental change in mindset of keeping the data where it is, shipping the code, et cetera, et cetera. So essentially that's not problematic that the customers have a lot of data. Because the point was once you get a petabyte of data, you don't want to move it. So I guess it's not a problem, because you've got a petabyte of data on-site. Great, and you want to do more stuff in the cloud. How will that sort of migration occur? You won't move the data, you'll just start doing more external analysis in the cloud and you'll have a federated data structure.
Is that what's gonna be- Hybrid is the future, the future will be hybrid. I have no doubts about that. There are some organizations that care a lot about security and care a lot about high performance, super high performance. So you can see that with, like, special financial institutions or medical companies, where for them it would be very hard to move the data into the cloud. So that might not happen for them, for all of their data. But I think the majority of enterprises will have a hybrid environment where either new projects will launch in the cloud or the older data will get moved to the cloud. So you're absolutely right in that data movement is expensive, both in terms of bandwidth but also people time spent. But over time that will get better. So is it the case today that a lot of the action is with super large companies, financial firms, big government agencies, et cetera? And this just opened, the cloud opens up the door for the little guys to now compete with the big guys. And it's always the story with cloud, right? Absolutely, so yeah, for a new company, when you start up a new business, they just start in the cloud. And that's very natural for that to happen. Obviously the question mark, the big question mark we had, is the big businesses that really have been around for tens if not hundreds of years, when they start shifting their data into the cloud. And that's what we're starting to see happen. Yeah, we've had some interesting talks around the whole security meme. And there are a lot of people who believe that security in the cloud is going to be better than security in the vast majority of enterprises. I am one of those believers. I am too, but if the customer doesn't believe it, I guess it doesn't matter. But do you see that perception changing? And especially on Wall Street, I don't see it changing, frankly. But in other industries you do see it changing, would you agree? Because it goes back to skill sets.
It's very hard to hire the kind of skill sets that can guarantee the security levels that you care about. So companies like Amazon, Google, they're able to hire essentially the top security experts in the world. Now a lot of the financial institutions, they have the top experts in the world when it comes to security. And they have much stricter requirements on not just electronic security, but physical security. Like who can actually go into the cage inside of the data center, scanning their fingers and doing full background checks on them and so on. And they just have levels that are way higher than what Amazon or Google can provide today. And they value that as part of their core assets. So that's why we don't see them shifting. But for the majority of other enterprises, yeah, of course, Amazon can do a much better job at security than I can. So that's absolutely part of this movement. So there's no question about that. So the other piece is, I remember you and I talked, it was several Stratas ago, when there was virtually no competition for you, and now there's all of this competition, which is great. It's what you expected, I'm sure, because of this huge, huge market. But one of the other things was that when competitors early on started to come in, they said, well, we're going to make Hadoop enterprise ready. You're a very humble guy. It was kind of off-putting to you. I could tell that. Well, we know a little bit about making Hadoop enterprise ready as well. So you've been marching down that path. Ultimately, it's turned out to be really good for the customer, right? So where are we in the state of Hadoop being enterprise ready? Feels like it's there. I think the proof is in the pudding. I mean, we have customers today running Hadoop in production environments, mission critical, 24x7, for two or three years. Ready. If that's not enterprise ready, then what is, right?
I mean, we have very large enterprises, Fortune 1,000 companies, running this system in mission critical production environments. What is the definition of enterprise ready if it's not that? Yeah, all right. You can talk about scale and security and blah, blah, blah. It's that. It's that. The proof is in the pudding. Can you actually do it? Did you do it? For how many years did you do it? And I think we have all of these check marks. So that's how I look at it, to answer the question. I want to talk about the platform side of it. Obviously, platforms are OSs. I mean, as we talk about a data operating system, you know, wherever you're looking at it in the stack, it's a data fabric, whatever you want to call it. But what you guys just announced with the data hub essentially is an operating environment, right? You're decoupling MapReduce, there's YARN out there. Different elements are coming in that can be, I don't want to say plug and play, but they're highly cohesive elements. And a lot of those white spaces that we've talked about over the years are filling in. You're seeing real time now developing fast, machine learning, a lot of cool tech is coming in. So give us your vision on the data platform, the data OS, the data hub that you guys have rolled out for the industry. I mean, there's a vision of what it should look like in a preferred state of harmony, because the growth is off the charts. People want more, they want more Hadoop. They want more integrated into their existing systems. You see SAP playing here, you see others coming in. So Hadoop is here. So what does that data platform, that OS, look like? Yes, thank you for that question. That's an excellent question.
So first, again, to give another aspect of how we predicted the future, if people were to go to YouTube right now and search for the word Hadoop, just search for the word Hadoop on YouTube, one of the very first videos that will show up is one of my talks from a number of years ago where we said where we're gonna be. We describe the data operating system, we describe how many applications will come into this ecosystem. So the way we look at this is, there's an existing market called the EDW market, the Enterprise Data Warehouse market. We are saying now there's a new market called the Enterprise Data Hub market, EDH, right? So there's EDW, there's EDH. Us, Cloudera, being the very first leaders of this movement, we are naming that. That is the name of our market, the market that we're in, as Mike shared in his keynote yesterday. So again, then comes the question: why do we need an EDH when we already had an EDW? What's really different here? And the analogy I love to use is, an EDW is like a digital SLR camera. You know the digital SLR cameras have very big lenses and they can take really good pictures. So what an EDW can do is take really, really good pictures, meaning it can run queries really fast. It's a bit pricey, but it's really good at doing that. However, it's the only thing it can do, just like the SLR camera. An SLR camera can only take pictures. Okay, maybe it can take videos too, but that's it. The vision for the EDH on the other hand, the EDH, the Enterprise Data Hub, is more like an Android device or an iPhone, right? More like an Android device, given the open source nature of Hadoop, it's more like an Android device, I guess. So an EDH can take pretty good pictures, right? It can take pretty good pictures. However, the EDH pictures won't be as good as the SLR camera's, just like the iPhone pictures are not as good as the SLR camera's, but they're very good and they're at a much more economical price.
In fact, if you were to ask a lot of people in a room to raise their hands if they took a picture with their iPhone versus an SLR camera, the majority will say we took a picture with an iPhone or with an Android. However, the main thing that differentiates an EDH is that the EDH is not just about taking pictures. It can do many, many other things. It can take pictures, but it can also run interactive queries. It can also do search. It can also do machine learning. It can also do statistics, et cetera, et cetera. It can play songs. It can run ads. It's important to know that you're referring to the data warehouse. Taking the metaphor back to the smartphones. Exactly. Yes, some of the apps will come from us. Other apps will come from companies like SAS, which now runs inside of the platform. Splunk has Hunk, I don't know if you guys have heard of it. Yeah, you know Hunk, right? So Hunk now runs in front of the platform and Informatica now runs inside the platform, because we know some apps we will be good at building, but the majority of the apps we will not be good at building. So I love this analogy, because you asked the question, if you have an EDW, why do you need an EDH? And then the flip side of that question, if you have an EDH, why do you need an EDW? Your answer answered that. Yes, because for some pictures. But I always say, the reason I love that is because I say which business would you rather be in? The SLR business or the smartphone business? And I guess for you, obviously, the smartphone business, but it does depend, right? Because there are some people who are in the smartphone business but they're not doing that well, so... Well, I think the analogy is the... The EDW business may be a safer place with good cash flow. So I'll let you do the projections into the future. Oh, the projections are clear.
The EDH market is going to be enormous and the EDW market is going to kind of go sideways for a long time. You can see from the attendance and the people at this event how big the interest is, but we're still way at the beginning of this movement. We are kind of in the first year of when the iPhone came out. Consumer devices obviously can grow at much quicker rates than enterprises. Enterprises doing major changes inside the architecture, it takes longer for that to happen. Now, the impressive thing about this movement is how quickly it is happening within the enterprise. Never within the enterprise has an infrastructure technology matured and evolved into how big this event is right now at this speed. It's just never happened before. It didn't happen with VMware. It didn't happen with Linux, with the Linux movement. It didn't happen with the database movement. Well, the interesting thing to me is that the average age of an enterprise app is like 19 years. There you go. And yet the value proposition for all this new stuff is so compelling that it's happening despite that. So I want to ask a question. We've got a little Twitter action here on the crowd chat. Someone said, I'm going to discuss and then I have another question. Amr, curious, what mission-critical workloads are they running? Your customers? What examples of mission-critical workloads? Well, we just tweeted, hey, you know, three years, it's enterprise ready. You mentioned we have customers running it. What mission-critical workloads are they running? Like an example of a mission-critical workload. That's a good question. I should have been prepared to give an answer. The reason why I'm not prepared is we work with a lot of customers like big banks and big telecommunication companies that are very secretive about exactly what they're doing with the technology. Especially the ones doing the mission-critical workloads.
But I mean, some examples that we use commonly: obviously we help the government today. We help the government in terms of predicting when the next terrorist event will take place or who is going to do that. So, I frequently would joke and say, I'm a victim of being, I'm from the Middle East and I have brown skin and I frequently would get stopped at airports just because I look this way. I would rather the government analyze my data and know that no, Amr is a good guy. He's not a bad guy. So I would rather have that. Unless they look at me and Dave. We have the data on these guys. They're the bad guys. So that's just one way our systems are being used in mission-critical environments. So we'll get back to Jim. But if I may, so I mean, we don't have to name names, but down here, I mean, certainly there's this risk, you know, credit risk. I can mention use cases. Yeah, use cases of course, like in finance, doing risk modeling for how we offer loans or how economies are actually going to be subjected to risk is a very, very key use case. Fraud detection in finance, so three of the top credit card companies, I can't say the names, but you can think who are the four top credit card companies. A few of those are customers of ours using this technology for doing fraud detection and detecting when a transaction is a fraudulent transaction or not. That's a very good example. One of the largest agriculture companies, which essentially researches new types of seeds that can grow in different environments under different conditions and temperatures, is using this technology to come up with these new types of seeds. I would put marketing into that mission critical list. People don't generally think of marketing and real-time ad serving as mission critical, but it is now, because it's driving revenue. And if you're not there, you're not competitive. So that's mission critical today. Absolutely.
Like, yeah, one of the most common use cases is doing all the analytics on Twitter and all of the online marketing, all the advertising happening. That's one very key. Hammerbacher complains about it, but there's good things that will come out of that. Yeah, we are more proud of the use cases that have social impact and that are changing our lives in a positive way, as opposed to use cases where it's just helping people. But maybe some interesting research will come out of that, some kind of socioeconomic. It's good for the economy, obviously, yeah. Of course. It adds the economy with lots of money. I want to ask another question. I didn't mean to get sidetracked on that, but Tim Crawford has got his answer. You know we love search, you know we love discovery. That's kind of dear to our hearts. So the BI market, essentially the modern advanced analytics, is essentially discovery-based. You mentioned fraud detection, it's essentially going out and taking data in, running algorithms, machine learning, whatever other techniques are out there that the computer sciences drive. So the search concept is keyword search, essentially you ask for something, you get something back. But now, with big data, it's a lot of personalization. You're seeing personalization, you're seeing the persona of one, whether it's marketing or fraud detection. How does the discovery of data, serving the end user, what's your vision on that? Obviously, you worked at Yahoo, so you have experience in search, and you know a little bit about the old web search. In the modern discovery, where it's real-time, event processing, graph data, things of that nature, it's all new, what's your vision on that? So first I wanna latch on to something very key you said, which is personalization. A very good, important, fundamental part of this movement is personalization.
It's the fact that now, no longer will we, as companies, organizations, and governments, look to classify you as one of a group of people. Like, build a segment that says you are a male in this age range, you live in this zip code, you must love to buy diapers, and I'm just making that up right now. So we tended to do that. That's the old way of doing things. What we are doing now as companies, organizations, sometimes we were right. Sometimes you would be right. Every now and then. Actually, most of the time you'd be wrong. Absolutely. Versus if you were really modeling what I, Amr, care about, what I, Amr, am doing, then you would come up with a much better answer for me that will be right most of the time, and you'll avoid essentially upsetting me every now and then. So a big part of this movement is actually about that. It's going away from modeling segments of people, segments of products, to modeling every single thing. And actually, that's what's driving big data, we're modeling every single thing. What's happening to my product, what's happening to my credit card transaction, is Amr a terrorist or is Amr not a terrorist? That's really how... We're getting the hook here. I know you gotta go, but I wanna get you to address some of the tech that you see happening and evolving. You've got some machine learning, you've seen some cool tech out there on the computer science side. What else do you see? What's your vision around some of the things that are gonna make this new discovery happen in this whole new way? It's not just crawling, pounding an index with a keyword, there's a lot more complexity involved. What vision do you have for that? Yeah, the key part going forward is how can we bring all of these capabilities together? How can we do the search and do the interactive SQL and do the statistics on the same underlying data assets? That's gonna be the key. That's the key in the future.
That's the key capability that now all of us as companies working on this enterprise data hub mission need to be focused on. So is that Spark, is that Storm, is that machine learning? Can you name some tech that you're watching closely? It's all of the above. It's all of the apps. It's all of the apps that you can bring to your data assets. How do you bring them together with a common management layer, a common security layer and, most important of all, a common metadata layer that allows you to make all of these apps leverage your platform. Some of them are systems like Storm, which we announced, I mean, thanks. Sorry, Spark, and Spark allows you essentially to do data mining and machine learning. That's what Spark is really good at, and we announced this partnership with Databricks. But just one of them. That's just one of them. It's not about that. It's about the whole. Again, it's going back to the- It's the totality of the ecosystem and the controllers. Exactly, going back to the power of the iPhone, the power of the Android devices is not that they can do maps. It's not that they can do calendar. It's not that they can do email and contacts and games. It's that they can do all of these things in a holistic, unified experience. That is the future. That's the operating system, the data operating system. Love it. Amr Awadallah, great to have you on theCUBE again. We're proud, four years ago we did theCUBE when Dave hadn't even heard about what Hadoop was. We came back from storage and came up and I made him, like, you got to come to Hadoop World. What the hell's Hadoop? Four years later, now he's a Hadoop expert. They can't, we're like a tick. We're embedded in it. We're like a tick in the Hadoop ecosystem. You can't get rid of us. This is theCUBE. Amr Awadallah, thanks for coming on. Thanks for your support. I was very appreciative of you guys and you personally for helping us out.
Even four years ago, when you guys were 30 employees. How many employees now, over 4,000? Almost, no, no, no, 500, 500. 500. Next year, 4,000. The new space looks good in Palo Alto. This is Amr Awadallah, co-founder of Cloudera, visionary, tech athlete, friend of theCUBE. We'll be right back after this short break.