I'm John Furrier with SiliconANGLE.com and SiliconANGLE.tv, here at IDF, the Intel Developer Forum, in the developer zone, software.intel.com. I'm here with Omar Treiman of Cloudera: a new era in computing, a new cloud era, the new data center era. Omar, you've been on theCUBE before. Welcome back to our mobile Cube.

Thank you. Thank you very much.

So tell us: Cloudera, you guys are a new, growing company, here with the old, established Intel, whose keynote today was really about a modern era. It was about mobile, obviously, and fourth-generation processors. You guys have really taken advantage of the whole Intel commodity hardware story, industry-standard hardware, whatever you want to call it, in massive scale-out, and open source has been a real force right now. So what's the scene here for you guys? Why are you here at Intel?

Well, I think there are actually two reasons. We see a lot of this changing how data centers are managed. The reason for the change is coming from mobile, from the explosion of processors. That means more machine-generated data, more log files, more events, more types and varieties of data. Those are creating pressure on the classic data center model, where you buy big iron that's special-purpose and can only do one aspect of, in this case, data management, data processing, or data analytics. The effect we're seeing on the back end is that within the data center, people want to standardize. They want industry-standard hardware, industry-standard components. You still get variations: spinning disk, which is going to be around at least a little while longer; flash, of course, which is going to take over as the next generation; 10-gig networking, now starting to standardize, with 40- and eventually 100-gig to come; larger volumes of memory; and typically Intel processors.
And then those actually change depending on the application you're deploying for. Even within Hadoop, even within big data management, there are different storage-heavy or compute-heavy application types. So now, from an IT perspective, instead of buying big iron that's specially designed for one thing, you get industry-standard hardware, and we still have to design it and tune it for the different applications.

One of the trends we've been seeing here is Intel moving from being a component player to being an actual data center player, and you guys have taken advantage of that trend by being a major player in the data center. Can you share your experiences, what you've seen in the data center and the cloud, private, public, or hybrid, where there's been architectural change? We've been talking about IO-centric infrastructure on theCUBE with you, and with Amar and Mike as well. What changes are you seeing specifically around architectures, SANs, and the impact of, say, flash and solid state?

Absolutely, and I think flash and solid state are pushing it forward. SANs are still in there to some extent, but really the move is away from separating storage and compute, which has been the classic way of doing things, toward using these standardized components to bring them together. Intel, of course, sells CPUs and networking gear, but a lot of the IO channels are now built into the servers. You just put the storage in the server, you put intelligent software on the servers, and you need a lot less separate storage. Storage and compute come together. There's a variety of use cases, it's not one-size-fits-all, but that's the general philosophy.

And so with that: we were excited to be at the first Hadoop World ever, a couple of years ago, and at that time Abhi Mehta was at Bank of America; he's now got his own startup.
He coined the term "data factories," and that was interesting at the time. We all said it was an absolute home run, we loved it. Can you talk about how that's evolved today, because we're seeing it happen? Specifically, talk to the audience about how you're seeing people organize their data.

I think "data factories" was very prescient about how things are evolving. Today we hear "data hub" or "data reservoir," but it's the exact same concept. Instead of data flowing freely throughout the enterprise, landing on file servers, getting shuffled over to compute servers, and then ending up, through an ETL grid, in a database, it's getting centralized on joint storage-and-compute architectures. By putting data in the same place you actually compute on it, and bringing different engines to that scaled-out, industry-standard compute-and-storage architecture, that's the evolution we're seeing. That becomes your data hub, your data factory, your data reservoir, where you have pristine data.

So in the storage business there's been a lot of talk about tiering and thin provisioning, which has been around for a while, and we've seen some great companies get started, get sold, and do great things around that. How has the storage business changed? You guys play in this area heavily with big data, and in a variety of capacities: data is closer to the server, you pull it in, you push the processors to the data, versus all kinds of different approaches. What have you seen in the storage industry specifically that's changing, and what's being hyped up right now? We're seeing a ton of companies saying they've got some Hadoop here, a little connector there. What's your take on that?

I think IT now gets to make a decision: do they want to store data for the purpose of storing it, or store data for the purpose of using it?
If they're going to store data for the purpose of using it, they're typically using something like Hadoop, a big data platform, to run it, and then they want to connect it with the rest of their data management infrastructure. When you install something like Hadoop on industry-standard hardware, nothing happens; it doesn't materialize anything on its own. It's like a spam filter or an email gateway: you actually have to connect it to the various data sources driving data in, as well as to whatever is consuming data on the back end. All that connectivity is critical. And that lets IT decide: do we want to move data through this pipeline, where we're actually using it to drive the business, or do we want to offline it onto tiered storage where it's not really accessed or used anymore?

So, I was showing you our little tool; we've been measuring the crowd, IT professionals and data scientists. We're seeing interest in Hadoop, and interest in MongoDB, also in analytics. Can you share with the folks how Hadoop and Mongo differ? Some people over here like Mongo better, and there are obviously different NoSQL databases out there. How does Hadoop differ from those products?

I think Hadoop's philosophy is that there isn't any single engine that's going to solve all problems, but that in a world where every piece of data is big and needs to be touched by a lot of engines, all those engines should live on some kind of unified storage and compute fabric. So I think Mongo is very useful; it's an important specialized engine and it solves a lot of interesting problems, primarily in the document and application-serving space. But when you talk about these deep data pipelines and advanced analytics, when you move into big data and real-time analytics, you actually need to stitch all of that together in one fabric and apply different kinds of engines on top of it.
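The split described above, a specialized engine serving the application front end while a unified fabric keeps all the history and computes over it, can be sketched in miniature. This is purely illustrative Python, not Hadoop or MongoDB APIs; the event fields and function names are invented for the example:

```python
from collections import defaultdict

# Sketch of the pipeline described above: a "reservoir" keeps every raw
# event forever (the Hadoop side), a batch job computes over that full
# history, and the result is published to a small "serving" view that
# the application front end reads (the specialized-engine side).

reservoir = []   # append-only historical store: the data hub/reservoir
serving = {}     # fast key lookup for the application front end

def ingest(event):
    """Land a raw event in the reservoir; nothing is thrown away."""
    reservoir.append(event)

def batch_recompute():
    """Batch pass over *all* history, then feed results back to serving."""
    counts = defaultdict(int)
    for ev in reservoir:            # scan the full history
        counts[ev["user"]] += 1     # aggregate: events per user
    serving.update(counts)          # publish to the serving layer

ingest({"user": "alice", "action": "click"})
ingest({"user": "alice", "action": "view"})
ingest({"user": "bob", "action": "click"})
batch_recompute()
print(serving["alice"])  # 2: the front end reads a precomputed result
```

The point of the sketch is the division of labor: because the reservoir never discards raw events, any new batch job can recompute over the full history, while the front end only ever touches the small, precomputed serving view.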
So can you give a specific example of where Mongo differs from, say, Hadoop? And what's this thing Storm that people have been talking about?

In order: if you look at Mongo, where we see people using it is typically on the front end, serving an application. Whereas if you're generating a lot of data, you need to compute over that data and feed it back into your application, and that part would actually happen on a Hadoop system, where you're keeping your historical data forever. Storm is being used a lot more on the real-time side, what we used to call complex event processing, where you're pushing data through but not storing it, doing the compute on the wire as the data flows. Again, it's a separate area within big data, but most of the data we see today is in this large data reservoir.

All right, give us an update now on the Cloudera front. How many employees do you guys have, that kind of thing? Quickly, don't go too deep on that; I don't want to spend too much time on it, because you guys are growing. Just get the numbers out there. And talk about what's going on in the Apache open source area, and the exciting things coming down the pike, if you can share a little about what's coming at Strata and Hadoop World.

Yeah, we are growing fast. I think we're at around 300 employees at this point, which, for a four-year-old company, is pretty wild growth. A lot of that investment, to your point, has been on the open source side: over a third of the company is focused just on building great open source software that people can use. We spend a lot of our time contributing to the Apache community; we do all of our development on the platform out at Apache, and then we use those Apache projects to build CDH. So you get a 100% Apache open source artifact, but you get it reliably, on a quarterly cadence, with minor versions every quarter and major versions every year.
Really cool features we've seen coming out: CDH4 now has HDFS high availability; there's extensibility within the platform; and coprocessors within HBase act like triggers for real-time events, so you can build much richer real-time applications. Those are some of the interesting things we're seeing on the platform.

So I've got to ask you the Hortonworks question, because to me you guys are like cousins. You both grew up in the Hadoop community. You were the first commercial venture-backed company; those guys spun out of Yahoo and got heavily backed by Benchmark; you guys are backed by Accel. So in a way you're cousins. And then you have the other guys: MapR, EMC and Greenplum doing some stuff, and a variety of other proprietary or old-school legacy vendors in BI and data warehousing. How do you tell the folks to make sense of those worlds? This is not so much Cloudera versus Hortonworks; what's the difference between you two, and then what's the difference between you and the other guys?

Well, there's a community that builds this open source software, and obviously Hortonworks participates; Yahoo still participates heavily with their engineering team; Facebook participates, Twitter does, a lot of the web companies do. In fact, a lot of our customers on the commercial side, non-web properties, are starting to contribute as well. Cloudera invests, as I said, a lot of our time in contributing and building out the capabilities of the open source ecosystem. The question is: once it's out in open source, what happens next? There are over a dozen different projects that make up the Apache Hadoop ecosystem. All told, there are over 50 different releases, across every branch of every project, every year, and that's very hard for the average company to track. They just want to consume software that solves their problem and is tightly integrated.
And so I think we have different philosophies as to how to approach that.

So talk about HBase. HBase is part of what I've called on theCUBE the holy trinity of Hadoop: you've got HDFS, you've got MapReduce, and you've got HBase. A lot of people have been criticizing HBase, saying it's a tailored suit: one use case, not very scalable, hard to work with. Mongo's been saying a lot of things over here. What are the true and false statements that have been kicked around about HBase? What are people missing about HBase, and how relevant is it in your mind?

HBase actually is pretty critical. It's modeled on what Google created with Bigtable, and today Bigtable powers a lot of Google's applications and infrastructure. We're seeing HBase power the equivalent things out in the real world, or at least outside of Google; I guess Google's the real world too. Facebook, for example, famously chose HBase to power their messages: every time you send a message on Facebook, that's an HBase application. And if you get point-of-interest data that doesn't come from Google, it actually comes from Nokia; they have a unit called Navteq that collects all that data, refines it in Hadoop, and publishes it using HBase. I think HBase touches a lot more people, in a lot more ways, than they realize. It's just in the back office, underpinning things.

What is it good for?

HBase is really great any time you need real-time atomic access to data: when you can tie a session, a user ID, or a user profile to a lot of variable data, which might be a catalog, a shopping cart, or a user session, and then bring all that together and continuously update and analyze it in real time as you're interacting with the user.

And how is that different from the other approaches, and why is it better?

Well, there are other approaches that are more classically relational, and HBase is not a relational database.
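The access pattern described above, rows keyed by a user or session ID, each holding variable columns that are read and updated atomically, can be sketched in plain Python. This is an illustrative toy, not the real HBase API; the class, method names, and row keys are all invented for the example:

```python
import threading
from collections import defaultdict

# Toy wide-column store modeling the HBase-style pattern: one row per
# user/session, variable columns per row, and per-row atomic updates.

class WideColumnStore:
    def __init__(self):
        self._rows = defaultdict(dict)             # row key -> {column: value}
        self._locks = defaultdict(threading.Lock)  # one lock per row

    def put(self, row_key, column, value):
        with self._locks[row_key]:                 # per-row atomicity
            self._rows[row_key][column] = value

    def increment(self, row_key, column, delta=1):
        """Atomic read-modify-write of a counter column."""
        with self._locks[row_key]:
            new = self._rows[row_key].get(column, 0) + delta
            self._rows[row_key][column] = new
            return new

    def get(self, row_key):
        with self._locks[row_key]:                 # consistent row snapshot
            return dict(self._rows[row_key])

store = WideColumnStore()
store.put("user:42", "cart:item", "laptop")        # shopping-cart column
store.increment("user:42", "stats:page_views")     # session counter
store.increment("user:42", "stats:page_views")
print(store.get("user:42")["stats:page_views"])    # 2
```

The per-row lock stands in for the guarantee that operations on a single row are atomic; there are no cross-row transactions in this model, which is one way it differs from a relational database.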
I don't think it was ever designed to be one. If you need relational database technology, you use a relational database. And there are other solutions that are more focused on storing documents or collections and indexing them in relation to the rest of the data. I think HBase is a lot more focused on discrete access.

Final question, then I'll let you go. Great content here with Omar from Cloudera, on the cutting edge of big data and the enterprise. Tell the folks out there what's happening here at Intel at this developer forum. What's your impression, and what's the vibe?

It's actually very energetic, very exciting, and very diverse. I've been impressed to see mobility, as we talked about, alongside things that are deep within the bowels of the data center, as well as a real application and solution focus: how do the different components that Intel manufactures come together to actually solve problems for real people? Again, Intel, like HBase, touches a lot of people in many different ways that you don't really realize, and now I think Intel is very focused on being able to tell that story and communicate what you use Intel for.

Okay, Omar, thanks a lot. This is John Furrier with SiliconANGLE.tv and SiliconANGLE.com, reporting from Intel Developer Forum 2012 at Moscone in San Francisco, California. We'll be right back with our next guest.