 Furrier, we have a great guest now from Edmunds.com, right? Gregory Rokita, is that how you pronounce your name? He's the director of software architecture at Edmunds. You guys know Edmunds, right? Every time you go buy a car, you hit Edmunds, and you try to figure out what the invoice cost is, what other people are paying, what features you want, mix and match. It's a fantastic website. So welcome to the Cube. Thank you very much. It's great to be here. Yeah, so get nice and close here so the audience can hear you. So now, let's see, Gregory, why don't we start? Tell us a little bit more about Edmunds and what your role is there, and then we'll get into what you're doing with Hadoop. So Edmunds is the largest website to empower automotive consumers to make decisions. We provide a plethora of information about cars and vehicles: make and model information, options, colors, and pricing. So when people go to our site, they're able to find the best deal. We can direct the consumer to the dealers, and we try to help them as much as we can in their purchasing decision. Right, and Edmunds makes money, what, through a combination of advertising and referrals? Advertising, referrals. And we're pursuing right now a third tier, which is direct dealer relationships. So that's a big effort that we're currently trying to pursue. As I say, it's a very useful service; many of us, if not most of us, use it. Talk about your data challenge. So we had a really great talk yesterday. We had a lot of great questions afterwards. And the message that I try to communicate to people is that the technology changes so rapidly. And also, the other fact is that the new data stores and the new applications come up so rapidly that you have to position yourself to take advantage of those things. So when Hadoop and HBase came along and we decided, well, we need to use them for our business analytics, we were like, how do we do it? 
But we had already invested so much time in our publishing architecture, which allows us to move the data from the source systems to the destinations, that we were able to do a plug and play. So we didn't have to invest a lot of effort into taking advantage of the new technology. So I would give this advice to others: take care of your integration. Take care of the system that allows you to move the data easily within your enterprise. Take us through that. I mean, how did you do that? So obviously, you have an existing system. And one of the beautiful things about Hadoop and HBase, as you mentioned, is that it's new. I mean, a couple of years ago, when you architected your system, you had other technology. So take us, walk us through that process. So even before we started using Hadoop and HBase, we tried to separate our data sources from our destination applications. So instead of using databases on the website, we tried to use more modern technologies, like Coherence and Solr. And instead of doing a tight coupling between our source database and our destination, which caused a lot of problems, we invested a lot of effort in our publishing infrastructure. So we were able to publish our source data independently into different destinations. So when Hadoop came along and HBase came along, we just pretty much reused that system to populate the data into those destinations. So you heard from Mike Olson and others, Amr Awadallah on the Cube yesterday. Basically, one of their design objectives was that you could drop Hadoop into your existing IT infrastructure. Exactly. And I heard that. And I said, hm, I wonder if that's real. So you're a practitioner. Yes. And that just sped up our development dramatically. I mean, before, we used to spend months developing reports. And now it's a matter of weeks. So we spent a lot of effort to make it generic, in a way where, when a new data set comes along, we can just pull it easily into Hadoop and HBase. 
So part of the effort was, how do we make our development agile? And a lot of people say, well, data warehousing is not really agile. It's really hard to apply. It's kind of arcane technology. But once you bring Hadoop into the picture, you realize that your data warehousing and business intelligence becomes kind of like another regular development platform. And you can take advantage of a more agile methodology. So you're saying, in a way, that Hadoop has agile-ized your existing EDW? Exactly. Can you give us some more detail on that? Because I have the same reaction: enterprise data warehousing is cobbled together. It's slow. It's hard. It's painful. I'll tell you another anecdote. So we're actually hiring people. And we had this job description that had a lot of ETL terms and kind of data warehousing. And we didn't get much traction. So we're like, let's just rewrite it a little bit. Let's add a little bit of spice to it. So we put in big data, Hadoop, and all of a sudden we get all these resumes. And it's the same job. So it's like, you know, those little things, image sometimes, but it's not just that. It's a cool factor with Hadoop, because people want to be on the cutting edge. But we talked with the guy from Hadapt, from Yale University. He said it's really cool because you can talk about theory and then build it fast. So people want that in this new market. Yeah, exactly. How many people did you hire? Well, we're just getting a lot of resumes. So we're going through the process. But you're lucky. A lot of people are trying to find that. And it's hard to find these data scientists now. And are you finding the same kind of challenge with data science? For us, it's about trying to align the data warehouse with other groups within the company that have already moved past that. And we're able to take advantage of the agile approach and the design thinking approach. 
Edmunds is kind of trying to stay on top of the latest and greatest, not only in the technology, but also in the way we do process. So for us, it's not just the technology, it's also the process. And if you structure your process in the right way, you can take advantage of the technology in a better way. What are some of the benefits you're getting out of this new system on the analytics you mentioned, with HBase and the data? The other great difference compared to the previous system that we used is just the fact that because we can load the data so much faster and we can query the data so much faster, we allow our business analytics to be way more productive. So we integrate our Hadoop with Netezza. And what that allows us to do is run queries that before couldn't even finish. It's not just that they were taking a long time; they couldn't even finish. So our business analytics is way more productive. We can deliver reports way faster than we did before. Can you describe what wasn't delivered, or where you couldn't get the queries? What specifically were you doing? Sure. So previously, we were using Oracle to a greater extent, Oracle RAC. And there were some, I mean, we get a lot of data, especially from our logs. It's billions of rows a month. Well, maybe not billions, but close. So once the data grows to a specific size, I mean, certain queries will not be able to perform or even finish. So by our integration of Hadoop and Netezza, we were able not only to bring the load and query time down, but also to execute queries that were not possible previously. How much data are you managing? We currently aggregate about two terabytes a month. So we aggregate all of our structured data, like vehicle data, vehicle information, pricing, dealer information, inventory, and also unstructured data, which is the data that comes from clicks from the website. And also leads, so the referrals to dealers. So we're talking hundreds of terabytes. Yeah. 
And another thing is that we were able to bring all of our data sets into one data store, which is HBase. And previously, that was not possible for the company. We have Solr. We have Coherence. We have relational databases for structured data. But the flexibility of HDFS and HBase allows us to bring all of those structured and unstructured data sets under one roof. And it allows us to develop applications that were previously not possible, because we can relate all this information. We can correlate the lead information to the dealer information, to the clicks information. So it's a really powerful concept, I would say. So I'm interested in this relationship between the traditional enterprise data warehouse and these new emerging Hadoop applications. Do you see them as growing together? The gentleman from JPMorgan Chase yesterday showed a very interesting chart. And there's that debate over what that's going to look like in the future. What do you think it's going to look like? Give us your prediction. I can't really predict the future, but I know that new technologies will be emerging. And going back to my previous point: position yourself for the fact that you will have to integrate with new technologies. So it's possibly going to be Hadoop and HBase, but it's quite possible that another startup is going to come along and develop something great. And first, don't try to predict your message. But predict that change will happen. I mean, that's the best way to position yourself. And prepare for that. Make your systems easily integratable with the new and future applications. What's your take on this event, Hadoop World 2011? It's pretty dynamic. So I want to get your impression of kind of what's happening here, but also talk about how you guys are staying on top of everything. I mean, we're seeing new stuff come out like Mahout and other components. You've got Hortonworks competing with Cloudera. And you've got to run your business. 
You've got a good solution with Hadoop. So what's the vibe here? What's your opinion of the show and the ecosystem? And then how do you stay on top of it? I mean, I think it's great. I mean, as you said before, it's going more mainstream. People are taking notice from different industries. They're interested. They ask really great questions. The nice thing about what Cloudera is doing is that all those components work together. And we don't have to worry about versioning issues. So we're using, for example, Oozie for our job coordination. And it works way better than the systems that we used before, like OEM and cron jobs. So I think the fact that there's so much momentum adds to the value, because developers feel like they're learning systems that are useful, that they can kind of rely on in the future. Greg, we've been having a discussion this week about, of course, we love the marketing wars, right? And Hortonworks comes in, and EMC last May. And generally speaking, I think our conclusion was that Cloudera is open, obviously, open enough. Now there's a management console, which is not open. It's a proprietary console. And there's a discussion that we've been having. And I'd like you to weigh in as a practitioner, which is the lock-in question. So the theory, the premise, is that at scale, if your processes are tuned to or tied to a management console that's proprietary, your switching costs are going to go up. Does that concern you as a buyer? In practice, Edmunds is actually currently not using those tools. We actually developed our own in-house Chef provisioning, which allows us to manage our cluster. So it's still open source. I don't find it to be too concerning myself. I'm kind of thinking about the way Cloudera put themselves in a position where the easier you make it for people to use your system, the less incentive they have to use your other solutions. 
So they're kind of in a little bit of a position of, I wouldn't say uneasiness, but it's a tricky position to be in. And they have to be careful how to position certain products to be open source versus not. But I guess it's their problem to solve. And I'm sure they'll do a great job of doing that. If I understand your point, if a sales rep comes knocking on the door and says, hey, will you buy this? You'll say, well, no, I'm getting all this stuff for free. And I have my own homegrown management system. I'm good. So Cloudera is not currently deriving revenue from Edmunds, is that correct? Currently not. But yesterday there was this talk on WibiData, which I believe is one of the Cloudera projects. And they try to add value. And someone asked about licensing costs. And they haven't figured out yet what the licensing for that component is going to be. But it's going to be part of the Cloudera distribution, which gives you the advantage of versioning and all this stuff. So I'm sure they'll figure out a way to provide additional value to people and, at the same time, make money. Yeah, absolutely. So OK, well, we're here with Gregory Rokita, who's with Edmunds.com, talking about the use cases. My last question is, you've given some advice to users. But let's say you're a practitioner that really hasn't gotten into Hadoop at all. You're maybe here. I think Mike said two-thirds of the people here were actual users. Let's say you're one of those who's just kicking the tires. What would your advice be to people who are just getting started? I think at the beginning, the cost of entry might seem kind of overwhelming to people. Because even simple things like setting up a cluster might be too much for certain people. So if you have a strong technology team, I would say go for it. You might need some help at the beginning from outside. But I think it's really worth taking. 
So my advice is to be open-minded and take advantage of the fact that you will have different use cases and different applications for different purposes. So position yourself in a way where you can take advantage of that. Excellent. Gregory, fantastic use case. I really appreciate you taking time and coming on the Cube and sharing with our audience what's going on at Edmunds. And good luck with your projects. Thank you very much. Great to meet you. Great to meet you. Thank you.