theCUBE at EMC World 2014 is brought to you by EMC. Redefine. VCE: innovating the world's first converged infrastructure solution for private cloud computing. Brocade: say goodbye to the status quo and hello to Brocade.

Okay, we're back. This is Dave Vellante with Jeff Frick. We're live here at EMC World 2014. This is theCUBE. TheCUBE is a live mobile studio. We go out to the events. We extract the signal from the noise. Jim Hauska is here. He's the director of technology at Classified Ventures, LLC, cars.com. Jim, welcome to theCUBE.

Glad to be here.

So cars.com, we were talking off camera a little bit. You guys were founded in 1997, right in the middle of the bubble, and you survived that bubble, unlike many companies. So first of all, congratulations. And great to have you on theCUBE.

Yeah, glad to be here. So Classified Ventures, cars.com, was founded in '97, as you mentioned. Essentially, it's a joint venture owned by several media companies: Gannett, Belo, Graham Holdings, the Washington Post, and the Tribune Company. The newspapers knew that there was a paradigm shift coming and that advertising was moving away from paper and into the digital realm. So they had the wherewithal to get together and come up with a plan to take their business digital. And here we are today, almost 20 years later.

So it's a joint venture between a number of media companies, is that right?

That is correct. Five media companies represent ownership in cars.com.

And it's privately held?

It is privately held.

Okay, and at one point, back in 1999 for example, it must have had, what, a 50 or 60 billion dollar valuation or something. Were you there at the time?

I was not. I've been here for about five years.

Okay, well, I was going to ask you how things have evolved and how things have changed. But five years is a long time.
So how have things evolved in the past five years? How is the whole venture working? Obviously you guys are very successful. Maybe you can give us some metrics, whatever you can share.

Yeah, we've been tremendously successful. I've been here since 2009, and we've seen double-digit growth every year in revenue, in operating income, and in site visitors, et cetera. From a technology perspective, what has changed is that in 2011 we went through an agile transformation, because we wanted to improve the time it takes us to bring product to market. We went from a waterfall model of 20 or 30 projects per year, with a lot of up-front planning and not that much success, to an agile methodology where we're probably doing over 450 releases per year. In our new agile framework we've been able to deliver superior products and services to market through our mobile and web applications.

Can you talk about your data architecture a little bit?

Sure. Obviously we're looking to leverage data as a means to competitive advantage. We have traditional enterprise data warehouse architectures, and we're looking at more modern data frameworks: in-memory caching and in-memory grid solutions, modern data fabrics, and Hadoop and HDFS as a means to collect and analyze data and make determinations in real time about how we can deliver customized content to consumers using our mobile and web applications.

People talk about structured content, unstructured content, semi-structured content. You've got a little bit of each, right?

Yeah. We have our traditional structured content, which we're housing either in Teradata or on the VMAX as a block platform, and we have unstructured content, specifically images. That's an area where EMC has been a tremendous partner, in terms of our deployment of Isilon.
If you look at our web applications, they're generating 4.7 billion requests for images per month. We're leveraging Akamai as a CDN partner to offload maybe 60 to 85% of that, but that still leaves our internal dynamic imaging infrastructure serving over a billion images a month. There are basically two components to that. The first is LiquidPixels as a dynamic rendering tier: let's say 15 nodes connecting to a shared file system on the Isilon S-Series, which is the origin for all of our image processing.

I think I heard you say before that you're dabbling in, or utilizing, Hadoop technologies. Is that right?

Yeah, we've recently started to look at some HDFS and Hadoop MapReduce deployments on commodity hardware, and we're also looking at what Isilon could offer in terms of decoupling the compute from the storage and being able to leverage the same data across different protocols: across object, across HDFS.

I was listening to Bill Richter the other day in one of the sessions, and he said that HDFS is the fastest-growing part of the Isilon business. So are you going to run HDFS on commodity hardware, or Isilon-based?

Currently we're doing some proof-of-value testing on a ten-node cluster that's running on basically 2U commodity hardware, just UCS hardware. But coming out of that, we're definitely going to look at whether Isilon makes sense as a centralized repository that we can tier in and out of based on our needs.

Yeah, the reason I ask is that we talk to a lot of practitioners, and they just use off-the-shelf drives, especially when they're doing sandbox types of stuff. So what would be the appeal of doing something like Isilon with HDFS, and what would be the potential drawback?

Simplicity, manageability, and quick time to deployment would all be appealing from an Isilon perspective.
And the CapEx is going to be a little higher. There's this traditional mindset within the Hadoop community that if you don't have data locality with the storage, it's not efficient. But if you look at leading-edge, web-scale companies like Facebook, they've already realized that they need to decouple the compute from the storage via high-speed top-of-rack switching, because they either had too much compute or too much storage, and they were purchasing excess capacity they weren't effectively utilizing.

Because they were buying it in blocks.

Because they were essentially buying it as an all-in-one compute, network, and storage 1U or 2U pizza box.

Interesting. Let's talk a little bit about agile methodology and how that transformed your company, not only in software development and go-to-market, how you're rolling out functionality, but in the way you operate. Has it changed what you do, the way you look, and the way you act?

I would say tremendously, both from an infrastructure perspective and from a development perspective. Before, there wasn't the worry about not being able to keep up with the demands of the business. Now there's the worry that we need to keep up with the demands of the business, because the business is driving top-line and bottom-line growth for our company, and as we make more money we invest more in technology, and the cycle continues. So we definitely needed to put in place innovative technologies that give us a competitive advantage in delivering our products to market faster.
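The image-serving figures Jim gave earlier (4.7 billion image requests per month, 60 to 85% offloaded to Akamai, 15 LiquidPixels nodes at the origin) lend themselves to a quick back-of-envelope check. The Python below is a minimal sketch for illustration only, not anything cars.com described running. It assumes a 30-day month and an even spread of traffic across the 15 nodes with no peaks, so the per-node rates are averages, not capacity figures.

```python
# Figures quoted in the interview.
MONTHLY_IMAGE_REQUESTS = 4.7e9  # image requests per month across web and mobile
ORIGIN_NODES = 15               # LiquidPixels rendering nodes at the origin
# Assumption: a 30-day month, used only to turn monthly totals into averages.
SECONDS_PER_MONTH = 30 * 24 * 3600


def origin_requests(total: float, cdn_offload: float) -> float:
    """Requests not served by the CDN, which fall through to the origin tier."""
    if not 0.0 <= cdn_offload <= 1.0:
        raise ValueError("cdn_offload must be a fraction between 0 and 1")
    return total * (1.0 - cdn_offload)


def avg_rate_per_node(monthly_requests: float, nodes: int) -> float:
    """Average requests per second per node (assumes even spread, ignores peaks)."""
    return monthly_requests / SECONDS_PER_MONTH / nodes


for offload in (0.60, 0.85):
    origin = origin_requests(MONTHLY_IMAGE_REQUESTS, offload)
    print(
        f"{offload:.0%} CDN offload: {origin / 1e9:.2f} billion origin requests/month, "
        f"about {avg_rate_per_node(origin, ORIGIN_NODES):.0f} req/s per node on average"
    )

# Offload level at which origin traffic falls to exactly one billion per month.
break_even = 1.0 - 1e9 / MONTHLY_IMAGE_REQUESTS
print(f"origin traffic stays above 1 billion/month while offload < {break_even:.1%}")
```

Across the quoted offload range, the origin handles roughly 0.7 to 1.9 billion requests a month. The "over a billion" figure therefore holds whenever Akamai absorbs less than about 79% of the traffic.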
And the other thing is you've got to have more flexibility in the way you grow that infrastructure. If you're not doing waterfall, with scheduled releases once every so often, but continuous rollout based on uptake and new features, reacting to real-time information about what's working and what's not, you've got to be able to flex up pretty quickly.

Totally. We used to have a very large Big Blue footprint in terms of hardware and middleware infrastructure, and we've been migrating to best-of-breed solutions. We pulled in EMC VMAX, with Cisco UCS as our compute infrastructure and VMware as our virtualization layer, and we're looking at various other VMware assets from a cloud orchestration and automation perspective. We've also deployed leading-edge technologies like Splunk and Compuware dynaTrace for log aggregation, data visualization, and tracking transactions from the end user's browser all the way to the back end. So we really have full visibility, not only from an infrastructure perspective but into how the end user experiences our applications. And I think that's a paradigm shift the storage industry is going through: we used to design storage from an infrastructure perspective, and now we're looking at storage from an application-centric point of view, with policy-driven automation and other things that provide value in this faster-paced agile product development lifecycle that's really necessary to compete in today's business world.

I wonder if I could pick up on that last comment. Taking the application view of the world is what you've always wanted, but the industry has struggled to get there. And to the extent the industry has been able to do that, it was all in application silos, with purpose-built systems for those silos.
Are you constructing an infrastructure that, for lack of a better term, is more horizontal, where you're able to support more applications across the portfolio, or are you actually purpose-building infrastructure for different apps?

It's probably a combination of both. As we talked about, we have some appliance-based solutions that address very specific needs, which are more tactical than strategic, you could argue, because you're choosing to solve things in a vertically integrated, stacked way. But I would say the overarching premise of our approach is horizontal scalability at all tiers. Traditionally, in a three-tier or four-tier architecture, you struggle when you get down to the data fabric tier, because it's not necessarily horizontally scalable the way, let's say, a web or app tier is. So we're really looking at how we can leverage modern data fabrics that create horizontal scalability, enabling us to abstract away some of the constraints associated with vertically scaled components of the architecture, such as OLTP-type components.

Jim, your company's growing very fast. Is your IT budget growing very fast?

I would say not as fast as the company is growing, but we have been able to make a lot of strategic investments in technology innovation that have made a lot of headway for us. Still, the demands of the business are always growing faster than your ability to support them from a financial perspective.

So does that mean your IT budget as a percentage of revenue is declining, if revenue is growing faster?

I don't know off the top of my head exactly, but I would say it's remained relatively stable, and we're even trying to find other ways we can take it down a little bit.
If we have a major purchasing cycle where we're bringing in large amounts of whatever it may be, the next year we may not have that same major purchasing cycle, because we have maybe a three-year asset life on whatever we just brought in.

So you still have the do-more-with-less pressure, but because you're so high-growth, you actually have the luxury of getting more funding than many IT organizations.

Yeah, I would say that. But it's still do more with the same, maybe not do more with less, and when the demands on you are increasing, it's essentially do more with less.

Okay, so if you had to slice up a pie, how much of your time do you spend trying to figure out how to do more with less versus how to drive business value with new initiatives?

We take the viewpoint that anytime we're looking at a new initiative, we need to know: what's the question, and what is the business value? How is this going to enable the business? How is this going to help us drive profitability, consumer engagement, et cetera?

Do you allocate some percentage to green-field opportunities, the classic 10% to go off and explore and experiment, because there might not be an immediately evident business value?

We have a really strong architecture team that participates in a lot of different POCs and exploration of new technologies. Generally speaking, we look to deploy the ones that we know are going to provide immediate business value, and we keep our eyes out for industry trends or things that are changing over time that we need to be aware of but that maybe aren't actionable at a given point. We try to spend a significant portion of our time on the strategic: how we get out in front of the business, or stay in line with it, versus being left behind.
So if I use the old META Group taxonomy, run the business, grow the business, transform the business: obviously a small portion of that is going to be transform, right? You can't have the whole budget going to transformation. But between run the business and grow the business, grow the business seems to be, I'm inferring anyway, a big part of your emphasis.

Yeah, a very big portion of our efforts relates to growing the business. For example, take the Isilon cluster we were talking about. We're serving billions of image requests per month, and we essentially had a legacy platform serving them that not only presented organizational risk from a DR perspective but was also very difficult to manage and wasn't performing. When we brought Isilon in, we were able to reduce our core image-serving processing time during peak from one second to 100 milliseconds, and during non-peak traffic from 250 milliseconds to 100 milliseconds. So now we're serving at 100 milliseconds across the board, which is essentially a 2.5x to 10x improvement in performance. Not to mention the manageability and scalability are just way, way better than what we had before. We used to dread it; we were spending a lot of time managing our storage instead of getting insights out of our data.

When you say you're bringing Isilon in, are you bringing in a box, are you bringing in a solution, is it a combination? Are you having to build that solution, or is EMC helping you build it? How does that all work out?

In this case, we have a relatively simple three-node S-Series cluster in production with roughly 250 million images on it, maybe 150 million of which are active. We have roughly 20,000 dealer customers, and as dealers update their inventory, they want to get that inventory off their lot as soon as possible, because they're paying interest on it.
So they want the pictures of their vehicles on the site as quickly as possible. Right now it's around 35 minutes in terms of time to glass, until those images come online. So we were really looking for ways not only to improve the speed at which we get things online but also to make the customer experience better. You're not going to go out and make one of the more expensive purchases of your lifetime, a car, without being able to see a lot of detailed images that entice you to go to the dealership and test drive that vehicle.

I'm curious: with the whole big data trend and lots of different data sets out there, have you started to bring in any third-party data over time to enhance your solution?

Yeah, we use a variety of third-party data aggregators that pull in data from dealer management systems; DMI and ADP are examples. They have access to all the dealer management systems, and as dealers update their inventory, we have a feed process that pulls it into our systems. That's another area where Isilon provided a lot of value: we used to be able to process around 80,000 images an hour, meaning downloading them from the dealer aggregation services, and now we can do almost 300,000 images per hour. And in all likelihood those images are of higher resolution and quality.

How do you protect all this unstructured data?

For our images, we leverage replication from our primary data center to our secondary data center, which serves as both a DR site and a staging and performance testing environment. So we try to rely on replication as much as possible, either at the application layer or the infrastructure layer, but we also have inline deduplicated backup solutions, Data Domain, Symantec backup, snapshotting technologies, and other things like that.

Jim, thanks for coming on theCUBE.
I'll give you the last word here. Talk to your peers. What advice would you give them if they're struggling with this multi-structured data problem? What should they be most focused on?

I would say look for a very specific use case that's going to drive business value, then be a champion and drive that business case through. Once you see successful results and you're gaining the benefits, you can use that to cascade into your next endeavor. Start small and iterate. Don't try to eat the elephant in one bite.

All right, Jim, we'll leave it there. Thanks very much for coming on theCUBE and sharing your insights. Keep it right there, everybody. Jeff Frick and I will be back with our next guest. This is theCUBE. We're live from EMC World 2014. We'll be right back.
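The Isilon before-and-after figures Jim quoted are easy to sanity-check. Core image-serving time dropped from one second to 100 ms at peak and from 250 ms to 100 ms off-peak, and dealer-image ingest rose from roughly 80,000 to almost 300,000 images per hour. The Python below is a minimal illustrative sketch, not cars.com's tooling. The one-million-image backlog is a hypothetical batch size chosen only to make the throughput change concrete.

```python
# Before/after figures quoted in the interview for the Isilon migration.
PEAK_MS = (1000.0, 100.0)            # core image-serving time at peak: before, after
OFF_PEAK_MS = (250.0, 100.0)         # same metric during non-peak traffic
INGEST_PER_HOUR = (80_000, 300_000)  # dealer images downloaded per hour ("almost" 300k)


def speedup(before: float, after: float) -> float:
    """How many times faster the new system is (old latency / new latency)."""
    return before / after


def latency_reduction_pct(before: float, after: float) -> float:
    """Share of the original latency that was eliminated, in percent."""
    return (before - after) * 100 / before


def hours_to_ingest(images: int, images_per_hour: float) -> float:
    """Hours to download a batch of images at a steady hourly rate."""
    return images / images_per_hour


for label, (before, after) in (("peak", PEAK_MS), ("off-peak", OFF_PEAK_MS)):
    print(f"{label}: {speedup(before, after):.1f}x faster, "
          f"{latency_reduction_pct(before, after):.0f}% less latency")

# Hypothetical batch size, for illustration only.
BACKLOG = 1_000_000
before_rate, after_rate = INGEST_PER_HOUR
print(f"ingest: {after_rate / before_rate:.2f}x throughput; "
      f"{BACKLOG:,} images take {hours_to_ingest(BACKLOG, before_rate):.1f} h "
      f"before vs {hours_to_ingest(BACKLOG, after_rate):.1f} h after")
```

A uniform 100 ms response works out to a 10x speedup at peak and a 2.5x speedup off-peak. Ingest throughput rises by about 3.75x.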