Live from Boston, Massachusetts, it's theCUBE at the HP Vertica Big Data Conference, 2014. Brought to you by HP with your hosts, John Furrier and Dave Vellante. Okay, welcome back, everyone. We're live in Boston, Massachusetts, at the HP Vertica Big Data Conference. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE, and I'm joined by my co-host Dave Vellante, co-founder of wikibon.org. Our next guest is Jordan Chernev, who's a system architect at Wayfair.com. Welcome to theCUBE. Thank you for having me, guys. Great to have you on. We want to talk about some data in action, about how you guys are using the technology to change your business. First, explain to the folks: what is Wayfair? What do you guys do? So Wayfair is an e-commerce platform for home and office furniture. It's been around since 2002. It actually didn't use to be Wayfair. It was CSN Stores at the time, which was about 200 different websites. In 2011, we had a big strategic initiative to consolidate all of these stores, and once we did that, we combined everything under the flagship umbrella of Wayfair.com. We also have three more sites in our portfolio: JossandMain.com, AllModern, and AllModern Baby. How many brands did you guys consolidate into Wayfair? What was the number? I think it was about 200 plus. 200 plus, and they all had separate, siloed architectures, different sites, different technologies? Yeah, absolutely. Customer-facing, web-based, right? Yes. One of the main reasons we wanted to do that is that we wanted to provide a unified customer experience for shopping across what were different websites at the time. We wanted to build a platform where people can just go and say, hey, I'm looking for this particular type of furniture, be it home, office, business, everything. That was the main driver.
Yeah, I didn't even know you weren't always Wayfair. To me, it was always Wayfair, and you guys obviously specialize in anything home, really, and, say, office. So how do you position, compete against, and differentiate from the big whale in the retail business, which is Amazon? I presume, like everybody else, you've got an Amazon war room. I wonder if you could talk about that a little bit, because that's a real driver of your business and a pressure on you as an infrastructure pro. Yeah, absolutely. Everything that we've been doing as a company since the early days has been very data-driven, very data-focused. Every single business decision that we make as an organization relies on data. We have this concept of data democracy internally, which touches a little on the concept of self-service BI. We believe that every single person from any part of the business, be it an engineer or somebody in, say, category management, has actual, tangible access to all the data and all the metrics that we have collectively as an organization. That type of self-service BI drives everything from very small, department-level decisions all the way up to strategic initiatives. It helps us drive growth across the board, and we've been growing 50% year over year since 2002. How does a business user interact with the data? Is it through some kind of visualization tool? Is it through spreadsheets? Is it through a system that you've developed? Talk about that a little bit. I love that concept of self-service BI, but I'm a skeptic. Definitely. It's actually been a platform that has been in development and constant evolution ever since 2010. Before that, our data sets did not include anything in terms of big data because, A, we didn't yet have the organizational need to capture that information, and B, we hadn't started specializing our data stores.
So around that time, we started looking into more specialized technologies for analytics. At the time, we looked at IBM Netezza, because that was probably one of the flagship technologies of 2010. MPP was just starting to become a thing; you started hearing people use it as a term, and we started looking into that. As we progressed and other technologies became more and more prevalent, we started looking at things like Apache Hadoop, and at more specialized, column-oriented stores like HP Vertica. We were looking to elevate our game from an infrastructure standpoint. Now, going back to your question of how business users interact with our data: that is very Excel spreadsheet-driven. We also have the Microsoft Analysis Services web stack, and we have a visualization tool in the form of Tableau, which we think is probably one of the best out there in the market. Okay, so you've developed a capability to essentially put data into a spreadsheet so any business user can use it and manipulate it any way they see fit, and then the users are trained on Tableau as well? I want to clarify a little bit in terms of the Excel spreadsheet. Excel spreadsheet manipulation and interaction is about 90 to 95% of all use cases for every single business user at Wayfair. The exceptions are things people look at as dashboards: we have specialized dashboards for particularly interesting business metrics that are real-time or near real-time, and people go to those dashboards for them. We leave the design of those dashboards to our specialized business intelligence teams, and those guys are doing a great job of creating and maintaining them, but the actual business users are the ones interacting with those dashboards at the very end.
I see. And then I wonder if you could talk about, you mentioned Hadoop: did it start with Hadoop, or did it start with the Netezza enterprise data warehouse platform? And why did you move off of Netezza to Vertica? What was the driver there? Yeah, so it's actually an interesting story, and I'll walk you through the evolution of our platform a little bit. We had a need to capture unstructured data in very large volumes at very high speed. I think that's what most people would refer to as a traditional clickstream data set. We were basically trying to understand: how are people interacting with the website? How can we do things like A/B testing? How does this impact revenue? What are the next business features or priorities that we want to focus on in terms of making the website more usable and more friendly? How do we actually help customers find the things they're looking for? So Hadoop came in around that. We wanted a much deeper understanding of how people were interacting with the website itself. So we started augmenting the existing IBM Netezza infrastructure with Hadoop, and at some point, maybe 12 or 15 months in, we realized that we had a tremendous amount of data growth to deal with, and, based on how fast the business was growing, we were looking at vastly outgrowing our existing MPP infrastructure. That's when we started asking: how do we make the next leap, both in strategically positioning Wayfair and in making our analytical stack better and faster for the people who use it and the people who work on it? So your existing platform just couldn't handle the volume? How much data were you talking about?
At the time we were looking at about 30 terabytes, and this is just purely structured data. Right now we're at about, I would say, 60 terabytes, and that's growing day by day. Obviously, when you're growing, your graph of growth goes like this. But it's not an enormous amount of data, right? So what was the bottleneck? The bottleneck was more that we were trying to solve for speed: delivery of data, how fast you can turn the actual data point that gets captured on the website into a business insight that people can use to make strategic decisions on a daily basis. So time to delivery was where we wanted to focus, and things like real-time or near real-time analytics are a better fit for a technology like Vertica. So you compressed that time to delivery. Absolutely. And the Hadoop infrastructure that you're using, is it open source Apache Hadoop? Is it a vendor distribution? We looked at different options at the time. Right now we're using Cloudera, but that doesn't mean we're not looking at other alternatives. At some point we'll probably re-evaluate and maybe think about... Are you paying for it or are you using a free version? We're using the open source version of it right now. So you guys have a lot of funding; you raised over $300 million in venture funding. Revenues are awesome in terms of the e-commerce piece. We had Jim on earlier from the United States Post Office. E-commerce is the hottest trend. Retail is booming. Big data on the web has certainly been around: predictive analytics around ads and all that stuff. That's great. But I've got to ask you about the next wave that's coming, which is the mobile surge. How are you guys dealing with mobile in particular?
Because that's certainly going to be a traffic driver in terms of user experience, growth, traffic, and also user acquisition. How are you using the data in the mobile use case, both for user experience and for that collective intelligence, those predictive analytics decisions? Absolutely. To give you guys some background, we already have a couple of mobile apps. One of them is the Joss & Main app, which allows people to shop our flash sale site. We just launched the Wayfair app a couple of months ago. So we already capture that data to an extent. Right now we have a project to collect and integrate that data into the overall data set that we already have. We don't want to think of that data in a vacuum. We want to think of it as just another channel: how can we look at those things holistically across the board? That's one of the next projects we'll be working on in the fall. And how far along is that mobile initiative? Is it a concept? Is there some penetration out there with users? I actually don't have good numbers to give you, but I know that the Joss & Main app has been around for about 18 months to date, and I know we have pretty good penetration in terms of the market. The demographics tend to be on the younger side with mobile, especially for news and content, with things like BuzzFeed and the social networks, but you're still not seeing that conversion with the older folks. Do you guys tend to skew north of 30 years old in terms of demographics, just because of the products? I would say the biggest groups, at least going by the Joss & Main metrics that I know, are usually between 30 and 45. That's my wife, more or less. She loves Wayfair. Does she use it? Yeah. They have a ton. They sold 10 million couches, is what I heard? Is that the story?
Yeah, we have a product catalog of five million-plus products as of today. What's the biggest surprise that's caught you from a technology perspective? It could be good or bad, something big that you didn't expect to happen in your career recently. I would say it's the rate of growth. We're outgrowing solutions super quickly, and we always have to be a lot more proactive in terms of how we think about the next 18 months. Usually when you think about those things, most people say, hey, we're going to build this and it's going to last us maybe three to five years. Our lifespan is way more contracted, and you always have to build new tools and look for how to make the next leap. How do we grow 10X, 20X? How diverse is the data? Presumably you're combining multiple data sets, certainly more than two. Specifically, how diverse is the data, and what about data integration challenges and data quality challenges? How are you handling those? Yeah, so you're touching on something interesting, a very traditional problem in data warehousing, which is integration. We have our own custom-built tool that extracts data from relational databases and loads it automatically, on the fly, into our Vertica platform. We do the same thing with the bigger data sets that we have within Hadoop. It's an interesting challenge, definitely, but I think we're in a good, stable position where we're mature in that process. In terms of diversity, we're dealing with very traditional business problems that have been around for a while, which are central to enterprise data warehousing, like BI. The challenge is how to take a traditional, structured data set and a new one, like a clickstream data set, and combine and enrich them. We're also looking at other options, like network traffic: how can we predict specific network failures?
How does it relate to revenue? The biggest challenge with those data sets is improving the signal-to-noise ratio, because we have way too much data; what is the meaningful piece of it? You need to extract the signal from the noise. That's what we do here at theCUBE. We want to follow up with you, George. It's been a great interview. Again, this is a great example here at the HP Vertica event of how the big data impact is really not about the vendors but about people using the data. You guys are a great example of e-commerce and retail at scale, and just beginning. The revenues are good, you're funded, great valuation, success, but the work is just getting started with mobile. I can only imagine the internal conversations going on around that. You have networks and all that good stuff with virtualization. So congratulations, and thanks for sharing, George. And here inside theCUBE, Wayfair.com: a great example of consolidation of brands into one global platform, unification. Really, this is the future. You're watching theCUBE. We'll be right back after this short break.