Live from Boston, Massachusetts, extracting the signal from the noise, it's theCUBE, covering HP Big Data Conference 2015, brought to you by HP Software. Now your hosts, John Furrier and Dave Vellante.
Okay, welcome back everyone. We are here live in Boston, Massachusetts for HP Big Data 2015. I'm John Furrier, with my co-host Dave Vellante. This is theCUBE, our flagship program where we go out to the events and extract the signal from the noise, and again, a special presentation for HP Software's Big Data event. This is the third year now we've been doing theCUBE here. Great event, all about big data. DevOps meets infrastructure, meets software, meets big data, all of that coming together with all the heavy hitters. Our next guest is Jordan Chernev, Manager of Data Technologies at Wayfair. Last time we talked to you, you were pre-public; now you're a public company. You guys are growing like crazy. E-commerce. Welcome back to theCUBE.
Yeah, thank you for having me. Good morning everyone. Yeah, it's kind of funny how much things have changed over the past 12 months when you look back and reflect. Hey, what are the things that are still happening? And it's just tremendous growth across the board. The more time that we spend, I find myself reflecting: hey, what are the things that we've done over the past few years? And it's literally been just grow, grow, grow. And what does that mean for a person who is in the DevOps, infrastructure part of Wayfair? It means, how do I get ahead of technical challenges so I can solve the business problems in terms of analytics?
You've got a gun to your head, you're under a lot of pressure. In all seriousness now, scale is huge, right? We heard Mike Stonebraker up on stage talking about how data science is the career path, but you can't just become one; learn statistics. And what we're seeing on the app side is that apps are the workload. They're dictating policy to the infrastructure.
So to be large scale, high growth, you've got to get out in front of those curves. You've got to understand the pressure points for scale, right? So I've got to ask you, okay, how is that shaping out? Are you guys rolling out infrastructure? The app guys, are they involved in the conversation? Where is the line on the stack where you see this integrated stack trend going on? Again, it's speed, it's DevOps, it's software-driven enterprises. Take us through the mindset of what you guys do, and where's that line? Does infrastructure go to the app, or where does big data fit into all that?
Sure, sure, sure. So essentially, if you guys listened to Stonebraker's keynote this morning, he mentioned the three Vs of data; it's volume and velocity. From an infrastructure standpoint, you're looking to get ahead of those. Over the past 18 to 24 months, we've been fighting this never-ending battle of, hey, are we going to have enough capacity to store all the data that most of our analytical and data science teams are looking to engage in their applications? Are they going to be able to satisfy SLAs and SLOs in terms of performance from a customer or an engagement standpoint? How do we get ahead of that? And all these conversations do not involve a single person. It's actually a team of people who are architects in their respective fields. We meet with people who are architects for Wayfair from a database standpoint. We meet with people who are architects from a storage standpoint, which is more on the hardware side. Do we get the right file system solutions? What are the NFS shares that we may need? We also meet with application architects who dictate, hey, I want to be able to do this because the business requires us to hit these particular metrics and this particular set of requirements, so we can satisfy the business goals, engage the customer more, and get more information for them in the first place.
So everybody gets engaged, everybody designs, and everybody has a vote in terms of input. And once we figure out all the different bits and pieces, the good thing is that we attack the same problems from multiple angles. If one solution doesn't work very well because, hey, this is going to be really difficult to implement in terms of both time and resources, can we solve this at a different part of the stack? Can we put this up in the application layer? Can this be solved, say, in Tableau as opposed to Vertica? Can this be solved at the storage layer? Do we throw more capacity at the system? Do we optimize code? It's always cool to see that at Wayfair we always approach and engage these problems at multiple attack points. I've been at previous organizations and you don't necessarily always, oh, absolutely. You can see that everybody's going to.
It's like playing a video game, except the stakes are higher.
Yeah.
I commented on camera: how is it your website is so fast? I mean, your website is really, it's truly awesome. I mean, you said a lot of hard work. But now, Stonebraker basically put forth the scenario that all this Hadoop hype, and MapReduce and HDFS and Spark, this is all still about the data warehouse. And the data lake is just one big bit bucket and a junk drawer that's a data warehouse. Do you see it that way?
I don't necessarily see it that way. I think there may be a bit of a marketing buzzword war going on in terms of how you position your solutions in the marketplace so you can differentiate yourself from everybody else. Hey, we're offering this, so we're going to be a data lake now. The way that I approach this is, hey, what is the right solution for the problem we're trying to solve? Is this the right tool that we need in our tool belt? What does this mean from, hey, do we need some people who are going to be SMEs on that particular piece of technology?
How much can we push this particular piece of technology? What are the pain points? What are the limits? Eventually, the end goal, or the high-level goal, for everybody at Wayfair is to solve the customer problem. You're trying to provide value for the customer. How do we make Wayfair the place to be for people to shop? And we have a zillion options that we want to present to everybody, in terms of furniture, decor, all that. How do we provide value for the customer? The actual solutions that we tend to pick, we're trying to fit to a particular problem.
But traditionally the EDW was sort of the sun in the solar system. And you would develop BI and analytics around that, embedded into the database itself. Is that changing? Are you starting to see analytics become more tightly aligned with applications? I know there's still a lot of customization going on, but I wonder if you could talk about that trend and that change a little bit.
Yeah, you definitely see a lot of proliferation in terms of the technologies that you're trying to use, not just the traditional enterprise data warehouse that you had previously. The advent of Hadoop changed that a lot. You see companies use the traditional one-two combo: hey, we're going to use Hadoop and we're going to use solutions like Vertica or maybe IBM, combining those. Then you're going to start using things like, hey, I need some R, maybe I use some RStudio, maybe some Distributed R. How do I line up all these different components?
It's becoming harder and harder to integrate all these pieces, because I can only solve this problem by using this. Integrating, and actually getting the right team in place who is able to run all these technologies at the best performance, that's a different type of challenge as well. But you definitely see that there are more pieces to the puzzle these days.
Jordan, I've got to ask you, one of the things we love to opine and speculate and pontificate on is the cloud. Okay, and so it's pretty clear from the Hadoop Summit, our last event, and we're going to be at Big Data NYC, Hadoop Week, next in October. And again, it's converging around, most people are stalled with Hadoop because it's a great data warehouse, it's a data junk drawer, data landfill, whatever you want to call it.
Data ocean.
Data ocean is a little bit different; that's my definition: more dynamic, more relevant. Cloud powers, infrastructure powers, the analytics capability. I want you to comment on, one, your view of the cloud as a company, and maybe personal comments as well, that's fine. And two, cloud or on-prem, it's still a resource. How does infrastructure, whether it's cloud or on-prem, there are already reasons why a public company may have on-prem, but what's your view on cloud? Are you there? Are you going there? Are you not going there? And then the role it plays in powering really next-generation analytics, the kind that HP was talking about on stage here, ones that are real-time in the apps, really providing great value.
Sure. So before I can answer most of these questions, I have to give you some Wayfair history. Our CTO, Steve Conine, made the decision of, hey, let's go for on-prem as opposed to cloud. Obviously Wayfair was started back in 2002, so cloud was not a thing back then, so we used a separate data center that we had at the time.
As the team grew and we started becoming a much, much bigger engineering department, we decided that, hey, there is an actual strategic advantage to keeping most of this physically in-house, keeping it on-prem, because that allows us more flexibility in terms of innovation. Hey, maybe I can control all my infrastructure better; most of the problems that I see with infrastructure, maybe the awesome people that I've hired can solve some of these problems for me, as opposed to me having to rely on a third-party vendor. As time progressed, we started seeing more and more people opting for cloud infrastructure. We've looked at that, but never actually acted on it for a variety of reasons. Some of it is compliance: we do a lot of stuff with personally identifiable data, so we feel like security is a little bit better if you're on-prem as opposed to in the cloud at this point in time. That said-
You're controlling your own destiny, basically. You've got a data center, you've got a huge commerce engine. It's hard to change the airplane, change models, and the engine at the same time, right?
Right, it's very difficult, and actually we've gotten a little more aggressive, to the point where we don't just have one data center anymore, we have three globally. They're in different geolocations across multiple continents, so you can see that we're still subscribing to the, hey, this is our own thing, this is our on-prem implementation. We believe in the people who can make that particular-
Okay, so what about the cloud would be attractive to you? Just hypothetically speaking, I know you're not speaking for Wayfair, but let's just say, can you envision a preferred future where, okay, you've got the on-prem, totally buy that, you become the data, a lot of leverage there, a lot of sunk costs, and also you guys could end up being the Amazon for your own market, right? So I can get that.
But where would you use cloud if you could envision that?
I would use cloud for solutions that require a lot of elasticity. So some of the technology problems that you're trying to solve are, hey, I have something that has a huge spike in workload, but it's very intermittent. I can give a very good use case for some of these. Are you guys familiar with the flash sales type of websites?
Sorry, what?
The flash sales?
Yeah, sure, absolutely.
So they follow that type of workload, hey.
We're victims sometimes.
Yeah. And those particular sites stay very quiet for periods of time, hours on end, and then in one hour you basically have to satisfy a very large number of requests. So what you're hoping to do is, if you have that infrastructure in the cloud, that allows you to be cheap and get a lot of ROI on most of the infrastructure that you put in there. Other solutions you can probably use it for: if you're trying to store what people call big data sets, a lot of volume, a lot of velocity, that's where the elasticity of the cloud also helps. You're going to be able to store things like, hey, can I do machine data? Can I do sensor data? Can I do semantics? Can I do clickstream? Most of these things where you traditionally may have or may need.
It's like spot computing, like spot pricing, hey, we want to throw a bunch of compute at some workload data.
So I wonder if I can follow up on that. So you're saying it's the ability to deal with unpredictable workloads, more so than the simplicity of an integrated data management approach, which a lot of the cloud guys are doing, because what I'm hearing from you is, Wayfair sees the ability to compete and differentiate. So you're not looking for that simplicity of data management that's less functional. You're looking for elasticity in certain use cases, but to maintain competitive advantage with the on-premises stuff.
Is that a fair summary?
I believe so.
Interesting. So let's talk about the software piece now. I kind of got the cloud. Where's the analytical engine, or is it up and down the stack? I mean, software guys right now are writing large scale, I say DevOps apps, but cloud apps mainly, or large-scale on-prem infrastructure powering software. So if I'm a software developer, how do we work together? Let's say, hey, Jordan, give me some big iron, give me some bare metal. Take us through the provisioning, a day in the life, in like one minute.
Sure, absolutely. The way that we work in our environment, we draw a very good line between infrastructure and application development. That said, we allow people to interface with us, pick our brains: hey, how do I best implement this? What is the best design for this particular application? This is especially true for some of our mobile solutions that are coming out right now. There are a couple of applications out there for your phone; if you guys are interested, you can download them. There's the Wayfair mobile app, and there's the Joss & Main mobile app.
I have the Wayfair mobile app.
Oh, you do? That's awesome. Yeah, so we get a lot of direct face-to-face contact: hey, how do I best do this? Can I leverage existing solutions that we have in our environment? Sometimes we have already solved a particular problem in one of our applications, so we can just borrow the design, okay, let's take it and implement it across the board.
So we got a question from the crowd.
Sure.
So over the last several years, there's been an explosion in MPP SQL databases that run on Hadoop. In addition to the distro vendors, you've got HP's database, you've got Facebook's Presto, Pivotal's HAWQ, Actian, and IBM has stuff, Oracle's got stuff, Teradata, Aster, et cetera.
How do customers like you make sense of this explosion of choice, and what are your critical decision factors?
It's actually a very interesting question from the audience. Thank you for asking that. Our process was very interesting, and we had to do the same thing back in 2013. Basically, Dave Drolet, who is our senior director of analytics, myself, and Ed Macri, who is our SVP of analytics, sat down and said, hey, we're looking for a new solution that will help us grow. And we sat down and said, hey, what are the critical things for you guys from an analytical standpoint? So everybody wrote down, here are about 15 to 20 key things that are required for us. Do we need software development? Do we need high speed, high concurrency? Then from the database side, which is where I come from, we decided, hey, here are the top 15 to 20 items that we have from our standpoint. We took all these together, we packaged them, put some weights on them, and we came up with a short list of maybe 35 to 50 requirements that we used when we started looking at vendors. And as somebody in the audience pointed out, it's particularly difficult to pick a solution right now, mostly because there are so many options, there is so much variety. And most of the solutions blur, they blend. So if you don't have that particular requirements map, where you know what you're looking for to be able to solve this, you're going to have a hard time. So my recommendation is, organizations should start with generating that list of requirements internally.
As you get to that point, you can start reaching out to vendors and saying, hey, are you guys able to solve this particular problem for me in this particular environment with these six requirements? What if I wanted to grow with this platform for 18 months, 24 months, 36 months, how do I scale this? Asking the hard questions, asking vendors to prove that they can actually solve most of these questions for you. Those are the things that we did in 2013. And the vendor that we ended up picking in early 2014 was HP Vertica, because they were the best fit for our environment.
Jordan, thanks for coming on theCUBE. Really appreciate you taking the time, sharing your insight. Final comment, I'd like you to get the last word here: what is this event about? I mean, share with the folks out there the vibe here. You've been here multiple years. It's kind of gotten corporate, bigger, but they tried to balance that out in the keynotes. But what is going on here? What's the big thing happening at this event?
To me, the way that I approach this conference personally, and I think maybe a lot of folks will agree with me, it's about engineers trying to solve cool problems with big data analytics. If you actually take a look at the list of sessions that people have on the agenda, you can see that it's very engineering-geared. Hey, how do I solve this? How do I integrate things like Apache Kafka and Storm with Vertica? We have sessions on clickstream. How do you get the best ROI from most of these events? So to me, it's a slightly different vibe, but it's still about engineering, solving problems for businesses.
The key thing is engineering, real engineering going on across a broad set of things, not just developers or data science: engineering. Attacking the problem from multiple angles, Wayfair.
Engineering excellence in your platform. Congratulations. This is theCUBE, we'll be back with more insight and data from the HP Big Data Show, hashtag HP Big Data 2015. Go to crowdchat.net slash HP Big Data 2015. Join the conversation, be on the record, ask us questions, we'll take them on CrowdChat. We'll be right back after this short break.