Live from Midtown Manhattan, theCUBE's live coverage of Big Data NYC, a SiliconANGLE Wikibon production, made possible by Hortonworks, We Do Hadoop, and WANdisco, Hadoop made invincible. And now your co-hosts, John Furrier and Dave Vellante.
Okay, we're back live here in New York City for Big Data NYC. This is where all the action is happening in New York City for big data. We've got Hadoop World, we've got Strata Conference, a lot of news, a lot of conversations, business models, technology, tech athletes, and we're here bringing it to you. I'm John Furrier, the founder of SiliconANGLE. I'm joined by my co-host, Dave Vellante. This is theCUBE, our flagship program, where we go out to the events and extract the signal from the noise. Our next guest is Wouter de Bie with Spotify. Great service, everyone should have the app, should have that downloaded. If they do, they love it. Welcome to theCUBE.
Thank you.
So you're a tech athlete, as we say. I'd love to know what's going on under the hood. A lot of tech buffs out there want to know what's going on with big data. A lot of businesses are really interested, and we had Bill Schmarzo on, who wrote a book on using data for big business. So obviously data is changing the landscape on the business front, but there are still a lot of huge technical opportunities with it, especially open source, and we were just talking about open source being a community thing that can do more than just one company. So that's an awesome thing. So I want to first get your perspective on this show. You guys had a great buzz going on day one, a lot of the conversations were around your service, because one, it's popular, and two, you're doing some pretty innovative things. So talk about what was going on early on in the show here with you guys. What was the big conversation?
Well, the big conversation obviously was the music streaming, right? But when talking about data, from the beginning, we really knew that data was going to drive this company.
We knew that we would have a lot of users, that was the goal, and we were getting there earlier. We have a lot of users, millions of them. So that creates a lot of data. We started really talking and thinking about how we can leverage that data to do smarter business. So we've invested a lot of time and money into building a true data infrastructure, so that basically everybody at our company is able to use the data that we have.
So data is key, obviously. Then getting the data flywheel going, that's a whole other thing. You have users, they're on devices, they're sending off all kinds of data, gesture data, all kinds of usage data. What did you guys do? You sit back and say, okay, we're going to be data-driven, great, there's a use case, give your customers a great experience, right? So that's kind of the business goal. What did you guys do next? I mean, take us inside the walls of the company, and just some of the whiteboard conversations, scratching your heads, okay, I'm going to do it this way, multiple architectures, probably some interesting conversations around approaches, platforms, build your own, go open source. Just take us through some of those thoughts.
Yeah, we've been a pretty early adopter of Hadoop, because as I said, we knew that we had a lot of data and were going to have a lot of data. So it all started with reporting towards record labels, license holders, and things like that. And up to a certain point, that's pretty simple. And then we came to a point where actually our CEO, Daniel Ek, I think a pretty big visionary in the industry, said, why don't we leverage that data? Why don't we start doing analytics? So that was sort of the next step, business intelligence, analytics. And from there, we've taken that to using data in our product. So we do recommendations, we do machine learning, basically directly towards our users.
And Hadoop was sort of a given at that time, and I think still today it's the platform for doing big data. So we've evolved on that. We've upgraded our versions. We've been looking at different vendors for doing so. I'm sure you've seen the news. We're now doing business with Hortonworks, and they give us a great platform for exploring and exploiting Hadoop.
So John, we have the full spectrum here today. We had Sears on earlier, the oldest retailer in the country, transforming, and now we have Spotify, transforming the music business. So talk about your business drivers, what the business is really pushing you toward. You have a very disruptive business model. A lot of traditional people in the music business probably aren't too happy with you. But at the same time, they're learning, realizing they have to evolve. So talk about some of those drivers that affect you as an IT practitioner.
It's basically all about scale. So being able to handle scale, the number of users, and there is so much music out there, right? We serve more than 20 million tracks. So being able to scale all of our back-end systems across different continents, that is the big driver. So growth of the company is the main business driver that sort of falls down onto IT.
And what about the data conversation?
It's very similar. The more users we get, the more data we get, and the more features we put into our product, the more data we get. Because Spotify obviously is known as sort of like iTunes in the cloud, but it's so much more than that. We have radio. We have discovery of music. We have playlists and artist pages. We are on a multitude of different platforms: mobile, web, desktop. So with all those features, there's more data. And also, since we've been trying to transform Spotify into a real data-driven company, we've seen that data exploding internally. So basically, across three dimensions, we have that growth.
Product features, the number of users that join the service constantly, and internal usage of data.
So what does that mean to you as an IT practitioner, to be a data-driven company? You hear that a lot. In practice, what does it mean?
For us, it really means using the data that we have to validate hypotheses that we come up with. I would say that Spotify four years ago was driven more by gut feeling. Business decisions were made on, OK, this sounds good, this looks good, this is what our competition does. But nowadays, we really try to validate our hypotheses with data. We do a lot of A/B testing, for example, testing out which features work and which features don't. And that, to me, is really data-driven.
I remember in the early 2000s, the Harvard Business Review came out with an article basically talking about how great leaders lead by gut feel. That gut feel seems to always trump. And they gave a number of examples, whether it was Lee Iacocca or Jack Welch or whoever it was. So I'm interested: it used to be a gut-feel-driven company and now you're more of a data-driven company. What has the result been in terms of the effectiveness of your decisions and the productivity impact on the organization?
There has been a large impact there. Basically, what we try to do is enable everybody within the company to develop based on data, and make them learn from what they do. And because people now have access to data, they can iterate much quicker. We push out a feature, and within a matter of days, basically, we can see how that feature affects user engagement, for example. So those learnings we take in, and we iterate on a particular feature or on the whole product.
So, it's an interesting discussion for me because I've watched a number of waves. John and I, of course, have, because we've been around for a while. I think of the microprocessor revolution, right? You remember, I'm sure, John, the first PCs, right?
You get a PC, it was transformational in terms of your productivity. The internet was similar, networks were the same way, right? The internet was the same way, starting to be able to use email outside your organization in a big way. With data, there's not a device I can buy. There's not a thing I can plug into the wall, or a wireless connection or a LAN. So what's the metaphor for a PC in the data world? What's the tooling? What does that look like to transform the company?
I think we're still trying to figure that out. I think mankind in general is still trying to figure out how we do this. And as you say, like the microprocessor, we sort of know how that works. The internet, we know how that works. We talk about database technology, there's sort of a standard, like SQL, for how we do those kinds of things. If you look at data, big data, we haven't really figured that out. And a lot of people are currently trying to build that. Hadoop is one part, and I think that Hadoop is sort of the real foundational thing there. But there are a lot of different technologies emerging, and those will probably converge in the future into that metaphor that you're looking for.
Well, Dave, we've talked about this on theCUBE before, and this is why I like this segment, because the PC liberated people, gave them access to information through computing power, right? So the PC allowed anyone to do stuff that they'd otherwise have had to get time sharing on the mainframe for. The mobile device is interesting because now I can have the edge of the network, things like Spotify. And what's interesting about the data model there is that the liberation is access to the data, but also the sharing, right? If you look at use cases, people are sharing. They're sharing what they're listening to, they're sharing their gestures, where they are, their status updates, whether it's Facebook or LinkedIn or Twitter.
So the user experience, the liberation, is the connections, the relationships, and then the overall collaboration. So I think music is a great use case, right? I mean, besides the tech stuff that we're into, talking tech, people love listening to music, right? The lifestyle of theCUBE is tech conversations, but music is lifestyle. So enjoyment is the killer app, right? So, okay, now, do you agree?
Yeah.
Okay, so in that case, what tools are you using? What technology? You talk about machine learning. Can you be specific about what kinds of algorithms you guys are using? What kind of coding? And how does that relate to Hadoop and open source?
Right, personally, I'm more on the data infrastructure side, but I know we use collaborative filtering to do recommendations. That's as far as I know. For me, it's really, if it runs on our cluster, then it's fine.
Can you describe the infrastructure a little bit?
Yeah, we use Hadoop. We're actually currently moving to HDP 2 from Hortonworks, which will give us YARN, which is fantastic. I think that YARN will be one of those game changers for us. MapReduce is great, but it doesn't solve everything. We're starting to look into graph processing and other types of machine learning that require a different paradigm. And I have a bunch of teams lined up already saying, okay, when can we access YARN so that we can start writing our own YARN applications, for example?
What will YARN do for you? I mean, obviously, it broadens your scope, but talk about that a little bit.
Yeah, it will enable us to write different types of applications on top of Hadoop, on top of our cluster. Right now, it's only used for MapReduce, so we try to sort of mold all our problems into a MapReduce problem, and that might not always be the right thing to do. So with YARN, we sort of take that away, and MapReduce becomes just an application on top of our cluster.
But there are other things, MPP, MPI, graph processing, that we can start doing, building more clever or more efficient machine learning algorithms.
And why Hortonworks?
We've been through quite an extensive vendor selection process. And we basically found that, for us, we've invested a lot in Hive. Hive is used by a lot of analysts. There's a lot of ad hoc stuff being done through Hive. It's easy for people. We've built a large part of our infrastructure around that. And the way that Hortonworks is attacking Hive, making it 100 times faster than it is today, or than it used to be six months ago, that was for us the biggest reason to jump into this.
So the expertise around that is one of the moves.
Yeah, and their roadmap. Obviously, we looked at MapR and Cloudera. I know Cloudera is very focused on Impala. But as I said, we have Hive already, and we would like to continue with that. So that was, I would say, the biggest driver.
So talk about the cloud. Cloud infrastructure, obviously, we were just talking about it, using those metaphors earlier, Android and iPhone: iPhone's closed, and Android's open. We were saying, well, Amazon's kind of like the iPhone of the cloud. It's kind of open, but they have an integrated stack. It's beautiful, it's fast, it's great, elegant. Other clouds, like OpenStack, are a little bit more plug-and-play, do-it-yourself. Do you guys use cloud, and what does some of the architecture look like, on-prem?
Most of our infrastructure is in-house. So we have a few data centers around the world where we actually have our own iron.
Oh, so your own premise.
Yeah. We're looking at how we can expand to the cloud, because there are definitely a lot of benefits there. But what I think was the initial selling point for Spotify, and I think it still is, is that latency with playback is phenomenal.
So you click a track and it feels as if you're playing that directly from your device, either your desktop or your mobile. And that is something that is slightly harder to do when it comes to cloud, because...
Yeah, you can't control it.
Yeah, you can't control it. It's all SLAs on latency, yeah.
Although, it's very lumpy in the performance. You can get some low latency, but you can't always expect it. Kind of control your own destiny, that's kind of the mission, right? I think everyone's pretty much at that level. Let's talk about when you mentioned earlier the extensive evaluation of the vendors. Obviously, Hortonworks won, right, with you guys? They support you guys. Did you guys look at Cloudera? Did you guys look at other folks as well?
Yes, we talked to Cloudera, we've talked to MapR, those three guys, and then, yeah.
And what was the main reason that you didn't go with, say, Cloudera?
It was Hive, I would say. We also found that with Hortonworks, the cultural fit was very good.
Great. Yeah, we were just talking open source. You know, the community is bigger than any one company.
Yeah.
And George Kadifa, who runs HP, a great guy over at HP, one of the stars at HP.
Runs HP Software.
Runs HP Software, said open source is not about free, it's about freedom.
And that's something that we truly believe in.
Really appreciate you coming on theCUBE. It's awesome. Love the Spotify service, you guys are great. Continue to do good, we'll watch you guys grow. I'm sure you're going to continue to grow in a big way. A lot of happy users. Folks that don't have Spotify downloaded, you can listen to any song anywhere. It's not random like Pandora, and I do like Pandora as well, but Spotify is so much more elegant and easier to use when I want to hear a song. So to me, that's my personal review, for what it's worth. But thank you. Appreciate it. Big data stories, big data conversations here on theCUBE.
Big Data NYC is the event, we're covering all the action in New York City, Hadoop World, Strata Conference, a lot of action. We'll be right back with our next guest after this short break.