Okay, welcome back to SiliconANGLE Wikibon's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. This is theCUBE, I'm John Furrier, the founder of SiliconANGLE. I'm joined by my co-host.

Hi everybody, I'm Dave Vellante at wikibon.org. We're here at the Vertica User Conference in Boston. We're here at the Westin, at the waterfront, and it's a beautiful day in Boston. Rob Winters is here. He's the director of analytics at Spil Games. He's leading big data and personalization and predictive analytics initiatives at the gaming company. Rob, thanks for coming on theCUBE, welcome.

Oh thanks, I'm really happy to be here.

Yeah, so gaming is a great use case for data. Why don't you tell us a little bit about Spil Games and your role?

Yeah, absolutely. So I'm the director of reporting and analytics at Spil. My team is responsible for the entire data flow. So from the point when someone decides that they want a piece of information available, to the point where it's actually presented either in a report or put back into production and used to make a personalized gaming experience, we deliver that entire information flow.

Full pipeline.

Full pipeline, from JavaScript on the front end, all the way through to reports on the back.

Now, in developing that approach, did you start with a blank piece of paper or did you have to undo a bunch of processes that were already in place? Talk about that a little bit.

You know, it's kind of a mixed bag. It was great because we basically started from ground zero. Our original data warehouse at the company when I joined was a Postgres database with about 10 tables and no reports. So we got to build from the ground up, but it also meant that there was no functionality in place that allowed us to easily expand on what was already there. So it's been quite a journey over the last year and a half.

So talk about the business a little bit.
What's kind of driving, you know, your initiatives, your decisions? What's the business clamoring for?

So if you look at our business, we're a web-based gaming platform. We also do some publishing and developing of titles. For us, what we're really focused on right now is delivering a much more relevant customer experience. So if you look at the big online gaming companies, the opportunity is you have this portfolio of thousands of pieces of content. You have hundreds of millions of people playing on your site each day or each month. How do you make sure that they find the right game for them? And so we're really moving to this sort of personalized analytical space very quickly.

What's your experience been in terms of, you mentioned you're a publisher, obviously, you're putting games out there and you're hoping they hit, but you don't know. What's your experience been with your ability to predict what's going to work, what's not going to work, and how do you iterate that over time?

Well, to be honest, it's pretty easy for us because we're the publisher, the developer and the platform. So to make a successful win, all you do is you promote the heck out of it. But in terms of games we bring forward from other companies where we're basically just publishing their content, we're still trying to figure out how to predict that secret sauce.

So UX is obviously a huge issue. You mentioned front-end; front-end development has been really key with the real-time web. Obviously you've got WebSockets with the web browser, you've got Mozilla, Chrome going native, you've got Node.js, Redis, these new stacks are out there, developers going crazy in this environment. So let me ask you, how do you keep up with the front-end development innovation while at the same time using the data to create a user experience that is positive? Because you can learn a lot from the data: where people drop off, where people are adding value, where they're excited, what the patterns are.
How do you vector that back into the front-end?

You know, it's something which we're really working to solve right now. If you look at it, we've kind of divided the problem into two different pieces. Number one, what do you want to show to someone? And number two, how do you want to show it to them? You take that first one, what do you want to show to them? That's all back-end work, and that's all modeling about content recommendations, personalization, targeting, those sorts of things. How do you want to show it to them? That's where you really get into multivariate testing using things like epsilon-greedy modeling. And so we're building this entire flow to be able to control both those pieces and allow our product marketeers to put the right message in front of the right customer at the right time.

So is that homegrown middleware software between the front-end and back-end?

Yes, yeah.

And what's the thought behind that? Why? Is it agility? And who are the developers, is it pure back-end, front-end, or guys who straddle both sides, full-stack developers?

So my team really doesn't have to worry too much about the actual production architecture, just the information that's put back into it, which is a blessing really for us, just keeping scope down a little bit. If you look at my company's technology approach, we've been very heavy into open source and custom development, and actually we only have two major pieces of commercial technology involved in our entire technology stack, and that's Vertica and Tableau.

What about the front-end? What do you guys do for front-end development? What's the innovation there? What open-source tools do you guys use? What are you programming in?

We are currently rewriting our entire front-end architecture using a mixture of Erlang and then some JavaScript to really control that front-end experience.

Hard to find front-end developers these days.

It's very hard, you know.

Yeah, we know.
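As an illustration of the epsilon-greedy approach mentioned above, here is a minimal sketch in Python. It is not Spil Games' implementation; the class name `EpsilonGreedy`, the variant labels, and the click-based reward are all assumptions made for the example. The idea is that most of the time the best-performing variant so far is shown, and with a small probability `epsilon` a random variant is shown instead so that the other options keep getting tested.

```python
import random


class EpsilonGreedy:
    """Toy epsilon-greedy selector over a fixed set of page variants (hypothetical)."""

    def __init__(self, variants, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.shows = {v: 0 for v in variants}   # times each variant was shown
        self.clicks = {v: 0 for v in variants}  # times each variant was clicked

    def rate(self, variant):
        """Observed click rate; 0.0 for a variant that has never been shown."""
        shows = self.shows[variant]
        return self.clicks[variant] / shows if shows else 0.0

    def choose(self):
        """Explore with probability epsilon, otherwise exploit the best rate so far."""
        variants = list(self.shows)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(variants)
        return max(variants, key=self.rate)

    def record(self, variant, clicked):
        """Feed back the outcome of showing a variant."""
        self.shows[variant] += 1
        if clicked:
            self.clicks[variant] += 1


if __name__ == "__main__":
    bandit = EpsilonGreedy(["layout_a", "layout_b"], epsilon=0.1)
    bandit.record("layout_a", clicked=False)
    bandit.record("layout_b", clicked=True)
    print(bandit.choose())  # usually layout_b, occasionally a random pick
```

A larger `epsilon` explores more and adapts faster to changing tastes, at the cost of showing weaker variants more often.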
So gaming is one of those things where it's a big data problem and a huge opportunity, and there's a big idea right there, which is that there are so many small pieces of data, very much a consumer-like environment, a truly consumer environment. Someone clicks on something, they do something, if it's a first-person shooter, they shoot someone, there's points, there's trading, there's currency, all kinds of things going on, multiple dimensions. How do you guys deal with that? I mean, honestly, going from Postgres to, what are you using for the back-end? Is it Hadoop? Is it Vertica? What's the architecture in the back-end?

You know, if you look at it, what we're doing is basically, we own the JavaScript library that fires off all of our events into our MapReduce platform. So we're doing a lot of log rotate into Hadoop and Disco, which are two MapReduce platforms, and we rotate that data into Vertica for all the stuff that we want to use for personalization. So the thing is, we're firing off, you know, billions of events per day, but a lot of them are just information that needs to be aggregated to really keep track of, say, that user experience, whether it's about how long they're spending on a page or something like that. So that all flows into Vertica. In Vertica, we do a lot of our analytics; we also feed that through R for the more advanced predictive modeling. And then that all moves back into a proprietary NoSQL solution, and that's what we use to serve the data back out into the production environment.

And you built that?

Yeah, we built that as well.

On top of what? Open source, from an open source standpoint?

It's from an open source standpoint. It really is. We looked at, you know, the functionality of Riak, and our tech teams identified some key areas where they said, this is not going to work with the technology stack that we have here. So they went out and they built their own solution, all in Erlang, to be able to do this.

Okay, nice.
So another unstructured database, a homegrown one, but custom built for what you guys need.

Custom built for what we need.

So what's the feedback been from some of the advertisers and on the product side? Obviously you've got the product and you've got advertisers on your platform. What's been the reaction on the publishing and the ad side to some of the new techniques that you guys have introduced?

You know, if you look at it, the more data you have about a customer, the easier it is to serve an ad which is really relevant to them. And so with our ad partners, we're really using some of these data management platforms where we can take what we know about users, either from information they've provided to us or based on behavior, what we're seeing and what we can presume about them, and we can feed that back to these ad providers, which gives us a very nice boost in terms of the value of each impression served.

What's a big challenge that you guys face that, if you could resolve it, would really have a major business impact? People are always talking about, okay, you heard Meg Whitman today, we're going to help you monetize the data. And I wonder if you could talk about some of the challenges you face, some of the really hard problems that you're working on that have a major impact on your business.

Well, you know, the thing is we're dealing with people who are interacting with the platform. And so as you move closer and closer to real time, the challenges of doing your analytics increase, I would say exponentially. It's one thing to be working in a telco where you've got 25 million subscribers and you need to regenerate a couple of numbers once a month. And that's actually where I came from. It's another thing to say, I've got X million people live on my platform and they're on, you know, this page, and I have one to three minutes to decide what I want to show them at their next experience.
And so for us, that's really the problem we're dealing with right now: how do we move this complex analytical modeling closer and closer to real time?

Yeah, so real time, you know, people always sort of debate what real time is, and our David Floyer from Wikibon says real time is before you lose the customer. Is that kind of how you define it, before you lose the audience?

That's where we're trying to get to, yeah. And actually for me real time is kind of a dirty word because of this definitional discussion, so we're driving for near real time. My goal is by the end of the year to be able to pass these recommendations back out into the production environment within, say, five minutes.

In time.

In time, yes.

I like that. Okay, so what kinds of things are you looking at to solve that problem? Is it pure tech? Is it also people and process? Talk about that a little bit.

You know, it's technology. It's really, you know, taking a very aggressive look at how we do some of these modeling things and also trying to identify if there's an 80% solution which will get us almost what we need, or close enough to really make sure that that customer experience is good enough for now, and then we'll optimize against that the next time they come back.

And just iterate. And at the same time, that 80% solution, it's got to scale, right? So you have some balancing and trade-offs that you have to do there, right? Maybe talk about that a little bit.

Yeah, so you know, one of the things we're looking at right now is that there's really a good way to do personalization using hundreds of different data points and elements and everything else, and re-aggregating those every time you gather new data. That's the solution that we can run in non-real time. In real time, you have to identify the key values that have changed since the last time you modified that user experience and only leverage those.
So the classic example here on content recommendation is the difference between two approaches. Do you look for users who are very similar to this person and try to recommend content based on that? That is very expensive, but very accurate. Or do you look at what content they just engaged with and make recommendations based on that, which is much, much cheaper computationally?

So I was interested in listening to Colin Mahony's keynote this morning. I mean, it wasn't like super flashy. It was, you know, it was meaty. And one of the things he said is he wanted to make this event about the customers, the use cases, et cetera. And he really, you know, he wasn't pounding his chest, so I liked that. But at the same time, I want to understand, why Vertica? Why did you choose Vertica? You know, maybe give us a sense as to what else you looked at. What does it do well? What doesn't it do well? Maybe talk about that a little bit.

Yeah, absolutely. So from our perspective, we were sitting here on this row-based system, on which the performance was terrible, and we knew that if we tried to scale it at all, it would just fall apart. So we knew we had to move to a column-based system or some sort of data warehouse appliance to be able to do all the analytics that we wanted to eventually be able to do. And we're a small company. We're not huge. And so you immediately throw out things like Teradata and Exadata because the pricing on that is just insane for a company of our size. We looked at some of the other solutions like Infobright, and at the time we were making this choice Apache Cassandra hadn't really moved in the right direction yet, really wasn't mature. And so for Vertica, what we saw was it gave us the performance we needed. It gave us the ease of use that we needed so that we could very quickly build the data warehouse, and the pricing actually worked for the business model that we had, and for this expectation of starting now and then growing into the future.
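The trade-off Rob describes between recommending from similar users and recommending from the content a user just engaged with can be sketched roughly as follows. This is an illustrative Python example, not the system described in the interview: the play histories, the function names, and the choice of Jaccard similarity and co-occurrence counts are all assumptions. The user-based version compares the target user against every other user at request time, while the content-based version is a single lookup into a table that can be built offline.

```python
from collections import Counter

# Hypothetical play history: user id -> set of games played.
plays = {
    "u1": {"racer", "puzzle"},
    "u2": {"racer", "puzzle", "farm"},
    "u3": {"shooter", "farm"},
}


def jaccard(a, b):
    """Overlap between two sets of games, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_similar_users(user, plays, k=2):
    """Expensive path: score unseen games by the k most similar users."""
    mine = plays[user]
    neighbours = sorted(
        (u for u in plays if u != user),
        key=lambda u: jaccard(mine, plays[u]),
        reverse=True,
    )[:k]
    scores = Counter()
    for u in neighbours:
        weight = jaccard(mine, plays[u])
        if weight == 0:
            continue  # users with no overlap contribute nothing
        for game in plays[u] - mine:
            scores[game] += weight
    return [game for game, _ in scores.most_common()]


def build_cooccurrence(plays):
    """Offline step: count how often two games appear in the same history."""
    co = {}
    for games in plays.values():
        for game in games:
            for other in games:
                if other != game:
                    co.setdefault(game, Counter())[other] += 1
    return co


def recommend_from_last_game(last_game, co, n=3):
    """Cheap path: one lookup keyed on the game the user just played."""
    return [game for game, _ in co.get(last_game, Counter()).most_common(n)]


if __name__ == "__main__":
    print(recommend_by_similar_users("u1", plays))
    print(recommend_from_last_game("racer", build_cooccurrence(plays)))
```

In a setup like this, the co-occurrence table could be rebuilt in batch while the lookup serves requests inside a near-real-time budget.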
How do you deploy it? Everything in the cloud?

No, we have a data center in Amsterdam that we run everything in. So it's all private.

I have my final question. I know you're tight on time. Appreciate you coming on theCUBE, sharing your expertise with the audience out there. What's been the biggest surprise to you around some of the things you've been innovating with big data? Obviously gaming's on the front line. Some guys are pushing some stuff out there. New audience, new millennials, and you've got all the new active users. And so you've got to push the envelope. What's been the biggest surprise for you, where you said, wow, that's impactful, I never would have thought that would happen?

That's a good question. We've kind of been struggling as we go through this journey to figure out what really makes a difference to the customer experience. We'll pull tons of data about users, and the stuff that we presume is really valuable at a user level oftentimes has no impact on the recommendations we make. So we'll try to use this data and then the model will actually not perform any better than one that's running on much simpler data. At the same time, we're finding certain aggregate statistics are so highly predictive of user experience or preference that we can get away with extremely simple techniques to massively change the user experience.

So rather than going for the massive corpus and drilling down on it, you just get more picky, looking for the right data, but smaller pieces of it.

Absolutely.

Okay, well, hey, you know, big data is huge and it's happening. There's a lot of machine learning. You know, I wrote a post this morning about natural language learning. Obviously, Siri, everyone knows on the iPhone. All these techniques, you've got Google Glass, wearable computers, all this stuff's going to be part of the consumer experience.
Obviously that will have an impact, but gaming in particular today is a collaborative environment, and some say it will predict what the work environment may look like. And so, obviously, big data plays a big role in that. So you guys are doing some real cutting-edge work. Thanks for sharing on theCUBE. We really appreciate it. This is SiliconANGLE Wikibon's theCUBE. We'll be right back with our next guest after this short break.