Live from San Francisco, it's theCUBE, covering Spark Summit 2017, brought to you by Databricks.

Getting close to the end of the day here at Spark Summit, but we saved the best for last, I think. I'm pretty sure about that. I'm David Goad, your host here on theCUBE, and we now have a data scientist from Riot Games. Yes, Riot Games. His name is Wesley Kerr. Wesley, thanks for joining us.

Thanks for having me.

What's the best money-making game at Riot Games?

Well, we only have one game. We're known for League of Legends. It came out in 2009 and has been growing and well received by our fans since then.

And what's your role there? You're a data scientist, but what do you really do?

We build models to look at things like in-game behavior. We build models to help players engage with our store and buy our content. And we look at different ways we can improve our player experience.

All right, well, let's talk about it a little more under the hood. How are you deploying Spark?

We rely on Databricks for all of our deployment. We run many different clusters. We have about 14 data scientists on the team, and each one is able to manage their own clusters: spin them up, tear them down, find their data, and work with it through Databricks.

So what else were you covering? You had a keynote session this morning, right? Give the CUBE audience a recap of what you talked about.

We talked about our efforts in player behavior, where we build and deploy models that watch chat between players. We evaluate whether players are being unsportsmanlike and come up with ways to help them curb that behavior and be more sportsmanlike in our game.

Oh, wow. Unsportsmanlike, how do you define that? Is it just being abusive? I mean, seriously.

Yeah. What we saw was that in about one or two percent of our games, there is some form of serious abuse, and that comes in the form of hate speech, racism, sexism.
These are things that have no place in the game, so we want those players to realize that that language is bad and they shouldn't be using it.

Is it all keyword-driven, or are there other behaviors or signals that can indicate it?

Right now it's purely based on what's said in chat, but we're currently investigating other ways of measuring that behavior: how it occurs in game and how it could influence what people are saying.

Maybe like tweets coming from the White House. We should be able to do that as well. So, how about those Warriors? No, George, did you want to talk a little bit more about the technical achievements here?

When you look at trying to measure engagement, it sounds like you may be converting high engagement into store purchases. Tell us a little more about how that works.

Our game is completely free to play. Players can download it and play it all the way through, and we really try to create a very engaging game that they want to come back to and play. Everything they can buy in the store is purely cosmetic. So we really hope to build content that our players love and are happy to spend money on. We want engagement to be about players coming back, playing, and having a good time. It's less about converting high engagement into monetization, because we've seen that players who are happy and love the game are happy to spend their money.

All right, so tell us more about how you build some of these models. Not the Spark code itself, but how do you analyze the data, and what's the database mechanism, given that the storage layer in Spark is essentially just the file system?

Yeah, absolutely. We are a worldwide game, played by over 100 million players around the world, and that data flows in from all around the world into our centralized data warehouse. That data warehouse has gameplay data, so we know how you did in each game.
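The host's keyword question is easiest to see against a concrete baseline. The following is a minimal sketch of keyword-based chat flagging, not Riot's model; the interview only says detection is currently based on chat text. The lexicon entries, the `ChatMessage` type, and the function names are hypothetical placeholders.

```python
import re
from dataclasses import dataclass

# Hypothetical placeholder lexicon; a real system would use a curated,
# regularly reviewed list rather than these stand-in tokens.
ABUSIVE_TERMS = {"slur1", "slur2", "insult1"}


@dataclass
class ChatMessage:
    game_id: str
    player_id: str
    text: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def flag_message(msg: ChatMessage, lexicon: set[str] = ABUSIVE_TERMS) -> set[str]:
    """Return the lexicon terms that appear in a single chat message."""
    return set(tokenize(msg.text)) & lexicon


def flag_game(messages: list[ChatMessage],
              lexicon: set[str] = ABUSIVE_TERMS) -> dict[str, set[str]]:
    """Map each player in a game to the flagged terms they used (if any)."""
    hits: dict[str, set[str]] = {}
    for msg in messages:
        found = flag_message(msg, lexicon)
        if found:
            hits.setdefault(msg.player_id, set()).update(found)
    return hits
```

For example, `flag_game([ChatMessage("g1", "p1", "gg wp"), ChatMessage("g1", "p2", "you SLUR1!!")])` returns `{'p2': {'slur1'}}`. Exact token matching ignores context and is easy to evade with misspellings, which is the usual motivation for looking at other behavioral signals.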
It also has time-series events, so things that occurred in each game. Our game is really session-based: players come and play for an hour, that's one game, and then they leave and come back and play again. So what we're able to do is look at those games and how players did. I'll give you an example around our content recommendations. We look at the champions you've been playing recently to predict which champions you're likely to play next. For that, we can just query the database, start building our collaborative filtering models on top of it, and then recommend champions that you may not play now but may be interested in playing. Or we may decide to give you a special discount on a champion if we think it'll resonate well with you.

And in this case, just to be clear, are the champions you're talking about other players, not models?

It's actually the in-game avatar, the champion that you play. We have 130 unique champions, and each game you choose which champion you want to play. The way it plays out is much more like a sport than a game. It's 5v5 online competitive play. There are different objectives on the map, and you work with your team to complete those objectives and beat the other team. We like to think of it as basketball, but with magic and in a virtual world.

And do the teams stay together, or are they constantly recombining?

Your next game may have nine completely different people in it. If you're playing with your friends, you can just keep queuing up with them as well.

So the champions that the others control happen to be who you're playing with or against in that game. And when you are trying to anticipate champions that someone might play in the future, what are the variables you're trying to guess? And how long did it take you to build those models?

Yeah, it's a good question. Right now we're able to leverage the power of our player base. We have 100 million players.
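Collaborative filtering on champion play history can be illustrated with a small item-based variant: champions are compared by the players who play them, and a player's unplayed champions are scored by their similarity to the champions that player already uses. This is a minimal sketch under our own assumptions (cosine similarity over raw play counts, hypothetical player names and data); the interview does not specify which collaborative filtering algorithm Riot uses.

```python
import math
from collections import defaultdict

# player id -> {champion name: games played}; a hypothetical schema.
PlayCounts = dict[str, dict[str, int]]


def champion_vectors(plays: PlayCounts) -> dict[str, dict[str, int]]:
    """Invert player->champion counts into champion->player counts."""
    vecs = defaultdict(dict)
    for player, champs in plays.items():
        for champ, games in champs.items():
            vecs[champ][player] = games
    return vecs


def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity between two sparse vectors keyed by player id."""
    dot = sum(v * b[key] for key, v in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(plays: PlayCounts, player: str, k: int = 3) -> list[str]:
    """Rank unplayed champions by similarity to the ones `player` already plays."""
    vecs = champion_vectors(plays)
    mine = plays.get(player, {})
    scores: dict[str, float] = {}
    for candidate, cand_vec in vecs.items():
        if candidate in mine:
            continue  # only suggest champions the player hasn't tried yet
        # Weight each similarity by how often the player uses the source champion.
        scores[candidate] = sum(
            games * cosine(vecs[played], cand_vec) for played, games in mine.items()
        )
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [champ for champ, score in ranked[:k] if score > 0]
```

With `plays = {"alice": {"Leona": 40, "Braum": 25}, "bob": {"Leona": 30, "Braum": 10, "Thresh": 5}, "carol": {"Jinx": 50, "Caitlyn": 20}, "you": {"Leona": 60}}`, `recommend(plays, "you")` returns `['Braum', 'Thresh']`. At the scale of 100 million players this would run as a distributed job; Spark MLlib's `ALS` estimator (with `implicitPrefs=True` for play counts rather than explicit ratings) is one off-the-shelf option.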
And in our game, there are roles. For instance, just as there's a center in basketball, we have a bot lane, with a bottom-lane support and a bottom-lane ADC (attack damage carry). A support character is there to make sure that your ADC is able to defeat the other team. If you play a lot of support, odds are there are other players in the world who play a lot of support too. So we find similar players and look at whether they engaged with the same sorts of champions that you play. For instance, I'm a Leona main, so I play her a lot. If I were to look at what other people played in addition to Leona, it could be things like Braum, so we would recommend Braum as a champion you should try out that you may not have played yet.

Okay, and then what's the data warehouse that you use as the ultimate repository for all of this?

All the data flows into a Hive data warehouse stored in S3, and we have two different ways of interacting with it. One, we can run queries against Hive, which tends to be a bit slower for our use cases. Two, our data scientists access all that data through Databricks and Spark, which runs much faster for our use cases.

Do you take what's in S3 and put it into Parquet format?

Sometimes. We do some of those rewrites. We do a lot of secondary ETLs where we're joining across multiple tables and writing back out, and we'll optimize those for our Spark use cases. So we read from S3, do some transformations, and write back to S3.

And how latency-sensitive is this? Are you trying to make decisions as the player moves along in a level?

Historically we've been batch; our recommendations are updated weekly, so we haven't needed a much higher cadence. But we're moving to a point where I want to see us be able to make recommendations on the client, immediately after you've finished a game: say you just played Leona, here's an offer for Braum. Go check it out.
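The "find similar players" step in the Leona-to-Braum example can be sketched as user-based nearest-neighbor recommendation: find the players whose champion pools overlap most with yours, then suggest what they play that you don't. Using Jaccard overlap of champion sets is our assumption, and the player names are hypothetical; the interview does not say how Riot measures similarity.

```python
from collections import Counter

# player id -> set of champions they play regularly; a hypothetical data shape.
Pools = dict[str, set[str]]


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two champion pools: 0.0 if disjoint, 1.0 if identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_players(pools: Pools, player: str, n: int = 2) -> list[str]:
    """Return up to n other players with the most similar champion pools."""
    me = pools[player]
    ranked = sorted(
        ((jaccard(me, pool), other) for other, pool in pools.items() if other != player),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [other for sim, other in ranked[:n] if sim > 0]


def recommend_from_neighbors(pools: Pools, player: str, n: int = 2, k: int = 3) -> list[str]:
    """Suggest champions that similar players use but `player` has not tried."""
    me = pools[player]
    votes = Counter()
    for other in similar_players(pools, player, n):
        weight = jaccard(me, pools[other])  # closer neighbors count more
        for champ in pools[other] - me:
            votes[champ] += weight
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [champ for champ, _ in ranked[:k]]
```

With `pools = {"you": {"Leona", "Thresh"}, "support_a": {"Leona", "Braum", "Thresh"}, "support_b": {"Leona", "Braum", "Nautilus"}, "adc_main": {"Jinx", "Caitlyn"}}`, `recommend_from_neighbors(pools, "you")` returns `['Braum', 'Nautilus']`. In the weekly batch setup described above, output like this would be precomputed, and a post-game offer would amount to looking up the player's entry when a game ends.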
Give it a try in your next game.

Wesley, what would you like to see developed that hasn't been developed yet, that would really help your business specifically?

One thing that's really exciting for gaming right now is procedural generation and artificial intelligence. There are a lot of opportunities there. You've seen some collaborations between DeepMind and Blizzard where they're learning to play StarCraft. For me, I think there's a similar world for a game like ours, which has different sorts of mechanics: we have a large social piece, and teamwork is required. So understanding how we can leverage that to help influence the future of artificial intelligence is something I want to see us be able to do.

Did you talk to anybody here at Spark Summit about that? Anyone who would listen?

Yeah. We chatted some with the teams at Blizzard and Twitch about some of the things they were doing for natural language as well.

All right, so what was the most useful conversation you had here at the Summit?

The most useful one I had, I think, was with the Databricks team. It was kind of serendipitous. At the end of my keynote, I was talking about some work we had done with deep learning, running hyperparameter searches across our worker nodes so we could quickly try out many different models. And in the announcement that morning, before my keynote, Tim talked about how they now have deep learning pipelines. It was based on conversations we had had, so I was very excited to see it come to fruition. It's now open source, and we can leverage it.

Awesome. All right, well, we're up against a hard break here, almost at the end of the day. Wesley, it's been a riot talking to you. I really appreciate it, and thank you for coming on the show and sharing your knowledge.

You bet. Thanks for having me.

All right, and that's it. We're going to wrap it up today.
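The hyperparameter search Wesley mentions, trying many model configurations in parallel across worker nodes, follows a simple pattern: enumerate candidate settings, evaluate each independently, and keep the best. In this minimal sketch a local thread pool stands in for the cluster's workers, and `train_and_score` is a hypothetical toy objective rather than a real model; on Spark, the same candidate list could be distributed with `SparkContext.parallelize` and evaluated with `map`.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def train_and_score(params: dict) -> float:
    """Hypothetical stand-in for "train a model, return its validation score".

    This toy objective peaks (score 1.0) at learning_rate=0.01, hidden_units=128.
    """
    lr_penalty = abs(params["learning_rate"] - 0.01) * 100
    size_penalty = abs(params["hidden_units"] - 128) / 128
    return 1.0 - lr_penalty - size_penalty


def grid_search(grid: dict[str, list], max_workers: int = 4) -> tuple[dict, float]:
    """Evaluate every combination in `grid` concurrently and return the best one."""
    keys = list(grid)
    # Cartesian product of all parameter values, one dict per candidate model.
    candidates = [dict(zip(keys, values))
                  for values in itertools.product(*(grid[k] for k in keys))]
    # Each worker scores one candidate; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(train_and_score, candidates))
    best_score, best_params = max(zip(scores, candidates), key=lambda pair: pair[0])
    return best_params, best_score
```

`grid_search({"learning_rate": [0.001, 0.01, 0.1], "hidden_units": [64, 128, 256]})` evaluates all nine combinations and returns `({'learning_rate': 0.01, 'hidden_units': 128}, 1.0)`. Because each candidate is independent, the search scales out with little coordination; for CPU-heavy real training, processes or cluster executors would replace threads.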
We have a wrap-up coming up in just a few minutes, as a matter of fact. My name is David Goad, and you're watching theCUBE at Spark Summit.