From the Walt Disney World Swan and Dolphin Resort in Orlando, Florida, it's theCUBE, covering Splunk .conf2016. Brought to you by Splunk. Now, here are your hosts, John Furrier and John Walls. And welcome back to .conf2016 here on theCUBE, along with John Furrier. I'm John Walls, continuing our coverage here, live streaming throughout the day. We're talking to a lot of partners from the Splunk community and also to a lot of customers. You know about Yelp. We might use it for dinner tonight, but you've probably used it at some point over the week. We use it all the time when we travel. It's cute things, of course. Yeah, because we got to know. Like, where's the best barbecue in Texas? Where's the best seafood in Orlando, right? Right, we got to know. We were in Barcelona and we did Yelp, because we needed to get a restaurant. You need to Yelp. And we went to this great restaurant. Great Stewart was actually there. We didn't even know who was there. And the owner's like, all these Americans come in. For some reason, we get all these Americans. We're Yelping maniacs. From Yelp Reservations, two gentlemen are with us: Chris Wainer, VP of Engineering, and also Charles Gunther, Senior Software Engineer. Thanks for joining us, guys. Thank you, thanks for having us. Well, let's just talk about reservations first. You just received me and moved on to reservations now. For people who don't know about the service, you know, the extension that you've done with reservations, what's that all about? Yeah, so Yelp Reservations is a way we allow people to transact, to directly interact with the restaurant and make a reservation online. It combines the great data that Yelp brings to the table with the ability to actually integrate the reservation experience, so you can act right in the site and make a reservation right when you're finding the great restaurant. You don't have to go anywhere else. It's a one-stop shop.
All right, so what's your fundamental nightmare or your fundamental problem then? You've got volumes of data coming in, right? And so you've got the customer interaction, you've got the facility, you've got your rating system, you've got all that happening. What's your teeth-gnashing day like? I mean, the teeth-gnashing problem, I think, is a teeth-gnashing problem for all of Yelp, and probably extensively across the industry, right? It's about the fact that we have a ton of microservices and a lot of legacy code, and together those things generate tons of data. For us, especially on the reservation side, we need to be able to see the customer experience, to track through the lifecycle of the customer's interaction with us, so that those customers are getting an optimal experience, so that we know when we're having errors and can act on them quickly, and then take that data back full lifecycle and be able to do analysis on it and bring the learnings that we've got from a variety of customer interactions back to improve the experience, right? But we generate so much data that that's a real challenge for us. So that's the Splunk problem. Let's get that on the table and kind of talk about architecture at the same time. So the problem that you guys solve is you have to create a whole new system within the Yelp experience, which is kind of like, I won't say, maybe it's the wrong word, off-domain or a different app, and a lot of companies go through that problem. But you mentioned microservices, right? So I can imagine you have a lot of microservices within each application. Yep. How do you orchestrate that? Are you guys using Docker containers? Where does Splunk fit in? Lay out the architecture and how this all fits in. Sure, so Yelp is largely run off of a platform-as-a-service product that we built called PaaSTA, and it's actually built on top of Apache Mesos, and we use Marathon as the framework to run those containers on top of it.
And the large majority of those microservices are actually running on top of this PaaSTA platform. And really what that's about is developer empowerment. We don't want them having to reach out to, like, the site reliability team or the operations team to say, I want to deploy this new code. We want to enable them to just check in some configuration, check in their code, and basically just let go. And that enables rapid deployment, that enables no blockages, and that enables exactly what you need in order to kind of scale out. Yeah, they're pushing code. They're pushing code without calling. Without calling up and saying, hey, I need some authorization or whatever. Correct, correct. And I think one of the unique things that we've done at Yelp is we're actually using this same concept with Splunk. We're actually scaling out a lot of our fleet using the same concepts. And I think that's something that's unique, or maybe something that we're doing that a lot of people haven't tried out yet. We'll actually be talking about that here at .conf. What's the Marathon thing? Can you explain what Marathon is, that orchestration? I think you're probably a better... Yeah, yeah, it's the orchestration piece of it. I mean, orchestration is one way to think about it. I think about it really as an init for the data center. It's the thing that knows: these are the containers that compose my service. This is how they need to be run. This is how they need to be restarted. This is the current running version. How many of them are there? What are its needs in terms of resources? And then it makes sure that my fleet of services stays healthy, up, live, and tracked, and it's exposed as an API for us to interact with at a very simple, high level, to be able to deploy them, start them, stop them, scale them up, et cetera. So that's the reliability layer for the developer. For the code-pushing guys who are using the containers, right? So Marathon stabilizes that so they don't have to go to the...
Nominally, it's the scheduler against that compute cloud. So if you're thinking about running your own PaaS, whether you're running it in the cloud or on your own hardware, we're trying to build kind of our own cloud on top of that, right? We want to enable that kind of cloudy feel for our developers. And Marathon really is what enables that, right? You push to it and it schedules against that cloud, and it does the right type of packing against servers so you won't overload them, right? So it's a nice comfort for the developer. Okay, how about where Splunk fits in? So how are you Splunking the data? How does that fit into the plan? Can you share a little bit about how you guys use Splunk? Yeah, absolutely. So one of the things that Yelp did a while back, which was a great investment and something that I think is relatively unique, is we took the time when we rolled out our microservice strategy to centralize all of our log data and a lot of other stuff together in a single place, which then gives us the ability to connect Splunk rapidly and ingest the streams really, really easily. Part of that developer empowerment is making sure that best practices are baked into the services when they get pushed, and that includes logging. So this makes it so that when we push a service, we know that the service is going to emit logs in a way that is immediately available to all the downstream analytics tools, which for us includes Splunk. The interesting part for the Marathon usage specifically, to tie that together, is that the centralized bus that the logs stream to is ultimately consumed by a service that's deployed on that same development platform. So our Splunk forwarders are running inside a Marathon-hosted platform and consume off the centralized log bus. So you guys get that data, you can see what's going on relative to any kind of events going on with that, as well as alert? Absolutely.
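The "packing against servers" idea mentioned above can be illustrated with a minimal sketch. This is not how Marathon or Mesos actually place work; it is a plain first-fit placement, and the `Task`, `Host`, and `schedule` names, the host sizes, and the task names are all hypothetical, chosen only to show a scheduler refusing to overload a machine.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A unit of work with declared resource needs (hypothetical shape)."""
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Host:
    """A server with remaining free capacity."""
    name: str
    cpus: float
    mem_mb: int
    tasks: list = field(default_factory=list)

    def fits(self, task: Task) -> bool:
        return task.cpus <= self.cpus and task.mem_mb <= self.mem_mb

    def place(self, task: Task) -> None:
        # Reserve the resources so later tasks cannot overload this host.
        self.cpus -= task.cpus
        self.mem_mb -= task.mem_mb
        self.tasks.append(task.name)


def schedule(tasks, hosts):
    """First-fit: put each task on the first host with enough free CPU and memory.

    Returns the names of tasks that could not be placed anywhere.
    """
    unplaced = []
    for task in tasks:
        for host in hosts:
            if host.fits(task):
                host.place(task)
                break
        else:
            unplaced.append(task.name)
    return unplaced


hosts = [Host("host-a", cpus=2.0, mem_mb=4096), Host("host-b", cpus=2.0, mem_mb=4096)]
tasks = [Task(f"reservations-{i}", cpus=1.0, mem_mb=1024) for i in range(5)]
unplaced = schedule(tasks, hosts)
```

With two hosts of two CPUs each, four of the five one-CPU tasks are placed and the fifth is reported back as unplaced rather than squeezed onto a full machine.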
All right, well, this is the first interview we've had here at .conf that's brought in the microservices angle, and we love talking about that. So let's talk about what you guys have. Are you on-prem, do you have Amazon, are you guys using the cloud? How are you guys handling your backbone and your infrastructure? Absolutely, so we're a hybrid solution. We actually have physical facilities and we are also very heavily invested in cloud. We're an AWS partner, we use AWS extensively, so a lot of our services at this point are more and more heavily cloud-based. Are you moving workloads from on-prem to the cloud? Yeah, what we're trying to do actually is develop it so that the cloud and the on-prem are peers to each other and we can relatively smoothly move between the two of them, and we're really getting close. We're using a lot of things like Terraform that really enable us, with a single command, to spin up an entire facility effectively on demand. This is awesome. Are you guys going to be at re:Invent this year? We absolutely will be. theCUBE will be there, you'll have to look for us. One big stage, maybe two. Well, it's always a great show, it's probably one of the best shows, it's kind of like Splunk, right? Everyone's kind of rocking and partying, having a good time, geeking out. Next level: real time has been a big part, certainly on mobile, getting that data in. You guys doing anything in memory? How are you guys handling streaming of data? Does that fit into the plan at all? It absolutely does. I mean, one of the things that I think is most important to us at Yelp is being able to detect errors as fast as possible, right?
When we push code, as we canary changes, as they sort of go into limited distribution and then start rolling out around data centers, we need to know quickly and repeatably, with very low latency, if things are wrong with the new code release. We use a lot of Splunk real-time features to track our error rates, during the deploy process specifically, to be able to see if the site's healthy. You guys have been a great company to watch because it's been like born in two movements, okay, you're pre-DevOps, early DevOps, and now what I call post-mature DevOps, if you could call that a category. I think we're almost post-mature. I think pretty much everyone realizes cloud-native DevOps is the way to go, no doubt about that. What have you guys learned? What can you share with folks in terms of scar tissue, tricks that you've learned or developed? Because a lot of folks are like Yelp. I'm sure you guys have done things on your own, built your own stuff, tripped on some stuff, hit some speed bumps and scaled stuff. So as you scale, what are the big learnings? What would you share, share some insight? I think the biggest thing that I would say is everything has to be code, and that is all the way from the configuration to the actual data center deployments. Centralize it, get it code reviewed, get it checked in, automate everything. And until you get to the point where we're now getting, due to the efforts of a lot of really smart folks inside the organization, where you can actually execute a single command and spin up an entire facility, I don't really think that you're at the point of maturity where you can really call yourself cloud native, I guess, to be perhaps overstepping, but that is the thing. That's the outcome you want, that's the outcome people should strive for. I agree, I agree, that is the goal.
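The canary check described above, comparing error rates during a limited rollout before going wider, can be sketched roughly as follows. This is an illustration only, not Yelp's deploy tooling and not a Splunk API: the `canary_verdict` function, the `max_ratio`, `min_requests`, and `floor` thresholds, and the sample counts are all assumptions. In practice the error and request counts would come from whatever log search or metrics system is in use.

```python
def error_rate(errors: int, requests: int) -> float:
    """Fraction of requests that failed; 0.0 when there is no traffic."""
    return errors / requests if requests else 0.0


def canary_verdict(baseline, canary, max_ratio=2.0, min_requests=100, floor=0.001):
    """Decide what to do with a canary release.

    baseline and canary are (errors, requests) tuples for the same time window.
    Returns "wait" until the canary has seen enough traffic, "rollback" if its
    error rate exceeds max_ratio times the baseline rate, otherwise "proceed".
    """
    canary_errors, canary_requests = canary
    if canary_requests < min_requests:
        return "wait"  # too little data to judge either way
    # The floor keeps a zero-error baseline from failing the canary on one error.
    allowed = max(error_rate(*baseline) * max_ratio, floor)
    if error_rate(canary_errors, canary_requests) > allowed:
        return "rollback"
    return "proceed"


healthy = canary_verdict(baseline=(50, 10_000), canary=(4, 1_000))
broken = canary_verdict(baseline=(50, 10_000), canary=(30, 1_000))
too_early = canary_verdict(baseline=(50, 10_000), canary=(1, 20))
```

The point of the `wait` state is the low-latency requirement from the interview: the check can run continuously from the first request, but it only commits to a decision once the sample is large enough to be meaningful.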
To this point, one of the things that you kind of struggle with as you go from on-prem to cloud is that the tendency, when something's happening, is to go onto the box directly and make that change as an operations event, right? Like, you don't necessarily have those processes. And as you move to cloud, those boxes don't persist. A box is going to go away and it's going to come back later as something else, entirely fresh and new. Which means that all that manual configuration you've done is just poofed. It's instantly gone, right? So unless you've checked all that stuff in and you have a repeatable process for doing the exact same thing, you're going to do yourself a disservice, and that's really what it comes down to: DevOps, the practices, repeatability through code. Yeah, and I think it also makes the job fun. I mean, when you automate away those tasks, one, there's a legit reason to do it, as you mentioned, because you want to have this spinning up, but it's also more fun, you get to work on cooler stuff. What are some of the cool things that you guys are working on right now? Because, you know, scaling up a service like Yelp, you've got a lot going on, right? You have a lot of data. What are some of the cool things that you guys have done? I mean, I think probably the most coolest thing. The coolest thing, ah. So, I mean, the thing that we talk about a lot that I think is probably the coolest thing, which is more intricate, is the way that we actually manage our iPad fleet. So Yelp Reservations is a product that is online, it is an online availability engine for you guys, consumers, but the majority of the product is actually restaurant-facing. And restaurants have very flaky networks, and they need really high availability of their data. So what we had to do was make it so that our iPads can operate fully offline and then be able to resynchronize with our facilities.
And so what we've actually built is a multi-master replication protocol where the iPads actually act as replication peers to our facility in the cloud. And so we end up being able to have those iPads disconnect from the network, operate successfully, reconnect, and merge their data changes, and both sides are seamlessly working together. You take the Achilles' heel, being offline, and flip it around as an advantage. Yes, absolutely. On top of that, one of the coolest things about that iPad app is we actually build a binary database that we're able to lay directly down on the file system for iOS. So instead of downloading JSON, parsing it, and pushing it through, with the kind of performance that you have working with a very small CPU, we're actually just handing it a pre-baked database, downloading it direct to disk and saying, run. Which is, I think it's just... That's down and dirty, man. That's like old school, when I went to college, down to machine language, getting away from the overhead issue, right? You don't want to have that overhead. Because, again, because of the network issue, because you might have downloaded a 30-meg file and you've got your network flipping in and out, it's probably not. So you take that efficiency and what do you shift it to for the application? Well, we shift it into the cloud. We actually take the workload of building that binary database and shift it into the cloud. And that ultimately is, when you walk into the restaurant, this is about the customer's experience, right? Nobody cares about all this tech stuff with the iPads. What they want is to walk into the restaurant and smoothly hear, welcome, sir, let me show you to your table. They don't want to wait for an iPad to download the database. I mean, you just touched on it, Chris, a little bit. You said restaurants have kind of flaky websites, right? Yep. So you're seeing all kinds of different levels of sophistication. Yep. As well as from consumers. I don't know how many millions you have.
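The disconnect, operate, reconnect, merge cycle described above can be sketched with one common reconciliation rule, last-writer-wins per record. The interview does not say how Yelp's replication protocol resolves conflicts, so this is an assumption made purely for illustration; the `merge` function, the record ids, and the integer timestamps are all hypothetical.

```python
def merge(local: dict, remote: dict) -> dict:
    """Merge two replicas that each map record id -> (timestamp, value).

    For every record id the copy with the newer timestamp wins, so records
    created or edited on either side while disconnected survive the merge.
    """
    merged = dict(remote)
    for record_id, (timestamp, value) in local.items():
        if record_id not in merged or timestamp > merged[record_id][0]:
            merged[record_id] = (timestamp, value)
    return merged


# State in the cloud facility when the tablet reconnects.
cloud = {"res-1": (10, "table 4"), "res-2": (12, "table 7")}
# State on the tablet after working offline: one edit and one new record.
tablet = {"res-1": (15, "table 5"), "res-3": (14, "walk-in")}

synced = merge(tablet, cloud)
```

Both sides can run the same function and arrive at the same state as long as timestamps do not tie, which is what lets either peer accept writes. A real protocol would also have to deal with deletions, clock skew, and ties, none of which this sketch attempts.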
I mean, so what does that create for you in terms of making sure this works, you know? And then from a security standpoint, knowing that you've got a lot of targets, I'm sure you get hacked all the time. Yeah, so what does that create for you? The challenges you face? I mean, the diversity of the consumer and the diversity of the restaurants' sophistication really means that the thing has to be lights-out stable, right? And it has to clearly tell the user what's going on, and we have to have a team on our side, for whom we maintain a very detailed playbook, to be able to correctly diagnose and handle those user issues. It's all about the customer's experience. We try to take the technology out of it as much as possible as it faces the restaurant and the consumer. To the security question, we recently opened a public bug bounty. We are a very secure, security-first organization. I'm actually really proud of Yelp's security. I think we do an excellent job. Guys, final point before we wrap up. What's your advice to people that are evaluating Splunk? Why is Splunk integral to your business? Get the plug out for Splunk. Let's get that out there. I think the biggest thing that Splunk offers you is that it enables you to fail fast. And that's something that you'll hear from developers constantly. The successful developers, you hear them say it consistently. Fail fast, fail often. But you're typically scared of your failure because you don't know where that failure is, right? Splunk really enables you, in the middle of that fire, to figure out exactly what's happening, roll it back, and make an informed decision. And once you have that confidence in your ability to see the failures, you're going to be confident in failing and trying new things. Well, you iterate faster. What's that? And you iterate faster. And you iterate much faster and you become better faster and you stay agile.
And really, those are the three buzzwords you hear in the industry. Like agile, fail fast, fail often, right? And Splunk really enables all three of those things. Be more agile, fail fast, get to the cloud. Guys, thanks so much, appreciate it. But I do not want to fail fast with my dinner choice tonight. Let's make sure we knock that one out of the park. Can I cut the line on all the Yelp reservations? Maybe Splunk can help me with that too. Back with more from .conf2016 in just a bit here on theCUBE.