Live from the Computer History Museum in Mountain View, California, it's theCUBE, covering DevNet Create 2018. Brought to you by Cisco.

Hey, welcome back, everyone. This is theCUBE, live here in Mountain View, California, the heart of Silicon Valley, for Cisco's DevNet Create. This is their cloud developer event. It's not the main Cisco DevNet, which is more for the Cisco developers; this is much more cloud-native DevOps. I'm joined by my co-host, Lauren Cooney, and our next guest is Christine Yen, co-founder and Chief Product Officer of Honeycomb.io. Welcome to theCUBE.

Thank you.

Great to have an entrepreneur and also a Chief Product Officer, because you blend the entrepreneurial zeal with building a product in the cloud-native world, and you've done a few ventures before. First, take a minute to talk about what you guys do: what the company's built on, what's the mission, what's your vision?

Absolutely. Honeycomb is an observability platform to help people find the unknown unknowns. Our whole thesis is that the world is getting more complicated. We have microservices and containers, and instead of having five application servers that we treated like pets in the past, we now have 500 containers running that are more like cattle, where any one of them might die at any given time. And we need our tools to support us in figuring out, when something happens, what happened, why, and how do we resolve it? We look around at the landscape, and we see this dichotomy out there: we have logging tools and we have metrics tools. Those really evolved from the fact that in 1995 you kind of had to choose between grep or counters. As technology evolved, those evolved into distributed grep or RRDs, and then distributed grep with fancy UIs and, well, fancy RRDs with UIs. When we started Honeycomb a couple of years ago, we really felt: what if you didn't have to choose?
What if your technology gave you the power of having all the context there, the way you do with logs, while still providing instant analytics the way you have with metrics?

So the problem you're solving is, one, antiquated methodologies from old architectures and stacks, if you will, and helping people save time compared with the arcane tools.

Absolutely.

And the main premise?

We want people to be able to debug their production systems.

All right, so beyond that, for the developer you're targeting, take us through a day in the life of where you're helping them vis-a-vis the old way.

Absolutely. I'll tell a story from when my co-founder Charity and I were working together at Parse. Parse, for those who aren't familiar, was, RIP, a backend for mobile apps. Think of someone who just wants to build an iOS app and doesn't want to deal with data storage, user records, things like that. Parse started in 2011, got bought by Facebook in 2013, and was spun down at the very beginning of 2016. In 2013, when the acquisition happened, we were supporting somewhere on the order of 60,000 different mobile apps. Each one of them could be a totally different workload, a totally different usage pattern, but any one of them might be experiencing problems. In this old world, this pre-Honeycomb world, we had our top-level metrics: latency, overall throughput, error rates, and we were very proud of them. We were very proud of these big dashboards on the wall that were green. And they were great, except when you had a customer write in saying, hey, Parse is down. We'd look at our dashboard and be like, nope, it's not down, it must be you, it must be network issues.

That's on your end.

Yeah, that's on your end. Not a good answer. And especially not if that customer was Disney, right?
When you're dealing with these high-level metrics and you're processing tens or hundreds of thousands of requests per second, and Disney comes in with eight requests a second and sees all of them fail, even though those are really important eight requests per second, you can't tease that out of your graphs. You can't figure out why they're failing, what's going on, how to fix it. You've got to dispatch an engineer to go add a bunch of "if app ID equals Disney" code, track it down, figure out what's going on there, and it takes time. When we got to Facebook, we were exposed to a type of tool that essentially inspired Honeycomb as it is today. It let us capture all this data, a bunch of information about everything that was happening, down to those eight requests per second. When a customer complained, we could immediately isolate: oh, it's one app, okay, let's zoom in. For this one tiny customer, let's look at their throughput, error rates, latency. Oh, something looks funny there, let's break down by endpoint for this customer. It's this iterative, fast, highly granular investigation that all of us are approaching today. With our systems getting more complicated, you need to be able to isolate: okay, I don't care about the 200s, I only care about the 500s, and within the 500s, what's going on with this server, with that set of containers?

So this is basically an issue of data, unstructured data, or having the ability to take this data in, at the same time with your eye on the prize of instrumentation.

Absolutely.

And then having the ability to make that addressable and discoverable in real time. Is that kind of it?

Yeah. We've been using the term observability to describe this feeling of: I need to be able to find the unknown unknowns. And instrumentation is absolutely the tactic to observability's strategy.
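The investigation loop described here (zoom in on one customer, keep only the failures, break down by endpoint) is easy to express once requests are stored as structured events. A minimal Python sketch with illustrative field names and toy data, not anything Honeycomb-specific:

```python
from collections import Counter

# A handful of structured request events (fields and values are illustrative).
events = [
    {"app_id": "disney", "endpoint": "/login", "status": 500, "duration_ms": 1200},
    {"app_id": "disney", "endpoint": "/login", "status": 500, "duration_ms": 900},
    {"app_id": "disney", "endpoint": "/feed",  "status": 200, "duration_ms": 40},
    {"app_id": "acme",   "endpoint": "/login", "status": 200, "duration_ms": 35},
]

# Step 1: zoom in on one customer, even if they're a tiny slice of traffic.
disney = [e for e in events if e["app_id"] == "disney"]

# Step 2: within that slice, keep only the failures (the 500s, not the 200s).
failures = [e for e in disney if e["status"] >= 500]

# Step 3: break the failures down by endpoint to find the hot spot.
by_endpoint = Counter(e["endpoint"] for e in failures)
print(by_endpoint.most_common(1))  # [('/login', 2)]
```

Each step narrows the previous result, which is what makes the query-to-query-to-query rhythm fast: every hypothesis is just another filter or breakdown over the same events.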
It's how people get information out of their systems in a way that's relevant to their business. A common thing we'll hear is people asking, oh, can you ingest my nginx logs, can you ingest my MySQL logs? Often that's a great place to start. But really, where are the problems in an application, in the system? Usually it's the places that are custom, the code the engineers wrote. Tools need to provide information, graphs, and analytics in a way that makes it easy for the folks who wrote the code to track down a problem and address it.

It's a haystack of needles.

Yep, absolutely. They're all relevant, but you don't know which needle you're going to need.

Exactly.

So let me just get this straight, I'm backing out, just trying to understand, because this is super important: this is really the key to large-scale cloud ops, what we're talking about here, from a developer standpoint. We just had a great guest on talking about testing features in production, which is really important; people want to do that.

Absolutely.

For one person that's one thing, but at production scale it's a huge problem, and opportunity as well. Most people think, oh, I'll just ingest it with Splunk, but is that different? People think of Splunk, and they think of Redshift and Kinesis on Amazon, and they go, okay, is that the solution? How do you guys differ with your tool? How do I understand you guys in the context of those known solutions?

First, I'll explain the difference between ourselves and the Redshifts and the BigQueries of the world, and then I'll talk about Splunk. We really view those tools as primarily things built for data scientists. They're definitely in the big data realm, but they are very concerned with being 100% correct.
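Instrumenting the custom code engineers wrote, rather than only ingesting server logs, usually means emitting one wide, structured event per unit of work. This is a hedged sketch of that idea; the decorator, field names, and `collected` buffer are hypothetical stand-ins for a real SDK, not Honeycomb's actual API:

```python
import time

collected = []  # stand-in for an instrumentation SDK's event buffer

def instrumented(handler):
    """Wrap a request handler so each call emits one wide, structured event:
    who called, what happened, how long it took. Field names are illustrative."""
    def wrapper(request):
        event = {"endpoint": request["endpoint"], "app_id": request["app_id"]}
        start = time.monotonic()
        try:
            response = handler(request)
            event["status"] = response["status"]
            return response
        except Exception as exc:
            # Failures are captured too, with enough context to debug later.
            event["status"] = 500
            event["error"] = type(exc).__name__
            raise
        finally:
            event["duration_ms"] = (time.monotonic() - start) * 1000
            collected.append(event)
    return wrapper

@instrumented
def handle(request):
    return {"status": 200}
```

The point is that every field an engineer cares about (customer ID, endpoint, timing, error class) lands on the same event, so later questions can slice on any of them without re-instrumenting.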
They're concerned with fitting into big data tools, and they often have an unfortunate delay in getting data in and making it queryable. Honeycomb is 100% built for engineers: engineers, ops people, the folks who are going to be on the hook for, hey, there's downtime, what's going on?

So what's the business benefit versus something more data warehouse-like?

Yeah, what that means is that for Honeycomb, everything is real-time. We believe in recent data. If you're looking to query data from a year ago, we're not really the thing. But instead of waiting 20 minutes for a query over a huge volume of data, you wait 10 seconds. When it's 3 a.m. and you need to figure out what's happening right now, you can go from query to query to query as you come up with hypotheses, validate or invalidate them, and continue on your investigation path.

That makes sense. So data wrangling, doing queries, business intelligence, insights as a service, all of that.

Yeah, we played with and tossed the tagline "BI for systems," because we want that BI mentality of what's going on, let me investigate, but for the folks who need answers now, where an approximate answer now is miles better than a perfect one later.

You can't keep large customers waiting, right?

At the end of the day, you can't keep the large customers waiting. And it's all so complicated at the edge. The edge is very robust and diverse now. I mean, Node.js has a lot of I/O going on, for instance. So let's take an exemplary developer. I was talking the other day with someone about a genuine Node.js case. It's like, oh, someone's complaining, but they're using Firefox. Okay, different memory configurations. So the developer had to debug: the complaints were coming in, everyone else was fine, but the one guy's complaining because he's on Firefox.
Well, how many tabs does he have open? What does the memory look like? So there's a weird thing. I mean, that's not a weird example; that's just the kind of diverse thing developers have to get on top of. And then, where do they start?

Absolutely. This is something we saw our developers run into all the time at Parse, right? These are mobile developers. They have to worry about not only which version of the app it is, but which version of the app, using which version of our SDK, on which version of the operating system, where any strange combination of these could result in some terrible user experience. These are things that don't really work well if you're relying on some sort of pre-aggregated time-series system, like the evolution of the RRDs I mentioned. And for folks who are trying to address this with something like Splunk, these logging tools, frankly, a lot of them are built on storage engines intended for full-text search. They're unstructured text. You're grepping over them and then trying to build indices and structure on top of that.

There's some lag involved, too.

There's so much lag involved. And there's almost this negative feedback loop built in: if you want to add more data, if on each log line you want to start tracking browser user agent, you're going to incur not only extra storage costs, but extra read-time costs, because you're reading back more data even on queries that don't care about that field, and you're probably incurring write-time costs to maintain those indices. Honeycomb, we're a column store through and through. We do not care about your unstructured text logs. We really don't want them. We want you to structure your data.

Did you guys write your own column store?

We did write our own column store, because ultimately there was nothing off the shelf that gave us the speed we wanted.
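The read-cost argument can be made concrete with a toy columnar layout: a filter-and-count touches only the columns it mentions, no matter how many keys each event blob carries. This is an illustrative in-memory sketch, not Honeycomb's storage engine:

```python
# Toy events, as a row store would hold them: every query over rows must
# deserialize whole records, including fields it never uses.
rows = [
    {"status": 500, "browser": "firefox", "endpoint": "/a", "duration_ms": 90},
    {"status": 200, "browser": "chrome",  "endpoint": "/b", "duration_ms": 12},
    {"status": 500, "browser": "chrome",  "endpoint": "/a", "duration_ms": 75},
]

# Columnar layout: one array per field.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# "How many 500s did Firefox users see?" reads exactly two columns;
# endpoint and duration_ms are never scanned at all.
count = sum(
    1
    for status, browser in zip(columns["status"], columns["browser"])
    if status == 500 and browser == "firefox"
)
print(count)  # 1
```

This is also why adding a new field (say, browser user agent) is cheap in a column store: it creates one more column that only queries mentioning it ever pay for, rather than widening every log line and every index.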
We wanted to support, hey, you're sending us data blobs with 20, 50, 200 keys, but if you're running an analysis and all you care about is a simple filter and a count, you shouldn't have to pull in all of that.

And a column store is like a Ferrari: you customize it, it's really purpose-built. That's what you guys did.

That is.

So talk about the dynamic, because now you're dealing with things like, I mean, I just had a conversation with someone who's looking at, say, blockchain. There are some costs involved, obviously, in writing to the blockchain. This is not a crypto thing; I think it's more of a supply chain thing. They want visibility into latency and things of that nature. This sounds like you would fit there as a potential use case. Is that something you guys thought of at all?

It could absolutely be. I'm actually not super familiar with blockchain or blockchain-based applications, but ultimately Honeycomb is intended for you to be able to answer questions about your system in a way that tends to stymie existing tools. So we see lots of people come to us with kind of strange use cases who just want to be able to instrument: hey, I have this custom logic, I want to be able to look at what it's doing. And when a customer complains but my graphs are fine, or when my graphs are complaining, being able to go in and figure out why.

Take a minute to talk about the company. You founded it; how many employees? Funding, if you can talk about it. Use cases, customers you have now. And how do you guys engage? Is it a service? Do I download code? Is it SaaS? You've got all this great tech; what's the value proposition?

I'll answer those in order, company first. Honeycomb is about 25 to 30 people. We raised our Series A in January. We are about two and a half years old, and we are very much SaaS of the future. We're very opinionated about a number of things and how we want customers to interact with us.
So we're SaaS only. We do offer a secure proxy option for folks who have PII concerns. We only take structured data: you can use whatever you want to slurp data out of your system, but at our API we want JSON. We do offer a wide variety of integrations, connectors, and SDKs to help you structure that data.

You provide SDKs to your customers.

We do, so that if they want to instrument their application, we have the niceties around batching and doing things asynchronously, so it doesn't block their application. Ultimately we try to meet folks where they're at, but it's 2018.

You have a hardened API; the API pretty much defines your service from an inbound standpoint. Prices, costs, how does someone engage with you guys? When does someone know to engage? Where are the smoke signals? When is the house on fire? When does someone know to call you guys up?

People know to call us when they're having production problems they can't solve, when it takes them way too long to go from an alert going off or a customer complaint to "oh, I found the problem, I can address it." We price based on storage. We're a bunch of engineers; we try to keep the business side as simple as possible, for better or for worse. So the more data you send us, the more it'll cost, and a lot of data stored for a short period of time will cost less than a lot of data stored for a long period of time. Another one of the approaches that is possibly more common in the big data world, and less so in the monitoring world, is that we talk a lot about sampling, sampling as a way to control those costs. Say you are Facebook. Again, I'll return to that example.
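The batching and asynchrony niceties mentioned above can be sketched roughly as follows. Class and method names here are hypothetical, not Honeycomb's actual SDK, but the shape (enqueue without blocking, flush JSON batches from a background thread) is the standard pattern:

```python
import json
import queue
import threading

class EventBatcher:
    """Illustrative sketch of what an instrumentation SDK does under the hood:
    buffer structured events and ship them in batches from a background
    thread, so the application thread never blocks on network I/O."""

    def __init__(self, send, batch_size=50):
        self._send = send              # callable that ships one JSON payload
        self._batch_size = batch_size
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add(self, event: dict):
        """Non-blocking from the caller's perspective: just enqueue."""
        self._queue.put(event)

    def _run(self):
        batch = []
        while True:
            event = self._queue.get()
            if event is None:          # sentinel enqueued by close()
                break
            batch.append(event)
            if len(batch) >= self._batch_size:
                self._send(json.dumps(batch))
                batch = []
        if batch:                      # flush any partial batch on shutdown
            self._send(json.dumps(batch))

    def close(self):
        self._queue.put(None)
        self._worker.join()
```

A caller just does `batcher.add({"endpoint": "/login", "status": 200})` and moves on; the cost of serialization and sending is paid off the request path.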
Facebook knew that in this world where lots and lots of things can go wrong at any point in time, you need to be able to store the actual context of a given event happening at that point: for some unit of work, you want to keep track of all the pieces of metadata that make that piece of work unique. But at Facebook scale, you cannot store every single one of them. So you start to develop heuristics. Which things are more interesting than others? Errors are probably more interesting than 200 OKs, so okay, we'll keep track of most errors and store 1% of successful requests. Then, within the errors, things that time out are maybe more interesting than things that have permissioning errors. You start to develop this sampling scheme that essentially maps to the interestingness of the traffic flowing through your system. To throw out some numbers, I think...

Machine learning's perfect for that, too, and then use the sampling.

Yeah, there's definitely some learning that can happen to determine what things should be dropped on the ground, what requests are perfectly representative of a large swath of things. Facebook and Instagram used a tool like this inside Facebook, and they stored something like a tenth of a percent or a hundredth of a percent of their requests, because that was enough to give them a sketch of what representative traffic looked like, what was going wrong, or what was weird and worth digging into.

Final question: what are your priorities for the product roadmap? What are you guys focused on now? You've got some fresh funding, that's great, so expanding the team, hiring, probably. But on the product side, what's the focus?

The focus on the product is making this mindset of observability accessible to software engineers.
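The sampling scheme she describes (keep every timeout, most errors, and a sliver of successes, recording the rate so counts can be re-weighted later) might look like this sketch. The buckets and rates are illustrative, not Facebook's or Honeycomb's actual values:

```python
import random

# Hypothetical sample rates, keyed by how interesting an event is:
# keep 1 in N of each bucket.
SAMPLE_RATES = {
    "timeout": 1,    # keep every timeout
    "error":   2,    # keep half of the other errors
    "success": 100,  # keep 1% of successful requests
}

def classify(event):
    """Bucket an event by interestingness."""
    if event.get("timed_out"):
        return "timeout"
    if event.get("status", 200) >= 500:
        return "error"
    return "success"

def sample(event, rng=random.random):
    """Decide whether to keep an event. Kept events carry their sample
    rate so downstream analytics can re-weight counts (a kept success
    with rate 100 stands in for ~100 real requests)."""
    rate = SAMPLE_RATES[classify(event)]
    if rng() < 1.0 / rate:
        event["sample_rate"] = rate
        return event
    return None
```

Because the rate travels with each kept event, a count query can multiply back up and still report approximately correct totals, which is what makes aggressive rates like one in a thousand workable.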
We're entering a world where, more and more, it's the software engineers deploying their code, pushing things out in containers, and they're going to need to develop this sense of: okay, how do I make sure something's working in production? How do I make sure it keeps working? And how do I think about correctness in a world where it's not just my component, it's my component talking to these other folks' pieces? We believe really strongly that the era of the single person in a room keeping everything up is outdated. It's teams now, it's on-call rotations, it's handing off the baton and sharing knowledge. One of the things we're trying to build into the product, and we're hoping this is the year we can really deliver on it, is this feeling of: I might not be the best debugger on the team, or the best constructor of graphs on the team, and John, you might be, but how can a tool help me, as a new person on the team, learn from what you've done? How can a tool help me go, oh man, last week when John was on call, he ran into something around MySQL also? History doesn't repeat, but it rhymes, so how can I learn from the sequence of things he ran? How can we help build experts? How can we raise entire teams to the level of the best debugger?

And that's the real thing: metadata. Metadata is a wonderful thing. As Jeff Jonas said on theCUBE, he's a CUBE alumni, a famous data entrepreneur: observation space is super critical for understanding how to make AI work. And to your point, that means having observation data. Super important. And of course, our observation space is all things here at DevNet Create. Christine, thanks for coming on theCUBE and spending the time.

Thank you.

Fascinating story, great new venture, congratulations.

Thank you.
And tackling the world of making developers more productive in real time, in production, really making an impact on coders, sharing, and learning. Here on theCUBE, we're doing our share: live coverage here in Mountain View. DevNet Create will be back with more after this short break.