From San Mateo, it's theCUBE, covering Scalar Innovation Day, brought to you by Scalar.

Hello everyone, welcome to the special CUBE Innovation Day here in Silicon Valley in San Mateo, California at Scalar's headquarters. I'm John Furrier, the host of theCUBE, joined by John Hartz, the tech lead of back-end engineering here at Scalar. Thanks for joining us. Thanks for having me, John.

So what's the secret sauce of Scalar? You guys have unique differentiators. We've covered it with some of your peers, and the founders are all talking about it, but you guys have a unique secret sauce. Take a minute to explain that.

Yeah, I think it's a few different things. First of all, you've got the design level, which is that we don't use keyword indexes. That's a big one right there, off the top. On top of that, you've got a couple of different implementation pieces. We've got our own custom-written data store, so we're able to control everything all the way down to the bytes on disk: how we lay things out, optimized for speed. We have a novel kind of scatter-gather approach for fanning out a query, to make sure we can get all of our nodes involved as quickly as possible. And then finally, and this is just being smart, we have a time series database for repetitive queries, and that's on demand. You don't have to do anything, but we're gonna speed up your queries in the background if we know it's a good idea.

Talk about the time series. I think that's interesting because that comes into play. We hear about real time a lot. In cybersecurity, we talk a lot about how time series has been beneficial. Where does time series fit in for you guys?

That's a good question. I think one of the big differences with Scalar versus other uses of a time series database is that with Scalar, you're outputting your logs, and there's all kinds of information in your logs.
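The scatter-gather fan-out described above can be sketched roughly as follows. This is a minimal illustration, not Scalar's actual implementation; `query_node` is a hypothetical stand-in for whatever RPC a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(nodes, query, query_node):
    """Send the query to every node at once (scatter), then merge
    the partial result lists as they come back (gather)."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = pool.map(lambda node: query_node(node, query), nodes)
    merged = []
    for part in partials:
        merged.extend(part)
    return merged
```

The point of the pattern is that every node starts scanning its slice of the data at the same moment, so total latency tracks the slowest single node rather than the sum of all of them.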
Some of that might be a good thing to put in a time series database, but with a lot of other products, you would have to decide that ahead of time: hey, let's get this metric into the database. With Scalar, the moment you have anything in your logs that you might wanna put into a time series, you just start querying it. You put it in a dashboard, you've got a time series. We're gonna back-propagate that over everything you've already given us, so all of those queries are fast from there on out.

So it's built in from the beginning. Exactly, and you don't have to do anything. It's just on demand.

So keyword indexes are what other vendors have used for years. That's been standard for these log management software packages, and indexes can slow things down; we've done a tutorial on that. Why haven't those two areas been innovated on in a while? Did people just not figure it out, and you guys got there first? What's the differentiation for you guys? Why did you get there?

Well, I think the main reason is that log data is just fundamentally different from most other things you might use a database for, and there are a couple of different reasons for that. With log data, you're not in control of it. You can't design it. An index is great if you're making a relational database: you've got control of your columns, you know what you're gonna join on, you know what you wanna index. Nobody designs their logs like they design their database tables. It's just a bunch of stuff. It's from systems you don't control. It's changing all the time. So the number of distinct fields that you would have to index is really, really high, and if your system depends on indexing for good performance, you're gonna have to make a lot of indexes. And then indexes, of course, are write-amplifying.
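The on-demand, back-propagated time series idea can be sketched as a toy like this (hypothetical, not Scalar's real engine): the first time a query over a log field is seen, a count series is materialized over the logs already ingested, so repeats of the same query hit the precomputed series.

```python
from collections import defaultdict

class OnDemandTimeseries:
    """Toy sketch: materialize a per-bucket count series the first
    time a (field, value) query appears, back-filled over all logs
    already given to us."""
    def __init__(self, logs):
        self.logs = logs    # list of (timestamp_seconds, fields_dict)
        self.series = {}    # (field, value, bucket_s) -> {bucket: count}

    def count_series(self, field, value, bucket_s=60):
        key = (field, value, bucket_s)
        if key not in self.series:          # back-propagate once
            buckets = defaultdict(int)
            for ts, fields in self.logs:
                if fields.get(field) == value:
                    buckets[int(ts // bucket_s)] += 1
            self.series[key] = dict(buckets)
        return self.series[key]             # later calls are cached
```

The design choice being illustrated: nothing is decided at ingest time; the decision of which fields become time series falls out of what people actually query.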
If you've got one gigabyte of raw data, but then you've got to put 500 or 600 indexes on top of it, you're gonna have five or 10 gigabytes of raw plus index data. That means you've got to do a lot more I/O, and at the end of the day, how much you have to read from disk determines how fast your query's gonna be.

So in essence, indexing creates a lot of overhead you shouldn't even need, because of the nature of log files. Because of the nature of log data, it's overhead that doesn't serve log data very well. Yeah.

And what about the log data that's changing? One of the things we're seeing with the Internet of Things is more connected devices. Imagine the Teslas that are gonna be connecting in with all their data, cameras, a huge amount of new kinds of data, up-down status. This is gonna be a tsunami of new types of log data.

Yeah, and are you really gonna have a ton of control over any of it? It's gonna be changing a ton. Maybe you've got 20 different versions of devices out there that are all sending you different versions of logs, and you've got to be able to handle all of it. So you want a system that adapts to your needs as they come up, as opposed to something you have to plan out with indexes ahead of time.

So if someone asks you, hey, you guys say you're faster, why? Is that true, the claim that you're faster than others? And if so, why?

It is true, and it really comes down to the secret sauce. The key to brute force, and I think we've talked about this a little bit today, is that you gotta bring a lot of force as quickly as you possibly can, and we do that. We've got a lot of custom code. We're not using off-the-shelf components. We're trying to get that time as short as we can. I think our median performance is still better than 100 milliseconds.
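The back-of-envelope above can be made concrete. Assuming, purely for illustration, that each index adds about one percent of the raw data size (the real overhead varies by engine):

```python
def total_storage_gb(raw_gb, num_indexes, overhead_per_index=0.01):
    """Rough write-amplification estimate: raw data plus a per-index
    overhead. The 1% default is an assumption for illustration only."""
    return raw_gb * (1 + num_indexes * overhead_per_index)

# 1 GB of raw logs with 500 indexes -> 6 GB of raw-plus-index data
```

The exact numbers aside, hundreds of indexes multiply the bytes that have to be written and later read back, which is exactly the I/O a no-index brute-force scan avoids.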
That might be for a query that's talking to 200 or 300 machines, or maybe even more, and maybe it's going to scan a terabyte of data, and all of that's going to come back within 100 milliseconds. It's extremely fast.

Talk about why log data is different from other data types. For folks in these cloud-native environments, their time is precious, and they're looking at a lot of different data. How is log data different?

I think the fact that it's dynamic in terms of what's coming out is something new, right? It changes so rapidly. The other really big thing is that the way you query it changes from day to day. Most of the time you're going to your logs, you're trying to troubleshoot a problem, and today's problems are different than yesterday's problems. So every time you go in, you're using it in a different way. It has to be very fast, it has to be exploratory, and that's one of the big things about Scalar's speed: it enables this really exploratory workflow. You can move through the data quickly, as opposed to making a query, getting a cup of coffee, waiting for the query, and then deciding what you're going to do next. I'm kind of dating myself here, but it's like the first time you ever used Google. Whoa, how did that happen? That's what it's like the first time you use Scalar.

And you guys have the unique architecture, we talked about that, and you have certain speeds, but it's not just the query speed, it's the time it takes to build the query. So you factor in a much bigger perspective: if someone has to build a query and it takes 15 minutes, game's over.

Yeah, and instead, you're just clicking on things. We try to make it very easy for you to move from, oh, here's an alert. Well, here are the log lines that caused that alert. Oh, what's the thread stack for that particular log? Oh, I can go and look at everything else that happened in that thread. That's five or 10 seconds in Scalar, tops.
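It's worth doing a quick sanity check on those numbers (a back-of-envelope, not a benchmark). If roughly 250 nodes together scan a terabyte within 100 milliseconds, the implied per-node scan rate is on the order of 40 GB/s, which hints at why the custom on-disk layout and tight control over bytes matter so much:

```python
def per_node_scan_rate_gb_per_s(total_tb, nodes, latency_s):
    """Implied per-node scan rate when `nodes` machines together
    scan `total_tb` terabytes within `latency_s` seconds."""
    total_gb = total_tb * 1024
    return total_gb / nodes / latency_s

# 1 TB over 250 nodes in 0.1 s -> about 41 GB/s per node
```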
You guys have a unique engineering culture that targets engineers: products built by engineers, for engineers. Great story, and it's real, and you guys build in it every day. What is the engineer's threshold of pain when it comes to log data? Have you seen anything anecdotal? Because the engineers in this space need access to it, there are SLAs now tied to it, people are sharing data; there are all kinds of new reasons why you need to have the Scalar solution. But what's the pain point where most people stop tolerating an inferior solution?

Well, I actually have an answer for this, because before I was a Scalar employee, I was a Scalar customer, and before I was a Scalar customer, I was a Splunk customer. I used Splunk for about five years, before I think Scalar even necessarily existed, and I was really happy with it because I needed it. I had my own company, we were generating tons of logs, my support guys needed to use those logs, and prior to using something like Splunk, I was SSHing into servers to check the log files, which is, of course, not scalable. So I was really happy that the product as an idea existed, but it just kept gnawing at us. Every time we would query, sometimes it would be fast, sometimes it would be really slow, sometimes results would be missing because an indexing server was down.

You mean the Splunk solution? The Splunk solution, yeah. It was just extremely painful. So I read one of the blog posts written by Steve Newman and thought, huh, that's a great idea. That is how you should attack this problem: no indexes, brute force, all the flexibility you get from that. I loved it, and then I forgot about it for like six months. I was busy, right? But then six months later, I was really frustrated again with Splunk being really, really slow, and I thought, what was the name of that company again?
I looked them up, I installed it, and certainly within a day I was blown away by the performance. Within a week, I had uninstalled Splunk from every single one of my servers and switched to Scalar instead.

And that worked for you, you were happy with it? Yeah. And you came to join the company. Yeah, exactly. In conversations with the support team here, I was one of their early customers to use Windows, so I had a lot of questions, and they had questions for me: how did I get it working? It wasn't a supported platform. All of my emails were responded to by two guys named Steve, so I figured that was probably the support team. Pretty funny: they've got a support team of two people, both named Steve. And then at one point in one email, Steve Newman said to me, you may have realized there's only two of us here. That's when I thought, oh, wait a second, there's two people total. Two guys, I assumed, in a basement; they weren't in a basement, but I assumed they were. They had software that was way better for my needs than Splunk, which at the time was worth probably eight or 10 billion dollars, a public company with thousands of engineers. So that's when I thought, huh, when I get a chance, maybe I should go work with these guys.

You know, it's interesting, maybe they can create a new category: brute force as a service. Yeah. I mean, this is what they're doing, bringing in the right tool at the right time for the right problem, for speed, to solve the problem. Yeah, we don't really care how it gets done. Put in as much data as you can and just get that answer back as quickly as you can.

So this is the big challenge, and the final question for you. Obviously, a lot of the people we talk to in the DevOps world are really fickle. On one hand, they'll try anything; if they like it, they'll stay with it, but if they don't, you'll know about it.
Where's the value point for people to start thinking about Scalar? Is it ingest to value? Getting that ingestion going is one part; that's kind of a trial. Sure. Where does the value immediately come in? What do you see? What's the first sign of value once the ingestion happens?

Right, so part of it is just that it's a very short period of time from ingestion to the time you're querying on it. You get a real-time view of what's happening on your servers, not a five-minutes-ago view, and that by itself, I think, can pay for it right there. If you're a DevOps person and you've got some alarm pinging, and that alarm is from 10 minutes ago, that means your customers are already annoyed. If you then have to wait another 10 minutes just to even see what's happening, you've got a really big problem. So being able to have an alarm that you know is triggering on something that happened maybe a second or two ago, and then immediately being able to dive in, with no interruption to your workflow, no reason not to dive in: that's a pretty big one right there.

So a pretty immediate impact. Yeah.

Okay, so for people that don't know Scalar, what should they know about it as a company, from a value-proposition standpoint? As a former customer, now a key employee in back-end engineering, what are the key things they should know?

Okay, so speed, we keep talking about it. We also have a really, really good cost basis: because we're not making those indexes, we don't have to store as much data, so it's just generally cheaper to run. And you don't have to decide stuff ahead of time; you can do it all on the fly, ad hoc. We get you from your alerts to your answers as quickly as we possibly can.

That's pretty good. Every culture has its own unique kind of feature. What's Scalar's culture? I mean, Intel was Moore's Law, the cadence of Moore's Law.
What's the culture here at Scalar like? That's a good question. I guess I would say I'm just tremendously proud to be working with these engineers. We're all here because we want to get better and we want to work on really, really hard problems, writing our own code, not just running and patching together open-source systems that already exist. We want to be doing something cutting-edge. So that's, I would say, the biggest one.

And big problems behind that. You've got AI right around the corner; applying AI is going to be a natural extension. Yeah, because we've got the data, and we can deal with the data.

All right, John, thanks for the insight. Appreciate it. Good talking to you. John Furrier here, Innovation Day with theCUBE, here in Silicon Valley in San Mateo at Scalar's headquarters. I'm John Furrier. Thanks for watching.