Well, welcome back to theCUBE, everyone. I'm John Walls, and once again, we're glad to have you here for AWS re:Invent 22. Our coverage continues here on Thursday, day three of what has been a jam-packed week of tech, and AWS, of course, has been the great host for this. It's now a pleasure to welcome in Anurag Gupta, who is the founder and CEO of Shoreline, joining us here as part of the AWS global showcase startup program. And Anurag, good to see you, sir. Thanks for joining us.

Thank you so much.

Tell us about Shoreline, about what you're up to.

So we're a DevOps company. We're really focused on repairing issues. If you think about it, there are a ton of DevOps companies. We all went to the cloud in order to gain faster innovation, and, you know, by and large, check. Then all of the things involved in getting things into production, artifact generation, testing, configuration management, deployment, are also by and large automated. Now, pity the poor SRE, who is getting this deluge of stuff on him every week, every two days, sometimes multiple times a day. And it's complicated, right? Kubernetes, VMs, lots of services, multiple clouds sometimes. And they need to know a little bit about everything. And you know what? There are a ton of companies that actually help you with what we call day two ops. It's just that most of them help you with observability, telling you what's gone wrong, or incident management, routing something to someone. But back when I was at AWS, I never really got that excited about one more dashboard to look at, or better ticket routing. What used to really excite me was having some issue extinguished forever. And if you think about it, the first five minutes of an incident are detecting and routing. The next hour, two hours, is some human being going in and fixing it. So that feels like the big opportunity to reduce.
So hopefully we can talk a little bit about different ways that one can do that.

What's some of the day two ops? Just tell me about how you define that.

So I basically define it as, once the software goes into production, just making sure things stay up and are healthy, that you're resilient, that you don't get errors, and all of those sorts of things. Because everything breaks sooner or later, to a greater or lesser degree.

Especially that SRE you're talking about, right? So let's go back to that scenario. Yeah, you pity the poor soul, because they do have to be a little bit expert in everything. And that's really challenging. We all know that's really hard. So how do you go about trying to lighten that burden then?

So when you look at the numbers, somewhere between 40% and even 95% of the alarms that fire, the alerts that fire, are false positives. And that's crazy. Why is someone waking up just to deal with...

It's a lot of wasted time, isn't it?

A lot of wasted time. And you're also training someone into what I call click ops, just going in and clicking the button to resolve it. And you don't actually know if it was a false positive or the rare real positive. So that's a challenge, right? And so the first thing to do is to figure out where the false positives are. Let's say Datadog tells you that CPU is high, and it alarms. Is that a good thing or a bad thing? It's hard for them to tell, right? So you have to then introspect it into something precise. Like, oh, CPU is high, but response times are standard and the request rate is high. Okay, that's a good thing. I'm going to ignore this. Or CPU is high, but it kind of resolved itself, so I'm not going to wake anybody up. Or CPU is high, and oh, it's the darn JVM starting to garbage collect again. So let me go and take a heap dump, give that to my dev team, and then bounce the JVM, without waking anybody up.
Or CPU is high and I have no idea what's going on. Now it's time to wake somebody up. What you want to use humans for is the ability to think about novel stuff, not to do repetitive stuff. So that's the first step.

The second step is, about 40% of what remains is repetitive and straightforward. So, like, a disk is full. I'd better clean up the garbage on the disk or maybe grow the disk. People shouldn't wake up to grow a disk. And so what you want to do is just have those sorts of things get automated away. One of the nice things about Shoreline is that we take the experience of what we build for one company and, if they're willing, provide it to everybody else. Our belief, a central tenet, is that if someone somewhere fixes something, everyone everywhere should gain the benefit. Because we all sit on the same three clouds. We all sit on the same set of database infrastructure, et cetera. We should all get the same benefits. Why do we have to scar our own backs rather than benefiting from somebody else's scar tissue? So that's the second thing.

The third thing is, okay, let's say it's not straightforward, not something I've seen before. In that case, what often happens is that, on average, eight people get involved. It initially goes to L1 support or L1 ops. But they don't necessarily know, because, as you say, the environment's complex. And so they go into Slack and they say, @here, can somebody help me with this? And those things take a much longer time. So wouldn't it be better if your best SRE were able to say, hey, check these 20 things, and then run these actions? We could convert that into something like a Jupyter Notebook where you could say, the incident got fired.
I pre-populated all the diagnostics, and then I tell people very precisely: if you see this, run this, et cetera. It's like a wiki, but actually something you can run right in the product.

And then the last piece of the puzzle, the smaller piece, is that sometimes new things happen. And when something new happens, what you want is sort of the central tech of Shoreline, which is parallel, distributed, real-time debugging: the ability to execute a command across your fleet rather than on individual boxes, so that you can say something like, I'm hearing that my credit card app is slow. For everything tagged as being part of my credit card app, for everything that's running over 90% CPU, please run a top command. And then you can run it on 30,000 hosts in the same time as you can on one. And that helps a lot. So that's the core of what we do.

People use this for all sorts of things, including preventative maintenance, just the proactive, regular things. Like with your car, you do an oil change. Well, you need to rotate your certs, your certificates. You need to make sure that there isn't drift in your configurations, that there isn't drift in your software. There are also security elements to it, right? You want to make sure that you aren't getting weird inbound or outbound traffic across ports you don't expect to be open. You don't want to have these processes running. Maybe something's bad. And so that's all the kind of weird anomaly detection that's easy to do if you run things in a distributed, parallel way across everything. It's super hard to do if you have to go and whack at them one box after the next.

Well, which leads to a question, just in terms of setting priorities then, which is what you're talking about: helping companies establish priorities as a hierarchy of level one warning, level two, level four. Sounds like that should be a basic, right?
But you're saying that that's not really happening in the enterprise right now.

I would say that if you haven't automated deployment, you should do that first. If you haven't automated your testing pipeline, shame on you, you should have done that like a year ago. But now it's time to help people in production, because you've done that other work and people are suffering. The crazy thing about the cloud is that companies spend about three times more on the human beings who operate their cloud infrastructure than on the cloud infrastructure itself. And I've yet to hear anybody say that their cloud bill is too low. So there's a clear savings available as well. And back when I was at AWS, obviously I had to keep the lights on too. I had to do that, but it's kind of a tax on my engineers. And I'd really prefer to spend the headcount on innovation, on doing things that delight my customers. You never delight your customers by keeping the lights on. You just avoid irritating them by turning them off, right?

Right, right. So why are companies so fixated on spending so much time manually repairing things and not looking for these kinds of much more elegant solutions that are cost-efficient, time-saving, and so on and so forth?

I think there just hasn't been very much in this space as yet, because it's a hard, hard problem to solve. Automation's a little bit scary. I mean, that's the reality of it. And the way you make it less scary is by proving it out, by doing the simple things first, like reducing the alert fatigue. That's easy. Providing notebooks to people so that they can click things and do things in a straightforward way, that's pretty easy. The full automation, that's kind of the North Star. That's what we aspire to do. But people get there over time.
And one of our customers had 700 instances of this particular incident solved for them last week. Can you imagine how many human beings would have been doing it otherwise? That's just one thing.

Right.

You know? How many people did it take to build a pyramid? How many decades did that take? Right?

I think you had an announcement this week. I don't think we've talked about that.

No. Yes, so we just announced Incident Insights, which is a free product that lets people plug into PagerDuty initially and, pretty soon, Opsgenie, ServiceNow, et cetera. What you do is give us an API key, read-only, and we will suck your PagerDuty data out. We apply some lightweight ML, unsupervised learning. And in a couple of minutes, we categorize all of your incidents so that you can understand which are the ones that happen most often and are getting resolved really quickly. That's click ops, right? Those alarms shouldn't fire. Which are the ones that involve a lot of people? Those are good candidates for building a notebook. Which are the ones that happen again and again and again? Those are good candidates for automation. I think one of the challenges people have is that they don't actually know what their teams are doing. And so this is intended to provide them that visibility. One of our very first customers was doing the beta test for us on it. He used to tell us he had about 100 tickets, incidents, a week. He brought this tool in and he had 2,100 last week. And it was all, you know, these false alarms. So while he's giving us a...

You think that was eye-opening for him to see that?

Sure. Yeah, and while he's looking at it, he's just filing Jiras to say, oh, change this threshold, cancel this alarm forever, all of that kind of stuff. Before you get to do the fancy work, you've got to clean your room before you get to do anything else, right?
Right, dinner before dessert, basically. Hey, thanks for the insights on this. And again, the name of the new product, by the way, is?

Incident Insights.

Incident Insights. Totally free.

Free. Yeah, it takes a couple of minutes to set up. Go to the website, shoreline.io/insight, and you can be up and running in a couple of minutes.

Outstanding. Again, the company is Shoreline. This is Anurag Gupta. And thank you for being with us. We appreciate it.

Appreciate it.

Glad to have you here on theCUBE. Back with more from AWS re:Invent 22. You're watching theCUBE, the leader in high tech coverage.
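
As a rough illustration of the incident triage Gupta describes for Incident Insights, the three buckets he names (click ops, notebook candidates, automation candidates) could be approximated with simple rules. This is a minimal sketch under stated assumptions, not Shoreline's implementation, which he says uses lightweight unsupervised ML. The `Incident` fields, the `categorize` function, the grouping by alert title, and every threshold here are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Incident:
    title: str                 # alert name, used here as the grouping key (assumption)
    minutes_to_resolve: float  # time from firing to resolution
    responders: int            # number of people who got involved


def categorize(incidents, min_repeats=5, quick_minutes=5.0, many_people=4):
    """Bucket incident types using hypothetical thresholds.

    - fires often and resolves quickly -> click ops, the alarm should not fire
    - involves a lot of people         -> candidate for a notebook
    - happens again and again          -> candidate for automation
    """
    groups = defaultdict(list)
    for inc in incidents:
        groups[inc.title].append(inc)

    report = {}
    for title, items in groups.items():
        count = len(items)
        typical_minutes = median(i.minutes_to_resolve for i in items)
        avg_people = mean(i.responders for i in items)

        if count >= min_repeats and typical_minutes <= quick_minutes:
            label = "click ops: retune or remove the alarm"
        elif avg_people >= many_people:
            label = "notebook candidate"
        elif count >= min_repeats:
            label = "automation candidate"
        else:
            label = "no action suggested"
        report[title] = label
    return report


# Made-up sample data, purely for illustration.
sample = (
    [Incident("cpu_high", 2, 1)] * 6
    + [Incident("db_failover", 90, 8)] * 2
    + [Incident("disk_full", 30, 1)] * 7
)
for title, label in categorize(sample).items():
    print(f"{title}: {label}")
```

The order of the checks is a design choice: a frequent, quickly resolved alert is treated as noise before anything else, which matches the point in the interview that those alarms should be cleaned up before any fancier work begins.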