Full-stack observability is all the rage today. As businesses lean into digital, customer experience becomes ever more important. Why? Well, it's obvious. Fickle consumers can switch brands in the blink of an eye or the click of a mouse. Technology companies have sprung into action, and the observability space is getting pretty crowded, all in an effort to simplify figuring out the root cause of application performance problems without an army of PhDs in lab coats endlessly digging through logs. We see decades-old software companies that have traditionally done monitoring, log analytics, or application performance management stepping up their game. These established players typically have deep feature sets and sometimes purpose-built tools that attack one particular segment of the marketplace, and now they're pivoting through M&A and some organic development, trying to fill gaps in their portfolios. Then you've got all these new entrants coming to market claiming end-to-end visibility across the so-called modern cloud and now edge-native stacks. Meanwhile, cloud players are gaining traction, participating through a combination of native tooling and strong ecosystems. But recent survey research from ETR confirms our thesis that no one company has it all. Here's the thing: customers just want to figure out root causes as quickly and efficiently as possible. It's one thing to observe the stack end to end, but the question is, who is automating the observer? And that's why we're here today. Hello, my name is Dave Vellante, and welcome to this special CUBE presentation where we dig into root cause analysis, and specifically how one company, Zebrium, is using unsupervised machine learning to detect anomalies and pinpoint root causes, delivering it as an automated service. And in this session, we have two deep dives. 
First, we're going to dig into this exciting new field of RCaaS, root cause as a service, with two of the founders and technical experts behind Zebrium. And then we bring in two technical experts from Cisco, an early Zebrium customer that ran a POC with Zebrium's service, automatically identifying the root cause of problems within four very well-established and well-known Cisco product lines, including the Webex client and UCS. I was pretty amazed at the results, and I think you'll be impressed as well. So thanks for being here, let's get started. With me right now is Larry Lancaster, who's a founder and CTO of Zebrium, and he's joined by Rod Bagg, who's a founder and vice president of engineering at the company. Gents, welcome, thanks for coming on. Thanks, good to be here. All right, Rod, talk to me. Talk to me about software downtime, what root cause means, all the buzzwords in your domain, MTTR and SLOs. What do we need to know? Yeah, I mean, it's like you said, it's extremely important to our customers and to most businesses out there to drive uptime and avoid as much downtime as possible. When you think about it, most companies nowadays either sell software as their product, running on the web where you point and click, or their business depends on internal systems to run it. When that is down, it hugely impacts them. If you look way back 20, 30 years ago, software was simple; there wasn't much to it. It was pretty monolithic, and maybe it took a couple of people to maintain it and keep it running. There wasn't really anything complicated about it; it was a single-tenant piece of software. Today's software is so complicated, often running maybe hundreds of services to actually implement what that software is doing. 
So as you point out, enter the observability space and the tools now in use to help monitor that software and make sure that when something goes wrong, you know about it. But there's an interesting stat around the observability space. When you look at observability through the lens of the cost of downtime, it's really interesting. Observability tools are about a $20 billion market, but the cost of downtime, even with those tools in place, is still hundreds of billions of dollars. So you're not taking much of a bite out of the real problem. You have to solve root cause, and you have to get to it fast. It's all great to know that something went wrong, but you've got to know why. And it's our contention that when you take a look at the observability space, you have metrics, and that's a great tool. There are lots of great tools out there around metrics monitoring that are going to tell you when something went wrong; very rarely will they tell you why. Similarly for tracing: it's going to point you to where the issue is. It'll take you through the stack and probably pinpoint where something is happening or where something is running slow. So that's great. But again, the root cause of why it's happening is going to be buried in log files. And I can expand on that a little bit more, but when you're a software developer writing your software, those log files are a wealth of information. They're a trail of breadcrumbs littered with facts about how the software is behaving and why it's doing what it's doing, or why it went wrong. It's that trail that gets you to the root cause very fast. That's our contention: these software systems are so complex nowadays, and the root cause is lying in those logs. So how do you get there fast? We would contend that you'd better automate that or you're just doomed to failure. And that's where we come in. 
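Rod's claim is that the root cause usually hides as a handful of unusual lines inside a flood of routine logging. Zebrium's actual algorithm is proprietary and not described here, but a minimal toy sketch of the general idea of unsupervised log anomaly detection (collapse log lines into event types by masking variable fields, then flag event types that were essentially absent during a healthy baseline window) might look like this; the log lines and thresholds are made up for illustration:

```python
import re
from collections import Counter

def template(line: str) -> str:
    """Reduce a log line to its event type by masking variable parts
    (hex ids, then numbers) so repeated events collapse together."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+(\.\d+)*", "<NUM>", line)
    return line.strip()

def rare_events(baseline_logs, incident_logs, max_baseline_count=0):
    """Return incident log lines whose event type was (almost) never
    seen during the healthy baseline window."""
    baseline_counts = Counter(template(l) for l in baseline_logs)
    flagged = []
    for line in incident_logs:
        if baseline_counts[template(line)] <= max_baseline_count:
            flagged.append(line)
    return flagged

# Hypothetical logs: a healthy window, then a window around an incident.
baseline = [
    "GET /cart 200 12ms",
    "GET /cart 200 9ms",
    "order 1041 accepted",
    "order 1042 accepted",
]
incident = [
    "GET /cart 200 11ms",
    "order 1043 accepted",
    "socket exception: connection reset on eth0",
]
print(rare_events(baseline, incident))  # only the never-before-seen line
```

Real systems have to cope with millions of lines, noisy templates, and correlating rare events across services at the same moment in time, which is where the hard part lives; this sketch only illustrates the "needle in a haystack" framing from the conversation.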
Thank you, Rod. You know, it's interesting. You talk about the $20 billion market. There's an analogy with security, right? We spend $80 billion to $100 billion a year on securing our infrastructure, and yet we lose probably closer to a trillion dollars a year in breaches. There's a similar analogy here: $20 billion spent on observability against downtime impacts that could be 5x that or more. Okay, let's go to Larry. Tell us a little bit more about Zebrium. I'm always interested in asking a founder why you started the company. Rod touched on that a little bit. You guys have invented this concept of RCaaS. What does it mean? What problems does it solve, and how does it solve them? Let's get into it. Yeah, hey, thanks, Dave. So when you said, who's automating the observer, that's a great way to think about it, because what observability really means is that it's a property of a system that lets you see into it. You can observe the internal state, and that makes it easier to troubleshoot, right? But the problem is, if it's too complicated, you just push the bottleneck up to your eyeball. There's only so much a person can filter through manually. So I love the way you put that: automating the observer. Now, of course it means you reduce your MTTR, you meet your service level objectives, you improve customer experience; that's all true. But it's important to step back and realize that we have cracked a real nut here. People have been trying to figure out how to automate this part of the troubleshooting experience, this human part of finding the root cause indicators, for a long time. And until Zebrium came along, I would argue no one had really done it right. I think it's also important, as we step back, that we can probably look forward five to 10 years and say, everyone's going to look back and ask, how did we do all this manually? 
You're going to see that the last mile of observability and troubleshooting gets automated everywhere, because otherwise people just aren't going to be able to scale their business. One more thing that's important to point out: it's one thing to have the technology, but we've learned we need to deliver it right where people are today. You can't just expect people to dive into a new tool. So if you look at Zebrium, you'll put us on your dashboard, and we don't care what kind of dashboard it is. It could be Datadog, New Relic, Elastic, Dynatrace, Grafana, AppDynamics, ScienceLogic; we don't care. They're all our friends. We're more interested in getting to that root cause than trying to fight these incumbents. Yeah, so interesting. Another analogy I think about: you talked about automation, and we're going to look back and say we're never going to do this manually again. It's like provisioning LUNs. Nobody provisions LUNs anymore; it's all automated. That's funny. So, Larry, stay with you. The skeptic in me says this sounds amazing, but it might be too good to be true. Tell us how it works. Yeah, so it's interesting. Cisco came along and they were equally skeptical. So what they did was take a couple of months and do a very detailed study. They got together 192 incidents across four product lines where they knew the root cause was in the logs, and they knew what that root cause was, because they had had their best engineers work on those cases and take detailed notes of the incidents. Then they ran that data through the Zebrium software. What they found was that in more than 95% of those incidents, Zebrium surfaced the correct root cause indicators at the correct time. That blew us away. When we saw that kind of evidence, Dave, I have to tell you, everyone was jumping up and down. 
It was like the Apollo command center when they finally touched down on the moon. So it's a really exciting point in time to be at the company, seeing everything finally being proven out according to this vision. I'm going to tell you one more story, which is actually one of my favorites, because we got a chance to work with Seagate Lyve Cloud. They're a hyper-modern SaaS business, an S3 competitor; Zoom has their files stored on Lyve Cloud, to give you a sense of who they are. Essentially, what happened was they were in alpha, in early access, and they had an outage, and it was pretty bad. It went on for longer than a day before they were completely restored. Fortunately for them, it was early access, so no one was expecting uptime service level objectives and so on, but they were scared, because they realized that if something like this happened in production, they were screwed. So they did some research, they saw Zebrium, they went into a staging environment and recreated the exact event that they had had. And what they saw was that Zebrium immediately popped up a root cause report telling them exactly the root cause that had taken them over a day to find. These are the kinds of stories that let us know we're onto something transformational. Yeah, that's great. I mean, you guys were jumping up and down. I'm sure we're going to hear from Cisco later; I bet they were jumping up and down too, because they didn't have to do all that heavy lifting anymore. So Rod, Larry was sort of implying, actually you guys both talked about it, that you're tool-agnostic. So how does one actually use the service? How do I deploy it? Yeah, so let me step back. When we talk about logs, all these breadcrumbs being in logs and everything else, they are a great wealth of information, but people hate dealing with them. They hate having to go in and figure out which log to look at. 
In fact, we've heard from several of our customers what it was like prior to using Zebrium. When they have some issue and they know something's wrong, something on their dashboard has told them so; maybe a metric has taken a blip, or something's happened that tells them there's a problem. We've heard from them that it can take a number of hours just to get to the right set of logs: figuring out, across these hundreds of services, where the logs are, getting to them, maybe searching in a log manager. Just getting into the right context can take hours. So that's obviously the problem we saw, but we don't want them just looking at logs; we don't want to put them back into the thing they don't like doing. So we put it up on the dashboard. If something is going wrong with your metrics and that's the indicator, or maybe it's something with tracing that you're digging through now that you know something's wrong, we will be right on that same dashboard. We're deployed as a SaaS service. You send us your logs, you click on one of our integrations, and we integrate with all the tools that Larry talked about. When we detect anything that constitutes a root cause report, it will show up on your dashboard in the same timeline as those blips in your metrics. So when you see something going wrong and you know there's an issue, take a look at the portion of your dashboard that is us, and we're going to tell you why; we're going to get you to the why of what went wrong. There's no other work to be done. You can also click through to us, so that you end up in our portal if you want to do some more digging around or get some context, but it's rare that you ever need to do that. The answer should be right there on your dashboard. And that's how we expect people to use it. 
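The workflow Rod describes starts with shipping logs to the service. As a rough sketch of what any log-shipping integration has to do at minimum, here is a toy batching function; the field names are illustrative placeholders, not Zebrium's actual ingest schema, and the actual transport (HTTP endpoint, auth token, batching policy) would come from the vendor's integration docs:

```python
import json

def build_log_batch(service, host, lines):
    """Package raw log lines into a JSON payload for a log-ingest
    endpoint. Field names here are hypothetical, chosen only to show
    the shape of the problem: each line needs an ordering key and
    enough metadata (service, host) to correlate across services."""
    return json.dumps({
        "service": service,
        "host": host,
        "logs": [{"seq": i, "message": m} for i, m in enumerate(lines)],
    })

payload = build_log_batch("cart", "node-1", ["socket exception", "retrying"])
print(payload)
```

In practice you would rarely write this by hand; the point of the one-click integrations discussed above is that an existing agent (Fluentd, a Datadog forwarder, etc.) already does this batching for you.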
We don't want them digging in logs and going through things. We want it to be right in their workflow. Great, thank you, Larry. So Rod, we talked about Cisco, and we're going to hear more from them and from Seagate in a moment. I would think this is like a perfect solution for a SaaS provider, or anybody doing AIOps. Do you have some examples of those types of firms leaning into this? Yeah, we've got many of them, but a couple that I'll touch on. We have an actual AIOps company that was looking for some complementary technology, and they decided to just put us through our paces by having one of their own SREs sign up for our service in our SaaS environment, send the logs from their system to us, and just see how we did. We ended up talking to this SRE about a week after he had signed up and started sending us logs. He was hemming and hawing, saying that he was busy, like every SRE is, and that he hadn't had a chance to really do much with us yet. And as we're having this conversation on the phone, he tells us, yeah, I've been busy because we had this terrible outage like five days ago. We said, did you actually look in the Zebrium portal? And he goes, you know what? I didn't even think to do it yet; I've just been so busy and frazzled. Now, we have an integration with that company, but he hadn't put that integration in, so it wasn't in his dashboard yet, though it was certainly in ours. So he went there and looked at the time range of when he had had this incident, and right at the very top of the page in our portal was the incident with the root cause, and he was flabbergasted. It literally would have saved them hours and hours. They had this issue going on for over 24 hours, and we had the answer right there in five minutes. It was crazy. And we get that kind of story again and again; it's just like the Seagate one. 
If you use us and you have a problem, we're going to detect it, and you're going to hear from Cisco how successful we are at detecting things. The answer will be there when you have a problem. Among SaaS companies, one of our customers is Archera. They do cost optimization for cloud properties: AWS optimization, Google Cloud, and so on. They use our software, and they obviously have a lot of interaction with these cloud vendors and their APIs; in order to figure out your costs at AWS, they're using all those APIs. It turned out they had some issue where their services were breaking, and we had the root cause report right on the screen, again within five minutes, pointing to an API problem: Google had changed one of their APIs, and Archera was not aware of it. So their stuff was breaking because of a change downstream that we had caught. And I'll tell you one last one, because it's somewhat related to one of these cloud vendors. A big cloud vendor had an outage a couple of months ago. A lot of our customers will set up shared Slack channels with us, where we're seeing their incidents as well as they are; we get a little Slack representation of the incident, or the root cause that we detected for them, in a shared channel. So we could see it happening when that AWS outage occurred. We could see our customers getting impacted by the AWS outage, and the root cause of what was going on there that was impacting our customers was showing up in our incidents. Now, we obviously didn't have the very root cause of what was going on inside AWS per se, but we were getting to the root cause of why our customers' applications were failing, and that was because of the issues going on at AWS. 
Very interesting. I mean, I think one of your biggest challenges is going to be getting people's attention, because these SREs are so busy; their hair's on fire. Right. Yeah. I tell you, if you get their attention, they love it. This AIOps company, I didn't even tell you the punchline: they had this incident that occurred, that we found, and quite literally the next week they ended up signing up as a paid customer. That's great. And, Larry, I'll give you the last word. Rod was talking about API changes, and there's still a lot of scripts out there. You guys, if I understand it correctly, run both as a service in the cloud and on-prem, which is important because there's a lot of sensitive information in logs that people don't want leaving their environment. That's right. Absolutely. But close it out here. Yeah, that's right. You can run it on-prem, just like we run it in our cloud; you can run it in your cloud or on your own infrastructure. That's all true. I think the one hurdle we have left as a company is getting the word out and getting people to believe that this is actually possible, and to try it for themselves. If you don't believe it, do a POC; try it yourself. People have become so jaded by the lack of real innovation in the software industry over the last 10 years that it's hard to get people to try, but guys, you've got to give it a shot. I'm telling you right now, it works. And you'll hear more about that from one of our customers in a minute. All right, guys, thanks so much. Great story. I really appreciate your sharing. Thank you. I appreciate the time. Okay. 
In a moment, we're going to hear from Cisco, who is the customer in this case example, a company that has quite an impressive suite of observability tooling and has done a pretty compelling proof of concept with Zebrium, using real data on some Cisco products you've heard of, like Webex. So stay tuned and learn about how you can really take advantage of this new technology called root cause as a service. You're watching theCUBE, the leader in enterprise and emerging tech coverage. Hello. Zebrium Root Cause as a Service helps solve the age-old problem of finding the root cause when a software or infrastructure failure occurs. A recent third-party study showed it can identify the best root cause indicators in the logs in over 95% of incidents. And the best part is, you see the results directly on your existing monitoring dashboards. Let's see it in action. I've installed an online shopping app on a Kubernetes cluster. It's being monitored by Datadog, and logs are being sent to Zebrium. The dashboard shows everything is healthy. I'm now going to simulate a real-life failure by running a chaos experiment that corrupts the network in one of the pods in the cluster. After a moment, the app stops working, and the dashboard shows network traffic and CPU have dropped to zero. Soon after, the Zebrium app widget detects something. A quick aside: there were no rules in place to detect this. The NLP summary here provides a nice clue about what happened. Clicking it shows a word cloud with "pod network corruption," the name of the experiment I ran, right at the top. Let's view the full report. Millions of log lines were generated while the failure occurred, but Zebrium picked out just 46, from seven different services. The 46 lines tell the story. The root cause is the chaos runner starting and kicking off a network corruption experiment, which caused a qlen misconfiguration on eth0. Then we see the symptoms. 
We see how it impacted the app: a timeout in the order service, a 500 error in the front-end service, and a socket exception in the cart service. All of this was picked out automatically, with no manual training or rules. With Zebrium, you never need to dig through logs again. Getting started with Zebrium in your monitoring tool is easy. Book a demo or start a free trial at Zebrium.com. Okay, we're back with Atri Basu, who is Cisco's resident philosopher, who also holds a master's in computer science; we're going to have to unpack that a little bit. And Necati Cehreli, who's a technical lead at Cisco. Welcome, guys, thanks for coming on theCUBE. Happy to be here. Excellent. All right, let's get into it. We want you to explain how Cisco validated the Zebrium technology and the proof points you have that it actually works as advertised. So first, Atri, tell us about Cisco TAC. What does Cisco TAC do? TAC, which is an acronym for Technical Assistance Center, is Cisco's support arm, the support organization. At the risk of sounding like I'm spouting a corporate line, the easiest way to summarize what TAC does is: provide world-class support to Cisco customers. What that means is we have about 8,000 engineers worldwide, and any of our Cisco customers can either go on our web portal or call us to open a support request. We get about 2.2 million of these support requests a year. In these support requests, the customer will describe something that they need done, some networking goal that they want to accomplish, and then it's TAC's job to make sure that goal gets accomplished. It could be that they're having trouble with an existing network solution and it's not working as expected, or they're integrating with a new solution, they're upgrading devices, or maybe there was a hardware failure. 
Anything to do with networking support and the customer's network goals: if they open up a case requesting help, then TAC's job is to respond and make sure the customer's questions and requirements are met. About 44% of these support requests are relatively trivial and can be solved within a call or within a day. But the rest of the TAC cases really involve getting into the network device and looking at logs. It's a very technical job; you need to be conversant with network solutions, their designs, protocols, et cetera. Wow, so 56% non-trivial. So I would imagine you spend a lot of time digging through logs. Is that true? Can you quantify that? Like, every month, how much time do you spend digging through logs, and is that a pain point? Yeah, it's interesting you ask, because when we started on this journey to augment our support engineers' workflow with the Zebrium solution, one of the things we did was go out and ask our engineers what their experience was like doing log analysis. The anecdotal evidence was that, on average, an engineer will spend three out of their eight hours reviewing logs, either online or offline. What that means is either they're live with the customer on a Webex, going over logs, network state information, et cetera, or they're doing it offline, where the customer sends the logs attached to a service request, and they review them and try to figure out what's going on and provide the customer with information. So it's a very large chunk of our day. I said 8,000-plus engineers, and at three hours a day, that's 24,000 man-hours a day spent on log analysis. Now, the struggle with analyzing logs is that, out of necessity, logs are very terse. They try to pack a lot of information into very little space, for performance reasons, storage reasons, et cetera, but the side effect is that they're very esoteric. They're hard to read. 
If you're not conversant, if you're not the developer who wrote these logs, or you aren't doing code deep dives and looking at where the logs get printed, it may not be obvious, even after a while, what a log line means or how it correlates to whatever problem you're troubleshooting. So it requires tenure; it requires, like I was saying before, a lot of knowledge about the protocol and about what's expected, because when you're doing log analysis, what you're really looking for is a needle in a haystack. You're looking for that one anomalous event, that single thing that tells you this shouldn't have happened and this was a problem, right? Now, doing that kind of anomaly detection requires you to know what normal is. It requires you to know what the baseline is, and that requires a very in-depth understanding of the state changes for that network solution or product. So it takes time, tenure, and expertise to do well, and it takes a lot of time even when you have that expertise. Wow. So thank you, Atri. That's almost two days a week for a technical resource that's not inexpensive. So, Necati, what was Cisco looking for to help with this, and how did you stumble upon Zebrium? Yeah, so we have our internal automation system, which has been running for more than a decade now. What happens is, when a customer attaches a log bundle or a diagnostic bundle to the service request, we take it from the SR, we analyze it, and we present some kind of information, alerts, tables, graphs, to the engineer, so they can troubleshoot that particular issue. It's an incredible system, but it comes with its own challenges around maintenance, keeping it up to date and relevant with Cisco's new products, new versions of a product, new defects, new issues, and all kinds of things. 
And what I mean by those challenges is: let's say Cisco comes up with a product today. We need to come together with those engineers, figure out how the bundle works and how it's structured, select the individual logs that are relevant, and then start modeling those logs and extracting values out of them using parsers or regexes, to get to a level where we can consume the logs. Then people start writing rules on top of that abstraction, so they can say: in this log I'm seeing this value, together with this other value in another log, so maybe I'm hitting this particular defect. That's how it works. And if you look at it, that abstraction can fail in the next release, when the developer decides to change the log line against which you wrote that regex. Or we can come up with a new version that completely changes the services or processes, and whatever you wrote needs to be rewritten for the new service. We see that a lot with products like Webex, where you have a very short release cycle; things can change the next week with a new release, so whatever you wrote, especially for that abstraction and those rules, may no longer be relevant in the new release. That said, we have an incredible rule-creation process and governance process around it, which starts with maybe a defect and takes it to the point where we have automation in place. But it really ties up human bandwidth. Our engineers are busy working on customer-facing issues daily, and sometimes creating these rules or parsers is not their biggest priority, so they can be delayed a bit. So we have this delay between a new issue being identified and the point where we have the automation to detect it the next time a customer faces it. 
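The brittleness described here, a regex rule that silently stops firing when a developer rewords a log line, is easy to demonstrate. The log format and defect ID below are entirely made up for illustration; they are not real Cisco logs or defects:

```python
import re

# A hand-written, TAC-style rule: "if this pattern appears in the log,
# we're probably hitting defect CSCxx12345" (a hypothetical defect id).
RULE = re.compile(r"tunnel (\d+) down: reason=(\w+)")

def match_rule(line):
    """Apply the rule to one log line; return extracted fields or None."""
    m = RULE.search(line)
    return {"tunnel": int(m.group(1)), "reason": m.group(2)} if m else None

# The release the rule was written against:
old_release = "tunnel 7 down: reason=keepalive_timeout"
# The next release logs the same event with different wording:
new_release = "tunnel id=7 state=down (keepalive timeout)"

print(match_rule(old_release))  # fields extracted as designed
print(match_rule(new_release))  # None: the rule silently stops matching
```

Nothing errors out when the format changes; the rule just stops firing, which is exactly the maintenance burden (and detection gap) that motivated looking for an unsupervised, product-agnostic approach.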
So with all these questions and challenges in mind, we started looking into ways of automating these automations: the things we were doing manually, how we could go a bit further and automate them. We had a couple of requirements in mind. One was that it has to be product-agnostic: if Cisco comes up with a product tomorrow, I should be able to take its logs, without writing complex regexes, parsers, or whatever, and deploy them into the system so it can ingest our logs and make sense of them. We wanted the platform to be unsupervised, so that none of the engineers need to create rules, label logs as good or bad, or train the system, which requires a lot of computational power. The other most important thing for us was that we wanted it to not be noisy at all, because what happens with noise, when your level of false positives is really high, is that your engineers start ignoring the good things buried in that noise; they start assuming the next alert won't be relevant either. So we wanted something with a lot less noise. And ultimately, we wanted this new platform or framework to be easily adaptable to our existing workflows. So that's where we started. We started looking, first of all internally at whether we could build this thing ourselves, and also started researching, and we came upon Zebrium. Actually, we came upon a presentation by Larry, one of the co-founders of Zebrium, where he clearly explained why this is different and how it works, and it immediately clicked, and we said, okay, this is exactly what we were looking for. We dove deeper; we checked the blog posts, where the Zebrium guys really explain everything very clearly; they're really open about it. And most importantly, there is a button in their system. What usually happens with AI/ML vendors is they have a button where you fill in your details, and the sales guys call you back and explain the system. 
Here, it was: this is our trial system, we believe in it, you can just sign up and try it yourself. And that's what we did. We took one of our Cisco Live DNA Center wireless platforms and started streaming logs out of it, and then we synthetically introduced errors; we broke things. And we realized that Zebrium was catching the errors perfectly. On top of that, it was really quiet unless you were really breaking something. The other thing we realized during that first trial was that Zebrium was actually bringing a lot of context on top of the logs during those failures. We worked with a couple of technical leaders, and they said, okay, if this failure happens, I'm expecting this individual log to be there. And we found with Zebrium that, apart from that individual log, there were a lot of other things that gave a bit more context around the root cause, which was great. That's why we wanted to take it to the next level. Yeah. Okay. So, a couple of things to unpack there. I mean, you have the dartboard behind you, which is kind of interesting, because a lot of times it's like throwing darts at the board to try to figure this stuff out. But to your other point, Cisco actually has some pretty rich observability tools with AppDynamics, and you've made acquisitions like ThousandEyes. And like you said, I'm presuming you've got to eat your own dog food, or drink your own champagne, and so you've got to be tool-agnostic. When I first heard about Zebrium, I was kind of skeptical; I've heard this before. You're telling me all I need is plain text and a timestamp and you've got my problem solved? And I understand that you guys said, okay, let's run a POC. Let's see if we can cut that from, say, two days a week down to one day a week; in other words, let's see if we can automate 50% of the root cause analysis. So you funded a POC. How did you test it? 
You put synthetic errors and problems in there, but how did you test that it actually works, Najati? Yeah, so we wanted to take it to the next level, meaning we wanted to back-test it with existing SRs, service requests. We chose four different products from four different verticals: data center, security, collaboration and enterprise networking. And we found SRs where the engineer put some kind of log in the resolution summary. So they closed the case, and in the summary of the SR they put: I identified these log lines, and they led me to the root cause. We ingested those log bundles and tried to see if Zebrium could surface that exact same log line in its analysis. So we initially did it ourselves, and after 50 tests or so we were really happy with the results. In almost all of them we saw the log line we were looking for, but that was not enough. We brought it, of course, to our management, and they said, okay, let's try this with real users, because the log being there is one thing, but the engineer reaching that log is another thing. So we wanted to make sure that when we put it in front of our users, our engineers, they could actually come to that log themselves. We know this platform, so we can make searches and find whatever we are looking for, but we wanted them to do that. So we extended our pilot to some selected engineers. They tested it with their own SRs, and also did some back-testing with SRs which were closed in the past or recently, and with a sample set of, I guess, close to 200 SRs, we found that the majority of the time, almost 95% of the time, the engineer could find the log they were looking for in Zebrium's analysis. Yeah, okay. So you were looking for 50%, you got to 95%, and my understanding is you actually did it with four pretty well-known Cisco products: WebEx client, DNA Center, Identity Services Engine (ISE), and then UCS, Unified Computing System.
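The back-test described above — checking, for each closed SR, whether the log line the engineer cited in the resolution summary shows up in the tool's analysis, then computing a hit rate across the sample — can be sketched as a simple harness. This is a hypothetical illustration; `backtest` and `fake_analyze` are made-up stand-ins for the real analysis service, not anything from Cisco's or Zebrium's actual tooling.

```python
def backtest(cases, analyze):
    """For each closed SR, check whether the log line the engineer
    cited in the resolution summary appears in the tool's report.
    Returns the fraction of SRs where the cited line was surfaced."""
    hits = 0
    for bundle, cited_line in cases:
        report = analyze(bundle)  # lines the tool surfaced
        if any(cited_line in line for line in report):
            hits += 1
    return hits / len(cases)

def fake_analyze(bundle):
    """Hypothetical stand-in for the real analysis service:
    just surfaces lines containing 'error'."""
    return [l for l in bundle if "error" in l.lower()]

cases = [
    # (log bundle, root-cause line cited in the SR summary)
    (["boot ok", "ERROR: link flap on eth0", "shutdown"],
     "link flap on eth0"),
    (["boot ok", "all good"], "kernel panic"),  # a miss
]
print(backtest(cases, fake_analyze))  # 0.5 hit rate
```

With roughly 200 real SRs in place of `cases` and the real service in place of `fake_analyze`, a hit rate of about 0.95 would correspond to the 95% figure quoted in the interview.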
So you used actual real data, and that was kind of your proof point, Atri. So that sounds pretty impressive. Have you put this into production now, and what have you found? Well, yes, we've launched this with the four products that you mentioned. We're providing our TAC engineers with the ability, whenever a support bundle for that product gets attached to the support request, to process it using SENSE and then provide that SENSE analysis to the TAC engineer for their review. So are you seeing the results in production? I mean, are you actually able to reclaim that time that people are spending? I mean, it was literally almost two days a week, down to a part of a day. Is that what you're seeing in production, and what are you able to do with that extra time? Are people getting their weekends back? Are you putting them on more strategic tasks? How are you handling that? Yeah, so what we're seeing is, and I can tell you from my own personal experience using this tool, that troubleshooting any one of these cases doesn't take me more than 15 to 20 minutes to go through the Zebrium report, and I know within that time either what the root cause is, or I know that Zebrium doesn't have the information that I need to solve this particular case. So we've definitely seen, well, it's been very hard to measure exactly how much time we've saved per engineer, right? Again, anecdotally, what we've heard from our users is that out of those three hours that they were spending per day, we're definitely able to reclaim at least one of those hours. And even more importantly, of the feedback that we've gotten, I think one statement that really summarizes how Zebrium's impacted our workflow was from one of our users, and they said, well, until you provided us with this tool, log analysis was a very black and white affair, but now it's become really colorful.
And I mean, if you think about it, log analysis is indeed black or white. You're looking at it on a terminal screen where the background is black and the text is white, or you're looking at it as text where the background is white and the text is black. But what they're really trying to say is that there are hardly any visual cues that help you navigate these logs, which are so esoteric, so dense, et cetera. What Zebrium does is provide a lot of color and context to the whole process. So now, using their word cloud, using their interactive histogram, using the summaries of every incident, you're very quickly able to summarize what might be happening and what you need to look into, like what are the important aspects of this particular log bundle that might be relevant to you. A really great use case that kind of encapsulates all of this came very early on in our experiment. There was a support request that had been escalated to the business unit, the development team, and the TAC engineer really had an intuition about what was going wrong, because of their experience, because of the symptoms they'd seen. They kind of had an idea, but they weren't able to convince the development team, because they weren't able to find any evidence to back up what they thought was happening. And it was entirely happenstance that I picked up that case and did an analysis using Zebrium. Then I sat down with the TAC engineer, and very quickly, within 15 minutes, we were able to get down to the exact sequence of events, evidence of what the TAC engineer thought was the root cause. And then we were able to share that evidence with our business unit and redirect their resources so that we could chase down what the problem was. And that really shows you how that color and context helps in log analysis. Interesting.
We do a fair amount of work in the Cube in the RPA space, robotic process automation, and the narrative in the press when RPA first started taking off was, oh, it's machines replacing humans and we're going to lose jobs. What actually happened was people were just eliminating mundane tasks, and the employees were actually very happy about it. But my question to you is: was there ever a reticence amongst your team, like, oh wow, I'm going to lose my job if the machine's going to replace me? Or have you found that people were excited about this? What's been the reaction amongst the team? Well, I think every automation and AI project has that immediate gut reaction of, you're automating away our jobs, and so forth. Initially there's a little bit of reticence, but like you said, once you start using the tool you realize that it's not your job that's getting automated away; it's just that your job's becoming a little easier to do, and it's faster and more efficient, and you're able to get more done in less time. That's really what we're trying to accomplish here. At the end of the day, Zebrium will identify these incidents, do the correlation, et cetera, but if you don't understand what you're reading, then that information is useless to you. So you need the human, you need the network expert, to actually look at these incidents. But what we are able to strip away, or get rid of, is all of the fat that's involved in our process: having to download the bundle, which can be many gigabytes in size, and now that we're working from home with the pandemic and everything, you're pulling massive amounts of logs from the corporate network onto your local device, and that takes time; then opening it up, loading it in a text editor, that takes time. All of these things we're trying to get rid of, and instead we're trying to make it easier and quicker for you to find what you're looking for.
So it's like you said, you take away the mundanity, you take away the difficulties and the slog, but you don't really take away the work. The work still needs to be done. Great, guys, thanks so much. Appreciate you sharing your story, it's quite fascinating. Really, thank you for coming on. Thanks for having us. You're very welcome. Okay, in a moment I'll be back to wrap up with some final thoughts. This is Dave Vellante and you're watching The Cube. So today we talked about the need not only to gain end-to-end visibility, but why there's a need to automate the identification of root cause problems, and how doing so with modern technology and machine intelligence can dramatically speed up the process and identify the vast majority of issues right out of the box, if you will. And this technology can work with log bundles in batches or with real-time data. As long as there's plain text and a timestamp, it seems Zebrium's technology will get you the outcome of automating root cause analysis with very high degrees of accuracy. Zebrium is available on-prem or in the cloud. Now, the on-prem option is important for some companies, because there's some really sensitive data inside logs that, for compliance and governance reasons, companies have to keep inside their four walls. Now, Zebrium has a free trial, of course, they'd better, right? So check it out at Zebrium.com; you can book a live demo and sign up for a free trial. Thanks for watching this special presentation on The Cube, the leader in enterprise and emerging tech coverage. I'm Dave Vellante and we'll see you next time.