Okay, we're back with Atri Basu, who is Cisco's resident philosopher, who also holds a master's in computer science. We're going to have to unpack that a little bit. And Necati Cehreli, who's a technical lead at Cisco. Welcome guys, thanks for coming on theCUBE. Happy to be here. Excellent. All right, let's get into it. We want you to explain how Cisco validated the Zebrium technology and the proof points you have that it actually works as advertised. So first, Atri, tell us about Cisco TAC. What does Cisco TAC do? So TAC is an acronym for Technical Assistance Center; it's Cisco's support arm, the support organization. At the risk of sounding like I'm spouting a corporate line, the easiest way to summarize what TAC does is provide world-class support to Cisco customers. What that means is we have about 8,000 engineers worldwide, and any of our Cisco customers can either go on our web portal or call us to open a support request. We get about 2.2 million of these support requests a year. In these support requests, essentially the customer will describe something that they need done, some networking goal that they want to accomplish, and then it's TAC's job to make sure that goal gets accomplished. It could be that they're having trouble with an existing network solution and it's not working as expected, or they're integrating with a new solution, they're upgrading devices, maybe there was a hardware failure. Anything really to do with networking support and the customer's network goals: if they open up a case requesting help, then TAC's job is to respond and make sure the customer's questions and requirements are met. About 44% of these support requests are trivial and can be solved within a call or within a day. But the rest of TAC cases really involve getting into the network device and looking at logs. It's a very technical job.
You need to be conversant with network solutions, their designs, protocols, et cetera. Wow, so 56% non-trivial. So I would imagine you spend a lot of time digging through logs, is that true? Can you quantify that? Like every month, how much time do you spend digging through logs, and is that a pain point? Yeah, it's interesting you ask that, because when we started on this journey to augment our support engineers' workflow with the Zebrium solution, one of the things we did was go out and ask our engineers what their experience was like doing log analysis, and the anecdotal evidence was that on average an engineer will spend three out of their eight hours reviewing logs, either online or offline. What that means is either they're live with the customer on a Webex, going over logs, network state information, et cetera, or they're doing it offline, where the customer sends them the logs attached to a service request and they review it, try to figure out what's going on, and provide the customer with information. So it's a very large chunk of our day. I said 8,000-plus engineers, and at three hours a day that's 24,000 man-hours a day spent on log analysis. Now, the struggle with analyzing logs is that, out of necessity, logs are very terse. They try to pack a lot of information into very little space. This is for performance reasons, storage reasons, et cetera, but the side effect is that they're very esoteric. They're hard to read if you're not conversant, if you're not the developer who wrote these logs or you aren't doing code deep dives to look at where a log is getting printed. It may not be immediately obvious, or even obvious after a little while, what a log line means or how it correlates to whatever problem you're troubleshooting.
So it requires tenure. Like I was saying before, it requires a lot of knowledge about the protocol and what's expected, because when you're doing log analysis, what you're really looking for is a needle in a haystack. You're looking for that one anomalous event, that single thing that tells you this shouldn't have happened and this was a problem. Doing that kind of anomaly detection requires you to know what is normal. It requires you to know what the baseline is, and that requires a very in-depth understanding of the state changes for that network solution or product. So it takes time, tenure, and expertise to do well, and it takes a lot of time even when you have that kind of expertise. Wow, thank you, Atri. And Necati, that's almost two days a week for a technical resource that's not inexpensive. So what was Cisco looking for to help with this, and how did you stumble upon Zebrium? Yeah, so we have our internal automation system, which has been running for more than a decade now. What happens is, when a customer attaches a log bundle or a diagnostic bundle to the service request, we take that from the SR, we analyze it, and we present some kind of information, whether alerts, tables, or graphs, to the engineer so they can troubleshoot that particular issue. It's an incredible system, but it comes with its own challenges around maintenance, keeping it up to date and relevant with Cisco's new products, new versions of a product, new defects, new issues, and all kinds of things. What I mean by those challenges is, let's say Cisco comes out with a product today. We need to come together with those engineers, figure out how the bundle works and how it's structured, select the individual logs that are relevant, and then start modeling those logs and extracting values from them using parsers or regexes, to get to a level where we can consume the logs.
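The "needle in a haystack" framing above is essentially baseline-driven anomaly detection. As a toy illustration only (this is not Zebrium's actual algorithm, and all log lines and names here are hypothetical), a sketch of flagging log templates that never appeared in a known-good baseline:

```python
import re
from collections import Counter

def template_of(line: str) -> str:
    """Collapse variable fields (IPs, hex values, numbers) so log
    lines with the same shape map to the same template."""
    line = re.sub(r"\b\d+\.\d+\.\d+\.\d+\b", "<ip>", line)
    line = re.sub(r"0x[0-9a-fA-F]+", "<hex>", line)
    line = re.sub(r"\d+", "<num>", line)
    return line

def rare_events(baseline_lines, new_lines):
    """Return lines from new_lines whose template never occurred in
    the baseline -- the anomalous 'needles' worth a closer look."""
    seen = Counter(template_of(l) for l in baseline_lines)
    return [l for l in new_lines if seen[template_of(l)] == 0]
```

Knowing what "normal" looks like, the engineer's hard-won baseline, is exactly what the counting step encodes here.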
And then people start writing rules on top of that abstraction. So people can say, if I'm seeing this value in this log together with this other value in another log, maybe I'm hitting this particular defect. That's how it works. And if you look at it, the abstraction can fail in the next release, when development or the engineer decides to change the log line you wrote that regex for. Or we come up with a new version where we completely change the services or processes, and whatever you wrote needs to be rewritten for the new service. We see it a lot with products like Webex, for instance, where you have a very short release cycle and things can change next week with a new release. So whatever you're writing, especially that abstraction and those rules, may not be relevant for the new release. That being said, we have an incredible rule-creation and governance process around it, which starts with maybe a defect and takes it to the level where we have automation in place. But if you look at it, this really ties up human bandwidth. Our engineers are really busy working customer-facing issues daily, and sometimes creating these rules or parsers is not their biggest priority, so they can be delayed a bit. So we have this delay between a new issue being identified and the point where we have the automation to detect it the next time a customer faces it. With all these questions and challenges in mind, we started looking into ways we could automate these automations: the things we're doing manually, how we can take them a bit further and automate them. We had a couple of things in mind that we were looking for, one of them being that this has to be product-agnostic.
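As a hedged sketch of the parse-then-rule pattern described above (the log format and defect here are illustrative, not a real rule from Cisco's system):

```python
import re

# Hypothetical parser for one specific log line. If a later release
# rewords this line, the regex silently stops matching and every
# rule built on top of it goes stale.
ADJ_CHANGE = re.compile(
    r"%BGP-5-ADJCHANGE: neighbor (?P<peer>\S+) Down (?P<reason>.+)"
)

def parse(line):
    """Abstract a raw log line into named fields, or None."""
    m = ADJ_CHANGE.search(line)
    return m.groupdict() if m else None

def rule_hold_timer_issue(parsed_lines):
    """Hand-written rule on top of the abstraction: 'if I see this
    value in this log, maybe I'm hitting this particular defect'."""
    return any(
        p is not None and "hold time expired" in p["reason"].lower()
        for p in parsed_lines
    )
```

A line the regex matches trips the rule; a reworded line from a newer release parses to None, and the rule silently misses the same defect, which is exactly the maintenance burden being described.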
If Cisco comes up with a product tomorrow, I should be able to take its logs, without writing complex regexes, parsers, whatever, and deploy them into the system so it can embrace our logs and make sense of them. And we wanted this platform to be unsupervised, so none of the engineers need to create rules, label logs as good or bad, or train the system, which requires a lot of computational power. The other most important thing for us was that we wanted this to be not noisy at all, because what happens with noise, when your level of false positives is really high, is that your engineers start ignoring the good things in between that noise. They start assuming the next alert won't be relevant either. So we wanted something with a lot less noise. And ultimately, we wanted this new platform or framework to be easily adaptable to our existing workflows. So this is where we started. First of all, we looked at whether we could build this thing internally, and we also started researching. And we came upon Zebrium; actually, we came upon a presentation by Larry, one of the co-founders of Zebrium, where he clearly explained why this is different and how it works, and it immediately clicked, and we said, okay, this is exactly what we were looking for. We dove deeper, we checked the blog posts, where the Zebrium guys really explain everything very clearly. They're really open about it. And most importantly, there's a button in their system. What usually happens with AI/ML vendors is they have this button where you fill in your details and the sales guys call you back and explain the system. Here, they were like, this is our trial system, we believe in the system, you can just sign up and try it yourself. And that's what we did. We took one of our Cisco Live DNA Center wireless platforms and started streaming logs out of it. And then we synthetically introduced errors; we broke things.
And then we realized that Zebrium was really catching the errors perfectly. And on top of that, it was really quiet unless you were really breaking something. The other thing we realized during that first trial was that Zebrium was actually bringing a lot of context on top of the logs during those failures. We worked with a couple of technical leaders, and they said, okay, if this failure happens, I'm expecting this individual log to be there. And we found that with Zebrium, apart from that individual log, there were a lot of other things that give a bit more context around the root cause, which was great. And that's why we wanted to take it to the next level. Yeah. Okay. So, you know, a couple of things to unpack there. I mean, you have the dartboard behind you, which is kind of interesting, because a lot of times it's like throwing darts at the board to try to figure this stuff out. But to your other point, Cisco actually has some pretty rich tools with AppDynamics for observability, and you've made acquisitions like ThousandEyes. And like you said, I'm presuming you've got to eat your own dog food, or drink your own champagne, so you've got to be tools-agnostic. And when I first heard about Zebrium, I was like, wait a minute, really? I was kind of skeptical. I've heard this before. You're telling me all I need is plain text and a timestamp and you've got my problem solved? So I understand that you guys said, okay, let's run a POC. Let's see if we can cut that from, say, two days a week down to one day a week, in other words 50%. Let's see if we can automate 50% of the root cause analysis. And so you funded a POC. How did you test it? You put synthetic errors and problems in there, but how did you test that it actually works, Necati? Yeah. So we wanted to take it to the next level, which means we wanted to back-test it with existing SRs.
We chose four different products from four different verticals: data center, security, collaboration, and enterprise networking. And we found SRs where the engineer had put some kind of log in the resolution summary. So they closed the case, and in the summary of the SR they put, I identified these log lines and they led me to the root cause analysis. We ingested those log bundles and tried to see if Zebrium could surface that exact same log line in its analysis. We initially did it with Atri, ourselves, and after 50 tests or so we were really happy with the results; in almost all of them we saw the log line we were looking for. But that was not enough. We brought it, of course, to our management, and they said, okay, let's try this with real users, because the log being there is one thing, but the engineer reaching that log is another thing. We wanted to make sure that when we put it in front of our users, our engineers, they could actually get to that log themselves, because we know this platform, so we can make searches and find whatever we're looking for, but we wanted them to do that. So we extended our pilot to some selected engineers, and they tested it with their own SRs and also did some back-testing on SRs which had been closed in the past or recently. And with a sample set of, I guess, close to 200 SRs, we found that the majority of the time, almost 95% of the time, the engineer could find the log they were looking for in Zebrium's analysis. Yeah, okay, so you were looking for 50% and you got to 95%. And my understanding is you actually did it with four pretty well-known Cisco products: the Webex client, DNA Center, Identity Services Engine (ISE), and UCS, Unified Computing System. So you used actual real data, and that was kind of your proof point. But Atri, that sounds pretty impressive; have you put this into production now, and what have you found? Well, yes, we've launched this with the four products that you mentioned.
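The back-test just described boils down to a simple hit-rate measurement. A minimal sketch (the function and data names are hypothetical, not Cisco's actual tooling):

```python
def backtest_hit_rate(cases, analysis_for):
    """For each closed SR, check whether the log line the engineer
    cited in the resolution summary appears in the tool's report.

    cases:        iterable of (bundle_id, cited_log_line) pairs
    analysis_for: maps a bundle_id to the set of log lines the
                  tool surfaced for that bundle
    """
    cases = list(cases)
    hits = sum(1 for bundle, cited in cases if cited in analysis_for(bundle))
    return hits / len(cases)
```

Run over roughly 200 SR pairs, a result of 0.95 would correspond to the 95% figure quoted above.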
We're providing our TAC engineers with the ability, whenever a support bundle for one of those products gets attached to a support request, to have it processed using Sense, and then we provide that Sense analysis to the TAC engineer for their review. So are you seeing the results in production? I mean, are you actually able to reclaim the time that people are spending? It was literally almost two days a week, down to part of a day; is that what you're seeing in production, and what are you able to do with that extra time? Are people getting their weekends back? Are you putting them on more strategic tasks? How are you handling that? Yeah, so what we're seeing, and I can tell you from my own personal experience using this tool, is that troubleshooting any one of these cases, I don't take more than 15 to 20 minutes to go through the Zebrium report. And within that time I know either what the root cause is, or that Zebrium doesn't have the information I need to solve this particular case. So we've definitely seen, well, it's been very hard to measure exactly how much time we've saved per engineer, right? Again, anecdotally, what we've heard from our users is that out of those three hours they were spending per day, we're definitely able to reclaim at least one of those hours. And even more importantly, there's the kind of feedback we've gotten. I think one statement that really summarizes how Zebrium has impacted our workflow came from one of our users, who said, well, until you provided us with this tool, log analysis was a very black-and-white affair, but now it's become really colorful. And if you think about it, log analysis is indeed black and white. You're looking at it on a terminal screen where the background is black and the text is white, or you're looking at it as text where the background is white and the text is black.
But what they're really trying to say is that there are hardly any visual cues to help you navigate these logs, which are so esoteric, so dense, et cetera. What Zebrium does is provide a lot of color and context to the whole process. Using their word cloud, their interactive histogram, and the summaries of every incident, you're very quickly able to see what might be happening and what you need to look into, what the important aspects of this particular log bundle might be for you. So we've definitely seen that. A really great use case that kind of encapsulates all of this came very early on in our experiment. There was a support request that had been escalated to the business unit, the development team. The TAC engineer had an intuition about what was going wrong, because of their experience and the symptoms they'd seen. They kind of had an idea, but they weren't able to convince the development team, because they couldn't find any evidence to back up what they thought was happening. And it was entirely happenstance that I picked up that case and did an analysis using Zebrium. I sat down with the TAC engineer, and very quickly, within 15 minutes, we were able to get down to the exact sequence of events, evidence of what the TAC engineer thought was the root cause. And then we were able to share that evidence with our business unit and redirect their resources so that we could chase down what the problem was. That really shows you how that color and context helps in log analysis. Interesting. We do a fair amount of work at theCUBE in the RPA space, robotic process automation.
And the narrative in the press when RPA first started taking off was, oh, it's machines replacing humans and we're going to lose jobs. What actually happened was that people were just eliminating mundane tasks, and the employees were actually very happy about it. But my question to you is, was there ever a reticence among your team, like, oh wow, I'm going to lose my job if the machine replaces me? Or have you found that people were excited about this? What's been the reaction among the team? Well, I think every automation and AI project gets that immediate gut reaction of, you're automating away our jobs. And initially there's a little bit of reticence, but like you said, once you start using the tool, you realize that it's not your job that's getting automated away. It's just that your job becomes a little easier to do, and it's faster and more efficient, and you're able to get more done in less time. That's really what we're trying to accomplish here. At the end of the day, Zebrium will identify these incidents and do the correlation, et cetera, but if you don't understand what you're reading, then that information is useless to you. So you need the human, the network expert, to actually look at these incidents. But what we are able to trim away, to get rid of, is all of the fat in our process. Like downloading the bundle, which can be many gigabytes in size, and now that we're working from home with the pandemic, you're pulling massive amounts of logs from the corporate network onto your local device, and that takes time. Then opening it up, loading it in a text editor, that takes time. All of these things we're trying to get rid of; instead, we're trying to make it easier and quicker for you to find what you're looking for. So like you said, you take away the mundanity, you take away the difficulties and the slog, but you don't really take away the work.
The work still needs to be done. Great. Guys, thanks so much. Appreciate you sharing your stories. It's quite fascinating, really. Thank you for coming on. Thanks for having us. You're very welcome. Okay, in a moment I'll be back to wrap up with some final thoughts. This is Dave Vellante and you're watching theCUBE. So today we talked about the need not only to gain end-to-end visibility, but also to automate the identification of root cause problems. Doing so with modern technology and machine intelligence can dramatically speed up the process and identify the vast majority of issues right out of the box, if you will. And this technology can work with log bundles in batches or with real-time data. As long as there's plain text and a timestamp, it seems Zebrium's technology will get you the outcome of automating root cause analysis with very high degrees of accuracy. Zebrium is available on-prem or in the cloud. The on-prem option is important for some companies, because there's sensitive data inside logs that, for compliance and governance reasons, has to stay inside their four walls. Now, Zebrium has a free trial; of course, they'd better, right? So check it out at Zebrium.com. You can book a live demo and sign up for a free trial. Thanks for watching this special presentation on theCUBE, the leader in enterprise and emerging tech coverage. I'm Dave Vellante, and we'll see you next time.