Hello, am I audible in the back? Yes, excellent. When the parallel sessions are going on, one of the challenges is how to get an audience, so I need to come up with a strategy for that. My name is Sapnam, and I'm going to talk about anomaly detection. How many of you have heard of anomaly detection? All right, I think I'm going to have a good time.

Anomaly detection, I don't think I need to do a marketing pitch here. It's a prominent problem in credit card fraud detection, in the IT infrastructure area, and in complex systems such as automobiles and the chemical industry, you name it. You will encounter this problem. I'm going to focus on the IT infrastructure area. There, you can imagine that here is our IT admin, a cute little boy, and he's looking at a dashboard while hundreds of thousands of streams are flowing in the backdrop. You need to detect anomalies, or do health monitoring, and if there are any severe issues, bring them to the attention of the IT admin.

Now, I'm showing three different streams here, and the point of showing them is that the complexity of each data stream is going to be different. You are looking at data on a very fast time scale, and each stream may be simple or very complex. It may have seasonality, it may not, and so on. The key message is: don't reach for the heavy hammer of deep learning when a stream doesn't need it. When we choose a machine learning algorithm, let's choose the appropriate one for the appropriate data stream. That's message number one.

Now, let me fix the ideas here, the way we are approaching the problem. Broadly speaking, there are two kinds of anomalies: local anomalies and global anomalies. What do I mean by a local anomaly? You are looking at a time scale of seconds or minutes, and you need to decide on that scale: is there an anomaly here? Here is a very simple representation. I have CPU on the y-axis, but it could be any metric: CPU, memory, response time, or anything else. I'm looking at the univariate case here. When you look at it with a kind of hawk's eye (I don't know whether a hawk can actually do this, but at least a human can), you assume in your mind that here is a baseline, and with respect to that baseline there is a significant departure, and that's why there is an anomaly there. So if we are talking about a very short time scale, we are naming it a local anomaly. But when we are looking at an hourly, weekly, or monthly scale and there is seasonality in the data, then we are calling it a global anomaly. A typical example is e-commerce traffic where every Tuesday it goes up high, but on a certain Tuesday it doesn't go up; then you raise an alarm, an anomaly, there.

So what I'm going to do is show some videos for a local anomaly and a global anomaly, and then I will go deeper into each. This one shows a local anomaly plot. Imagine that data is streaming in, a whole bunch of it, and we are learning a baseline from every 100 samples or so, and with respect to that baseline, you trigger the alarm.
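To make that concrete, here is a minimal sketch of the kind of sliding-window local detector I just described: learn a baseline from each window of roughly 100 samples, then flag points in the next window that depart significantly from it. The window size and the three-sigma style cutoff are my illustrative choices, not the product's actual settings.

```python
import numpy as np

def local_anomalies(stream, window=100, k=3.0):
    """Sliding-window local anomaly detector (illustrative sketch).

    Learn mean/std from each window of `window` samples, then flag
    points in the *next* window that fall outside mean +/- k*std.
    """
    stream = np.asarray(stream, dtype=float)
    flags = np.zeros(len(stream), dtype=bool)
    for start in range(0, len(stream) - window, window):
        baseline = stream[start:start + window]
        mu, sigma = baseline.mean(), baseline.std() + 1e-9
        nxt = slice(start + window, min(start + 2 * window, len(stream)))
        flags[nxt] = np.abs(stream[nxt] - mu) > k * sigma
    return flags

# Example: a flat stream with a burst injected at t=500
rng = np.random.default_rng(0)
x = rng.normal(50, 2, 1000)   # e.g. CPU utilization in percent
x[500:520] += 25              # injected anomaly
print(np.flatnonzero(local_anomalies(x)))
```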
Then you move your window along in a sliding-window manner, and you again ask: is the distribution of the incoming data significantly different? If so, you again raise an alarm. I'm using "alarms" and "anomalies" interchangeably here.

But if you notice, I'm also marking some of them in blue. What do I mean by that? Those are false positives. In this case I can say that because I have ground truth, and that's a challenge in anomaly detection: where do you set your threshold, high or low? There will always be a trade-off between false positives and missed detections. The key message here is that we are looking at a shorter time scale, and that every hundred samples or so we learn the baseline and then apply it to the next window. The idea is: how can we learn our models in real time and keep applying them as the data arrives?

That was an example of a local anomaly; now I have an example of a global anomaly. The local detector has done its job, but the global model now figures out that there is seasonality in the data, and it says, oops, these anomalies should not be there, so it suppresses them. And this point here is a significant departure from the global model, so there actually is an anomaly there. What you see down here is another example where the local detector says there should be anomalies, but the global model suppresses them. These are toy examples to convey the idea. What both of them illustrate is that we have a combination of local anomalies and global anomalies, and we need a fusion of all these models to produce meaningful anomalies in the end.

So what is the underlying anomaly detection pipeline? First we do data pre-processing, feature extraction, and those kinds of things, and, as I was saying, we also estimate the complexity of the data stream so that we pick the right hammer for the right stream. After that, we do local anomaly detection. Then we perform global anomaly detection to figure out whether there is seasonality in the data, and we use that knowledge to further suppress or generate anomalies. In the end, we do anomaly suppression and fusion. The key idea there is that if you're getting lots of anomalies, you don't need to send an alert for each one. For example, if you have 100 anomalies in 10 minutes, you don't want to send 100 emails; you send one email for that whole set.

Let's go a little deeper. Since many of you are aware of unsupervised machine learning and anomaly detection, I think you have used some of these techniques in your applications: one-class SVMs, kernel density estimators, parametric models, and so on. I will go deeper into Page's test on my next slide. Here I just want to convey the typical way people declare something an anomaly: first you fit a parametric model. It could be a log-normal, Poisson, or gamma distribution, or some other distribution you pick. You validate it, and then you ask: is this point too far away from that distribution? You compute the probability, take its log, and with some scoring mechanism you say, this is too far, so I'm going to declare it an anomaly. Or, if you don't want to go even that far, you can simply use plus or minus three sigma, which has been around since the 1960s.
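As a minimal sketch of that parametric recipe (my own illustrative choice of a gamma distribution and a log-likelihood cutoff, not necessarily what the product uses):

```python
import numpy as np
from scipy import stats

def parametric_anomalies(train, test, quantile=0.001):
    """Fit a gamma distribution to `train`, then flag points in `test`
    whose log-likelihood falls below a low quantile of the training
    data's own log-likelihoods. Illustrative sketch only.
    """
    shape, loc, scale = stats.gamma.fit(train)
    dist = stats.gamma(shape, loc=loc, scale=scale)
    threshold = np.quantile(dist.logpdf(train), quantile)
    return dist.logpdf(test) < threshold

rng = np.random.default_rng(1)
train = rng.gamma(5.0, 2.0, 5000)           # e.g. response times in ms
test = np.concatenate([rng.gamma(5.0, 2.0, 100), [80.0, 120.0]])
print(np.flatnonzero(parametric_anomalies(train, test)))
```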
But for the kind of problem we are looking at, we want more robust anomaly detection, and we want to be able to detect different types of local anomalies. I will go deeper into one specific type, so let me show it first. It's the case where the baseline itself shifts: over the time horizon, for a few hours, the baseline got shifted, and the key challenge is that you only want to detect the shift in the baseline. You don't want to keep saying that all of these points are anomalies. So how do we detect this change?

For this change detection problem, we go back and pick up one of the fundamental results in change detection theory, something called Page's test, and we modified it. Let me first give the underlying concept of Page's test. It's all about computing a statistic that tracks a change in the process, where the signal is embedded in noise. You pose it as a binary hypothesis problem, with not only one change point but possibly a whole bunch of change points. When the log-likelihood ratio of the alternative hypothesis to the null hypothesis crosses a certain threshold, you raise an alarm. The beauty of this algorithm is that there is no assumption about which PDF you should pick; you can pick any one, as long as you can come up with some kind of closed form, and even if you cannot get a closed form, you can still solve it numerically. I have applied it in various domains; in one of them the probabilities came from a hidden Markov model and we were detecting suspicious activities. The message is that with this kind of change detection technique, you can declare a detection and then you don't need to call everything after it an anomaly. You can see here that the anomalies are only the first few points, where you declared and detected the change.

That was about local anomalies. There is a whole bunch of other techniques; I'm not going to go deeper into them, but feel free to have a dialogue with me after my talk. Now I'm going to give a flavor of global anomaly detection. I couldn't find time to make a nice video, so I just hand-drew this one. The idea is that you have real-time data coming in and you want to detect periodicity, some kind of seasonality. To do that you need to down-sample the data, and that's where the down-sampler comes in, and you build different kinds of models: an hourly model, a daily model, and so on. The underlying models could be anything: ARMA, ARIMA, or a Kalman filter, however you want to formulate it. Then you store those models, and when you want to predict, you use them to either generate anomalies that exploit the seasonality in the data or to suppress anomalies.
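Before going on, here is a minimal sketch of the Page's test idea described a moment ago, in its classic CUSUM form for a mean shift in Gaussian noise. The pre- and post-change means and the threshold are my illustrative assumptions; the modified version from the talk is not shown.

```python
import numpy as np

def page_cusum(x, mu0, mu1, sigma, threshold=10.0):
    """Page's test (CUSUM) for a mean shift mu0 -> mu1 in Gaussian noise.

    Accumulate the log-likelihood ratio of 'shifted' vs 'baseline',
    clamp it at zero (restart), and declare a change when it crosses
    the threshold. Returns the index of the first detection, or None.
    """
    g = 0.0
    for t, xt in enumerate(x):
        # log-likelihood ratio increment for N(mu1, sigma) vs N(mu0, sigma)
        llr = (mu1 - mu0) * (xt - (mu0 + mu1) / 2.0) / sigma**2
        g = max(0.0, g + llr)
        if g > threshold:
            return t
    return None

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(2, 1, 300)])
print(page_cusum(x, mu0=0.0, mu1=2.0, sigma=1.0))  # detects shortly after t=300
```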
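And here is a sketch of that hand-drawn global pipeline: down-sample the raw stream to hourly means, learn an hour-of-day profile, and use the seasonal prediction to flag (or suppress) anomalies. A simple per-hour mean-and-sigma profile stands in for the ARIMA or Kalman filter models mentioned above; everything here is an illustrative assumption.

```python
import numpy as np
import pandas as pd

def hourly_profile(series):
    """Down-sample to hourly means, then learn mean/std per hour-of-day."""
    hourly = series.resample("1h").mean()
    grouped = hourly.groupby(hourly.index.hour)
    return grouped.mean(), grouped.std()

def global_check(series, mu_by_hour, sd_by_hour, k=3.0):
    """Flag hours whose mean departs from the seasonal profile.

    Local anomalies inside hours that the profile explains would be
    suppressed; hours flagged here become global anomalies.
    """
    hourly = series.resample("1h").mean()
    hours = hourly.index.hour
    mu = mu_by_hour.reindex(hours).to_numpy()
    sd = sd_by_hour.reindex(hours).to_numpy() + 1e-9
    return pd.Series(np.abs(hourly.to_numpy() - mu) > k * sd, index=hourly.index)

# Example: two weeks of synthetic traffic with a daily cycle
idx = pd.date_range("2024-01-01", periods=14 * 24 * 60, freq="1min")
rng = np.random.default_rng(3)
traffic = pd.Series(100 + 50 * np.sin(2 * np.pi * idx.hour / 24)
                    + rng.normal(0, 5, len(idx)), index=idx)
traffic.loc["2024-01-09 14:00":"2024-01-09 15:00"] -= 80  # a missing daily peak
mu, sd = hourly_profile(traffic.loc["2024-01-01":"2024-01-07"])
flags = global_check(traffic.loc["2024-01-08":], mu, sd, k=6.0)
print(flags[flags].index)
```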
So ultimately what you get is anomalies, but you don't need to send an alert for each one of them. You apply some control there and give only a consolidated set of anomalies to the IT admin.

To summarize, the key message is this: in the IT infrastructure domain, what we have seen is that if you use a combination of techniques, some of them looking at a faster time scale and some at a slower time scale, and then fuse their results, you get a better outcome. That's one message. The second message is that you don't need to raise an alert for each anomaly. That is more about domain knowledge: how do you bring the business or domain knowledge and cook it back into the system? One key input there is that if you can capture that the IT admin is actually taking action on an alert, you can take that feedback, do some kind of Bayesian formulation with it, and further tweak your thresholds to get lower false alarm rates and better detection accuracy. (A small sketch of this Bayesian feedback idea appears after the Q&A below.) And last but not least, there are a lot of nuances in tweaking the hyperparameters, the thresholding techniques, and so on. If you don't have ground truth data, our experience is to use multiple techniques, treat a few of them as a lower bound and a few as an upper bound, and then see where your system lands in between.

Last but not least, many thanks to my colleagues and team members, Nitin, Ashish, Roshni, and Chaitanya, who are all here, and to my mentor Raj. I'm open for questions.

Hello, I'm here. Hi, thank you for the nice talk. I'm curious, is this in production, and what impact has it had on the ops team? Has the head count reduced, things like that?

Yeah, so this is a product we are building, and we did internal testing on data generated on a mainframe and so on. Our results on that are pretty good, but we haven't shipped it yet.

We just have time for one more question.

Hello, yeah. In the problems I have worked on, univariate anomaly detection is relatively easy, but multivariate anomaly detection becomes difficult because of the interactions between the variables. Those are more subtle anomalies that we cannot see with simple statistics. Have you worked on something like this, and how do you come up with multivariate probability distributions with which you can compute a log-likelihood and say that a particular point is an anomaly?

Right, so we went this way first, and I'll definitely make a comment on the multivariate case as well. We took the univariate route first because we need to give interpretable results to the IT analysts. They want deeper insight: when you raise an alarm or an alert, what does it mean? The data science team can certainly take the whole bunch of variables and do PCA on them, or even robust PCA, or pick up the state of the art to figure out which features to take and then do detection on that. But what the analysts, or the admins, ultimately want is something they can also interpret. You can think of that as a minimum viable product requirement. And feel free to connect with me;
I can give you a lot more insights on the multivariate side. Thank you, Satnau.
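As promised above, here is a minimal sketch of the Bayesian feedback idea from the talk: treat each alert the admin acts on as useful and each ignored alert as a false alarm, keep a Beta posterior over alert precision, and tighten the detection threshold when precision drifts low. The Beta-Bernoulli model and the threshold adjustment rule are my illustrative assumptions, not the product's actual formulation.

```python
class ThresholdTuner:
    """Beta-Bernoulli feedback loop over alert precision (illustrative).

    Each alert the admin acts on counts as a true positive; each
    ignored alert counts as a false alarm. When the posterior mean
    precision drops below target, raise the detection threshold.
    """

    def __init__(self, threshold=3.0, target_precision=0.8):
        self.threshold = threshold
        self.target = target_precision
        self.acted, self.ignored = 1.0, 1.0   # Beta(1, 1) prior

    def feedback(self, admin_acted: bool):
        if admin_acted:
            self.acted += 1
        else:
            self.ignored += 1
        precision = self.acted / (self.acted + self.ignored)  # posterior mean
        # Nudge the threshold: stricter when too many false alarms,
        # slightly looser when almost every alert is being acted on.
        if precision < self.target:
            self.threshold *= 1.05
        else:
            self.threshold *= 0.99
        return precision

tuner = ThresholdTuner()
for acted in [False, False, True, False, False]:   # mostly ignored alerts
    tuner.feedback(acted)
print(round(tuner.threshold, 2))   # threshold has drifted upward
```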