Our next session, and our final one for today, will focus on a specific issue in this regard: unsupervised real-time anomaly detection. Here to explain it we have two data scientists from Telefónica, Aitor Landete and Pablo Mateos. Welcome Aitor and Pablo, are you there? You are, that's great. How are you? Very well, thank you. Okay, so thanks Nicola for the introduction. You're welcome. I'm going to share my screen and start my presentation. Go ahead. Are you seeing my presentation? Yes, Pablo, we can see it. Go ahead, everything's fine. Thank you, Nicola. So, first of all, as this is our first time here, thank you for considering our talk for this event. We would have liked to present this talk in front of an audience, but due to the current situation we are going to try to do our best online. At least we hope to explain everything as well as we would on stage. So, let's do it. Good evening everyone, and welcome to our talk. First of all, we are going to introduce ourselves. We are Aitor Landete and Pablo Mateos and, as Nicola said, we work as data scientists at Telefónica, specifically in IoT & Big Data at Telefónica Tech. Our role is developing tailor-made solutions according to our business clients' needs; we act as a consulting agency. One of the projects we were involved in was related to real-time anomaly detection, and today we are going to present that topic: real-time anomaly detection and root cause estimation. To put this topic in context: a growing number of business problems are based on real-time metric analysis, from credit card fraud to predictive maintenance. Right now we are moving into an era where all sensors and devices are connected through the internet. For this reason, it is crucial to perform data analytics in real time to reach all the sectors with more advanced, cutting-edge technology: for example, smart cities, Industry 4.0, or smart healthcare.
So, what are we going to see in this presentation? In this talk we focus on unsupervised real-time anomaly detection. We are going to present an unsupervised real-time anomaly detector based on an LSTM neural network forecaster and, in addition to this, an automated way to estimate the root cause of such anomalies when they pop up across different metrics. To that end, we focus on anomalous points that occur simultaneously on a subset of all the monitored metrics. In order to estimate these root causes, we analyze the correlation between the metrics that belong to that subset, with different approaches: for example, cosine similarity, correlation of tokenized metric names, et cetera. Some sample situations where this approach can fit: new releases of an app whose monitored metrics change when you release a new feature, transportation services, et cetera. Altogether, this gives us a common feature for all these anomalous points, which points towards the origin of the anomalous behavior detected. During this talk we are going to review the context of anomaly detection, in particular in real time; go through our solution; present the results we have obtained and what kind of root causes we can assign; and finish with the next steps to improve our system. Let's start. First of all, we are going to explain in an informal way what an anomaly is. An anomaly, in the broad sense, is difficult to define exactly, because the notion changes from one domain to another. But in a general way we can say that an anomaly is any instance or sample that does not follow the collective common pattern, and that is therefore easy to separate from the rest of the data. Besides this, we can also differentiate between anomalies and outliers, because outliers cover everything in the domain considered abnormal, including noise.
However, anomalies are a special case that carry significant information of interest. This is important for us, because we can explain an anomaly with root cause analysis. How do we deal with this kind of data? Well, there is a concept called anomaly detection. Basically, anomaly detection is the process of finding patterns in the data that do not conform to the expected behavior of the rest of the data. In other words, it is about capturing and isolating an anomaly from the rest of the normal data. What kind of applications can anomaly detection techniques fit? Where are they useful? Well, anomaly detection is used in many sectors, such as banking for credit card fraud, cybersecurity for intrusion detection, or healthcare, detecting medical anomalies such as an abnormal heartbeat, as you can see here in the image. Before deep-diving into anomaly detection, we have to explain the main aspects of anomaly detection problems. There are several aspects that must be taken into account before embarking on the task. These aspects are: the definition of normality, to know what is normal and what is not, which will depend on the context of the domain of study; the characteristics of the data, since depending on its nature one approximation or another will be used; the type of anomalies, because not all anomalies are the same, and depending on how we define them they have their own features; the output, that is, what result do we obtain when detecting an anomaly, or rather, how do we decide whether a sample is abnormal or not; and finally, once we have detected an anomaly, we need to verify whether our system is doing it correctly. For this we need a series of performance evaluation metrics to test and assess the behavior of our system. Defining normality is complex, because it depends on the business context that is being worked on.
As I mentioned before, anomaly detection techniques provide patterns to determine which samples are not normal. Therefore, it is important to know what is normal and what is not. It is necessary to strike a balanced trade-off: capture all the nuances of normal behavior, without being so strict that every nuance of normal data ends up treated as an anomaly. Let's illustrate this with an example. The main issue with anomaly detection is that it is an ill-defined problem. It is necessary to carefully observe the whole picture of the data to determine with high confidence what is an anomaly and what is not. In this example, in the image on the right, we can see clearly that if we isolate different parts of the signal, the consideration of what is an anomaly changes. For example, within the first red square on the left, the big one, the peaks could be anomalies, but if we compare with the rest of the data, with the red square in the middle, we can see that the consideration of what is anomalous changes. The same occurs with the last one. Continuing with the important aspects of anomalies: we have defined what normality is, and defining normality is important, but we also have to take into account the characteristics of the data. Data is normally described by a series of attributes, as in tabular data. In anomaly detection we can treat problems with a univariate approach, with a single attribute, or a multivariate approach. Furthermore, these attributes can be quantitative (numerical) or qualitative (categorical). Types of anomalies: we also have to consider what kind of anomaly we may face. Knowing the type of data, we must look at the complete picture and then go to the types of anomalies, grouping anomalies that behave in the same way or are considered similar. By definition, there are three types of anomalies. The first is the point anomaly: when a single instance, compared with the rest of the instances, lies far from the group, it is considered a point anomaly.
We can see clearly here the clusters, and the outliers, the anomalies, do not belong to any of the clusters. A collective anomaly is a collection of related instances that is anomalous compared to the rest of the data. Here we can see the example of an electrocardiogram: the heart rate stops for a period, collectively, and we are facing a failure, an anomaly. The last one, the contextual anomaly, is an instance that behaves abnormally in a particular context. For example, temperature according to the season of the year: in winter it is strange to find very high temperatures. Ultimately, the goal of anomaly detection is to identify which anomalies are in the data. For this, we can use two different approaches. The first is through a score, calculating a metric for each sample; for example, it can be a probability or a distance. Then, by defining a threshold, we can decide which samples are considered anomalies and which are not. On the other hand, we have labels: we have data with an associated label indicating whether each sample is an anomaly or not, and we must simply implement algorithms that are capable of discriminating between the samples that are anomalies and those that are not. Continuing with this: depending on whether the data is labeled or not, we must choose among a great variety of machine learning techniques. If we have labeled historical data, we can use supervised approaches. If not, we will need unsupervised methods to discover anomalous patterns. This is key for real-time data sources like IoT sensors. Among the great variety of techniques used for anomaly detection, we can see some families in this diagram, ranging from the most basic ones, such as extreme value analysis, to more advanced ones such as nearest-neighbor and neural-network methods; the latter are the ones we will use in our solution. Finally, the last aspect to take into account in anomaly detection is the performance evaluation.
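As a toy illustration of the score-plus-threshold approach Pablo describes (this is our own sketch, not the speakers' code; the function name and data are made up), here is a minimal z-score detector where the score is the distance from the mean and the threshold decides what counts as anomalous:

```python
import numpy as np

def zscore_anomalies(values, threshold=3.0):
    """Flag points whose z-score exceeds a threshold.

    A minimal sketch of the 'score + threshold' idea: the score is
    the distance from the mean in standard deviations, and the
    threshold decides which samples are considered anomalies.
    """
    values = np.asarray(values, dtype=float)
    scores = np.abs(values - values.mean()) / values.std()
    return np.flatnonzero(scores > threshold)

# 200 normal points plus one injected spike at index 200.
data = np.concatenate([np.random.default_rng(0).normal(0, 1, 200), [15.0]])
print(zscore_anomalies(data))
```

In practice the score could equally be a probability under a fitted model or a distance to a cluster center; only the thresholding step stays the same.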
For unsupervised cases it is difficult, since we do not have any labels, but we can do an exploratory study of the patterns associated with abnormal samples. For supervised cases, we will use the known metrics from classification tasks, such as the confusion matrix, which measures the rates of positives and negatives classified well and badly, and other metrics such as accuracy, specificity, precision or recall. Finally, we can also use the ROC-AUC curve to test the trade-off between the true positive rate and the false positive rate at different thresholds. We have checked all the aspects of anomaly detection, and now we are moving on to the real-time case. Why is real-time anomaly detection important? Before navigating the world of real-time anomaly detection, let's take a look at the difference between batch and real-time processing, because it is important to have this idea clear. As you can see here in the diagram, we have two different approaches to processing: the batch version and the real-time version. We use batch when the reaction time is not critical, when time is not a critical factor for exploring the anomalies. We can collect a large collection of data and apply any algorithm to detect anomalies in that period. There is no rush to capture anomalies, and we can iterate over and over with our algorithm to improve its accuracy. With this methodology we will obtain fewer false positives, because we can iterate on our algorithm to improve it, but it also has some drawbacks: it is less scalable when you have a lot of data, and computationally more expensive. On the other hand, we have real time. Here, we do not get to see the entire data stream; the system continues learning, and we do not need manual supervision. The problem is that currently almost all machine learning algorithms are designed to work in a batch way. This represents a great challenge when generating new solutions for the real-time case, because the system evolves and the data behavior changes over time.
Users, for example, behave differently when navigating an online website. This is the purpose of real-time, or online, machine learning algorithms. It must be taken into account that we must build a system capable of detecting new anomalies without hard-coded thresholds, capable of adapting over time, because the system evolves and the data behavior changes over time. Finally, for real-time systems it is necessary to take into account that one of the obstacles we must overcome is early detection. We need to anticipate when detecting an event. This process is quite challenging, and right now it is one of the main stumbling blocks in real-time systems. Why is it important? It is important because the massive increase in streaming time series data is leading to a shift towards real-time anomaly detection, in order to estimate the new normal. Imagine, in the current situation, it could be used in multiple different sectors, from sales and marketing to supply chain and manufacturing. Every stage of these businesses requires sufficient information to adapt its strategy and maximize productivity. Finally, IoT is a growing technology and the projections made for the future keep increasing, so we can see that this is a very conducive field in which to apply this type of system. Let's move on to time series data. We have seen the general aspects, but now we are going to focus on time series data. In real-time detection, the time component is an essential variable. This type of data must be treated differently from the samples seen before, such as tabular data. A time series is nothing more than a series of observations taken over a time range, usually at regular intervals. We have to treat it differently because time series typically have a series of components: trend, seasonality or cyclic variation, and also noise and irregular variation.
These components can be illustrated in a simple way in the following images. Also, depending on the type of trend and seasonality they have, time series can be considered multiplicative or additive. This is because each component affects the final time series in a multiplicative or an additive way, and we have to take these considerations into account too. In order to build a good real-time anomaly detector, we have to consider these five points: timeliness, scale, rate of change, conciseness, and the definition of incidents. We will answer a question with each bullet. Timeliness refers to how often a business or company needs to know that something is abnormal, its periodicity. Scale is how large the data to be handled is going to be. Rate of change: does the data evolve over time, or is it otherwise static, with few variations? Conciseness is about explaining an anomaly: do we have to take multiple metrics into account to explain the context in which the event happened? And finally, the definition of incidents: if we already know the anomalies, if we have prior information, we are able to define them and later categorize them. So, moving to timeliness and scale: as I said before, when something abnormal occurs, any business needs to know rapidly in order to act correctly. Currently most companies need a real-time methodology. Online, or real-time, algorithms process the data sequentially; they use each input as it arrives to improve the predictor for the next step. They have the advantage that they scale better; however, they tend to be more prone to false positives. So we have to weigh timeliness and scale in the real-time anomaly detection case.
The rate of change refers to the fact that the environment changes: when an app or an online business launches a new feature, the user behavior changes, because the patterns of the users also change. So we have to think about the type of business we are dealing with; for example, in automated industrial processes we rarely see very sharp changes. The rate of change is important because it affects which model is going to be chosen for detecting anomalies: algorithms that can handle the expected amount of change in the data are needed. In the example on the right, we see two time series: one is very stable over time; however, in the image below we see, at the red arrow, how during a specific day there is a sudden change. So we have to be alert, to be aware of this type of situation. Conciseness means the system takes multiple metrics into account to have a holistic view. Only by combining metrics are we able to know what is happening and what is not; analyzing them individually may not clarify the full cause of the problem. There are two ways to treat this aspect, univariate and multivariate; as we have said before, two ways to face this problem. Here we propose a third solution that meets both: a hybrid approach. You learn a normal model for each metric, and combine anomalies into a single incident if the metrics are related. It is scalable, the grouped anomalies are easier to interpret, it combines multiple types of metric behavior, and it uses standard methods for discovering the relationships. In the case of the definition of incidents, it is the same: we propose another hybrid approach, which is a mixture between a supervised method and an unsupervised method.
You use a few labeled examples to improve the detection, converting the unsupervised method into a semi-supervised one: unsupervised detection for unknown cases, and supervised detection to classify already known cases. What is the typical process for learning normal pattern behavior? Anomaly detection helps companies determine whether something changed in the normal business pattern, as we have said before. The three main steps are: the normal behavior is modeled by a dedicated algorithm; we calculate a statistical test to see whether a new sample is explained by the model; and then we indicate whether the sample is normal or not. From here, my workmate Aitor will continue, explaining some techniques, our model, the results, and also the root cause analysis. It's your turn, Aitor. Thank you, Pablo. I hope that now everyone is on the same page. Now we are all set on a fast-growing area where we can obtain many interesting results. Let's jump to the meat of the talk. Pablo gave us an excellent introduction discussing the importance of real-time anomaly detection and how we are going to detect these anomalies. But now let's think about something. When someone talks about detecting anomalies, by professional habit we usually go to clustering algorithms. So let's recap a bit the different anomaly detector algorithms before going to the meat of the talk. First of all, we can detect anomalies with clustering algorithms, but this is not really suitable. Why? Because clustering algorithms are not sensitive to time: they cannot infer from context what an anomalous point means. One way out of this problem would be to generate time classes. Next slide, Pablo. By assigning time classes to the different data points, we can give them a sense of time. We can generate time classes that would be, I don't know, morning, night, weekend, whatever we are interested in analyzing.
And then we can just run a clustering algorithm and obtain the anomalies within these different time classes. Examples of this would be isolation forest, one-class support vector machines, etc. Next one, Pablo. A more refined way to detect anomalies in time series would be to use autoencoders. Autoencoders are trained to compress the time series metrics into a lower-dimensional latent space and reconstruct them. Examples of these are LSTM-based autoencoders and also variational autoencoders; an example from the literature would be a multi-channel convolutional neural network encoder with an LSTM decoder. Anomalies in this case can be spotted in different ways: by reconstruction error, by reconstruction probability, or by clustering in the latent space. Those are the different ways we can spot anomalies with these kinds of detectors. Pablo? Okay. And then there is the king of anomaly detectors: time series forecasters. These are the most widespread methods, and the concept behind this type of algorithm is quite simple: forecast the future and compare our predictions with the real-time points that we are receiving. This would be our preferred way to spot anomalies in real time. Examples of these are LSTM- or GRU-based forecasters and one-dimensional convolutional neural network forecasters. We can also use classic algorithms like SARIMA or Holt-Winters, and regressors of any kind, for instance XGBoost or LightGBM. So let's jump ahead. How do we spot anomalies with this type of algorithm? The first, naive way that someone might think of is to set a hard threshold: an upper and a lower admissible value, and everything that goes beyond these thresholds is an anomaly.
Or, in a slightly more refined way, we could take the mean and standard deviation, and everything that goes above or below those bounds would be an anomaly. More refined still, we can consider a point anomalous if it is beyond three rolling standard deviations of the rolling mean. But there is, in our opinion, a better way, which is to use confidence intervals: we make predictions, and these predictions come with a confidence interval within which we are confident about the forecast. Any point that falls outside these bounds is considered anomalous. So, Pablo. Okay, our solution. Now let's talk about the serious part. Let's go to the next one. The thing is that we are providing a method to spot anomalies in real time, in an unsupervised way. And if it is real-time, unsupervised and automatic, we have to handle different patterns in the time series. In a large dataset it is very probable to find different types of behavior in the time series, so is it possible to fit them all in a unique model? We will see how we handle this. Also, as data streams into our model, we have to detrend it and remove its seasonality. And we cannot do this by hand, because that would be nonsense; it would defeat the whole purpose of the tool. So, to do that, we have to consider the seasonal pattern in order to avoid false anomalies. Let's go back, Pablo. Knowing the seasonal pattern, it is possible to detect the anomalies properly. We can have a trend, we can have seasonality, but we can also have combined seasonalities: in the right-hand side plot, we can see that we have one seasonal pattern.
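The rolling-statistics variant mentioned here can be sketched in a few lines (our own illustrative code, not the Telefónica implementation; the function name, window size and data are made up): each new point is compared against a band built from the recent past, so the band adapts as the series evolves.

```python
import numpy as np

def rolling_band_anomalies(values, window=30, k=3.0):
    """Flag points outside mean ± k·std of the preceding window.

    This mimics the 'three rolling standard deviations' rule: for
    each point, the band is computed only from the `window` points
    before it, so the threshold moves with the series.
    """
    values = np.asarray(values, dtype=float)
    anomalies = []
    for i in range(window, len(values)):
        past = values[i - window:i]
        mu, sigma = past.mean(), past.std()
        if abs(values[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies

rng = np.random.default_rng(1)
series = rng.normal(10, 1, 300)
series[150] += 8  # inject a spike into an otherwise stable series
print(rolling_band_anomalies(series))
```

The confidence-interval approach the speakers prefer replaces the mean ± k·std band with the forecaster's own prediction interval, but the flagging step is the same comparison.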
And on top of that, we have another seasonal pattern. So the thing is that we have to remove all of this before ingesting the data into the model. Let's go to the next one. We have seen that removing trend and seasonality is crucial for time series analysis, to ensure a correct performance when predicting what is normal and what is abnormal. So what do we do? We do this process in an automatic way. First of all, we have to detrend the data. How do we detrend the data? We take our time series metric, remove the central part of it, and interpolate between the first part and the last part. That way we can spot whether the data has a trend or not, and if a trend is automatically detected, we detrend it: we just remove it. But there is a second and more complicated part, which is to detect seasonality and remove it. How do we do that? Depending on the granularity of the data, we select a predefined periodicity band and we obtain the ACF plot. Then we iterate over a predefined threshold and find all the spikes that are above it, the threshold given essentially by the ACF. Then we measure the spacing between these spikes, the number of lags between them. Doing that, we are able to spot the most significant periodicity bands, and remove them accordingly. Now, we do all that and we might be fairly sure it was done perfectly and everything is pristine, but there are always tests that we have to pass. Automatically, we have to pass these three tests.
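The ACF idea can be sketched as follows. This is a simplified stand-in for the procedure described (names and data are our own): rather than thresholding ACF spikes and measuring their spacing, we simply return the candidate lag with the strongest autocorrelation.

```python
import numpy as np

def dominant_period(series, min_lag=2, max_lag=100):
    """Estimate a seasonal period from the autocorrelation function.

    Computes the sample ACF for lags in [min_lag, max_lag] and
    returns the lag with the highest autocorrelation, which for a
    clearly seasonal series coincides with its period.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x)
    acf = [np.dot(x[:-lag], x[lag:]) / var
           for lag in range(min_lag, max_lag + 1)]
    return min_lag + int(np.argmax(acf))

t = np.arange(600)
seasonal = np.sin(2 * np.pi * t / 24)  # 24-step seasonality
noisy = seasonal + 0.05 * np.random.default_rng(2).normal(size=600)
print(dominant_period(noisy))  # expected: 24
```

Once the period is known, the seasonal component can be removed, for instance by seasonal differencing or by subtracting the average cycle.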
The augmented Dickey-Fuller test, the KPSS test, and the Canova-Hansen test for seasonality. We do this process automatically, and the tests have to be passed. Perfect. So now let's talk about the anomaly detector model that we are using. We have said that we detrend the data and remove the seasonality from it, and now the data is perfectly prepared to be ingested into the model. And what model do we use? We have seen that there is a plethora of models. We know, and it is well known in this sector, that there is no single preferred model for forecasting or for spotting anomalies; simply speaking, there is no such thing. So what we use as our preferred approach is an ensemble model, and this ensemble combines XGBoost, LightGBM and Prophet. All these models together form our model. Let's go to the next one. But you have seen that there are a bunch of models, and a bunch of models implies a lot of computational cost, and this is not good. This ensemble model is the one that offers the best results, as we are going to see, but there is no free lunch here: it is computationally costly, and it may not be suitable for all cases. Machine restrictions, batch restrictions, etc.: there are many restrictions that can prevent us from using this kind of model. So what we did was develop a lighter, alternative model that works reasonably well. This model is an LSTM-based forecaster, but it obtains its confidence interval with stochastic dropout. Why stochastic dropout, you might ask? Why are we using this? The thing is that to obtain a confidence interval at inference time with LSTM forecasters, we have to keep dropout active during inference.
So the thing is that if you set a predefined dropout rate in these layers, what we noticed is that you are effectively biasing the width of the confidence interval. Think about it: you set a dropout rate of 0.2, 0.3, 0.4, I don't care, and you compute the confidence interval. If you then jump to a bigger rate, you will notice that the confidence interval gets wider. That puts you in a tricky situation: what rate do I use? I am biasing my anomalies. So we found that using stochastic dropout, which is to apply random dropout rates to the layers and bootstrap the results over a large number of iterations, we obtain a fairly stable confidence interval for the predictions. Let's go to the next one. Now you may be wondering: okay, how are you training your models? There is a question here: do we train one model per metric? That is not really scalable. With two or three metrics it could be fine, but with a thousand metrics I cannot do that; it is simply impossible. One might say: but if I do this, I get good performance, my predictions are close to optimal. The other side is: do I train a single model for all the metrics? That looks better in terms of scalability, just one gigantic model making predictions for, I don't know, those thousand metrics, but it is not really okay, because the performance will drop dramatically. Fortunately, there is a midpoint, which is the thing we are using, and which improves scalability without giving up performance. So, next one. What do we do? We group metrics that are similar in some way and put them in the same model.
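The interval logic can be sketched independently of the network. In the real system the stochastic forecasts would come from the LSTM with dropout kept active at inference; here, as a hedged illustration (names and data are ours), we fake the stochastic forward passes with noise and only show the aggregation and flagging step:

```python
import numpy as np

def interval_anomalies(stochastic_preds, observed, lower_q=2.5, upper_q=97.5):
    """Flag observations outside a percentile prediction interval.

    stochastic_preds: (n_passes, n_steps) array of forecasts, one row
    per stochastic forward pass (e.g. dropout left on at inference).
    The interval at each step is the percentile band over the passes.
    """
    lo = np.percentile(stochastic_preds, lower_q, axis=0)
    hi = np.percentile(stochastic_preds, upper_q, axis=0)
    observed = np.asarray(observed, dtype=float)
    return np.flatnonzero((observed < lo) | (observed > hi))

rng = np.random.default_rng(4)
truth = np.sin(np.linspace(0, 6, 50))
passes = truth + rng.normal(0, 0.1, size=(200, 50))  # fake stochastic passes
observed = truth.copy()
observed[30] += 1.0  # inject an anomaly
print(interval_anomalies(passes, observed))
```

Bootstrapping over many passes with varying dropout rates, as the speakers describe, amounts to widening this collection of rows before taking the percentiles.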
What we use for that is the tsfeatures package, which was originally written in R and has a Python implementation. This is something we are still studying, because what we did is common sense, but we want to refine it even more. What we saw is that using metrics that are homogeneous in terms of seasonality improves model accuracy if they are trained together in the same model. So this is one way to improve scalability, to have fewer models, and also to improve accuracy. Let's go to the next one. So now, the results. You will be saying: okay, these guys were telling me stuff about time series metrics and anomaly detection, but what do they actually have? Let's go to the next one. First of all, before digging into a complex case, let's focus on the simplest case we can handle, which is the univariate case: we have a single metric and we want to spot its anomalies in an unsupervised way. To do that, we used the dataset called the Yahoo! Synthetic and Real Time Series dataset, which was provided to us by the Yahoo! Webscope program. The results that we are going to show you correspond to the A1 benchmark, which consists of 67 different univariate time series of real production traffic. But here is the thing: these anomalies are labeled. So we train our model in an unsupervised way, we spot the anomalies, and then we are able to compare with reality. Using a labeled dataset allows us to compare and see whether our model is any good. These 67 different univariate time series have different seasonality patterns, different anomaly types, and different anomaly balances.
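Our own illustrative take on the grouping idea (the real system uses tsfeatures; here we hand-roll a single "seasonal strength" feature and bucket made-up metrics by it, with names and thresholds of our choosing):

```python
import numpy as np

def seasonal_strength(series, period):
    """Fraction of variance explained by the mean seasonal profile."""
    x = np.asarray(series, dtype=float)
    x = x[:len(x) // period * period].reshape(-1, period)
    profile = x.mean(axis=0)   # the average cycle
    residual = x - profile     # what the cycle fails to explain
    return 1.0 - residual.var() / x.var()

rng = np.random.default_rng(5)
t = np.arange(480)
seasonal_metric = np.sin(2 * np.pi * t / 24) + 0.2 * rng.normal(size=480)
flat_metric = rng.normal(size=480)

strengths = {"seasonal": seasonal_strength(seasonal_metric, 24),
             "flat": seasonal_strength(flat_metric, 24)}
# Metrics with similar feature values would share a forecasting model.
groups = {name: ("seasonal-model" if s > 0.5 else "flat-model")
          for name, s in strengths.items()}
print(strengths, groups)
```

tsfeatures computes many more features than this (trend strength, entropy, autocorrelations, etc.), but grouping by any subset of them follows the same pattern: compute a feature vector per metric, then cluster or bucket the vectors.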
For instance, there are metrics that have around 200 anomalies over 1,400 timestamps, and there are other metrics that have no anomalies at all. What are the combined results? For the ensemble model we obtain a recall of 0.92, so we are spotting almost all the labeled anomalies. The thing is, and this is the curse of anomaly detection via forecasting, the amount of false positives that we incur. In our case, precision drops from that 0.92 down to 0.82, which is reasonably okay, because we compared with other results in the literature, of which unfortunately there are not many, and we saw that other types of algorithms were obtaining something like 0.6-something. So this is a massive improvement compared to those results. For the lighter case, the light model, which is the LSTM-based forecaster, the results drop: a recall of 0.85 and a precision of 0.74. But this model is faster and lighter, so that is the tradeoff we have to face. Let's go to the next one. Now we have some remarkable examples, and let us be a little proud of them. This is not what is going on in all the metrics, but these are the ones that make you proud. These metrics correspond to the test 1.csv file from this dataset. There are only two real anomalies, and with our LSTM-based forecaster we spot only three anomalies. So we have a recall of one, which is impressive, a precision of 0.66, which is a bit of a pity, and an F-score of 0.8. What is going on? If you look at this plot, you see that the two true positives are kind of hidden.
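For reference, the point-wise metrics quoted throughout the results can be computed like this (our own helper; the timestamp indices below are illustrative, echoing the two-anomaly example from the talk):

```python
def precision_recall_f1(predicted, labeled):
    """Point-wise precision, recall and F1 for sets of anomaly indices.

    predicted / labeled: iterables of timestamps flagged as anomalous.
    """
    predicted, labeled = set(predicted), set(labeled)
    tp = len(predicted & labeled)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(labeled) if labeled else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 2 labeled anomalies, 3 predicted, 2 of them correct:
# recall 1.0, precision ~0.67, F1 0.8, matching the talk's numbers.
print(precision_recall_f1(predicted=[700, 1200, 1210], labeled=[1200, 1210]))
```

Benchmarks on time series often also use windowed variants of these metrics, where a prediction near a labeled anomaly counts as a hit, but the point-wise version above is the strictest reading.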
Someone looking at this kind of metric would say, okay, this high spike around the 1,200 timestamp, for sure that's an anomaly. And no, that's not an anomaly, and our model is also not detecting it as an anomaly. So that's kind of remarkable. The pity is that we detect this kind of spike around the 700 timestamp that is, unfortunately, a false positive. But for us, it's an impressive result. Next one. And now let's go to two better results. This is the ensemble model, which is the flagship of the anomaly detector. This case has 227 labeled anomalies and we found 249. The recall is impressive here, 0.98, so we are detecting, in an unsupervised way, almost all of the true anomalies in this dataset. The precision, as we were saying, drops because we are obtaining some false positives. Also, we cannot really blame Yahoo; we suspect there are some unlabeled points that genuinely resemble anomalies. But we cannot complain about that. So, okay, this is the univariate case. Let's jump to the next one, which is kind of more complicated: the multivariate case. Now we are using a more complex dataset that consists of 51 different sensors, and these sensors are just called sensor zero, sensor one, etc. They come from an open dataset called the pump sensor dataset. It consists of 51 different sensors monitoring a water pump in a small area far from a big town. And this is exactly the kind of situation where we hope to apply this model: there is a streaming data source, it is monitoring something important, the water pump of a town, and we are spotting anomalies in real time.
In this case, we have different labels, which correspond to system failures and recovery stages of the pump, where the pump is working abnormally. So our goal will be to predict in an unsupervised way the anomalous behavior of this water pump and then see how it's working. We are not going to show you the results for all of the 51 sensors. So these are the predictions of our model, and we see the anomalies, the predictions, and also the real data hidden behind them. What is going on here? Let's go back, Pablo. Here there is kind of a problem: we are spotting more false positives and more false negatives than in the other case, which was kind of amazing to show, but this is more like a real case. We are spotting anomalies that are not anomalies. For instance, there is a dot, it will not be crystal clear for you, but if you have the PDF, that one. That point looks like an anomaly across all of the metrics, but supposedly it is not an anomaly. We don't know. The thing is that here what we are predicting are zeros and ones, these zeros and ones are labeled, and we are going to see the results now. Later, in the root cause estimation, what we are going to do is try to infer the working stage of this water pump. So let's jump to the next one. Here we see a kind of drop in the evaluation metrics compared to the other case, but it's still reasonable. We obtained 0.89 in recall, so we are spotting almost 90% of the true positives, in a dataset that is gigantic compared to the other one, which had only 1,400 timestamps; this one had, I don't remember exactly, but a lot more. And we spot 90% of the anomalies in an unsupervised way.
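The detection step the talk relies on, flag a point when it deviates too far from the forecast, can be sketched with a naive seasonal forecaster standing in for the LSTM and ensemble models. The lag-one-period forecast, the MAD-based threshold, and k=4 are assumptions for illustration, not the speakers' actual settings:

```python
import math
import random

def detect_anomalies(series, period=24, k=4.0):
    """Flag indices whose deviation from a naive seasonal forecast
    (the value one period earlier) exceeds k robust standard deviations."""
    residuals = [series[i] - series[i - period] for i in range(period, len(series))]
    med = sorted(residuals)[len(residuals) // 2]
    mad = sorted(abs(r - med) for r in residuals)[len(residuals) // 2]
    scale = 1.4826 * mad or 1e-9  # MAD -> sigma under normality
    return [period + i for i, r in enumerate(residuals) if abs(r - med) > k * scale]

rng = random.Random(1)
series = [math.sin(2 * math.pi * i / 24) + rng.gauss(0, 0.05) for i in range(240)]
series[150] += 5.0  # inject an obvious point anomaly
flags = detect_anomalies(series)  # contains 150 (and its echo one period later)
```

The echo at index 174 shows exactly the false-positive problem discussed here: the naive forecaster compares against the anomalous value one period back, so a single spike can trigger two flags.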
The precision, as usual, drops because we are cursed with the false positives, but it's still reasonable, around 0.8. And for the lighter version, the LSTM forecaster with the stochastic dropout, we didn't show the plots, but the results are reasonable as well, around 0.7 in precision and recall. And the anomalies we are spotting, we spot in real time, in an unsupervised way, with no prior knowledge. And this is important: no prior knowledge of the real anomalies. We think this is remarkable. So let's jump to the next one. Now we have talked about detecting anomalies in real time. But the thing is, are we only interested in detecting the anomalies themselves? Are we perhaps more interested in detecting the cause of the anomaly, in order to act? So the question we have to ask ourselves is: can we give an estimation of the origin of the anomalous behavior detected? Let's jump to the next one. The importance of spotting anomalies is gigantic, and we can rely on that. But if we really want to derive actions from these anomalies, we need to give a description of the origin of the anomaly in order to ensure a fast response to remedy the anomalous behavior. Examples of this are latency-critical tasks, which we cannot neglect, like autonomous driving or augmented reality, where latency is critical and the experience would be enormously degraded if we cannot remedy these anomalies. There are other anomalies that are equally important, related to welfare and emergencies, such as the malfunctioning of critical elements like the electrical or water supplies, the example we were discussing before. Also security: estimating the cause hidden in the anomalous behavior matters in severe security incidents.
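The "stochastic dropout" mentioned for the lighter model usually refers to Monte-Carlo dropout: sample the network several times with dropout left active and treat the spread as an uncertainty band. A schematic version, with a toy stochastic function standing in for the real LSTM (the band width k=3 and the toy forecaster are assumptions):

```python
import random
import statistics

def mc_dropout_band(forecast_fn, x, n_samples=100, k=3.0):
    """Monte-Carlo (stochastic) dropout: sample the forecaster with dropout
    active n_samples times and turn the spread into a prediction band."""
    draws = [forecast_fn(x) for _ in range(n_samples)]
    mu = statistics.mean(draws)
    sigma = statistics.stdev(draws)
    return mu - k * sigma, mu + k * sigma

rng = random.Random(42)

def toy_forecaster(x):
    # stand-in for an LSTM with dropout left on at inference time
    return 2 * x + rng.gauss(0, 0.1)

lo, hi = mc_dropout_band(toy_forecaster, x=1.0)

def is_anomaly(observed):
    """An observation falling outside the band is flagged as anomalous."""
    return not (lo <= observed <= hi)
```

An observation near the forecast (e.g. 2.02) stays inside the band, while a far one (e.g. 3.5) is flagged; the band adapts automatically to how uncertain the network is.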
For instance, we are monitoring some server and we detect anomalous behavior. What is this anomalous behavior telling us? Is this something important? Payments or transportation are also cases where we can discuss this. So, next one. First of all, in order to give a meaning or an explanation to the metrics, we have to understand the metrics. Typically it's very hard to understand what is going on at the single-metric level: we have a single metric, there is an anomaly, let's face it, and that's it. But once we are dealing with multivariate time series metrics, we cannot stop there. We have to extract meaning from what is going on. So it is necessary to isolate which metrics are related; otherwise the model could lead to unexplainable situations, because the metrics are completely unrelated. We have to correlate metrics, and this is the crucial word that we will repeat in this part over and over: correlations. Let's go to the next one. The thing is that in unsupervised root cause estimation we have no ground truth about the origin of the anomalies. We just know that there's an anomaly, and that's it. So we must rely on correlations between the metrics that are anomalous simultaneously. We have some anomalous behavior, but this anomalous behavior does not have to appear in all the metrics, rather in a subset of the metrics. And the point is to be able to correlate these metrics that are anomalous at the same time, in order to extract knowledge from that. How do we do that? Let's go to the next one. We follow three different ways to give an explanation to these metrics. The first one is anomaly-based similarity. We have a time series metric, this metric has some values, and we have passed it through our anomaly detector.
Now this time series metric can be converted into a vector of zeros where there is no anomaly and ones where there is, or a vector of anomaly scores, whatever. The thing is that these series, now vectorized in this representation, can be clustered in order to put together the metrics that behave similarly in terms of anomalies. This would be the first way to correlate metrics. The other one, and we were thinking a lot about this and it kind of makes sense, it will not always work, but it makes sense, is text description similarity. What is this? Metrics have names. Maybe we have lazy names like sensor zero, sensor one, sensor two. But maybe we have real names like sales in Germany, revenue in Germany, whatever in Germany. So these metrics, which may seem uncorrelated because they are describing different things, share something that shows up in the name, something in common: Germany. I'm saying Germany, but it could be any other country. The point is that we can also correlate metrics in terms of their names and find correlations among them. If we go to the next one, the third one is normal behavior similarity. What this says is to find the metrics that have roughly the same shape in their normal behavior. An easy approximation would be a pattern-based representation dictionary. For instance, you have some pattern in one chunk of your series, and this pattern changes to another pattern in a different chunk of the series. We just fill a vector with these representations and find correlations between these metrics. Let's go to the next one. And this is for the unsupervised case.
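The first route, anomaly-based similarity, can be sketched as Jaccard similarity over the 0/1 anomaly-flag vectors followed by greedy grouping. The threshold and the sensor vectors below are invented for illustration:

```python
def jaccard(a, b):
    """Similarity of two binary anomaly-flag vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0

def group_metrics(flags, threshold=0.5):
    """Greedy grouping: a metric joins the first group whose representative
    it resembles, otherwise it starts a new group."""
    groups = []
    for name in flags:
        for g in groups:
            if jaccard(flags[name], flags[g[0]]) >= threshold:
                g.append(name)
                break
        else:
            groups.append([name])
    return groups

# each metric vectorised as its 0/1 anomaly flags over time
flags = {
    "sensor_00": [0, 1, 1, 0, 0, 1, 0],
    "sensor_01": [0, 1, 1, 0, 0, 0, 0],  # anomalous at the same timestamps
    "sensor_02": [1, 0, 0, 0, 1, 0, 0],  # unrelated anomaly pattern
}
groups = group_metrics(flags)  # [['sensor_00', 'sensor_01'], ['sensor_02']]
```

Metrics that end up in the same group are anomalous at the same times, so they become candidates for sharing a root cause, which is exactly the correlation idea described above.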
For semi-supervised or supervised root cause estimation, and this is the one that we are going to show you in these slides, because for the unsupervised case we didn't have data that would fit and be impressive, in the sense that our data has names like sensor zero zero, and we don't have any knowledge of the root cause of these anomalies. So if I tell you that the anomalies coming from sensor zero zero and sensor zero one are related, you will say, okay, whatever. That's why we are going to follow a supervised way. The thing is that if you have prior knowledge about past anomalies, you can just build a classifier. You take a data point, pass it through a classifier that assigns metrics to the different areas, and it gives you exact knowledge about the origin of the cause. There is also a hybrid approach, where we have exogenous variables, like holidays or whatever, and we can correlate the anomalous behavior with those exogenous variables. If we go to the next one, these are the results for the supervised case. For the unsupervised case, you are free to contact us and we can discuss plenty about it; we are also hoping to find a good dataset to show its real power. In this case, what we had in the water pump dataset is a gigantic historical database. So what we did was to take a chunk of it, and on this chunk we trained a classifier on the anomalous labels. These are the features of the LightGBM classifier. We jump to the next one.
In this concrete dataset, the anomalous behaviors, the ones in the labels, can be separated into two different anomalous behaviors: one that is called broken and one that is called recovering. So by doing that, for each anomaly that we detect we are able to say immediately whether it corresponds to a broken stage or to a recovering stage. And these are the feature importances of LightGBM in these two cases. So what are the results? And this is kind of blurry. The results are the following. For instance, remember the isolated anomaly that was seen in the plot, the one that was a false positive. Thanks to the classifier, the post-processing classifier, because this is a special case where we have supervised prior knowledge, we are able to discriminate false positives and to improve our precision. So indeed, that anomaly that was a false positive, once it goes through the classifier, becomes a regular point, and the anomaly flag will not be raised. On to the next one. In this case, we have a bunch of anomalies: the red band that we have in all the sensors corresponds, almost entirely, to anomalies. These anomalies are coincident in time for these sensors and others. Thanks to this classifier, what we do is just take the data points that are anomalous at coincident times and put them through the classifier, and surprise, what do we obtain? We obtain the anomaly in real time, and thanks to the classifier, we know that this anomaly corresponds to a broken stage, while the rest of the anomalies that we are detecting correspond to a recovering stage. So all the true positives that we are obtaining have a precise characterization, where the anomaly is correctly defined. Next one.
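A toy version of this post-processing step, with a nearest-centroid classifier standing in for the talk's LightGBM model and made-up two-feature sensor readings: anomalies classified NORMAL are dropped as false positives, and the rest are tagged BROKEN or RECOVERING:

```python
import math

def fit_centroids(X, y):
    """Learn one centroid per label from a historical labelled chunk."""
    by_label = {}
    for xi, yi in zip(X, y):
        by_label.setdefault(yi, []).append(xi)
    return {label: [sum(col) / len(pts) for col in zip(*pts)]
            for label, pts in by_label.items()}

def classify(centroids, point):
    """Assign a point the label of the nearest centroid."""
    return min(centroids, key=lambda lbl: math.dist(centroids[lbl], point))

# historical chunk: two-feature sensor readings with working-stage labels
X = [[0.0, 1.0], [0.1, 0.9], [5.0, 5.0], [5.2, 4.8], [2.5, 3.0], [2.4, 3.1]]
y = ["NORMAL", "NORMAL", "BROKEN", "BROKEN", "RECOVERING", "RECOVERING"]
model = fit_centroids(X, y)

# post-process detector output: flags classified NORMAL are dropped as false positives
detected = [[0.05, 0.95], [5.1, 4.9]]
verdicts = [classify(model, p) for p in detected]       # ['NORMAL', 'BROKEN']
confirmed = [p for p, v in zip(detected, verdicts) if v != "NORMAL"]
```

This mirrors both effects described in the slides: the isolated false positive is silenced, and the confirmed anomalies come back with a working-stage label attached.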
So, okay. Next steps, and this is something we want to focus on. We have a bunch of missing links that we have to fix, but this is the ambitious plan that we have. We want to build a real-time anomaly detector that can be used for all kinds of situations coming from IoT. And the thing that we have missed and have not talked about is real-time anomaly detection on images. The thing is that we are now able to detect anomalies in real time because these models can be put on the edge for inference: we train, for instance, the ensemble model, this gigantic model, on the cloud, put it on the edge, and run real-time inference about anomalies. And this is kind of gigantic. And we are also obtaining a root cause for these anomalies. But we are missing images; we cannot deal with images yet. The future step that we are hoping to follow is to use a convolutional neural network to extract features from video inputs and then pass them through an LSTM-based model, or a forecaster model, or some ensemble of models that are sensitive to time. There is a bunch of work to do, but this is our hope for the future. So let's jump to the conclusions. The conclusions are crucial. If I have an anomaly in the number of payments that my retail company is receiving, I can be in trouble if I'm not able to spot it in real time. So we have developed, and this is important, an automatic tool that is able to detect anomalies in real time, doing the whole thing in an automatic way: the pre-processing, the feature selection, choosing which features go into the same model for inference. We are able to split the metrics into different models in order to improve scalability and also performance.
In addition to that, our system can also obtain insights into the origin of the anomaly, because we are able to correlate the anomalies that are coincident in time and give an explanation. There is also the other case that I quite liked, which is having prior knowledge of the anomalies: then all the detected anomalies go through a classifier that gives the exact meaning of each one. But what are the future steps? The future steps are to look for a way to reduce the number of false positives, that is, to increase the precision. We are kind of happy with the recall of our model, but the precision is around 10% below the recall. So our main hope is to improve it and increase our precision, and to do that, what we have to do is try to obtain a more precise model of the normal behavior. Also, we have to move this anomaly detection system to the computer vision field in order to extend the portfolio of use cases of this tool. And finally, we are finishing, and I realize we may be out of time: if you're interested in this topic, you have to go to the AI and Analytics services in Telefonica, because we are doing this kind of stuff there, which we think is pretty cool. Maybe people will think it is not, but we are doing this interesting stuff. We are doing things with computer vision, we are doing things with IoT, we are doing things in kind of all the realms that you can think about. So I want to thank them. And we also want to thank Alejandro Narniz, who was one of the people helping us in the early stages of this work; without him, this would be impossible. So thank you all, and if you have any questions, they are more than welcome. Aitor, Pablo, thank you so much. That was a very enjoyable, very complete talk.
It was very interesting to hear the difference between anomalies and outliers, and then this very impressive tool that you've built. But you've been a bit naughty, I have to say: you've overrun by 25 minutes. Sorry for the extension. No, you're just lucky because you are the last one. I wasn't strict with you because we're running late in the other room, so we gave you a little bit of extra time. But it was worth it for all of that extra information that we got. So yeah, we have a few questions for you. We'll have to be brief because we are running very behind schedule. Questions from Miguel Angel: does the ensemble method use voting? Is the lightweight model available outside Telefonica? The ensemble is not using voting; the ensemble is minimizing the RMSE. And a question from Kenneth here: what is the computational cost of your present model in terms of memory requirements and training time? It's difficult to answer this because we are in the earliest steps of our model, and at the scale of our data it is hard to find a proper anomaly detection dataset. In the further steps we take with this tool, we are going to consider the memory and computational cost of the model, because right now we are running with small datasets due to the difficulty of finding proper ones. We are in the earliest stage of the development of the model. But what we can say is that the lighter version, the LSTM forecast model, is around 10 times faster in training. Thanks to you both. We are now very much running out of time. There are some comments here, some feedback, which is very congratulatory; people are very impressed with your model. So hopefully they will get in contact with you via the networking section on the platform and address specific questions to you there. So all that remains for me is to say thank you very much to both of you, and hopefully we'll see you soon. Thanks to you.
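On the voting question: an RMSE-minimising ensemble can be as simple as choosing combination weights on a validation window. A dependency-free sketch with two hypothetical base forecasters, not the speakers' actual implementation:

```python
def rmse(pred, actual):
    """Root-mean-square error between a forecast and the actual values."""
    return (sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)) ** 0.5

def best_weight(pred_a, pred_b, actual, steps=100):
    """Grid-search the convex weight w in [0, 1] minimising the RMSE of
    the blended forecast w*a + (1-w)*b on a validation window."""
    return min(
        (i / steps for i in range(steps + 1)),
        key=lambda w: rmse([w * a + (1 - w) * b
                            for a, b in zip(pred_a, pred_b)], actual),
    )

actual = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.2, 2.2, 3.2, 4.2]  # base forecaster biased high
pred_b = [0.8, 1.8, 2.8, 3.8]  # base forecaster biased low
w = best_weight(pred_a, pred_b, actual)  # 0.5 cancels the opposite biases
```

Unlike voting, the blend exploits how far each forecaster is wrong, not just which label it emits, which is the distinction the answer draws.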
It was a pleasure to talk to you. Thank you.