It was built on the groups that were already doing research and small proofs of concept at Telefónica, using Telefónica's own data to see if we could improve our operations, and on a company that we bought called Synergic Partners; together they form the LUCA unit. In the LUCA unit we still do what those research groups did, but also what Synergic did as consulting, with a focus on building products and a portfolio of big data units and products at Telefónica. What we want to show you today is one of those cases: a bit of research that we did on analytics, but with an eye on building a product, a use case for the LUCA portfolio. So Alberto, if you...

Okay, thank you Álvaro. As Álvaro was saying, this presentation covers one of the things we are doing in research and production for one of the products within the LUCA data unit. The presentation is divided into four main topics. In the first topic we are going to show you why building explainability together with an analytic solution is important from the Telefónica perspective. In the second section we are going to show you the use case itself and briefly introduce where we are going to develop and deploy this model: an analytic solution for anomaly detection over the calls received in a call center. These are the only two parts of the presentation that are business oriented, because in the third part, the biggest one, we are going to go in depth from the technical perspective into the solution itself and how it was developed. In the last part, the "where" section, we will briefly introduce the product itself, LUCA Comms, from a global perspective, both from the solution side and from the software architecture side, so that you know not only how our solution is built but also where it is deployed within the product.

Well, why? Responsible AI at Telefónica. For Telefónica it has been important for a long time to develop software products, and specifically AI products, that are responsible towards society. This can be condensed into the five principles that Telefónica has published. In the chart here you can see a survey presented by Harvard some months ago, which gives a global view of the AI principles published by different organizations, from academia, the private sector and the public sector. Telefónica is in this small section here. Telefónica presented its principles at the end of last year, beginning of 2019, and we have also published the principles themselves at the AAAI Fall Symposium a few days ago, so there is a scientific publication explaining in depth what the Telefónica principles on responsible AI are. It will appear in the final proceedings when they are published, but for the time being you can look for the title on arXiv and read the paper. Let me briefly introduce them, because it is important for the following parts of the presentation: the principles can be divided into four groups, plus a fifth one that simply extends them to any business relationship and work we do with third parties, because not only must we be compliant with them, any software product or business partner that works with us should be compliant with them as well.
The first principle is fairness. For us it is important that algorithms do not treat unfairly groups that are labeled with sensitive attributes; we do not want to discriminate by gender, religion or race. It is something we take into account, and we audit algorithms to be able to comply with that. Everything related to privacy and security by design in AI products is also very important for us. This is not strictly new at Telefónica, because we have a long legacy of work on security: we have ElevenPaths and many organizations within the company that are experts in that area, and many of the aspects condensed in this principle for software products apply directly to AI products as well. There are some specificities that are also considered, like adversarial attacks and related security issues that are specific to machine learning and artificial intelligence, but in general everything is inherited from the good practices the company developed before. The same goes for human-centric AI: we develop products that are good from a sustainability point of view and that contribute to well-being and the common good, and that should also apply to artificial intelligence. But the principle that matters most for this presentation is transparency and explainable AI.

What do we want to achieve with that? We imagine a future, one we are trying to build right now, in which machine learning and artificial intelligence models explain the decisions they reach. We do not want to work with black boxes; we do not want outputs that are not justified and explained in a way that a customer can understand. Imagine we were in 2001: A Space Odyssey: even if the system goes wrong and misbehaves, we want it to be able to explain the decisions and the reasons behind them. If it is going to try to kill the crew, it should at least explain why, because then we can act accordingly and maybe avoid the catastrophic damage that came later in the movie. Of course LUCA Comms is not so delicate, and a misbehaving algorithm is not as dangerous as in 2001, but even so this is something that is important for us and that we take into account from the beginning.

Now, introducing the use case: anomaly detection in a call center. LUCA Comms is a large product that targets different audiences and scenarios, but the one we are going to show you now is framed within customer communications, also known as the call center. Imagine a call center with many services, and many lines associated with those services, which receives calls every second, every minute, every hour. If we group those calls into specific time frames, we get the total number of calls received by a service of the call center in a given time range. That is the information we are going to use to explain to the final user of the application when the total number of received calls in the call center is anomalous, and also why. The type of data we are dealing with is strictly a time series.
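As an illustration of that aggregation step, here is a minimal sketch; the column names and the hourly grouping are assumptions for the example, not the product's actual schema:

```python
import pandas as pd

# Hypothetical raw call records: one row per received call.
calls = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-08-03 10:05", "2019-08-03 10:40", "2019-08-03 11:15",
        "2019-08-05 10:20", "2019-08-05 10:55",
    ]),
    "service": ["support", "support", "support", "support", "support"],
    "line": [1, 2, 1, 3, 1],
})

# Aggregate into the univariate time series the talk describes:
# total number of calls per service and per time range (here, per hour).
counts = (
    calls.set_index("timestamp")
         .groupby("service")
         .resample("1H")
         .size()
         .rename("n_calls")
         .reset_index()
)

# Categorical variables derived from the time index (weekday, hour, month).
counts["weekday"] = counts["timestamp"].dt.dayofweek
counts["hour"] = counts["timestamp"].dt.hour
counts["month"] = counts["timestamp"].dt.month
print(counts)
```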
At the end of the day it is a univariate time series: we have the total number of received calls together with different categorical variables that come from the time index, such as the weekday, the month, or the time range itself.

So, moving to the main part of the presentation: how do we build our anomaly detection model, and how do we incorporate explainability into it? The first thing to consider is: why not use a simple model? We are talking about anomaly detection with more or less complex machine learning models, but if we want to detect anomalies and obtain explainability at the same time, we could use vanilla models that are considered white boxes, that work reasonably well and that yield results anyone can interpret. With the Gaussian-distribution anomaly detection technique, as you can see in the image, we consider as normal, or non-anomalous, all the data points that fit within the Gaussian bell, and the ones out in the tails are labeled as anomalies. It is a simple and widely used approach, and the good part is that it is easy to interpret and naturally produces visual results that anyone can understand. The thing is that there are two main caveats to consider here. First, it does not work for many types of data distributions: it is a parametric technique that assumes the data distribution is Gaussian, so it does not yield good results when that assumption does not hold. Second, it does not account for more complex patterns and relationships; at the end of the day it labels a data point as anomalous only because it is too big or too small compared to the rest of the distribution. What happens with more complex relationships? What happens in a multivariate case? Of course we can work with multivariate Gaussian distributions, but then the interpretation becomes much more difficult.
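As a reference point, a minimal sketch of that Gaussian baseline; the three-standard-deviation cutoff and the synthetic data are assumptions for the example, not product settings:

```python
import numpy as np

def gaussian_anomalies(x, z_threshold=3.0):
    """Label points as anomalous when they fall in the tails of a fitted Gaussian."""
    mu, sigma = x.mean(), x.std()
    z = np.abs(x - mu) / sigma          # distance from the mean, in standard deviations
    return z > z_threshold              # True = anomalous

rng = np.random.default_rng(0)
n_calls = rng.normal(loc=200, scale=20, size=500)   # synthetic hourly call counts
n_calls[:3] = [350, 40, 320]                        # inject a few obvious outliers
print(np.where(gaussian_anomalies(n_calls))[0])     # indices flagged as anomalous
```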
So what we would like is a model that has the advantages of being non-linear and non-parametric, that can detect anomalies in any data distribution and capture complex patterns, while at the same time not sacrificing the good parts of the vanilla model: a model that explains itself and that is visually intuitive. So let's build something like this. We start, after a lot of preprocessing, with a data frame containing the day, the time, the line, the service and the number of calls. Almost all of the columns are categorical variables; the only continuous variable is the number of calls. We want to do two things at the same time: detect anomalies and obtain explanations that fit into the explainable-AI taxonomy shown on the left of the slide. If you review the current scientific publications and the latest trends, this is the kind of taxonomy that can be applied to our model in order to characterize it.

What do we want? We want a model with local explanations. The user does not need to know how the whole model behaves; they only need to know why a specific data point has been labeled as anomalous. If we receive 50 calls on a Saturday in August, we want to know whether that number of calls is anomalous; we do not need to explain how the whole model behaves. So we want local explanations. The model we build is also model-specific: we are going to use the information from inside the model itself in order to build the explanations. We are not going to treat it as an oracle that we query for the information we want, a black box we do not want to unfold; we are going to use all the information available in it. The explanations we obtain should be human-friendly, presented in a visual way that the user can easily understand, and they should also be counterfactual. In anomaly detection it is very important to explain not only why a data point has been labeled as anomalous, or which features contributed to that classification, but also what should have happened for that data point to have been labeled as non-anomalous. It is not enough to say "you received 300 calls on a Saturday in August and that is anomalous"; we also want to say what should have happened for that number to be considered normal, for example "if you had received 100 calls, the number would have been normal". That is the kind of information we want to include in our technique.

So we are going to use a one-class SVM, an anomaly detection model that detects anomalies in an unsupervised way. Just to introduce it, we are going to explain in one minute how a supervised SVM works, and in one more minute the unsupervised one-class SVM; it is quite difficult, but we are going to try. A supervised SVM is a widely known algorithm for binary classification or regression in which we have two categories and we build a linear frontier to classify the data points into them. It does not matter whether it is a soft margin or a hard margin: we have a linear frontier. We can improve the model and use non-linear decision frontiers by mapping the data points into a higher-dimensional space where a linear hyperplane can separate points that could not be separated linearly in their original space. The key is that the equation that yields the decision frontier does not need the explicit coordinates in that higher-dimensional space; it only needs dot products. So we can define a kernel function, evaluate it with respect to some landmarks, use it to compute those dot products, and then infer the decision frontier in the original space. Well, one minute, more or less. On the left you have the example of a radial basis function kernel, which is also the one we use in LUCA Comms, and thanks to this kernel trick the computational cost is much lower.
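As a small illustration of that kernel trick, a sketch with the RBF kernel; the gamma value and the toy points are assumptions for the example. The kernel value equals the dot product of the two points mapped into the higher-dimensional feature space, without ever computing that mapping explicitly:

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2).

    This equals the dot product <phi(x), phi(z)> in a high-dimensional feature
    space, which is why the decision frontier can be non-linear in the original
    space while the math only ever needs kernel evaluations.
    """
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Two toy comparisons: nearby points give a kernel value near 1, distant points near 0.
print(rbf_kernel([1.0, 0.0], [1.1, 0.1]))   # ~0.99
print(rbf_kernel([1.0, 0.0], [5.0, 4.0]))   # ~0.0
```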
What happens with a one-class SVM? We do not have categories, we do not have labels; we only have time series data and no prior knowledge about which data points are anomalous and which are not. What a one-class SVM does is identify anomalous data points without that prior knowledge.

Consider a feature space with a cloud of data points. The intuition is that the model tries to separate the dense clouds, with many data points, from the sparse ones, and to do that it builds a decision frontier that tries to be as far away from the origin of coordinates as possible while at the same time separating as many data points as possible into the two groups. Balancing those two things in a trade-off, it defines the decision function that identifies which data points lie in the sparse regions and are anomalous, and which ones are not. This method can be applied to any kind of data, categorical or numerical, continuous or not, as long as, like for any SVM model, the features are scaled so that no feature is considered more important than the others.
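A minimal sketch of that step with scikit-learn's OneClassSVM (the implementation the talk refers to on the Python side); the feature encoding, the toy data and the hyperparameter values here are assumptions for the example, not the tuned ones used in the product:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Toy data: weekday (categorical, encoded 0-6) and number of received calls.
weekday = rng.integers(0, 7, size=500)
n_calls = rng.normal(loc=150 + 20 * (weekday < 5), scale=15, size=500)
X = np.column_stack([weekday, n_calls])

# Feature scaling so that no variable dominates the kernel distances.
X_scaled = StandardScaler().fit_transform(X)

# One-class SVM with an RBF kernel; nu and gamma are the hyperparameters to tune.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(X_scaled)

labels = ocsvm.predict(X_scaled)               # +1 = non-anomalous, -1 = anomalous
distances = ocsvm.decision_function(X_scaled)  # signed distance to the decision frontier
print("anomalies found:", np.sum(labels == -1))
```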
One thing we should also take into account, and that we did take into account when developing our solution, is that even though we are working with an unsupervised machine learning model, we still need to fine-tune its hyperparameters with a grid search. This matters for a real product, because we need the best model we can get. Before introducing the algorithm we use in our solution, let me briefly present the two hyperparameters that matter here: for a one-class SVM with an RBF kernel they are ν (nu), which effectively bounds the rejection rate, and the kernel parameter γ (gamma). Intuitively, the rejection behavior works like this: pushed too far in one direction, it yields a decision frontier that overfits the non-anomalous data points and flags too few anomalies; pushed too far in the other direction, it yields an underfitted model whose frontier does not represent the distribution of the non-anomalous points well enough and flags too many data points as anomalous. The overfitting case is represented in image A, where the decision frontier is fitted too tightly to the data points, and the underfitting case in image C, where the frontier does not represent the data distribution inside it well enough. What we need is the trade-off between the two, represented in image B, and something quite similar happens when we increase or decrease the other hyperparameter.

So how does the tuning algorithm we adopted work? We chose it mainly because it is efficient, it yielded good results for us, and it is easy to implement and to understand. That matters: after reviewing the research on hyperparameter tuning for one-class SVM anomaly detection, many of the proposed solutions are too complex to implement in a real product, and this one is simple enough.

There are two things to consider. First, the inner points should be as far away from the decision frontier as possible, to avoid situation A, the overfitting. At the same time, the exterior points should be as close to the decision frontier as possible, so that we do not drift towards scenario C; if we only took the inner points into account we would always end up in scenario C, and we want to avoid that. Putting those two criteria into an objective function, we try to maximize the distance to the frontier for the inner data points while minimizing it for the exterior data points. We compute the value of that objective function for each combination of hyperparameter bins in a grid search, like the example shown, and we choose the combination that yields the best, that is the largest, value. This gives us the best hyperparameters for our one-class SVM model, and after that we develop the explainability part while fitting the model on the data.
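A sketch of that kind of grid search, using the signed distance to the frontier that scikit-learn exposes; this is a simplified stand-in for the tuning algorithm just described, with an assumed grid and toy data, not the exact objective used in the product:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

def tuning_objective(model, X):
    """Reward inner points far from the frontier, penalize exterior points far from it."""
    d = model.decision_function(X)          # > 0 inside the frontier, < 0 outside
    inner, outer = d[d >= 0], d[d < 0]
    inner_term = inner.mean() if inner.size else 0.0
    outer_term = np.abs(outer).mean() if outer.size else 0.0
    return inner_term - outer_term          # larger is better

rng = np.random.default_rng(1)
X = np.column_stack([rng.integers(0, 7, 800),
                     rng.normal(150, 15, 800)])
X = StandardScaler().fit_transform(X)

best = None
for nu in (0.01, 0.05, 0.1, 0.2):           # assumed grid, for illustration only
    for gamma in (0.1, 0.5, 1.0, 2.0):
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X)
        score = tuning_objective(model, X)
        if best is None or score > best[0]:
            best = (score, nu, gamma)

print("best objective %.3f with nu=%.2f, gamma=%.2f" % best)
```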
Let's imagine a situation with only two variables. We have the continuous variable we were mentioning, the total number of received calls, and we have the weekday, encoded for example as 0, 1 and so on for Monday, Tuesday. That is the data on one side, and the output of the algorithm is on the other: the data points in orange are anomalous and the ones in blue are not. The approach we take in our solution is to use the value of the decision frontier itself for the explanations. If we compute the value the decision frontier takes for each combination of the categorical variables, we can use that value as a threshold to explain why, on a Tuesday, a data point was labeled as anomalous. And since we also have the distance to the decision frontier, we can tell the user by how much the number of received calls should have changed for a point not to be considered anomalous, or the other way around.

So the complete pipeline is the one you see in these images. First we have our data points and we fit them into the model; we fit a radial basis function kernel model, because it is the one that works with the tuning algorithm, and for us this kernel is good enough. After feature scaling and the usual preprocessing, we obtain a decision frontier like the one on the left. Then we want to obtain, as we said before, the thresholds: the values of the decision frontier for each combination of categorical variables. There is a detail to take into account here. Most one-class SVM implementations, whether developed in Python using scikit-learn or in Scala using libsvm, yield two results: they tell you which data points are anomalous and which are not, and they give you the distance of each point to the decision frontier. But that distance does not directly correspond to the frontier value for a specific combination of categorical variables, so we should not use it directly, because it could be misleading. What we do in LUCA Comms is extrapolate that value, taking into account the distance and also the values of the data points themselves. Consider the scenario for weekday one: we have non-anomalous data points and anomalous data points above the threshold, and what we want to obtain is the value of the decision frontier in the middle, the one drawn in green. To estimate it, we take the non-anomalous and anomalous data points that are closest to the decision frontier, compute an intermediate value between them, and scale it with the distance yielded by the unsupervised model. In that way we can estimate the frontier value, and we do it for every combination of categorical variables over the continuous variable. For the cases where there are no anomalous data points, like scenario two, and for the lower thresholds, we apply an offset derived from a business rule that we validate with our clients.

We then obtain the thresholds for each combination of categorical variables, and since in this case we are only working with the weekday as the categorical variable and the total received calls as the numerical variable, we apply the same thresholds to each weekday; that is why you can see the same pattern repeating period after period. For new data points we can either refit the model with all the data points or directly use these thresholds to classify the new points as anomalous or non-anomalous, and at the same time we have the information about why a data point is anomalous and what should have happened for that to change. The general pipeline is the one shown in the image: we decide whether to apply a parametric or non-parametric anomaly detection technique to the data points; in the non-parametric case, which is the one in this presentation, we preprocess the data, we do the grid search using the tuning algorithm we showed before, we fit the model, we obtain the limits that apply to each combination of categorical variables, we persist them, and we apply them to identify anomalies in a given period, either on new data or, after refitting, on the data that was used to fit the model.
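A sketch of how per-category thresholds and the counterfactual message could be derived from the fitted frontier. This is a simplified stand-in, a one-dimensional scan over call counts per weekday rather than the estimation with nearest points and business-rule offsets described above, with assumed names and toy data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
weekday = rng.integers(0, 7, 1000)
n_calls = rng.normal(150 + 20 * (weekday < 5), 15)
X = np.column_stack([weekday, n_calls])

scaler = StandardScaler().fit(X)
model = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(scaler.transform(X))

def upper_threshold(day, grid=np.arange(0, 501)):
    """Largest call count still labeled non-anomalous for a given weekday."""
    pts = scaler.transform(np.column_stack([np.full(grid.shape, day), grid]))
    inside = model.predict(pts) == 1
    return grid[inside].max() if inside.any() else None

def explain(day, calls):
    """Anomaly verdict plus a counterfactual: what value would have been normal."""
    limit = upper_threshold(day)
    if limit is not None and calls > limit:
        return (f"{calls} calls on weekday {day} is anomalous; "
                f"up to about {limit} calls would have been considered normal.")
    return f"{calls} calls on weekday {day} is within the usual range."

print(explain(5, 300))   # e.g. a Saturday with an unusually high volume
print(explain(5, 140))
```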
For the next part, Álvaro is going to show you where this technique sits within LUCA Comms and give you a brief introduction to the product itself.

Great, thank you. Well, as Alberto said, we could have used a simpler model, like the Gaussian one based on the normal distribution, and we did not, not only because obviously there is data that does not follow that normal distribution, but also because we wanted the challenge: we wanted to find something that worked across every client, on every client's data, and we also wanted the challenge of using it in a real product. I am going to show you a brief demo of what the product is, and then we will look at how it works from the architectural point of view.

The product I am talking about is LUCA Comms. What we wanted with it was to go back to the core of Telefónica, which is communications, and use the data that clients generate when they work with us: analyze it, tinker with it, play with it a bit, and then extract insights that we return to them, so they can use them to improve their company and their behavior. Here you see one of the first modules, which Alberto briefly mentioned. This is the one where we analyze a company's internal organization and how its employees use their communications. Using the average consumption of the organization or of its departments, you can drill down and see the behavior and the consumption of single lines within each department. This is all descriptive information; it is fairly simple to produce if you have the data, and you can filter here and see all the layers: you can filter by type of traffic, national and so on, you can see the behavior when a company has several sites, you can see where they are calling. All of that is descriptive information that we wanted to improve by adding what we call advanced analytics. Here you see the algorithm Alberto described, the anomaly detection. One of the interesting and important things is that we developed the algorithm thinking of call centers and consumer calls, but here we are applying it to other use cases as well: we use it to analyze the behavior of a department within a company. It is very easy to see what Alberto talked about in this image: you tell the client that the expected behavior is for the consumption to stay within the blue band, and if it is outside that band, then it is an anomaly. We also have other advanced analytics algorithms, such as clustering the lines of a company based on their behavior and understanding how they behave compared to the rest of the company.

And in the call center module, which was the original idea, you again see the descriptive part, where we analyze the main KPIs that matter for this type of company: the number of calls I kept talking about, and also the number of concurrent calls, which is the main KPI these companies use to size the teams that receive those calls. That is again the descriptive side, and here you see the same advanced analytics with the anomaly detection. Again, the point was to be able to explain, very quickly and very easily, the complex algorithm we were using. And here is another one, which is the next step of what we wanted to do: we started with the descriptive part, then we added prediction as the next step, and following that, in future releases, we will add a prescriptive part, so that we show you what is happening, we show you what will happen, and we show you what you should do in each case.
Going back to the technical part a bit: how does it work from an architectural point of view? LUCA Comms is a product that was built on the cloud from the beginning; it was built on Amazon Web Services from the start and we are now moving towards multi-cloud support. This is the architecture diagram, and it is a very simple approach. As we said at the beginning, we use Telefónica's own data, which we receive daily, and we analyze it in what is currently a very small ephemeral cluster on the cloud. This is where the algorithm Alberto described executes: it runs a monthly process of training and grid search, and daily it executes the prediction part. All those results are stored in a MariaDB database cluster and served on the web page you just saw. We launched the product a few months ago, so right now we only have a couple of clients, and the ephemeral cluster that executes the algorithm is very small, currently just a single node, but it has the potential to scale vertically and horizontally as needed.

So that was it for LUCA Comms and the algorithm. I am not sure if you have any questions; we have six minutes for questions. I am not sure we have microphones; I think we have time for maybe one question, and if you want, after that we can meet outside. Sure, I can bring you the microphone.

Okay, thank you very much for the presentation. I was wondering: when you talk about transparency and explaining what the algorithm does, you explained how you defined the outlier, but you did not explain why the outlier appears. Did I get that correctly?

Yes. There are a few things to consider regarding your question, among them what kind of user is going to receive the explanation; in this case it is the final user. We want to explain only why a data point is considered anomalous with respect to the rest of the data points for that client. That client has a behavior and a pattern of evolution in their data, and we want to explain why that data point is an anomaly in comparison with the rest of the data points. We are not saying anything about causality here; it is just that component.

Okay, so thank you very much. We will be outside for half an hour or so, so if you have any more questions we will be outside, ready to answer them. Thank you very much. Thank you very much.