So welcome to the Nokia presentation. My name is Nicolas Dupuis, I come from Belgium, and I will talk about the usage of deep learning in the context of broadband network diagnosis.

I think you know Nokia mainly for this, which was a popular phone in the year 2000, with Snake and so on. Since then, we have evolved. We have actually rebuilt that same phone, and we are also active in smartphones. But we are mostly active in what you don't see: the infrastructure of the network. We build wireless equipment like antennas, a hot topic now with 5G arriving. We also build hardware and software for fixed networks, wireline, for optical fiber and for copper technologies. And we build devices for your home, like these Wi-Fi beacons. Maybe what you don't know about Nokia is the following: Nokia is also very active in AI and machine learning. I'm actually a tech lead for AI and machine learning innovation within Nokia Software, and I will mainly focus my presentation today on care analytics.

But what is it exactly? At home, you can have a situation like this, what we call your digital experience. You have plenty of devices: a tablet, a smartphone, Wi-Fi, a PC, a TV, and all these devices get connected to the internet. And sometimes it doesn't work so well. It can be slow, it can get stuck, you can have lags. That's what typically happens at home. To supply these devices, a preferred medium remains copper technology. It can also be optical fiber, but copper remains a preferred medium worldwide. I said it can be slow, it can get stuck, but you can also face this: it can simply not work. And I'm actually working against that. With deep learning, machine learning, artificial intelligence, data science, whatever you want to call it, I try to fight these painful aspects of your digital experience.

We mainly work with big telcos, the service providers. They are interested, first, in predicting that you will have such an issue: they would like a kind of predictor telling them, in advance, that your experience will not be so good, so they can act upfront to prevent it. The second aspect, and the one I will mainly cover today, is diagnosis: it's good to predict that something will go wrong, but why does it go wrong, and what needs to be fixed? And the last aspect is optimization: sometimes we know it will go wrong, and we can already take preventive actions to make it work better. But here I will mainly focus on the diagnosis aspect.

So, as I said, copper remains a preferred medium in the world. I will not give a lecture on copper technologies here, that's not the goal, but I need to introduce a bit of background anyway. In your copper twisted pair, you have a signal that travels, and that copper twisted pair can actually be modeled by a channel frequency response. That's a curve (remember, it's a curve that your modem measures), and it's actually a picture of the attenuation of the signal through your medium. If you have a line, the signal travels and gets attenuated over frequency, and that's the normal, expected attenuation for a healthy copper pair.
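As a rough illustration of what such a curve looks like (a hypothetical sketch, not from the talk: real DSL channel models use RLCG line parameters and ABCD matrices, and every constant below is made up):

```python
import numpy as np

# Hypothetical model: the attenuation of a healthy copper loop approximated by
# a skin-effect term that grows with sqrt(frequency) and with loop length.
def channel_response_db(freqs_hz, length_m, k=2e-5):
    """Attenuation in dB (negative values) of a plain, fault-free loop."""
    return -k * np.sqrt(freqs_hz) * length_m

freqs = np.linspace(1e5, 17.6e6, 512)             # a VDSL2-like band, 512 points
healthy = channel_response_db(freqs, length_m=400)
print(healthy[0], healthy[-1])                    # smooth, monotonically decreasing
```

A fault such as an oxidized contact or a bridged tap deforms this smooth baseline, which is exactly what the rest of the talk exploits.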
Now, if you start to have issues, if you have humidity (it can be humid, it's raining) and in the street you have interconnection points, you can start to get oxide on your connectors, and that curve gets affected. Here is an example of the curve for an oxidized connector. Now, if you have another problem, say another loop connected to your loop, a kind of neighbor wire that has been connected to your line, which is called a bridged tap, you see the curve is affected differently: you get peaks, dips, and repetitions of dips. So these are phenomena that can affect the channel frequency response, and in the end all of that reduces your speed. Your connection gets slow because of these issues. And in the worst case, imagine you have a lot of oxide on your connector, you can have no service at all.

So it looks feasible, from this curve, to try to recognize the problem. The shapes are not exactly the same, so we can potentially point to the right problem by processing this curve. However, we do not have a complete understanding of what needs to be tracked in the curve. We know that some parts get affected, we see the lower frequencies are more attenuated, but there are other patterns in this curve, like waves, and we don't know whether those patterns matter for recognizing the problem or not. Same for the bridged tap: we know there are peaks, repetitions of peaks, but there are other examples, like this one, where there is only one peak. Is it still the same problem or something else? And what about a more complicated example like this one, where the peaks are not regularly spaced? Is it still the same problem or not? And what about this, a shape which is even more complicated? It starts to be difficult, and even people who have worked as field technicians for years don't have a good understanding of this curve. Just by watching the curve, it can be hard to point to the right problem. And that is why we would like to apply artificial intelligence.

Deep learning: I think you are all aware of what it is, so just a quick refresher. It's popular, of course, in computer vision, in image recognition. And why is it popular? Because it has the ability to really identify patterns; we saw that in the previous presentation and in the morning keynote. It has the ability to recognize patterns and, in the end, to point to the right category. The advantage of these big convolutional neural networks is that there are two parts: a first part that tries to extract the relevant patterns (we don't know what really needs to be extracted, so it will learn that), and a second part that combines them to point to the right category, the right problem. It's typically applied in computer vision, but it can also be applied to our problem. It's similar; it's just that we don't recognize images, we try to recognize problems out of the curve that we measure.

So what we are actually doing is this: we retrieve that curve from your modem, and we have designed a deep learning model whose goal is to point to the right problem. So when you complain that it's slow, or that you have no service anymore, we will already have pointed to the right problem in advance, and the operator will know it in advance.
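A minimal sketch of that kind of model in Keras; the number of tones, the layer sizes, and the class list are illustrative assumptions, not the production network from the talk:

```python
import tensorflow as tf

NUM_TONES = 512    # assumed length of the sampled channel frequency response
NUM_CLASSES = 4    # e.g. healthy, oxidized contact, bridged tap, other (assumed)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_TONES, 1)),
    # Part 1: convolutional layers learn which patterns (dips, waves) matter.
    tf.keras.layers.Conv1D(16, kernel_size=9, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(32, kernel_size=9, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    # Part 2: dense layers combine the extracted patterns into a category.
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```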
There are of course a lot of techniques involved in the design of such a network: you need to know the layer aspects, the hyperparameter tuning aspects, there are plenty of data science aspects, what makes a good optimizer, et cetera, et cetera. But the key, and that's what I will illustrate today, is really to have good domain expertise in order to correctly design, tune, and train this neural network. That will be the backbone of this presentation: how can we really optimize the design of this neural network by taking advantage of good domain knowledge?

First, soft labeling. Here we do soft labeling, for example, and it's different from the handwritten digit recognition problem. Why is it different? In handwritten digit recognition, all digits belong to a class and have all been assigned to that class: all zeros are zeros, there is no ambiguity; a human expert has said these are all zeros, these are all ones, no ambiguity. In our case, you can have a degree of severity for the same problem, so within a class they are not all equal. All zeros are zeros, but all bridged taps, like this example with this piece of cable, are not all the same. When the curve is very, very flat and the peaks are very small, it's not so important to repair it, because there is no huge impact. So in the design and the training process of our deep learning model, we would like to take advantage of this knowledge, this notion of severity. It becomes a kind of classification problem, but with a notion of severity per category, and we solve that with soft labeling. Instead of a one-hot encoded vector, full of zeros with a single one, you have a distribution: a value between zero and one for the category of problem you have. If the problem is highly visible, it's close to one; if it's very small, minor, it's closer to zero. It's already a way to embed the notion of severity within a classification problem. And to do that, you need to know that the depth of the peaks is what matters; that's domain knowledge we have embedded here. If you go further, look at this latter curve: without domain knowledge, you see the slope is really steep, but is it more impacting or less impacting? Only a domain expert can tell you, and I can tell you that no, that macroscopic slope is not related to the impact, and this curve can still be considered low-impact even though the slope is quite steep.

Second, expert scalar features. There is a kind of paradigm in convolutional neural networks, in deep learning, that the network alone should compute all the patterns by itself. But we also sometimes have domain knowledge about what can be relevant on the curve. I just told you the macroscopic slope is not related to the severity; only the small peaks are. So if in the past we had that knowledge, and we had developed algorithms to track, say, the value of the slope or some characteristic that reflects what we would like to extract, why not add them to the neural network? In that case, we arrive at a kind of hybrid neural network: one part is learned automatically by the convolutional layers, and in addition we feed all these expert scalar features into the fully connected layers. You then benefit from automatically computed features plus the knowledge we added in the fully connected part: human features plus patterns recognized by the machine, fully learned by itself.
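A sketch of such a hybrid model trained with soft labels, using the Keras functional API; the input sizes, the feature names, and the severity encoding are assumptions for illustration, not the actual product code:

```python
import tensorflow as tf

NUM_TONES, NUM_EXPERT_FEATURES, NUM_CLASSES = 512, 8, 4   # illustrative sizes

# Branch 1: patterns learned automatically from the raw curve.
curve_in = tf.keras.layers.Input(shape=(NUM_TONES, 1), name="channel_response")
x = tf.keras.layers.Conv1D(16, 9, activation="relu")(curve_in)
x = tf.keras.layers.MaxPooling1D(4)(x)
x = tf.keras.layers.Conv1D(32, 9, activation="relu")(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)

# Branch 2: expert scalar features (e.g. dip depth, macroscopic slope)
# computed by pre-existing domain algorithms, fed to the dense layers.
expert_in = tf.keras.layers.Input(shape=(NUM_EXPERT_FEATURES,),
                                  name="expert_features")

merged = tf.keras.layers.Concatenate()([x, expert_in])
merged = tf.keras.layers.Dense(64, activation="relu")(merged)
out = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(merged)

model = tf.keras.Model([curve_in, expert_in], out)
# Soft labels: the target for the true class is a severity in (0, 1], e.g.
# [0, 0.3, 0, 0] for a minor bridged tap, so the cross-entropy loss also
# carries the notion of impact instead of a plain one-hot target.
model.compile(optimizer="adam", loss="categorical_crossentropy")
```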
And we have actually noticed that combining the two provides robustness, because if you let the neural network learn alone, it can sometimes learn quantities that are not so relevant, and it can rush into a category which is not right. If you want to avoid that a bit, it's good to have these scalar features backing up those mistakes, so it will not rush into the wrong category. That's also an important aspect to consider: don't always trust what has been learned by the neural network itself; you can still complement it with features that are known to be meaningful.

The training set. The training set is the key. We all discuss training sets, and in the keynote this morning it was also mentioned how important they are. We do machine learning, so we are teaching a machine something; you are a bit in the professor role. You need to select the right library, and in that library you have to put the right books, let's say, on the right topics. And if you would like to be complete, even with a massive library, you need a library containing many types of books: not only, say, literature or math, but also politics, many topics. The more complete you are, the better it will be for the neural network.

Now there is a kind of trade-off: are we doing data science or pure machine learning? There is a difference. In data science, you have a dataset that has been collected from somewhere and you would like to extract value from it. In machine learning, we would like to train a system, and that can be done with training sets that are not always built from real-world data. For instance, Boston Dynamics have not sent robots into the field by the millions; they built their own systems to train these algorithms to run or to work in the back office, purely in software. So be imaginative in your machine learning system. Try to build a good training set which is large, varied, and unbiased, and to do that, be imaginative. I saw yesterday some techniques about generative approaches and data augmentation; these are good ways to build very large, varied, unbiased datasets, which let you train the machine with good, complete examples, and that finally leads to something that works better. Because you can have, say, a 10-million-sample dataset from the field, but maybe 99% of it is the same cases; even with 10 million examples, it's not varied, it doesn't cover everything, and the spectrum of that dataset can in fact be small. So that's a good insight regarding the training set.
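As one hypothetical way of "being imaginative", here is a toy generator of synthetic bridged-tap curves; the propagation speed, the dip model, and every constant are simplified assumptions, not a validated line simulator:

```python
import numpy as np

rng = np.random.default_rng(0)
freqs = np.linspace(1e5, 17.6e6, 512)

def synthetic_bridged_tap(length_m, tap_m, depth_db):
    """Toy model: base skin-effect attenuation plus periodic dips. A bridged
    tap of length L notches the response roughly every v/(2*L) Hz."""
    base = -2e-5 * np.sqrt(freqs) * length_m
    v = 2e8                              # assumed propagation speed in copper
    spacing = v / (2.0 * tap_m)          # frequency spacing of the dips
    dips = -depth_db * 0.5 * (1 - np.cos(2 * np.pi * freqs / spacing))
    return base + dips

# Sample loop lengths, tap lengths, and severities widely, so the training
# set covers the whole spectrum of cases instead of 99% identical ones.
dataset = np.stack([
    synthetic_bridged_tap(rng.uniform(100, 800),   # loop length (m)
                          rng.uniform(3, 30),      # tap length (m)
                          rng.uniform(0.5, 10))    # dip depth (dB) ~ severity
    for _ in range(1000)
])
print(dataset.shape)   # (1000, 512) synthetic labeled examples
```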
Now, regarding the deep learning model and the deep learning pipeline: you are still in the professor role, you are still teaching a machine, and some intermediate steps can help. Learning intermediate steps can help in learning other, more complex ones. For instance, I think you all know several languages. Me, I speak French; if I would like to learn, say, German, it can be good to start by learning Dutch, which is a bit in between, and from Dutch it will be easier afterwards to learn German. A model that tries to directly learn the final output will usually be a big model, and it's difficult to control very big models. Instead, you can learn an intermediate step with maybe a smaller model, as in this picture: you learn this red output, but these outputs can then become the inputs, as scalar features for example, of another model. You end up with a deep learning pipeline, trained in two steps, and you again take advantage of some expertise in guiding the system: you define a strategy for the system to first learn something, knowing that it will help in learning the rest afterwards. You proceed step by step, like a human: learning a first language can help in learning a second one. It also forces the system to focus on the right aspects. With one big deep learning model, it's difficult to know where it will put its weight, and the model may end up too large, which is not so good. It can be better to have two small cascading models, for various reasons, but also because you can force it to learn a quantity that you judge meaningful.

So, to come back to my story: with what I mentioned regarding the training set, the scalar features, the soft labeling, the cascading deep learning models, that whole pipeline, out of a curve like this one, which is difficult for you and me to interpret, we can tell that your internet connection, your wire, is actually affected by the connection of two other wires: two neighbors, two pieces of cable, connected to your line. And that was correct. We can say there is one long and one short: a short one somewhere, and a long one maybe in the street. We can give the lengths precisely: the long one is almost 14 meters, which was again correct, and the other one is shorter, 7 meters. One is terminated by your old phone, for example; it can also be an alarm or whatever. And we can also predict, before touching the line, before the field technician goes into the street, that if he repairs that, if he removes that, you will win back some bandwidth: in this case, 162 megabits per second. So out of this curve, we can clearly give good insights to the field forces, to repair your line, restore your services, and avoid the screen I showed at the beginning.

In terms of DevOps, I think it's classical. We use Python for the pipeline design and the code surrounding the model, for the pre- and post-processing. We have developed a lot of in-house Python functions that are used for pre- or post-processing, but also as callbacks in Keras, for example. For instance, robust early stopping: Keras has early stopping, but there are pros and cons, so we developed another early stopping method that we judge more robust. It's a Python function that is called as a callback by Keras.
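The robust early stopping itself is in-house and not detailed in the talk; the sketch below is a hypothetical reconstruction of the idea: stop on the median of the last few validation losses, rather than reacting to a single noisy epoch:

```python
import numpy as np
import tensorflow as tf

class RobustEarlyStopping(tf.keras.callbacks.Callback):
    """Hypothetical sketch, not Nokia's implementation: stop training when the
    median validation loss over a sliding window stops improving."""
    def __init__(self, window=5, patience=10, monitor="val_loss"):
        super().__init__()
        self.window, self.patience, self.monitor = window, patience, monitor
        self.history, self.best, self.wait = [], np.inf, 0

    def on_epoch_end(self, epoch, logs=None):
        self.history.append(logs[self.monitor])
        if len(self.history) < self.window:
            return                       # not enough epochs to smooth yet
        smoothed = np.median(self.history[-self.window:])
        if smoothed < self.best:
            self.best, self.wait = smoothed, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.model.stop_training = True

# Used like the stock callback:
# model.fit(x, y, validation_data=(xv, yv), callbacks=[RobustEarlyStopping()])
```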
And in the end, we sell products, Nokia software products, that go into the ecosystems of the big telecommunication operators. So we need a high level of quality, and we deliver Python production-level code: well-formatted, let's say, following guidelines and rules, strict rules that give good-quality code.

Now, regarding TensorFlow and Keras: it's a convolutional neural network model. It uses, as I said, expert scalar features, and these features are coded directly using TensorFlow or Keras, so tf.something functions. It uses the specific callbacks that I mentioned, and we store the model in the Keras/TensorFlow format. We now also use TFRecords, because it helps to speed up the processing.

Apache Spark is also used, first for large dataset manipulation. We all know that deep learning models require large amounts of data, and even if we can be imaginative and generate millions of samples, after that you need to manipulate them; in our case, it's long sequences, by the millions. To distribute the processing over this data, which is stored in Parquet, we use Apache Spark. But we also use it for hyperparameter tuning. TensorFlow, let's say, is single-threaded (it's C++ coded), and there are methods such as Horovod, et cetera, to get a kind of distributed training. But what you can do instead is keep the classical training and, in parallel, train another model with other hyperparameters, and in parallel again another model with other parameters. We do that massively, so multiple models get trained in parallel, and that parallelization is done with Apache Spark in our case. So in the end, it's not really distributed learning or training; it's distributing the hyperparameter optimization. Training one model or training 100 models will take the same time, if of course you have the resources. We also use Apache Spark for execution: when your TensorFlow model is ready and you would like, as in our case, to run the prediction network-wide at an operator level (imagine a big operator here in India, running the prediction for the whole network in one shot), that means a lot of data, so we distribute that processing too.

That model has been trained on more than 20 million sequences, and with early stopping it stopped a bit before, which means between 10 and 20 million examples were required, and we did not see overfitting before that. So that's good. And we also do genetic hyperparameter tuning, also homemade: a genetic algorithm drives the search, so instead of exploring a raw grid, the exploration converges. That has also been developed in-house.
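A hypothetical sketch of that pattern: Spark parallelizes the search, not the training, and each task trains one complete candidate model. A random placeholder stands in for the real fit and for the genetic selection logic:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hp-search").getOrCreate()

def train_and_score(params):
    """Runs on one executor: build, train, and score a single candidate."""
    import random
    import tensorflow as tf              # assumed installed on every executor
    lr, filters = params
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(512, 1)),
        tf.keras.layers.Conv1D(filters, 9, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(4, activation="softmax"),
    ])
    model.compile(tf.keras.optimizers.Adam(lr), "categorical_crossentropy")
    # ... model.fit(...) on data read from Parquet / TFRecords ...
    val_loss = random.random()           # placeholder for the real score
    return params, val_loss

# One Spark partition per candidate: training 1 or 100 models takes the same
# wall time, given enough executors.
candidates = [(lr, f) for lr in (1e-2, 1e-3, 1e-4) for f in (8, 16, 32)]
results = (spark.sparkContext
                .parallelize(candidates, len(candidates))
                .map(train_and_score)
                .collect())
print(min(results, key=lambda r: r[1]))  # best hyperparameters found
```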
So basically, that's the solution we have. At the bottom is the network, your network. Above that, we are Nokia Software, with a cloud-native product; we actually deliver containerized applications within that product ecosystem. My model for this use case of diagnosing your broadband connection is a container that is integrated into a more complete, cloud-native product. It's for all technologies, in this case all copper technologies, and all manufacturers, so we are not limited to Nokia devices: including third-party devices, we can collect the same data and run the same analysis. Any gateway, this modem or that modem, it works, and it collects the data via standardized protocols.

Now, we deliver actionable insights. It means that in the end, it's good to identify the problem, but it's better to know what needs to be fixed; there is a kind of interpretation on top, such as: if you have this, you need to check the connection in the street. We do that network-wide, for the entire network, proactively: before you call to complain, the operators already have the result of this analysis. So if you call to say you have no service anymore, they already know it's because of that problem, and they can act very fast. It's non-invasive: no test is launched, we just retrieve the data normally measured by your modem. And it is of course automated, but it can also be done on demand: when the field forces are on site and start acting on your line, they can ask for a new test, we run the algorithm again, and they get the new view with the new answer. It also helps with business decisions. If we know that, say, 20% of your network is affected by a given type of problem, it can be worth training people to solve that problem, or removing that kind of systematic problem massively.

Now an overview, because we have made this journey over time, roughly a ten-year journey. Ten years ago, nobody talked about AI and few talked about machine learning, but we were already doing signal processing, data processing, data analysis, regression, clustering; all of that was already there. It was more driven by human expertise, though, in the sense that it was rule-based: thresholds or simple rules pointing to an insight. That was okay at the time, but I would rate the capability and the field performance at one star, level one. With the same manually extracted features, plus an additional layer of machine learning to combine them through a learning process, so a more optimal combination pointing to the right problem, it improves, it helps, but it's still not perfect. And then with deep learning, with the automatic computation of the quantities, the patterns on the curve that are really relevant and that really discriminate one problem from another, it improves again, and we arrive at high performance. Still, if you don't apply all the tricks I mentioned (a good training set, good scalar features, soft labeling, plus other aspects regarding validation), then even if there is an improvement, you don't benefit from the maximum capability of your deep learning system. If you add a piece of expertise at each step of the design, training, and validation process, then you can really leverage the full potential of your deep learning system, of artificial intelligence.

Now, what does it mean in practice? We usually have confusion matrices, accuracy, loss, et cetera, but this system goes into the field. This is a real picture, taken while the workers, the field forces, were using our system: there are problems that had been discovered, and advice given by our system on the actions to take. What does it mean? For this situation, most of the issues you can encounter were correctly detected and recognized, and according to the field forces themselves, 95% of everything reported was correct or helpful. That's very high, and it's important to have that feedback from the field.
We also had a very low false positive rate: when there is no problem, we clearly state that there is nothing, and we don't say "go there" if there is nothing. The false positive rate is important, because going out and doing work in the street costs money: driving there, doing the job. So a very low false positive rate was really expected, and again, all these soft labeling and related aspects helped to reach that target too. The field forces also get a notion of severity and impact, and that helps to prioritize. They know that over there the impact is high and it's really painful for the user, so they will maybe start their day with that one, while a lower-impact case can wait. So all these aspects of the deep learning system, plus all the tricks regarding soft labeling, training, et cetera, have a direct impact on the daily job of all these people, and that impact is very positive.

Now, for you and me, what does it mean? That was mainly for the operator. For you and me, it means that when you have an issue, if a system like this one is helping your operator troubleshoot your connection, the time to resolution is much shorter. Before, a technician might visit you, try something, go out into the street, come back and try to change something; it takes time, and maybe he has not fixed everything, so he has to come back a week or two later, and you are stuck in this painful process. Here, with the indication of the problem and also its impact, he has the ability to know, when he leaves your home, that there is no problem anymore, and he is very well guided through the process: he knows precisely what needs to be fixed. So it's really a benefit for the big telcos, but finally also for us. And in the end, the whole efficiency of the troubleshooting process is improved: when you phone to say you have no service, they already know what is going on, so they send the right team from the right department with the right tool to the right place. The whole chain gets improved, and for you and them, that's the benefit.

It has also been highly adopted by the staff. The telco staff, the field forces, their managers, up to the director, they are all now big supporters of this system. I will not read the sentence, but this is from a senior project manager at a European operator. They have high targets in terms of bandwidth, they need to deliver ultra-high bandwidth to their subscribers, and high targets in terms of time to resolution: they cannot leave you alone for some days with no service, or a slow connection, or whatever; that's unacceptable. So they tried our system, it worked very well for them, and they were very happy.

Finally, we are at a data science conference, so my last advice is for you, regarding AI. Deep learning is not only applicable to images and sound. I have not processed any image here, nor sound, nor text. It's a new problem, an electromagnetic signal; there was no pre-designed model for it, so I could not do transfer learning. But with a piece of domain knowledge and, of course, deep learning skills, we can build a system that also applies to this kind of problem, outside the well-known ones.
So don't be shy, let's say, about doing deep learning in areas that have not yet been covered by the existing state of the art. Also, machine learning and deep learning models that perform well in notebooks might not give the expected results in the field. As I said, for more or less the same accuracy in your notebook, you don't get the same field performance. For example, for a weak problem, the system may tell these people that the impact is extremely high and that they need to act there, while actually it was very weak; you invest a lot of money, you repair something, and finally it doesn't change much. It's a waste of time in the end, even if we recognized the problem well, let's say from an accuracy point of view. All of that should make you sensitive to the business. Stay connected to the business problem: understand well what problem you need to solve, remain connected to it, and don't be hypnotized, let's say, by loss or accuracy.

Increasing your domain knowledge will save you a lot of time. It's always good to know what is behind the data. If you receive a dataset and you don't understand anything about it, you will get less good outcomes than if you know what the data means. Here it was electromagnetic signals: if you have a communications engineering background and you know what they mean, you will do better data science than if you knew nothing about communications engineering. You receive the same dataset, you start processing it, and the outcome will never be the same. So please, it's good advice: try to improve your domain knowledge; in the end it will make you a better data scientist and you will save a lot of time.

Machine learning is not data science; it's what I said. You can explore data and find nice outcomes from it, but to train a machine, be imaginative. I've seen other talks about data augmentation, GANs, et cetera. Synthetic data can also help in building massive, various, and unbiased datasets, and that is very helpful for your deep learning model. And then, going from a deep learning model to a deep learning pipeline, with cascading models where the output of one model becomes the input of a second, that also helps; that can also be a good insight. So that's it from my side. If you have any questions, I'm open, and also afterwards. Thank you.

My name is Ranga. I have a background in networking, so I could definitely relate to a lot of this. The interesting thing is, when we do data center networking or campus networking, we think of networking data as structured: it has packet headers, there's information, and electromagnetic signals are structured too. And we think of deep learning as being used for unstructured data, like you said, for sound, for images, for text. So we have basically used traditional machine learning models and not gone to deep learning for these kinds of use cases. And again, what you called actionable insights, like the recommendations you give for certain use cases. So I just wanted your guidance: like you said in the last slide, use deep learning. Why would you do that? I saw that you improved your accuracy, but were there other factors you would recommend for going to deep learning for these kinds of use cases?

The motivation here was not actually to improve the accuracy; that was finally a good outcome in the end. The first consideration is that it's actually a signal processing problem. We have that curve, a bit like an image, except here it's only 1D, a kind of long sequence. And in signal processing, filtering usually means a filter that runs over your signal, and a filter that runs over a signal is a convolution. With multiple filters, you start to have a network of filters doing convolutions, and in the end, that is a convolutional neural network. So the motivation was: I would like to extract patterns on top of a signal, and for that capability, we privileged convolutional neural networks. It happened that, in the end, it also works better. So the motivation was to use deep learning, to use convolutional neural networks, for their ability to process the signal and extract patterns out of it.
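A toy numpy illustration of that point (all values made up): a small dip-shaped kernel convolved with a 1D curve responds most strongly where the curve contains matching dips, which is what each learned Conv1D filter does:

```python
import numpy as np

signal = np.zeros(64)
signal[20] = signal[40] = -3.0       # two dips, like bridged-tap notches
kernel = np.array([1.0, -2.0, 1.0])  # second-derivative-like dip detector

# "A filter that runs over the signal" is literally a convolution:
response = np.convolve(signal, kernel, mode="same")
print(np.argsort(response)[-2:])     # strongest responses at indices 20 and 40
```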
Yes? [The next question from the audience is partly inaudible; it asks how this classification approach relates to anomaly detection.] I will keep it simple on that. Anomaly detection is basically what you do when you don't have a clear labeling of the problem: you don't know what happened, but it's outside the normal situation, it's an anomaly. Here, we know it's outside the normal, but we also know that it was this problem, or that problem, or another. So we are actually more precise than anomaly detection alone. Saying "anomaly" on something that can be very complicated already gives some insight, but here we don't only say there is something wrong; we say what is wrong, we can recognize even the name of the problem, the type of the problem. And that requires a good labeling of the data if you would like to do that for training. Proactive maintenance is the application of that.

So we are running short of time, actually. Yeah, you can continue outside. But I would be happy to continue the discussion. Thank you.