Well, hello everybody. Before we start, I would like to ask you a question: did you know that only 18% of failures are age-related? Think about a simple example: we take our car in every so often for scheduled maintenance, and we assume that with that we are good to go and we are preventing failures. Well, in reality we are only addressing the 18% of failures that are age-related. What happens with the 82% that are left? And that was just a really simple example. What happens if we are talking about a production line where a machine breaks? The impact and the cost are not only that one machine being down, but the entire production line that gets affected. Today, together with my colleague, I would like to talk to you a bit more about that, and about how we can prevent the remaining 82% of failure patterns. As they mentioned, I'm Daniela Salis, and I'm really happy to be here to talk to you. And my name is Rodrigo Cabello, and it's a great pleasure for me to be here. I am the artificial intelligence team leader at Plain Concepts and also a Microsoft MVP in the artificial intelligence area. So let's start talking about predictive maintenance. How did it all start?
Back in the days before complex machinery, we would just build something simple, and if it didn't work we would throw it away and build something new. As we evolved as humans, the machines we used evolved with us, and it no longer made sense to throw a machine away when it broke and build a new one. So we started repairing things, but we would only repair a machine at the moment it was already broken. That was not convenient, because a machine usually breaks exactly when you need it the most. So then we tried to prevent that by scheduling preventive maintenance: every certain period of time we would look at our machines and check that everything is okay. But what's the problem with this? First of all, as I just mentioned, only 18% of failure patterns are age-related. Second, we are investing a lot of money and technician time inspecting machines that are not broken. And, as I mentioned before, really critical failures can still happen in between those time intervals. So it's not fixing our problem. This is where predictive maintenance comes into play. The main goal of predictive maintenance is to avoid this unplanned, last-minute reactive maintenance, and we also want to eliminate the cost of having technicians fixing machines that are not broken. And how are we going to do that? Thankfully, nowadays we have IoT devices, so we can have sensors on all the different parts of our machines, and we also have predictive models. When we mix both, we are able to achieve this. Okay, thank you, Daniela.
So today we have an example to show you an end-to-end workflow and the architecture we have prepared. First of all, we collect the historical data that we have, in order to get information from the turbine sensors. We also build an Azure Machine Learning pipeline, in order to go deeper into the different phases we have developed: reading the data, preprocessing, training, testing the quality of the model, and also registering a new version of the model when the data changes. We have developed the whole solution using Databricks in this case. Once we have finished this process, we can export our model, in this case a TensorFlow Lite model, and we can build an automated process using continuous integration and continuous deployment to deploy this model to the different IoT devices connected, for example, to the different turbines. We have developed two models: a predictive model that is making inferences all the time, and also a sensor model to simulate the sensors, because we don't have a real turbine here, so we have to simulate all the data we send to the predictive model. All the information is sent to the cloud; in this case we are using IoT Hub to collect the telemetry that the different devices are sending, and also Stream Analytics to preprocess it, add new business logic, and build different reports, in this case to show the different graphs and when it is necessary to schedule maintenance. So we are going to go deeper into the different phases we need to build.
We are building a predictive maintenance project, so my colleague Daniela is going to go deeper in the next section about data gathering. The first step is gathering the data. We have all these sensors on each part of our machinery, and we are able to measure temperature, humidity, pressure, speed, and many other things, so depending on the problem we can obtain data from those sensors and use it. And how are we going to obtain it? As I already mentioned, we have IoT devices, and IoT devices come in different models, different shapes, different sizes, but basically what all of them have is a circuit board with sensors and a Wi-Fi chip that allows them to send all that data to the cloud. Once we have the data, we need to preprocess it. We obtain different data from different parts of the machinery, and it comes in all sorts of formats. And because we are obtaining sequential data, sometimes the data is noisy, sometimes we are missing values, and sometimes we have very different ranges of values across the different sensors. So we have to preprocess it in order to gain insights from it. Some of the tasks we need to do are cleaning the data, fixing all the missing values, transforming the data, and making combinations that allow our model to gain more insight from the data: we analyze it and think, "oh, if we combine this and that, the model gets more information it can use." And at the end we have to select the features we are going to use, because especially in cases like this, where we are dealing with large amounts of data, we have tons and tons of variables and we cannot use them all. We have to really think about our problem and which features are the best ones to feed our model. So, I was telling you that the main goal is to predict when our machine is going to fail, but that is not all. Because what happens if they tell me the machine is going to fail in one hour?
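One of the feature-selection steps Daniela describes, dropping sensors that carry no signal, can be sketched in a few lines of plain Python; the sensor names and readings below are invented for illustration, not the talk's real dataset.

```python
# Hypothetical sketch: sensors whose readings never change carry no
# information for the model, so they can be dropped up front.

def drop_constant_sensors(readings, tol=1e-6):
    """Keep only sensors whose value range exceeds `tol`."""
    kept = {}
    for sensor, values in readings.items():
        if max(values) - min(values) > tol:
            kept[sensor] = values
    return kept

readings = {
    "sensor_1": [518.67, 518.67, 518.67],   # constant -> dropped
    "sensor_2": [641.82, 642.15, 642.35],   # varies   -> kept
    "sensor_3": [1589.7, 1591.8, 1588.4],   # varies   -> kept
}
filtered = drop_constant_sensors(readings)
print(sorted(filtered))  # ['sensor_2', 'sensor_3']
```

In practice this is exactly what the speakers do later with the sensors stuck at a constant 0 or 1, which they exclude from the model.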
Imagine they tell me, "your machine is going to fail in one hour." I would still be in the same place I was before. I really need to have enough time to change the outcome and fix it before it happens. For that, I need the model to look at the process and be able to tell me, "soon it will fail." And how much time do I need to be able to change the outcome? Again, it depends on the problem we are dealing with. In this case we are dealing with airplane turbines, and KLM and Air France mentioned that, for them, 30 cycles, that is, 30 flights, is the time in which they can actually change the outcome: they can schedule a proper maintenance and have everything working without it affecting them. So for this project we decided to use those 30 cycles. We obtain the data and we have to split it into these defined cycles that they use. Another thing we have to bear in mind is that we are dealing with sequences of data. If I tell you, "look at the sensors today: will the machine fail, yes or no?", you are not able to answer the question. You have to look backwards, see how the patterns of the sensors are evolving, and then you are able to reach a conclusion. The model works exactly the same. In this case, we are going to focus on a time window of 50 steps back: with every 50 steps back, the model is able to predict what is going to happen next. Now Rodrigo is going to show you how we did all the data preprocessing in Databricks.
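The window-and-label construction Daniela just described, a 50-step look-back window and a 30-cycle failure horizon, can be sketched in plain Python. The toy engine data below is invented; in the talk this is done on the real turbine dataset in Databricks.

```python
# Minimal sketch of the windowing and labeling described above.
WINDOW = 50       # look-back window (time steps), from the talk
HORIZON = 30      # label = 1 if the engine fails within 30 cycles

def make_samples(sensor_rows, failure_cycle):
    """sensor_rows[i] is the sensor vector at cycle i+1 for one engine."""
    samples = []
    for end in range(WINDOW, len(sensor_rows) + 1):
        window = sensor_rows[end - WINDOW:end]
        remaining_cycles = failure_cycle - end   # remaining useful life
        label = 1 if remaining_cycles <= HORIZON else 0
        samples.append((window, label))
    return samples

# Toy engine with 120 cycles of a single fake sensor, failing at cycle 120.
rows = [[float(i)] for i in range(1, 121)]
samples = make_samples(rows, failure_cycle=120)
print(len(samples))    # 71 windows (one per cycle from 50 to 120)
print(samples[0][1])   # 0: at cycle 50, failure is still 70 cycles away
print(samples[-1][1])  # 1: at cycle 120, failure is imminent
```

Each sample pairs a 50-step slice of sensor history with the yes/no answer to "will this engine fail within the next 30 cycles?", which is exactly the classification target the talk uses.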
Okay, we have a notebook here; we are using Microsoft Azure in this case, and Databricks, to do the predictive maintenance with the data we have. First of all, we collect all the data and upload it to a storage account, and to read and preprocess it we are using Databricks. If you take a look at the data, in this case there are 21 sensors with different information about the turbine, plus the cycle of the turbine, that is, the different flights, and also the ID that identifies the engine the turbine belongs to. Once we have read all the data, we have to preprocess it. The first step is to label our data; it's a binary classification problem in this case. We need to know whether the engine is going to fail within the 30th cycle, as Daniela mentioned before, so we need to tag the data in order to preprocess it, train on it, and learn from it, to predict whether it is going to fail or not within 30 cycles. After that, we have to normalize our data: we have different sensors with different ranges of values, so we need to normalize all the columns.
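The per-column normalization Rodrigo mentions can be sketched as a simple min-max scaling that brings every sensor into the [0, 1] range; the values below are invented, not from the real dataset.

```python
# Minimal sketch of per-column min-max normalization.
def min_max_normalize(column):
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: map everything to 0.0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

sensor_2 = [641.82, 642.15, 642.35, 642.02]
normalized = min_max_normalize(sensor_2)
print(normalized[0], normalized[2])   # 0.0 and 1.0 (the column min and max)
```

After this step, a pressure sensor reading in the thousands and a ratio near 1 live on the same scale, so no single sensor dominates the model's inputs.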
Okay. We can also visualize the information from the different sensors, in order to understand the ranges of values we have. As you can see here, we have different sensors with different ranges of values; for example, sensor 2 is very close, very similar, to sensor 3. We also have other kinds of sensors that have a constant value of 0 or 1, which we are not going to consider for our model's prediction. Also, as Daniela was telling you before, we need to generate sequences, in order to look back with the time window of 50 flights, learning from the past in order to predict the future. So with that we have prepared the different processes needed to train our model. Okay, the next step is machine learning. A normal machine learning workflow goes like this: we have already preprocessed the data, then we need to split it into train, validation, and test sets. The test set the model won't see until the end, while the train and validation sets allow us to train the model and see how it is doing. Then we choose our model and evaluate it, and once the model is working as we want, it is ready to be put into production, where it will receive new data and give us predictions. So how do we know when the model is good to go? Well, we need to choose the metrics very carefully.
Depending on the problem we are dealing with, we have to decide what we wish to optimize. In this case, of course, we wanted our model to be accurate, but we also wanted a model able to truly predict when a failure was happening in a turbine. At the same time, as I mentioned, we want to avoid the cases where we think something is going to fail and it is not failing. So we have to take that into account as well. The metrics of precision and recall help us with that, and then we combine them into the F1 score, which is their harmonic mean. That way we are able to optimize all of that and be sure that our model is working the way we want it to work. Another very important thing, when we decide to build a machine learning solution, is to choose a model that really fits our problem and our data. As I was mentioning before, if I show you only the sensor data of today, you are not able to know whether that machine is going to fail or not. So we really need to find a model that can fit sequential data, and, as I also mentioned before, we want to know with enough time in advance to change the outcome. So we need to be able to look back over a long time and provide early warnings: our model needs to be able to identify long-range dependencies. Some of the models that fit these characteristics are time series models and deep learning models such as LSTM neural networks. In our case we chose deep neural networks because they are a perfect fit for IoT: IoT brings a lot of data, and deep neural networks need that data to understand what is going on, so they are a perfect match.
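The precision, recall, and F1 combination Daniela describes can be computed by hand on a handful of invented predictions: precision penalizes false alarms (sending a technician to a healthy turbine), recall penalizes missed failures, and F1 is their harmonic mean.

```python
# Sketch of the metrics discussed above, on invented labels.
def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 0, 0, 1, 0]   # 1 = turbine actually failing
y_pred = [1, 0, 0, 1, 1, 0]   # model's predictions
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.667 0.667
```

Optimizing F1 rather than plain accuracy matters here because failures are rare: a model that always predicts "healthy" can be very accurate while catching no failures at all.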
So, what is a long short-term memory network? It is a type of neural network that is capable of modeling sequences, because it has a sort of memory mechanism: it can recall previous steps and has a context that allows it to predict something. We need this context; we as humans also need context, but not all information is equally valuable, some information is more useful than the rest. So how can we help our network extract the information it really needs and discard the information it doesn't? Well, an LSTM has three gate mechanisms inside it. It has an input gate, through which new information is able to go in; it has a forget gate, which is able to remove information that is not relevant; and then it has the output gate, which is when you actually use the memory to predict something. Now, when we approach a problem using supervised machine learning, we have two approaches: a classification problem, "is this happening, yes or no?", or a regression problem, "when is it going to happen?", for example, "it is going to happen in 30 cycles." In this case we decided to approach it as a classification problem, and because we want to know exactly whether it is going to fail within the 30th cycle or not, we tagged the data, as Rodrigo showed you, answering the question: will this machine fail within the 30th cycle, yes or no? Okay, so let me show you the training part in Databricks. Once we know that we are going to use an LSTM, the problem we have is that we need to export our model to an IoT device.
Okay, so in this case We have to use a TensorFlow lighting in order to run faster or model making inference faster in the different IOT device that we have And if you have developed using TensorFlow framework in the case of deep learning You can use the Keras API in order to to build your deep learning solution And also you can obtain a model What happened with this model that the model is not optimized it to run in different device so in this case we have built and and LSTM from the scratch because and if we are using there some flow We cannot Support all the multiple operators that we have in the graph So maybe we can to export or graph and optimize and remove some operators in order to export to a different IOT device So we have we have we are using the experimental API of TensorFlow the compact experimental API in order to build To use the LSTM cell in order to Stack the different Recurrent neural network cells and then to compact in order to put into together we are using two layers with an 100 number of units per each layer in order to solve this problem Also, we are using Because it's a binary classifier where you see in binary core of centropy and also add an optimizer in order to to train our models Once we have finished this part Okay, we are saving them the Keras model and also we are going to after we are going to finish the the training And I'm going to sport the TensorFlow little graph the graph session Okay, so it's very important to take into account in order to Export a model in a format that this line going to run faster The the size of this model is only a 100 kilobyte in this case and we are making inference each inference Taking 700 milliseconds of years perfect for example in this case for a real tiny scenario in order to make the different inference that we have And also we are tracking all the metrics that we are doing in training and the test Faces with Azure machine learning so we can plot the different graphs and order them In the 
With Azure Machine Learning we can plot the graphs of the accuracy and the loss function, and we can also test our model or upload a new version of it. For example, here we are running different experiments, and we can track the duration of each experiment, the accuracy on training, and the accuracy on testing. And if we go into a run, we can see the different values for the training accuracy, the validation accuracy, and all the other metrics we have. And this is the most important part: we can register a new version of our model, tagging it with the different metrics, so we can track the versioning of the model and see, for example, whether it has good performance or not. We can also look back and roll back if, say, we put another model into production and it loses performance. So in this part we can see the different versions of our model and the metrics we used, in order to decide whether to deploy it to the different devices or not, depending on those metrics. Okay, so now it's my turn. Well, we have these really nice devices that allow us to gather information, and we are able to predict machine failure, but we want to be able to make decisions in real time. Unfortunately, there are still a lot of challenges when it comes to IoT. The time it takes for the IoT device to take a measurement, send it to the cloud, predict something, and then act on it is a problem: that is not real-time decision-making. Also, we are using these IoT devices in all sorts of places, and sometimes these places lose connectivity. So what happens if we have to act but we depend on a continuous Wi-Fi connection? It's not possible. So how can we solve this? Thankfully, there is a solution: we can use IoT Edge. What does IoT Edge really mean?
It means we are able to make the decision at the point where the device is connected to the network, the edge, where the actual device is taking the measurements. So we are able to extend all this complex processing and all the predictions of the model and, instead of having to do it all the way up in the cloud, do it right there at that precise point. Then, if we lose connectivity, it doesn't matter: the model keeps on working, it keeps on predicting, it keeps doing everything by itself. As I was mentioning in the beginning, there are all sorts of IoT devices, but we chose to use a Raspberry Pi and a Coral Dev Board, because they have an ARM architecture: they are super flexible, you can connect them to all sorts of sensors, and they can really adapt to whatever problem you are trying to solve. Okay, so, as I mentioned before, we have developed two different models: the model that is making the inferences all the time, and also sending the telemetry to the cloud, that is, sending all the data we have collected, along with the prediction of whether it is going to fail or not, to IoT Hub. IoT Hub collects all the information from all the different devices, and then we preprocess and analyze it. The first model we developed simulates the sensor data; it sends that information to the other model, which makes the inference, and then we collect the results and send them to the cloud. With IoT Edge we can build our own Docker image for each of the different models, using our favorite programming language: we can do it in Python, C#, Node.js, whatever you want.
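The per-model packaging Rodrigo describes, one Docker image per module, could look roughly like the config fragment below. The base image, file names, and the `tflite-runtime` dependency are assumptions for illustration, not the speakers' exact setup.

```dockerfile
# Hypothetical Dockerfile for the Python inference module described above.
FROM python:3.7-slim

WORKDIR /app

# e.g. azure-iot-device plus tflite-runtime, a small interpreter-only
# dependency suited to the edge device
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# The exported TensorFlow Lite model (~100 KB) ships inside the image.
COPY model.tflite inference.py ./

CMD ["python", "inference.py"]
```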
Then we can build the Docker image, upload it, and track the different metrics and the different versions of the image. IoT Edge will detect that there is a new version of the Docker image, that is, a new version of my model, pull the image, and update everything without losing connectivity or performance, while the device keeps sending telemetry all the time. The final part is the model deployment, and as I said before, we have to use TensorFlow Lite for this. Let me show you in Databricks what we are testing. It's very important to test specifically the model inference that we are doing with TensorFlow Lite, because when we optimize the model we might lose accuracy: the model is not exactly the same. We train it with the TensorFlow framework, but we use TensorFlow Lite in order to make inference faster.
So it's very important to take into account that we have to read our data and test it against the final model we are going to deploy. In this case we have even generated a confusion matrix, showing whether it predicts failure or not, and the mismatches in the classification. The most relevant cells here are the off-diagonal ones, which are the misclassifications. And it's important to distinguish the cases. If the real data says the turbine is not failing but my model says it is failing, we are going to schedule a maintenance, send a person out, and they will say, "okay, it's actually working fine." But the most critical case is the other way around: my model says it is not failing, and in real life the turbine is failing. That false negative is the most critical part to consider here, and it is where we have to improve the accuracy of the classification.
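The confusion-matrix reading Rodrigo walks through can be made concrete with a small sketch (the labels are invented): a false positive sends a technician to a healthy turbine, while a false negative, the critical cell, misses a real failure.

```python
# 1 = "will fail within 30 cycles", 0 = "healthy". Invented labels.
def confusion_matrix(y_true, y_pred):
    cells = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            cells["tp"] += 1    # failure caught in time
        elif t == 0 and p == 1:
            cells["fp"] += 1    # needless maintenance visit
        elif t == 1 and p == 0:
            cells["fn"] += 1    # missed failure -- the critical cell
        else:
            cells["tn"] += 1    # correctly left alone
    return cells

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
cells = confusion_matrix(y_true, y_pred)
print(cells)  # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
```

Because the two error types have such different costs here, one simply wastes a technician's visit while the other risks an in-service failure, it makes sense to weight recall on the failure class heavily when tuning the model, as the talk's metric discussion suggests.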
Once we have exported the model, we can inspect it. I don't know if you know Netron: with Netron you can visualize deep learning models in different formats. In this case we are visualizing the inside of my TensorFlow Lite model. We have an input with the different sequences of the time window that we send to the model; here we have the two LSTM layers I mentioned before, with 100 units per layer; and at the end we have a logistic output for the binary classification, telling us whether it is failing or not. So it's a good tool for visualizing a deep learning model and the different layers it has. Once we have deployed our model, we save it, we have compressed it to run on the different devices, and we have a new version registered in the model repository in Azure Machine Learning as a TensorFlow Lite model. So, can I automate all these steps, so that the devices are updated every time there is new data or I have improved the quality of my model? Yes, we can. For this we are using continuous integration and continuous deployment, for example to build a new Docker image every time the data changes.
For example, every time a new version of my model is registered, the pipeline is triggered; it can upload my model, for example, to an Azure Container Registry with the new version of the Docker image, and then, using continuous deployment, deploy it automatically to all the different devices connected to the cloud. So once we have a new version of the model, we can deploy it automatically in just a couple of seconds. It's very important to take into account that we need this kind of automation: as we produce new versions and improve our model, we need to keep all the devices up to date. And our final piece of the puzzle: we need to have the analytics and the business intelligence. We built this model, but it is only valuable if we are able to act in critical situations, so we need to provide a user interface where we can see, in real time, the processing of the model, the predictions, all the sensors, and everything that is happening. So once we have the device connected to the cloud, in this case IoT Hub, and we have all the analytics, what we did is create a job that continuously feeds all the Stream Analytics data into Power BI. That way we are able to see all the sensors and all the information in a nice, organized way that provides us with the insight we need in order to act in critical situations. And with this we have more or less explained the whole architecture. Just to wrap it up: we have historical data from an airplane turbine; we are able to preprocess the data, train a model with it, and then deploy that model.
We want to deploy that model on the IoT device, so for that we need to use TensorFlow Lite. Once the model is running, it is predicting, but at the same time it is sending all the information to the cloud, where we can actually see what is going on and act on it. And now we would like to show you the device working in real time. In this part we are showing the different telemetry from the different sensors, in this case only three sensors, and the yellow line is the prediction; when there is no yellow line, it is not making a prediction. When it has a value of three, it is raising an alarm: it is necessary to schedule a maintenance. When it has a value of two, it is not necessary to schedule a maintenance. We are grouping all the information into the different sequences so this demo runs faster; keep in mind that in the real case we would collect the information over the different flights, while here we are grouping by the different cycles, in order to show you what a person who needs to schedule a maintenance would be seeing, and to check whether everything is working and whether it is necessary to schedule a maintenance or not. And that's all we wanted to say about the different parts: we have the end-to-end cycle and we have shown you the different scenarios we have developed. I don't know if you have any questions. Hello. Thank you, first of all, it was really interesting. I'd like to ask: have you tried this approach for solving regression tasks, like predicting on which day the turbine will fail?
Yeah, yeah, of course. As Daniela mentioned before, in this case we only look at the 30th cycle, but because we have information about the exact number of cycles until failure, we can convert this problem into a regression problem in order to predict exactly the cycle in which the turbine is going to fail. It was interesting to follow the approach of the airline companies, as Daniela mentioned before, KLM, looking only at the 30-cycle horizon to solve the problem, but we can perfectly convert this into a regression problem to predict the number of remaining cycles. Yeah, it looks like a regression problem, but it's really more about the data: in this case you have the exact number, so you are able to teach the model to predict that number. Of course, if we only had data that says "it failed" or "it didn't fail," then we wouldn't be able to do it. So we essentially transformed data that would naturally support a regression problem into a classification one. Basically it all depends on the data: if you have the data, you are able to do regression. It also depends on what you want to optimize, but in this case, with the data we have, you can do it. Okay, thank you.