Welcome to our talk. I'm Daniel, this is Felix, and we are workmates from Deutsche Bahn, Germany's largest railway operator. In particular we work at DB Systel, which is the bunch of computer scientists, software engineers, and data scientists at Deutsche Bahn, and we are working to improve the quality of Deutsche Bahn's services. Our customers are, for example, those that run these fancy high-speed trains, but also station operators, which operate things like escalators, and we also have energy-generating facilities. All this equipment tends to fail at some point; that's clear. At some point it gets repaired; that's also clear. But we try to improve the MTTR, the mean time to repair, by providing software assistance to service personnel.

For about two years now we have been working on a universal platform to provide condition monitoring as a prerequisite for predictive maintenance. We want to build it in a universal way, so we listen in on machinery: we stick sensors like these microphones into machines in order to detect their state and alert personnel in case something seems about to break.

The challenges here are probably familiar to most of you, as is often the case in data science: we want models that generalize. Also, we are not in a research facility but in industry, so our customers expect improvements pretty early on, even in situations with little data and maybe poorly annotated data. We chose a machine learning approach, and we tackle these challenges using transfer learning.

So, in a nutshell, the condition monitoring goals: we want to decrease maintenance costs, optimize personnel placement (service personnel cannot be everywhere all the time), and increase availability, since some of our customers have to guarantee a certain availability of machinery to allow operations. The transfer learning goals, in our sense, are to increase prediction accuracy and to allow a quick start with a new customer.

Let's have a look at the system architecture we have established over time for the service delivery. As I pointed out, we put plenty of these microphones into the field, so we have sensor boxes built on Raspberry Pis, plus a more robust variant based on embedded industrial boards.
We use the Eclipse Foundation software stack, which is also used in other IoT scenarios, Eclipse Kura for example. With it we sample the audio emissions from the machinery, compress them, and transfer them to the cloud backend, where we run plenty of Linux on Amazon Web Services. We manage our fleet using Chef, among other tools, look into the data using Grafana, the TICK stack, and the ELK stack, and we make extensive use of Python, Java, Ruby, and MQTT messaging. On the data analysis pipeline side, we do predictions in the field, also within the Kura framework, and we train our models in the backend on the cloud side using TensorFlow and Keras, plus a couple more technologies which are probably familiar to you.

Let's get into the particulars. For two years now we have been recording data from escalators; Deutsche Bahn operates about a thousand of them in Germany. Failures in such escalators tend to become very costly very quickly if you do not stop the machinery in time, since they wear down fast. It is also important for maintenance personnel to schedule maintenance on these escalators in a night shift, so as not to conflict with the contractual liabilities around accessibility.

I can show you a couple of typical issues with these escalators. Mostly it is about foreign bodies that somehow get into the machine: coins, glass, gravel, screws that fall off, pieces of luggage, and so on. They then wear down the machine, and we want to detect that early on.

Here I brought some sound examples from an escalator in Hamburg. The top plot is the waveform, and we transform the waveform into spectrograms, so we consider both the frequency and the time domain in order to detect the machine's health state. This is a good case here; maybe focus on the lower part of the spectrum. Now I'm switching to a squeaky recording: here the steps are not properly adjusted, they make a very intense noise, and the ball bearings suffer and tend to fail after some time if no maintenance personnel is about to fix that. You can see that there is plenty of power in these recordings across many different frequencies.
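To make the waveform-to-spectrogram step a bit more concrete, here is a minimal sketch of how such a time-frequency representation can be computed with librosa. The file name, sample rate, and mel parameters are illustrative assumptions, not the exact settings of our pipeline.

```python
# Sketch: turning an audio recording into a log-mel spectrogram
# (parameters are illustrative, not our production settings).
import librosa
import numpy as np

# Hypothetical escalator recording; any mono WAV file works here.
waveform, sample_rate = librosa.load("escalator_hamburg.wav", sr=22050, mono=True)

# Mel spectrogram: signal power over time and frequency, mel-scaled axis.
mel = librosa.feature.melspectrogram(
    y=waveform,
    sr=sample_rate,
    n_fft=2048,        # window length of the underlying STFT
    hop_length=512,    # step between consecutive frames
    n_mels=128,        # number of mel frequency bands
)

# Log scaling compresses the dynamic range, as in the plots we showed.
log_mel = librosa.power_to_db(mel, ref=np.max)
print(log_mel.shape)  # (n_mels, n_frames)
```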
You can also see a certain periodicity, since the machine is running at constant speed and the steps pass the microphone at a constant rate. Basically, as we saw in the previous talk, convolutional neural networks are very good at detecting patterns in pictures, and we use such spectrograms as input data to our detection schemes. Felix will now present these detection schemes to you.

Thanks, Daniel, and hi from my side as well. I want to give you an overview of what we have been doing so far. As Daniel already mentioned, we use convolutional neural networks and feed those spectrograms in as input images, and we then infer what state the machine is in at any given moment in time. Of course there is the problem that the sound the machine makes right now is not necessarily the sound it makes all the time, so afterwards we need some kind of post-processing to smooth out the oscillation in our predictions.

As Daniel mentioned, with a new use case we might have really little data. So how can we use transfer learning to reduce the time and cost of data labeling and of acquiring new audio? The approach we used before was this: we had an escalator sound data set where we could go to an escalator and basically put it into any state we wanted. We threw in gravel, we hit every part of the machine with hammers, and so on. With this data set we trained our models and got a condition monitoring classifier. One problem remains, though: say we find out there is a new failure state and we don't have data for it yet, maybe only a few samples. That is also a use case we want to use transfer learning for.

Here is an overview of how we use transfer learning. If you look at transfer learning in computer vision, there are quite often huge pre-trained networks available that have been trained on many millions of images. That is not the case in audio. There is one huge data set, called AudioSet, with about two million snippets from YouTube, but the research there is just getting started; there are a few good results already. So we decided to transfer knowledge from computer vision models instead. In the early layers, these models tend to learn generic features such as edges and borders, so we cut off the model, as depicted here, feed our spectrograms through, and take the activations, reusing what has already been learned in the vision domain. And we decided not to refit new neural network layers on top, but instead to use more traditional approaches, like support vector machines or random forest classifiers, to fit new models on those activations.

Back to the escalator case: what does our experiment look like? We took a huge model pre-trained on ImageNet and applied basically the approach I just described; a minimal sketch of the idea follows below.
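To make the activations-plus-classical-classifier idea concrete, here is a minimal sketch using Keras and scikit-learn. The choice of InceptionV3, the average pooling, the input shape, and the placeholder data and labels are assumptions for illustration, not our exact setup.

```python
# Sketch: reuse an ImageNet-pretrained CNN as a fixed feature extractor
# and fit a classical classifier on its activations.
import numpy as np
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
from sklearn.ensemble import RandomForestClassifier

# Headless network: include_top=False drops the ImageNet classifier,
# pooling="avg" yields one fixed-length vector per input image.
feature_extractor = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(spectrogram_images):
    """spectrogram_images: float array (n, 299, 299, 3), e.g. log-mel
    patches replicated across three channels to match the RGB input."""
    x = preprocess_input(spectrogram_images.copy())
    return feature_extractor.predict(x)

# Placeholder data: spectrogram patches rendered as images, with labels
# such as 0 = healthy, 1 = squeaky steps, 2 = foreign body.
X_train_img = np.random.rand(32, 299, 299, 3).astype("float32")
y_train = np.random.randint(0, 3, size=32)

features = extract_features(X_train_img)
clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(features, y_train)
```

Because the network stays frozen, only the forest is trained, which is fast and, as mentioned, tends to overfit less on small data sets.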
We wanted to compare different architectures, so we took the InceptionV3 model, VGG16, and a couple of others, fed our data set through them, and stored the activations to train random forests. To have a comparison, we also included our previous approach of training the neural networks from scratch.

Let's see how that went. On the x-axis you see the available audio training data in minutes: we kept increasing the training data, fit new models with the different approaches, and checked what accuracy each could achieve on our test set; a hedged sketch of this kind of learning-curve comparison follows at the end of this part. Of course, 18 minutes of audio is really little, and we would probably not recommend to our product owner to deploy such a model. But we can definitely see that there is a certain range in which the transfer learning approaches, in blue and orange here, reach a higher accuracy than would be possible when training a network from scratch.

We should read this slide not so much quantitatively, but a bit more qualitatively: to get these insights we trained a huge number of models, and it took a really long time, so we could not do hyperparameter tuning for all of them. If in doubt, it could always be the case that an optimized model from the training-from-scratch approach would reach a higher accuracy. What we do say is: if we are inside the box with the dotted lines, at least looking into the transfer learning approach is probably a good idea, and if we have less audio than that, say to the left of the red line, it would be advisable to consider deploying this approach.

So what do we conclude from that? From a customer perspective, if we have little data, maybe only half an hour, that could already be enough to give the customer a reasonably good model and satisfy their need to get a result really quickly. From a business perspective, we don't need as much expert time to produce labeled data; with a little bit of help from the service technicians we can already get quite OK models. Technically, on small data sets we definitely get improved accuracy compared to before. And what is also good about this approach: if we only take the activations and train a different model on top, we can choose a model that does not tend to overfit as much.

As for limitations: this is one use case, and it worked really well here. A different use case, with maybe different kinds of audio noise from the machines, might not suit it as well. But it is definitely a good thing to have in our toolbox, and the next time we have a new use case we can think about deploying such a model.

So where does that leave us? As next steps we would like to deploy this, see how it compares to the models we have so far, get more insights into how this approach can help us, and maybe improve on it. Another thing that would be really nice: as I mentioned, we currently use models that were trained on image data, and we would like to utilize a huge data set built entirely on machine sounds to do transfer learning in the future. That is a topic we want to research, and we are working on building up this data, but at the moment we are unfortunately not there yet; it is definitely on our agenda for 2019 and beyond.
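Coming back to the accuracy-versus-training-minutes comparison: here is a hedged sketch of how such a learning curve can be traced, assuming precomputed activations. The feature matrix, the labels, and the mapping from minutes of audio to sample counts are all placeholders.

```python
# Sketch: accuracy vs. amount of labeled training audio, simulating
# smaller labeling budgets by subsampling the training set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder features/labels; in our setting these would be CNN
# activations (see the previous sketch) and machine-state labels.
features = np.random.rand(1000, 2048)
labels = np.random.randint(0, 3, size=1000)

def accuracy_for_budget(n_samples, seed=0):
    """Test accuracy when only n_samples labeled examples are used."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, train_size=0.8, random_state=seed, stratify=labels)
    rng = np.random.RandomState(seed)
    idx = rng.choice(len(X_train), size=n_samples, replace=False)
    clf = RandomForestClassifier(n_estimators=300, random_state=seed)
    clf.fit(X_train[idx], y_train[idx])
    return accuracy_score(y_test, clf.predict(X_test))

# Illustrative mapping from minutes of audio to labeled patch counts.
for minutes, n_samples in [(18, 100), (36, 200), (72, 400)]:
    print(f"{minutes:>3} min -> accuracy {accuracy_for_budget(n_samples):.3f}")
```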
We would also like to see how suitable this approach is for deployment on our edge devices; that is something we are, I would say, close to testing out. And that's it from our side. Thanks for your attention, and feel free to ask questions.

To repeat the question: why did we use convolutional neural networks? You mentioned that audio data is sequential, so why not use recurrent neural networks? There has been work on that in acoustic scene classification and audio event detection, and what they found is that you usually need less data to get quite good results if you use CNNs on time-frequency representations such as the spectrograms we use. There have been end-to-end approaches as well, but I think you need more data for those; CNNs are a common choice in this setting.

To repeat the question: have we tried anything with Kalman filters, and have we applied GANs to augment our data? GANs are something we would like to try out; in audio classification they have been gaining popularity, but we haven't done it yet. Kalman filters I haven't thought about in this context yet. By the way, we are looking for interns in Frankfurt, so if you're interested in GANs, just drop us a note.

To repeat the question: are we thinking about open-sourcing our data set? We make extensive use of open source software, as you have seen in the talk. Currently our organization is paving the way to contribute back, because there are many hurdles, organizational and jurisdictional, but we are looking forward to that, maybe next year.

To repeat the question: how much time does it take for our model to make a prediction? We have deployed this kind of model on Raspberry Pis, and it can run in near real time, I would say. There is a short-term component: we work with small recordings of several seconds, let's say ten seconds. How to cut the spectrograms into patches is a hyperparameter as well, and you can then average over those patches. Depending on the scenario you will have more or less noisy predictions, so you can aggregate over them and apply some statistics, maybe windowing over what is coming in; a small sketch of this kind of aggregation follows below. Of course it also depends on what kind of machine condition you are trying to predict: for some it might be important to act really quickly, while for others you have something more long-term, where you can look at the average predictions over the last couple of hours. I hope that answers your question.
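To illustrate the aggregation step just mentioned, here is a minimal sketch that smooths noisy per-patch predictions with a rolling majority vote. The window size and the example label stream are made up for illustration; in practice one might also average class probabilities instead.

```python
# Sketch: smoothing noisy per-patch predictions over time with a
# rolling majority vote (window size is an illustrative choice).
from collections import Counter, deque

def rolling_majority(predictions, window=5):
    """Yield the most common class label within a sliding window."""
    recent = deque(maxlen=window)
    for label in predictions:
        recent.append(label)
        yield Counter(recent).most_common(1)[0][0]

# Hypothetical stream of per-patch classifier outputs:
# 0 = healthy, 1 = squeaky steps. A single noisy spike gets voted away.
raw = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1]
smoothed = list(rolling_majority(raw, window=5))
print(smoothed)
```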
The question was whether we fine-tune those models when doing transfer learning, or whether we just take them as is and look at the activations. In the experiment we showed in this talk, we just took the activations as is. We have also tried some experiments where we do fine-tuning, but especially with that little data we just didn't get the same results. Also, the simplicity of doing it this way while still getting a good result was a reason why we wanted to pursue it further; fine-tuning is maybe something to take up again in the future.

The question was why we are using acoustic data, and whether it can be put on rolling stock, right? We are aiming at a universal platform which can be fitted to various kinds of machinery, and there are plenty of regulations around modifying machinery. Acoustic sensing is non-intrusive: we just pick up the acoustic emissions and do not have to change the machinery itself. That is one part. We are also looking into rolling stock: next week we are installing in trains, in Cottbus I think, to try to understand when the steps that extend automatically from the train tend to fail, because this is a big pain point for that train operator. So yes, rolling stock too, and acoustic emissions because of their minimal or complete non-intrusiveness. We also haven't mentioned this today, but we are looking into acceleration data as well; that is minimally intrusive, since we can just glue the sensors on top of the machinery, so it is a borderline case in that sense.

To repeat the question, if I understood it correctly: how often do we retrain the models once they are in production? It depends. If you have a data flow coming back in a use case, say experts who can provide more labeled data, then there will probably be some retraining, if it helps improve the model. We are currently working on a kind of continuous learning platform; we haven't fully implemented anything yet, but it is definitely something we want to take up in the future. Also, in the escalator case we had the possibility to actually inject hardware faults into the escalator, so we could train a classification scheme. For the power generation utilities, on the other hand, we won't be able to inject faults, so there we are going to train an anomaly detection scheme instead. We are currently working on autoencoders and spectral analysis to do that, and there we have to retrain far more often to track these trend developments. So it really depends on whether you go supervised or unsupervised; a rough autoencoder sketch in that spirit follows below.
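To close with a rough idea of that unsupervised direction, here is a minimal autoencoder sketch in Keras that flags frames with unusually high reconstruction error. The architecture, the 99th-percentile threshold, and the placeholder data are assumptions for illustration, not our actual implementation.

```python
# Sketch: unsupervised anomaly detection via a small autoencoder.
# Trained on healthy spectrogram frames only; unusually high
# reconstruction error then hints at a developing fault.
import numpy as np
from tensorflow.keras import layers, models

n_features = 128  # e.g. one log-mel frame with 128 bands (assumed)

autoencoder = models.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(16, activation="relu"),   # bottleneck
    layers.Dense(64, activation="relu"),
    layers.Dense(n_features, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")

# Placeholder "healthy" frames; in practice these come from recordings.
healthy = np.random.rand(2000, n_features).astype("float32")
autoencoder.fit(healthy, healthy, epochs=10, batch_size=64, verbose=0)

def reconstruction_error(frames):
    """Mean squared error between input frames and their reconstruction."""
    recon = autoencoder.predict(frames, verbose=0)
    return np.mean((frames - recon) ** 2, axis=1)

# Threshold learned on healthy data; errors above it count as anomalous.
threshold = np.percentile(reconstruction_error(healthy), 99)
new_frames = np.random.rand(5, n_features).astype("float32")
print(reconstruction_error(new_frames) > threshold)  # True = anomalous
```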