Thanks for coming to our talk today. We're going to talk to you about how we used machine learning to help a customer explore the value of predictive maintenance. So this is a packaging machine. It's full of packaging motors. If any one of these motors breaks, the machine stops, and that's expensive. In addition to failing because of age or hard use, the motors in these machines can exhibit temporarily anomalous behavior. This could be caused by a failure of the material flowing through the machine, the way a wrinkled piece of paper can cause a paper jam in a printer, or by a problem with one of the motors themselves, such as a loose electrical connection. A customer was using machines like this, and they were interested in performing more cost-effective predictive maintenance and reducing the amount of time they spent scanning the line for anomalous behavior. So we ran a pilot project to see if we could take data streaming off the motors on these machines, predict the remaining useful lifetime of each motor, and detect anomalies as they occurred. Before I dive into the solution, I'd like to spend a little time talking about how we thought about the problem. So I'm about to tell you a story. Now, I hope I'm going to tell it compellingly enough that you'll just remember everything. But what if I don't? Well, I'd like you to at least remember these three main points. The first is that online learning algorithms can remain accurate even when they're processing data with distributions that change over time. The second is that having a streaming framework in place allows your data scientists to focus on their algorithm development rather than the details of cloud-based deployment. And the third is that if you integrate your analytics with open tool chains, your prototype has a plausible path to production deployment. So every story has its characters.
This project required people with three different skill sets. First, we had the dashboard builder, whose job was to produce the visualizations that the plant operator would use in their day-to-day work. Then we had the data scientist, who was responsible for turning the raw data into the insights and knowledge that the dashboard builder wanted to display. And the third person was the system architect, whose job was to collect all that raw data streaming off the motors and deliver it to the data scientist in a timely manner. For this project, Lucio was the data scientist, and I played the parts of the dashboard builder and the system architect. Oh, and here on the right, you can see a sketch of what we had in mind at the beginning of the project. That graph is the remaining useful lifetime of the motor we're looking at, and those red dots are the anomalies we've detected. So how did we get started? First, we gathered requirements from the dashboard builder. As the data consumer, they were the one who set the ground-level requirements for this project. First, the obvious one, the functional requirements: actually detect the anomalous behavior and compute the remaining useful life of each motor. Because this was a pilot project, we couldn't really come to them and say, well, you have to spend a lot of money on IT hardware before you can do anything. So we needed to minimize the burden on the IT staff. And the corporate standards for cloud-based tools were changing, or at least could change during the project, so we had to be able to track those effectively as well. And finally, the system was planned to scale out, so whatever we did had to be built in a way that we could eventually scale. So now let's take a look at turning the raw data into those alerts and estimates. The data scientist is planning to use machine learning. But there are a few problems.
First, the motors don't actually generate failure data, so we don't have access to a training set for supervised machine learning. Second, the motors' characteristics change as they age, so the online models have to adjust to those changes in near real time. Anomaly detection in particular can't wait for overnight batch model updates. And finally, as I said, we're going to have to scale out, but the data scientist doesn't really want to be bothered by the details of how that works. So the solution was to use online learning within a streaming framework. Online learning, or incremental learning, addresses the first two of these problems by learning incrementally and adjusting to the data's changing distribution as it changes. Having the stream processing framework in place provided tools and interfaces that ease the transition from desktop development to production deployment. So now that we've seen what's going to happen to the data, how are we going to deliver it? That question leads the system architect to ask: how can I capture streaming data in motion into tables of data at rest? Well, one solution is to use a messaging system that captures time-consistent windows and transforms those into MATLAB timetables. The data scientist also needs models that persist between windows of computation, so the system architect has opted to use a network data store for this model persistence. The network data store sacrifices permanent storage, which we don't need, for high performance, which we do. So while the system architect works to set up the cloud-based tools, the data scientist starts work on the machine learning algorithms. But in order to do that, they need some test data. And in order for the test results to have any credibility, the data needs to be physically accurate, to reflect reality.
So we used Simscape, which is a physical modeling tool, to combine a customer-provided CAD model of the robot arm with some of the operational characteristics of the motors, such as the amount of current each draws at a particular motor speed. Then we ran the models to produce 48 hours' worth of data for each of 20 separate motors. Each of these motors had different physical characteristics: some of them were generating anomalies, some were slowly degrading over time, and some were running along just fine. Our goal was to produce a visualization like this that would allow the plant operator to distinguish healthy motors from unhealthy ones. So now Lucio is working on a couple of machine learning algorithms, building them on the desktop. How are we going to get those machine learning algorithms up into the cloud? Well, Lucio's algorithms basically have three requirements. One is that they need to process time series data, so we need the data delivered as time series. The second is that the model needs to be continually updated, so we can't lose track of it between windows. And third, we need to be able to send results to the dashboard somehow. So we constructed a streaming architecture that captures this continuous data into time series windows, automatically manages the persistence of the model from one window of data to the next, and finally publishes the results back out to the messaging service as a stream of their own. In order to use this framework, Lucio is going to have to wrap his algorithms up as streaming functions, and the results those produce will eventually be consumed by the dashboard and displayed to the plant operator. So to realize this architecture in the cloud, we chose Apache Kafka as the messaging service, MATLAB Production Server as the analytics engine, and Kibana as our data visualization tool. Why?
Well, Apache Kafka is ubiquitous and certainly has the power to handle the load we were going to throw at it. MATLAB Production Server allows you to take MATLAB functions and publish them as web services, and Kibana is just good at visualizing time series data. So let's take a look at the internal structure of a streaming function. A stateful streaming function in MATLAB gets called whenever data arrives on the appropriate stream. Think of it as running in an infinite loop, processing variable-sized windows of data on each iteration. This is the interface of a streaming function. All stateful streaming functions have two inputs and two outputs, so four arguments. The first input is the data table; that's the actual features streaming in off the machines. The second is a reference to the state of the model that was computed at the previous step. Those are the inputs. The two outputs are the new state that we want to pass to the next iteration, and of course the results, which in this case are the number of seconds of remaining useful life for this particular motor and the timestamp at which we performed the computation. So that was the interface. Now we can take a look at the body of the function, which I've condensed a little to show you the three-part structure common to all these streaming functions. First, you can see us retrieving the state from the previous iteration, the previous data window. Once we retrieve the model, we update it, and use the updated model and the input data to calculate the remaining useful life of this motor. Then we take those calculations and the timestamp, put them into the result structure, save the updated state of the model, and return all of that. And now Lucio is going to talk to you about how we developed these analytics. In particular, he's going to tell you how our online learning adapts to statistically drifting data. Lucio?
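To make the shape of that interface concrete, here is a minimal Python sketch of the stateful streaming-function pattern: data window and previous state in, new state and results out, with the framework threading the state through an endless loop. The project's real functions are MATLAB; the toy running-mean "model" and every name below are purely illustrative, not the actual analytics.

```python
# Sketch of the stateful streaming-function pattern (illustrative only).

def init_model():
    return {"count": 0, "mean": 0.0}

def update(model, values):
    # Incremental (online) update of a running mean, one value at a time.
    m = dict(model)
    for v in values:
        m["count"] += 1
        m["mean"] += (v - m["mean"]) / m["count"]
    return m

def process_window(window, prev_state):
    """One iteration: (data window, previous state) -> (new state, results)."""
    model = prev_state["model"] if prev_state else init_model()
    model = update(model, window["values"])
    results = {"timestamp": window["t"], "mean_so_far": model["mean"]}
    return {"model": model}, results

# The framework effectively runs this loop, persisting the state between calls.
state, out = None, None
for w in [{"t": 1, "values": [1.0, 2.0]}, {"t": 2, "values": [3.0]}]:
    state, out = process_window(w, state)
```

The point of the pattern is that `process_window` itself is pure: all persistence is handled outside it, which is what lets the framework swap the desktop loop for a cloud deployment without touching the function body.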
Thank you very much, Peter. I think I'm going to have to switch to English so my system architect can understand me. So I'm going to play the role of the data scientist, and I'm going to talk about two models. The first thing that I, as the data scientist, asked Peter was: okay, show me the data. So this is the data that Peter gave me. We have 27 traces for each motor; we measure things like the positional error, the electrical current, and the motor speed. We are going to use an exponential degradation model to predict the remaining useful life, RUL for short. I'm going to use an exponential degradation model because you can probably see that there are some small downward and upward trends in these plots. That's what we're going to exploit. I also need to detect anomalies. There are many small peaks in these plots, and I'm going to suppose that these are the anomalies. But if you zoom into these plots, you'll see the peaks, and you'll also see that we cannot use simple thresholding to detect them, because the distributions of these signals are changing over time. We're going to use a nonlinear one-class SVM for this. So I'm going to explain these two models and how we adapted them to a streaming framework. Let me start with the SVM. This is how I understand the problem. For ease of illustration, I'll use a two-dimensional example; in reality we have 27 variables, but we can plot the idea in this two-dimensional space. So we see a distribution. What makes this difficult is that the distribution is moving over time. And from time to time, we see some small red dots. Those are the anomalies. Unfortunately, they are few; we cannot learn a distribution for them. So there are three challenges here. First, the data is not stationary: the distribution is moving. Second, at any given time, we don't have access to the whole dataset.
We might see only small batches of data, as small as one sample, one feature vector. And third, the problem is not linear, meaning we cannot just put a line, a threshold, through this distribution and reliably identify the anomalies. Okay, let's go back in time a little and focus on the last problem, the nonlinearity. Let's assume that I receive the complete dataset, so I have the whole dataset right here, and one anomaly. This problem was solved 20 years ago by Schölkopf and some other notable people. If you know SVMs, you know the idea: assume there is a transformation that maps this dataset into a high-dimensional space, in the most common case an infinite-dimensional space, such that we can discriminate all the normal observations from the anomalies. In this one-class SVM formulation, we assume the anomalies sit at the origin. We have an equation for the separating hyperplane, and after we learn it, that hyperplane has a counterpart back in the input space, which is the edge of this ring on the left plot. And once we've learned this hyperplane, we can score anomalies simply by substituting any new feature vector into the equation of the hyperplane; that gives us the anomaly score. If the score is close to one, it's a normal observation; if it's low or negative, it's an anomaly. Of course, as I said, this is impossible to do directly, because z lives in an infinite-dimensional space, so there is no such explicit transformation. Fortunately, there is the so-called kernel trick: we just pick a kernel, and we can compute the score with the kernel trick instead. So this is just standard SVM machinery. But there is a big problem: if you want to learn this model, you need to learn N alpha coefficients.
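In the usual textbook notation for the one-class SVM (a sketch with standard symbols, not equations taken from the slides), the score just described has this kernelized form, with one learned coefficient per training point:

```latex
% One-class SVM anomaly score: N training points x_i, learned
% coefficients \alpha_i, offset \rho, Gaussian kernel k.
g(x) = \sum_{i=1}^{N} \alpha_i \, k(x_i, x) - \rho,
\qquad
k(x, x') = \exp\!\left(-\frac{\lVert x - x' \rVert^2}{2\sigma^2}\right)
```

A positive score marks a normal observation and a low or negative score an anomaly; the difficulty for streaming is exactly those N coefficients, since N is the size of a dataset we never see in full.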
That is very difficult, because N is the size of the dataset, and, by the way, we never observe the whole dataset; we only see a few samples at a time. So how do we keep the richness of the nonlinear SVM and still solve this problem for online streaming? Now we use another result, from 2007. Rahimi and Recht said: let's find a transformation of the input data such that the dot product of the transformed features approximates the kernel, the most common choice being the Gaussian kernel. If we can do that, then instead of using the kernel trick, we can use an explicit transformation and just solve a linear problem with regularization. There are many solvers for linear classification; we are using stochastic gradient descent. And we get a bonus: with stochastic gradient descent we can use batches as small as a single observation, and if we let the learning rate stay constant, we keep learning continuously. So we applied this to our problem: stochastic gradient descent with a constant learning rate, and for the kernel approximation, random Fourier components, 1,000 of them in this example. And we set the outlier fraction to 1%. The outlier fraction determines where the hyperplane sits; it's the bias term, if you like. These are the results for our dataset. I actually ran it with two different learning rates. What we see is the score of the SVM, and the crosses at the bottom are the real anomalies. Peter was nice enough to give me simulated data, so we know where the anomalies were injected into the motor. And we are able to detect most of the anomalies.
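The combination just described, random Fourier features approximating the Gaussian kernel plus a linear one-class learner trained by constant-rate SGD, can be sketched in a few lines of Python. This is not the talk's MATLAB implementation: the dimensions, learning rate, and outlier fraction below are invented for illustration, and the linearized hinge objective is the standard Schölkopf-style one-class form.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, sigma = 2, 1000, 1.0      # input dim, number of random features, kernel width

# Random Fourier features: z(x) . z(y) approximates the Gaussian kernel
# exp(-||x - y||^2 / (2 * sigma^2))  (Rahimi & Recht, 2007).
W = rng.normal(0.0, 1.0 / sigma, size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def z(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

# Sanity check of the kernel approximation on one pair of points.
x0, y0 = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-np.linalg.norm(x0 - y0) ** 2 / (2 * sigma ** 2))
approx = z(x0) @ z(y0)

# One-class SVM in the explicit feature space, trained online with SGD at a
# constant learning rate. Per-sample objective (linearized one-class form):
#   0.5 * ||w||^2 - rho + (1 / nu) * max(0, rho - <w, z(x)>)
eta, nu = 0.02, 0.05            # constant learning rate, target outlier fraction
w, rho = np.zeros(D), 0.0

def sgd_step(x):
    global w, rho
    zi = z(x)
    active = (rho - w @ zi) > 0               # hinge term is active
    w = w - eta * (w - (zi / nu if active else 0.0))
    rho = rho + eta * (1.0 - ((1.0 / nu) if active else 0.0))

def score(x):
    return w @ z(x) - rho                     # lower score => more anomalous

# Feed a stream of "normal" samples, one observation at a time.
for _ in range(3000):
    sgd_step(rng.normal(0.0, 0.5, size=d))
```

Because the learning rate never decays, the model keeps adapting if the underlying distribution drifts, which is exactly the property the motors require; a point far from the training cloud, such as (5, 5) here, scores well below a typical inlier.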
In fact, we are able to detect all the anomalies except at the very beginning, while the model is still learning. So this is what I'm going to hand off to Peter at the end of the day: a streaming function which ingests observations and outputs an SVM score. And we need a state; the state in this case is our model, which is represented by a thousand hyperplane coefficients, one for each of my random Fourier features. Okay, that was the SVM; let's talk about the other modeling technique. First, let me explain a little of what remaining useful life is. There is a health indicator variable that we assume behaves exponentially; that's the exponential curve right here. Remaining useful life is defined, at any given point, as the distance from what we are observing now to the point where we know the motor is going to fail. We need to estimate this value. The way we do it is to assume the coefficients of this curve are random variables. If they are random variables, we can have prior estimates for them; in this case, our random variables are the theta and beta in this equation, and we can put prior distributions on these coefficients. So this becomes an estimation problem, and we solve it with Bayesian estimation. On the bottom, we see the likelihood of the remaining useful life. As you can see, as time progresses, the distribution moves to the left. The most important thing about this approach is that as time progresses, the confidence bounds narrow around the true value of the remaining useful life, which is important. This is what I'm going to hand over to Peter: again, a streaming function.
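A minimal Python sketch of this exponential degradation idea: if the health indicator behaves like h(t) ≈ theta * exp(beta * t), then log h(t) is linear in t, and each new (t, h) observation can update the two coefficients online; the RUL is the time left until h crosses a failure threshold. The talk keeps prior distributions and does full Bayesian updates; this point-estimate version via recursive least squares, with an assumed threshold `H_FAIL` and invented names, is a deliberate simplification.

```python
import math

H_FAIL = 10.0   # assumed failure threshold for the health indicator (illustrative)

def rls_init():
    # State: running sums for 2-parameter least squares in log space.
    return {"n": 0, "st": 0.0, "sy": 0.0, "stt": 0.0, "sty": 0.0}

def rls_update(state, t, h):
    # Fold one (time, health) observation into the running sums.
    y = math.log(h)
    state["n"] += 1
    state["st"] += t
    state["sy"] += y
    state["stt"] += t * t
    state["sty"] += t * y
    return state

def rul_estimate(state, t_now):
    # Solve for beta (slope) and log(theta) (intercept), then find the
    # time at which theta * exp(beta * t) reaches H_FAIL.
    n, st, sy, stt, sty = (state[k] for k in ("n", "st", "sy", "stt", "sty"))
    beta = (n * sty - st * sy) / (n * stt - st * st)
    log_theta = (sy - beta * st) / n
    t_fail = (math.log(H_FAIL) - log_theta) / beta
    return t_fail - t_now                      # remaining useful life

# Simulated degradation stream: true theta = 0.5, beta = 0.1, no noise.
state = rls_init()
for t in range(1, 21):
    h = 0.5 * math.exp(0.1 * t)
    state = rls_update(state, t, h)
rul = rul_estimate(state, 20.0)
```

The small dictionary of running sums plays the role of the persisted state between streaming windows; the Bayesian version in the talk carries prior parameters instead and yields confidence bounds as well as the point estimate.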
In this case, I take the observations, but also the timestamp, because I need it for the curve regression, and I output the remaining useful life and my confidence estimates. In this case, I only need to keep four parameters in my state: the two parameters of each of the two prior distributions for the parameters of the curve. So let me hand over to Peter, who is going to show us how he puts everything into the production environment. Thank you, Lucio. So now we can start thinking about deployment. How do we integrate our analytics into an open tool chain to visualize our results? So where are we? Lucio has built a desktop application that can't scale. I have provisioned a scalable production architecture, but I don't have any analytics. And I have some visual designs for a dashboard, but no data. How are we going to put these pieces together? The data scientist needs to communicate with the dashboard builder so that they can agree on the format and structure of the output variables. Now, this kind of communication could happen very easily in MATLAB, because MATLAB can graph the results and you can look at the code. But we can't count on the dashboard builder being a MATLAB user. Obviously that's not true in my case, but in general these roles are played by different people. So the data scientist loads their code and data into the Live Editor, which can produce results in more common formats like PDF and HTML that can be easily shared with the dashboard builder. Then the data scientist and the system architect need to agree on certain details of the system's configuration, like the names of the input and output streams. These configuration file changes don't require any changes to the application's code. And finally, the system architect and the dashboard builder need to agree on how the data is going to be persistently stored.
In this case, that mostly had to do with the organization of the Elasticsearch indices. Now, of these three, I think this was probably the easiest conversation, because all I had to do was agree with myself. So now Lucio has created his streaming functions. How is he going to test them? Well, he could deploy them to the cloud, of course, but that takes time, and it costs actual money for those CPU cycles. So instead, he turns to the integrated desktop server that ships with MATLAB Production Server. This client status display shows that it's accepting connections on localhost, port 9910, and routing messages to the anomaly detection functions; you can see, I hope, the words "anomaly detection" up there at the end of that URL. We can also see that the function has been called on four windows of data. Three of them have been processed successfully, with the little green check marks next to the word "completed," and one is still pending. And down here, Lucio can take a look at the system logs in case anything goes wrong. Now let's look at what's going on with that pending request. It looks like Lucio is debugging his anomaly detection code: a breakpoint has been set in the isanomaly function, and he can inspect the values of the local variables over here. This is the MATLAB source-level debugger, a tool Lucio is already very familiar with, so he didn't have to learn any extra tool sets to interact with his streaming functions, and that let him focus on his analytics. He has also, with the help of the system architect, connected his streaming functions to live data, so he can explore how these functions will behave in production before actually deploying them. But that's his next step. So now our data scientist uses MATLAB Compiler SDK to create a project file for deploying his application to the cloud.
He chooses his entry point functions, and the tool automatically detects the other functions that those entry points call, along with the data files the functions need when they run. The result is a single file that Lucio can hand off to the system architect. Now, one of the precepts of good architectural design is separation of concerns, which typically means: focus on your interfaces and don't clutter them up with the details of your implementation. That's what we've tried to do here. This is the production system, which is responsible for state management and message routing. It interacts with the data streaming off the motors through a single abstraction: named topics in Kafka streams. The second interaction, with the analytics, occurs via that file the data scientist has just handed to the system architect. And finally, the visualization tool just needs to interact with the Elasticsearch indices. So we've contained the complexity of the system; each of the content owners can change the interior of their box relatively freely. If the dashboard builder, for example, wanted to use another visualization tool, then as long as that tool supported Elasticsearch indices, no problem. And speaking of the dashboard, here it is at last, populated with real data. You can see the anomaly count for each motor, the amount of time the entire fleet has spent in declining health, and a table of the motors that are nearing the end of their remaining useful life. To focus in on an individual motor, we have a second dashboard. Here you can see the remaining useful life, that's the graph, and the number and time of any anomalies we detected. So that's it: our first version of the application is complete. What did this project teach us? Well, it really helped that we didn't have to collect training-set data; it's not clear that we could have, so online learning was critical to our success.
Being able to move from the desktop to deployment without having to learn any additional tools really accelerated the development of our analytics. And finally, this idea of configuration files for integration with open tool chains turned out to be a really powerful one. By externalizing our dependencies, we could move the application from one cloud stack to another without changing any of the application's code. So now I guess we have a few minutes for questions, and Lucio and I will be at the MathWorks booth today and tomorrow if any of you would like to see any of this in more detail. Thank you very much; are there any questions?