OK, let me make sure my laser pointer is working. So I'm going to talk about time series analysis, and specifically about time series classification. The application is IT operations analytics, which is the analysis of IT operations monitoring data: your CPU usage, your network usage, how much memory is being used. My name is Rohit, Rohit Chatterjee. I went from computer science to math. And then I liked math more than math liked me. So I got out of math and moved over to finance. Then I got tired of stealing people's pension funds, and so I came back to software. And now I'm a data scientist at Microland. Microland does IT operations management, and I'm building predictive systems to forecast problems, classify them, and suggest auto-remediation. IT systems are monitored continuously. Continuously doesn't really mean continuously; it's every five minutes, every 15 minutes, depending on the performance indicator you're looking at and how important that indicator is. One also wants to look at log data. If any of you attended the Sumo Logic talks yesterday, then you're somewhat familiar with what's done in that area. And these are what the time series look like. As you can see, they don't look like stock data. They're not very noisy; it's not a Brownian motion around a mean, mean-reverting, and so on. This one on the top left (whoa, this jumped, beg your pardon) is the most complicated of these four. The other three are really quite representative. They're simple graphs. You can describe them using words very easily. I'm not trying to discover some highly complex behavior here. What I do want to do is automatically classify them so that I can take action. So the challenge is to classify things that are not terribly complicated. In order to classify, I need features. So this entire talk is going to be about extracting features from these types of graphs.
Once one has features, one can plug them into your various statistical methods, your machine learning methods. I'm not going to talk about any of that. Once you have built models, you want to push them into your real-time systems, your lambda architectures. I'm not going to talk about any of that either. I'm only going to talk about how to pull features out of these time series, features which are useful for the problem of pattern recognition. So you want patterns. You start off looking at some obvious ones: OK, take a mean, a variance, peak-to-peak. It turns out these are not terribly useful, and I'll give you a simple example of why not. If you look at the means and variances of these two graphs, and then you look at their ratio, the coefficient of variation, they're the same. But the graphs are different. So you're not going to get very far just looking at the mean and the variance. The problem is that the distribution by itself is just not aware of time. If you want to approach this from that point of view, then you need to do something like: what is the value at a time t, conditioned on the last 10 observations you saw? And you build big conditional distributions. But I didn't want to do that, because it sounds complicated. So I looked at the mode. The mode is a very neglected statistic. It never comes up anywhere, probably because you can't pull it out of distributions easily. So let me remind you: the mode is the most frequently occurring value in a sample. If you look at the graph on the left, you see that the mode is around 0.08. And for continuous data, it's not a single value; it's a small interval. So my mode is actually between here and here, and you can see that most of the values of this graph are in this little interval. So this particular graph has a single mode. The graph on the right, on the other hand, has two modes. There's one up here at 1.6 billion, and at 800 million there's a second one.
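The modal-interval idea can be sketched with a histogram: treat any bin that holds a large enough share of the observations as a mode. This is a minimal illustration; the bin count and strength threshold are illustrative choices, not values from the talk.

```python
import numpy as np

def find_modes(values, bins=20, min_strength=0.2):
    """Return modal intervals [(lo, hi, strength), ...] of a sample.

    For continuous data a mode is a small interval, not a single value:
    here, any histogram bin containing at least `min_strength` of the
    observations. `bins` and `min_strength` are illustrative knobs.
    """
    counts, edges = np.histogram(values, bins=bins)
    n = len(values)
    modes = []
    for i, c in enumerate(counts):
        strength = c / n
        if strength >= min_strength:
            modes.append((edges[i], edges[i + 1], strength))
    return modes

# A bimodal series: half the points near 800 million, half near 1.6 billion.
rng = np.random.default_rng(0)
series = np.r_[np.full(50, 8e8), np.full(50, 1.6e9)] + rng.normal(0, 1e6, 100)
print(find_modes(series))  # two intervals, each with strength 0.5
```

On a flat series this returns a single strong mode; on the bimodal example it returns two intervals with strength 0.5 each, matching the two-modes picture described above.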
So I'm now going to take the modes, and I'm going to use these two graphs for a couple of slides to illustrate a few of the features I'm going to pull out. So here's my first feature. I call it the strength. The strength of a mode is the proportion of the occurrences of that mode in your window. The only mode on the left-hand graph occurs basically 100% of the time, so this is a very strong mode. Its strength is 1. Whereas on the second graph, each mode has a strength of 0.5. So the strength is the first statistic I'm interested in, which I'm going to use to build up my signature. The strength is my first feature. My second feature is what I call the run ratio. I apologize, I'm terrible at naming things. The run ratio looks at the longest unbroken string of values that lie in that mode, the longest consecutive run. So this one is around 80%. You see there are two runs; this one is smaller, so I take the bigger one. And so for this particular graph, the run ratio of this mode is 0.8. On the right-hand graph, the run ratio of this mode is 100%, because all occurrences of this mode appear in a single consecutive run. So 100% of occurrences of this upper mode are in a consecutive run, and the same for the lower mode. And if you find that unsatisfying, because you can see that this is half of the graph, well, OK, I have another statistic which looks precisely at the proportion of the longest consecutive run with respect to the entire window. So let's look at what this does for us. Here are two graphs with a mode at around 0, and these are strong modes, 98%. But the run ratio is able to distinguish between the two. A lot of the time your graphs are not going to have a mode, and then what do you do? This is not a terribly difficult problem, because you can apply a difference operator: you take adjacent values, subtract the previous one from the current one, and form a difference series.
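The three mode features above (strength, run ratio, and longest-run-to-window proportion) can be computed with a single pass over the window. A minimal sketch, assuming the mode is given as a half-open interval; the naming follows the talk:

```python
import numpy as np

def mode_features(values, lo, hi):
    """Strength and run ratios of the mode interval [lo, hi).

    strength      = fraction of the window lying in the mode
    run_ratio     = longest consecutive in-mode run / all in-mode points
    run_to_window = longest consecutive in-mode run / window length
    """
    in_mode = (np.asarray(values) >= lo) & (np.asarray(values) < hi)
    n_in = int(in_mode.sum())
    if n_in == 0:
        return 0.0, 0.0, 0.0
    # Longest run of consecutive True values.
    longest = run = 0
    for flag in in_mode:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    n = len(in_mode)
    return n_in / n, longest / n_in, longest / n

# Two runs of an at-zero mode, lengths 8 and 2, in a 12-point window.
x = [0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 7]
print(mode_features(x, -0.5, 0.5))  # strength 10/12, run ratio 0.8, run/window 8/12
```

The 0.8 run ratio in the toy window mirrors the two-runs example in the talk: ten in-mode points, of which the longest consecutive run covers eight.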
And so if you apply the difference operator to this slope over here, you get this graph over here. And as you can see, this difference series is now very amenable to the modal analysis I just described. There's a mode at around 30,000. And in fact, you can see some periodicity in the non-modal parts. I'm not going to talk about periodicity today, but you will see structure when you apply a differencing operation. Here's a step function. It also works well with the differencing operator. The mode is now at around 0. You've got runs. You can describe it using the run ratio, the run strength, and the modal strength. And I want to point out that the differencing operator is useful because you can recover the original series from it. If you have a starting value, then you take cumulative sums of the difference series and you recover the original series. So you're not losing information; it's not like you're throwing away a lot of stuff by stepping from one to the other. So you do all this, and now you can come up with a signature. And the signature is my goal. I wanted a way to represent my graph that can be plugged into standard statistical and machine learning methods. So, I should have ordered this differently; let me start here in the middle. For this particular signature, this is the mode, the run ratio, and the strength. I also look at the non-modes, because sometimes you get information from the non-modes. The non-mode is the part of the series which is not contained in any of the modes. So I look at the run ratios over there. This GT ratio is useful. The GT stands for greater than. If you have a mode, then you want to know whether your non-modal values lie above or below it. You count which proportion of them lie above, and that is what I call the GT ratio. A GT ratio of 1 indicates that all of them lie above. So for instance, if your mode is 0, then all of your non-modal values lie above your mode.
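Both points above are easy to show in a few lines: differencing loses no information given the starting value, and the GT ratio is just the fraction of non-modal points sitting above the mode interval. A sketch, with the same half-open interval convention as before:

```python
import numpy as np

def gt_ratio(values, lo, hi):
    """Fraction of non-modal points lying above the mode interval [lo, hi).

    GT stands for 'greater than'; 1.0 means every non-modal value
    sits above the mode (e.g. spikes over a mode at zero).
    """
    v = np.asarray(values, dtype=float)
    non_modal = v[(v < lo) | (v >= hi)]
    if len(non_modal) == 0:
        return float("nan")
    return float((non_modal >= hi).mean())

# Differencing is invertible given the starting value:
x = np.array([3.0, 5.0, 4.0, 9.0])
d = np.diff(x)                                    # difference series
recovered = x[0] + np.concatenate(([0.0], np.cumsum(d)))
print(np.allclose(recovered, x))                  # True: no information lost

# Spikes above a mode at zero give a GT ratio of 1.
print(gt_ratio([0, 0, 7, 0, 3, 0], -0.5, 0.5))    # 1.0
```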
Now you have a big signature. Let me recap where the signature came from. We had a performance counter and we had a window. A single window gives you a signature. Now you move your window for the same performance counter, and some of these values obviously will change, but you expect them to move within a little range. So for example, your mode's GT ratio might be between 0.75 and 0.83, I don't know. But now, since you've moved the window and made signatures, you have, for this particular performance counter, a little region of a vector space which describes this performance counter. And so if you plug this into a real-time system and you're observing your performance counter, you can calculate the whole signature, but that's kind of slow. You can calculate components one at a time and see if any of them lie outside your expected range, and then you can trigger an anomaly quickly over there. You can then calculate more values and see if you've seen this before. If you've seen it before, could you connect it to an incident that occurred? An incident, for non-ITIL people, is a problem that happened in the past. And then can you connect it to a resolution, so that you can either suggest a resolution to your engineers quickly, or even better, trigger an automated remediation script to fix it? I left the slide blank because I'm just going to talk now. Plugging this into a real-time system involves, basically: you've got all these features for your signature, and you have to figure out, for each performance counter, which features are going to be your decision makers. And so for each performance counter, you need to know in which order to compute these features, so that you don't waste time computing features which are not going to give you information. This part is the work in progress. Once this is done, I can plug this into a real-time system.
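The range-learning and feature-at-a-time anomaly check can be sketched as follows. To keep the mechanics visible, the signature here uses simple stand-in features (mean and peak-to-peak) rather than the talk's full mode/run-ratio/GT signature; everything else (learn per-feature ranges over sliding windows, then check features in order and stop early) follows the description above.

```python
import numpy as np

def signature(window):
    """Toy signature: a dict of named features for one window."""
    w = np.asarray(window, dtype=float)
    return {"mean": w.mean(), "p2p": w.max() - w.min()}

def learn_ranges(series, width, step):
    """Slide a window over history; record min/max seen per feature."""
    ranges = {}
    for start in range(0, len(series) - width + 1, step):
        for name, val in signature(series[start:start + width]).items():
            lo, hi = ranges.get(name, (val, val))
            ranges[name] = (min(lo, val), max(hi, val))
    return ranges

def is_anomalous(window, ranges, order=("mean", "p2p")):
    """Check features one at a time, cheapest first, and stop at the
    first one that falls outside its learned range."""
    sig = signature(window)
    for name in order:
        lo, hi = ranges[name]
        if not (lo <= sig[name] <= hi):
            return True
    return False

history = [10.0, 11, 10, 12, 11, 10, 11, 12, 10, 11]
ranges = learn_ranges(history, width=4, step=1)
print(is_anomalous([10, 11, 10, 12], ranges))   # False, looks like history
print(is_anomalous([10, 50, 10, 12], ranges))   # True, way outside the range
```

The `order` argument stands in for the open problem mentioned in the talk: deciding, per counter, which features are the decision makers and in what order to compute them.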
And then I'm going to see how well it works. Right now, I've thrown this algorithm at a bunch of historical data, and it distinguishes between different graphs very nicely. It says this is a flat graph, and these 10 are flat graphs. It puts all the sloping graphs together. It puts all the sporadically spiking graphs together. So I believe that the method works well for pattern recognition. Next year, hopefully, I'll come back and be able to give you results on how well this works in production. Thank you. Ready for some questions? Go ahead. All right, a couple of them, actually. So the first one: you mentioned that if you do not see the pattern in the vanilla time series, you take the difference, right? Do you have anything about how many times you might need to difference? Do you keep differencing until you see a pattern? Absolutely, that's a great question. The question is, how many times do you need to apply the difference operator before you can then look at the other features of the series? I expected to have to apply it more than once, but so far I have not needed to. The difference operator, if you come from some kind of ARIMA background or whatever... That's exactly what I was leading to, whether it can be tied to some of the ARIMA ideas. So I haven't needed to difference more than once yet. That's the first one. The second one: can some of these ideas be used for shorter, I mean, non-high-frequency data, like some of the business time series data that we see? Sales per quarter or per day, those kinds of things. They're not high frequency. Right, so my approach to data which does not appear at regular time intervals is that you should look at the series of inter-event spacings. So when events happen, you count that: OK, it was one day, five days, or whatever.
And you make your time series of those differences: one, five, seven, et cetera. And then you try to apply these methods to that. And I'm assuming that you are not really looking at predictions, right? The time series forecasting side of things. I will not be. The forecasting side of things, predicting: I have different methods for predictions, and right now I do not have a plan to. What one could do is say, OK, given this vector, which vector does it predict? And then you can look forward that way. But I will have to make it computationally feasible. One question here. Would you stand up with your hand? It's here. Hello. Hi. Let's say you have a time series signal where you have an upward trend. Then the modal distribution that you're talking about, the mode, will go on changing as you go along an upward trend or a downward trend. In that case, how does it work? I mean, I believe the mode will change if there's an upward trend in the signal. So right now I'm only dealing with trend lines by applying that differencing operator. If it's a straight trend, then the single-order difference will work. If you have some kind of convexity or something, you might have to do it more than once. But so far, I have not needed to. And one follow-up question. For feature extraction, normally in time series, people use Fourier transforms or wavelet transforms to get the features, and then plug those features in. Have you tried anything like that? I have not tried anything like that. I had the idea, then I thought that this was easier. Thank you. Go ahead, that microphone is coming right up. Jesse, just tell me when I should stop. OK, good. Quick ones. When you're talking about the time differences and taking the modes and the statistical features that you derived, is that in the specific time period that you're looking at? Are you flexing the window as you go? That's the first one. Second, is your end objective doing prediction?
And the third one really is that a lot of the time the signatures are actually related, meaning if something happened at time t0, I would expect in general something to happen at time tn. That's right. So how are you handling that? Because in my mind, if you capture both of those, it is basically deterministic, and it will lead to multicollinearity in the usual statistical learning methodology. So the second and third questions are kind of related. Just a final one: what are you trying to predict? Are you trying to predict that if there are five systems interconnected and something happened in systems 1 and 2, it will have an effect on system 3? Or is it that just in one environment, if something x, y, z happened, then what would it lead to? So I'll answer your questions backwards. The answer to your final question is both, because if I just have a collection of time series, then I don't necessarily need to segregate them by where they came from. Of course, in practice, I might want to. But generically, I'm happy to find out relationships across systems as well as within a system, maybe application to database, maybe how that affects the disk. Your second and third questions had to do with prediction. If I can figure out how to make this fast, because as you can see, it's probably not so quick to compute the signature, then what I would like to say is: over the last two hours, I've seen pattern A100, and I have an 85% chance of seeing pattern B231 over the next two hours. And B231 might be a good pattern, in which case I'll say, fine, I'm happy. Or it might be a pattern which gives us advance notification of a problem, in which case I would need to do something. For your first question, the length of your window depends on the performance counter itself. Certain counters, like, whatever, CPU usage, you want to pay attention to faster. Certain counters you might want to look at over longer periods.
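The "seen pattern A100, expect B231 with some probability" idea can be sketched by counting pattern-to-pattern transitions in a labelled history of windows. This is only an illustration of the counting step; the pattern labels and sequence here are made up, not from the talk.

```python
from collections import Counter, defaultdict

def transition_probs(pattern_sequence):
    """Estimate P(next pattern | current pattern) from a labelled
    sequence of window patterns by normalizing transition counts."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(pattern_sequence, pattern_sequence[1:]):
        counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()}

# Hypothetical labelled history of consecutive windows.
history = ["A100", "B231", "A100", "B231", "A100", "C007"]
probs = transition_probs(history)
print(probs["A100"])  # {'B231': 2/3, 'C007': 1/3}
```

With enough history per counter, the same table would let the real-time system flag a current pattern whose likely successors include a known problem pattern.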
In fact, I shouldn't even say that CPU is necessarily a short-term window, because you have long-term patterns on CPU as well. So if you do some kind of smoothing, you might even want to look at a 10 or 15 hour window. I think I'm done for time. We can talk later if anyone else has any more. Thank you.