We are changing places. I will introduce the next speaker, Boris Kuller. Boris is an assistant professor at CSAI. He has a PhD in computer science from the University of Antwerp, where he was also a postdoc for a while. He is a rather popular lecturer in our master Data Science and Society; in particular, the course on big data analytics is one that is highly appreciated by students. His research focuses on sequential pattern mining, particularly on introducing novel measures to evaluate patterns. He has also worked on broadly relevant topics such as anomaly detection in time series. Discovering anomalous patterns in complex, heterogeneous time series data is interesting as it has a wide range of applications, but it is not an easy task. To give an example of the difficulty: an outlier in such a data set can represent an important source of information, but it can also be spurious or irrelevant for the task at hand. Today Boris will talk about pattern-based time series classification. Boris, the floor is yours.

Good afternoon everyone, and welcome to my talk. As Peter said, I will be talking about pattern-based time series classification. That is quite a long title, full of terminology that I will now try to unpack. I will start off with a general introduction about time series data, what classification is, and a little bit about pattern mining, before describing how it all comes together. This presentation is largely based on a recent publication of mine together with a couple of colleagues from the University of Antwerp.

So let's get going with some basics. First of all, what is time series data? Time series data is defined in a fairly simple way: an ordered sequence of values, typically real numbers, which come at regular temporal intervals. That is a fairly broad definition of what a time series is. As Peter said, time series are encountered in a variety of settings, with a variety of applications in the real world.
For example, sensor readings, whether that be weather data such as temperature and pressure, GPS data, trajectories, and motion sensors, but also other types of data such as medical data, stock exchange data, and so on. Today I will speak mostly in fairly generic terms, describing methodology and algorithms rather than a particular application, but what I will be presenting can be applied in a wide variety of settings.

There are different types of time series; we distinguish three different use cases. The first is the simple case, the univariate time series, where your data consists only of a single sequence of values. For example, a single stock price: your data set could then contain multiple stock prices, each of which is considered a univariate time series. A more complicated problem setting is where your data consists of multivariate time series, where you have several time series linked together, forming some sort of entity. For example, you have a machine connected to several sensors, and those sensors measure all kinds of information about that machine: the air temperature, the temperature of the machine itself, some kind of throughput, energy generation or energy consumption. All kinds of things are measured about the same machine at the same time, at the same regular intervals, and that way we obtain a multivariate time series. Finally, the most complex type of data we could have is the so-called mixed-type time series, where in fact you don't only have a time series as I defined it on the previous slide.
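Before turning to mixed-type data, the first two shapes might be sketched as plain arrays. This is a minimal illustration with made-up numbers, not the data model from the paper; the `(n_sensors, n_timesteps)` layout is just one common convention.

```python
import numpy as np

# Univariate: a single ordered sequence of values at regular intervals,
# e.g. one stock price sampled daily (values here are made up).
univariate = np.array([101.2, 101.5, 100.9, 102.3, 102.1])

# Multivariate: several aligned series describing the same entity,
# e.g. one machine with three sensors sampled at the same 5 time steps.
# Layout convention: (n_sensors, n_timesteps).
multivariate = np.array([
    [21.0, 21.3, 21.1, 22.0, 21.8],   # air temperature
    [55.2, 55.0, 56.1, 55.8, 55.9],   # machine temperature
    [0.92, 0.91, 0.93, 0.90, 0.92],   # energy consumption
])

assert univariate.ndim == 1           # a single sequence of values
assert multivariate.shape == (3, 5)   # three aligned sequences
```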
So you have uni- or multivariate time series, but you couple them to event log data. Event log data is again defined very generically, as a sequence of events, and this sequence doesn't necessarily contain real-valued measurements, doesn't even have to contain numerical data, and doesn't even have to come at regular time intervals, which adds to the complexity of the problem. An example: again you have a machine, with some sensor data being collected about it, but you also have an event log, for example a log of what a machine operator does with that machine, which can happen at all kinds of irregular time intervals.

Why is this relevant? Well, this information can of course inform us about the time series data, about how normal or abnormal it is, and can add to the patterns that we can find in the time series. To give a trivial example: if you are measuring the output of a particular machine and all of a sudden the output drops to zero, you might think there is something wrong. But if you couple this with event log data which tells you that the machine operator switched off the machine at that moment, then of course this is perfectly normal behavior. By combining these two sources of information we are perhaps able to gain insights that we otherwise would not be able to obtain.

The next bit of terminology in the title of the talk is classification. This is again a fairly simple concept that is perhaps known to most of the audience, so I will very quickly sum up what we mean by it. Essentially, classification is the process of learning from a training set and applying what you have learned on the training set to classifying new instances of data.
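The switched-off-machine example can be made concrete with a deliberately naive sketch. Everything here is invented for illustration: the readings, the event name `operator_switch_off`, and the rule that any switch-off event explains later zeros; it is not the mechanism used in the actual method.

```python
import numpy as np

# Hypothetical machine output sampled every minute; the drop to zero
# from t=3 onward looks anomalous in isolation.
timestamps = np.arange(6)                         # minutes 0..5
output = np.array([4.9, 5.1, 5.0, 0.0, 0.0, 0.0])

# Event log: irregular, non-numeric (timestamp, event) pairs.
event_log = [(3, "operator_switch_off")]

def explainable_zero(t, log):
    """A zero reading is 'normal' if a switch-off event happened at or
    before time t (a deliberately naive rule for illustration)."""
    return any(ts <= t and ev == "operator_switch_off" for ts, ev in log)

# Zeros that the event log cannot explain would remain anomalous;
# here the log explains all of them.
anomalies = [int(t) for t in timestamps
             if output[t] == 0.0 and not explainable_zero(t, event_log)]
print(anomalies)   # -> []
```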
Here is a nice example, not directly a time series example but one of image classification, where we have images of four different classes of animals: beetles, flies, birds, and chickens. From the small training set we can try to learn how to classify new instances, and when a new instance comes in, we want to assign a class label to it based on what we have learned from the training set.

Now how does this tie in with time series? Well, we can classify time series in the same way. Of course the problem becomes a little more complex, because time series come in all sorts of shapes and forms, and even time series of the same class, as we can see in this example, can vary quite a bit from each other.

So what is the goal of time series classification? What do we want to learn from a training set? We want to learn two things: what makes time series of a particular class similar to each other, which allows us to assign that class label to a new time series that is similar to known time series of that class, but also what makes time series of a particular class different from those of other classes. Again, to give a simple example: if you find a particular feature that is present in all time series of a particular class, that feature might be useful for classification, but it might not be. If that feature is also present in time series of other classes, then it is not very useful for classification. So, as we will see, we try to find frequent patterns in different classes of time series, we determine which patterns are good for discriminating between classes and which are not, and the latter we do not use for the classification task.

So how do we do time series classification? There are two approaches, two types of algorithm in general.
You can compare time series directly to each other, feeding the time series into an algorithm which then produces a classifier, or you can transform the time series into feature vectors that somehow describe them, compare those feature vectors to each other, and learn a classifier that way.

Here is a small example where we see, I think, two time series from each of eight different classes, each class in a different color. They are sequentially ordered: the first two are from the first class, the third and fourth from the second class, and so on. We see the original time series on the left, and in the middle column what happens if we simply apply a nearest neighbor classifier to the raw time series data. It goes wrong: even though these time series are visually quite clearly similar to the others in the same class, a nearest neighbor classifier would struggle if you just use Euclidean distance point by point, for example. That in itself is not a good idea. In the right column you can see what another method does, which is similar to ours: the time series are transformed into feature vectors, and then the feature vectors are compared to each other. You can see in these plots that the feature vectors of the two time series belonging to the same class are always quite similar to each other, and in this very trivial example they are in fact always each other's first nearest neighbor.
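Why point-by-point Euclidean distance fails on visually similar series can be shown with a tiny made-up example: two series from the same "spike" class, shifted in time, plus one flat series from another class. The `x.max()` feature is a toy stand-in for a real feature, not the embedding from the paper.

```python
import numpy as np

# Toy data: two 'spike' series, shifted in time, and one flat series.
a1 = np.array([0, 0, 1, 0, 0, 0, 0, 0], dtype=float)
a2 = np.array([0, 0, 0, 0, 0, 1, 0, 0], dtype=float)
b1 = np.zeros(8)

def euclid(x, y):
    return float(np.linalg.norm(x - y))

# Point-by-point distance: the shifted spikes never overlap, so the
# same-class pair is FAR apart and 1-NN picks the flat series instead.
assert euclid(a1, b1) < euclid(a1, a2)   # 1.0 < sqrt(2)

# A simple hand-crafted feature ('does the series contain a spike?')
# fixes this: in feature space a1 and a2 coincide.
feat = lambda x: np.array([x.max()])
assert euclid(feat(a1), feat(a2)) < euclid(feat(a1), feat(b1))
```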
So how do we do this? We do this by mining patterns in time series. And what are patterns? We look for recurring sequential patterns in the time series data: we take the time series, find the recurring patterns in them, and then use those patterns to answer the two questions I asked a few slides ago. The intuition behind our method is that if a new instance, a new time series, contains the same patterns as known time series of a particular class, then the new instance has a high probability of belonging to that class; if the instance does not contain the same patterns as known time series of a particular class, then it is less likely to belong to that class. So we aim to discover discriminative patterns for each class, and we classify new instances by essentially checking whether the new time series contains the known patterns of each class and whether it lacks them, and from that we compute which class the new instance most likely belongs to.

How do we build the feature vectors? Essentially, each discriminative pattern that we mine and decide to keep is used as a feature. The embedding of the time series data into feature space thus produces a matrix: each time series is a row and each pattern is a column, and we record which patterns are found in which time series. With exact pattern matching we assign a one if the pattern is present in the time series and a zero otherwise. We also experimented a little with an alternative method using approximate pattern matching, where we define the value as the similarity between each pattern and its nearest occurrence in the time series. In all the examples during this talk I use univariate time series, but everything I have mentioned can in theory also be used for multivariate and mixed-type time series.
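The exact-matching embedding can be sketched as follows. To keep it short, the series are already symbolic (as if discretised), the two "discriminative patterns" are hand-picked rather than mined, and a pattern is treated as a contiguous substring; the real method mines its patterns and need not match contiguously.

```python
import numpy as np

# Toy symbolic series (as if real values were discretised into symbols).
series = ["aabcb", "abcab", "ddeed"]
patterns = ["abc", "dd"]            # assumed discriminative patterns

def contains(s, p):
    """Exact matching: does pattern p occur as a contiguous substring?"""
    return p in s

# Rows = time series, columns = patterns; 1 if present, 0 otherwise.
M = np.array([[1 if contains(s, p) else 0 for p in patterns]
              for s in series])
print(M)
# [[1 0]
#  [1 0]
#  [0 1]]
```

The first two rows share a feature vector, so a nearest-neighbor classifier in this feature space would group them together and separate the third series.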
A word about the interpretability of our method, which is also important in classification, so that users of the algorithm get some feedback on why a particular classification took place. I come back to the example of beetles and flies, an interesting example because it also shows that what at first glance is not a time series can be turned into one, after which you can of course apply time series classification methods that you otherwise could not. What we do here is take the images, take their outlines, and convert them into time series by essentially measuring the distance from each point on the outline to the center point of the image; by rotating along the image outline we obtain a time series. An interesting note: this also shows that when we talk about a time series, the x-axis does not actually have to represent time.

We did this on this example data set, classified the time series, and converted them back into images in the same way. As for producing feedback: in the visualization we have two classes of objects, beetles and flies, and we have highlighted the patterns associated with beetles in blue and the patterns associated with flies in red. We can see that in this small data set of 20 insects we actually got three of them wrong, and we can see exactly why. In the first row, second image, we classified a beetle as a fly, and highlighted in red on that image are the patterns we found there that are normally associated with flies. By analyzing the results of the classification in this way, you can get insights that you otherwise might not be able to obtain.

A very quick word on the evaluation, because I am running out of time: we have compared our classifier with tens of other classifiers on hundreds of data sets.
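The outline-to-series conversion described above can be sketched in a few lines. The "outline" here is a hypothetical hand-made contour (the corners of a unit square) rather than one extracted from a real image, and the center is taken as the mean of the boundary points, which is one simple choice.

```python
import numpy as np

# Hypothetical outline: points sampled along a shape's boundary, in
# traversal order (here the corners of a unit square; a real outline
# would come from an image contour).
outline = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)

# Convert the 2-D outline into a 1-D 'time series': the distance from
# each boundary point to the shape's center, in traversal order. The
# x-axis of the resulting series is position along the outline, not time.
center = outline.mean(axis=0)
series = np.linalg.norm(outline - center, axis=1)
print(series.round(3))   # all four corners are equidistant from the center
```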
With all this evaluation we can of course talk about average results, but what was most important for us was interpretability; that was our main motivator for this work. We have a classifier that is fully explainable. We also have a classifier that is applicable to all three use cases I mentioned, univariate, multivariate and mixed-type data, for which not a lot of other classifiers are available. We have a relatively fast method, in fact faster than most existing algorithms, because it is relatively simple and based on pattern mining. And finally, we have a classifier that produces very good accuracy results. Like I said, we are talking about averages here: to our surprise, our method did not do much worse than the best deep learning methods. It did do slightly worse, I would say it was narrowly outperformed, but we tested on hundreds of data sets and in fact on about a third of them our method outperformed the deep learning methods, which positively surprised us. And our method produced higher accuracy than any other pattern-based method. To summarize, here is a very small summary of how the method works on univariate, multivariate and mixed-type time series. I think my time is up with that, so thanks for your attention; I am happy to answer any questions.

Thank you. There is one question in the chat, actually two already, so that's big. "Can you elaborate a little on the feature vectors? How is a certain pattern chosen to be put in the matrix?"

Okay, so first of all, as I said on this slide, we are looking for discriminative patterns for each class. We mine all frequent patterns and then we see which ones are discriminative. Like I said in the earlier example, patterns that are present in all classes are not particularly useful for classification, but patterns that are present in a single class and not present in the other classes are. That is what we mean by discriminative patterns, and that is how patterns are selected.
Then we essentially have an embedding: we know which patterns represent the columns in that matrix, and we work on a row-by-row basis, checking for each time series whether those patterns are present in it or not. I don't know if it was in the scope of the question, but as to how the matrix is filled in, in other words where the values come from: like I said, we have tested a number of variants of our algorithm, which I didn't have time to go into. One variant works with exact pattern matching, so purely binary: the value is one if the pattern is present in the time series, and zero if it is not. We have also experimented with approximate pattern matching, where for each pattern we find the best match in the time series, measure the similarity between the pattern and that best match, and assign that value. The vectors are then no longer binary but contain real values between zero and one: the higher the value, the better the match.
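The approximate variant can be sketched with a sliding-window search. The `1 / (1 + d)` mapping from best-match distance to a similarity in (0, 1] is one simple assumed choice, not necessarily the one used in the paper.

```python
import numpy as np

def best_match_similarity(series, pattern):
    """Slide the pattern over the series, find the window with the
    smallest Euclidean distance, and map that distance to a similarity
    in (0, 1], where 1 means a perfect (exact) match."""
    n, m = len(series), len(pattern)
    d = min(np.linalg.norm(series[i:i + m] - pattern)
            for i in range(n - m + 1))
    return 1.0 / (1.0 + d)          # assumed distance-to-similarity map

series = np.array([0.0, 0.1, 1.0, 2.0, 1.1, 0.0])
pattern = np.array([1.0, 2.0, 1.0])

# The closest window is [1.0, 2.0, 1.1], at distance 0.1 from the pattern.
sim = best_match_similarity(series, pattern)
print(round(sim, 3))   # -> 0.909

# An exact occurrence yields similarity 1.0:
assert best_match_similarity(series, np.array([0.0, 0.1])) == 1.0
```

Filling the embedding matrix with these values instead of 0/1 gives the real-valued feature vectors mentioned above.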