Our second speaker for today is Stefan Bloemheuvel. Stefan is a PhD candidate at the Jheronimus Academy of Data Science in 's-Hertogenbosch, where he started working after obtaining his master's degree in the domain of Data Science. Since his master's thesis, in fact, he has been making use of complex networks to model multidimensional data, such as relational or temporal data collected from sensor networks. These models can be useful, for example, when monitoring the strain and migration of a bridge, but also when participants of a large event are being tracked, to detect when and where information hubs are formed and how information is exchanged. This can also be an informal event, as Stefan told me, since his very first project in this area concerned exactly such a student event. Stefan, we're very much looking forward to your talk today. Okay, thank you very much. And also, thank you indeed for organizing this; I'm very excited to share my work with you all. So, there we go. So indeed, my name is Stefan Bloemheuvel, PhD candidate at the Jheronimus Academy of Data Science in 's-Hertogenbosch, and we're going to talk today about time series analysis with the help of graph neural networks. So, a small overview of the talk. First I'll explain what Euclidean data and non-Euclidean data mean, and also give a short introduction to graph data structures and the problem. Then I'll give you a small introduction to earthquakes and seismic data, because that is the domain that I'm working in. And then I'll tell you about my latest work, which is called time series analysis with graph neural networks. So, as you're all familiar with, deep learning efforts have shown great success in mining patterns in several data mining tasks. And most of these successes are on Euclidean data.
Such data contains a sort of grid-like structure, a perfect grid structure, and also a system of coordinates to work with. But what does that mean? Well, we're all familiar with sound or images, and this data follows a clear structure. On the left, you see a time series, which is, for example, a sound wave. This is ordered in a chronological manner, and every moment in time is one step ahead of the previous one and one step behind the next one. And if you look at an image, all the pixels are nicely organized in a grid. So this is the notion of Euclidean-ness. But what if we have data that does not really follow this pattern? That could be, for example, a road network or a molecule. A road network consists of all these specific roads that are connected to each other, and you can imagine that the roads in Amsterdam are much more densely connected than roads in other parts of the Netherlands. And if you look at a molecule, which is the figure on the right, the atoms can have different connections between them, and changing those connections completely changes the molecule. So this, of course, doesn't really follow a grid structure. How do you learn on this kind of data? What researchers have done is represent this kind of information with a graph, which is a data structure, also visualized here in this figure. It consists of nodes, which are the dots that you see, and also edges, which are the grey lines between the dots. On the left, you see that the nodes are numbered, so you can read off which nodes are connected. Again, this can be a road network, as I've shown, but it could also be a specific type of molecule. But there's this issue, because we're going to talk about learning on this kind of data: this data is so different from image or sound data that we need different techniques to learn on it.
And here you see a small example of how a typical convolutional neural network processes an image; I'll explain it briefly for those of you who are not familiar. You see a toy robot image on the left in a small grid box. What the figure tries to explain is that a filter slides across this image and generates features from it, to eventually funnel the data into another representation that we can learn from. And the output, for example, could be "maybe this is a toy" or "this is a robot". But there are some challenges in applying this technique to a graph, because, yeah, if we look at these molecules again, they all have different kinds of elements, they have different connections, and these connections can have different strengths. So there's this lack of consistency in the data. If you look at a set of images, you would find that most of them have the same dimensionality, in the sense that all the pictures are, say, 500 by 500 pixels. Here this would not hold. And another issue is that these graph datasets often have node-level or edge-level features, information that you might want to include in your analysis as well. So imagine this figure again, but now it's a network of cities. City one could be Den Bosch and city two could be Tilburg. If you want to calculate the estimated time of arrival when driving between them, you would want to know the distance between the cities. It would also be interesting to know that there is a traffic jam of five kilometers, so you want to incorporate that information. And if this were a social network, which you can see on the right side of the image, and you want to learn from it, you would want to know how many friends a person has, and how long he has been on this network. So, for example, his account age is eight years, and his age is 22. So how do you include this information as well?
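To make the sliding-window idea concrete, here is a minimal sketch of a 2D convolution in plain NumPy. The image, kernel values, and sizes are made up for illustration; this is just the "slide a filter and sum the products" operation the figure describes, not any specific network from the talk.

```python
import numpy as np

# Toy 4x4 grayscale "image" with a vertical edge in the middle.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# A 2x2 filter that responds to left-to-right intensity changes.
kernel = np.array([[1, -1],
                   [1, -1]], dtype=float)

# Slide the kernel over every 2x2 patch and sum the elementwise products.
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(image[i:i+2, j:j+2] * kernel)

print(out)  # the middle column lights up where the edge is
```

On a perfect pixel grid every patch has the same shape, which is exactly the regularity that a graph lacks.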
So the general question that has stuck with researchers for the last couple of years is: how can we translate this convolution procedure, this sliding of a window over an image, to graphs? Because directly, that does not work. Well, the answer is actually pretty intuitive and elegant, and it helps to think of the graph in a different manner. We therefore represent the graph as an adjacency matrix, which represents the connections between the nodes in the network. And here we see the adjacency matrix of the graph in the figure on the right, which consists of four nodes. The way you read these matrices is: there's a one if there's a connection between the two nodes, and a zero if not. So if I highlight node four, which is the last row, you can see there's no connection with node one, okay, that's clear. Then there are two ones after each other; these represent the connections with nodes two and three. And there's no connection with itself, so there's a zero there as well. Okay, so now we have this representation of the graph. What can we do with it? Well, it turns out that if you multiply the adjacency matrix with a feature matrix, where these features are just pieces of information per node, that could be the number of friends you have in a social network, or your account age, you actually get a new representation in which each node's value is just the sum of its neighbors' features. So okay, what can we do with that information? Let me first explain again how this works. Take node four: it is connected with nodes two and three. Their features get summed together, so the new representation of this node will be one plus one is two, and two plus two is four. And again, if you do this for node two in the graph, there are a few more connections, since it is connected with all the other nodes in the graph, so you sum over all of them.
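The neighbor-summing step above can be sketched in a few lines of NumPy. The adjacency matrix follows the four-node toy graph just described (node four connected to nodes two and three); the feature values are invented for illustration, chosen so that node four's aggregation works out to the "one plus one is two, two plus two is four" example.

```python
import numpy as np

# Adjacency matrix of the 4-node toy graph: row/column i is node i+1.
# Node 4 (last row) is connected only to nodes 2 and 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Two made-up features per node (e.g. friend count, account age).
X = np.array([[1.0, 1.0],   # node 1
              [1.0, 2.0],   # node 2
              [1.0, 2.0],   # node 3
              [0.0, 1.0]])  # node 4

# Multiplying A by X sums, for every node, its neighbors' features.
H = A @ X
print(H[3])  # node 4 aggregates nodes 2 and 3: [1+1, 2+2] = [2, 4]
```

Note that a plain multiplication ignores a node's own features, which is one of the things the GCN paper fixes.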
So this graph convolution can be thought of as a sort of message passing: nodes give each other information about themselves, and then you can learn from that. So okay, this is the building block of the model. Then there's a really interesting paper by Thomas Kipf and Max Welling that made this all a lot better with their graph convolutional networks, and it's a really popular paper in this field. They included self-loops in the adjacency matrix, because it's also important to include your own features. And also some normalization tricks are applied, because if a node has a lot of neighbors, some of these sums will get really high, and you don't want that, since it leads to instabilities in training. This is not the only model out there; there are many others: diffusion convolutions, edge convolutions, graph attention networks, you name it. They're all included in the Spektral package, based on TensorFlow, if you're interested. So what kind of tasks can you do with these graph neural network models? You can do node classification. So let's say that we have this network of movie enthusiasts, and we know that the green nodes like thriller movies and the orange-colored nodes like rom-coms. Given that two new people join this network and start making friends, can we predict what their favorite genre will be? Or you can do something with the entire graph itself: you could classify the label of the entire graph. Given again this molecule example, maybe we know from our training data that some molecules are toxic and some are not; could we maybe predict, from the structure of the network and the atoms that are in there, whether a new molecule is also a toxic molecule? And lastly, another type of analysis that can be performed is link prediction: given the attributes of the nodes in the graph and the connections that are already there, you predict which links are missing.
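The two tricks just mentioned, self-loops and normalization, can be sketched as a single untrained GCN-style layer. This is a toy illustration of the idea from the Kipf and Welling paper, reusing the four-node adjacency matrix from earlier; the feature and weight values are random stand-ins, not anything from the talk.

```python
import numpy as np

# The same 4-node toy adjacency matrix as before.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Trick 1: add self-loops so every node keeps its own features.
A_tilde = A + np.eye(4)

# Trick 2: symmetric degree normalization, so high-degree nodes
# don't produce huge sums that destabilize training.
deg = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

# One GCN layer: H = ReLU(A_hat @ X @ W), with W a random stand-in
# for a trained weight matrix.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 2.0],
              [0.0, 1.0]])
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))
H = np.maximum(A_hat @ X @ W, 0.0)
print(H.shape)  # 4 nodes, each now with a 4-dimensional representation
```

In practice you would stack a few of these layers and learn `W` by backpropagation, e.g. with the Spektral package mentioned above.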
So perhaps these two people, nodes four and three, should be brought together. And this is exactly what Twitter or LinkedIn do to issue friend recommendations. So this was a small introduction into graph neural network research, and I'll now talk a bit more about the kind of research I'm doing with these techniques, which is applied to earthquakes. Earthquakes are actually a really great candidate for graph neural network research, because there is an enormous amount of data already gathered at seismological stations, for over 50 or 60 years. And the sensors are geographically grounded, so we know their latitude and longitude. The data reported by the sensors is really crucial for seismologists; they use this data all the time to say something about earthquakes that occur. Typical use cases are determining the epicenter of an earthquake and estimating its magnitude, and so on. And it can also be used for early warning, and that is exactly what I did. So, to give an earthquake early warning, you have to look at the earthquake data, and earthquakes actually give a lot of hints about themselves: they send out different waves when they appear. In this picture on the left, you see that a fault is rupturing. There's an earthquake happening, and it shoots these waves into the ground. There are the faster, lighter P-waves, but also the slower S-waves and surface waves that arrive later and are far more damaging. So if we can take hints from the P-wave, we could perhaps predict how intense these S-waves will get. That's exactly what we do: by placing sensors in areas where there's a high chance of an earthquake occurring, we can notify people. So what does this look like in a more schematic view? On the left, we see a red star: this is the earthquake epicenter, and it starts to send waves to the nearby stations. These waves are visualized in the middle.
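The friend-recommendation idea can be shown with a classic, very simple link-prediction heuristic: score unconnected pairs by their number of common neighbors, which a squared adjacency matrix counts directly. This is a generic toy sketch (reusing the four-node graph from earlier), not the method Twitter or LinkedIn actually deploy.

```python
import numpy as np

# Toy friendship network: nodes 1 and 4 are not yet connected.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# (A @ A)[i, j] counts 2-step paths between i and j,
# i.e. how many friends they have in common.
scores = A @ A

# Rank all currently unconnected pairs by common-neighbor count.
n = len(A)
candidates = [(i, j, scores[i, j])
              for i in range(n) for j in range(i + 1, n)
              if A[i, j] == 0]
candidates.sort(key=lambda t: -t[2])
print(candidates[0])  # -> (0, 3, 2.0): nodes 1 and 4 share two friends
```

A graph neural network replaces this hand-made score with learned node embeddings, but the task framing, scoring absent edges, is the same.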
Some stations see the waves very soon, but most of them see the initial waves much later. And the task of early warning is that you look only at the initial seconds of this data. So we may have 25 seconds of data, but we only look at the first 10 seconds to say something about what happens in the later periods of the earthquake. And what we're trying to do is a regression task, because we want to say something about how intense this earthquake will be as measured at all the stations in our network. So we perform a regression task to predict five numbers which characterize the earthquake. And why is this important? Well, a few things. There's of course the warning that we can give to people to find shelter, but also first-responder mobilization: we can already open fire station doors for rapid deployment of fire trucks, we can notify hospitals so that medical procedures can be paused for people that are in surgery, and in mass transit systems we can prevent trains from colliding or derailing due to the shaking of the earthquake, and also clear bridges of vehicles. So that brings us to my paper, which was recently submitted: could we make use of this spatial information of the sensors to perform a regression analysis and make good predictions? This is exactly what we did. So we wrote a new paper called "Graph neural networks for multivariate time series regression with application to seismic data". And what is really cool is that I worked together with two actual seismologists from Italy, so it's a really close cross-domain collaboration, and also an international collaboration, which I always find really interesting. So by modeling the stations in the seismic network as nodes in a graph, we can apply graph neural networks on the signal data, and you can see that right here. We use two datasets that are completely different from each other.
On the left side is the first network, in a really densely connected region, where the earthquakes, shown as orange dots, are also really densely concentrated. On the right you see another network, also in Italy, which covers a larger land area, and there the earthquakes are more scattered around, so it's a bit of a tougher use case. We made a model, which is really unique actually, that first uses standard convolutional neural networks to generate features from this earthquake data. And the main contribution of our paper is that we show that you can actually use the output of the CNN layers as input to a graph neural network layer by reshaping the output of the convolutional network. Eventually you can use this for various tasks; we did a regression task, but you can also apply it to classification, for example. So concerning the results, we compared our model against a CNN baseline that was created by the seismologists, the experts from Italy, and also against traditional machine learning algorithms: k-nearest neighbors, XGBoost, random forests. And we saw great results: we had an average mean squared error reduction of 17.9% on both datasets compared to the best performing baseline, the CNN. And this is actually really remarkable, because keep in mind that the only information we added was the notion of this graph, and this is computed only once, before you do the analysis at all. So with a little bit of effort you get a really great reduction in error. And even more interesting, since of course we're trying to give early warnings, and the earlier the warning the better, we tried reducing the input window length to see how far we could go. And it turned out that we can halve the input window length of our graph neural network model while still achieving performance similar to the baseline CNN, and I hope you can see that in the figure on the right.
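The CNN-to-GNN reshaping step can be illustrated at the level of array shapes. This is only a shape sketch, not the authors' architecture: the window-averaging function stands in for the real CNN feature extractor, and the adjacency matrix is random rather than built from station coordinates; the 29-station count comes from the talk.

```python
import numpy as np

rng = np.random.default_rng(42)
n_stations, n_timesteps, n_channels = 29, 1000, 3

# Raw waveforms: one multichannel time series per seismic station.
waveforms = rng.normal(size=(n_stations, n_timesteps, n_channels))

def cnn_features(x, window=100):
    """Stand-in for the CNN: compress (timesteps, channels) into a
    flat per-station feature vector via strided window averages."""
    trimmed = x[: (x.shape[0] // window) * window]
    pooled = trimmed.reshape(-1, window, x.shape[1]).mean(axis=1)
    return pooled.reshape(-1)

# Reshape step: stack per-station CNN outputs into a node-feature matrix.
X = np.stack([cnn_features(w) for w in waveforms])  # (stations, features)

# Propagate over the sensor graph (random stand-in adjacency here;
# the paper derives it from the stations' geographic layout).
A = (rng.random((n_stations, n_stations)) < 0.2).astype(float)
A = np.maximum(A, A.T)                    # make it undirected
H = (A + np.eye(n_stations)) @ X          # one simple aggregation step
print(X.shape, H.shape)
```

The key point is that after the reshape, each station is just a node with a feature vector, so any graph layer, and then a regression head for the five target numbers, can be attached on top.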
You can see that we have such a head start that even with five seconds of data, 50% of the input, we score similarly to the CNN. So this is really crucial, since we're trying to give an early warning. In conclusion, we showed that graph neural networks excel at learning from features that have a spatial grounding, and from our paper we have seen that a graph layer can learn from sequential time series features generated by a CNN. So that is really crucial. For future work, we want to investigate the scalability of our model to sensor networks of even larger size. Now we investigated 29 stations in both these networks; we are interested to see what happens when we increase this number to 200 or even 500 nodes. And we also want to try transfer learning techniques: perhaps we can train on one network and then transfer some of that knowledge to another network, so we don't have to train all the time. That would be of really good value for the seismological community, they told us. So thank you for watching, and I'm open for questions now. Thank you, Stefan, for a very interesting talk and for a nice introduction into the domain; we appreciate it. I have a question about the potential implementation of your research and the results. So we know that these techniques can be quite computationally demanding during training, and that they can be quite demanding also when they are deployed. Now, this concerns an early warning system; how does that work in practice? Well, how our work functions in practice is that you continuously monitor the data that's generated by the sensor networks, and you supply this data in batches to the network to make a prediction. And luckily, we also saw that the model size of the graph neural network is smaller than that of the original convolutional network, so you can actually run it on very simple computers to make these predictions. So that's how this normally works: you feed it in batches.
And the second question that I would like to ask is about transfer learning. So can you imagine that the results you obtained for this particular dataset, that the trained model, could actually be used also in other domains that might have similar properties? Yes, I think so. What you could do, for example, is apply this to weather data. Weather data also has different characteristics on different time scales: within a day, the temperature changes a lot, but the pattern over a season is a longer one. So I hope that a graph neural network could capture that as well. Other domains where these techniques are actually being used include traffic networks, for predicting the estimated time of arrival. I know that Google implements this in Google Maps: given the speed data of all these people that are driving around on the roads, could you make better predictions by also using these techniques there? So a lot of companies and researchers are working on finding new domains where these techniques can be used. Let's hope we see some other creative uses as well.