Thank you very much. So I'm Jan Dreier from Aachen, Germany, and I'll be speaking about motif counting in preferential attachment graphs. If you don't know what motif counting or preferential attachment graphs are, I'll go into that shortly. This is joint work with Peter Rossmanith, my PhD supervisor.

Let's first talk about what subgraph counting, or motif counting, is. The task is: you are given a small pattern graph, we call it H, and you have a big graph G, also called the target graph, and you want to count how often H occurs as a subgraph of G. In this example we assume H has k edges and G is of size n; k is usually very small, n is usually very big, and you want to embed the pattern into the target. This is an important problem in network analysis and in biology. The setting in biology might be that you have a network of proteins, with an edge between two proteins if they interact somehow, and you might be looking for a small cycle. If you see that this small cycle occurs very often, you can deduce something about the interaction of the proteins: you might learn that those proteins have a circular dependency in how they interact with each other. Or in network analysis: if you have a social network and certain patterns occur very often, then you know something about the network. Network scientists use this as a tool, and there are many tools that do it; on Wikipedia I could find a list of eight, nine, ten tools. They work very well for small k: some tools claim to work for pattern graphs with up to eight nodes, some say up to 12 nodes. So for moderately small patterns this is a solved problem, but as soon as the pattern gets bigger, the problem gets more and more challenging.

From the theoretical side, we know that we cannot do it in time f(k) · n^{o(k)}. The trivial algorithm runs in n^k, and in this sense we cannot do fundamentally better than the trivial algorithm. In the best algorithm currently known, the exponent has been shaved down to roughly n^{0.174k}. That's the best right now.

Yes? Isn't it just the number of edges choose k, if there is some special structure in the subgraph? We assume H is an arbitrary pattern; we don't know anything about H, we just know that it has k edges. So for an arbitrary fixed pattern graph it wouldn't just be the number of edges of the graph choose k. The trivial algorithm runs in n^k, or n to the number of vertices, but we want to do better than that, okay?

So you want to compute the number of subgraphs? Yes, I want to count how often H can be embedded into the graph. What I could do is enumerate all possible mappings, check for each whether it is actually an embedding, and count it.

Can I just assume that H is a fixed graph? H is a fixed graph, yes; it's not part of the input. If you're into parameterized complexity, you can consider the size of H as the parameter, but it makes things simpler if you just assume H to be fixed.

Yes? By subgraph, do you mean induced subgraphs, or just...? Oh, we're talking about non-induced subgraphs, but the same things also go through for induced subgraphs. There isn't much of a difference, but for the sake of this talk we talk about ordinary subgraphs, all right? So the problem is very hard, but we want to solve it nevertheless, and we want to solve it on real networks.
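For reference, here is a minimal sketch of the trivial algorithm just mentioned. The names and the adjacency-set representation are mine, not from the talk; it enumerates all injective mappings of the pattern's vertices and checks each one.

```python
from itertools import permutations

def embeddings(h_verts, h_edges, g_adj):
    """Count injective maps V(H) -> V(G) that send every edge of H
    to an edge of G (the 'enumerate and check' routine from the talk)."""
    count = 0
    for image in permutations(g_adj, len(h_verts)):
        phi = dict(zip(h_verts, image))
        if all(phi[v] in g_adj[phi[u]] for u, v in h_edges):
            count += 1
    return count

def count_subgraphs_naive(h_verts, h_edges, g_adj):
    """Trivial n^{|V(H)|}-time counting: the number of embeddings divided
    by |Aut(H)|, computed here as the number of embeddings of H into
    itself, gives the number of subgraph copies."""
    h_adj = {v: set() for v in h_verts}
    for u, v in h_edges:
        h_adj[u].add(v)
        h_adj[v].add(u)
    return embeddings(h_verts, h_edges, g_adj) // embeddings(h_verts, h_edges, h_adj)

# triangles in K_4: 24 embeddings / |Aut(C_3)| = 6, so 4 copies
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(count_subgraphs_naive([0, 1, 2], [(0, 1), (1, 2), (2, 0)], k4))  # 4
```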
We theoretical computer scientists are good at efficient algorithms, and network scientists come up with different models which model the real world. What if we combine those two things? We might end up with efficient algorithms for the real world. That's our long-term goal.

So the next thing I'll present is what network scientists do when they analyze real networks. There are some key properties which they discovered. The first is the degree distribution: there are a lot of vertices with a very small degree, but a few vertices with a very high degree. You know this from everywhere: there are a lot of people who have very little money, but a few millionaires and billionaires own as much as the bottom half. This is present everywhere. Another property is the small-world property: everybody is close to everybody. You might know the six degrees of separation, that you can reach any person from any other person within six hops of friendship. A third property is clustering: if you and I have a common friend, then it's more likely that we are also friends, because this common connection might introduce us, and this creates clustering in the network.

The preferential attachment model might be the model that spawned all of network science. It satisfies the degree distribution and the small-world property, but it does not exhibit the clustering behavior, even though there are extensions of the model which have clustering. So let me explain this preferential attachment model. Not only does it have the properties of real-world networks, it also somehow explains how such a network could evolve. The idea is that you iteratively add vertices. Every new vertex introduces m new edges; you can assume m to be a constant. Each new edge goes from the new vertex to some vertex which is already in the network, and the probability of attaching to a vertex is proportional to the degree that vertex already has. So a vertex with a high degree also attracts many more neighbors in the future. We write G_n^m for the graph with n vertices and m edges per vertex.

So G_1^2 would look like this: it has two self-loops; it's a multigraph, that's allowed. Then we add one vertex and draw its first edge, and we count the degrees: this vertex has degree four, this one degree one. So the probability that this new edge attaches to this vertex is four divided by five, and the probability of attaching here is one divided by five. Then we run an experiment of chance, roll a die, flip a coin, and in this case the new vertex attaches to itself. Then we add the second edge. Now the degree here is three and here it's four, so the probabilities are three divided by seven and four divided by seven, and in this case it attaches to the other vertex. This yields an example of G_2^2. We can continue like this. One thing you might notice is that the vertices introduced very early have a high degree, while the vertices introduced very late have a very low degree. The last vertices have constant degree, while the first vertex can have degree roughly the square root of n in this model.

Okay, so this is the model we're working with, and let's come straight to our result. We show that we can compute how often H, a small pattern graph, occurs as a subgraph of G_n^m, a preferential attachment graph with n vertices and m edges per vertex, in FPT time.
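For concreteness, here is a minimal Python sketch of this sampling process. It assumes the common formalization, in the style of Bollobás and Riordan, in which the new half-edge counts toward its own endpoint's degree; that is exactly what produces the probabilities 4/5 and 1/5 in the example above, and it is what allows self-loops.

```python
import random

def preferential_attachment(n, m, seed=None):
    """Sample G_n^m: n vertices, m edges per new vertex; self-loops
    and multi-edges are kept, as in the model from the talk."""
    rng = random.Random(seed)
    edges = []
    stamps = []  # one entry per half-edge endpoint, so vertex v appears
                 # deg(v) times and a uniform draw is degree-proportional
    for v in range(n):
        for _ in range(m):
            stamps.append(v)        # the pending half-edge at v counts too
            u = rng.choice(stamps)  # attach proportional to current degree
            stamps.append(u)
            edges.append((v, u))    # u == v is a self-loop
    return edges

# G_1^2 from the talk: a single vertex with two self-loops
print(preferential_attachment(1, 2))  # [(0, 0), (0, 0)]
```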
Specifically, the running time is f(H), some function of H (so if H is constant, this factor is a constant and in O-notation it gets dropped), times m^{O(|H|^6)}, times (log n)^{O(|H|^12)}, times n. So if H and m are both constants, we get a running time of a polynomial in log n, times n. That's quasi-linear, still much smaller than n² or anything like that, while the trivial algorithm would run in n^{|H|}. So it's an improvement in running time. From now on we mostly assume that H and m are fixed, because it makes the analysis easier, and then we show that the running time is more or less polylog(n) times n. Strictly speaking we prove polylog(n) times n², but with a more careful analysis you can also get the quasi-linear running time.

Okay, and how do we do it? We present a fairly straightforward, simple algorithm, just enumeration and a little bit of Courcelle's theorem, and the remaining part is to prove that this simple algorithm actually works well. For that we... yes? We don't know the order in which the vertices were introduced into the graph, and deducing this ordering can be quite a difficult problem. In some other paper we had exactly this problem, where of course we didn't know the order.

Okay, so after we present the simple algorithm, we have to analyze it, and we do so by bounding the expected number of certain subgraphs. In order to do that, we bound the degrees of individual vertices, for example how the degree of the first vertex evolves over time. In the paper this is the hardest part; we won't go into detail on it here. We will mostly focus on the algorithm and on bounding the number of subgraphs.

Yes? So the algorithm is deterministic, and the expectation is over the graph generation? Yes, it's a deterministic algorithm, but the running time is a random variable depending on what graph you get as input. The running time might differ, but we simply take the expected value: we weight the running time on a certain graph by the probability that this graph occurs, and this gives us the expected running time. And the little m, that was the parameter m? Yes, m is the number of edges that each new vertex sends out to the vertices already in the network.

All right. The first discovery we made is that we can restrict ourselves to counting connected subgraphs only. Here's a small example to illustrate it. Let's say we count the occurrences of a specific disconnected pattern, a square together with a triangle, and we want to reduce this to counting connected subgraphs only. What we can do is simply count the number of squares and the number of triangles and multiply them. This gives us a fairly good approximation of the actual number, but we've counted a little too much: what if the square and the triangle overlap a little? This is something we have to subtract afterwards, and this is what we do here. (I actually forgot a case here, I think.) Anyway, we count all the ways in which the two patterns can overlap, and because they overlap, each such union is again a connected graph. We assume we can count those, we count them, subtract them, and this gives us the number of occurrences of the disconnected pattern. So we've made the problem a little simpler: from now on we only talk about connected patterns.
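Schematically, the reduction just described looks as follows. This is a sketch: the exact family of overlap patterns and their coefficients c_F depend on how the square and the triangle can be glued together, which is the case analysis glossed over above.

$$\#(C_4 \uplus C_3 \subseteq G) \;=\; \#(C_4 \subseteq G) \cdot \#(C_3 \subseteq G) \;-\; \sum_{F \in \mathcal{F}} c_F \cdot \#(F \subseteq G),$$

where \mathcal{F} is the finite set of graphs obtained by overlapping the square and the triangle in at least one vertex. Every such overlap is connected, so the right-hand side only involves connected patterns.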
Now let's consider this graph as an example, and say we want to count the number of squares in it. Here there are two squares: this is an induced square, this is a non-induced square, and we want an algorithm which returns the number of squares. (The algorithm generalizes to any other pattern graph.)

What we observe is that every square has a fairly small radius: from any of its vertices, every other vertex is at distance at most two. So what we can do is take, for example, this vertex, pick every vertex at distance at most two from it, and look at the graph induced on those vertices. We compute the two-neighborhood of every vertex and write them all down. That looks like this: we get n graphs, where the center of each graph is marked red, together with all vertices at distance at most two from that red center. Here are two examples, two more, and the rest is omitted for now. This is fairly easy so far. In each of these graphs we now only count those squares which contain the red center. But this way we count four times as many squares as there actually are, because every square is counted four times: once with this vertex as the center, once with this one, and so on. If we count them all, sum up, and then divide by four, we get the answer we're looking for. So from now on we can assume that we have one vertex and a small ball around it, and we simply want to count the number of squares in this small-radius subgraph. That's what we work on from now on.

The way we do it: we create a spanning tree of the ball. We start at the center and build a spanning tree, preferably breadth-first. The thick lines are in the spanning tree and the dotted lines are not. Now the key observation is that every square can be obtained by adding at most four extra edges to the spanning tree. And if we only add a small number of edges to the spanning tree, the graph remains fairly simple: a tree with four extra edges has treewidth at most five, I think. By Courcelle's theorem, on graphs of treewidth five we can solve every problem expressible in MSO, in particular counting squares. So if a graph is more or less tree-like, you can solve a whole bunch of problems, including counting the number of squares. But we don't know which of those extra edges we need to add, so we try out all possibilities. The first edge we might add is this one; we mark it red and use a linear-time subroutine to count how many squares there are in this graph. To avoid counting duplicates, we only count squares which contain exactly the red marked edges. Of course, there's still no square in this graph. No square, no square, no square. But now we find a square, and we write down a one, for example. We continue with the next graph: no square. And here we find a square, very nice. We continue: no square, no square. Okay, then we also have to try all subsets of two edges, and now we finally find this square as well. We continue for all subsets of up to four edges, and this way we find all the squares in the ball.

But this of course can take a lot of time in a general graph, and the remaining part of this talk argues that in this special model, preferential attachment graphs, the running time is actually small.
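As a toy version of this outer loop, here is a short Python sketch. It only implements the ball-of-radius-two and divide-by-four structure; the inner loop over pairs of the center's neighbors is a brute-force stand-in for the spanning-tree-plus-Courcelle subroutine of the actual algorithm.

```python
from itertools import combinations

def count_squares(adj):
    """Count 4-cycles in a simple graph given as {vertex: set of neighbors}.

    For every vertex, look only inside its ball of radius two and count
    the squares through that vertex; every square is seen once from each
    of its four vertices, so the total is divided by four at the end."""
    total = 0
    for center in adj:
        # BFS ball of radius two around the center
        dist = {center: 0}
        queue = [center]
        for v in queue:
            if dist[v] < 2:
                for u in adj[v]:
                    if u not in dist:
                        dist[u] = dist[v] + 1
                        queue.append(u)
        ball = set(dist)
        # squares through the center: pick two neighbors a, b of the
        # center and a common neighbor d of a and b other than the center
        for a, b in combinations(adj[center], 2):
            total += len((adj[a] & adj[b] & ball) - {center})
    return total // 4

# the 4-cycle itself contains exactly one square
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(count_squares(c4))  # 1
```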
So for each small neighborhood we need to bound the number of subsets of size at most four of those dotted, non-tree edges; that then also bounds the running time. We will show that there are in fact only polylogarithmically many such subsets for each ball of radius two.

Okay, and how do we do that? Here's the key observation. Take a certain subset, for example those two red edges, trace their endpoints, and walk back to the center. This gives us a small graph, which we call a leaf-free graph, because in this graph every vertex has degree at least two; there are no degree-one vertices. And it's fairly small, because every path back to the center contains at most two edges. So we have a small leaf-free subgraph, and the subroutine calls correspond to these subgraphs, so it suffices to bound the number of small leaf-free subgraphs. That's a nice observation, because now we have a property of the graph that we can bound.

So how do we bound the number of those subgraphs? For the sake of simplicity, let's assume the leaf-free subgraph we are bounding is simply a square; you can generalize to other graphs, but it's simpler this way. So let's bound the expected number of squares in G_n^m. By linearity of expectation, this later also bounds the expected running time.

Let's consider a fixed embedding. We have a timeline here: this is the first vertex, this is the last vertex of the random process, and we assume that the vertices a, b, c, and d span the square. This is one case; of course there are many other cases, roughly n^4 many ways in which the square can be embedded, and we sum over all possible embeddings. For each one we compute the probability that this specific embedding actually forms a square. Okay, so consider this embedding. What you could do is compute the probability that there is an edge between the a-th vertex and the b-th vertex of the random process, and one can show this probability is about one divided by the square root of ab. You can do this for all the other edges as well: this edge has probability about one over the square root of bc, and so forth. Then you might think you can simply multiply those probabilities. If you multiply them all, each factor occurs under a square root, but each also occurs twice, so the product of all of them would be one divided by abcd.

But of course, or maybe not of course, this is not something you can actually do. We fell into the fallacy of assuming that all the edges occur independently, so that we can simply multiply the probabilities to get the joint event. In the preferential attachment model, the presence of one edge can influence the presence of another edge, because it increases the degree of an endpoint. There are very subtle dependencies here, and for us the hardest part was bounding the probability that multiple edges between fixed vertices occur jointly in the graph. I won't go into detail on how we did it, but we managed to prove that it suffices to add a polylogarithmic factor in front, and then we get an upper bound on the probability that those edges occur jointly in the graph. So now we have a nice bound for fixed vertices, and what we do next is sum over all possible embeddings, as written out below.
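Written out, the heuristic product and the corrected bound look like this; this is a reconstruction of the calculation as presented, with the per-edge estimate hiding constants that depend on m:

$$\Pr[ab,\, bc,\, cd,\, da \in E] \;\le\; (\log n)^{O(1)} \cdot \frac{1}{\sqrt{ab}} \cdot \frac{1}{\sqrt{bc}} \cdot \frac{1}{\sqrt{cd}} \cdot \frac{1}{\sqrt{da}} \;=\; \frac{(\log n)^{O(1)}}{abcd}.$$

Summing this over all roughly n^4 embeddings, which is the next step, then collapses into harmonic numbers:

$$\sum_{a=1}^{n} \sum_{b=1}^{n} \sum_{c=1}^{n} \sum_{d=1}^{n} \frac{(\log n)^{O(1)}}{abcd} \;=\; (\log n)^{O(1)} \cdot \Big( \sum_{d=1}^{n} \frac{1}{d} \Big)^{4} \;=\; (\log n)^{O(1)},$$

since the harmonic number \sum_{d=1}^{n} 1/d is at most 1 + ln n.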
So we sum over all positions where the first vertex can be, all positions where the second vertex can be, and likewise for the third and fourth vertex, and then we plug in the upper bound on the probability, which is polylog(n) divided by abcd. Now a little bit of fifth-grade arithmetic.

This is an upper bound right now? Yes, but it's fairly tight: if you drop the log factor, it is also a lower bound, I believe. So it's essentially an equality, but yes, formally it's an upper bound, sorry.

What we do is pull everything out of the sums as far as possible: the polylog(n) can be pulled all the way to the left, the 1/a can be pulled here, and so forth. Then it looks like this, and this term here, for example, is the sum over d from 1 to n of 1/d, which is a harmonic number, so log n more or less. Every one of these sums is more or less log n, and we can absorb them all into the O(1) in the exponent. This gives a nice bound on the expected number of squares in the graph, and you get a similar bound for any other leaf-free subgraph with a constant number of vertices, or a constant number of edges, actually.

Okay, so let's pull it all together and analyze the algorithm. We have n vertices, and for every vertex we enumerate all subsets, which gives (log n)^{O(1)} many subroutine calls, and each subroutine call takes linear time. Putting it together, we get a running time of polylog(n) · n², and with a slightly more careful analysis you can actually get a running time of polylog(n) · n. So almost linear running time. All right, that's what we do.

With a similar technique you can also get slightly stronger results which are not part of the paper, but since they're related I can mention them as well. You can decide every property expressible in first-order logic; in first-order logic you can quantify over vertices with existential and universal quantifiers. In general this problem is not fixed-parameter tractable, so you cannot avoid a running time of roughly n to the length of the formula. What we managed to prove is that every property expressible in first-order logic, so for instance deciding whether a graph has a constant-size dominating set, an independent set, you name it, can be decided in almost linear time. So for fixed m, fixed ε, and a fixed property, on a random preferential attachment graph you can decide the property in time O(n^{1+ε}).
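To make the first-order properties concrete, here is a sentence of the kind meant here; it is a textbook example, not a formula from the paper. It says that the graph has a dominating set of size at most two:

$$\exists x_1\, \exists x_2\; \forall y\; \big( y = x_1 \,\vee\, y = x_2 \,\vee\, E(y, x_1) \,\vee\, E(y, x_2) \big).$$

Evaluating such a sentence naively costs about n to the number of quantifiers; the result above brings this down to O(n^{1+ε}) on preferential attachment graphs.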
Okay, so that would be all from me. Thank you very much.

I wonder how much your result depends on the fact that these graphs are very sparse; I assume that on such sparse graphs counting subgraphs is perhaps easier in general. Yes, we rely very heavily on the fact that the number of leaf-free subgraphs is polylogarithmic, which is not true in general graphs: in a general graph the number of triangles, for example, can be linear in the size of the graph. This is a drawback of the preferential attachment model, that it does not model the clustering behavior of real graphs. But we hope to generalize this to clustered graphs soon; we consider this a first step towards that.

My remark is that there are many vertices that are leaves; you could remove those, then remove the newly created leaves again, et cetera, and end up with a small graph. Are you assuming such an iterative approach, where you peel off leaves until the graph has no leaves left? It's kind of hard to say something about such a global property of the graph, because if you increase the radius, you get very large minors; you actually get a minor of size log n, a once-subdivided clique of size log n. Maybe you could try to kernelize the graph down to size log n and then run a brute-force algorithm; this might work. But the graph is really hard to analyze if you don't restrict the radius, and as soon as you restrict the radius, you get those nice properties.

Yes? In the process you define, you also get self-loops, right? Yes, that's something I didn't mention so far. In the original definition of the graph model you get self-loops, but for our subgraph counting we assume that we omit self-loops and replace multi-edges by a single edge. That's mostly for the sake of simplicity; you could generalize it to multigraphs as well.

Then the expectation would be over this new induced distribution, right? Because you can get the same graph from different multigraphs: after this pruning where you remove self-loops and multi-edges, the resulting graph can be obtained from a lot of different multigraphs. So are you taking this induced distribution or the original one? Yes, different preferential attachment multigraphs can lead to the same graph after removing multi-edges. What we do is: we sample a preferential attachment graph with multi-edges and self-loops, we remove the self-loops and multi-edges, and then we count the number of subgraphs. So the expected value is over the multigraphs, not over the flattened graphs. So that is the important point, which distribution you take.

Would the same thing work for Erdős-Rényi graphs, so graphs where every edge occurs independently? For Erdős-Rényi graphs this has been analyzed quite well already. There is a threshold phenomenon: if the edge density is subpolynomial, you can do a lot of things, but as soon as it grows to some fixed polynomial, these kinds of problems become hard. So for G(n, d/n) with d constant this would work very well, yes, and even for edge probability log n over n it would still work; but for denser graphs it would not. Thank you.
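Tying the sketches together: the flattening step described in the Q&A, dropping self-loops and collapsing multi-edges, might look as follows. The names preferential_attachment and count_squares refer to the illustrative sketches above, not to the paper's implementation.

```python
def flatten(multi_edges):
    """Turn the sampled multigraph into a simple graph: drop self-loops
    and collapse parallel edges (the neighbor sets de-duplicate them)."""
    adj = {}
    for u, v in multi_edges:
        if u == v:
            continue  # drop self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

# sample, flatten, then count; the expectation discussed above is taken
# over the multigraph process, not over the flattened graphs
g = flatten(preferential_attachment(1000, 2, seed=0))
print(count_squares(g))
```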