I'm happy you guys picked this one. So basically, our talk today is on graph neural networks, which is machine learning on graph-structured data. But we go further than that; we go into filtering on graphs too. We try to teach you the entire workflow: how to handle tensors with PyTorch, then how to handle your graphs with NetworkX, and then we connect all of them together to get to DGL, the Deep Graph Library, for graph learning. And then at the end, we have some graph signal processing.

So effectively, since it's quite late, it's 6.30, everything you need to know about implementing GNNs will be done in the first one to one and a half hours. After that, we go into graph signal processing, which is spectral filtering on graphs, so things like band-pass filtering, low-pass filtering, diffusion on graphs, and then spectral GNNs, the spectral idea of convolution applied to GNNs. The math is more advanced for that part. So the math for this workshop will be slightly unwieldy to some degree, because it's a bit abstract, and it gets more abstract as you move further on. If you just capture the basic idea of it, that's enough; you don't have to fully understand it. Don't understand it, feel it. That's a quote from my professor.

So anyway, who are we, first of all? I'm Zayan. This is Jed, say hi. This is Mahir, and this is Kabir. We're all from NUS High School, which is, yeah, a JC in Singapore. Me and Jed are both 18, JC2, and Mahir and Kabir are both 17, JC1. Mahir and Kabir are like our helpers for today; they'll help you guys out if you need anything. Me and Jed are the ones doing all the presenting, and if you have any questions, you can ask us.

So let's get into it. First of all, what is a graph? A graph is basically a data structure with vertices and edges. The vertices are the nodes of the graph, these things here (Jed, where's your pen?), and the things that connect them are the edges, the links of the graph. So essentially graphs are made of vertices and edges, and they're meant to represent pairwise relationships in data. So for example, how do you use graphs to represent data? Take states in a puzzle, or positions on a chessboard: the states of the board are the vertices, and the moves that connect those states are the edges. You see in this case here, this is a game called a sliding puzzle; it used to be packaged with Windows. Effectively, the states of the puzzle are the vertices, and the connections, the moves between those puzzle states, are your edges. Another example is chemistry: your bonds are your edges and your atoms are your nodes, or vertices. And then you also have ER diagrams, because me and Jed are both doing a database module now; essentially your entities and your relations are your nodes, and the connections between them are your edges. So basically, there's a lot of data that can be represented via graphs.
That's why we need some way to do learning on this kind of graph-structured data, and that's where graph neural networks come in. So some example tasks for graph neural networks: one is neural combinatorial optimization. I recently did a research project where I used graph neural networks to solve the longest simple path problem, which is NP-hard for a fixed vertex size. Essentially, you can use your graph neural network to get a heuristic, and then you can plug that heuristic into some sort of search algorithm to do neural combinatorial optimization. In fact, I think there are a few cutting-edge Rubik's Cube solving algorithms right now, not as good as the A-star one, that use graph neural networks. So we can use it for neural combinatorial optimization; it's been widely used on the travelling salesman problem, and there's a lot of research on that if you guys want to check it out. Then chemistry: classifying molecules as toxic or non-toxic, looking at the molecular structure and guessing its properties. That can also be a graph learning task. Then traffic and networking problems. I guess this is kind of like neural combinatorial optimization too, but essentially: what's the shortest path from one place to the next, how can I get there in the fastest time?

So we structured the talk in a few ways, because we didn't really know the average level of the people coming to this talk. We're going to do a short recap on artificial neural networks, deep neural networks, and a bit of linear algebra, like matrices and vectors, because you'll need a bit of it later on. Then graph theory, and the implementation of that graph theory with NetworkX. Then we're going to cover spatial GNNs, which are GNNs that use spatial convolution operators, which we'll explain later. Then we'll talk about the different existing architectures for GNNs, the best ones that people use a lot. And then we're going to do the implementation of the graph neural networks in DGL. So it's basically theory, code, theory, code, theory, code. At that point, if you guys want to leave, we understand, because it's quite late. Oh yeah, and this is being recorded, so you can catch the rest later. Because it's quite late, we put the really unwieldy content, the graph signal processing and the spectral GNNs, at the end. I think I mentioned this earlier, but you might want to stick around for that; it's quite interesting, in fact it's my favourite part of the talk. But we understand that you came here for GNNs, so if you just want to know how to implement GNNs from start to finish, you only need to stay up till here. So with that, let's get into the content.

First, we're going to recap artificial neural networks. As you guys know, neural networks are vector-to-vector functions. So for example, this is an R4 to R1 function, a four-value vector to a one-value vector. You feed in samples of input data and output data, and the network has trainable parameters that are fitted to that data. Those trainable parameters are the weights and biases of the neural network. So if you look at a single neuron of a neural network, you guys probably already know this.
Essentially, you have the trainable parameters, the weights and the biases, those are here, and those are trained to fit the data. Your x is your input data. So a neuron has inputs and outputs, and then you activate it with a certain activation function; the point of the activation function is to introduce nonlinearity into your model.

To represent neural networks in a useful way, we use linear algebra, so I'll just do a quick primer on it. Linear algebra is made up of three things: scalars, vectors, and matrices. So this is a column vector, a vector with its values in a column; essentially it's a way of storing numbers. For example, this is an R4 vector because it has four values. And then matrices are n by m; this is a 4 by 3 matrix. Matrices can represent linear equations, linear relations. So for example, this simultaneous equation right here can be written in linear algebra terms.

So how do these things relate to each other? How do you map them together? To multiply a matrix by a vector, you take the first row of the matrix and the vector, and you element-wise multiply and sum. So if I want the first value of the resulting vector, I take W11 times x1 plus W12 times x2, and that becomes my first value. You're multiplying every row of the matrix by the column of the vector, basically. Then there's multiplying a row vector by a column vector, which is the same thing. And that extends to matrix-matrix multiplication, where you start off with W11 times X11 plus W12 times X21, and that's how you get the first value, and so on for every row-column pair. That's multiplication for matrices.

There are some rules about matrix operations that will be relevant later on. Addition and subtraction are element-wise: if I have a vector (a, b) and I add a vector (c, d) to it, the resulting vector is (a + c, b + d). Multiplying a vector by a scalar is also element-wise: if I have a vector (a, b) and I multiply it by some scalar 6, the resulting vector is (6a, 6b).

Then there are a few special matrices you should know about. One is the identity matrix, which is like the number 1 in higher dimensions: the entire diagonal of the matrix is 1s and everything else is 0. Then there's matrix inversion. For certain kinds of matrices, there exists a matrix such that when you multiply it by the original matrix, you get back the identity matrix of that size. That's the matrix inverse, and it can only be done for square matrices, matrices with the same number of rows and columns. You can see on the slide that multiplying these two matrices gives back the identity. You'll need that a lot later on, even if you can't see why yet.
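Just to make the linear algebra concrete, here's a small PyTorch sketch of the operations just described; the specific numbers are made up for illustration.

```python
import torch

# matrix-vector multiplication: each row of W is dotted with x
W = torch.tensor([[1., 2.],
                  [3., 4.]])
x = torch.tensor([5., 6.])
print(W @ x)            # tensor([17., 39.])  -> [1*5 + 2*6, 3*5 + 4*6]

# element-wise addition and scalar multiplication
a = torch.tensor([1., 2.])
b = torch.tensor([3., 4.])
print(a + b)            # tensor([4., 6.])
print(6 * a)            # tensor([ 6., 12.])

# the identity matrix behaves like the number 1
I = torch.eye(2)
print(I @ x)            # same as x

# the inverse undoes the original matrix (square, invertible matrices only)
W_inv = torch.linalg.inv(W)
print(W_inv @ W)        # approximately the identity matrix
```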
The final thing you have to know is transposition, which is basically flipping the values of the matrix across the main diagonal. So I flip the values across this diagonal: 5 and 0 swap, 3 and 2 swap, 2 and 0 swap. That's matrix transposition. You can also do it for non-square matrices: you look at the top-left value, the first value of the matrix, draw the diagonal from there, and flip across that. So for example, if I have a simple column vector (a, b), its transpose is just the row vector (a, b). So basically, transpose means flipping across the main diagonal.

The original motivation behind vectors and matrices was that vectors are n-dimensional arrows from the origin, and matrices are transforms which scale and rotate those vectors. So for example, we have a vector (1, 2) here, and when we multiply it by this matrix, it transforms the vector in a certain way: it rotates the vector, and it also scales it. That's the geometric idea.

So how can this be applied to neural networks? Sorry for the mathematical detour. Essentially, we can represent a 4-input, 2-output layer with a 4 by 2 weight matrix, because there are 4 inputs and 2 outputs, and every input-output pair gets a weight. Then you can compute your neural network layer as a matrix-vector multiplication: this is your vector of inputs, x1, x2, x3, x4, you multiply it by the weight matrix, and you get your output vector. That's the benefit of the linear algebra view. The activation function is then just some nonlinear function, in this case sigmoid, but you can use any nonlinear function.

So the whole idea of an ANN is that it's a vector-to-vector function: you put in a vector, you get out a vector, and your trainable parameters are your weight matrices and your bias vectors. The weight matrix transforms you from one layer to the next, then you add a bias term, and then you apply the activation. I know this may seem unnecessary for now, but we use linear algebra in the GNN part as well, both for the computation and for the motivation behind how spatial GNNs work, so this is just a primer.

The reason neural networks work is because they have a large number of trainable parameters, which makes them flexible enough to fit a lot of data, and because repeated linear layers with activations in between can mimic nonlinearity; the activation adds nonlinear character to the model. What do I mean by this? I was going to give you a high-level explanation of the universal approximation theorem, but instead I'll just show you this graph. Here's an example of some data; the ideal fit would be this circular nonlinear function. What the neural network does is essentially add multiple line segments, multiple linear sections, that together mimic this nonlinearity. So by adding many lines together, we can get this nonlinear circle.
So even though it's not actually a circle, by having multiple line segments it mimics the properties of the circle. To measure the error in a function, we use a loss function; commonly this is the L2 norm or the L1 norm. And then to calculate the updates to the actual weights, we use backpropagation. But backpropagation is handled by the library, so you don't have to worry about it; I'll show you in a second how we do it.

So now you guys can try out some notebook work if you have your laptop here. Essentially, it's just showing you how you can use PyTorch to solve a simple linear system. Jed is opening the document now. OK, so this is the ANN notebook. As I mentioned earlier, you can use linear algebra to represent a linear equation. In this case, your unknowns w, x, y, z are ordered as a column vector, and your coefficients become a matrix; I just encoded it like that. Your y is your ground truth, and your x is what you're trying to find: we're trying to solve for w, x, y, z in the linear system. We can write the error as an equation like this; the double bars squared is the L2 norm, and it's basically our error function. So what we're doing is taking our coefficient matrix multiplied by our trial values of w, x, y, z, minus the actual values of what Ax should be, and looking at the error between those two terms; then we minimize that to solve the system.

So you see, for now I just set w, x, y, z to random values with torch.randn, and I set my y tensor to 9, 4, 24, negative 12; this is your ground truth data. Then we can fit it very simply, like this. The thing is, for backpropagation you need to run for a certain number of iterations, or epochs. You can't get the ideal value for the weights immediately; you have to converge towards it, so you have to loop over it a certain number of times. Here it loops 1,000 times, and at every step I calculate the current output of the system, which is very simple: torch.matmul(A, x). Then the loss is just the comparison between our guess and the true value, and then we call optimizer.step. The optimizer I use is the stochastic gradient descent optimizer, and I only feed in the variables I want to optimize; I could feed in all the tensors here, but the optimizer will only optimize tensors with requires_grad set to true. So I run my model that way, and I get the values for the unknowns. After 1,000 iterations, I got 1, 2, 3, 4 for w, x, y, z; I designed the equation myself, so that's the right answer.
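Here's a minimal sketch of that loop. The coefficient matrix and target here are made up for illustration (chosen so the solution is 1, 2, 3, 4), not the exact numbers from the notebook.

```python
import torch

# illustrative coefficient matrix and target: A @ [1, 2, 3, 4] = y
A = torch.tensor([[2., 1., 0., 0.],
                  [1., 3., 1., 0.],
                  [0., 1., 4., 1.],
                  [0., 0., 1., 5.]])
y = torch.tensor([4., 10., 18., 23.])

# start from random guesses; requires_grad so the optimizer can update them
x = torch.randn(4, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.01)

for step in range(1000):
    optimizer.zero_grad()
    loss = torch.sum((torch.matmul(A, x) - y) ** 2)  # squared L2 norm of the residual
    loss.backward()                                   # backpropagation, handled by PyTorch
    optimizer.step()

print(x)  # should converge towards [1, 2, 3, 4]
```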
So defining neural networks in PyTorch works kind of the same way. You need to have an initialization and a forward method. Your forward method is your forward pass: getting from your original input vector to your output vector, the layers you feed it through. And your init is your initialization step for your weights and so on. So for example, here's the final model that we built. We define our layers like this; the Sequential step is effectively a way to group layers together, and it just lists what we're doing. We have a 784-value input, then we map it to 512 nodes, then we add a ReLU activation function, then we take those 512 nodes and reduce them to 256 nodes, then ReLU again, then one final linear layer down to 10, and then a final softmax activation to get our output values. We can define all of that with nn.Sequential, though you could also just apply the layers one after another yourself. And nn.Linear is just your weights times x plus bias, Wx + b. So yeah, that's effectively PyTorch. That part might have been a bit boring for you guys, because you probably already know about ANNs, but we wanted to go through the full package. Now we're going to get into the newer content. So, Jed: graph theory.

Hello. So now I'm going to talk about graph theory. What's a graph? As Zayan mentioned, a graph has nodes, and then you have edges between the nodes. You can view the nodes as the elements, and the edges are the connections between them. There are two types of graphs: you can have undirected graphs, where you can traverse an edge in either direction, and you can have directed graphs, where the edges only point in one direction, so you can only traverse them one way.

Now we have the idea of the degree of a node. The degree of a node is basically the number of connections to that node. So for example, this node here has two connections, so its degree is two; this one also has two; but node four has only one connection. We can represent this in a degree matrix, which puts the degree of every node along the diagonal: entry (1,1) is two, that's the degree of node one, and you just continue along the diagonal. The next thing is the adjacency matrix, which represents the graph by the connections between the nodes. So for example, there's an edge between node 1 and node 2, but maybe there's no edge between 1 and 4. You go through the matrix: at (1,2) there's a connection so you put a 1, at (1,4) there's no connection so you put a 0, and so on, and that fills up the entire matrix.

So now I'll go through the notebook. There's a library called NetworkX that lets us create and work with graphs. First, you import the library like this, import networkx as nx, and do g = nx.Graph(). This creates an empty graph. OK, so while we're waiting for this to run, I'll go through the next part. To add stuff to this graph, you can add nodes: you can add a single node like this, g.add_node(1), and you can also add nodes from any iterable container, so things like lists or sets or tuples, anything you can loop over. And you can also add nodes along with attributes for those nodes, in this manner.
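For instance, a minimal sketch of those basics; the node values and attributes here are made up, and it jumps slightly ahead to edges so the degrees and adjacency matrix aren't all zero.

```python
import networkx as nx

g = nx.Graph()                      # an empty undirected graph
g.add_node(1)                       # a single node
g.add_nodes_from([2, 3, 4])         # nodes from any iterable
g.add_node(5, color="blue")         # a node with an attribute
g.add_edges_from([(1, 2), (1, 3), (2, 3), (3, 4)])

# the degree and adjacency structures just described
print(dict(g.degree()))             # {1: 2, 2: 2, 3: 3, 4: 1, 5: 0}
print(nx.adjacency_matrix(g).todense())
```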
And if, for example, you already have another graph and you just want to add its nodes to this graph, you can just do g.add_nodes_from(h). You can even add a graph as a node within another graph, because in this library your nodes don't have to be integers or some simple data type: you can add anything that is hashable as a node. If you think of Python dictionaries, you can use anything hashable as a key; it's the same thing for graphs, you can use anything hashable as a node. This is quite convenient, actually, because for example if you want to store a graph of documents, maybe documents citing each other, you can very easily do that by adding the documents directly as nodes.

You can also grow the graph by adding edges. You can add edges one at a time like this, or you can add a list of edges, so you put edges in an iterable container like a list or a set and just add them. You can also add edges from another graph: instead of doing add_edges_from(h), you do h.edges to return that graph's edges, and you add them to the other graph. You can also clear the graph so that it becomes empty again by using .clear(). And when you add edges or nodes that already exist, NetworkX will just ignore the fact that they already exist; it won't raise an error. So as you can see here, now we have eight nodes and three edges. We can also do something like this: there are graph constructors, so you don't always have to start from an empty graph, you can start with some nodes and edges already inside, and then build up your graph from there.

Next, we can analyze the elements of a graph. The four basic graph properties that let you do this are g.nodes, g.edges, g.adj, and g.degree. Nodes and edges just return the nodes and edges; g.adj returns the adjacency of the nodes, so for example it will give you the list of nodes that are connected to node 0, then node 1, then node 2, and so on. And degree returns the number of edges connected to a certain node, which is just the degree of that node. You can also report the edges and degree from a subset of the nodes using these same functions.

Next, we can remove elements from a graph. Other than adding, you can also remove, and you remove them by their value: if you added node 2, you can just remove 2, in the same way that you access elements inside a dictionary by giving the key. If you remove an element, it can leave nodes that are no longer connected to the rest of the graph, so you get a disconnected graph; NetworkX won't handle that for you, it will just leave it like that, so you need to be careful when you modify a graph. You can also build a graph object incrementally: for example here, we create a DiGraph using the connections from g, and you can list its edges. You can also create a graph directly from an edge list, or directly from an adjacency list, like here, where 0 is connected to 1 and 2, 1 is connected to 0 and 2, and 2 is connected to 0 and 1.
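Putting a few of those pieces together, roughly (the values are made up):

```python
import networkx as nx

g = nx.Graph()
g.add_edge(1, 2)                        # one edge at a time
g.add_edges_from([(1, 3), (2, 3)])      # edges from an iterable

print(list(g.nodes))    # [1, 2, 3]
print(list(g.edges))    # [(1, 2), (1, 3), (2, 3)]
print(g.adj[1])         # neighbours of node 1
print(g.degree[1])      # 2

# building a graph directly from an edge list, as described above
h = nx.Graph([(0, 1), (0, 2), (1, 2)])
print(list(h.edges))    # [(0, 1), (0, 2), (1, 2)]
```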
So I previously mentioned this: nodes and edges can be any hashable object, not necessarily just integers or strings or any simple data type. Anything that you can use as a key to a dictionary, you can also use as a node of a graph. And very conveniently, you can also use subscript notation to access the elements of a graph. For example, I've created a graph here, and you can access the node and edge properties directly. This here returns the nodes which are adjacent to node 1, and this returns the attributes of the edge that connects nodes 1 and 2. As you can see, it returns color yellow, which is what we put in at the start. You can also use this to set attributes: here we take that edge and set its color to a different color, red, and when you look it up later, it's red now.

Other than this, we can also very quickly examine node adjacency pairs using g.adjacency() or g.adj.items(). You can do this because g.adj is basically just a dictionary: the node values are the keys, and each one stores a mapping to its neighbouring nodes, so you can loop over it like you loop over a dictionary. You can see here we're looping over all these items and changing the attributes. And you can also very conveniently access all the edges using the graph's .edges property.

So now, how about adding attributes? Other than adding attributes to the nodes and edges, you can also add them to the graph itself. Here, for example, I added an attribute, day equals Friday, to this graph, and you can modify it later the way you modify a dictionary: you just do g.graph with the attribute name equals its new value. (Question from the audience: if you add an attribute to the graph, does the graph itself change? No, the attribute is just something assigned to the graph; it doesn't change the connections.) Here you can also see we're modifying the attributes of the nodes, and here this modifies the attributes of the edges. The reason we might need attributes is that, for example, a node or a certain edge might have a certain kind of property that we care about. Maybe we care about the weight of the edge between two nodes, and maybe we want to find the shortest path between two different nodes; so each edge can store its weight, and later you can use that to do your calculations. And it's the same idea for attributes on nodes and on the graph in general.
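A small sketch of that attribute handling, plus edge weights actually being used by an algorithm (the weights are made up):

```python
import networkx as nx

g = nx.Graph()
g.add_edge(1, 2, color="yellow", weight=4.0)
g.add_edge(2, 3, weight=1.0)
g.add_edge(1, 3, weight=6.0)

print(g[1][2])                   # {'color': 'yellow', 'weight': 4.0}
g[1][2]["color"] = "red"         # attributes can be reassigned like dict entries
g.graph["day"] = "Friday"        # attributes on the graph itself

# weighted shortest path from 1 to 3: goes via 2 (weight 5) instead of direct (weight 6)
print(nx.shortest_path(g, 1, 3, weight="weight"))   # [1, 2, 3]
```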
Next, there's also a class called DiGraph, which I mentioned earlier. It provides methods and properties that are specific to directed graphs. The Graph class we were using works with undirected graphs, and directed and undirected graphs are different: for a directed graph the definition of degree is different, because you have an in-degree and an out-degree, which count the incoming and outgoing edges. And some algorithms only work for directed graphs, while others aren't well defined for them. So you should try not to lump directed and undirected graphs together; if you want to convert, you can use Graph.to_undirected().

Next, NetworkX also provides a class, MultiGraph, for graphs that allow multiple edges between any pair of nodes. This might be relevant if you're looking at chemical bonds, where maybe you want two edges between two atoms to represent a double bond. You can see here that we're using a MultiGraph and we've added multiple edges between two nodes. And here you can see we're using a function provided by NetworkX, shortest path, to find the shortest path between two nodes.

Next, it will also make your life a lot easier if you don't have to construct all the graphs from scratch all the time, so NetworkX provides some graph generator functions: you can apply classic graph operations, call up one of the classic small graphs, or use a constructive generator for a classic graph. You can see here we have all these different generators. For example, this generates a complete graph; this generates a complete bipartite graph, where you have nodes on two sides and every node on one side is connected to every node on the other side. And we have some other constructors too. These generators aren't always deterministic, either: some of them generate random graphs, and we call those stochastic graph generators; these are some examples.

In addition, NetworkX supports reading and writing graphs in many popular formats: edge lists, adjacency lists, GML, and all these other file formats you might want to store a graph in. You can see here that we've used the write_gml function, and it writes this graph out to a file path, and then you can read it back later and the graph will be reconstructed.

Next, you can also use various graph-theoretic functions to analyze the graphs. What this means is, for example, you can find the clusters inside the graph, or you can look at the connected components. (Can you repeat that? Sure: you can have multiple disjoint graphs inside one graph object. It's like what we were talking about earlier: for example, nodes 0, 1, 2 could be connected to each other but never connected to nodes 3, 4, 5, which are also connected among themselves. So essentially it's like two graphs existing in one graph object, and we can use NetworkX to find those components and count them.) For functions with a large output, you can also iterate over the results and store them in a dictionary to make your life easier; for example, here we've run one of these analyses and stored the results in a dictionary for easy access later.
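A quick sketch of generators, saving/loading, and connected components (file name and graph sizes are made up):

```python
import networkx as nx

# classic and stochastic generators
k5 = nx.complete_graph(5)
bip = nx.complete_bipartite_graph(3, 4)
rnd = nx.erdos_renyi_graph(10, 0.3)        # a stochastic (random) generator

# write to GML and read it back
nx.write_gml(k5, "k5.gml")
k5_again = nx.read_gml("k5.gml")

# connected components: two disjoint pieces living in one graph object
g = nx.Graph([(0, 1), (1, 2), (3, 4), (4, 5)])
components = {i: c for i, c in enumerate(nx.connected_components(g))}
print(components)    # {0: {0, 1, 2}, 1: {3, 4, 5}}
```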
So finally, we can also draw graphs using matplotlib. NetworkX doesn't provide these drawing functions natively, but you can use matplotlib to draw them. For example, here we've taken the graph and drawn it with labels, and this argument is for the labels of the nodes. There are also different ways of laying it out. For example, here you can draw the nodes in a shell layout so that the edges don't overlap, whereas over here we've just drawn it randomly, so the nodes are positioned wherever and the edges are drawn even if they cross each other. There are more ways to position the nodes: for example, you can position the nodes in a circle and draw the edges between them. These are some other ways you can draw graphs. And finally, you can save the drawing to an image for future use; here we've saved this graph to an image file which you can open. You can also write the graph to dot format, although I didn't run that in this case.

OK, let me switch back to the slides. So now I'll talk more about graph neural networks. Basically, we can split tasks broadly into two types. There are inductive tasks, with a lot of small graphs, where for example you want to classify what type of graph each one is. In machine learning you split your data into train, test, and validation sets, and here you perform these splits simply by putting whole graphs into each split, so you're treating each single graph as one whole unit. An example of this might be molecular classification: maybe I want to classify whether a certain molecule is pungent or poisonous. Then there's another type of task, called a transductive task, where we just have one very, very big network. For example, this would be friendship networks or citation networks, where the network is just one huge graph, and what you want to do is, given that you can see a certain portion of the network, predict the properties of another part of the network. So during training, you mask parts of this graph and make predictions at those places: for example, maybe I mask some of these areas and then ask the network to predict whether there is an edge there. An example of this is friendship networks.

In addition, before we can just run a neural network on this graph, we know that neural networks only work on vectors, so we want to somehow turn the graph into vectors. What we do is provide embeddings for nodes and edges: each node and edge is assigned a vector, which we call an embedding. An embedding, or a feature, can be anything. For example, if the edges are single bonds or double bonds in a chemistry graph, you can use one-hot encoding to provide a categorical feature, although it doesn't have to be one-hot encoding; you can basically assign any vector there.
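As a tiny illustration of what those embeddings might look like (the bond types and sizes here are made up):

```python
import torch

# hypothetical edge feature: one-hot encoding of the bond type for each edge
bond_types = ["single", "double", "single", "triple"]   # one entry per edge
type_to_index = {"single": 0, "double": 1, "triple": 2}
edge_feats = torch.nn.functional.one_hot(
    torch.tensor([type_to_index[b] for b in bond_types]), num_classes=3
).float()
print(edge_feats.shape)          # (4 edges, 3 features)

# node features don't have to be one-hot; any vector works
node_feats = torch.randn(5, 8)   # 5 nodes, each with an 8-dimensional embedding
```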
OK, so now we're going to go into the first type of graph neural network, which is the spatial GNN. First, let's talk about why we can't use existing neural network architectures on our graph data. The first question is: why can't we use a standard multi-layer perceptron, the plain deep neural network we covered at the start of the presentation? The answer is that MLPs fail to account for the structural information of the graph. As Jed was mentioning earlier, we can have these graphs with features associated with each node, but if we just feed those features into a plain neural network, it's not going to work well, because it ignores the structure of the graph. Say there was a storm in this city over here: the likelihood of a storm in this neighbouring city and this one is higher than in, say, that city far over there. So we need to account for the connections in the graph, but an MLP, which only looks at a node's own vector and not at the vectors of its neighbours, can't account for that information. That's why we can't use standard MLPs.

So the next thing people say is: oh, we should use convolution. Convolution means looking at regions of data, spatial coherence. Instead of looking at one individual pixel at a time, you're looking at clusters of pixels in an image. However, the reason we can't apply convolution, at least the CNN kind in a direct sense, is that our graph data is non-Euclidean. It is not ordered. Even though this example looks like a regular grid, each of the nodes in a general graph has a different number of neighbours: this corner node has two neighbours, this middle node has four neighbours, and so on. Because the graph is non-Euclidean, we can't have one standardized kernel that fits over the whole graph. So we need a way to apply the idea of convolution to graphs.

Our basic structure will be kind of like a CNN: we need a way to convolve the node features, and then, if we want, we can add skip connections to prevent things like the vanishing gradient problem. That's how we're going to build our GNN. The idea for spatial GNNs is that we want to pass signal data around the graph: we want to share information between neighbours, and of course we want the idea of spatial coherence to be maintained. So we look at another area of math where signal diffusion, passing signals around, is already happening: time convolution.

Essentially, this is an example of data spread across time: x0, x1, x2, x3. We define a shift operator, and the shift operator basically shifts the signals down: it moves the signals through time, or diffuses the signals through time. You can apply multiple shifts: shift squared is two shifts. This is how signal diffusion through time works in signal processing; this is time shifting. Then what we do is take all these time-shifted inputs, multiply each of them by some coefficient, and sum them up, and that's our time convolution operator. This is an nth-order localized time filter: it accounts for signals across n time steps.
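Here's a small NumPy sketch of that idea, assuming the signal lives on a directed line graph; the filter coefficients are arbitrary.

```python
import numpy as np

# signal at four time steps
x = np.array([1.0, 2.0, 3.0, 4.0])

# shift operator for a directed line graph: entry (i+1, i) = 1 moves x[i] to slot i+1
S = np.array([[0, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)

print(S @ x)        # [0, 1, 2, 3]   one shift
print(S @ S @ x)    # [0, 0, 1, 2]   two shifts (S squared)

# an nth-order convolution filter: a weighted sum of shifted copies of the signal
h = [0.5, 0.3, 0.2]                       # arbitrary filter coefficients
y = sum(h[k] * np.linalg.matrix_power(S, k) @ x for k in range(len(h)))
print(y)
```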
So the time convolution operator captures signal diffusion through time, and you can use it to build filters on that diffusion. The question is how we can move this idea of time convolution, of signal diffusion, into the graph domain. What we do is replace time with a line graph: instead of points in time, we assign each point in time a node, and every shift operation just moves the node features down the line. x0 sits on node 0, then we move it to node 1, and so on. That's our shift operation. But what is the shift operator in this case, for this line graph? The key insight is that the shift operator is actually the adjacency matrix of the graph itself. It might take a moment to see it: the connectivity between this node and the next node is 1, and that 1 means we're shifting the signal onto that node. So this is where the linear algebra comes in: we take our adjacency matrix, multiply it by our signal, and the signal moves down the line.

That's the idea of how we move from time convolution to graph convolution: we realize that the shift operator is just the adjacency matrix. And once we have that, we're no longer stuck with a line graph; we can adapt it to any kind of graph, where the shift operation is its adjacency matrix. That gives us the general graph convolution filter used in graph signal processing; it's an nth-order graph signal diffusion convolution. Essentially, what each shift operation is doing is diffusing the signals throughout the graph. We're passing information around, and that's the idea of message passing: by sharing information between nodes, each node contains context about its neighbours, and if we feed that through a neural network or some other model, we can get a better output, a better understanding of what's going to happen. Now we know that if there's a storm in a neighbouring city, that information gets passed to us; but if I'm a node really far away, it's going to take a lot of shifts for that information to reach me. Because it takes many shifts, we get a localized, spatially coherent message passing operation for graphs.

At this point it might look really unwieldy, but the way information gets passed around is actually very simple. Now we adapt this general convolution operator for GNNs. The thing is, neural networks already have multiple layers, so instead of stacking many shifts into one filter, we just take a single shift per layer. Then we adapt it to multi-dimensional node features: as I said, a node feature can be an entire vector, not just a single value, so we take the feature vector of each node and organize them as the rows of a matrix. Now we also want to transform these vectors, to add some trainable parameters, so we again just use linear algebra and multiply a weight matrix at the back to transform the signals. So essentially we're taking the signals, transforming them, and then diffusing them throughout the graph.
And that's our new layer. This looks like a big complex matrix equation, but here's what it's doing. If you look at this node here, it's taking information from this node and this node, summing them up, and multiplying by a shared weight matrix, the same weight matrix used at every node. Now you can probably see a problem here: the original signal of the node itself is lost. So what we can do is add a self-loop to every node, which means replacing our adjacency matrix A with A + I. Now we're not just looking at our neighbours' features, we're also looking at our own feature, and we multiply by a weight matrix to transform it to a different node vector size, and that's basically our layer. If we want to account for, say, the weight of the edges, we can just add the edge weight data here: we scale each transformed neighbour feature by its edge weight. So this is the high-level mathematical picture: we're summing and passing things around. Jed will explain how the message passing works in a more visual sense; I just wanted to give you a mathematical understanding of the general idea.

We can then generalize this further to account for different kinds of aggregation. In this case I used a sum aggregation, but you can use a max aggregation or any aggregation function you want. The questions are: how am I going to sample the nodes around me, how am I going to aggregate them, how am I going to transform my neighbours and combine them with my original feature? And instead of using just a simple weight matrix, I can use an entire linear layer to do that transformation; all of those pieces can be made learnable. So essentially what each node does is look at the nodes around it, take information from them, multiply it by some trainable weights, add it to its own transformed feature, apply some neural network activation, and that gives its new value. That way, information is shared throughout the entire graph, and we don't just do it for one node, we do it for every node in the graph, so information is shared homogeneously across the graph. That does lead to some problems, which I'll talk about later, but for now, just understand that that's how message passing works.
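Here's a minimal dense-matrix sketch of the layer just derived, (A + I) X W followed by an activation. This is only for intuition; a real implementation would use sparse operations or DGL, which we get to later.

```python
import torch

def simple_gnn_layer(A, X, W):
    """One spatial GNN layer: diffuse features over the graph, then transform them.

    A: (N, N) adjacency matrix, X: (N, d_in) node features, W: (d_in, d_out) weights.
    """
    A_hat = A + torch.eye(A.shape[0])        # self-loops so each node keeps its own signal
    H = A_hat @ X @ W                        # sum neighbour (and own) features, then transform
    return torch.relu(H)

# tiny example: 3 nodes, 2 input features, 4 output features
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
X = torch.randn(3, 2)
W = torch.randn(2, 4)
print(simple_gnn_layer(A, X, W).shape)       # torch.Size([3, 4])
```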
So now Jed will carry on and give a more high-level overview of the GNN layer. Basically, for this whole graph you have embeddings for every vertex, embeddings for every edge, and an embedding for the whole graph in general. The idea of a GNN layer is that you're going to process all of these embeddings. You can see over here that we've applied a function to them; this function is usually an MLP, which we talked about earlier. But if you just do that and nothing else, it's not very optimal, because you can't really communicate information between different nodes or different edges.

So now we have the idea of message passing, which is that we're going to pass information between nodes and between edges, and basically allow the embeddings of the graph, the nodes, and the edges to influence each other. We do this thing called pooling. For example, say we're updating this green node over here. What we do is take the nodes next to it and apply a pooling function; as I mentioned, it can be a lot of things, like a maximum or a mean. Then you combine the pooled result with the embedding of the current node using another function, and combining these two gives you a new embedding. You can visualize it like this: we take the embeddings of these two neighbours, combine them with the edge embedding, and do this for each of them. This forms a basic GNN layer, with its pooling and update functions. The exact structure of your GNN layer will usually depend on the task you want to solve, like whether it's node classification or edge prediction.

So there are several types of GNN layers. The first is called Neural FP. The idea is that nodes with different degrees are of different importance, so maybe we should pool differently if a node has a different number of connections. You can see here that we use a different function for pooling based on the degree of the node, which is the number of connections to that node. That's what forms Neural FP.

Now there's another type of graph neural network, called GraphSAGE. The idea here is that, if you think about it, it's not just the nodes right next to you that are important; nodes farther away can also be somewhat important. So what we do is look at neighbourhoods at different distances k from the node. For example, you can see here we have the initial blue neighbourhood, then this orange one, and we keep expanding outwards. The idea is that you have a pooling function Pk for each neighbourhood and a neural network to process the pooled values. We iteratively combine the inputs: you pool the neighbours first, combine them with the centre node, and then you keep doing that again and again, updating as you go. And for Pk, instead of a fixed pooling function like minimum, maximum, or mean, we can use a learned one: you can use machine learning to train the pooling function itself, for example with MLPs or LSTMs. But the issue with this is that a lot of these models don't have the property of permutation invariance, which is what we want in a pooling function, because we don't want the order in which the nodes are fed into the pooling function to affect its output. So to rectify this, during the training process we randomize the order of the nodes being fed into the function. This prevents the function from becoming biased towards any of the nodes and treating the order in which they are fed in as important.
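A rough sketch of that GraphSAGE-style update for a single node, with a mean pooling function standing in for the learned Pk; the node indices and sizes are made up.

```python
import torch
import torch.nn as nn

d_in, d_out = 8, 16
W_self  = nn.Linear(d_in, d_out)     # transforms the node's own embedding
W_neigh = nn.Linear(d_in, d_out)     # transforms the pooled neighbourhood embedding

def sage_update(h, neighbours, node):
    """h: (N, d_in) embeddings; neighbours: list of neighbour indices of `node`."""
    pooled = h[neighbours].mean(dim=0)                   # permutation-invariant pooling
    new = torch.relu(W_self(h[node]) + W_neigh(pooled))  # combine self and neighbourhood
    return new

h = torch.randn(5, d_in)                         # 5 nodes with 8-dimensional embeddings
print(sage_update(h, [1, 3, 4], node=0).shape)   # torch.Size([16])
```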
Finally, we'll also talk about graph attention networks. Typically, in your pooling function, you might assign the same weight to each of the neighbouring nodes, but some nodes are more important than others. So what we use here comes from this paper called 'Attention Is All You Need', which talks about transformers and their use as language models. They introduce a type of layer called the attention layer. This allows models to attend to different parts of a sequence, to say that this part of the sequence is perhaps more important, and the model will then use that part of the sequence more in its predictions. How we use this on graphs is that you have an attention function, you give it the embeddings of the two nodes, and it calculates the weightage of the edge between those two nodes. And we only perform what we call masked attention, which means we only perform this computation on edges that actually exist; so for example, we won't calculate an attention weight between the green node and the blue node, because there's no edge there.

So now, spectral GNNs. Or rather, I won't really talk about them here, because we're going to cover them later in the optional part of the talk. Essentially, in spectral GNNs, instead of calling them node features, we call them graph signals. And if you guys know electronics or signal processing, filters there often don't operate on the data directly: they do spectral filtering, so for example low-pass, band-pass, high-pass, getting a specific channel using a band-pass frequency filter. So what you do is get a spectral representation of the graph and do the filtering on that. It's kind of like how in convolution you can take the Fourier transform of your image, apply your convolution kernel on that spectral version, and then take the inverse transform. In the same way, you have to define a graph Fourier transform, which we'll go through later. But if you're implementing GNNs at a high level, you won't really need to know what the graph Fourier transform is, so don't worry about it. Essentially, spectral GNNs have a learnable spectral filter, kind of like how a convolution has a learnable kernel, and that's where the name comes from.

So in general, your entire graph neural network model is made up of multiple GNN layers, like the GraphSAGE layer we were talking about, or GCN, or whatever graph neural network layer you use. And there are three tasks you mainly use GNNs for: node classification, edge classification, and graph classification. Node classification is your standard task of classifying the nodes: if I have a network of cities, will there be a storm in that city in five days' time?
So we're classifying the nodes, which are the cities, so it's a node classification task. Then there's the edge classification task: for example, on road maps, you're trying to find which edges are part of the shortest path and which edges are not, so that's an edge classification task, because you're classifying your edges. And then graph classification, like classifying whether an entire molecule is toxic or not. To do all these tasks, we basically still use the same GNN layer architecture. For edge classification, what we do is look at the two nodes attached to an edge, combine their features together, and feed that to a neural network to classify the edge. For graph classification, we take some aggregation of all the node features. So it's still the standard GNN layers, which are node-feature based, and then we add the edge or graph classification module on top.

So now we're going to go into the notebook with DGL, but I just want to ask: do you guys need a break or anything, like a five-minute break? I've just been talking straight. Do you guys have any questions about the content so far? OK, then I guess we can go straight in; we'll finish a lot faster if you guys are OK with no breaks. So now we're going to do the actual implementation of a graph neural network with the Deep Graph Library. I'll get Jed to help me open it up, and you guys can open up the notebook at the tinyurl link on the slide.

OK, so basically DGL is currently one of the three most used Python graph deep learning libraries, which are DGL, PyTorch Geometric, and Jraph. I don't like Jraph because it uses its own machine learning backend that's not PyTorch or TensorFlow. PyTorch Geometric is based on PyTorch, and DGL can have either a PyTorch backend or a TensorFlow backend. Now, I don't like TensorFlow; I find it unwieldy, and it's not as widely used in research. Nowadays when you tell people you use TensorFlow, they laugh at you a bit, sorry if you guys already use TensorFlow, but people are mostly using PyTorch now, so we'll stick with the standard. I should have run this earlier, but I put the install link for DGL at the top of the notebook; if you guys need the link again, I'll show you. It's a rather large library, but the Wi-Fi is pretty fast, so it's fine.

So I'll move on. We're going to take our PyTorch stuff from earlier plus our NetworkX stuff, some data processing libraries, NumPy and Pandas, and then of course PyTorch and some DGL modules. We're going to look at datasets with DGL and how you can build GNNs with DGL. Also, if you guys want to take pictures, I'll give you a link to all of this information; you can just approach any of us for the link, or we'll show it at the end as well. OK, let me run the imports.

So first, graph representation. As we did in the NetworkX part earlier, a lot of people do all their graph processing in NetworkX, and then we want to move the graph into DGL for training. The function you use to get a graph from NetworkX into DGL is from_networkx. It's a very easy conversion, which is why a lot of people use NetworkX together with DGL; PyTorch Geometric actually has the same feature.
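Roughly, that conversion looks like this (the triangle graph here is just an example):

```python
import networkx as nx
import dgl

nx_g = nx.Graph([(0, 1), (1, 2), (2, 0)])

# NetworkX -> DGL (DGL stores the undirected edges in both directions)
g = dgl.from_networkx(nx_g)
print(g)

# DGL -> NetworkX comes back as a directed graph; convert if you need it undirected
back = dgl.to_networkx(g).to_undirected()
```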
It's a very simple conversion. You can also do to_networkx, which takes your graph from DGL and moves it back to NetworkX. Now, this has a funny property: it automatically returns a directed graph, so even though your graph was undirected, with edges in both directions, you get a directed graph out of to_networkx, and you have to convert it back to an undirected graph yourself afterwards.

You can also define graphs directly in DGL if you're lazy and want to skip the NetworkX step. DGL uses edge lists: for your graph, you put in your source nodes and your destination nodes. So here you're telling DGL there's a link between 0 and 1, 0 and 2, 0 and 3, 0 and 4, 0 and 5: between the nth value in the first list and the nth value in the second list. You can also use PyTorch long tensors as your edge inputs, so you can just run that. And if you don't want to specify the number of nodes, you can omit that as well. If I draw this graph out from DGL, you can see it's a directed graph: 0 is connected to 1, 0 is connected to 2, 0 is connected to 3, 0 is connected to 4. You can then access the nodes and edges by doing g.nodes() and g.edges(); in DGL these return PyTorch tensors, and for edges it returns the source nodes and the destination nodes.

So now, how do we add features to the graph? This part is mostly for show, you don't really have to do it this way, but what a lot of researchers and people who use DGL do is assign data to the graph with g.ndata. I can set some label, say 'feature', and assign features under it; in this case I'm assigning random features to the graph. These are my node features: for every node I have a vector associated with it, and every row of this tensor is the feature of the corresponding node. This is the node feature for node 0, this is the node feature for node 1, and so on; the order of the rows maps onto your g.nodes() output, so it's pretty self-explanatory. Then it's the same for edges: you do the same thing but with g.edata, and here I assign features to all my edges, and it matches up with your g.edges() output. So for example, this value right here, 0.7262831, is the feature associated with edge 0 right here.

So now, datasets with DGL. DGL already provides a lot of datasets. One example is the Cora graph dataset. It's a transductive task: it's one large graph of a bunch of research papers which all cite each other. It's a graph because the links between papers are the citations, and the actual papers are the nodes. The input vector for each paper is just a 0/1 word vector: it tells you whether a given word in some 1433-word dictionary appears in that paper. For example, if the word 'graph' is given index 30, and the word 'graph' is in that paper, then position 30 of that paper's feature vector will be 1; if it's not in the paper, it will be 0. So it's just 1433 commonly used words from the whole dataset. And the output is what type of paper it is; it's classified into 7 kinds of machine learning papers. DGL comes with a lot of built-in datasets like this, and essentially a dataset is a list of graphs, so since Cora is only one graph, you can just do dataset[0] to access that graph. You can view the ndata and the label data of the graph; you'll see the label data is not one-hot encoded in this case, it's just a class index for the 7 classes.
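A small sketch of those pieces together; the edge lists and feature sizes are made up, and loading Cora will download the dataset the first time.

```python
import torch
import dgl

# a graph from source/destination edge lists: edges 0->1, 0->2, 0->3, 0->4
src = torch.tensor([0, 0, 0, 0])
dst = torch.tensor([1, 2, 3, 4])
g = dgl.graph((src, dst), num_nodes=5)

print(g.nodes())    # tensor([0, 1, 2, 3, 4])
print(g.edges())    # (tensor([0, 0, 0, 0]), tensor([1, 2, 3, 4]))

# node and edge features: one row per node / per edge, in g.nodes()/g.edges() order
g.ndata["feature"] = torch.randn(5, 3)
g.edata["weight"] = torch.rand(4)

# a built-in dataset: Cora is a single citation graph
dataset = dgl.data.CoraGraphDataset()
cora = dataset[0]
print(cora.ndata["feat"].shape)    # (2708, 1433) bag-of-words features
print(cora.ndata["label"][:10])    # class indices, not one-hot
```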
want to define your own dataset, you need to overload four functions: __init__, process, __getitem__, and __len__. __getitem__ just returns the i-th graph; process is where you generate or load the graphs from your raw data; __len__ just returns the number of graphs in your dataset; and __init__ is initialization — you don't really have to put anything there if you don't want to. So you just overload these four functions, put your graphs in some list like self.graphs, and access them from there. It's pretty self-explanatory. Now, DGL actually provides implementations of the commonly used GNN architectures, like SAGEConv, GATConv, GraphConv, and ChebConv. GraphConv, the GCN layer, is essentially the go-to GNN right now, but it's a spectral GNN, so I'll explain that in the spectral section. Before we show you how these different architectures work, though, we're going to build our own GNN layer, just to show you how easy it is in DGL. We'll define some graph data here and use some features of DGL to do the combining for us. The first one is called g-SpMM, generalized sparse-dense matrix multiplication, and what it does is your message passing step. An example of this generalized function: say you want to multiply by edge weights and then sum-aggregate. Currently we have three nodes with some given connectivity and some given features, and we want to do message passing over them: multiply each incoming feature by its edge weight, then sum them up. So look at this node here with feature (2, 3): it's only connected to itself, and the weight of that edge is just 1, so the resulting node feature is 1 × (2, 3) = (2, 3). Now look at this node here: there are two nodes pointing to it, this one and this one, so we take each of their features, multiply by the corresponding edge weight, and sum them up. Essentially, g-SpMM does the multiply-by-edge-weight and the sum step for you in one fast function. Then there are the g-SDDMM functions, generalized sampled dense-dense matrix multiplication — essentially these are what you reach for when you want a custom message function or a custom aggregation function. A lot of GNNs just use multiply and sum, and in that case the standard g-SpMM path is enough, but some use more complex steps, more complex message functions, and then you can go through g-SDDMM or user-defined functions. So here I define my own reduce function, named after myself: I sum up the neighbour messages and just add one — instead of a plain sum, I add one, just as an example. And instead of putting the actual values in here, I'm using the dgl.function module. I can define it in terms of generic node and edge data — remember the 'feat' and the weight keys I defined earlier, this is where I use them — so I can do fn.u_mul_e: take the source node features, multiply them by the edge weights, and store the result under a message name in the mailbox. It just creates a new tag in the mailbox where it stores those multiplied values, and then the reduce function takes that mailbox value and pools it in some way, with sum plus one.
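A minimal sketch of that multiply-and-sum message passing and the "sum plus one" reduce, using DGL's built-in functions; the key names ('feat', 'w', 'm', 'h') and the toy graph are my own assumptions:

```python
# Message passing with DGL built-in functions, plus a custom reduce that adds one.
import torch
import dgl
import dgl.function as fn

g = dgl.graph((torch.tensor([0, 0, 1, 2]), torch.tensor([0, 1, 2, 2])))
g.ndata["feat"] = torch.tensor([[2., 3.], [1., 1.], [4., 0.]])
g.edata["w"] = torch.tensor([[1.], [2.], [0.5], [1.]])   # one weight per edge

# Built-in message + reduce: multiply source feature by edge weight, then sum.
# This is the g-SpMM pattern (multiply-and-sum) done in one fused call.
g.update_all(fn.u_mul_e("feat", "w", "m"), fn.sum("m", "h"))
print(g.ndata["h"])   # node 0 keeps 1 * (2, 3); node 2 sums its two weighted messages

# Custom reduce: same message function, but we sum the mailbox and add one.
def sum_plus_one(nodes):
    return {"h": nodes.mailbox["m"].sum(dim=1) + 1}

g.update_all(fn.u_mul_e("feat", "w", "m"), sum_plus_one)
print(g.ndata["h"])
```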
This is really good when you're defining your GNNs, because then you can just write out these functions, call the update_all propagation function with your message and reduce (pooling) functions, and it automatically updates all the nodes in your graph for you. The more direct way is to assign everything yourself, but with this it's done for you automatically, and in practice a lot of GNN implementations go through this dgl.function route rather than calling the raw kernels directly. You can see that ndata has already been updated with the new pooling function, which adds one. So here I made up my own GNN architecture idea. It looks very complicated, but it's similar to what I was describing: we take all the neighbour nodes, multiply them by some weight, subtract them from our original feature multiplied by some other weight, then multiply by some constant — which is just the number of neighbours this given node has — then add some bias, and then apply some activation. The way we code this — I'm naming it after BBCS — is: we have two weight matrices, two nn.Parameters for the weights, each of shape in_features by out_features. Since it's a 5-to-4 GNN layer, a 5-vector to a 4-vector, we have 5-by-4 matrices. So you define your two weights, weight1 and weight2, then you define your bias with nn.Parameter as well; reset_parameters is just how you initialize the parameters, and activation is there if you have an activation function. Now we can code it like this: we have our message (aggregation) function here for this step of the layer. But the thing is, some data doesn't have edge weights, so if there is edge weight data we multiply by the edge weights, and if there's no edge weight data we just copy the source node data — that's how we define our message function. Also, instead of using ndata and edata, here we use srcdata and dstdata; for a normal graph they're effectively the same thing, just with different names for the convenience of the reader, and used inside a local scope like this it won't actually update the graph — it's all local to this function, which is why it's used a lot. This step here is just calculating this term: a standard matrix multiplication with the weight; and for this term we take the features again and multiply by the other weight. Then, once we have our message function, we can define the update: it takes in our message function and our pooling (reduce) function — as I said, the messages are stored in the mailbox, and we're storing the aggregated value under 'h'; it aggregates with this function right here and stores the result in 'h'. Finally we put it all together: we take this term, this term, and this term, add them up to get our final result, and if we have an activation function we apply it, otherwise we just return the value. So it's very simple to define spatial GNNs, or really any kind of GNN, with DGL: you're using these g-SpMM / g-SDDMM-style functions, which do your message passing and your reduction automatically. And the reason it uses this mailbox technique is that it allows some optimization — it reduces the time complexity of the operation and makes it faster — so that's a good part about it.
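Below is a sketch of what such a custom layer can look like as an nn.Module. The exact formula on the slides didn't survive transcription, so this implements a layer in the same spirit (weighted neighbour sum, subtracted from the transformed self feature, scaled by the in-degree, plus bias and activation) rather than the speaker's exact architecture; the layer name and keys are mine:

```python
# A toy custom GNN layer in DGL, in the spirit of the one described above.
import torch
import torch.nn as nn
import dgl
import dgl.function as fn

class ToyLayer(nn.Module):
    def __init__(self, in_feats, out_feats, activation=None):
        super().__init__()
        self.weight1 = nn.Parameter(torch.empty(in_feats, out_feats))  # for neighbours
        self.weight2 = nn.Parameter(torch.empty(in_feats, out_feats))  # for the node itself
        self.bias = nn.Parameter(torch.zeros(out_feats))
        self.activation = activation
        self.reset_parameters()

    def reset_parameters(self):
        nn.init.xavier_uniform_(self.weight1)
        nn.init.xavier_uniform_(self.weight2)

    def forward(self, g, feat, edge_weight=None):
        with g.local_scope():                       # keep ndata/edata changes local
            g.srcdata["h"] = feat
            if edge_weight is not None:
                g.edata["w"] = edge_weight
                msg = fn.u_mul_e("h", "w", "m")     # weighted neighbour features
            else:
                msg = fn.copy_u("h", "m")           # unweighted: just copy neighbours
            g.update_all(msg, fn.sum("m", "agg"))   # aggregate into dstdata['agg']
            deg = g.in_degrees().clamp(min=1).unsqueeze(1).float()
            rst = deg * (feat @ self.weight2 - g.dstdata["agg"] @ self.weight1) + self.bias
            return self.activation(rst) if self.activation is not None else rst

# Usage: a 5-to-4 layer, as in the talk.
g = dgl.rand_graph(10, 30)
layer = ToyLayer(5, 4, activation=torch.relu)
out = layer(g, torch.randn(10, 5))
print(out.shape)   # (10, 4)
```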
So now that you know how to make your own GNN layer, you can define composite models with different GNN layers. Here I'm using the GraphConv layer I talked about earlier, and I can apply these the same way I would in PyTorch: I take the graph and the input features, the layer returns some value, I feed that through the ReLU activation function, and then I apply the convolution layer again. So I can define a composite model with multiple layers — this one has two — and then I just pass in the sizes I want: how many input features, how many hidden features, how many output features. Now I'm going to run node classification on the Cora dataset, and the Cora dataset comes with built-in train, validation, and test masks. So what I do is first define some way to measure the accuracy of the model over time, and then run it for some number of epochs. The logits are what the model outputs: as I said, the model outputs seven values, scores for which class the node most likely belongs to, but our labels are just integers for the seven classes, so this argmax function gives us the index of the maximum value, and that's our actual prediction. Then we compute our loss, which is basically the error in this case, and all of the rest is just ways of measuring training, validation, and test accuracy in one place. So if I run this and then run my model — you'll need to run the cell with the model definition again, because the model definition is local to the cell you declared it in, which is why that bug happens — you see it starts training, and over time the loss decreases and the accuracy increases, which is the goal. So that's node classification, where we classify the nodes. Now Jed is going to talk about link prediction, edge classification, and graph classification. So now I'm going to talk about link prediction and edge classification. The idea is that I have a graph with some nodes and edges, and I want to predict whether there is an edge between two given nodes. First we load the dataset — we're using the same Cora dataset as before. Of course, we split the edge set for training and testing: we keep the positive edges, we also sample negative edges, and we remove the test edges from the training graph, because those are the edges we want to predict. For this we'll use the GraphSAGE model; there are some blanks in the notebook to fill in. There are two convolutions we need to insert, and then we call the activation function. We can use the SAGEConv function — it's imported already — and we put in the input features, the hidden features, and say we want the mean aggregator; it's basically the same thing for the second convolution, except now it goes from hidden features to hidden features. Then, in the forward part, we call the first convolution, then we call an activation function — it doesn't necessarily have to be ReLU, we could use something like Swish as well, but in this case we choose ReLU — and then we call the second convolution. You can see that this forms the same GNN structure as before. Then we construct the positive and negative graphs for the training and test sets.
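For reference, here is a sketch of how the filled-in blanks typically look, following the standard DGL link-prediction pattern; the hidden sizes and names are assumptions:

```python
# GraphSAGE encoder plus a dot-product edge scorer for link prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl
import dgl.function as fn
from dgl.nn import SAGEConv

class GraphSAGE(nn.Module):
    def __init__(self, in_feats, h_feats):
        super().__init__()
        self.conv1 = SAGEConv(in_feats, h_feats, aggregator_type="mean")
        self.conv2 = SAGEConv(h_feats, h_feats, aggregator_type="mean")

    def forward(self, g, in_feat):
        h = F.relu(self.conv1(g, in_feat))   # first conv + activation
        return self.conv2(g, h)              # second conv gives node embeddings

class DotPredictor(nn.Module):
    """Scores an edge by the dot product of its two endpoint embeddings."""
    def forward(self, g, h):
        with g.local_scope():
            g.ndata["h"] = h
            g.apply_edges(fn.u_dot_v("h", "h", "score"))  # one score per edge
            return g.edata["score"].squeeze(1)
```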
So the idea is that after we apply all the GNN layers, every node has an embedding. How are we going to decide whether there's an edge between two nodes? We could use a more complex function, but in this case we just take the dot product of the two embeddings, and that gives you a score; based on that score, you decide whether there's an edge there or not. So here you see what we do: we apply the dot product between the two endpoint embeddings, and this applies that function across all the edges. With that, we can construct the final model and the training loop. You can see here the function for the loss and the AUC score: we're running the model, computing the loss, and doing back-propagation in order to optimize the weights of the model. You can see very nicely that the loss goes down — it hasn't fully converged yet, but we'll just stop it here for now — and then we compute the AUC score, which measures how well the model predicts the presence of an edge. You can see it obtains quite a high score, about 0.87. Next is graph classification. For this we'll use the PROTEINS dataset, which is a dataset with a whole bunch of protein graphs, so we'll wait a while for it to download. What happens here is that we're running the data loader: this gets a sample of the dataset, and we use GraphDataLoader, a utility provided by DGL to make it a lot easier to load graph datasets. Here we're constructing the batches — in machine learning we almost always use batches, because it's very difficult to do back-propagation across the entire dataset at once, so you split it into multiple batches, compute the gradients over a single batch, and use those to optimize the model's weights. This is just showing the nodes and edges of the graphs inside each batch. So now we can actually build the graph convolutional network — this is technically a spectral GNN, even though we haven't covered the spectral part yet. We can use dgl.nn, which already provides the GraphConv layer: the first convolution takes in_feats and h_feats, and the second one takes h_feats. Then we call the first convolution and, similarly, the activation function, so you can see that the structure is very similar to the previous GNN layers. Finally we have the pooling function at the very end, and in this case we're using mean pooling, so it's taking the mean of all the node embeddings in the graph. Then we can train the model, and you can see we're doing the same thing: we compute the loss and do back-propagation across every batch. If you give it some time to train, it will output a test accuracy at the very end; you can see here it attains a test accuracy of about 26% — it's pretty low, but we're just trying to show you the idea.
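A compressed sketch of that graph-classification pipeline, assuming the PROTEINS copy that ships with dgl.data.GINDataset (the notebook may load it differently):

```python
# Graph classification: GraphDataLoader for batching, GraphConv layers, mean pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl
from dgl.nn import GraphConv
from dgl.dataloading import GraphDataLoader
from dgl.data import GINDataset

dataset = GINDataset("PROTEINS", self_loop=True)           # list of (graph, label) pairs
dataloader = GraphDataLoader(dataset, batch_size=32, shuffle=True)

class GraphClassifier(nn.Module):
    def __init__(self, in_feats, h_feats, n_classes):
        super().__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, h_feats)
        self.classify = nn.Linear(h_feats, n_classes)

    def forward(self, g, feat):
        h = F.relu(self.conv1(g, feat))
        h = self.conv2(g, h)
        with g.local_scope():
            g.ndata["h"] = h
            hg = dgl.mean_nodes(g, "h")      # mean-pool node embeddings per graph
            return self.classify(hg)

model = GraphClassifier(dataset.dim_nfeats, 16, dataset.gclasses)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for batched_graph, labels in dataloader:
    logits = model(batched_graph, batched_graph.ndata["attr"].float())
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
```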
Okay, so that's basically everything you need to know for GNNs, but for fun we're going to talk about some spectral things. We're actually running quite early — do you guys want any breaks, by the way, anything you need? Okay, then I'll continue. So now: spectral GNNs and spectral filtering on graphs. This is the more theoretical part, and we're going to simplify it because, as I said, it's a bit ugly. So essentially, what is spectral filtering? It's filtering data in the spectral domain: you get a frequency representation of your data, and then you define filters on it. An example use case of spectral filtering: nowadays we use FM radio, but in the old days we used AM radio. AM radio is an amplitude-modulated radio wave, and all the stations transmit at different frequencies. Every broadcasting station is sending out its own AM radio waves, and the thing is, waves interfere, so in the end your radio receives this really noisy signal made of all the AM waves at once. But what we want is our one specific channel to listen to. So we need a way to extract the frequencies that make up this combined wave and pull out the specific frequency of the radio channel we're listening to. This is called band-pass filtering, and I'll show you what it is, but essentially, before we can do any kind of spectral filtering, we need a way to get the frequencies of the signals. We need a spectral representation, and this brings us to an idea that is a bit complicated: the Fourier transform — the sort of thing you meet in an introduction-to-engineering class. There was this guy named Joseph Fourier who was trying to solve the heat diffusion equation, and he realized that any signal can be broken down into sinusoidal waves — sine, cosine, e^{ix}-type waves. What the Fourier transform does is take a signal and give you the waves that make up that signal, as peaks at those frequencies. So the spectral representation is like the plotted frequencies: if we have this plot here, three different waves plus a bit of noise, you'll see peaks at the frequencies of those waves. For example, if this wave has a frequency of 1 hertz, there will be a peak at the 1-hertz point; if that wave has a frequency of 2 hertz, there will be a peak at the 2-hertz point. The transform that takes you from this time representation to this frequency representation is called the Fourier transform. For a single pure sine wave it produces just one peak — actually a spike to infinity, and you'll see why that is later. But if you have a more complex wave, like the sinc function, sin(πx)/(πx), which looks like this, it's actually made up of a whole band of continuous waves centered around zero, so its spectrum looks like a square block, because it's made up, to some degree, of every wave inside that band. So how does the Fourier transform work? The equation looks quite unruly, but the e^{-iωt} term — just think of it as a sinusoidal wave — and the ω, the omega, is effectively your frequency term: the higher your omega value, the faster the wave oscillates. So what the Fourier transform does is take your entire signal, multiply it against waves of increasing frequency, and integrate.
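For reference, the transform pair being described, written in the usual convention:

```latex
\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt,
\qquad
f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\omega)\, e^{i\omega t}\, d\omega .
```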
For example, say we have the function cos t + cos 3t. If we write it in e^{iωt} notation, the frequencies present are just ω = 1, −1, 3, and −3. Now look at the integrand f(t)·e^{-iωt}: when we take the e^{it} term and multiply it by e^{-it}, we get 1, so we're just integrating 1 over the entire line, and that's infinite — which is why there is a spike exactly at the frequency associated with each component. The reason I want you to understand how the Fourier transform works is that we have this Fourier transform for the time domain, and the question will be how we can have a Fourier transform for graphs. There's also the inverse Fourier transform, which takes the Fourier transform and gives you back the signal. And then there's the discrete Fourier transform, which is a way of doing the Fourier transform over a discrete space. As I mentioned, convolution done naively is expensive — for images it's on the order of n to the power of 4 — but convolution can be done really fast if you convert the data to the spectral domain, that is, you take a Fourier transform first; the convolution then speeds up to roughly O(n² log n). One thing to know about the discrete Fourier transform is that it's circular in nature: if you have a peak at 1, you also have a peak at n − 1, because negative frequencies don't exist in the discrete setting — the indices always run from 0 to n. So a peak at 2 is paired with a peak at n − 2, whereas in the standard Fourier transform a peak at 1 is paired with a peak at −1. So now I'm just going to show you how you do that in code — I didn't attach the link here earlier, but you can look at this notebook. This is just the discrete Fourier transform, and I'm only using the standard NumPy and NetworkX libraries to show you this. Okay, so I'm going to run this discrete Fourier transform on a few signals and show you what you get. If I run it on this simple sine function — a sine function at 1 hertz, sampled from 0 to 4π — you see there's a peak at 1 and a peak at n − 1. Now, the Fourier transform lives in the complex domain, so there's a real and an imaginary component to it, but they're essentially parallel for our purposes: they just tell you the polarity, whether it's a negative sine or a positive sine, and for our level we don't have to look at that. If you look at a smooth function like this one, you'll see that it's made of more low-frequency waves and fewer high-frequency waves. So essentially, by looking at the Fourier transform of a function, you can see what kind of frequencies the wave is composed of: in this case we can tell the signal is made of many low-frequency waves and only very small high-frequency components — you can see this from the plot of the real part as well.
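A tiny NumPy version of that experiment (my own toy example, not the notebook's exact code):

```python
# The discrete Fourier transform of a pure sine, showing the circular pair of peaks.
import numpy as np

fs, duration = 128, 1.0                       # 128 samples over 1 second
t = np.arange(0, duration, 1 / fs)
x = np.sin(2 * np.pi * 3 * t)                 # a 3 Hz sine

X = np.fft.fft(x)                             # complex spectrum, length 128
magnitude = np.abs(X)

# The frequency-bin spacing is 1/duration = 1 Hz, so the 3 Hz component shows up
# at index 3, and (because the DFT is circular) its mirror appears at index n - 3.
print(magnitude.argsort()[-2:])               # -> indices 3 and 125
print(np.fft.fftfreq(len(x), d=1 / fs)[3])    # -> 3.0 Hz
```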
So the thing is, we can view our Fourier transform as a matrix operation as well. This is called the discrete Fourier transform matrix: it's a matrix that does exactly the same thing as the Fourier transform function. As I said, if you do it like this it's O(n²), but the fast Fourier transform can do the same thing in O(n log n); I'm just showing that you can express it this way. And essentially it's the same idea: a bunch of increasingly frequent waves. You see the first row is very smooth — it's just constant one, there's no oscillation at all — and as we go down the rows they become more and more oscillatory. (I think the mic died, so I'll just speak really loudly.) With this we can see what our DFT is made of: as I said, increasingly frequent waves. The first row of our discrete Fourier transform matrix is really smooth, but as we go down it becomes less and less smooth; the later rows are still waves, just much more rapidly oscillating ones. So now I can apply the band-pass filtering I was talking about earlier. We have this complex wave made out of 10, 15, and 20 hertz sinusoids, and we apply a 13-to-17 hertz band-pass filter to it. What we do is take the Fourier transform of this signal, and our band-pass filter is defined to be zero at all these points, zero at all these points, and a hump — an upward curve — over the points from 13 to 17, as shown here. So when you multiply this with this, we isolate the specific frequency band we want, and that's how we get the specific frequency we need for our radio channel: we start from a complex wave made out of all these different waves, but we recover the specific wave we want using this band-pass filter. We take the spectral representation, ignore the frequencies outside the band, keep just the band we want, and transform back. So now I'm going to go back to the slides. Now that you know about the Fourier transform on normal signals, we're going to apply the Fourier transform to graphs. If we can define a Fourier transform on graphs, we can do spectral filtering on graphs, and I'll show you an example of that as well. For a graph Fourier transform, we can't just use some simple modification of the existing Fourier transform, because our data is non-Euclidean — it can have any sort of connectivity. What we need is something similar in spirit: lower frequencies at the top, higher frequencies at the bottom — here I'm just showing it as a matrix — but this notion of frequency now has to be defined on the graph signal itself, so there should be higher-frequency "waves" and lower-frequency "waves" on the graph. Essentially, we need a way to measure the smoothness, or the frequency, of graph data, and for that we need two tools: the Laplacian matrix, and eigenvectors and eigenvalues.
So what is an eigenvector and an eigenvalue? As I said, matrices are transforms of vectors: they rotate and scale vectors. An eigenvector-eigenvalue pair is this: if you multiply the matrix by the eigenvector, it only scales the vector, it never rotates it — the vector stays on its own span. So we take this vector here, multiply it by this matrix here, and all it does is scale the vector, make it longer or shorter, but it never rotates it. That's what an eigenvector is. For this matrix here, [[0, 1], [−2, −3]], we have two eigenvectors, (−1, 2) and (−1, 1), and the associated eigenvalue is just the factor each one is scaled by (here −2 and −1). There are n eigenvalue-eigenvector pairs for an n-by-n matrix. The reason we need eigenvectors and eigenvalues is eigendecomposition: because the matrix times an eigenvector equals the eigenvector times a scalar, we can write the whole matrix in terms of its eigenvectors and eigenvalues — each column of this matrix is an eigenvector, this diagonal matrix holds the eigenvalues, and this last matrix is just the inverse of the eigenvector matrix. Why is this useful in linear algebra? Because taking powers of matrices is very computationally expensive: you have to multiply the entire matrix by itself, and that's O(n³) — or O(n^{log₂ 7}) with Strassen, but just think of it as n cubed. What you can do instead is take the eigendecomposition, raise the diagonal matrix of eigenvalues to whatever power you want (which is cheap), and then multiply back by the eigenvector matrices to get the power of the original matrix. That's the value of eigendecomposition, and I'll show you how we use it later. For the matrix from earlier, I ran the eigendecomposition to show you, and one thing that matters a lot here: the eigenvalues are sorted from smallest to largest — this is very important. So now this brings us to the Laplacian matrix. It's the degree matrix of the graph minus the adjacency matrix. For example, node 1 has degree 2, node 2 has degree 3, node 3 has degree 2, node 4 has degree 1 — that's the degree matrix — and then there's the associated adjacency matrix, and we take the degree matrix and subtract the adjacency matrix to get our Laplacian matrix. Why is it called the Laplacian? Because the Laplacian is also the second-derivative operator, and it's a way of measuring convergence and divergence. If we have a graph signal and we apply the graph Laplacian to it, we can measure how converging or diverging that signal is. For example, take the Laplacian matrix, look at this graph here, and multiply it by some signal f: for this node right here, we multiply its value by its degree and subtract the values of its neighbours, so we're measuring the convergence and divergence — how each node compares to the nodes around it. Why this is useful: if you take the transpose of your signal, multiply by the Laplacian matrix, and multiply by the signal again, what that expression does is compare every node with its neighbours and sum all of those comparisons up. Really rough, high-frequency data, not smooth at all, gives a very high value, while very smooth data, where neighbouring values are close to each other, gives a very low value.
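Written out for an unweighted, undirected graph, the objects just described are:

```latex
L = D - A, \qquad
(Lf)_i = d_i f_i - \sum_{j \in \mathcal{N}(i)} f_j, \qquad
f^{\top} L f = \sum_{(i,j) \in E} (f_i - f_j)^2 .
```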
So essentially this is our measure of smoothness. Now that we have a measure of smoothness on graphs, I can show you here: this graph signal right here is very smooth, because it's all continuous — there's only one zero crossing, the signal varies gradually across the graph, the data is smoothly distributed. This graph right here, you can see the data is all uneven and jagged, it jumps around a lot, so that's very rough. Now I'm going to show you these graph signals with code, back in the same notebook. The way I plot these signals on the graph is something I wrote myself, because it's a very niche task, but essentially you take in a graph and you can plot positive and negative signal values across its nodes. This is the smoothness measurement from the Laplacian quadratic form mentioned earlier. If I look at these signals right here, it's an even signal — the differences between neighbouring values are very small — so it's very smooth data, and if I run the smoothness measurement on it, it has a smoothness of zero; the closer the value is to zero, the smoother it is. But if I look at these signals here and run it, the smoothness value is about 28, so that's very unsmooth, very rough data. Now that we have a measure of smoothness, we can start to define our graph Fourier transform. The graph Fourier transform comes from taking the eigendecomposition of the Laplacian matrix, and the reason this works is fairly intuitive. First, the Laplacian matrix is symmetric, and when you take the eigendecomposition of a symmetric matrix, the inverse of the eigenvector matrix is just its transpose — that's the orthogonality property — so the eigenvector matrix and its inverse are just transposes of each other. Second, remember what I was telling you about smoothness: for a unit eigenvector u with eigenvalue λ, the smoothness measure uᵀLu is exactly λ, and since the eigenvalues are sorted by size, the eigenvectors become less and less smooth — higher and higher frequency — as you go down the eigenvector matrix. So if we define our graph Fourier transform as multiplication by the transpose of the eigenvector matrix, the first eigenvector is very smooth, the next one a bit less smooth, and the last eigenvector is very jagged. The reason this is the right analogue is that in the classical setting the sinusoids are the eigenfunctions of the standard Laplace operator, the second derivative — the Fourier basis is literally the eigenfunctions of the Laplacian — so if we take the graph Laplacian matrix and take its eigendecomposition, the eigenvectors give us the graph Fourier basis. So really, the graph Fourier transform is: you take the signal and multiply it by the transpose of the eigenvector matrix, and the inverse transform is: you multiply back by the eigenvector matrix. So now, this is how we define our spectral filtering. In the classical case, what we do is take the Fourier transform of some signal, multiply it by some filter, and then do the inverse Fourier transform to get the filtered signal back.
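A small NumPy/NetworkX sketch of the graph Fourier transform just defined (my own toy example):

```python
# Graph Fourier transform via the eigendecomposition of the graph Laplacian.
import numpy as np
import networkx as nx

g = nx.path_graph(6)                             # a small example graph
L = nx.laplacian_matrix(g).toarray().astype(float)

# Symmetric matrix -> eigh gives eigenvalues in ascending order and an
# orthogonal eigenvector matrix U (so U^{-1} = U^T).
eigvals, U = np.linalg.eigh(L)

f = np.array([0., 1., 0., 2., 1., 3.])           # a signal: one value per node

f_hat = U.T @ f                                  # graph Fourier transform
f_back = U @ f_hat                               # inverse transform recovers f
print(np.allclose(f, f_back))                    # True

# Smoothness check: u^T L u equals the eigenvalue for each (unit) eigenvector.
print(np.allclose(np.diag(U.T @ L @ U), eigvals))  # True
```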
So in the graph setting, we do something similar: we take the graph Fourier transform, multiply it by some filter ĝ of the eigenvalues, and then apply the inverse graph Fourier transform — and that's graph spectral filtering. There are different types of spectral filters. One option is just a diagonal matrix with freely learned parameters; this is called a non-parametric filter. Then you can have parametric polynomial filters, where the filter is a polynomial of the eigenvalues, and the whole operation becomes a polynomial of your Laplacian matrix. The reason you'd want a polynomial filter is if you want your filter to have some sort of locality, some relation to the spatial structure of the data — a K-th order polynomial only mixes information up to K hops away. So now I'm going to show you how we can actually do spectral filtering on graphs. What we're going to do is low-pass filtering, or denoising. If I open up my graph notebook, first I'm going to convince you that the graph Fourier transform works. Look at our graph right here, a standard graph, and I plot its first eigenvector — I compute the eigendecomposition with a simple NumPy call. You see that the first eigenvector is completely smooth: it's one smooth, continuous signal, all of one sign. The next eigenvector is okay — not as smooth as the first one, there's one zero crossing here, it goes down to a lower value. Now look at the fifth eigenvector: wow, it's very unsmooth, very jagged. And the final eigenvector, eigenvector six, is even more jagged. And if we look at the eigenvalues and measure the smoothness of each eigenvector, the smoothness value increases as we go down the eigenvectors: this one is of a higher frequency, this one is of a lower frequency. So that's how our graph Fourier transform works: essentially we're generating a set of increasingly frequent "waves" on our graph data. Every graph Fourier basis is unique to its graph, and it's essentially the frequency representation of that graph. So now I'm going to do graph spectral filtering on this data. If you don't know what low-pass filtering is, it's a very common thing in signal processing. The idea is that noisy data is rough: noise is not smooth, noise is very random, and therefore, not being smooth, noise is high frequency. So what we're going to do is remove those high-frequency components and keep only the low-frequency, smooth components of the data. So here's a graph signal, and I inject it with a lot of noise. Now we define this low-pass filter — a non-parametric filter in this case: essentially I'm weighting the low-frequency components up and suppressing the high-frequency components. Then I apply this filter to my data and we get a smooth graph signal. If you want a side-by-side comparison: this is before, very unsmooth, noisy data, and this is after, smooth data. So we took this graph, applied low-pass filtering in the graph spectral domain, and got this smooth representation of the graph signal. And you can apply this in all sorts of settings: images can be represented as graphs, and a lot of other kinds of data can be represented as graphs too.
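A toy version of that denoising experiment, so you can see the GFT → filter → inverse-GFT pipeline in plain NumPy (my own example, not the notebook's code):

```python
# Low-pass filtering a noisy graph signal in the spectral domain.
import numpy as np
import networkx as nx

g = nx.grid_2d_graph(5, 5)                                   # example graph, 25 nodes
L = nx.laplacian_matrix(g).toarray().astype(float)
eigvals, U = np.linalg.eigh(L)                               # ascending eigenvalues

rng = np.random.default_rng(0)
smooth = U[:, 1]                                             # a smooth, low-frequency signal
noisy = smooth + 0.3 * rng.standard_normal(len(smooth))      # inject noise

# Non-parametric low-pass filter: keep the k lowest-frequency components, zero the rest.
k = 5
h = np.zeros(len(eigvals))
h[:k] = 1.0

noisy_hat = U.T @ noisy                                      # graph Fourier transform
denoised = U @ (h * noisy_hat)                               # filter, then inverse GFT

# The filtered signal should be much closer to the clean one than the noisy one is.
print(np.linalg.norm(noisy - smooth), np.linalg.norm(denoised - smooth))
```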
And you can always do some kind of smoothing, low-pass filtering on it. That's the real power of graph spectral filtering: it has such a wide range of applicability. For example, researchers at MIT have been looking at brain activity patterns: our brain is a bunch of connections, a bunch of nodes, so you can view the brain as a graph. There's actually a huge amount of impact for graph signal processing in neuroscience research, because what they do is view the brain as a graph and then remove useless signals — they do things like low-pass filtering and band-pass filtering. To recap: low-pass filtering keeps only the low frequencies and removes the high ones, band-pass filtering keeps a certain range of frequencies, and high-pass filtering keeps only the high frequencies. High-pass filtering doesn't have that many applications here, but if you want to isolate the noise itself, you can use it. So that basically concludes all the notebook material. Now we're finally going to go through the problems with this approach. The problem is that the eigendecomposition is way too slow: it's O(n³). If I have a 100-by-100 image, that's 10,000 pixels, so it's a graph with 10,000 vertices, and if I feed that into an O(n³) algorithm it just won't finish in any reasonable time. So what we do is use an approximation, and the approximation technique we use is Chebyshev polynomials. You don't have to understand exactly what a Chebyshev polynomial is; just understand why we use it. There are a lot of approximation techniques out there, like Taylor series and Maclaurin series, but the thing with those is that the error is concentrated around a point: for example, if I define my Taylor series around the point zero, there's going to be very little error at zero, but as I move further away from zero the error keeps increasing and increasing. What the Chebyshev polynomial approximation does better is that the error is confined to a band: the Chebyshev approximation works over a fixed interval — you can rescale your domain onto it — and the error stays roughly evenly spread across that interval instead of blowing up at the edges. We really need this when approximating graph spectral filters, because for a wavy filter response we don't want one part of it approximated very closely and another part approximated badly — that would give a completely different filter. So the Chebyshev polynomial keeps the error under control, and it's a very important approximation tool well outside of graph spectral filtering too. So for our spectral GNN layer, we basically replace the filter with a parametric polynomial filter of this form — this formula here is from ChebNet — and essentially it's an approximation of the filter; again, you don't need to know exactly how it does it. You can do a K-th order approximation, and for spectral GNNs you typically only need one such layer, because higher polynomial orders already account for more distant neighbours — a wider receptive field — much like stacking more layers does in spatial GNNs.
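For reference, the ChebNet-style polynomial filter being described is usually written as follows, with λ_max the largest Laplacian eigenvalue and T_k the Chebyshev polynomials:

```latex
g_{\theta}(L)\, x \;\approx\; \sum_{k=0}^{K} \theta_k \, T_k(\tilde{L})\, x,
\qquad
\tilde{L} = \frac{2}{\lambda_{\max}} L - I,
\qquad
T_0(x) = 1,\quad T_1(x) = x,\quad T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x).
```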
And then this scaling term just rescales the eigenvalues — which run from zero up to the maximum eigenvalue — onto the fixed interval the Chebyshev polynomials are defined on, and this here is just a normalization term. So now I'm going to talk about the kind of state-of-the-art model for GNNs, which is the GCN layer, the GraphConv. Essentially, this is just a further approximation: if Chebyshev was an approximation, this is an approximation of an approximation, and it's the state-of-the-art model. What they do is take a first-order Chebyshev approximation, and these two coefficients are tied — they're just parameters, so they make them one parameter. And this term right here, I + D^{−1/2} A D^{−1/2}, is just the adjacency matrix normalized by the degree matrix, plus the identity. The problem with this is that if I keep stacking layers, I'm only normalizing the adjacency part and not the identity part, so my gradients can explode or they can vanish. The way they solve this is by renormalizing the entire expression: they define Ã = A + I, then define a new degree matrix D̃ from the row sums of Ã — ignore the typo on the slide here — and use D̃^{−1/2} Ã D̃^{−1/2} instead. Honestly, this step is a bit abstract; in fact, I don't fully understand it, because in the paper they just said it's to prevent exploding or vanishing gradients, and when people asked them exactly where the exploding or vanishing gradients come from, they basically disappeared off the face of the earth — no one can contact them, I don't know why — so we just have to guess. But I think it makes sense: the normalized part is bounded, and adding the identity on top shifts the whole operator upwards, so as you add more and more layers the repeated application can blow the activations and gradients up, whereas renormalizing the whole expression keeps it bounded. And the reason GCN uses only a first-order approximation is that it allows stacking multiple layers, as opposed to ChebNet, where you normally use just one layer for your model — there can be more, but normally people use one because it already has enough representational power. So this is the current state-of-the-art model, and if you look at it, it's actually an even further approximation: the current state-of-the-art model in GNNs is an approximation of an approximation of an approximation. A really nice comment on life. So finally, we come to the end of our talk. This was supposed to be three hours, but we've got it done in a bit over two. We talked about spatial GNNs and the idea behind message passing, and about spectral GNNs and how they operate in the frequency domain. And moving forward — my English teacher always said we have to end essays with a call to action — if you want to move into research, you can try working with more than just Deep Graph Library. This is the logo for PyTorch Geometric.
So actually, when you go to a professor and tell them you want to do a research project on graph neural networks, they'll probably tell you to use PyTorch Geometric. But PyTorch Geometric is, frankly, quite poorly documented, and DGL is much better documented — that's why we used DGL for this — so you really should start with DGL. Once you learn PyTorch Geometric, though, it gives you a lot more power, which is why researchers use it. You should also check out the GSP toolkit, the GSPBox — the graph signal processing box; the Python library is called PyGSP. It can help you do graph signal processing on graph data: as you saw, I did low-pass filtering in NumPy in a bunch of manual steps, but PyGSP can do it in one step, and you can try all the different methods — heat diffusion filters, low-pass filters, high-pass filters, band-pass filters, whatnot. And finally, we didn't cover graph isomorphism networks, which are another type of GNN architecture. Before the spatial and spectral GNNs we covered, there were also recurrent graph neural networks, and there are GINs as well; they're a bit older-style, but they're useful in some cases, so you can check them out if you want — you don't have to, I just have to tell you they exist. As for references, we have about 18 of them, not properly formatted yet, so we're going to update the GitHub with them later. But special mentions: Xiaowen Dong, who is a lecturer at the Oxford-Man Institute and a lead researcher in graph spectral processing — we used at least six of his papers — and the UPenn lectures on graph signal processing, which is where I got the time-convolution framing from; and then you can look at these two review papers here and here. So, the organization behind this is called BBCS, and it's an outreach organization: basically we teach teenagers and try to make them passionate about CS. Now, this talk was quite complex, which is why we wouldn't inflict it on the teenagers, but you could check BBCS out. If you have a child in secondary school, or a polytechnic, or a JC — this QR code links to all the documents on the GitHub, and it also links to my LinkedIn. (Jed, did you put your LinkedIn? No? Okay.) You can connect with me on LinkedIn if you want to. Essentially, the readme has a link to BBCS, all the documents we used today, the PowerPoint, and all the rest of the information. Our BBCS June conference is on the 29th May, the 38th May, and the 5th June, and if you have a child interested in it — I don't even know why I'm saying child, they're my age — if you have someone interested in it, please sign them up for the conference; the sign-ups are opening soon. So that's basically it. Thank you. Okay, that's a wrap.