So, what do we talk about today? And what did we talk about yesterday? Yesterday was amazing, I think: really challenging, really new topics, at least for me. Let me see, I had to prepare this and I forgot what we covered yesterday... right, this stuff: we talked about GTN, a framework for automatic differentiation with weighted finite-state automata. There is also a website for this stuff, and there is also an article, so you want to check those out; we should have mentioned them in the slides. So again, this stuff here, let me see... here there is a summary of what this stuff is about, the thing we covered yesterday, and what it tells you. And if you check the documentation and the examples, they even go through, where is it... if you go through the examples, you're going to see how to do handwriting recognition as well.

Anyway, yesterday we saw how we can use graphs, how we can augment PyTorch by using graphs instead of tensors, as an extension. And then we figured out how to run back-propagation through these operations, the join... no, no, not the join, the other one: intersection. We had the soft argmax: we compute the forward algorithm, which is computing the softmax, and then we run back-propagation. If you compute the gradient of the softmax, you're going to get the soft argmax, and the soft argmax gives you the gradient for each of the items we had in the softmax, in the actual softmax. Each of these scores was the sum of all the weights on the edges, so the gradient flows to each edge; but the same edge can be used in multiple paths, and so the gradients will be different. And where are those weights coming from? The network: the network gives out the energy. That was one type of graph. So this is how to move from using tensors to using graphs in deep learning.

Today, instead, we're going to be covering another type of graph: similar but different, still graphs, but used in a completely different manner. Today we're going to be talking about how to have representations, how to have tensors, living on the vertices and edges of a graph. Still graphs, but a different thing; I hope I'm not going to confuse you too much.

If you want to read more, if you want to learn more about this stuff, Xavier Bresson made a few very nice lessons, so you can look up his tweet: if you already know how to do this, you just type "from:xbresson graph neural networks 2020". We already know how to use Twitter, right? This is very good, and of course one of those links is going to be the videos on my channel, so there you go. He's very good. Moreover, you also want to pay attention to, where is it, here: Jure Leskovec and his CS224W, Machine Learning with Graphs, from Stanford. They have videos everywhere, or you can find videos; I don't know if they're officially released, but anyway, you can find things.

Okay, cool. So what are we talking about today? Graph convolutional networks: exploiting domain sparsity. What's going on? We'll see this soon. So remember, from last class we talked about self-attention: we had this combination of the elements of a set, a hidden representation h which is going to be the combination of these elements x.
These x's are given to you in a set of inputs, so we have no order: we have no idea which is number one, which is number two. You can assign numbers for bookkeeping, but you don't have a sequence, no inherent order. So then you sum these columns, these vectors, by first scaling them with a coefficient alpha. Where are these alphas? They are contained in this vector a. How many items does a have, what is the size of a? t. Why is the size of a equal to t? Because there are t x's in my set. And what is the final dimension of the sum? The same size as each of these pink things: each one has size n, therefore the size of h, which is the summation of all these columns scaled by the alphas, is still in R^n, of course. You're summing oranges to get oranges, even if you use coefficients in front.

Where was this alpha, this a, coming from? Do you remember? Type in the chat. What was a called? a was my attention vector; but where was a coming from? Do you remember? Yes? No? You have to type in the chat, otherwise we don't go anywhere. "From the input"? No, of course not. "The attention"? Well, where is a coming from? x is my input: we have a set of little x's, x1, x2, ..., xt, and we said we sum these components using coefficients alpha. Where are these alphas coming from, how did I compute them? Where is this vector a coming from, do we remember? There are 50 people in class, can someone help me out, please? "We don't know." "The queries"? Yes! So we have some queries, which were questions. What did we do with the question? You have a question, "how to make pizza"... no, "how to make pizza margherita". Then you compute, you check, the alignment, the similarity, with all the keys: pizza margherita, pizza marinara, pizza quattro formaggi, pizza alla diavola. And whenever you find these matches, what do you do? So we have a bunch of scalar products; what do you do afterwards? Is a the output of the scalar products? No. What is a, do we remember? Yes? No? "Soft argmax"? Yes, yes! So, given the scores, you want to have a conversion of them into something that sums up to one. Or you can use an argmax: you just pick the highest score and then you retrieve one recipe; you pick up the pizza margherita, pizza whatever you want to cook. Or, if you want to make a mixture of recipes (don't do that with Italian cuisine, it hurts my heart), if you have two matches you're going to pick half of one and half of the other. How do you do half and half? You just take the soft argmax of the scores. Okay, very good, Raul and Alan. So a was computed by this similarity, scalar product, alignment, whatever you want to call it, between the query and the keys, sent through a soft argmax.
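In code, that recap might look something like this minimal sketch (the sizes t and n and all the tensors here are made up for illustration; this is not the course notebook):

```python
import torch

t, n = 5, 4                        # t items in the set, each of dimension n (arbitrary choice)
X = torch.randn(n, t)              # the set of inputs, one column per item x_1 ... x_t
q = torch.randn(n)                 # one query ("how to make pizza margherita")
K = torch.randn(n, t)              # one key per item, stored as columns

scores = K.t() @ q                 # t scalar products: alignment of the query with each key
a = torch.softmax(scores, dim=0)   # the "soft argmax": pseudo-probabilities summing to one
h = X @ a                          # sum of the columns of X, scaled by the alphas
print(h.shape)                     # torch.Size([4]): h lives in R^n, same as each x
```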
Moreover, we also saw this slide in the convolutional neural net lecture. In particular, the one thing I want you to remember and pay attention to is that my set of data points, my observations, are these x_i's, where i is the index that allows me to address the different items in this set. These are functions going from this domain Omega to R^c, the channels. And we saw many other examples: things that move on a line, things going on a plane, even more esoteric things. Anyway, we go from one domain to a vector space R^c: we map one item here, lowercase bold omega, to a function that changes as you move the point. So you have a vector associated: given a point here, you have a vector in R^c, and as you move, this vector changes. You see this? You move here, this vector goes like this; you move there, it changes again. So you have a vector associated with each location of this domain Omega.

Cool, so these are the foundations we are going to be building on top of. Given that this is understood, and given that the other part is understood (at least two people are following; I don't know what happened to the rest of the class today), let's see how we can move forward in today's class.

So, GCN: what is a GCN? A graph convolutional network. We have a vector a which, before, we were calling the attention vector. It was coming from the soft argmax of this matrix of transposed keys times my query, my q (sorry, I flipped it). So you have the matrix of transposed keys and then one query, row times column, row times column, and you have this vector of scalar products; you send it through the soft argmax to get this pseudo-probability, and that was my a. Here, instead, I'm going to call a the adjacency vector, and it is given to you. Finished, the lesson is concluded! More or less, that's the only difference. That's the only difference from yesterday... well, from last week's lesson. Last week we learned that a was the attention vector, computed through, again, the transposed-keys matrix times the query: you have the scalar products, sent through the soft argmax for pseudo-probabilities, or through an argmax to get a one-hot. Here, a is going to be a binary vector which is given to you. By whom? We don't know, by someone: by the data structure. The data you observe has a specific structure underneath. Enough spending words on this one; let's see what else is going to appear on this screen. Oh, okay, a picture, cool.

So we introduce now my v. What is v? v is my vertex, my given vertex, on which I have the representation x and the hidden representation h. And what is v? v is the same as the omega from before: remember we had a domain, capital Omega, and each item in it was a lowercase omega; now this lowercase omega is going to be exactly this vertex v. So this could be my space of vertices, for example.

Okay, so, given that we have done this in the last lesson (we had the generic x and then we had also the other x's), the same happens here: we have the given v, the given vertex, and all the other vertices v_j, which have representation x_j and hidden representation h_j. Another one, another one, one more. Okay, cool. So far nothing has changed: as we have seen last time, we have these items which you cannot sort, because they are in two dimensions, and I don't know how to sort things in two dimensions here on the screen. So these are the elements of a set, the set of vertices, which you can number, but you can shuffle the numbers and nothing changes.

Oh, first difference: now someone gave you some connections. This arrow shows you that v_j is the source vertex of an edge that goes to the vertex v. Same for the other one; then, for example, v is connected to the other one on the left-hand side, and then we have three more connections. Cool. So now, what is this a vector? My a vector will have its j-th component set to one if and only if the j-th item has an arrow towards my own self. So my vertex v will have an adjacency vector a, and the j-th component of my adjacency vector will be set to one if there is an incoming connection from that vertex.
So, let's see: what is the size of the vector a right now? Type in the chat, more than two people, and not just Raul and Alan, please. What is the size of the vector a, how many elements does it have? "It was t"? Yes, but what is t in this case? I drew things on the screen. What is t in this specific case? Yes, I see, Camila: how many v's do we have? Camila said the number of v's. How many nodes do we have? We count six. Fantastic: a will have size six. Let's count from left to right, top to bottom: one, two, three, four, five, six. So what is going to be the adjacency vector for my v? Can you tell me, maybe, Camila? It's going to be a vector of all zeros, and then which components are going to be set to one? If this is one, two, three, four, five, six, then this is node number three, and the vector will have its first and second components set to one, because those two items have an edge coming towards me. So a is going to be a zero vector which has (well, I didn't say that yet, but okay) the components associated with the incoming connections set to one. If these are called vertex one and vertex two, and both have an arrow towards myself, then I will have a one in correspondence with the first element and a one in correspondence with the second element. All the others are zero. Self-connection? There is no self-connection: zero, zero, zero, zero. So: one, one, zero, zero, zero, zero; one, two, three, four, five, six. Cool.

So what is lowercase d? Well, lowercase d is the number of incoming edges, which is just the 1-norm of a: you just sum the ones.

Okay, so what is my hidden representation? Well, what was the hidden representation before? It was simply the sum of the items in the set, scaled by the coefficients. Guess what it's going to be here: the same, ta-da! And what is capital X times a? As we have seen just five minutes ago, X a is the summation of the columns of X, scaled by the coefficients stored in a. So in this case, what is h going to be? Just the sum of the x living on vertex one and the x living on vertex two, because I have only those two components set to one. If a is one-hot, you're simply selecting the j-th, or whatever, item. In the chat: "the sum of the x's where the ones are". Yes, of course, because, as we have seen just before, whenever you have a matrix times a vector, it means you sum the columns of that matrix, x1, x2, ..., xt, scaled by the coefficients in the vector: if a has alpha1, alpha2, ..., alphat, you have the first column scaled by alpha1, the second column scaled by alpha2, and so on.

Okay, let's go back here. So this is going to give the sum. And what's the problem with the sum? Well, if you have more incoming edges, your hidden representation is going to be blowing up: the hidden representation is going to be proportional to the number of connections. How do we fix that? Help me out. Okay, awesome: you divide by d. There you go.
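As a sanity check, here is what that six-node example might look like in PyTorch (a toy sketch; the four-dimensional features are made up for illustration):

```python
import torch

# Vertex 3 has incoming edges from vertices 1 and 2, so its adjacency
# vector has ones in the first two components and zeros elsewhere.
a = torch.tensor([1., 1., 0., 0., 0., 0.])  # size t = 6
d = a.norm(p=1)                             # degree: number of incoming edges, here 2

X = torch.randn(4, 6)                       # one column of features per vertex
h = (X @ a) / d                             # average of x_1 and x_2: no blow-up with the degree
```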
Then, guess what, this is going to give me just some combination of these vectors. Why not add some rotation? No? Why not: convolutional network, right? So let me put a rotation matrix V in there. Cool. Have we forgotten something? Maybe there is a space on the left-hand side. I guess yes: perhaps we also want to look at what is my own relation... no, relationship... why am I talking about my relationship? My own representation! Maybe we also want to rotate and add my own representation. Okay, there you go. Then, if we remember what neural networks are (rotations and squashing), there you go. Okay, awesome, very good. So what is f? We know: positive part, sigmoid, hyperbolic tangent.

So, given that we have now a set of x's, this x_i with i going from 1 to t, guess what, you're going to have a set of... if you have a hidden representation per item... exactly. Wait, why the self representation, someone asks? Why not? You can set U equal to zero, and then there is nothing there. Nevertheless, if you want to use yourself as a value, then you can actually learn a U different from zero. So the answer to "why the self representation" is expressivity: you can add one more degree of freedom. We were saying: if we have a sequence (sorry, it's not a sequence, it's a set), if we have a set of x's, a set of vertices, and for each vertex we compute a hidden representation, of course we're going to have a set of hidden representations. In matrix notation, I just put everything in capital letters: I press caps lock and type the same equation, and nothing changes. What is happening here is that instead of having one vector a, you have many columns, and instead of having one x, you have all the x's. And it works out, because you have as many a's as elements in X: here you had one a and one x; there you have all the x's and all the a's. And what is D inverse? Well, D is just the diagonal matrix of the degrees d_i, and then you take its inverse. Cool, that's it, finished: this is the graph convolutional network.
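Putting the pieces together, a possible sketch of such a layer, assuming rows of X are vertices (on the board the x's were columns) and taking ReLU as the squashing f, could be:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Sketch of the vanilla graph conv layer from the board:
    H = f(X U + D^{-1} A X V), with rows of X as vertices (an assumption)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.U = nn.Linear(in_dim, out_dim, bias=False)  # rotation of my own representation
        self.V = nn.Linear(in_dim, out_dim, bias=False)  # rotation of the neighbors
        self.f = nn.ReLU()                               # the squashing

    def forward(self, X, A):
        # A is the (binary) adjacency matrix; d_i counts incoming edges.
        d = A.sum(dim=1, keepdim=True).clamp(min=1)      # avoid dividing by zero on isolated nodes
        return self.f(self.U(X) + (A @ self.V(X)) / d)   # self term + degree-normalized neighbor sum
```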
It's like convolutional neural nets, but then you don't actually have an ordering of things. In normal convolutional nets, whenever you have, say, a temporal signal, you can have a kernel: if this is your given sample, you can look at a few samples on this side and a few samples on that side, and then you have this kernel moving. If you have an image, you have this kernel around here: if this is your central kernel pixel (I say central because these kernels always have an odd number of elements), then you can look up, look down, look left, look right, and you move this around. But there, there is order: there is a top, there is a bottom, there is a left, and there is a right. Now, what happens with a graph? With a graph you have no idea where up and down or left and right are, because each vertex can have as many incoming connections as it likes, and so you're doomed: you can no longer distinguish this one from the other one. Unless... ah, okay, actually I can even do this now. So right now everything looks the same: we have completely lost our sense of orientation, I have no idea anymore where I'm going. Everything looks the same. So how can we distinguish things now? Can someone help me out? We have vertices, and every vertex looks the same. Well, they have representations on them, but I cannot tell one vertex from the other. How can I start telling them apart? How am I connected to these vertices? Through an edge. If only these edges had, let's say, some colors on them, then I could tell different things apart. So let's do that: let's put colors on these edges. Let's also introduce now a representation living on the edge, to spice things up, such that we can learn to put some order in this graph. Okay, all right, I hope this is understandable, despite the fact that I'm, you know, crazy; I think I am. I'm trying to convince you that things are done in a specific way because there is some logic behind them. Are you with me? Yes? No? Give me a thumbs-up, please; it's so hard to teach here without the reactions. Okay, very good, at least one person is appreciating and supporting.

All right, we are talking now about the residual gated graph convolutional network. This comes from Bresson and Laurent, 2018, and this is the same dude who made the video I showed you before, from last year's class edition. So again we have the vertices: the v, myself, with representation x and h; you have the other v_j's, in green, which have the x_j and h_j; and you have all these other vertices. The difference now is, as I told you, we're going to have colored edges. For the actual color, I'm just going to have a representation: so e_j represents the edge, the j-th edge, and I'm going to have these connections here, and on this edge I'm going to have an input representation e_x and a hidden representation e_h. Okay, so these are my colors: a manner, a way, to distinguish between vertices. Vertices are all made equal, they have some representation on them, but if I want to treat them differently, then I now have the option to change not just the weight on the edge but an actual representation, a full vector, on this edge.

All right, so how do we do this stuff? Let's start from the title: residual gated. First, what does residual mean? You're going to have a residual connection, which means you have yourself plus something. Okay, there you go, so I started with this: myself plus something. And, as you can tell right from the beginning, you can spot errors in this paper, I think: if you have something plus something that is always positive, the output will drift. So I argue here that we may want to have an additional parameter in front of this positive thing; I argue that a missing parameter belongs in front of this item. Anyway, what's inside? Well, we start from the left-hand side with my self representation: I'm going to have a matrix A multiplying myself, and then I'm going to have a matrix B multiplying someone else. Okay, cool. Then, as we said before, this B is like the V in the line below, on the page before: there, the V was multiplying these other x's, and there we had the adjacency matrix, which we don't have yet here. So this is just the rotation of those x's. First of all, you notice that here I have an x_j; I don't have the full matrix version. Moreover, I now want to take this rotated x_j, this rotated incoming representation, and modulate it: I want to change its amplitude using the second word in the title, a gate.
There we go: we have an eta, which is my gate function, acting on the representation living on the edge. And this is my e_j, bold e_j, which is similar to this stuff here; we're going to define it soon. What next? Well, as we saw before, we were summing over all these incoming connections, the ones corresponding to the ones in a, all the vertices that are connected to myself. So here I do the same: I sum the representations of all the vertices that are connected to myself.

So what do we still need to define? What is this e? We haven't defined this e_j. Oh, there's a question: "Can A and B be said to be the weight matrices on the edges?" No, no... A is the weight matrix that is on myself, so this is not on the edge... oh, I see what you mean: this would be the matrix on my self-connection, and this would be the matrix on the incoming edge. Yes, I guess you can see it this way, sure. So A can be thought of as the weight matrix living on my self-connection, and B can be thought of as the weight matrix living on the edge that connects someone else to myself; whenever you go through that edge, you get this multiplication. But then again, here we want to be able to modulate this incoming rotated representation with the gate, and this gate is a function of this edge thing e. Let's figure out what it is.

So this edge term is simply a summation of multiple terms, with matrices C, D, E. C multiplies the edge representation, the input representation living on the edge, this one over here. Then we sum the... sorry, my bad... the external vertex representation, this guy over here. And then we sum a rotated version of my own representation. So: a rotation of the input representation living on the edge, plus a rotation of the representation living on the j-th vertex, plus a rotated version of my own representation. And all of these are x's, this is all input: the input edge representation, the foreign vertex, myself.

So what is this eta? Well, eta is going to be some sort of soft argmax: instead of having the exponential, they use the sigmoid of the representation divided by the sum of the sigmoids. Again, like a soft argmax with a different non-linearity.

How about if we want to have multiple layers? How do we create multiple layers now? Let's say this is my h, my first hidden layer. How would you go about generating the second hidden layer, for the vertex representation and for the edge representation? Just repeat: you keep the same graph, and you keep the same equations. So we're going to have that my hidden representation on the edge is going to be... oh, I forgot this: the residual connection, the previous representation plus this positive part (again, here I believe there is a missing weight; otherwise this stuff just drifts). Anyway, we were saying: we can replace x with the hidden representation at layer l; this x_j is going to be simply the representation at layer l for the j-th vertex; and this one is simply going to be the hidden representation at layer l plus one. So if you replace x with h at layer l, and x_j with the hidden representation at layer l, then you get the hidden representation of the next layer, and you can do the same for everything else around here.
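To collect in one place what was just said, the layer as described on the board reads as follows, using the lecture's matrix names (sigma is the sigmoid, the circle-dot is elementwise multiplication, and the sums run over the incoming edges; the extra weight the lecturer argues should sit in front of the positive part is not shown):

```latex
h = x + \operatorname{ReLU}\Bigl( A x + \sum_{v_j \to v} \eta(e_j) \odot B x_j \Bigr),
\qquad e_j = C\, e^{x}_{j} + D\, x_j + E\, x,
\qquad \eta(e_j) = \frac{\sigma(e_j)}{\sum_{v_{j'} \to v} \sigma(e_{j'})}
```

Stacking layers then just replaces x with h at layer l to produce h at layer l+1, with the analogous residual update for the edge representation.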
Cool, so that was it for the slides. Are there questions? If there are no questions, we... are there questions? First answer my question: is it clear? So here we have these vectors living on the vertices; actually, tensors: they don't have to be vectors, you have tensors living on the vertices and tensors living on the edges. If it's the first layer, it's going to be called the input representation x; if it's inside the network, it's going to be hidden, until you get to the last one, the final output representation, the y-tilde; and then you have the target.

"How do people make graphs, or decide what structure is good? Do we just figure it out by thinking about our application?" Yeah, so this is very application-specific. This could be molecular structures, where the graph comes directly with your data. This could be friendship: Facebook, for example, has a non-directed connection for friendship: if you're my friend, I am your friend. If you think about Twitter instead, that's a directed graph: you follow me, I don't necessarily follow you. I mean, I don't want to sound arrogant, but you don't have to follow back, although it's nice, they say; I don't care. So your data naturally comes with a graph attached, and the point is that we can now use this kind of neural network, which leverages the graph structure that is coming with your data.

The point, which I actually didn't stress enough, is that the title of this lesson was "exploiting domain sparsity". So what does this mean? This means that not all items have to look at every other item. In this case here, we completely forget about everything over there: we only care about our followers; maybe we don't care about who we follow. Similarly, here you can have just a few terms in the sum, corresponding to the few items that are connected to you. So although this graph can be huge (think about the whole Facebook graph), you don't have to look at every person, you just have to look at your neighborhood. And this sparsity is key to actually making these things work.

So, moving on, we're going to be looking at how to do this in PyTorch. This is actually rather non-trivial, so we're going to be watching this together. Okay, we open the terminal... oh, what happened here? Okay, all right. We go into work/github/pdl, conda activate pdl, and then jupyter notebook, the gated graph convolutional network one. So here I import os, and then I do the following: I set the environment variable DGLBACKEND to pytorch, such that PyTorch is used, and then I import dgl; I also import the DGL graph and this MiniGCDataset, the mini graph classification dataset, which we're going to look at right now. Hold on. All right, this part here is just for drawing things with nice colors. So what is this mini graph classification dataset? It has the following arguments... actually, first of all, what is this dgl? Let's figure out what DGL is. If you open DGL, PyTorch, here: Deep Graph Library. You may want to look this stuff up: "high performance... DGL adopts advanced..." blah blah blah; there is no description of what this stuff is. Anyway, this is a library for dealing with deep learning on graphs.
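The setup at the top of the notebook goes roughly like this (a sketch: the backend variable and MiniGCDataset are real DGL pieces, but exact import paths can vary across DGL versions):

```python
import os
os.environ['DGLBACKEND'] = 'pytorch'   # must be set before importing dgl

import dgl
from dgl.data import MiniGCDataset

# Eight graphs, each with between 10 and 20 vertices, as in the lecture.
dataset = MiniGCDataset(8, 10, 20)
graph, label = dataset[0]              # each item is a (graph, label) pair
```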
And I think there was a description here: "build your model with PyTorch and..." no, okay, we don't have a description on this website, never mind. All right, so here I imported these libraries and I plot things. Oh yes, we were talking about this mini graph classification dataset: num_graphs is going to be the number of graphs in the dataset, min_num_v is going to be the minimum number of nodes per graph, and max_num_v is the maximum number of nodes.

So what are we trying to do in this example? Here I'm going to be specifying the types of graphs we're going to be having: we have the cycle, star, wheel, lollipop, and so on; we're going to be looking at these right now. We ask for eight graphs in the dataset (four and four, eight), which have from 10 to 20 vertices, and then I display them based on their labels.

Okay, so the first one is going to be the cycle graph, and as you can tell (maybe I have to zoom out a little), it just has these labels going from zero, one, two, three, and so on until 11. As you can see, there are double arrows: it's non-directional in this case. In this other case, I have a star graph. Now, how is the adjacency matrix going to look for the cycle one? "Set to one on the diagonal"? No, it's not on the diagonal, but almost: for this one, in the first row, you have a one at position number two and also a one at the last position, number 12; for node number one, you have a one at position one and a one at position three. So you have a double diagonal with zeros on the main diagonal: a zero-diagonal matrix with ones on the two off-diagonals (I forgot the name, there is a word for this kind of matrix: two consecutive ones and zeros on the diagonal), plus the corner entries that close the cycle.

How about this one, the star? Node zero has ones everywhere, boom, and then all of the others have a one in the first column... in the first row, sorry. So for node zero, the row is all ones and the column is all ones, and all the other rows are going to have the one in the first item and zeros everywhere else: one, one, all zeros. Anyway, this is the star graph.

And as we said, these can have anything between 10 and 20 vertices: this one, we said, has a matrix of size 12 by 12, because there are 12 vertices; this one here also has a matrix of size 12 by 12, because there are 12 vertices; I don't know why all of them have size 12, they should be ranging from 10 to 20, so maybe this is a coincidence, I don't know. Anyway, these adjacency matrices are arbitrarily sized: they are square matrices, but of arbitrary size, and again they have specific patterns. This was for the star one. Then we have the wheel (you can tell this is a wheel), we have a lollipop; this one goes up to 18, for example, that one was going up to 12 again, this one goes up to 16, I believe. So all of these have specific patterns. Oh, we also have our grid, our image, basically, and this one goes up to nine, and so on.
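For instance, the cycle pattern just described can be reproduced by hand on a toy five-vertex example (a sketch, not code from the notebook):

```python
import torch

# Zeros on the diagonal, ones on the two off-diagonals, plus the two
# corner entries that close the cycle (edges in both directions).
n = 5
A = torch.zeros(n, n)
for i in range(n):
    A[i, (i + 1) % n] = 1   # edge to the next vertex
    A[i, (i - 1) % n] = 1   # edge to the previous vertex
print(A)
```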
So what is going to be our objective in this lab? Our objective here is going to be to classify these graphs as one of the eight possible options: it's going to be classification of graphs. How do we do that? We're going to be just applying that equation we have seen before. So here we create artificial features: we set the feature on each vertex to be its degree, the number of nodes that are connected to that vertex, and the number on each edge is simply going to be a one. Okay, so this is fake data I'm putting on my vertices and edges just to initialize this graph; we only care about classifying these matrices, these adjacency matrices. Then I create my training set and testing set, with 350 graphs and 100 graphs respectively.

And then here I have the equation I just showed you before; this is how it works in DGL. The message functions are expressed as edge UDFs, user-defined functions. An edge user-defined function takes in a single argument, edges; it has three members, src, dst, and data, for accessing the source node features, destination node features, and edge features. So this is what allows us to send the information along. Then we have to aggregate the information: on the actual node we'll be using a node UDF, the reduce function. Our node UDFs have a single argument, nodes, which has two members, data and mailbox. The mailbox is going to hold the information that comes down through the edges: data contains the node features, and mailbox contains the incoming message features, stacked along the second dimension. And update_all does the following: it sends messages through all edges and updates all nodes, optionally applying a function. So the messages originate at the vertices, go through the edges, and are collected at the receiving end. It's a convenient combination for performing "send from each source vertex, through the edge, and then receive at a given location".

So let's see how this network is implemented, and then I'll let you go. We had those matrices, A, B, C, D, and E, remember? I don't know if I can show you... oh, you have this right here: A, B, C, D, and E. So here I have A, B, C, D, and E, which are matrices, and then we have this batch normalization, to avoid stuff blowing up. We're going to be starting by computing this thing here that has to travel through the edge, and then we're going to be aggregating the things together. Okay, so let's see how that is done. Maybe, hold on, let me actually... yes: if we go into the forward function. In the forward function, I simply define my input as being the representation that lives on the nodes; it's going to be called x... h, actually: I'm going to be using the same letter as a placeholder for my x. Then I assign to each node these rotated x's: Ax, Bx, Dx, and Ex, the ones we have seen before; the C was multiplying the edge. So A, B, D, E are these ones multiplying the x's, and the edges get the C: you have Ce, which is going to be C times the representation on the edge.
And then there is this update_all we saw before, which calls these two functions, the send and the receive, so we're going to be looking at these two things. First, the message function. We have that Bxj is simply this Bx we computed down here; when I collect it, I call it Bxj. So this one comes given to you. Then we want to compute this representation on the edge, which was the summation of those three things: I just pick out these three things, the Ce, Dx, and Ex, I sum them together, and I have my e_j, exactly as it's written here. And then I'm going to be writing this information on my edge, such that I can use it later. Finally, what is going to be my message? My message is going to be twofold, two things: I send down this Bxj, which is this item here I just extracted, and then I also send down this edge term e_j. Why do I send this e_j down? Because I need it to compute, what is it, the gate here. Cool. So I send down this rotated representation and this edge representation.

Now that I have all this information coming down the edge, I have to finish the computation: I have to do the computation of the gate, then I have to multiply it with the Bxj's and sum everything together. And this is done in the reduce function. You have Ax, which I just extract from the data that lives on the node; my Bxj comes down from the mailbox, which was sent through this message; and I also have e_j, which comes out of this message as well. So here I compute this sigma: sigma_j is going to be simply my sigmoid of this incoming edge information. And then, finally, I compute the whole content, without the residual connection: this Ax plus blah. So I have Ax, which is defined here, plus the summation of these gates times the Bxj's, and then I divide by the summation of all the items in this sigma. This sigma_j divided by the sum is basically going to be the soft argmax part (with the sigmoid), and it multiplies this Bxj. Okay, and that's it: this is going to be my h, without the residual connection.

So we go back out here. We extract this h, which was this information here, and this e, which was the information we wrote on the edge before; I have e and h, I just extract them out. I divide by some norm, in order to remove the dependency on the dimension (but we don't care too much; in the attention we had the beta, right?). I apply batch normalization, and then I apply the ReLU, like I showed you before: here I have this ReLU, plus, and the ReLU here, plus, and then this is going to be summed with the previous thing. So here we go: after applying the ReLU, we have the original x plus the h, and the same for this one here, and then we return. And that's it.

Then I have a multi-layer perceptron at the end, which simply has... hold on... so I have just a multi-layer perceptron, with a fully connected layer here. And so, finally, you have your gated graph convolutional network: we have some embeddings at the beginning, then we have this stack of gated graph convolutional layers, and then the multi-layer perceptron. You get the initial h and e through these embedding matrices, and you forward this stuff through the graph convolutional layers, as many times as the layers you have.
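Condensed, the layer just walked through might look like the following sketch (the tensor keys such as 'Ax' and the 1e-6 stabilizer in the denominator are illustrative choices; the actual notebook may differ in details):

```python
import torch
import torch.nn as nn

class GatedGCNLayer(nn.Module):
    """Sketch of one residual gated graph conv layer in DGL;
    assumes the input and output dimensions match, so the residual adds up."""
    def __init__(self, dim):
        super().__init__()
        self.A, self.B, self.C = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.D, self.E = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.bn_node = nn.BatchNorm1d(dim)
        self.bn_edge = nn.BatchNorm1d(dim)

    @staticmethod
    def message_func(edges):
        # e_j = Ce + Dx_j + Ex: computed on the edge, stored there for later,
        # and shipped down together with the rotated source representation Bx_j.
        e_j = edges.data['Ce'] + edges.src['Dx'] + edges.dst['Ex']
        edges.data['e'] = e_j
        return {'Bx_j': edges.src['Bx'], 'e_j': e_j}

    @staticmethod
    def reduce_func(nodes):
        # Gate: sigmoid(e_j) normalized over the incoming edges (the "soft argmax"
        # with a sigmoid instead of the exponential), then the gated sum of Bx_j.
        sigma = torch.sigmoid(nodes.mailbox['e_j'])
        h = nodes.data['Ax'] + (sigma * nodes.mailbox['Bx_j']).sum(dim=1) \
            / (sigma.sum(dim=1) + 1e-6)
        return {'h': h}

    def forward(self, g, x, e):
        g.ndata['Ax'], g.ndata['Bx'] = self.A(x), self.B(x)
        g.ndata['Dx'], g.ndata['Ex'] = self.D(x), self.E(x)
        g.edata['Ce'] = self.C(e)
        g.update_all(self.message_func, self.reduce_func)
        h, e_new = g.ndata['h'], g.edata['e']
        # Batch norm, squashing, then the residual connection on nodes and edges.
        h = x + torch.relu(self.bn_node(h))
        e_new = e + torch.relu(self.bn_edge(e_new))
        return h, e_new
```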
Then, finally, we compute the mean representation coming out of the last layer, and we send it through the multi-layer perceptron. So here I just show you this, and I just train. Okay, so you generate the x and the e, you compute the output of the model, you compute the loss, you zero the gradients, you perform the backward pass, and then you step in the direction opposite to the gradient. And finally, you can see here that the test accuracy goes to 100 percent.

So here we managed to see how you can train a neural network to classify graphs, which are represented by adjacency matrices, matrices of arbitrary sizes. So far, last time, we learned how to deal with sets with an arbitrary number of elements; and now we've learned how to classify these graphs, which are represented by these matrices, which also have variable sizes.

That's it for today. It's a lot of things, I think, but if you survived until the end (most of you are still here), I think you managed to follow along. If there are more questions, if there are any questions, write them on Campuswire and I will answer everything. Unless there are imminent questions right now... I give you five seconds: four, three, two, one. There are no more questions, so I'll see you next week for more content. Okay, have a nice day, enjoy your Thursday, have a nice end of the week, take care, bye! I hope you liked it; I liked it.