Thank you very much, Fernando. I owe it to you that I actually moved away from the sort of Bourbaki, dry French things to working algorithms. I remember when you asked me to turn my wavelet code into somewhat usable code for Python. And to this day, I'm always trying to catch up with what you're doing; you're a role model for me. Much of this work was done in collaboration with a graduate student, Nate Moening, who graduated last spring. Without any difficulty, he went off to industry to do data science. He works in those rooms where he's not allowed to bring his phone in, so I don't know much about what he's doing, but it's a lot of data, very noisy data. We have a paper that describes the work; it's on the arXiv, and you should be able to download it if you're interested. Of course, the work also happens to be closely related to what happens here. We were very grateful to Tom and to Jeff Sanders for supporting Nate at some point. One of the algorithms that solves a part of this problem was implemented in Julia, and the work was supported by Jeff. So we're very grateful for this.

So I'd like to give you a little bit of an introduction to the problem that I'm interested in. Unfortunately, the talk is maybe a little miscalibrated; it might be a little dry, and I'll try to inject some intuition. The question we're interested in is that of coming up with a good notion of measuring, of quantifying, significant differences as one looks at time-evolving, dynamic graphs. The question started because we were very ambitious, and we started thinking about the dynamics of a virus spreading on a network. As the virus spreads, the connectivity changes, and we were wrestling with the question: how does one actually quantify changes in connectivity? How does one quantify changes in topology? The standard distance at the time was just the edit distance, which is the number of edges that you would need to add or remove to align one graph with the other. So if you're looking, for instance, at those two examples here on the left, you have the same set of vertices, or nodes, and what happened is that some of the edges have changed: edges appear, edges disappear, and the question becomes, is it significant? If you work for a small company somewhere down in the Bay Area that is worried about a social network, you may be interested in how much randomness is allowed to happen on a social network before it means something interesting. If you work for a three-letter agency somewhere around DC, and you're interested in patterns of communication between people, you may be interested in whether something different is happening or not. Yes? Yes, of course. So in these examples, the number of nodes would stay constant? Is that necessary? So we're getting technical, and I said I'll address that in a second, but I'll answer your question right away. We're addressing a very simple question, where we're thinking about building a notion of a derivative. So we assume that, roughly, the number of nodes is going to remain constant, and we're mostly interested in the configurational, structural changes in the edges. We further assume that we have no difficulty identifying corresponding nodes. So we're not asking the larger question of graph isomorphism, which is a harder problem, even though there may be some recent breakthroughs in that direction. Does that answer your question?
So of course, if you come from a more theoretical background, you say: OK, this is a finite, maybe Euclidean, space, and you have norms, and all the norms are equivalent for your matrices, so why bother? I'll try to convince you that it actually is worth spending a little bit of time thinking about those questions. There are distances that are essentially useless, whereas there are distances that will pick up very nicely the structural, very fundamental changes in the graphs. And that's, in some sense, what we want to do.

So we're going to go about this almost in a Bourbaki way, with axioms and definitions. We're really, in that sense, inspired by the work of Koutra, who is now at Michigan and used to be at CMU. What we want is a distance, a proper distance, in the sense that as we go and measure incremental distances between successive time samples, we really want to have, for instance, the triangle inequality apply. We care about those things. We definitely want symmetry. We also want the fact that if two graphs are the same, the distance is zero, and vice versa: if the distance is zero, that really means the graphs are the same. We'll see that for most distances or notions of similarity, this need not be the case: we can have zero distance or zero similarity while the graphs are actually quite different.

And then we want to inject notions of structural importance. These are principles that are essentially taken from the work of Koutra. The first one is that if we start breaking the graph apart into disconnected components, then we should really feel a strong distance; we should pay a price for that. That's the idea of edge importance: if a disconnected component is created by removing edges, our distance should not be oblivious to that. The second principle essentially tells us that if we have a large weight (and we can think of the weight as how strongly we believe two nodes should be connected: an amount of traffic, a number of text messages sent per day between two people, something that has to do with internet traffic or the communication between two nodes), then if that edge is modified, if the corresponding weight is modified, we should see that reflected in the distance. So these are sensible, reasonable requirements. The third principle is an interesting one: we should pay more attention to edges that live in graphs that are fairly sparse. The sparsest graph is going to be a tree, where removing an edge disconnects the graph, and the notion of distance should really pick up the fact that in a very sparse, very thin graph, edges matter. In that sense, it's very much like trying to go between two subway stations in a big city: some stations are very poorly connected, whereas others are very nicely connected. That happens all the time in Paris. We have three different unions in the Paris subway system, and so when one union is on strike, the others are not, and we can still actually get to work. So in some sense, there is a notion of sparsity in that principle three, and that is what we care about.
Principle four is somewhat vague, but I'll try to give you examples of it. If we think about, for instance, a graph that is composed of very well connected groups of people with a few edges connecting the groups together, then those edges that connect the communities matter most, and if we start removing those edges, the distance should reflect that. So we should penalize edge changes, or weight changes, that are more focused, more targeted. This is not very mathematically rigorous, but as I said, I'll try to give you some intuition about it.

Unfortunately, I need a few names for things to be able to talk about them. I'll try to use fairly reasonable notation, and the notation is fairly standard. We're going to call the graph G; G is a good name for a graph. V is the set of nodes, or vertices, and E is the set of edges. We keep little n to denote the size, meaning the number of nodes, and m, as in Mary, is the number of edges. What we're going to do is connect nodes, and the way we characterize these connections is through what we call the weighted adjacency matrix. You can think of the entry A(i,j) as your friendship matrix: if you're friends with someone, you put a weight there, and you can be a strong friend or a weak friend depending on how large the weight is. If you're not friends with that person, the weight is zero. So that's the adjacency matrix, and the weights are non-negative.

The second thing we need is the degree matrix. The degree matrix is basically this: you look at your circle of friends, you count the number of edges coming in or going out, and that gives you your degree. This is just summing across the rows (or the columns) of the symmetric matrix A. Finally, we need an operator that is going to give us information about how fast we can diffuse, how fast we can spread viruses or information on this graph, and that comes through the so-called Laplacian matrix: L is the degree matrix minus A. This matrix is going to play a big role in the talk. Now, L is almost non-singular: it has a null space of rank 1, but we can fix that and compute something that is almost an inverse. We call that a pseudo-inverse, and we actually have a closed-form expression for it. We'll need this Moore-Penrose pseudo-inverse throughout the talk. So J, yeah, J, OK; sorry, you're a physicist, so my J is not your J. J is the matrix full of ones. Yes, sorry, good point; yeah, it should be.

So L is a nice operator. It's basically the divergence of the gradient, for those of you physicists in the room, and it has all the right properties: it's symmetric, it's positive semi-definite. So you can diagonalize the operator and expand it in a basis of eigenfunctions. We start at 2 here, because the first eigenfunction is not interesting: it just characterizes the null space. L-dagger, the pseudo-inverse, has almost the same expansion, except you replace each nonzero eigenvalue by its inverse. The lambdas are the eigenvalues; we rank all of them starting with 0, which corresponds to the null space, and then we go all the way to lambda n. Remember that L is the non-normalized Laplacian, so the eigenvalues are going to grow with the size of the graph.
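To keep the objects straight, here is a compact summary of the definitions above, written the way they usually appear; the closed form for the pseudo-inverse is the standard one using the all-ones matrix J, assuming the graph is connected:

$$
\begin{aligned}
& A \in \mathbb{R}^{n \times n},\quad A_{ij} = A_{ji} \ge 0 \quad \text{(weighted adjacency)},\\
& D = \operatorname{diag}(d_1,\dots,d_n),\quad d_i = \textstyle\sum_j A_{ij} \quad \text{(degree matrix)},\\
& L = D - A = \textstyle\sum_{k \ge 2} \lambda_k\, u_k u_k^{\mathsf T},\qquad 0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n,\\
& L^{\dagger} = \textstyle\sum_{k \ge 2} \tfrac{1}{\lambda_k}\, u_k u_k^{\mathsf T} = \big(L + \tfrac{1}{n} J\big)^{-1} - \tfrac{1}{n} J,\qquad J = \mathbf{1}\mathbf{1}^{\mathsf T}.
\end{aligned}
$$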
We typically like Green's theorem, and so we're going to write the Laplacian this way, which is a statement about the Laplacian and the gradient. B is what we call the incidence matrix: it records whether a node belongs to an edge, and whether that node is at the head or the tail of the edge. We can use any orientation on the graph; it doesn't matter, since our graphs are going to be essentially undirected. So we can decompose the Laplacian in this way, where one factor carries the information about the topology and the other carries the information about the weights, and that's exactly what we need to reconstruct L on the other side. That's really nothing but, as I said, Green's theorem, which tells you that you can write your Laplacian in terms of the gradient. OK, I'll come back to these definitions when I need them; I just wanted to get you started. Essentially, A is the matrix that characterizes the connections; this is what we call the adjacency matrix.

So one way we can compare two graphs is to take a matrix norm and compute the norm of the difference of the corresponding adjacency matrices A1 and A2. Here I'm just taking one graph and another graph, and I assume they have the same number of nodes, the same size. This is what we call the edit distance; it's the entrywise one-norm of the difference. What this measures is how many edges you need to add or remove to align the two graphs. You can look, for instance, at the circle of friends that you had as you started high school and the circle of friends that you have when you enter college: you lose contact with some friends, so edges disappear, and you make new friends, so edges are created. Interestingly, this does not tell you anything about how significant those edges are within your circle of friends. It does not tell you where the changes happen, either: there's no information about location or attribution, which is something very interesting to have in a distance. But it is a true metric, in the sense that it has all the properties of a distance, if you care about those properties, and we do.

A very interesting distance is the cut distance. Essentially, the idea is that you partition the nodes into two subsets, S and T, and you measure how much weight goes from one part to the other. It's like driving in the Bay Area, where you have to cross bridges: what the cut is measuring is the amount of traffic, if you wish, the weight that you spend going from, say, Berkeley to San Francisco, or from Oakland to some other place in the Bay. So you measure that for one graph, you measure it for the corresponding partition of the other graph, and you compare. If you have a weighted graph, you care about the weights, but you can do this with unweighted graphs and just count edges. This is a very interesting distance. It's very rich; it gives you a lot of information about the structure of the graph. The problem is that it's immensely expensive to compute: you have to search over all partitions. There are relaxed versions of the problem, whereby you drop the hard assignment of the partition, and then you can get the cost down, but it's still very expensive. So no one computes this thing; it's a very powerful theoretical tool.
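In symbols: the incidence-matrix factorization of the Laplacian, and the two distances just described (the cut distance is written here in its usual cut-norm form; the slides may normalize differently):

$$
L = B^{\mathsf T} W B,
\qquad
d_{\mathrm{edit}}(G_1, G_2) = \|A_1 - A_2\|_1 = \sum_{i,j} |A_1(i,j) - A_2(i,j)|,
$$
$$
d_{\mathrm{cut}}(G_1, G_2) = \max_{S,\, T \subseteq V}\, \Big|\, \sum_{i \in S,\; j \in T} \big(A_1(i,j) - A_2(i,j)\big) \Big|,
$$

where B is the incidence matrix and W is the diagonal matrix of edge weights.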
So an idea that is going to be very similar to the one we'll develop is this idea of differences in path lengths. What we're doing here is comparing graphs G1 and G2, and the way we do that is by comparing how much time we spend going from one node to another, for every pair of nodes, in G1 and in G2. We can throw in a permutation if we assume we don't know the correspondence between the nodes. So it's measuring not just whether there's an edge or not, but really higher-level structure about the connectivity throughout the graph, because we're comparing paths between graph one and graph two over the same set of nodes. If the shortest paths in both graphs agree, that distance is going to be small. So it's already very interesting, because it gives you information about the structure. Is that making sense?

Yes. My math is not as good as yours, but when you apply the pi there, you said the distance, but in the attributes of the nodes I don't see a location. You said something about weighting the edges, but you didn't say anything about assigning the distances. Is it just right in front of me and I'm not seeing it? That's a very good point: I actually didn't define it. That distance measures, among all possible paths that connect u and v, the one with the least sum of weights. My apologies. So we're going to think about this distance, which you can think of as asking, on Google Maps, how many miles does it take to go from one place to another? But you guys live in a part of the world where miles don't matter. What does matter if you commute? Time. So we're going to get there: the whole talk is about replacing that silly notion of distance, measured in miles, which is utterly useless in the Bay Area or in Paris or in Tokyo, with a notion of commute time. In fact, that's exactly it. We can just go home and enjoy the rest of the day, it's beautiful outside, I've finished the talk.

Yes? Maybe before we leave the cut distance behind: are there any good estimators of the cut distance that could bring down that terrible cost? Yeah, that's a very good question. Sorry, here I have my cut distance. As I said, there is basically a way to turn this thing into a semi-definite program: instead of looking for a hard assignment of the partition S and T that achieves the max, you allow a soft assignment, weights, and that turns it into an optimization problem that you can actually solve. It's still pretty expensive. Is that right? Yes, that's exactly right. It has the same flavor. Yes, that's exactly right.
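To pin down the path-length idea from a moment ago: writing rho_G(u,v) for the shortest-path distance in G (the minimum, over all paths from u to v, of the sum of edge weights), the comparison being described can be written, with an optional matching pi when the node correspondence is unknown, as roughly

$$
d(G_1, G_2) \;=\; \min_{\pi \in S_n} \sum_{u,\, v \in V} \big|\, \rho_{G_1}(u, v) - \rho_{G_2}\big(\pi(u), \pi(v)\big) \,\big|,
$$

and the rest of the talk amounts to swapping the shortest-path "miles" rho for a commute time.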
So I'm going to go on a little bit, since I'm here, and actually, as my daughter complains all the time, I enjoy talking. What we're going to do is quickly visit other types of ways to compare graphs, because if I were to stop here, it would be completely unfair: there's a ton of very good work that doesn't give a proper metric but is very useful. The last true metric I will talk about today is essentially a set-theoretical metric: you try to align graphs by measuring how much stuff they share, looking at the largest edge or vertex subgraph common to both graphs. It's a little bit like this: I get my maps from AAA, but I'm really cheap, so I keep them at home. So I opened the drawer with all the maps, and I had this really old map of New York City from the 1970s. I can try to overlay it with the current 2016 map of New York City and find those streets, those avenues, that are the same. That would be looking for the largest common subgraph, an edge subgraph in this case, and seeing whether it's significant or not. Does that make any sense? Question?

Hi, a couple of quick questions. First: you have this adherence to being a proper metric, in particular to being symmetric. But there are lots of good measures of distance, such as the KL divergence, which is not symmetric but is still very, very useful and well motivated mathematically. So one question is, why adhere to being a true metric? And the second question: on the previous slide, the cut distance and this path distance measure very different things. Yes. Because one can imagine that one could be very small while the other is very large. Absolutely, yeah. So, perhaps jumping ahead, does your metric capture the structure that is present in both of those distances? So, yeah, I'm going to give you the lawyer answer: it depends. And we'll address that throughout the talk. Is that OK? These are beautiful questions, and in some sense they are going to be addressed throughout the talk.

So in fact, we're also going to talk about things that are not true distances: similarities. Similarities are important. For instance, if you want to compare the subway system in Tokyo and the subway system in New York City, there's no way you can align those things. You don't have the same number of stations, you don't have the same size; things are really not matching up the way you would need. Yet you can measure how similar those networks are. The idea is that, first, you don't worry about the different sizes: you compute a feature vector that has the same size for both subway systems. Then you turn the crank: now that I have two feature vectors of the same size, I can think of them as vectors in a Euclidean space and measure their distance. This is a very nice way to do things. One example is this one, where you compute for your graph some eigenvalues, some spectrum, that reveals structure about connectivity, for instance the presence of communities in your graph, and then you just compare the spectra. That's a very reasonable thing to do. The problem is that most of those techniques, most of those similarities, are not true distances. They may be symmetric, but typically what happens, as in this case, is that you can have that spectral distance be zero and yet the graphs are not the same. That's the "can you hear the shape of a graph" type of question: there are isospectral graphs. And so we insist on the fact that we actually want a true distance: if this is zero, we want to be able to say, well, G1 equals G2.
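As an illustration of the feature-vector idea (the generic spectral similarity just described, not the speaker's method), here is a minimal Julia sketch that compares two graphs through the low end of their Laplacian spectra. As noted above, this is only a similarity: isospectral, non-isomorphic graphs come out at distance zero.

```julia
using LinearAlgebra

# Combinatorial Laplacian L = D - A of a weighted adjacency matrix.
laplacian(A) = Diagonal(vec(sum(A, dims=2))) - A

# Feature vector: the k smallest Laplacian eigenvalues.
# Works for graphs of different sizes, as long as both have at least k nodes.
spectral_features(A, k) = eigvals(Symmetric(Matrix(laplacian(A))))[1:k]

# "Spectral distance": Euclidean distance between the feature vectors.
# Not a true graph distance: isospectral graphs give 0 without being equal.
spectral_similarity_distance(A1, A2; k=10) =
    norm(spectral_features(A1, k) - spectral_features(A2, k))
```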
So I'll be spending a little bit of time on this similarity distance, which essentially describes the work of Koutra, and because we were so inspired by this work, we're going to study this distance a little bit. The way the distance is defined is very natural. There is a true distance here; pay attention to the square roots. You take the two matrices, each of which characterizes graph 1 or graph 2, take the square root of their entries, take the difference, square it, and sum: you basically have a Frobenius-type norm of this matrix. So what is this matrix? It's called the fast belief propagation matrix. What does it do? It takes the degree matrix and the adjacency matrix and combines them with a very small weight epsilon set by the highest degree: epsilon is the inverse of one plus the highest degree, and in case you're really unlucky and the highest degree is zero, the one is there to save you.

Let's try to see what this thing is measuring. Imagine epsilon is sufficiently small, so we can neglect the quadratic term, and we just have the inverse of identity minus epsilon A. We do a power-series expansion, and we find that what we have is essentially the identity plus an infinite sum of powers of the adjacency matrix. So what are these powers measuring? A tells me whether I'm connected to someone else: if you guys are on LinkedIn, it gives me my first level of connectivity. A squared tells me about the friends of my friends, the people I can reach within two phone calls: I call someone who calls someone. A cubed is paths of length 3, and so on and so forth. So S is essentially measuring the richness of the different paths of various lengths across the graph. And of course, at some point we don't care about paths of length infinity, and the way we trim down the importance of very long paths is that the power of epsilon gets smaller and smaller as the path gets longer.

Yes, sir? Seeing a Taylor expansion like this, should we start worrying about computational burden? That's a very good question, and it turns out that we can actually compute this thing efficiently, so this is not a worry. In fact, this is super stable numerically; it behaves super nicely. So what we do is we compute this distance. When S1 and S2 look alike, in the sense that the graphs share many paths, the distance is going to be small, and that means the similarity is going to be large. A similarity of one means the two graphs, the two networks, are extremely similar, and that corresponds to a distance that is very small.
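For reference, here is the construction as I understand it from Koutra et al.'s DeltaCon work, which the talk is describing, with epsilon = 1/(1 + d_max) and d_max the largest degree:

$$
S = \big[\, I + \varepsilon^2 D - \varepsilon A \,\big]^{-1} \;\approx\; I + \varepsilon A + \varepsilon^2 A^2 + \varepsilon^3 A^3 + \cdots,
\qquad
d(G_1, G_2) = \sqrt{ \sum_{i,j} \Big( \sqrt{S_1(i,j)} - \sqrt{S_2(i,j)} \Big)^{2} },
$$

with the similarity then taken, if I recall correctly, as sim(G1, G2) = 1/(1 + d(G1, G2)).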
What I'm going to do now is walk you through very simple baby examples of graphs and actually measure that distance when I start removing edges, and we're going to see whether we get the right notion of distance, whether we're happy with what we're getting. Am I making sense? OK, very good.

So there are baby examples that we like to work with. The simplest one is the complete graph. This is a group of friends where everyone is super friendly with everyone; everyone is connected to everyone. In terms of density of edges, this is the highest density: it scales like the number of nodes squared. What we do is pop one edge, remove one edge, and then measure how much the graph has changed between the initial and the new graph. What is your guess? Do you expect a large distance or a small distance? Here's the story. You have a group of five friends; they know each other's numbers, they keep sending each other texts and everything. And then all of a sudden two friends fall for the same girlfriend. They're really unhappy with one another; they no longer talk to one another. Does it matter? It should not matter. If the number of friends goes to a million, or 1.6 billion, like Facebook, boy, that's noise; no one cares. I'm sorry to say, I had teenagers, and the love stories in high school were the drama stories at dinner time. But in terms of the complete graph, it shouldn't matter. So the claim is that the fact that this distance scales like the number of nodes is very wrong, because it says that as the graph becomes larger, that single edge perturbation has a huge impact. That is terribly wrong; it should not happen. I would like that distance to go to 0 as the number of nodes goes to infinity. As I scale to 1.6 billion nodes, that should be 0, literally 0. There's something wrong with this distance.

So now I'm going to go to exactly the opposite extreme. Here is the way most good universities work. There are different departments, and there's a guy sitting in the center; depending on where you live, you may call him a provost or a dean. The departments don't actually talk to one another; they only talk to the dean, if and when they need money and faculty slots. That's the star graph. What happens if you remove an edge here? Does it matter? It turns out to actually matter. Why is that? The structure of the graph is such that you lose connectivity if you remove an edge. We'll call that a tree; it's a tree, but it's a funny tree. What's the shortest distance between any two departments? It's only two, right? You go talk to your dean, the dean calls up the chair of the other department, and within basically two phone calls you can talk to one another. It's not a bad structure; a lot of social infrastructure works like this. You have a small number of edges and, in some sense, extremely short connectivity. So if we measure the distance here, what happens as the number of nodes goes to infinity is that we get this asymptotic: we have a term that scales like one over square root of n, then the next one is one over n, and then some higher-order terms. So what does one over square root of n mean? It means that as you increase the size of the graph, edges don't matter: if you remove an edge, it doesn't matter. Do we believe that? I don't know. You start breaking the graph, you disconnect it; this is very different from the complete graph. So this is not necessarily very intuitive.
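If you want to see this scaling for yourself, here is a small Julia sketch (my own illustration, not the speaker's code) that builds the DeltaCon-style distance from the formulas above and measures the effect of deleting one edge from the complete graph K_n as n grows; whether the printed distances grow with n is exactly the diagnostic the speaker is after.

```julia
using LinearAlgebra

# DeltaCon-style affinity S = [I + ε²D - εA]⁻¹, with ε = 1/(1 + max degree),
# and the root-Euclidean distance between the affinities of two graphs.
function deltacon_distance(A1, A2)
    affinity(A) = begin
        d = vec(sum(A, dims=2))
        ε = 1 / (1 + maximum(d))
        inv(Matrix(I + ε^2 * Diagonal(d) - ε * A))
    end
    S1, S2 = affinity(A1), affinity(A2)
    return sqrt(sum((sqrt.(S1) .- sqrt.(S2)) .^ 2))
end

# Complete graph K_n versus K_n with one edge removed.
for n in (10, 100, 1000)
    A1 = ones(n, n) - I
    A2 = copy(A1)
    A2[1, 2] = A2[2, 1] = 0.0
    println((n, deltacon_distance(A1, A2)))
end
```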
So one thing you can do is play the same game, focusing on that plot here on the left, with another graph, the path graph. In fact, that curve is exactly the path graph. The path graph is basically the trail that walks you from the parking lot to the beach, and if you try to go anywhere else, you have either ticks or wild animals that are going to bite you. So there's no other way than going from one node to the next, like this. Very much as with the star graph, if you start removing edges you should pay a huge price: you disconnect the graph. If you start adding edges, you completely change the topology of the graph: you're creating very big short circuits. And in some sense, you would not expect the distance to remain constant as the number of nodes increases. Something is wrong; you should pay a price for those edges you're removing.

So what we see is that the type of scaling we get with this very nice similarity matrix is not very intuitive. In fact, if we look at the principles that the authors of this work came up with, these graphs actually violate those principles: the distance for the complete graph, where we have as many edges as we can possibly fit, grows faster than for a graph like the star graph, where we have as few edges as we can possibly have. So the idea that removing an edge from a very sparse, not at all dense graph should cost a huge price is clearly not holding up here. Question? Yes. We'll get to that, actually; that's a very good question, and I'll address it. Yes, very good point. Yes.

OK, so essentially, I've now spent about half an hour of your time trying to convince you that there's actually room for new ways to measure distances between graphs. We're going to get into somewhat more mathematical material now, but I wanted you to get some intuition about what the problem is, and there is a lot of room for improvement on what can be done. Any questions? OK, very good.

So of course, I'm giving this talk in a place where people not only dream about these fancy distances, but compute them. So one aspect of this work is: how fast can we compute? And this is nontrivial. If you're going to tell me that you need order n cubed, where n is 1.6 billion, you can just shelve the thing and go fishing, right? So we need to be at most linear in the number of nodes, or, for a sparse graph, in the number of edges, which is going to be roughly the same. And to be able to do that, we're going to take advantage of fast randomized algorithms, at the core of which is the Johnson-Lindenstrauss theorem. We're going to borrow a lot of the very nice work that Dan Spielman and his collaborators have been doing to get very good approximations, via randomized techniques, of the so-called resistance distance.

OK, so here, this is a little abstract, but it's interesting still. What I'd like to do is pose the question of a distance in a more general framework. What I want you to think about is this idea: if I give you a matrix that measures the adjacency, the connections, you can build your own function that maps that n-by-n matrix to another matrix that, instead of measuring adjacency, measures, say, pairwise distances, or the amount of time it takes to move from one node to another. That's essentially what we're going to do. You map your graph structure to some feature, but the map is a very rich map: it's not going to reduce the problem so much that you cannot go back, in some sense. And then you compare the features through a norm; d here is exactly what we want. We're going to see two examples. They're both matrix norms: one is essentially the Frobenius norm, for which we can have very fast algorithms; the second is the one norm, but this one norm is very rich, because essentially it's counting edges.
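Schematically, the framework being proposed, as I read it, is: pick a feature map phi and a matrix norm, and define

$$
d_{\varphi}(G_1, G_2) = \big\| \varphi(A_1) - \varphi(A_2) \big\|,
\qquad
\varphi : \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n},
$$

where phi(A) could be the spectrum, the fast belief propagation matrix S from before, or, as in the rest of the talk, the matrix of pairwise effective resistances, and the norm is either the Frobenius norm or the entrywise one-norm.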
And so we'll see that the one norm is one for which we can compute things, even though the fast algorithm will be for the Frobenius norm. Why should we bother with this definition, which is a little abstract? Because, as I said, it decouples the distance into two steps: first we extract the stuff we're interested in, the features of the graph that we care about, and then we worry about the distance. As Fernando says, not all distances, not all norms, are equivalent in practice, and getting the norm right matters a lot for the algorithm; specifically, here, we care about fast algorithms. So this formalism is sort of vacuous and yet super useful, because it gives us a way to rewrite most of the existing distances. Depending on your choice of phi, and you may have very many choices, you can compute, for instance, the eigenvalues of your adjacency matrix, or this fast belief propagation matrix; you recover pretty much every distance you can come up with. It's like a lot of theory: it's rich because it allows us to generalize to something else. And so what we're going to do is change the phi map to compute the matrix of effective resistances, and then change the norm and come up with a fast way to compute that.

So what is the effective resistance? I work in a department where no one would dare not know what a resistance is, because by the time you get into the department, we give you a little kit with a resistor, a capacitor, a transistor, and you work on that at night with your little candle and you make circuits. That's what we do, and all the stuff in your pocket relies on us understanding what a resistor is. Again? Oh my god, I forgot the inductors. Thank you, you caught me. You see, I'm not a true electrical engineer; my apologies.

So what we're going to do is think about our network in the simplest possible way. So, I discovered that this is actually possible: I grew up in a country where you have to call a plumber if there's a leak in your house, and the plumber is like a neurosurgeon in Paris, literally. They were the first people to have cell phones; they would carry these huge cell phones, and while working on your faucet they would pick up the phone to arrange the next job. Horrible. In this country, I discovered Home Depot, and I actually started fixing my own plumbing. So satisfying. Plumbing has to do with tubes, and tubes come in different diameters. The main sewer line, which at some point was blocked in my house, with everything overflowing into the basement, something of a tragedy, for that one I just had to call a plumber; that's a very large tube. Why do we care about very large tubes? Because we can have a large flow through them; that's what the main sewer line is about. The little faucet that I use to water my garden I can work on; that's easy enough, a small tube about half an inch in diameter. So the diameter of the tube is measuring how much flow you can push through it, and that is going to be exactly the inverse of the resistance: the conductance, how much stuff can actually go through. And so what we're going to do is measure the resistance. A high resistance is a very small tube. Now, if you look at the very ends of all your blood vessels, they're very small tubes, capillaries, and they have extremely high resistance.
So how can the blood actually get through? Normally it would stop in the capillary, and it almost does; sometimes a capillary snaps. So how come all the blood manages to go back to the heart? You push hard, yes, but it also has to do with how many capillaries you have. We have lots of capillaries; otherwise, you would not be able to push all the blood through. Essentially, you start with the main line, the aorta, coming out of the heart, and then you start dividing, very much like a tree, and each time you divide, the geometry changes and you increase the resistance. But that's OK: they're all in parallel, so eventually all the blood goes through. Making sense? Did you ever think about those things? Do they keep you up at night? This is a big deal. It's about resistors being in parallel; that's the way capillaries work.

And so we're going to take advantage of this notion, which is really rich. We should not let the electrical engineers be the only ones who get to play with resistance; we should use it to our advantage, to characterize the topology of the network. What we're going to do is measure not just the resistance between two places, but account for the idea that there might be different paths going from one place to another, the way blood takes different paths through the body. So there's this notion of effective resistance, which is the resistance you would measure looking at the entire circuit, not just the resistor between the two terminals. There's an electrical engineering definition, which for our purposes is ultimately irrelevant. What you care about is the fact that this notion is, literally, up to a rescaling factor, the amount of time it would take someone drunk at night to go from one place to another and back. Someone drunk at night is a random walk. You've lost your GPS, you've lost your iPhone, you don't know where you are, and so you're randomly walking through the city, hoping that eventually you'll get to your house. It turns out that in 2D, you're lucky: you will. In 3D, you're lost; if you fly a drone, if you're drunk on a drone, you're lost. But if you're walking in the plane, you will eventually get back to your house. And then you realize, oh my god, I lost my keys in the bar, so you have to go back. How much time it takes this random walk to go from i to j and back is exactly this effective resistance. And so if you have many paths going from your house to the bar, you will actually get there pretty quickly, because it doesn't matter which path you take; you will get there eventually. This is the notion of commute time that is essentially encapsulated by the effective resistance. We prefer to work with the effective resistance because we get rid of a scaling factor, the number of edges.

So as I said, we turn our graph into a resistor network: for every weight, we define a resistance, which is one over the weight. You can think of the weight as the conductance, the inverse of the resistance, the size of the tube. The effective resistance is at most the resistance of the direct connection between i and j, but it's typically much less than that if there are many paths, so it gives you a notion of connectivity throughout the graph. And then, if you want to know how richly connected your network is overall, you can add up all those effective resistances and compute the so-called Kirchhoff index.
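In formulas, these are all standard objects, and they are where the pseudo-inverse from earlier earns its keep; the last line is, as far as I can tell, the family of distances the talk is building toward, with a power p and an optional normalization by the Kirchhoff index:

$$
\begin{aligned}
& R(i,j) = (e_i - e_j)^{\mathsf T} L^{\dagger} (e_i - e_j) = L^{\dagger}_{ii} + L^{\dagger}_{jj} - 2 L^{\dagger}_{ij},\\
& C(i,j) = 2m\, R(i,j) \quad \text{(commute time; } 2m \text{ is twice the total edge weight)},\\
& \mathrm{Kf}(G) = \sum_{i < j} R(i,j) = n \sum_{k \ge 2} \frac{1}{\lambda_k} \quad \text{(Kirchhoff index)},\\
& d_p(G_1, G_2) = \Big( \sum_{i < j} \big| R_1(i,j) - R_2(i,j) \big|^{p} \Big)^{1/p}.
\end{aligned}
$$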
So what we're going to do to compare two graphs, and this is basically the title of the talk, is to compare the matrices of pairwise distances, in the sense of effective resistance or commute time, between the two graphs. And we claim that this is the right notion. What power p we put here depends a little bit on whether we want the fast algorithm or not. It turns out that this is a true distance, in the sense that if the effective resistances agree on both graphs, the graphs are actually identical.

What I'm going to do now, and maybe, in terms of time, I should skip this slide, is give you some intuition about this effective resistance on the graphs we've already talked about. We come back to our prototypical examples and look at the first one, the complete graph, where everyone is friendly with everyone, and we perturb it by modifying one edge, a single edge. We can work out the math; it's not that difficult, and we get a closed-form expression. Basically, it tells us that as the size of the complete graph grows, the distance between the original and the perturbed graph decays as one over n. So we get the right scaling: as you go to 1.6 billion users, my daughter no longer being friends with her friend doesn't matter; it's noise. In fact, we can rescale by normalizing by the total effective resistance, and we get one over n squared, which is even smaller. OK? And if you care about random graphs, you can work out the math not on the complete graph, which is deterministic, but on a very similar type of graph where you randomly add or remove edges, and you get exactly the same kind of scaling. It's a very nice model, because you can actually work out a lot of the mathematics.

So now we move to the star graph and do the same exercise: we take our star graph and we add or remove an edge. We can do this in two ways. We can say, oh, let's have computer science talk to applied math, so we're creating a short circuit and the dean is not involved, OK? Or we say, well, let's change applied math; no one cares these days, computer science is really what matters, and so we decrease the weight of applied math or increase the weight of computer science. It turns out we get exactly the same scaling. In the grand scheme of things, the scaling says that an edge matters a great deal on a star graph. If we normalize by the total effective resistance, the normalization decreases the emphasis, because in some sense we start with a tree, so the total effective resistance is always high. We're going to compare this with another tree on the next slide, the path, where everyone has to go through their neighbor to reach the end of the trail. The scaling there is a little more difficult to see, but basically we get this scaling if we divide by the Kirchhoff index, which is the sum of the effective resistances. It's very similar to the star, except the order is different: this is n, and this is n squared. And that is the intuition: the star graph, even though it has the same number of edges, is actually much better connected.
Even if we start messing around with connections to the dean, or across departments, we preserve the connectivity of the entire campus, and that's why we get a different power here. And we can look at the following exercise: what is the most disruptive edge to add to the path graph? Looking at this formula, you would think, OK, it depends on the distance between the head and the tail of the added edge, j0 minus i0, so maybe make that as large as possible. It turns out the mathematics says something a little different: it actually pays off to leave a little bit of a tail on both sides. If you add this edge, if you connect geography to computer science, you get a wonderfully connected campus. The way this works out is that you have a nice ring here, where you have very fast diffusion, and then you pay a small price for those guys left over at the ends. So that's a curious little thing. What we're doing here is working out carefully the analysis of that resistance distance when the norm is just the one norm: we sum the absolute values of the differences.

So now we have the cycle. What is the cycle? The cycle is exactly the rim of your bicycle wheel: you remove the spokes (that was the star) and keep just the rim, the circle. What we do here is start adding or removing an edge, and we get this somewhat odd scaling. What is going on? Essentially, you're working on a cycle, so differences between endpoints don't quite matter: you have to take them modulo n. So you get a quadratic term here, but it depends on where you put the edge. If you place the edge like this, cutting across, its span is almost half the size of the cycle, and the distance is essentially constant, a huge distance: you can perturb the cycle enormously by creating a shortcut through the middle. And this is the answer to your earlier question: as the size of the graph grows, you get essentially a new graph by collapsing, by connecting those two points. And if you were to add more and more edges, you would get something we call a small-world model, where you have the ability to shortcut and go talk to someone very far away. So we get down to those curves, and this is the kind of analysis we can do. We claim that, effectively, this behaves much better: we get the right scaling.

So, in the interest of time, I will sacrifice the fast numerical algorithms, with apologies; I'm happy to take any questions you want on the fast algorithm. The algorithms were first implemented in MATLAB and eventually moved to Julia, for those of you who care; this is the work of Nate Moening, done with Jeff. OK, so let me skip directly to the stuff that I care about. Five more minutes? OK, perfect. This is really what we're aiming for, and I'll give you some examples on real networks where things actually work. What we want to do is use our metric to quantify how much change there is between two successive time samples as we walk through a time series of dynamic networks.
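To make the object concrete, here is a minimal, dense Julia sketch of the resistance-based distance, my own illustration built from the formulas above using the pseudo-inverse directly; the speaker's fast algorithm instead uses the Johnson-Lindenstrauss/Spielman-style randomized machinery mentioned earlier.

```julia
using LinearAlgebra

# Combinatorial Laplacian L = D - A.
laplacian(A) = Diagonal(vec(sum(A, dims=2))) - A

# Matrix of pairwise effective resistances:
# R[i,j] = L†[i,i] + L†[j,j] - 2 L†[i,j].  Assumes a connected graph.
function resistance_matrix(A)
    Ld = pinv(Matrix(laplacian(A)))   # dense pseudo-inverse: fine for small graphs
    d = diag(Ld)
    return d .+ d' .- 2 .* Ld
end

# Resistance distance between two graphs on the same node set, entrywise p-norm.
# (Sums over ordered pairs, so each pair is counted twice: a constant factor.)
function resistance_distance(A1, A2; p = 1)
    Δ = resistance_matrix(A1) - resistance_matrix(A2)
    return sum(abs.(Δ) .^ p)^(1 / p)
end

# Sanity check from earlier in the talk: K_n minus one edge is a small perturbation.
n = 50
A1 = ones(n, n) - I
A2 = copy(A1); A2[1, 2] = A2[2, 1] = 0.0
println(resistance_distance(A1, A2))
```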
What we expect is that if there are random fluctuations that are completely normal, the distance is insensitive to them, whereas if there are very deep structural changes, reflective of the variables that control the graph, the growth or the shrinkage of the graph, we want our distance to pick that up. We've done experiments on synthetic dynamic networks as well as real networks. Let me skip directly to the real networks, and I'll go back to the synthetic ones if time permits.

So we started with a publicly available network of email exchanges among about 100 people in a company that went bankrupt, and everyone has heard about it. What we do is aggregate the emails over periods of one week, and we measure how strongly connected people are by looking at how many emails they exchange with each other. We don't look at the content of the emails; for all we know, they might be exchanging very strong French words. But they're connected. And what we want to know is whether the dynamics of that exchange of emails is reflective of what happened to the company. Is the question making sense? So, a small graph, 150 nodes; the edges appear and disappear depending on whether people exchanged emails during that week, and we remove the emails that are blasted to the entire company.

What we see here is a time plot; this is time, about three years. Shown in red is the distance we would really like to compute, the one with the one norm for matrices. Shown in black is the distance for which we have a fast algorithm, and I apologize for not being able to give you the details. What is interesting is that they basically track one another; sometimes they're almost identical, and they're usually about the same size. What is more interesting is that as the catastrophes happen, essentially as people discover that the company is going bankrupt, you can see the distance increasing before the event occurs. So there is value, we believe, in the predictive power of this distance. Those are documented events that have nothing to do with our bizarre way of computing distances on the email exchange, and yet you can see changes that are very indicative, very predictive, of what's going to happen, essentially here the company filing for bankruptcy. Yes? This is domain specific: was there any time along this time axis when the participants in the emails realized that their emails might be subpoenaed? No. OK, so this is clean? Yes, these are clean data. Yes, that's a good question.

So now we want to compare that to a distance that is totally agnostic to the structure, and that's the edit distance. Here, what we're doing is counting how many edges changed: all of a sudden we need to add one more edge, because during that week those two people started emailing each other a lot. And you can see how flat this is. It's essentially oblivious to the catastrophe looming around the corner. So this is a very bad distance; it's not at all informative about the structure. Eventually, as they file for bankruptcy, it does realize that there are a lot of changes in the emails, but as a statistical tool, it's totally useless.
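The preprocessing pipeline just described (aggregate a week of emails into a weighted graph, then track successive incremental distances) might look roughly like the following sketch; the `(week, sender, recipient)` record format and the reuse of `resistance_distance` from the earlier snippet are my own assumptions for illustration.

```julia
# Hypothetical input: a list of (week, sender, recipient) email records,
# with people indexed 1..n and weeks indexed 1..nweeks.
function weekly_adjacency(events, n, nweeks)
    As = [zeros(n, n) for _ in 1:nweeks]
    for (w, i, j) in events
        i == j && continue      # ignore self-emails
        As[w][i, j] += 1.0      # weight = number of emails that week
        As[w][j, i] += 1.0      # symmetrize: we only care about the connection
    end
    return As
end

# Incremental distances between successive weekly graphs, using
# resistance_distance from the earlier sketch (assumes connected weekly graphs).
incremental_distances(As) =
    [resistance_distance(As[t], As[t + 1]; p = 1) for t in 1:length(As) - 1]
```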
The second real-world data set is an MIT data set where faculty and grad students carried cell phones, and throughout one semester, as the cell phones came close to one another, the Bluetooth radios started talking to one another, and you can register that. You can say, oh, actually, this grad student came to my office hours; she or he cares about my beautiful graduate class. So what we're going to do is quantify those interactions as we move through the semester, the idea being that if people start talking a lot to one another, many edges are created, and if everyone stays in their office, there are no edges. Edges correspond to the amount of time the cell phones spend in close physical proximity, so that the Bluetooth radios can talk to one another.

And you can see, again, that the two distances we can compute are usually very close to one another. What you can see, for instance, is that grad students start talking to their faculty during finals; if you've ever taught anything, you know that this is usually the case. They suddenly realize, oh my god, this is going to be on the final, are you serious? OK, let's go to your office hours. There is also the fact that at the beginning of the semester, students meet with their assigned faculty advisor for the courses they need to enroll in, and there are some nice discussions between grad students as they come back from spring break. So this is essentially the dynamics of social interaction, on a small scale, on a campus. If you look at the edit distance instead, you get something that is, in some sense, not at all as informative. For instance, the spring-break changes are not at all obvious here; there are clearly some changes, but the finals week is no more indicative of the social interaction between the faculty and the grad students. So we claim that the edit distance is not as predictive, and reveals less information about the dynamics on the graph.

I'll stop here and take any questions you may have. Thank you so much for your attention.