Hello and welcome back to Beyond Networks: The Evolution of Living Systems. Today we're gonna dive into the central topic of this lecture, the nitty-gritty of networks and mechanistic explanations in biology. I ended the last module with this slide, saying that a network graph like the one we see here, which in systems biology is sometimes called a hairball graph, is a tool: a thinking tool, an epistemic tool for understanding an actual system. It provides a perspective, a formal system that represents an actual system, but in a specific way. So let's try to develop what I mean by that in this module, and specifically in this lecture, which is called Networkology. Remember that we said there are actual systems: patterned processes, interacting processes that generate some recognizable pattern or behavior. There are formal systems: sets of relations between mathematical objects, for example variables, described by equations and propositions. And the act of modeling is trying to bring the two into congruence in a way that depends, remember, on the motives and the questions of the modeler as an agent. This definition of a formal system as a set of relations between mathematical objects lends itself very easily to visual depiction. So let's do it. Let's represent all the variables in this formal system by these circles here: variables one, two, three, four, five, and six, in different colors. And let's represent the relationships between those variables, very simply, by lines, okay? These variables interact. Remember, mathematical objects like variables can describe processes; they don't have to be things. And so we end up with a network: a very simple, very straightforward, very intuitive way to represent the formal system. The network graph we see here is the natural way of representing a formal system.
A graph has a very specific mathematical definition: it is a mathematical structure used to model pairwise relations between objects. Okay, that doesn't get us much further. Last time we talked about how the tools, the formalisms we use for modeling, constrain what we can do. The mathematical discipline behind this is called graph theory, and it is based on set theory. It allows us to look at these graphs, to classify them, and to work with them. So we're gonna develop a few concepts from graph theory here. One is that the different variables related in this formal model are the network nodes, okay? These are the circles in this graph, also called vertices, and they are connected by edges, also called links. The set of all nodes together with the set of all edges makes up the mathematical network graph. We can now use the tools of graph theory to analyze what is called the structure, or also the topology, of the network. I'm not gonna use the word topology in this context anymore, because we're gonna use it in a different way later on and it may be confusing, but in the literature it often shows up as network topology. I'll say network structure. One more distinction before we move on, just very quickly: we can have an undirected graph, as in the example I was showing you here. Two nodes are connected by an edge that has no direction. Whether it goes from variable four to variable six or the other way around, it's the same relation. But a lot of systems are represented by directed graphs, where the relations have arrows and go in specific directions, or sometimes in both, but you have to indicate that explicitly. These are two very different classes of graphs. And we can use these graphs to represent all kinds of systems in the real world.
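To make the set-theoretic definition concrete, here is a minimal sketch (not from the lecture) that stores a graph as a set of nodes plus a set of edges. The node names `v1` to `v6` are hypothetical labels. An undirected edge is modelled here as a `frozenset`, so the order of its two ends doesn't matter; a directed edge is an ordered tuple, so direction does.

```python
# A graph G = (V, E): a set of nodes (vertices) and a set of edges (links).
# Node names are hypothetical labels for variables in a formal system.
nodes = {"v1", "v2", "v3", "v4", "v5", "v6"}

pairs = [("v4", "v6"), ("v6", "v4"), ("v1", "v2")]

# Undirected: an edge is an unordered pair, so v4-v6 and v6-v4 collapse into one edge.
undirected_edges = {frozenset(p) for p in pairs}

# Directed: an edge is an ordered pair, so v4 -> v6 and v6 -> v4 are distinct edges.
directed_edges = set(pairs)

def neighbours(node, edges):
    """Nodes that share an undirected edge with `node`."""
    return {other for e in edges if node in e for other in e if other != node}

print(len(undirected_edges), len(directed_edges))
print(sorted(neighbours("v6", undirected_edges)))
```

The same three pairs give two undirected edges but three directed ones, which is exactly the difference between the two classes of graphs.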
Of course, computer networks are the quintessential technological application; different architectures of computer networks are shown here. This is a fascinating picture. It's not astronomy: it's a visualization of part of the World Wide Web in 1999, a data set that was used in a pioneering study of networkology, as I call it here, which looked at the global structure of networks and characterized networks through it. Networks are everywhere. We have social networks, and business and travel networks that span the whole globe. They're all interconnected, as we're finding out the hard way: it's very easy to get from China to Europe through those networks. Food webs in ecology, who eats whom, are a very classical network application. And of course, in systems biology, there's the famous hairball. Here is an example where someone mushed up a fly, checked which proteins stick together, and made this beautiful hairball graph. It won a prize at the Drosophila meeting in 2011, and it shows you the stickiness of Drosophila proteins. We also have gene regulatory networks, which are really big in systems biology, like this one from Eric Davidson's lab. We'll come back to it many, many times as an example. It depicts the gene regulatory network that controls endomesoderm fate in early sea urchin development. Don't worry about the details, but here the nodes are genes and the edges are regulatory interactions between them. So: networks, networks everywhere. From the massive data sets that science is currently gathering, we have a tremendous treasure of data from which we can reconstruct such networks and then statistically analyze their structure. There are two ways of doing this. One is to look at the global structure of a network and characterize it in its entirety. What you could do, for example, is count the number of edges that connect to each node.
This is called the node degree. We're gonna introduce just a few measures, and I'm gonna tell you what they can do. The node degree tells you how many edges connect to each node in the network. These are the numbers for this particular example. You have one very highly connected node here, another one that's fairly highly connected, and then a lot of nodes with the same node degree of two: two connections go into each of those nodes. In directed graphs, of course, it gets a bit more complicated. There you have to count the number of edges that go in and the number of edges that go out, and these are called the in-degree and the out-degree, respectively. Let's switch back to the simpler example. To characterize the global structure of a network, you can draw up a degree distribution, which you obtain by counting the number of nodes that have a specific degree (how many nodes have no edges, one edge, two edges, and so on) and then dividing by the total number of nodes. This shows you the relative frequency of node degrees in the network. In our specific example, we have quite a few nodes with degree two, one with degree three, and one really highly connected node, variable number seven, with a degree of five. This is called the degree distribution P(k) of the network. If you do this for really large networks, you can classify different kinds of networks. For example, you can draw up a random network: you take some number of nodes and connect them with a random set of edges, where every edge you put in the network is determined completely at random. If you analyze the resulting network, you get a degree distribution that looks like this: a bell-shaped curve, roughly Gaussian (strictly speaking binomial, and close to Poisson for large sparse networks). It has a typical, average degree.
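The counting procedure just described can be sketched in a few lines of Python. The seven-node network below is a hypothetical example with a degree-5 hub, not a copy of the lecture's figure; the in-degree/out-degree helper uses a separate three-edge directed toy graph.

```python
from collections import Counter

def degrees(edges):
    """Node degree k: number of edges touching each node (undirected graph)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_distribution(deg, nodes):
    """P(k): number of nodes with degree k divided by the total number of nodes.
    Nodes that have no edges at all are counted as k = 0."""
    counts = Counter(deg.get(v, 0) for v in nodes)
    return {k: c / len(nodes) for k, c in sorted(counts.items())}

def in_out_degrees(directed_edges):
    """In-degree counts arrows arriving at a node; out-degree counts arrows leaving it."""
    in_deg = Counter(v for _, v in directed_edges)
    out_deg = Counter(u for u, _ in directed_edges)
    return in_deg, out_deg

# Hypothetical 7-node example (not the lecture's figure): node 7 is a hub.
nodes = list(range(1, 8))
edges = [(7, 1), (7, 2), (7, 3), (7, 4), (7, 5), (1, 2), (3, 4), (5, 6)]

deg = degrees(edges)
print(dict(deg))
print(degree_distribution(deg, nodes))

in_deg, out_deg = in_out_degrees([(1, 2), (2, 3), (1, 3)])
print(dict(in_deg), dict(out_deg))
```

Dividing by the number of nodes makes P(k) a relative frequency, so the values always sum to one, which lets networks of different sizes be compared.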
Most nodes in the network have roughly that average number of connections, okay? And moving away from this number, the number of nodes with more or fewer connections falls off very rapidly, giving this familiar bell curve. But there are other networks, called scale-free networks, that have a very different distribution. It's plotted here on a log-log scale, where it makes a straight line. Those of you who remember your math will realize that this characterizes a power-law distribution: a straight line on a log-log plot, log P(k) = -γ log k + constant, means P(k) ∝ k^(-γ). We'll get back to that in a second. We could also draw up a really beautiful hierarchical network. Remember, we said biological systems are hierarchical. Here is a rather regular hierarchical network which is self-similar: there are motifs in the network that repeat themselves at different scales. And again you get a power law in the degree distribution, okay? So depending on their degree distribution, you can distinguish random networks from those that have a more interesting structure. Let's focus on the most interesting case, the one in the middle, because it's not quite obvious to figure out. The most important thing you need to know about power-law distributions is that they have a fat tail. This is something everybody should know nowadays. It means that most elements in a power-law distribution, most nodes in the network, have few connections, but some of them, a rare but not negligible number, have a lot. So most network nodes are so-called peripheral nodes, marked in red here, with few connections, while there are a few network hubs with a high number of connections. Although there are very few hubs compared to peripheral nodes, they are incredibly important because they have so many connections. If something happens to them, it has a much bigger effect than if something happens to a peripheral node.
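The fat tail can be seen in a small simulation. The lecture doesn't say how its scale-free example was generated, so this sketch swaps in an assumption: Barabási-Albert-style preferential attachment, a standard growth model in which new nodes prefer to link to already well-connected nodes and which yields a power-law degree distribution. It is compared with a G(n, p) Erdős-Rényi random network of the same mean degree. The sizes (1000 nodes, mean degree about 4), the threshold k ≥ 15, and the random seed are arbitrary choices.

```python
import random
from collections import Counter

def erdos_renyi(n, p, rng):
    """Random network (G(n, p) model): each possible edge is included
    independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def preferential_attachment(n, m, rng):
    """Assumed scale-free generator (Barabasi-Albert style): each new node links to
    m distinct existing nodes, picked with probability proportional to their degree."""
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]  # small seed clique
    ends = [v for e in edges for v in e]  # each node appears once per edge end, i.e. degree times
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges

def degree_counts(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

rng = random.Random(42)
n = 1000
er = degree_counts(erdos_renyi(n, 4 / (n - 1), rng))   # mean degree about 4
ba = degree_counts(preferential_attachment(n, 2, rng))  # mean degree about 4

for name, deg in (("random", er), ("preferential attachment", ba)):
    mean = sum(deg.values()) / n
    big = sum(1 for k in deg.values() if k >= 15)
    print(f"{name}: mean degree {mean:.2f}, max degree {max(deg.values())}, nodes with k >= 15: {big}")
```

Both networks have about the same average degree, but only the preferential-attachment one grows hubs far above that average; in the random network, degrees stay clustered around the mean.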
Plotted on a linear scale, the power-law distribution looks like this. You have a very fast drop: there are a lot of nodes in the network with few edges. And then you have a fat tail that drops off really, really slowly: a few nodes have a lot of edges. These are the hubs of the network. That's different from the random network. Remember, the degree distribution of a random network was a bell-shaped curve, and its tail falls off exponentially or even faster. What I'm plotting here is an exponential in blue versus a power law, and you can now see what I mean by a fat tail. In a random network, most nodes are very close to the average node degree. Let's say it's four connections. Depending on the density of edges, most nodes in this network will have four, a few will have five, and almost no node will have ten. It's almost impossible to find a node with ten connections; you'd have to be very, very lucky. But in a power law, you will find rare nodes with ten edges coming in. That's what's called the fat tail. A network with this structure is called scale-free, because there is no typical node degree, no characteristic scale: there's no peak in the distribution like in the random network you saw before. This architecture has some interesting characteristics compared to random networks. In random networks, most nodes are very similar to each other, close to the average number of edges in the network. That means perturbations of different nodes all have about the same effect, and these networks are quite sensitive to perturbation of random nodes: if you remove a node, each node is fairly important for the functioning of the network. In a scale-free network, by contrast, you have two different types of nodes. Remember: peripheral nodes versus hubs.
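The contrast between random failures and hub failures in a scale-free network can be checked with a rough simulation. This sketch assumes a Barabási-Albert-style preferential-attachment network (a model not named in the lecture) of 1000 nodes, removes 10% of the nodes either at random or in order of decreasing degree, and reports the size of the largest connected component that remains. All parameters and the seed are arbitrary.

```python
import random
from collections import Counter, deque

def preferential_attachment(n, m, rng):
    """Assumed scale-free generator (Barabasi-Albert style): each new node links to
    m distinct existing nodes, picked with probability proportional to their degree."""
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]  # small seed clique
    ends = [v for e in edges for v in e]  # each node listed once per edge end
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges

def largest_component(alive, edges):
    """Size of the largest connected component among surviving nodes (breadth-first search)."""
    adj = {v: [] for v in alive}
    for u, v in edges:
        if u in adj and v in adj:  # drop edges that touch a removed node
            adj[u].append(v)
            adj[v].append(u)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best

rng = random.Random(1)
n, k = 1000, 100  # knock out 10% of the nodes
edges = preferential_attachment(n, 2, rng)
deg = Counter(v for e in edges for v in e)

random_victims = set(rng.sample(range(n), k))
hub_victims = set(sorted(range(n), key=lambda v: deg[v], reverse=True)[:k])

lcc_random = largest_component(set(range(n)) - random_victims, edges)
lcc_hubs = largest_component(set(range(n)) - hub_victims, edges)
print("largest component after random removal:", lcc_random)
print("largest component after hub removal:   ", lcc_hubs)
```

Removing the same number of nodes hurts far more when those nodes are the hubs, which is the robust-yet-fragile behavior described for scale-free networks.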
And the hubs are rare, but they are the most important components of the network, because if you knock out any number of peripheral nodes, not much happens. So scale-free networks are robust to perturbation of random nodes. For example, the internet itself is very robust to any individual server breaking down, because most servers on the internet are connected to only a few other servers. But if one of the hubs goes down, it's a big problem, okay? If you hit one of the hubs, you're in big trouble. So these systems are robust, but they are very sensitive to the rare event that causes a lot of damage. I can't recommend Nassim Taleb's work enough; this is, of course, the topic of his book The Black Swan. These rare events that take out a hub in the network are unpredictable, and they have such severe consequences compared to all the other random perturbations that they come to dominate the history of the network. So: unpredictability. If you're living in such a scale-free network, you're very, very sensitive to these rare events. Okay, let's talk about two more measures that you can apply to the structure of a network. The first one is, again, about the global structure of a network. So far we've looked at the distribution of node degrees: is it random, are there different types of nodes, or are all the nodes the same? Now, instead of the hierarchical network, we're gonna compare random and scale-free networks to a regular lattice network, here a square lattice. This was the subject of a very influential paper from 1998. Here we're asking how big the network really is, so we're trying to measure distance.
For example, you're interested, again, in finding out how long it takes a virus to get from China to Europe through the travel network or the international business network. So you measure either the shortest path through the network or the mean path length: you take pairs of nodes and calculate the mean number of steps it takes to get from one to the other. That gives you a measure of how far apart nodes are in the network, and in this sense, how large the network is. Now, these measures depend crucially on how the network is wired. If you have a random network, the distance between any two nodes is very small, okay? You get very quickly from one node to another. What I've done here is order all the network nodes and put them in a circle. In a regular lattice network, we've arranged them so that each node is connected to its neighbors, and you can see a very regular connection architecture. Now, what's interesting is that small-world networks, which sit between these two extremes, have only a few of the long-distance connections that are prevalent in the random network, but just a few of those make the mean path length extremely short. This was, of course, made famous through Stanley Milgram's experiments, in which he sent out letters and asked people to pass them on through acquaintances until they eventually reached a designated target person. It turned out that the letters that arrived took about six steps on average. These are the famous six degrees of separation. A lot of biological, social, and technological networks are not just scale-free; they are also what are called small-world networks. They have a very short path length compared to regular networks. So they are somewhere between random and regular in terms of their size and their connectivity.
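The mean path length can be computed with breadth-first search from every node. This sketch builds a ring lattice of 200 nodes, each linked to its two nearest neighbors on either side, and then adds ten random long-range shortcuts between nodes that weren't yet linked. Adding shortcuts (rather than rewiring existing edges) is a simplifying assumption; it is one common variant of the small-world construction, and all parameters and the seed are arbitrary.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Regular ring lattice: each node linked to its k nearest neighbours on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def add_shortcuts(adj, count, rng):
    """Add `count` random long-range edges between nodes that are not yet linked."""
    nodes = list(adj)
    added = 0
    while added < count:
        u, v = rng.sample(nodes, 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj

def mean_path_length(adj):
    """Mean shortest-path length over all ordered pairs of reachable nodes (BFS from every node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(7)
L_regular = mean_path_length(ring_lattice(200, 2))
L_small_world = mean_path_length(add_shortcuts(ring_lattice(200, 2), 10, rng))
print(f"regular lattice: {L_regular:.2f} steps, with 10 shortcuts: {L_small_world:.2f} steps")
```

In the pure lattice, getting halfway round the ring takes about 50 steps, so the mean is above 25; a handful of shortcuts, touching only a few percent of the nodes, pulls it down considerably.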
The last measure of networkology we're gonna talk about homes in on local network structure. So far we've looked at degree distributions (are there different types of nodes in the network?) and at the size of the whole network: how connectivity affects its real size, meaning not just the number of nodes but how many steps it takes to travel across the network. Now we're gonna look at how each individual node is connected and how that differs from other nodes. The way you can do this is by measuring the clustering coefficient of each node. It's calculated by counting triangles: you count how many triangles go through a node and divide by how many triangles could go through that node. Let's take node number six in this network, and let me show you what I mean by a triangle. If a node has connections to other nodes, those nodes are its neighbors, okay? If we count all the nodes connected to node six, we find that it is connected to every other node in the network, so this entire network is the neighborhood of this particular node. In this neighborhood, we check how many triangles there could be and how many there actually are. Here's a triangle: V6 is connected to V2, V6 is connected to V4, and V4 is connected to V2. It's like a social network, where your friends are usually also friends with each other. But here's a pair of edges, V6 to V4 and V6 to V5, that doesn't form a triangle, because V4 and V5 are not connected. The total number of possible triangles through V6 is 10, and the number V6 is actually involved in is three, so we get a clustering coefficient of 3/10, that is 30% or 0.3. Take another node. V1 is connected to only two other nodes, so its neighborhood consists of just these two nodes, and they are connected to each other, closing a triangle. So V1 has a clustering coefficient of 1.0, and so on and so forth.
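Here is a sketch of the triangle count in Python. The lecture's figure isn't fully specified, so the six-node network below is a hypothetical reconstruction chosen to reproduce the numbers in the text: V6 is linked to all five other nodes, and among those neighbors only V1-V2, V2-V4 and V3-V5 are linked. A node with k neighbors can be part of at most k(k-1)/2 triangles, which gives the 10 possible triangles for V6. Returning 0 for nodes with fewer than two neighbors is an assumed convention, since the coefficient is undefined there. The second helper averages the coefficient over nodes of equal degree, C(k).

```python
from collections import defaultdict
from itertools import combinations

def clustering(adj, v):
    """Local clustering coefficient of node v:
    triangles through v / possible triangles = e_v / (k_v (k_v - 1) / 2),
    where k_v is v's degree and e_v the number of links among v's neighbours."""
    k = len(adj[v])
    if k < 2:
        return 0.0  # assumed convention: undefined for k < 2, reported as 0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def clustering_by_degree(adj):
    """C(k): average clustering coefficient of all nodes that have degree k."""
    by_k = defaultdict(list)
    for v in adj:
        by_k[len(adj[v])].append(clustering(adj, v))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_k.items())}

# Hypothetical 6-node network chosen to reproduce the numbers in the text.
edges = [("v6", "v1"), ("v6", "v2"), ("v6", "v3"), ("v6", "v4"), ("v6", "v5"),
         ("v1", "v2"), ("v2", "v4"), ("v3", "v5")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
adj = dict(adj)

print(clustering(adj, "v6"))  # 0.3
print(clustering(adj, "v1"))  # 1.0
print(clustering_by_degree(adj))
```

Even in this toy network, the best-connected node has the lowest clustering coefficient, the kind of degree dependence that C(k) is designed to reveal.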
So we can characterize the clustering coefficient across the whole network. We can calculate the average. We can also draw a distribution analogous to the degree distribution: we calculate the average clustering coefficient for all nodes of the same degree, C(k), and that gives us a distribution that again helps us distinguish networks. Remember that we could distinguish random networks from scale-free or hierarchical networks using the degree distribution, but both scale-free and hierarchical networks had the same power-law distribution, so in that sense they looked the same. If we plot the clustering coefficient distribution, we find that for this measure random networks and scale-free networks both show a flat distribution: nodes of different degrees cluster the same way. The hierarchical network, by contrast, shows a dependency: the more highly connected a node is, the less clustered it is. The hubs in this network are less clustered because they connect to all the other parts of the network, whereas the members of each cluster have lower degrees and are tightly clustered. This is quite intuitively clear. So this is interesting; you can read something off this plot. The hubs in this regular, hierarchical network are different from the hubs in the scale-free network. In terms of clustering, the hubs in the scale-free network may or may not belong to clusters, just like any other node; they are not special or different from the peripheral ones. So now we've defined a bunch of measures. They tell us that scale-free networks are robust to random perturbations but fragile if you hit a hub, and that in small-world networks perturbations and signals can travel very quickly. And we've looked at some more local structure: clustering. There are lots of different algorithms that take this approach systematically and apply it across a whole network.
This way, you discover the community structure of the network. Community structure means the different modules a network has. For example, here's a pioneering study from 2002, and you can see a very modular network where it's visibly clear which nodes are clustered together. The algorithm not only needs to calculate clustering coefficients, but also has to check the neighborhoods of these clustered nodes. Here is a much bigger network based on real data, again protein-protein interaction data, showing different factors that interact and form clusters corresponding to functions in the cell, like the proteasome, or intracellular signaling cascades, et cetera. So this is very powerful, but there's one limitation. Often there are hub proteins in scale-free networks that sit between different clusters, because networks in biology are both scale-free and a little bit hierarchical. So there are algorithms that detect overlapping clusters as well. This can get quite complicated, but what we're detecting here are structural modules, okay? The more clusters are present, the more modular the structure of the network. That means certain nodes are more connected to each other than to other parts of the network: within each module there are more connections than between modules. And that, of course, is a way to make the network robust against perturbations. Even if you hit a hub within a module, only that one module, that one cluster, is likely to fail, because it has few connections to the rest of the network. So this is a way to break contagion on a network. This is why localism is important: break up long supply chains, make essential products locally. If we don't travel so much globally, viruses will not spread as fast. So it's a way of making networks more robust.
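The lecture doesn't name a specific community-detection algorithm. One widely used way to score how modular a proposed split of a network is, swapped in here as an assumption, is Newman's modularity Q: the fraction of edges that fall inside modules minus the fraction expected if edges were placed at random with the same node degrees. This sketch only scores a given partition; actual detection algorithms search over partitions for a high score. The toy network of two triangles joined by a single bridge edge is hypothetical.

```python
from collections import Counter

def modularity(edges, community):
    """Newman's modularity for an undirected graph and a node -> community map:
    Q = sum over communities c of  L_c / m  -  (d_c / 2m)^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the summed degree of the nodes in c."""
    m = len(edges)
    inside = Counter()
    degree_sum = Counter()
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical toy network: two triangles joined by a single bridge edge a1-b1.
edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
         ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
         ("a1", "b1")]

two_modules = {node: node[0] for node in ("a1", "a2", "a3", "b1", "b2", "b3")}
one_module = {node: "all" for node in two_modules}

print(modularity(edges, two_modules))  # 5/14, about 0.357
print(modularity(edges, one_module))   # 0.0
```

Splitting along the bridge scores well because six of the seven edges stay inside a module; lumping everything into one module scores zero, since it is no better than the random expectation.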
And of course, structural modularity may also enable the functional decomposition of large networks. So far we've looked at the structure of networks: we tried to classify them and to find some very general characteristics regarding perturbation, contagion on a network, the spread of information, that sort of thing. But if we have a biological network, what we want to know, of course, is what it does. For that we have to subdivide, or decompose, it, and structural modularity seems to show us the way. We're gonna look at that in the next lecture: whether we can subdivide the network into little chunks that we can understand, and whether we can then reconstruct the functioning of the whole network from them. But before we do that, let me quickly wrap up and connect back to the philosophical part of the lecture. I have now introduced network graphs. This is a specific type of model based on mathematical graphs and graph theory, and network graphs are natural, straightforward representations of formal systems. They are very powerful epistemic tools for analyzing global and local network structure. But network graphs are tools. They are not the underlying system. They are idealizations, abstractions that provide a structural perspective on the underlying actual system. Remember, the actual system is a patterned process. We have abstracted out its components and the relationships between them. A lot of network scientists say we've overcome reductionism with this. But we are still reducing the system to an abstract graph instead of looking at its full dynamic potential. In the next lecture, when we start to look at how to decompose the structure of a network and infer function from structure, we will see how this approach hits its limits very quickly, and we need to think really hard about how to get beyond those limits.
I hope you join me for that next time. Thanks for listening and bye now.