Welcome to this short introduction to the core concepts of network analysis. The goal of network analysis is to characterize networks, both at the level of individual nodes, such as their importance, and at the level of the overall structure of the network, such as its robustness. In this presentation, I assume that you are already familiar with the core concepts of network biology, which I covered in an earlier presentation linked here.

A key concept is centrality. The goal of centrality is to judge node importance, and the most commonly used metric for this is degree centrality. Degree centrality assumes that a node is more important if it has many connections, and for this reason it is simply defined as the number of edges that a node has. If we look at this small example network, you can see that the orange node has three edges whereas the green node has four edges. The green node thus has a higher degree centrality. If you look at a directed network, things get a little bit more complicated, because you have both an in-degree, the number of edges flowing into a node, and an out-degree, the number of edges flowing out of the node.

The other commonly used metric for centrality is betweenness centrality. Here we focus not on the number of connections, but on the influence that a node has on flow through the network. For this reason it is defined by counting the number of shortest paths that go through a node.

This leads us to the topic of path analysis. A path between two nodes is simply a way of getting from one to the other. So if we look at the two orange nodes in the network, this would be a path between them. However, I could obviously also take this path, which is shorter, and in fact it is what is called the shortest path. Again, if we look at directed networks, things are a bit more complicated: since you can't move against the direction of an edge, the highlighted path is now the shortest path.
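To make the definitions of degree, in-degree and out-degree, shortest paths, and betweenness centrality concrete, here is a minimal Python sketch using only the standard library. The six-node network in `EDGES` and all function names are assumptions made up for illustration; it is not the network shown on the slides. Betweenness is computed here in the common fractional form (for each pair of nodes, the share of their shortest paths that pass through a node), which reduces to a plain count when each pair has a single shortest path.

```python
from collections import deque
from itertools import combinations

# Hypothetical example network (an assumption, not the one from the slides).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("D", "F")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours (undirected)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree_centrality(adjacency):
    """Degree centrality as defined in the talk: the number of edges per node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def in_and_out_degree(directed_edges):
    """For edges read as (source, target), count incoming and outgoing edges."""
    in_degree, out_degree = {}, {}
    for source, target in directed_edges:
        out_degree[source] = out_degree.get(source, 0) + 1
        in_degree[target] = in_degree.get(target, 0) + 1
    return in_degree, out_degree


def bfs(adjacency, source):
    """Breadth-first search on an unweighted network.

    Returns (dist, sigma): dist[v] is the shortest-path length from source
    to v, and sigma[v] is the number of distinct shortest paths to v.
    """
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:  # first time we reach v
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u lies on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness_centrality(adjacency):
    """Sum, over all node pairs, the fraction of shortest paths through each node."""
    nodes = sorted(adjacency)
    info = {s: bfs(adjacency, s) for s in nodes}
    score = {v: 0.0 for v in nodes}
    for s, t in combinations(nodes, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in nodes:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v is on a shortest s-t path iff the two legs add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    print("degree:", degree_centrality(adjacency))
    print("betweenness:", betweenness_centrality(adjacency))
    print("shortest path length A to E:", bfs(adjacency, "A")[0]["E"])
```

In this toy network, nodes C and D both have degree 3, yet D has a betweenness of 7 against 6 for C, because D also sits on the only path between E and F. This illustrates how the two centrality measures can rank nodes differently.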
If you look at a weighted network, where not all edges are considered equal, the shortest path may again not be the strongest or best path between the nodes, since it may involve a very weak edge, as in this example.

To globally characterize a network, we often rely on path analysis and degree analysis. The goal here is to capture the overall structure of a network, and the simplest thing we can look at is its size in terms of network diameter. The network diameter is defined as the length of the longest shortest path. That may sound complicated, but it simply means that we ask how far apart the two most distant nodes are. In the example network, that is 4. You can also calculate the so-called characteristic path length, which is the average rather than the longest shortest path. So how far apart are two nodes typically?

The other way, and perhaps the most common way, of characterizing a network is to look at the degree distribution. That is, we count how many nodes have only one edge, how many have two, how many have three, and so on. If we plot this distribution in a log-log plot, we will often see that the points fall, at least approximately, on a straight line. This indicates that the degree distribution follows a so-called power law, and that the network is a scale-free network. The scale-free topology is considered interesting, because it means that the network is very robust in the sense that it does not easily break apart.

This may sound good, but I have to end with a few words of caution. Firstly, almost all the networks we work with in biology are subject to study bias. That means that if you are looking, for example, at a protein interaction network, you will almost certainly know more interactions for a protein the more it has been studied. For this reason, degree centrality may in fact be more a measure of how well studied a protein is than a measure of its actual biological importance.
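The global measures described above (network diameter, characteristic path length, and degree distribution) can be sketched in a few lines of standard-library Python. The network in `EDGES` and the function names are again assumptions chosen for illustration, not the example from the slides, and the sketch assumes an unweighted, undirected network in which unconnected pairs are simply skipped.

```python
from collections import Counter, deque

# Hypothetical example network (an assumption, not the one from the slides).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("D", "F")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours (undirected)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: shortest-path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def all_pairs_lengths(adjacency):
    """Shortest-path lengths for every unordered pair of connected nodes."""
    nodes = sorted(adjacency)
    lengths = []
    for i, s in enumerate(nodes):
        dist = shortest_path_lengths(adjacency, s)
        # Only look at nodes after s so that each pair is counted once.
        lengths.extend(dist[t] for t in nodes[i + 1:] if t in dist)
    return lengths


def diameter(adjacency):
    """Network diameter: the length of the longest shortest path."""
    return max(all_pairs_lengths(adjacency))


def characteristic_path_length(adjacency):
    """Characteristic path length: the average shortest-path length."""
    lengths = all_pairs_lengths(adjacency)
    return sum(lengths) / len(lengths)


def degree_distribution(adjacency):
    """Map each degree k to the number of nodes that have exactly k edges."""
    return Counter(len(neighbours) for neighbours in adjacency.values())


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    print("diameter:", diameter(adjacency))
    print("characteristic path length:", characteristic_path_length(adjacency))
    print("degree distribution:", dict(degree_distribution(adjacency)))
```

For this toy network the diameter is 3 and the characteristic path length is 28/15, roughly 1.87. With only six nodes the degree distribution says nothing about scale-free behaviour; for a real network one would plot the logarithm of each count against the logarithm of its degree and check whether the points fall roughly on a straight line.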
Similarly, the scale-free topology may reflect the fact that a lot of science funding goes into a few proteins more than it reflects the actual topology of biological networks. If we are looking at undirected networks, there is also a problem with looking at flow, because in most cases, as in a physical protein interaction network, there is no information or biomass flowing along the edges. This means that the very concept of a path, and with it betweenness centrality, may in fact be biologically meaningless. The same goes for robustness: what does it even mean that a network falls apart?

I hope I have not discouraged you too much, and if you want a brighter view of how you can use networks for something very useful, I suggest this presentation on visualization of networks. Thanks for your attention.