Welcome to the dark side of network analysis. In this presentation I'll cover some major problems that are rarely talked about when it comes to using network analysis techniques to characterize biological networks. If you're not already familiar with such techniques, I strongly recommend that you go watch my introduction to the core concepts of network analysis before continuing with this presentation.
First, a quick recap. In the core concepts presentation I already talked about how the scale-free topology of a network may tell you more about science funding following a power law than it tells you about the biology of the network. Similarly, degree centrality tends to tell you more about study bias, in terms of which genes and proteins are popular, than it tells you about their importance in the network. And finally, the paths in a network will be completely meaningless from a biological perspective if there is no flow in the network, such as in a physical protein interaction network.
But there are other issues. Let's start with some poorly defined metrics. The characteristic path length is the average shortest path length in your network, and the network diameter is the longest shortest path in your network. These definitions work fine when you're looking at a network like this, in which the network diameter is clearly four. But what if you have disconnected components? What if your network looks like this? What is the distance between these two nodes? If you follow the standard definition, the distance between two disconnected nodes is infinity, and thus the diameter of this network is infinity. The characteristic path length is an average that includes that infinite distance, so it is infinity too. This means that both the diameter and the characteristic path length will be infinity for any network with disconnected components, even one with just a single singleton.
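To make the point about singletons concrete, here is a minimal sketch in plain Python. The five-node chain and the extra singleton `F` are hypothetical toy networks, not the ones on the slides, and the helper names `build`, `bfs_distances` and `path_metrics` are made up for this illustration. It applies the standard definition, in which a missing path counts as an infinite distance.

```python
import math
from collections import deque
from itertools import combinations


def build(edges, nodes=()):
    """Build an undirected adjacency map; `nodes` lets us add singletons."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_metrics(adj):
    """Return (characteristic path length, diameter) under the standard
    definition: unreachable node pairs have distance infinity."""
    all_dist = {n: bfs_distances(adj, n) for n in adj}
    lengths = [all_dist[u].get(v, math.inf) for u, v in combinations(adj, 2)]
    return sum(lengths) / len(lengths), max(lengths)


# Hypothetical toy network: a chain A-B-C-D-E (diameter four).
chain_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
chain = build(chain_edges)

# The same chain plus one singleton node F with no edges.
with_singleton = build(chain_edges, nodes=["F"])

print(path_metrics(chain))           # (2.0, 4)
print(path_metrics(with_singleton))  # (inf, inf)
```

In this sketch, adding one unconnected node is enough to turn both metrics from finite values into infinity, even though nothing about the connected part of the network has changed.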
Metrics that come out as infinity are clearly not very useful, and for that reason people often pragmatically use only the paths that exist in the network. That means that for this network, the longest shortest path is the one shown, with a length of two. That is the diameter of this network. The problem is that it's not really a global metric anymore. It's now the longest shortest path of one of the components, so it's a metric describing one component, not the global network.
Another, even bigger problem is path fragility. If you look at this network, we have a shortest path of length four between the two nodes. If I remove just one edge, I break this path and need to look for a new shortest path, which is this one with length five. Similarly, if I add just a single edge, I can find a new, shorter path between the same two nodes with length three. This means that adding or deleting a single edge can change everything. It can change many different shortest paths in the network. It can change the betweenness centrality of lots of nodes, including nodes nowhere near the edge that was added or removed. It will almost certainly change the network diameter, and it will change the characteristic path length of the network.
But it gets even worse once you look at the main application area of these methods in biology. Typically we use them to study real biological networks with hundreds to thousands of nodes and thousands to tens of thousands of edges. These networks almost always have disconnected components, at least some singleton nodes, and their edges typically come from high-throughput studies, which means that even if we assume very low error rates, in such a big network there will be many false positives and many false negatives. Because the path metrics are fragile, these errors will break all of them.
So what should we do? Well, I would argue that we should do nothing.
What I mean by that is that all fragile metrics will fail when you apply them to big networks with errors in them, and biological networks are big networks with errors in them. For that reason, we cannot fix this problem. The solution is thus to avoid these metrics, since they clearly will not work on the kinds of networks that we're interested in studying. In other words, you should simply do nothing: don't use them.
If you found this presentation interesting, I'm sure you'll also enjoy my rant about how people are abusing the STRING database for doing poor data visualization. Thanks for your attention.