similarity metrics. Sometimes it doesn't matter which one you use. If you work with omics data, you've probably tried hierarchical clustering with several different similarity metrics and seen no difference at all. Well, it's math time.

Take the Pearson correlation coefficient, a popular metric:

    r(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / sqrt( Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)² )

If you've centered your data, the means x̄ and ȳ are zero, so this reduces to

    r(x, y) = Σᵢ xᵢyᵢ / sqrt( Σᵢ xᵢ² · Σᵢ yᵢ² )

which can be rewritten in vector form as ⟨x, y⟩ / (‖x‖ ‖y‖). And that is exactly the formula for cosine similarity, another of the popular metrics. If the data are further normalized, each vector has length one, and it reduces to nothing more than the inner product ⟨x, y⟩.

So what about Euclidean distance? The formula is

    d(x, y) = sqrt( Σᵢ (xᵢ − yᵢ)² )

and with a bit of high school math it expands to

    d(x, y) = sqrt( ‖x‖² + ‖y‖² − 2⟨x, y⟩ )

If you've normalized the data, ‖x‖² and ‖y‖² are both one, and the formula reduces to

    d(x, y) = sqrt( 2 − 2⟨x, y⟩ )

which is nothing more than a simple nonlinear, monotonically decreasing transformation of the inner product of x and y. So if you have data that are both centered and normalized, you will have identical similarity rankings for all these different metrics, and therefore identical clustering, at least if you're using single or complete linkage, since those depend only on the rank order of the pairwise similarities.
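The whole argument can be checked numerically. Here is a minimal sketch in NumPy (the random data and variable names are my own, not from the talk): center and unit-normalize each row, then confirm that Pearson correlation equals cosine similarity equals the plain inner product, and that Euclidean distance is exactly sqrt(2 − 2·⟨x, y⟩).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))  # 5 samples, 20 features (toy stand-in for omics data)

# Center each sample (its mean becomes zero), then scale it to unit length.
X = X - X.mean(axis=1, keepdims=True)
X = X / np.linalg.norm(X, axis=1, keepdims=True)

x, y = X[0], X[1]

pearson = np.corrcoef(x, y)[0, 1]
cosine = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
inner = x @ y
euclid = np.linalg.norm(x - y)

# After centering + normalizing, Pearson == cosine == inner product:
print(np.allclose(pearson, cosine), np.allclose(cosine, inner))  # True True

# And Euclidean distance is a monotone transform of the inner product:
print(np.allclose(euclid, np.sqrt(2 - 2 * inner)))  # True
```

Because sqrt(2 − 2t) is strictly decreasing in t, ranking pairs by smallest Euclidean distance gives the same order as ranking by largest inner product, which is what makes single- and complete-linkage clustering come out identical.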