Welcome to this brief introduction to the core concepts of data visualization. In this presentation, I will cover the basic principles of data visualization, especially those applicable to networks. However, I won't cover visualization of networks themselves. For that, please go watch my earlier presentation on the core concepts of network visualization.

The first thing to be aware of is the visual variables. In other words, what can we vary in order to encode data? The first is position. We can encode information in where, for example, we place the nodes in a network. I already touched upon that in the presentation on network visualization. However, we can also encode information in the size of the nodes: bigger nodes might be more important. We can encode information in the shapes: triangles may have a particular meaning. Or in the color of the nodes.

However, color is a bit more complex than that. What most people think of when they say color is what is actually called hue. The hue tells you whether something is green or orange or purple. However, you also have other color components. You have the saturation, which tells you, for example, how purple something is. And you have the brightness or lightness or value, which are all treated as synonyms here, and they tell you how bright or dark something is.

The next thing to be aware of is the so-called visual qualities. In other words, what are these visual variables good at? The first important such quality is selectivity. That tells you whether it's possible, based on that variable, to easily spot groups of, for example, nodes that have the same value. Almost all variables are selective, the most notable exception being shape. It's very difficult, in a big network, to quickly spot which nodes are, for example, the triangles.

Another important quality is whether a variable is quantitative. You can encode quantitative data in position, just think about it.
In plots, you typically have an axis where you place things according to their value. However, you can also encode quantitative data in the size of a node. It's relatively easy to see that the area of one circle is about twice as big as the area of another circle.

You also have ordered variables. All quantitative variables are inherently ordered, but the reverse is not true. For example, you can encode an ordering in the brightness of nodes. It's easy to see whether one node is darker or brighter than another. However, it's difficult to see whether something is twice as bright. Similarly, you can encode order in the saturation of color, but again, not quantitative data. One of the exceptions that is not ordered is hue. You typically don't have a sorting of the colors. It's not given whether green, orange and purple come in one order or another. The same goes for shapes. They also don't have an implied order to them.

Let's have a quick look at how these can be applied to networks. And for that, I'll look at two concrete networks. The first is the disease gene network. It looks like this, and the nodes have been positioned according to the cluster to which they belong. That allows you to selectively and very quickly spot nodes that go together. We then use size, value and saturation, all of them for highlighting which nodes are drug targets. So the drug targets are larger, they are darker and they have more color. And lastly, we encode in the hue whether or not we have an FDA-approved drug for a given drug target.

Another example is proteomics data. In this case, we've taken some proteomics data and visualized them on a STRING network. The direction of change, whether the protein is up- or down-regulated, is encoded in the hue of the node. Meanwhile, the magnitude of the change is encoded in the node value. So darker blue or darker red nodes are more regulated. Lastly, we encode the confidence of the interactions both in the edge width and the edge value.
In other words, highly confident edges are both wider and darker.

Finally, some dos and don'ts of data visualization. One thing that is good to do is to use redundant encoding. Imagine that I want to highlight the importance of something: I want to highlight important nodes in a network. I could do that using the size, making the important nodes larger. Alternatively, I can encode it using the brightness, making the important nodes darker so that they stand out more. But if I do both at the same time, I get a much stronger encoding. It's much more obvious which nodes are the important ones.

You can also use complementary encodings to your advantage. You've seen that already. So for example, if you have several important categories, as you saw for the drug targets, you can encode in the size and the brightness of the node whether a node is important. But you can then encode in the hue which category the important nodes belong to. Similarly, in the up- and down-regulation example, we use complementary encodings where the hue, red versus blue, tells you whether a protein is up- or down-regulated, but the value and the saturation tell you how much it's regulated.

What you want to avoid is so-called competing encodings. For example, using hue to encode data on both the nodes and the edges. Imagine that I used red nodes to highlight something and red edges to highlight something else. That would mean that red is suddenly no longer selective. If I look for something red in the network, it's difficult to tell whether it is the nodes or the edges that are red. So for that reason, you will generally want to avoid using something like hue multiple times in the network. By that I mean you generally only want to use colors, different hues, to encode data either in the nodes or in the edges, not both at the same time.

That's all I have for you this time on data visualization for networks. I plan to cover colors in more detail in a future talk.
And if you're interested in these topics in general, have a look at this presentation. Thanks for your attention.
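
As a companion to the proteomics example in the talk, here is a minimal Python sketch of that kind of complementary encoding. It is an illustration under stated assumptions, not the code behind the figure shown: the function names regulation_rgb and to_hex, the clipping threshold max_abs, the choice of pure red and pure blue, and the sample proteins and values are all hypothetical. Hue carries only the direction of change, while saturation and value together carry the magnitude.

```python
import colorsys

RED_HUE = 0.0         # assumed hue for up-regulated proteins
BLUE_HUE = 2.0 / 3.0  # assumed hue for down-regulated proteins


def regulation_rgb(log_fold_change, max_abs=2.0):
    """Return an (r, g, b) tuple of floats in [0, 1] for one node.

    max_abs is a hypothetical cutoff: changes at or beyond it get the
    strongest color.
    """
    # Hue carries the direction of change only.
    hue = RED_HUE if log_fold_change >= 0 else BLUE_HUE
    # Clip the magnitude to [0, 1] so outliers do not stretch the scale.
    strength = min(abs(log_fold_change) / max_abs, 1.0)
    # Saturation and value both carry the magnitude:
    # stronger regulation gives more color and a darker node.
    saturation = strength
    value = 1.0 - 0.5 * strength
    return colorsys.hsv_to_rgb(hue, saturation, value)


def to_hex(rgb):
    """Format an (r, g, b) float tuple as a '#rrggbb' string."""
    return "#" + "".join(f"{int(round(c * 255)):02x}" for c in rgb)


if __name__ == "__main__":
    # Made-up proteins and log fold changes, for illustration only.
    for name, lfc in [("A", 1.8), ("B", -0.4), ("C", 0.0)]:
        print(name, lfc, to_hex(regulation_rgb(lfc)))
```

Because an unchanged protein gets zero saturation, it comes out white regardless of hue, so only regulated nodes draw the eye. The resulting colors could then be assigned to nodes in whatever network tool you use.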