Okay, we are back. Hi everybody. Since I wasn't talking yesterday, I should introduce myself. My name is Radovan Bast, calling in from Tromsø, Norway. I work at the University of Tromsø, doing research software engineering and supporting users and research groups in computation and programming. Together with me is Johan. I don't know if you want to say a sentence, like after a commercial break; there might be new people watching.
Yes, so I'm Johan Hellsvik, working at the PDC Center for High Performance Computing at KTH in Stockholm.
In the next roughly 60 minutes we will talk about data visualization with Matplotlib. Here on my screen I will zoom in a bit more; I just want you to know where to find it in the materials, so I will click here on "Data visualization with Matplotlib". We have seen some plotting in the previous wonderful lesson about pandas, and now we will do a little bit more. Plotting and data visualization is something that I think we all need to do for reports, for the thesis, for publications, for presentations, and that's why we put it into the schedule. Python and Matplotlib, the library that we will present, are a very good match for data visualization, together with Jupyter. So are, of course, R with ggplot, and Matlab, but this is a Python course, and the tools that we will show here are a really good match for data visualization, and we will motivate why. We have a couple of goals to achieve. In the next 60 minutes we will maybe not become experts in Matplotlib, but at least we want to show you what is possible, so we will show you a couple of examples. The most important thing for me is that after this lesson you have a good starting point and that you are able to look for help: to search for help and to adapt gallery examples. I will show you more about it.
A concept that is very important to us is reproducibility. Just to motivate this lesson and why we talk about Matplotlib, I wanted to show you this sentence from this wonderful book, which is linked: data visualization needs to be automated, figures should be auto-generated as part of a data analysis pipeline, and they should come out ready to be sent to the printer. So what we want at the end of this lesson is that the plot at the end of our Jupyter notebook is one that we can send into the publication, or put in our thesis or our presentation, without manually post-processing it in a different program. That is really important. I will emphasize it again later when we talk about customizing, because manually tweaking images will really bite you when you need to regenerate 50 figures one day before the submission deadline, or when you get additional data. If you want to update the figures, it's nice if you can just rerun the whole pipeline and find the updated figures at the end of the Jupyter notebook. It's also really important when the person who originally created the figures has left the group. This happens: master students, PhD students, and postdocs enter the research group and they leave it, and sometimes all we have left is a JPEG of the plot. How do you then adapt that? We want a more reproducible, flexible way. There are a couple of libraries and tools for this in Python. Here I list those that I know; I'm sure there are more. We will focus on Matplotlib. Why are we starting with Matplotlib and not with any of the other tools? The reason is that Matplotlib is maybe the most popular one. It's 18 years old, and it's maybe the most standard, the most used library. I think it's a good starting point. There are other libraries that build on top of Matplotlib, so it helps to know Matplotlib a little bit.
Those of you who also write code in Matlab will probably feel a bit familiar, because Matplotlib actually borrows a lot from Matlab. I wouldn't be surprised if the "mat" in Matplotlib has something to do with Matlab. Even if you choose another library, and that is sometimes a personal preference, it can be helpful to know it. The chances are also high that you will need to adapt a Matplotlib plot from a colleague. And for adapting plots from some of the other libraries, for example Seaborn, it can be good to know how Matplotlib works. So let's get started. Johan, how about you take the screen from me, and then together we build up a plot to get started with Matplotlib. Later we will discuss how we can tweak it, customize it, and adapt it to our liking.
Yes, thanks everyone, and thank you for handing over the screen. I can first emphasize what Radovan said about the importance of being able to reproduce your figures, and also to produce figures of different kinds in an efficient manner. A common situation is that you make a figure for one format, for instance a slide for a presentation, and then the same material, the same data, is going to end up in a manuscript. Then you perhaps need another aspect ratio for the figure, and you might need different font sizes and so on. It's very convenient and efficient when you have things scriptable. What we'll do now is create our first plot. We'll start from this code snippet here. I will now move this window and go to a Jupyter notebook, copy the code, and paste it here. We execute it right away, and I scroll down and display what we get.
What do we expect now: should people do it at the same time, or should they wait a little bit? Yes.
So for now you just follow along, and then you will continue to work with the same code snippet and use it in the first exercise to extend this example.
Yes, so at this moment it's good to just watch. In a moment we will go into the exercise and then you can try this out as well. The result here is a rather standard two-dimensional plot where these yellow dots indicate a data set. I now scroll back up to the code and I will highlight what the different lines of code are doing. The first statement specifies that Matplotlib will inline the figure into the notebook itself. Here we import matplotlib.pyplot, and by convention we do this as plt. We have an example data set here: these are the x coordinates and these are the y coordinates. This statement here is interesting. You can see that you get two handles. fig is an object for the figure window: had one run Python in the terminal, a standalone window would have been opened, and fig is the handle we have on this window. Here is the line of code that does the plotting. The data is not continuous, so we use a scatter plot. Its arguments are the x data and the y data, and then we also specify the color. In order to get some labeling, we have statements here to specify the label of the x axis, the label of the y axis, and then a title. For the moment these are placeholders, and in the exercise you will adapt these to, so to say, real content. Finally, in order to save the figure, one can use the figure handle and then savefig, and you save to a file of an appropriate file format. This can be the PNG format, which is a good choice if the figure will appear on a web page. If you want to retain full information in the plot, it can be appropriate to save the file in a vector format, for instance the EPS format. So we are now coming to the first exercise. I'm also watching here.
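The walkthrough above can be sketched roughly as follows. This is a minimal reconstruction, not the exact snippet from the lesson: the data values, the colour code, and the file name `my_plot.png` are placeholders chosen for illustration. In a notebook, the `%matplotlib inline` line mentioned above would come first; it is left as a comment here so that the block also runs as a plain script.

```python
# In a Jupyter notebook one would first run:  %matplotlib inline
import matplotlib.pyplot as plt

# Placeholder data set (made-up values, only for illustration)
data_x = [1.0, 2.0, 3.0, 4.0, 5.0]
data_y = [2.1, 3.9, 6.2, 7.8, 10.1]

# fig is the handle on the whole figure, ax is the handle on the plotting area
fig, ax = plt.subplots()

# The data is not continuous, so a scatter plot is used; the colour is an arbitrary choice
ax.scatter(x=data_x, y=data_y, c="#E69F00")

# Placeholder labels, to be replaced with real content
ax.set_xlabel("we should label the x axis")
ax.set_ylabel("we should label the y axis")
ax.set_title("some title")

# PNG is a raster format; a vector format such as EPS or PDF keeps full information
fig.savefig("my_plot.png")
```

Changing the file extension in `savefig` (for example to `.pdf`) is enough to switch the output format, and the optional `dpi` argument controls the resolution of raster output.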
I can be, but I think the questions are getting answered. I think there is nothing we really need to raise here, but one can say that you can choose different output formats. You can also change the resolution of the figure, so these are things that can be changed. And you can also save the file in a vector format.
This is a good point indeed. Very often you have generated a figure in some resolution, and at a later time you would like to have it in another resolution. It's very convenient when you can do that by changing perhaps only one or two lines of code. What you will do in the first exercise, for which you will have 15 minutes, is to extend the figure with an additional data set. You will also use different colors so that we can separate the data. When we have multiple data sets, it's also good to specify what they are by introducing a legend. So let's go to the next slide. So let's see. Radovan, is there something more to say for the exercise?
Yeah, I just want to say that the goal of this first exercise may be really simple for those who have already used Matplotlib, but the goal here is really to see something in your notebook. There's also a solution that you can unroll, one suggested solution, so you can have a look there. The goal is that you play a little bit, adapt the labels, or add a label for the data. After the 15 minutes you should be able to see what we see here in this plot. And feel free, of course, to ask questions: those of you who are in Zoom rooms can ask the helpers who are there, and everyone can ask on HackMD. Some of this, especially the way we compute data2_y_scaled, might be a new thing for some of you. So feel free to ask about what is happening there.
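One possible shape of the exercise solution, as a hedged sketch: the data values, the scaling factor 2.0, the colours, and the legend labels are all made up here, and the name `data2_y_scaled` follows the name mentioned in the discussion. The scaling uses a list comprehension because `data2_y` is assumed to be a plain Python list, which cannot be multiplied element by element with `*`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data sets, only for illustration
data_x = [1.0, 2.0, 3.0, 4.0, 5.0]
data_y = [2.0, 4.0, 6.0, 8.0, 10.0]
data2_y = [0.5, 1.0, 1.5, 2.0, 2.5]

# For a plain list, `data2_y * 2` would repeat the list and `data2_y * 2.0`
# would raise a TypeError, so each element is scaled explicitly
data2_y_scaled = [y * 2.0 for y in data2_y]

# With a NumPy array the multiplication is element-wise and more compact
data2_y_scaled_np = np.array(data2_y) * 2.0

fig, ax = plt.subplots()

# The label arguments are what the legend picks up
ax.scatter(x=data_x, y=data_y, c="#E69F00", label="set 1")
ax.scatter(x=data_x, y=data2_y_scaled, c="#56B4E9", label="set 2")

ax.set_xlabel("x values")
ax.set_ylabel("y values")
ax.set_title("two data sets")
ax.legend()
```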
So you will have 15 minutes. When we come back, we will answer questions and discuss some of this, and later we will go in and learn how to tweak, customize, and improve our plots.
Welcome back everyone after the exercise. I hope you got far with it. We will highlight a few things that we saw on HackMD. We got this question here: "I'm always confused with fig and ax; both seem to allow labels." That's a very important question, and we will get to it in a minute. And then, Radovan, I think there was one thing that you wanted to highlight.
Yeah, I wanted to say thanks for the really good questions and comments. There was also one about this line: why do we compute data2_y_scaled this way, why couldn't we just multiply? That's a very good question. Something that can be a little bit confusing is that in Python there are many ways to gather numbers into a collection. There are lists, there are NumPy arrays, there are pandas data frames, and they can do different things. We could have simply multiplied if we had used a NumPy array, which we have seen yesterday. Maybe we should have used that in this exercise, but I was not sure what would be less confusing. This is not a NumPy array but a Python list, and with a Python list we cannot do a lot of things so compactly. So that's why. That is a very good question.
Yeah, thank you. I'll talk now about the two interfaces. Matplotlib has two interfaces. One is the more object-oriented one, and we already saw it in the exercise: fig and ax are objects that can be handled separately. This is convenient whenever we have two or more figures in the same notebook or script. You would then have fig1, fig2, and so forth, and you can access them independently. The more traditional thing would be the pyplot interface.
There, plt is the handle used to set the global settings. A very common situation is that you look for templates for your plots on the internet, and you find code snippets that you start out from and reuse. What you will then encounter is that sometimes the snippets use the fig and ax objects, and sometimes they use the pyplot interface. So it's good to be prepared for seeing both.
For the longest time I really didn't know that there are two ways of using Matplotlib. Although I have been using Matplotlib for many years, I look something up every time, and I was surprised that every time I searched for something on Stack Overflow it looked a little bit different than I remembered from last time. Then, I think a year or two ago, I found out that there are two different ways of doing it. They have pros and cons, and I think knowing this can help to remove some confusion. And again, reminding you of what Johan said about fig and ax, which I think was a really good way of picturing it: I like imagining fig as the frame. The frame can have a size, I can resize it, and I can save it. And it's useful to imagine ax as the canvas: these are the axes, the ticks, and the data points, and this is what we really plot. Each of them carries different functions, and I don't remember all of them. So that's hopefully useful.
Yes. As a concluding remark here, our recommendation is that when you write new code from scratch you go for the object-oriented interface, as it has more flexibility and fewer side effects. So now I think we are coming to styling and customization, and I think that you can take the screen.
I'm working on it. Taking the screen in a second, and now sharing.
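To make the difference between the two interfaces concrete, here is a small side-by-side sketch that draws the same scatter plot twice. The data and titles are made up; the calls themselves (`plt.xlabel` versus `ax.set_xlabel`, and so on) are the standard counterparts in the two interfaces.

```python
import matplotlib.pyplot as plt

# Made-up data, only for illustration
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# pyplot interface: plt functions act on the "current" figure and axes,
# which is global state that every later plt call implicitly modifies
plt.figure()
plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("pyplot interface")
pyplot_ax = plt.gca()  # grab the implicit current axes to inspect it later

# Object-oriented interface: fig and ax are explicit objects,
# so several figures can be modified independently of each other
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("object-oriented interface")
```

When adapting a snippet found online, the usual translation is `plt.xlabel(...)` to `ax.set_xlabel(...)`, `plt.title(...)` to `ax.set_title(...)`, and `plt.savefig(...)` to `fig.savefig(...)`.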
I have the lesson open here, and in the background I already have a Jupyter notebook ready to go, so we will do more Jupytering. Now we talk about styling and customizing plots, which is something that we often need to do for publications. So far we have created this rough plot, but maybe we need to change the font size or the aspect ratio, or we want a different styling. We may be tempted to open it up in a different program and change the font manually for every plot, but I think it's good to resist this temptation. It's good to automate it. Let the Jupyter notebook do all the customization, because then, again, if we need to regenerate all these 50 figures once we get new data, it's just changing one line and everything will come out automatically. This is also nice for everybody else in the community who will try to reproduce my visualization. We can then attach the visualization to the publication as supporting information, and we will come back to this on Thursday, where we take it a step further and discuss Binder, which is a wonderful service. So let's talk about customization. In Matplotlib you can adjust everything, really absolutely everything. I don't remember how; every time I have to search for it. But what can be really useful to know is what these things are even called, so that I know what to search for. I find this page very useful: "Parts of a figure". Let's open it up here. Here I can find out what things are called: this is called the x axis label, and this is the minor tick label. Then I know what to search for. The other resource that I find really useful are these cheat sheets. There is a collection of really nice Matplotlib cheat sheets, for beginners and for intermediates. You can print them out and they are really beautiful.
Back to customization. One more thing I wanted to mention is that there are also style sheets in Matplotlib. Let's open it up. With these you can really change the overall look. I know this is a little bit tiny, but if, for instance, you really like the ggplot look, there are many styles you can choose from, and here is an example of how you could change the overall style. So this is really important, and we will practice it. We have a number of possible exercises here. I want to walk you through them, and you will be able to choose the one that is most interesting for you and most relevant for your work. You will have 15 minutes and you can choose one of the three; one suggestion is that if you manage to finish one early you can of course start with a second one. So let me show you what they are about. Exercise number one connects a bit to the pandas lecture earlier today and yesterday: we will read some data and try to plot it. Here is a starting point that you can take. What we plot in this exercise is, on the x-axis, GDP per inhabitant, in other words how wealthy a country is. All these dots are different countries, and this is from the past. On the y-axis we see the life expectancy. A sad fact is that in the countries that are more wealthy the people live longer, and vice versa. Here we realize that the linear axis is maybe not the most appropriate; it would maybe be more appropriate to take a log axis. So your goal is to take this plot as a starting point, do a bit of searching, and adapt the figure from a linear axis to a log axis. An additional question that you can explore: what does this alpha=0.5 do here? So a little bit of exploration. And of course you can unroll this and you find one possible solution.
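For the log-axis exercise, a minimal sketch of the idea is shown below. It does not use the data set from the lesson: the numbers are randomly generated stand-ins, and the variable names `gdp_per_capita` and `life_expectancy` are hypothetical. The two calls of interest are `ax.set_xscale("log")` and the `alpha` argument of `scatter`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (NOT real country data): wealth spread over several
# orders of magnitude, life expectancy loosely following its logarithm
rng = np.random.default_rng(seed=0)
gdp_per_capita = 10 ** rng.uniform(2.5, 5.0, size=200)
life_expectancy = 30 + 10 * np.log10(gdp_per_capita) + rng.normal(0, 3, size=200)

fig, ax = plt.subplots()

# alpha=0.5 makes each marker half transparent, so overlapping points
# show up as darker regions instead of hiding each other
ax.scatter(gdp_per_capita, life_expectancy, alpha=0.5)

# Switch the x axis from linear to logarithmic scaling
ax.set_xscale("log")

ax.set_xlabel("GDP per inhabitant (synthetic values)")
ax.set_ylabel("life expectancy (synthetic values)")
```

Trying a few values of `alpha` between 0 and 1 is a quick way to see its effect on crowded regions of the plot.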
Exercise two: it's good to know that creating a plot for a Zoom presentation is a different thing than creating a plot for a publication which goes to print, because in a Zoom presentation you have the whole screen, while in a publication we sometimes only have a column. So the goal of this exercise is to take a starting point and increase the font size and the tick sizes so that it still looks readable if I print it out on paper. That's something I actually recommend doing before publishing: really print it out in the size in which it will appear and check whether it is still understandable and still readable. So you can try that. Exercise three is, I think, a very nice exercise. 15 minutes may not be enough time, but I really encourage you to go through this during a rainy afternoon. This is something I often do. I don't start from scratch: if I want to do a plot, I open up one of these galleries, for instance the Matplotlib gallery, and I browse it until I find a plot that looks similar to what I had in mind, for instance this one. The next step is that I try to run it as it is, and I hope I get the same result as the people who created it. So that's the second step.
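A possible approach to the font-size exercise, sketched under assumptions: the figure size of 3.5 by 2.6 inches is only a guess at a single-column width, and the font and tick sizes are arbitrary starting values that one would tune after a test print. The file name `column_figure.png` is likewise hypothetical.

```python
import matplotlib.pyplot as plt

# Made-up data, only for illustration
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 1, 5]

# Set the figure size in inches to roughly match the printed column (assumed value)
fig, ax = plt.subplots(figsize=(3.5, 2.6))
ax.scatter(x, y)

# Larger fonts for the axis labels and the title
ax.set_xlabel("x label", fontsize=14)
ax.set_ylabel("y label", fontsize=14)
ax.set_title("title", fontsize=16)

# Larger tick labels, and longer and thicker tick marks, on both axes
ax.tick_params(axis="both", labelsize=12, length=6, width=1.5)

# Keep labels from being cut off at the figure edge, then save at print resolution
fig.tight_layout()
fig.savefig("column_figure.png", dpi=300)
```

Because the size is fixed in the script, printing the saved file at 100% scale gives a direct check of readability in the final layout.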
The third step is to try to find out where in this example the data is; here it will probably be these lines. So find the data, then try to modify it and replace it with your own data. Try to reproduce the plot, find out where the data is, and find out which format it is in: is it a NumPy array, is it a list, is it a two-dimensional list? Then, if you have the time, try to feed it different, simplified data. This will really be the key to adapting the example to your project, and it is something I do very often when I want to plot my own data. The next step is then to customize and tweak, and here is an example exploration. Does that sound fine? So again, we will have 15 minutes and you can choose exercise 1, 2, or 3. If you're working in groups, it can be good to choose 1, 2, or 3 as a group. When we come back from the exercise we will take a break, and after the break I will maybe go through one of these together with you, maybe actually exercise 3, so that we really explore it together, and then we will summarize. Did I miss anything?
I think you got it all, Radovan. One comment I could make is that we have all of these snippets of code, and when you write them they end up a little bit scattered, perhaps spread between different projects. One thing which is good to strive for is to put the snippets of code that you use for the plotting into, for instance, the main repository of code you have for doing some numerical calculation, so that you have it all in one place. That can be useful for generating figures for results, and also for documentation of the code and for tutorial material.
That's a great point. And the last thing about the Jupyter notebook is that we have the chance to have the data, the plotting, and the figures, but also our thinking, all in one place, because we can document it. Then if I open it up again one year later I don't
have to look for where the data was again, or on which external hard drive it was. It's all in one place. You can also use the time in the exercise to discuss some horror stories about last-minute customization; I think we all have such stories. So we will give you 15 minutes. We will be back at the full hour, and after the break we go through one of these exercises together.
Yeah, so in practice we will not meet at 12. Simply go for the break at 12, and then we will meet again at 10 past the hour.
Sounds great. Enjoy the exercise; I'm looking forward to discussing it with you after the break.
So welcome back everyone after the break. I hope that you have made progress with exercises one, two, and three. I would like to point out that there are good questions in HackMD; we will answer them asynchronously and perhaps get back to them and highlight a few tomorrow. I know that some of you have been working on exercises one and two, for which we have template solutions. Exercise three is open-ended, so you could go in different directions, and we have one worked example with the Seaborn library. Is there perhaps an example exploration that we have with one of the other libraries?
Yes. In the remaining seven or eight minutes we could try it together, because exercise three is maybe the most interesting one but probably takes more than 15 minutes. Again, I really invite you to try it out. What we could do now, just for a bit of demonstration, is take one example out of the Matplotlib gallery and try to adapt it a little bit and make sense of it. This is how I often start: I go into the gallery. Let's take this one here as an example. It shows some scores for men and women; I don't really know what they are about. The first thing I will do is copy that into my notebook and try to run it, and make it more readable. So this is how I
often start: copy, paste, and I run it with Shift+Enter. I'm pretty happy here because I get the same result, and that's already a really good step. Now the second step: before I go into any details I try to make sense of the data, and the data seems to be here. So there are labels. Let's simplify it a little bit; maybe I just want to have two values plotted. There are five things in a list and I see five bars, so what happens if I remove three of them? Probably it will still work.
So what are these data types that you have there?
These are lists, Python lists: a list of integers here, and here there is a list of strings. So these are not NumPy arrays, although one could also use NumPy arrays. I run this again and I now have two columns. Probably I could also modify the width. Now let's have a little bit more fun. I will call these day one and day two, and I see that the labels changed. Instead of scores (where is scores? it's probably that one) I want, I don't know, number of viewers. Instead of gender let's look at numbers by tool, and instead of men and women let's imagine we are interested in how many people watch on Twitch versus, later, I don't know, YouTube. I rerun and it's still kind of working. Now I can imagine that these are the bars and these will be the kind of error bars, and I don't know how many viewers there were yesterday. It's also a bit confusing now because this is no longer men and women, so we could rename these. We have a number for today: currently we have 195 viewers on Twitch, so let's call this twitch, with 195. I change this and run it, and it will not work because... oh, how come it works? I don't understand why this code is working, because I was expecting that I would also need to change this.
But you had these arrays in memory, perhaps.
Yeah, all right, good point. Let me go back, because one thing I
should really do, since you mentioned it, and that's a very good thing; let me show it before I finish. One thing I should really do before I save the notebook, because I think everything is good here and it works, is to rerun all cells. You can even restart the kernel and rerun all cells. Let's try that, because then it should really break: it will restart the whole thing and run it from top to bottom. Let's see: restart, yes, and now we get the error. Very good. Now it just doesn't find men_means; it is undefined because I changed that. So that was a really good demo effect. Before saving the output, before sharing with other people, it is really good to restart and rerun all cells from top to bottom, because this is the first thing that the next person will do. Good. I will not go into more detail here, but back to the lesson, just to summarize; we have about three minutes left. I find this really useful to do: take an example that is close to what you work on right now and go through it. You can also try any of these other libraries; they all work great with Jupyter. Some of them are not part of Anaconda, so for some you need an extra installation step. Also have a look at Seaborn. Just a quick peek at the gallery here: it is also a very nice library, which builds on top of Matplotlib. For Seaborn we do have an example exploration that we can open in the lesson, so you can go through that, and it will also work in your Anaconda. So that's what I often do: I take something existing, I tweak the data, and if it looks somehow all right I improve the looks until it's ready to publish. So let's summarize the session. It was very quick and we could only give you some starting points; hopefully it was useful. Some points that I would like to repeat: automation is our friend. There will be a day when we are really happy that we have everything in a notebook and don't have to redo all the figures by hand, and all of them will regenerate in two
minutes. As Johan mentioned, keep the data, the thinking process, the plotting, and the figures all in one place if you can. Sometimes you cannot: one example is when the data is sensitive, then it has to be in a different place, or if the data is gigantic, then it also needs to be in a different place, but then we can fetch it with pandas, for instance. On Thursday we will take this a step further, because we will show how to create a Binder instance from our notebook. Then we can share the visualization with others, and they can reuse it and reproduce it, and they don't even need to have Jupyter and Matplotlib on their computer. All they need is a browser. So this will be very nice, and we come back to it on Thursday. What did I forget to say?
I could perhaps highlight one thing: the color scales. As you can read down here, an important aspect is that some people are color blind, and it's a very good thing to have a color scale that also works if you project it to a black-and-white scale. This is also important for printing, because sometimes you print things in black and white, and then it's good if the color scale works from the beginning.
Yes, and further up in the lesson we have some links to resources that give you good color palettes which are adapted to these different color vision deficiencies. Great point. Thanks so much for watching, for listening, and for the questions. We will catch up with the questions, and we will hand over to Simo.
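As a small illustration of the remark on color scales, the sketch below colours points by a value using Matplotlib's built-in `cividis` colormap, which was designed with color vision deficiencies in mind and whose lightness changes steadily from one end to the other, so it remains usable in greyscale. The data is made up, and the choice of `cividis` is only one option (`viridis` is another built-in map with similar properties); the resources linked in the lesson may recommend different palettes.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: points along a curve, coloured by a third value
x = np.linspace(0.0, 1.0, 50)
y = x ** 2
values = x

fig, ax = plt.subplots()

# c gives the value per point, cmap selects the color scale
points = ax.scatter(x, y, c=values, cmap="cividis")

# A colorbar documents how colors map to values
fig.colorbar(points, ax=ax, label="value")

ax.set_xlabel("x")
ax.set_ylabel("y")
```

A quick manual check is to print the figure, or convert the saved image, in black and white and confirm that low and high values can still be told apart.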