So, good afternoon, everyone. My name is Jerry Ryder and it's my pleasure to host this webinar today. Now, on to our speaker. For those who weren't with us for the previous webinar in this series, our speaker is Martin Schweitzer, who is a data technologist with ANZ in our Melbourne office. Martin has a background in computer science and a particular interest in visualisation, data science and user interface design. He has a varied professional background, which includes photography, working on large IT systems, lecturing, as well as running workshops and training courses. Martin is currently seconded to ANZ from the Bureau of Meteorology, where he is largely responsible for the climate record of Australia. Today, Martin is presenting the second in a series of two webinars on data visualisation, and today's focus will be on tools and techniques. So, without any further ado, I'll hand over to you, Martin.

Thanks, Jerry. And thanks, Susanna, who's behind the controls. I hope everybody can see my screen. So, today we're going to look at creating visualisations, and pretty much everything you see is going to be live. I'm using a tool called Jupyter Notebook. You don't have to be familiar with this tool to follow along. I'll also be using Python for my examples, but if you don't know Python, once again, that shouldn't be a problem, because most of the tools and techniques that I'll be showing you are available in other languages, for example R and other languages like that. So, I'll be going through a number of libraries, showing generally the strengths of each library, where they can be used and how they can be used. And as we progress, we'll move from more static to more web-based environments. Jupyter Notebook runs in a web browser, and what you see here is my web browser. I'll just maximise it now that we know it's a web browser. What it allows us to do is to type in Python code and then execute it immediately.
And this is great for anybody doing research, because a lot of the work is in experimenting: you try something, you adjust a few parameters, and so on. The first two lines I've got here are just to set up our environment. Often we get error messages showing, so this will just hide them. Some of the libraries we'll be exploring today: the first one is matplotlib. We'll be looking at pandas, one called Seaborn, two web-based ones called Bokeh and Plotly, and the last one is one that's used for mapping, called Basemap. As I get to each one, we'll talk about it in detail. Now, if anybody missed the first talk, that's not a problem, because I'll explain things as I go along. But a number of these examples show how we created the plots that you may have seen in the first talk. Also, one of the things I said in the first talk was that when doing any sort of visualisation, it's important to have some sort of story, some reason, something we're trying to say with that visualisation. So the first visualisation I'm going to show is actually based on a problem that I came across just while browsing the web. And I'll explain the problem. Basically, we have a room. There are 50 people in the room and each person starts with $100. Each time the clock ticks, each person picks a random card between 1 and 50. Let's say their card says 26: they'll give $1 to person 26, if they've got some money in their hand. If they've got no money at that point in time, they don't give anything. So the question was, after a number of ticks, let's say a thousand, how will the money be distributed? Will it be fairly evenly distributed among the people? Will some people have a lot of money and some people a little? And so on. I found this quite interesting, so I wrote a small simulation. The first part of this code is the simulation, so all that is doing is simulating what happens. And I'm now using the library called matplotlib.
Matplotlib is, as much as you can say there is one, the standard plotting library in Python. And in matplotlib, to plot the results that I got requires one line. So to run this, I'm going to press Control-Return. And if everything works as I hope it will, we get a plot. So this one line of code gave me a plot. It realised that there were 50 elements and that the values inside those elements were between 0 and 350. However, this plot doesn't really give us a picture of what's happened. So what I'm going to do is sort it, so that the people with the least money come up first and the people with the most money come last. As I said, this is an interactive environment, so all I have to do is make that change and press Shift-Enter again. It runs it a thousand times, and now I get a very different plot. There we can see that a lot of people have below $50, most of them have below $100, and a very few people have between $250 and $350. So one of the things you will note, and I haven't done this yet: I'll just change this once again, and now what I'm going to do is save the plot as an SVG. SVG is Scalable Vector Graphics. I'll run the plot again, it will save it, and I can now open it in a web page. So we'll just open that web page here, and what we see is we've got a nice plot. The other thing that's special about Scalable Vector Graphics is that it's scalable. If I make it bigger, because it's a vector, we don't get any artifacts. It just scales: it gets bigger, it gets smaller, and we don't lose any quality. The next thing is, we can look at this and say, well, this sort of looks exponential. How well does it fit an exponential curve? Once again, matplotlib allows us to do this. We're not going to rerun the simulation; we'll just use the data from the last simulation. And when we run it, what we see is we get a nice little...
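The simulation and the sorted bar chart described above can be sketched in a few lines. This is a minimal reconstruction, not Martin's actual notebook code; the rule set (give $1 to a random person each tick, only if you have money) is taken from his description, and names like `N_PEOPLE` are my own:

```python
import random
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# 50 people, $100 each; every tick, each person with money gives $1
# to a randomly chosen person.
N_PEOPLE, START, TICKS = 50, 100, 1000
random.seed(42)
money = [START] * N_PEOPLE

for _ in range(TICKS):
    for giver in range(N_PEOPLE):
        if money[giver] > 0:
            receiver = random.randrange(N_PEOPLE)
            money[giver] -= 1
            money[receiver] += 1

# Sorting before plotting gives the much clearer "ramp" picture.
plt.bar(range(N_PEOPLE), sorted(money))
plt.ylabel("dollars")
plt.savefig("wealth.svg")  # SVG output scales without artifacts
```

Because every transfer moves exactly $1 from one person to another, the total is conserved, which is a handy sanity check on the simulation.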
So what I've done is asked it to add a polynomial of third degree that fits those points. And I've just noticed that I haven't reset something from the last time I ran it, so I'm quickly going to restart this. I'm going to have to rerun the simulation, unfortunately. And when we do this again, okay, we get what I expected: an orange line. So initially, what we see is this thin orange line, which nicely fits the points. By changing the plot, we can set things like the line width equal to five and run it again. Now we get a thick orange line, which is much easier to see, but it's crossing over the points. So the next thing we can do is set alpha, which is the transparency, equal to 0.5, which basically says make it 50% transparent. And now we get a nice, thick, transparent line running through those points. So once again, matplotlib gives us the ability to customise this line. If we wanted x's instead of circles, we can change the plotting of the points to x's. I'll run again and we get x's. So one of the characteristics that we often look for in a library is this idea that simple things should be simple and complex things should be possible. In other words, we don't want a very long learning curve, or to have to do a lot of work, to get a simple graph. But if we do want something special, we want to be able to do it. We don't want a tool that's really simple to use, but as soon as we want something a little bit out of the ordinary, suddenly the tool stops working. So we'll look at a few more examples of matplotlib. This one actually comes from the documentation, and the easiest thing is to show it. It plots a polar graph using different colours. In this case, we've generated some random numbers and made them the size of each circle, generated a random number for where around the circle it's plotted, and then, depending on the angle, we're plotting the points in a different colour.
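The fit-and-style sequence above can be sketched as follows. The data here is a hypothetical stand-in for the sorted simulation results (a noisy exponential-ish ramp), since the real values come from the live demo:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# Stand-in for the sorted simulation results: a roughly exponential ramp.
rng = np.random.default_rng(0)
x = np.arange(50)
y = np.sort(rng.exponential(scale=100, size=50))

# Fit a third-degree polynomial, as in the demo, then evaluate it at each x.
coeffs = np.polyfit(x, y, deg=3)
fitted = np.polyval(coeffs, x)

plt.plot(x, y, "x")                           # 'x' markers instead of circles
plt.plot(x, fitted, linewidth=5, alpha=0.5)   # thick, 50%-transparent line
plt.savefig("fit.png")
```

`linewidth` and `alpha` are exactly the keyword arguments mentioned in the talk; a degree-3 fit returns four coefficients.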
Once again, the code that's doing the plotting is those two lines. Once we've generated the data, all we need are two lines of code, and it gives us a really nice polar plot. And here's another example, once again also from the documentation. With this one, we're going to display it interactively in Jupyter Notebook, but we're also going to save it, this time as a PDF. So let's run this. And there we get four histograms. It's the same data each time, but what it's demonstrating is different ways in which we can create histograms: we can have stacked, we can have unfilled, we can have bars with legends, et cetera. And if we have a look at the PDF that we've generated, there we go. Once again, because it's a PDF, it's scalable, and as we scale, we don't lose any quality; it's all done with vectors and gives us really nice output. Where that's useful is if we're doing any kind of publication: it's really nice to be able to save our output either as SVG or PDF and include that in the publication. This is one more showing the range of plots we can do, and this one is called a hexbin plot. So it's plotting with hexagons. It's kind of a cross between a scatter plot, which plots x values and y values on a graph, where, say, the x values may be how much somebody has eaten and the y values may be something like their weight, and we want to plot those two against each other. But what this is also doing is plotting how frequently those values occur. And one of the nice things is that it's really easy to do log scales. What we see here is a nice graph showing us that the white area in the centre is values that occur very often, and as we move out, values occur less and less often. For the following one, what we're going to do is quickly look at a comic.
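A hexbin plot with log-scaled counts, as described, might look like this. The data is synthetic (two correlated normal variables, my choice for illustration), and the PDF save mirrors the vector-output point made above:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# Two correlated variables; hexbin shows how often each (x, y) region occurs.
rng = np.random.default_rng(1)
x = rng.standard_normal(10_000)
y = x + rng.standard_normal(10_000)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=30, bins="log")  # log-scaled counts
fig.colorbar(hb, ax=ax, label="log10(count)")
fig.savefig("hexbin.pdf")  # PDF is vector-based, so it scales cleanly
```

`bins="log"` is matplotlib's built-in way of getting the log colour scale mentioned in the talk.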
Some people may be familiar with the web comic called XKCD. Its author, Randall Munroe, is a very funny person, but with quite a scientific bent and also quite strong computer skills. This one is called Stove Ownership, and it shows his health before he realised he could cook bacon whenever he wanted, and afterwards. The thing about this graph is that it's hand drawn. And while sometimes we want graphs that look very polished and professional, there's often a perception when people see a graph like that, that the figures are very accurate. And this isn't always the case. So what people did was create a style using matplotlib that recreates the look and feel of XKCD. And this one is quite long because there's quite a lot to it. In fact, they've taken two of his comics, and I'll quickly plot them using matplotlib. What we see is a computer-generated reproduction of Randall Munroe's graph. This is a histogram done in very much the same style, which copies another one of his comics. So, taking that style, I will replot my simulation. You'll remember the results from my simulation. If we run that again, we see we get this. And now that it doesn't look slick and professional, we see that really this is very much a simulation, that these figures aren't accurate. So what this does do for us is give us an idea of the flexibility of matplotlib. I'll quickly restart the kernel before going into our next library. The next library is called pandas. Pandas is a very useful library for anybody who's working with spreadsheets, who's working with CSV files, or who's working with data that's coming from an API across the web. And it also has its own plotting routines built in. So in this code here, the first line, which actually goes over three lines, is reading a file of dam storage levels. It's a CSV file. Anybody who watched the last presentation will be familiar with it, as I showed some examples.
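The hand-drawn XKCD style is built into matplotlib itself, via `plt.xkcd()`. A minimal sketch (the data is made up for illustration; matplotlib may warn that the Humor Sans font is missing, but the plot still renders):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# plt.xkcd() is a context manager that applies the sketchy, hand-drawn style
# to everything plotted inside the with-block.
with plt.xkcd():
    fig, ax = plt.subplots()
    ax.bar(range(5), [3, 7, 2, 5, 9])  # hypothetical figures
    ax.set_title("Clearly not precise figures")
    fig.savefig("sketchy.png")
```

Using it as a context manager keeps the style from leaking into later plots, which matches the "restart the kernel" hygiene mentioned above.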
You'll see the same examples again today. So the first line reads the file, the second line plots it using pandas plotting, and the third line just sets the label on the plot. So we'll run this code. And there we get it. These are Melbourne dams, and this is showing that the Thomson is about 68% full and things like Tarago are 95% full. So what we've done is, in one line, we've read that CSV file and declared what we want to call the columns. One of the columns is called name, and one of the columns is called p_full, for percentage full. And when we plot it, because pandas knows about this thing called df, all we have to say is: I want to plot the name against the percentage full, and I want a bar chart. And I've also said I want to plot from the value 60 to 100. If I leave out those values and plot the same graph, it will plot from 0 to 100 by default. So I'll just run this one. Part of the point was to show that even though we've got the same figures in the same graph, it looks different when we start our scale from 0. So once again, the Thomson is about 68% full and Tarago is almost 100% full. The take-home point here is that to create that plot took two lines of code. The third thing that we showed last time, and what's really interesting, is the gap in volume of these different dams, and that gives us a much better picture of what's happening. When we run that, what we see is that because the Thomson dam is a really big dam, it's got over 200,000 megalitres of water deficit. So even though these dams on the right are almost full, altogether they don't even make up that one dam's deficit. So that's matplotlib. Its strength is that it pretty much comes standard with Python, and it's flexible. However, that flexibility often comes at the cost that it's not the best publication-ready graphing tool out of the box. You can get very nice publication-ready graphs by doing a bit of work.
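The pandas read-and-plot pattern described here can be sketched like this. The CSV content and the column names (`name`, `p_full`) are assumptions standing in for the real dam-storage file used in the demo:

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend for the pandas plot call
import pandas as pd

# Stand-in for the dam-storage CSV from the talk (values approximate).
csv_text = """name,p_full
Thomson,68
Tarago,95
Silvan,90
"""
df = pd.read_csv(io.StringIO(csv_text))

# One line to plot name against percentage full as a bar chart;
# ylim controls whether the scale starts at 0 or at 60.
ax = df.plot(x="name", y="p_full", kind="bar", ylim=(0, 100), legend=False)
ax.set_ylabel("% full")
```

Changing `ylim=(0, 100)` to `ylim=(60, 100)` reproduces the "same figures, different impression" effect Martin demonstrates.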
But what some people have done is do that work, to make it easier for people to create better graphs. One of those libraries is called Seaborn, and Seaborn basically sits on top of matplotlib and simply adds some nice styles. So we'll re-plot that same plot, this time using Seaborn. All that we're doing here is importing it and initialising it; we've just added those two lines, and everything else is exactly as in the last example. We run this and we see a, well, similar graph but with different styling. What Seaborn has done is make it quite easy to change the styling. So I'll set the style to white and run it again, and we see we get a nice clean graph. In the next example, we'll set the style again, but we want a white grid and a muted palette. We run that one and we get the white grid with a muted palette. The next one shows one thing that Seaborn does, which a few packages are starting to do: it actually includes its own data sets when you install the package. That's really great for when you're learning, because one of the worst things is you pick up a package and try to learn it, but the first thing you've got to do is find some data to plot, and so on. One of the data sets that Seaborn comes with is called flights. And I really enjoy heat maps, so this is just an example of a really simple heat map using Seaborn and some of the data that's provided. What we see over here: going along the bottom are years, and up the y-axis are months. So around about July 1960 there were lots of flights; in the earlier years there were fewer flights, and during winter there were fewer flights than in summer. Once again, this was done with Seaborn, and really with two lines of code, which are those lines. Another data set that comes with Seaborn is called tips, and it's basically how much people tipped at restaurants.
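The styling calls and the heat map might look like the sketch below. Note that `sns.load_dataset("flights")` fetches data over the network, so this sketch builds a small stand-in table locally instead; the shape (months down, years across) matches the plot described:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# The two styling lines mentioned in the talk.
sns.set_style("whitegrid")
sns.set_palette("muted")

# Local stand-in for the flights dataset: passenger counts by month and year.
rng = np.random.default_rng(2)
data = pd.DataFrame(
    rng.integers(100, 600, size=(12, 5)),
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
    columns=range(1955, 1960),
)

# One call gives the heat map; the second line just saves it.
ax = sns.heatmap(data, cmap="YlOrRd")
plt.savefig("heatmap.png")
```

With the real dataset, `flights.pivot(index="month", columns="year", values="passengers")` produces the same month-by-year grid.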
So the first thing we'll do is load the data set and have a look at its first 10 rows. What we see is we've got a few columns: the amount of the total bill, what tip was left, the sex of the person, whether or not they were a smoker, what day of the week it was, whether it was lunch or dinner, and the size of the party. So we're going to use that data set and have a look at a few Seaborn graphs. The first one we're going to look at is a box plot. What we've done is we've said we want the hue to be whether they were a smoker or not. So the purplish colour means they were smokers; the greenish colour means they weren't. On the left-hand side, we've got the size of the total bill, and across the bottom is the day. So it does seem that on Sundays maybe people tip more, and it would look like on Sundays, for whatever reason, smokers tip more than non-smokers. Another plot that is often used in similar ways to the box plot, but encodes a bit more information, is what's known as a violin plot. These are quite easy in Seaborn. In this case, what we've done is use a different hue for male and female. You basically read this pretty much the same as the box plot: there's the median, there's the top quartile, the bottom quartile, et cetera. And some of the information is very much the same: on Sundays people seem to tip the most. And we can see it split, this time into male and female. Those who were at the last one will remember I demonstrated something called Anscombe's Quartet. It's four data sets, each with the same means and linear regression lines, but each data set looks very different. Here's a very simple example of it being done in Seaborn. So we'll just have a quick look at that, and we see it was quite easy. In this case, we're sharing the y-axis, and across the bottom we're sharing the x-axis of the two plots.
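The box plot and violin plot with a `hue` split can be sketched as below. Again, `sns.load_dataset("tips")` needs network access, so this uses a small synthetic frame with the same column names as the real tips dataset:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the tips dataset (column names match the real one).
rng = np.random.default_rng(3)
n = 200
tips = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], n),
    "smoker": rng.choice(["Yes", "No"], n),
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
# hue= splits each day's distribution by smoker status.
sns.boxplot(data=tips, x="day", y="total_bill", hue="smoker", ax=axes[0])
# split=True draws the two hue levels as the two halves of each violin.
sns.violinplot(data=tips, x="day", y="total_bill", hue="smoker",
               split=True, ax=axes[1])
fig.savefig("tips.png")
```

The violin's `split=True` is what lets it encode two distributions per day in the space of one, which is the "a bit more information" point made above.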
And all of this was done in a very, very compact way using Seaborn. The next thing we're going to look at is plotting data on maps, and this goes back to a lot of what I do in my substantive job at the Bureau. The library that we use for a lot of our mapping is, once again, fairly standard with Python. It's called Basemap. In the first one, we're not actually plotting anything; we're just simply drawing a map. So what we should see now, and it takes a little while the first time we run it, is that we've plotted a map of the world in a few lines of code: pretty much from there to there. What we're really interested in at this stage is Australia, and this projection isn't as useful as what we're going to look at now, which is a sort of Mercator. So we'll just change some of the parameters. This should give us a map of Australia, which is great. It looks a bit like the ones I draw by hand. What I'm going to demonstrate now is some more visualisation, but it goes back to a problem I was given about a year or two ago. We have about 112 reference stations around Australia. These are stations with very high quality data that have a long record, about 50 years or longer, and they are very important as reference stations for seeing what's happening with the climate of Australia. One of the outcomes of this reference station set, which is called ACORN-SAT, is that we do a publication where we publish the names of each station. And one of the things we also publish is, for each station, which are the three closest stations to that station. So I wrote some code that worked out what the three closest stations were to each station. This was the file I was given. Once again, I'm using pandas to read it. So we've got, for example, the latitude, the longitude, the altitude, and the date each station was opened. As we can see, these all have a very long record. So the first thing, let's plot these. Using matplotlib, the first part, which we've already seen, draws the map.
This line, after having read the file, plots the data on the map. So we'll just quickly plot those stations. The black dots, of course, are the stations, and there are 112 of them around Australia. So, after producing the list saying, okay, for each station, these are the three closest stations to that particular station, these people, being scientists, asked an interesting question. They said, by going to one of the three closest stations from each station, is it possible to get from any station to any other station? Now, it may seem that the obvious answer is yes. But the thing is, if I'm sitting here, these may be my three closest stations. That does not mean that the station where I'm sitting is going to be one of the three closest stations to this other station, because that station's three closest neighbours may be these three stations. So the first thing I did, because I'm fairly visual, was to try and visualise it. What I did was go back to a very old package, which is about 30 years old, and which I first used probably more than 20 years ago, called Graphviz. Python includes bindings for Graphviz. So we can think of it as: each station is a node, and we've got lines connecting it to the three closest stations. So I've written something that will visualise that. We'll just run this code, and it creates a PDF. In this PDF, I've simply used the station numbers to save space. We can see the layout of all the stations and move across here. One of the things that we see, for example, is that station 7045, even though it's got three stations that are closest to it, there's no station for which 7045 is one of the closest stations. And we can see that in a few other places as well. So I think over here, we've only got one line going from 859.6 to 1293. If anybody wants to guess, this part is, in fact, the stations that are in Tasmania.
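The question underneath this visualisation, whether every station can reach every other by hopping through three-closest-neighbour links, is a directed-graph reachability problem. A self-contained sketch, using a handful of hypothetical station coordinates rather than the real 112-station ACORN-SAT list:

```python
import math
from collections import deque

# Hypothetical station coordinates (lat, lon); the real set has 112 stations.
stations = {
    "A": (-37.8, 145.0), "B": (-33.9, 151.2), "C": (-27.5, 153.0),
    "D": (-34.9, 138.6), "E": (-31.9, 115.9), "F": (-42.9, 147.3),
}

def dist(p, q):
    # Planar approximation; real work would use great-circle distance.
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Directed graph: each station points at its three nearest neighbours.
edges = {
    s: sorted((t for t in stations if t != s),
              key=lambda t: dist(stations[s], stations[t]))[:3]
    for s in stations
}

def reachable(start):
    # Breadth-first search following closest-three links.
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Can we get from every station to every other via closest-three hops?
fully_connected = all(len(reachable(s)) == len(stations) for s in stations)
```

The directedness is the whole subtlety: A pointing at B does not imply B points at A, which is why the answer for the real network turned out to be no.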
And if we go back to our graph map, we can see how these are all close together: that station is close to that one, but these are all closer to each other than to the mainland ones. So basically, that graph helps us visualise it. And yes, it turns out, after writing some code, that there is no single path. So the next question, once again these people being scientists, is: where would we have to add stations so that there's always a way to get to another station by visiting one of the three closest? So I came up with a new visualisation, and it's called a Voronoi plot. I'll run this code. What a Voronoi plot does is not easy to show here, so I'll show it in a web page that I did. On this map, you see the ACORN-SAT stations, and you see all these polygons. What these polygons are: every point inside this polygon, for example, is closer to this station than to any other station outside the polygon. So any point inside this polygon, this point, for example, is closer to this station than it is to any of the surrounding stations. Basically, it divides the territory up into areas. In a way, it's saying, okay, the temperature here, we could argue, is most influenced by this station. So if we've got a temperature here and want to check it for accuracy or whatever, we're more likely to look here than at one of these other stations. But what does this have to do with where we build a site? Well, if we consider this line, then any point along it is equally far from this station and that station, and similarly any point on this line is equally far from that pair of stations. So any point where these edges meet, one of these vertices, is a point that's locally furthest from all the adjoining stations: this point is furthest from that one, that one, and that one.
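The Voronoi construction described here is available directly in SciPy. A minimal sketch with random stand-in points (the real input would be the 112 station locations):

```python
import numpy as np
from scipy.spatial import Voronoi

# Hypothetical station locations; real work would use the ACORN-SAT list.
rng = np.random.default_rng(4)
points = rng.uniform(0, 10, size=(20, 2))

vor = Voronoi(points)

# vor.vertices are the polygon corners: each is equidistant from three or
# more stations and locally furthest from all of them, i.e. exactly the
# candidate sites for a new station that the talk describes.
print(len(vor.vertices), "candidate vertices")
```

`scipy.spatial.voronoi_plot_2d(vor)` will draw the diagram onto a matplotlib axes if you want the visual as well.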
So what it comes down to is, if we're going to build a new station, we want it on one of these points, on one of these vertices. So it's just another example of how we can use visualisations to solve some real problems. For the moment, that's all we're going to do with maps, though we may return to them soon. The next library we're going to have a quick look at is called Bokeh. Bokeh is the first library we'll see that works with Python but whose output is targeted generally at web pages. So once again, you'll remember Anscombe's Quartet from a previous slide. We'll do it in Bokeh, and it's given us a really nice graph of Anscombe's Quartet. And if one sees some of the original drawings of it, for example in Tufte's book, this is very close to the original. It required some work in Bokeh to make it look similar, but it was flexible enough that we could. I'll quickly show another one, using another famous machine learning data set: irises. What we're plotting here is the petal width of different species against the petal length. We see that some species are down here, some species, the green ones, are up here, and so on. The thing about Bokeh is that it allows us some interactivity, so we can do things like zoom, and we can also pan. And if we put the output on a web page, the web page can have these same tools. There's a wheel zoom, and we can go back to what it looked like initially. So that's Bokeh. There's one more that also came from that last one, called joy plots. The thing about these is we're plotting a whole lot of variables against a common set of axes. I'll skip over it for the moment, because I want to look at a few tools that are useful in web development. So we're leaving Python for a moment. The first one is something I wrote a few years ago. This is using Google Maps, and I'm putting some data on it. These are the ACORN-SAT stations once again.
And when we click on one, we get a graph of the climatology, the average monthly temperature. So let's get Melbourne. We're now in April, so the average maximum temperature for Melbourne is normally 21 degrees. This is the average rainfall for Melbourne, around 50 millimetres. We can also get time series, and we can zoom in on the time series. This graph and the time series were done using a tool called Highcharts. Highcharts is available free for non-commercial use, but it does require licences for any kind of commercial use, and government use is also considered commercial. Having said that, if you are doing web pages and you are looking for a plotting package, it's worth considering Highcharts. The next example is another mapping library. This one is Leaflet. In this case, this is something I did for work. The data we're plotting here is coming from NetCDF files; some people will be familiar with NetCDF and have used it, and the data is coming straight out of these NetCDF files. The main purpose of this slide is to show this library, Leaflet, which basically allows us to put data on top of maps. In this case, it's gridded data, but we've also got some GeoJSON, and we could be putting shapefiles and other things, like utility boundaries, which you can overlay on the map. It basically allows us to overlay data on top of maps. The third example I'll show is one called OpenLayers. This was one of the more complex visualisations I did. Basically, what this one is demonstrating is east coast lows off the eastern seaboard of Australia. All of this was overlaid on the map, and the map was done with OpenLayers. I think that's all I'll say about it. Finally, what we'll do is look at one more library and one more example. The library we're going to look at is called Vega. And once again, it's another simulation. I came across this thing called Parrondo's paradox, and for me, it was quite mind-blowing.
So I just had to do a visualisation to make sure that I understood it and that it worked. So basically, I'll try and explain it quickly. You've got three games you can play, and each of them involves a coin being spun. In game one, the coin is more likely to land on tails, and each time, you bet on heads. In other words, it's a losing strategy. So that's game one. In game two, we've got coin two, which most often lands on heads, but we don't bet on heads all the time. Sometimes we bet on heads, sometimes we bet on tails; most of the time we bet on heads, but every so often we bet on tails. And it can be shown, once again, that that's a losing strategy. In game three, what we do is randomly decide to either play game one or game two. So, in game one we definitely lose, and in game two we definitely lose; we would think that by choosing randomly between game one and game two, we should also lose. For this one, I've used this library called Vega. And I think the first thing I'll do is just run it. So I play this game 10,000 times: I play game one and plot the results, I play game two 10,000 times and plot the results, and then I do the third, where I randomly choose between game one and game two, and plot the results. We run the simulation. The first plot is when I play game one, and we can see I started off with $0 and ended up with minus $100. When I played game two, which was also a losing strategy, I did actually quite badly: I ended up with minus $250. But when I alternated randomly between the two games, I ended up in the black, with plus $150. These slides, or rather this Python notebook, will be available after the talk. You're welcome to have a look at it and find the mathematical explanation of why it works, or you can also just Google Parrondo's paradox. So, what have we found out?
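A self-contained simulation of the paradox can be sketched as follows. Note this uses the classic capital-dependent formulation of Parrondo's games (game B's coin depends on whether your capital is a multiple of 3), which differs slightly from the two-coin description above but exhibits the same paradox; the bias `EPS` and the structure are the textbook version, not Martin's notebook code:

```python
import random

EPS = 0.005  # small bias that makes each game losing on its own

def play_a(capital, rng):
    # Game A: a coin slightly biased against us.
    return capital + (1 if rng.random() < 0.5 - EPS else -1)

def play_b(capital, rng):
    # Game B: which coin we spin depends on whether capital is a multiple of 3.
    p = (0.1 - EPS) if capital % 3 == 0 else (0.75 - EPS)
    return capital + (1 if rng.random() < p else -1)

def simulate(strategy, rounds=10_000, seed=0):
    rng = random.Random(seed)
    capital, history = 0, []
    for _ in range(rounds):
        if strategy == "A":
            capital = play_a(capital, rng)
        elif strategy == "B":
            capital = play_b(capital, rng)
        else:  # "AB": pick one of the two games at random each round
            capital = (play_a(capital, rng) if rng.random() < 0.5
                       else play_b(capital, rng))
        history.append(capital)
    return history

for s in ("A", "B", "AB"):
    print(s, simulate(s)[-1])
```

Played long enough, A and B each drift downward while the random mixture "AB" tends to drift upward, which is the counter-intuitive result the talk plots.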
Well, I guess one of the questions is: if I want to do visualisations, what's a good tool? So in brief, matplotlib is a good one to start with. Easy things are easy, and flexible things are possible. It can do dozens of different visualisations, and it's very good for static plots, in other words, if you're going to publish your results in a book or the like. It also integrates well with Python's maths and science toolkits: if you're familiar with Python, it understands things like NumPy and SciPy, and they're all tightly integrated. Seaborn makes it easier to do, let's say, publication-ready plots with matplotlib. Bokeh has very nice output and targets web pages. It's got a slightly easier learning curve than matplotlib, and it looks good out of the box. Plotly, one notable thing is that it's based on a commercial package, and there are both commercial and non-commercial versions of it available. It leverages D3 for graphics; D3 is a fantastic JavaScript graphics library that, unfortunately, the talk didn't give us time for. And because of that, the interaction is more extensive than Bokeh, and so is the range of things it can do. One thing I didn't mention about pdvega, or Vega, is that it's got an interesting way of working, in that it defines a language for describing the graph, and then displays it. But when you create a graph with Vega, that graph includes all the data that was used to create it. So if you're interested in making your publications and your data available: it's one thing to see a graph in a paper and say, okay, how do I reproduce this graph? It's another thing to say, okay, this is the graph, and this is all the data that created this graph. So it's really worth considering if it's important to you to publish the data with the graphs. Basemap sits on top of matplotlib. It can be a bit clunky, but it does the job.
Cartopy is still, I don't think, 100% production ready, but it improves on Basemap, makes it easier to use, and has some great features. Then I'll quickly go through Leaflet. Its advantages: it's lightweight, it's quick to learn and use, and it supports many formats, most particularly WMS and GeoJSON. OpenLayers is more featureful than Leaflet. It used to have a steeper learning curve than Leaflet, but modern versions are actually much easier; they've made the learning curve less steep. I didn't get a chance to demonstrate Cesium, but it can utilise the built-in 3D capabilities of browsers, and it works just out of the box: you can install it and immediately you've got a map up and running. I installed it recently just to try it out, and about an hour later decided to download some earthquake data from the United States Geological Survey, and within about 15 minutes I was displaying that data on my map. So it makes it really easy. What are my recommendations? If you work with Python and you're not interested in learning a lot of programming and getting deeply into it, but you do need to work with data and you're doing research, I recommend: learn pandas, use pandas plotting for static plots, and use Vega for the web. Thanks very much.

Well, thank you so much, Martin, for such an informative and practical presentation, and bravely, with so many live demos, which we rarely see. So thank you for that. Now, we do have time for questions. So if people have questions or comments, please put them into the question pod. Now's your chance, with Martin online, to ask any specific questions about packages or just some of the things that you've seen today. So please do ask away. We have got time for a few questions. So, Martin, we do have a question from Marlon: what's your opinion on tools like Tableau? Two people have asked about that one. So, Tableau is what's known as a BI tool, a business intelligence tool.
It's used in the Bureau. It's a commercial tool; I think I'm correct in saying that it's only commercial, though there may be demo versions available. From everything I understand, it does what it's designed to do extremely well, and so it's very good at building dashboards. I think it often assumes that there's going to be a data warehouse available, or at least a data mart. I know with previous versions, where it was used, there were some issues with creating websites that were being presented to the public. This was because it wasn't WCAG compliant; WCAG is the Web Content Accessibility Guidelines, and for government work, websites need to be WCAG compliant. It had some mapping features, but the maps only allowed single layers, which would have made something like what I demonstrated with the rainfall maps very tricky, because we sometimes had up to five or six different layers on those maps. So I wouldn't want to either recommend or dismiss any packages, but I think, from everything I understand, and I'm not a regular Tableau user, it works well for its design purpose. And one of the areas where I know people have really enjoyed using it is where they've wanted management-type dashboards on their desktop, to be able to monitor whatever it was that they were monitoring. Yep, thanks, Martin. And John has popped into the question pod that there is a free public version of Tableau, Tableau. I won't get my mouth around that. So if people are interested, they could go and check that out for themselves. Collins has asked Martin why Python and not Ruby, and he also has asked if MetVib or R make the grade. Okay, the reason it's Python and not Ruby is because I know Python and I don't know Ruby particularly well. When Ruby came out, I started learning it and then other things got in the way. I don't think there's any good reason why not Ruby, but I can't talk with authority on it. 
I think one of the things is that, with data science, Python and R really seem to have taken a lot of that mind share. Between Python and R, I wouldn't choose; it's six of one, half a dozen of the other. There are a lot of people using R, and there are a lot of domains where people really love R; bioinformatics is one where it's very common. And as I said in the beginning, most of these visualizations are available in almost any language that people look at, or any popular language. When people come up with a library like Plotly, other people will create bindings for different languages. Thanks, Martin. Another question: do you use other mapping tools, like ArcMap? That's another question from Marlon. Well, at the Bureau, Esri products are very popular. I personally don't use ArcMap, probably just because of the nature of the work that I'm doing, and probably because of the current toolchain that we've got. I do use an open source product called QGIS occasionally, but even that I don't use often. So most of my work, at this time, is done using things like JavaScript, and so I just use the JavaScript libraries that are there. Okay. And a question from Susan, who's interested in an online tutorial for beginners in data visualization. So apart from recordings of your own webinars, Martin, is there anything that you could recommend to Susan? That might be one to take on notice. I think it is, and I'll definitely have a look, but there are a lot of MOOCs. So I might go to places like Udacity or edX; lately I've been noticing, particularly with the current level of interest in data science, that a lot of these places are offering courses. But yeah, certainly I'll have a look, and maybe we'll put something in one of our snippets about beginners' data visualization. Okay. Thank you, Martin. And that's probably a nice segue to plug our updated web page. 
Now, Martin has kindly spent some time updating the content of our web page on the ANZ website, and I'm just showing you the link here. So a lot of the tools and the libraries that Martin has spoken about in the webinars are available and described there. So please go and have a look at that. And also, of course, these webinar recordings will be made available. We have one last question, Martin, from Sophie: do you recommend Codecademy? I haven't used Codecademy. I've got an account, I know, because I keep getting emails from them. But I think there's a lot of good stuff available, so it's pretty much about trying things and finding something that suits you. So that's great timing for the end of our webinar today. Thank you all for coming along, and a big thanks to Martin for two fantastic webinars and presentations, and for making all the materials available through the presentations and through our updated web page. We look forward to seeing you at one of our future webinars. And in the meantime, have a great afternoon. Thank you very much.