My name is Peter Smyder. I work for the UK Data Service and I'm going to be presenting this webinar today. In this webinar about putting data on maps, we're going to look at what kind of data we want to put on the maps and what kind of maps we want to put the data on to. We'll look at some data formats, and then I'll give a very short demonstration using googleVis, which is an R package. After that we'll have some more extensive demonstrations using leaflet, which is another popular R package for visualising maps.

So, what kind of data? In order to create a map, we need two kinds of data and a way of associating them with each other. We need the data we want to show on the map, possibly the results of research, and we need to know how it relates to the map. That is, does the data refer to a single point on the map? Or does it refer to an area of the map, normally denoted by a polygon: some outline shape like this, or any other enclosed space?

So, let's look at points first. They're very easy to deal with. You just need to know the longitude and latitude of the point, or its position in some equivalent coordinate system. Here's an example of a little spreadsheet. In the left-hand column there are postcodes for parts of Aberdeen, and on the right-hand side there are two columns, one for latitude and one for longitude. This is sufficient information for me to use the longitude and latitude as a series of points and put any of the other data associated with that particular row onto the map.

A polygon is an ordered series of points, and you join the dots. In practice, you don't actually have to join the dots yourself, because the software packages you're using will do it for you. The number of points can be very large, depending on how accurately you want these outline shapes to be drawn. Each dot itself represents a longitude and latitude value, just like the points we've just seen.
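As a rough illustration of the spreadsheet described above, here is a minimal R sketch of a point data set. The postcodes and coordinates are made up (they are not real postcode lookups), and the visits column is a hypothetical example of the "other data" attached to each row.

```r
# Hypothetical point data: one row per postcode, with latitude and longitude
# columns. The values below are illustrative only, not real postcode lookups.
points <- data.frame(
  postcode = c("AB10 1AA", "AB10 1AB", "AB10 1AD"),
  lat      = c(57.146, 57.149, 57.147),
  long     = c(-2.096, -2.097, -2.098),
  stringsAsFactors = FALSE
)

# Any further column is data that can be shown at each point on the map
points$visits <- c(12, 7, 30)   # hypothetical research variable

# Basic sanity checks: latitude must lie in [-90, 90], longitude in [-180, 180]
stopifnot(all(abs(points$lat) <= 90), all(abs(points$long) <= 180))
print(points)
```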
The idea is that the first and last dots coincide, so when you join them up you create an enclosed polygon on the map. Of course, having got the polygon, you still need a way of associating some data with that particular polygon. It could be any kind of data, like the extra columns we saw alongside the longitude and latitude for the points, but now that information is going to be associated with an entire polygon. In many cases, there will be some continuous or discrete variable which you wish to represent as a coloured polygon on the map, and we'll see an example of that in the demonstrations later on.

So, what kind of maps do we have? We have static maps. Essentially, this is just a picture of a map area, and you put your data onto it. It's a snapshot: once you put the data on, it just becomes another picture with the data embedded in it. There are R libraries which will do this for you or help you do it, such as rworldmap and ggmap, and there are probably many others as well. The alternative is a dynamic map. This kind of map can typically be scrolled and zoomed, so you can move it around the screen, just as you tend to do when you're looking at Google-type maps. In fact, Google Maps will work as well. The information that you display on these maps can also change depending on the zoom level and where you are in the map. Example R libraries for this are googleVis and leaflet, both of which we'll be seeing later on. But again, there are many more to choose from, and which one you choose will depend on your preferences and your knowledge of particular packages. In both cases, static or dynamic, the initial map may need to be downloaded from the internet, so when you're doing this kind of work it's quite likely that you're going to need an internet connection on your PC. Dynamic maps typically make use of a web browser to display them.
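The "join the dots" idea can be sketched in base R as an ordered matrix of longitude/latitude points whose last point repeats the first. The coordinates are made up, and close_ring is a hypothetical helper written for this illustration, not part of any package.

```r
# A polygon outline stored as an ordered matrix of longitude/latitude points.
# These coordinates are made up for illustration.
ring <- rbind(
  c(-2.12, 57.13),
  c(-2.08, 57.13),
  c(-2.08, 57.16),
  c(-2.12, 57.16)
)
colnames(ring) <- c("lng", "lat")

# Hypothetical helper: make the first and last points coincide so the
# outline is an enclosed ring (an already-closed ring is returned unchanged)
close_ring <- function(pts) {
  first <- pts[1, ]
  last  <- pts[nrow(pts), ]
  if (!isTRUE(all.equal(first, last))) {
    pts <- rbind(pts, first)
    rownames(pts) <- NULL
  }
  pts
}

closed <- close_ring(ring)

# The data associated with the whole polygon sits alongside its outline
area_record <- list(name = "Hypothetical area", value = 0.42, outline = closed)
print(area_record$outline)
```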
This is because it makes it easier to implement the scrolling and zooming: the maps make use of the browser's functionality rather than having to code everything in R. In fact, although you write them in R using the packages we've mentioned, the dynamic maps are often implemented in JavaScript in the background. That's entirely hidden from you when you're writing the R code, but it does explain why you tend to need a web browser.

Sources of map data. Well, you can get map data from the UK Data Service; the link is here. In particular, the census data has a whole section on geography and the various geographical regions specifically for the UK. You can go to Ordnance Survey themselves, and they also have similar sources of maps, gazetteers and so on. GeoNames is more global and a bit more generic, and gives higher-level, gazetteer-type data, typically based on longitude and latitude: you can probably find any country in the world on GeoNames and get the longitude and latitude of, say, all the major cities and places within it. There are, of course, many more. You can also search Google for shapefiles. Shapefiles are used for the polygon-based maps, as we will see later on.

What do we need to consider? We need to consider the data that you want to put on the map, the map data you have, and how you're going to match the two. Regardless of whether it's points or areas, your data will most likely be tabular, as we saw before. In theory you can put any amount of data on the map, but in general you probably want to limit it to three or four items; otherwise the map just begins to get too cluttered.
And as I think we mentioned before, if it's an area denoted by a polygon, the chances are you're going to want to use the fact that you've got enclosed spaces and colour them, with the colour representing one of your data items in some way. It's not obligatory, but it tends to be what happens. As I said, the variable could be continuous or discrete, but you probably won't want to use too many colours. That has the same effect as having too many data items: it just becomes too cluttered. You need a very large legend somewhere on the map to describe what all the colours represent, and it just gets a little bit messy.

So we've got two basic choices for the map data. The simple one is points, where these are just the longitude and latitude, as we saw in that little spreadsheet earlier on. You may have a different coordinate system, and depending on your software you may have to convert it. For example, some of the specialised GIS software systems may handle the conversion themselves, but with the R packages we're looking at, certainly leaflet, you will have to make sure you have converted whatever your system is into longitude and latitude. And again, longitude and latitude are just two columns in your table of data, as we saw before.

Polygons are the outlined areas of the map. You could create your own, but most often you'll find them in what's called a shapefile, and these can be downloaded from the sites we mentioned earlier. Although we talk about a single shapefile, shapefiles are normally delivered as complete folders. The folder contains not only the shapefile itself but also a .dbf file, and it's the .dbf file which actually contains the data related to the polygons. The two are automatically matched up when you load them into packages. So, here's a little example of one downloaded from the UK Data Service.
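For point data recorded as Ordnance Survey eastings and northings, the conversion to longitude and latitude mentioned above can be sketched with the sf package (an assumption: the webinar does not use sf for this step). British National Grid is EPSG:27700 and WGS84 longitude/latitude is EPSG:4326; the two points are made up and lie in central London.

```r
library(sf)

# Hypothetical table of points recorded as eastings/northings (metres)
os_points <- data.frame(id = c("p1", "p2"),
                        easting  = c(530000, 531000),
                        northing = c(180000, 181000))

# Tell sf which columns are coordinates and which system they are in
pts <- st_as_sf(os_points, coords = c("easting", "northing"), crs = 27700)

# Re-project to WGS84 longitude/latitude, which leaflet expects
pts_ll <- st_transform(pts, 4326)

# st_coordinates returns a matrix: X is longitude, Y is latitude
ll <- st_coordinates(pts_ll)
os_points$long <- ll[, "X"]
os_points$lat  <- ll[, "Y"]
print(os_points)
```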
It covers the England parliamentary constituencies as they were in 2011. "Clipped" means the smaller version, if you like: the outlines of the polygons are not so granular. But even so, you can see about two thirds of the way down that the .shp file is well over four and a half megabytes in size, so these files can get quite big. There are several files, including the shapefile and the .dbf file; those are the two key ones. They are binary formats originally defined by Esri (the .dbf part is a dBase table), so you need appropriate software to read them. But of course, as we're going to see, we've got packages in R which will do that. Most GIS software and R libraries combine these two files so that the correct polygon is associated with the correct entry in the .dbf file, so you don't have to do too much work on that.

If we want to add data, for the point system it's just a case of adding new columns to your table, which already has the longitude and latitude in it. For polygons, it's a bit more complex, but it can still be quite simple, depending on the software in use. When a shapefile is loaded in R for use with leaflet, the contents of the .dbf file are exposed as a data frame, and we can add columns to it in the standard R way. Having added our own data, we can then save the whole shapefile again from R, including any of the changes we've made, and make use of it in the future.

So we'll just talk a little bit about leaflet before we go on to the demonstrations. Leaflet was originally written as a JavaScript library, so the maps it creates are HTML pages displayed in a web browser, which is what I suggested earlier on. It's now such a popular library that it has been made available for other languages, such as Python and R, and we're going to use the R version later on. The functionality may differ between the language implementations.
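When your own data has to be attached to the polygons' attribute table, the important thing is to keep the rows in the same order as the polygons. A minimal base R sketch, assuming both tables share a code column (the codes, names and scores are hypothetical): match() looks up each polygon's code in your data without reordering anything, whereas merge() sorts its result by the key by default and can silently break the row-to-polygon pairing.

```r
# Attribute table as it might come out of the .dbf file (hypothetical values);
# row i describes polygon i, so this order must be preserved
attrs <- data.frame(code = c("C01", "C02", "C03"),
                    name = c("Area A", "Area B", "Area C"),
                    stringsAsFactors = FALSE)

# Your own data: in a different order, and with no value for C02
my_data <- data.frame(code = c("C03", "C01"),
                      score = c(0.8, 0.3),
                      stringsAsFactors = FALSE)

# For each polygon's code, find the row of my_data holding it (NA if absent)
idx <- match(attrs$code, my_data$code)
attrs$score <- my_data$score[idx]

print(attrs)   # C02 has no data, so its score is NA
```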
But generally, for most of the common things that you want to do, you'll find all the functionality you need in the R implementation. You can create a web-based map from initial coordinates and a zoom level that you provide. You can zoom in on the map and move it around on the canvas of your web browser. You can add pop-up markers to the map: the marker typically denotes a place, and clicking it shows the data associated with that point on the map. The official documentation is at leafletjs.com; that will, of course, give you the JavaScript documentation for the original library. If you want the documentation specific to leaflet for R, you can get it from the RStudio GitHub site. And if you want to find examples, you can just search online; here, for example, are some results from a Bing search for R leaflet map examples, which show a whole host of the types of maps you can create using the leaflet package.

Demonstrations. I'm going to show you a demonstration using googleVis, which is an R library originally designed to interface R with Google Charts, but it does have some simple mapping facilities in it. Then we're going to have a more detailed look at leaflet. Here I'm going to lead you through from the very beginning, simply loading the leaflet library, through the various steps needed to end up creating a dynamic map and saving it as an HTML page. There are lots of steps, and we'll go through them one by one so you can see how it all builds up. That first leaflet demonstration will use pop-up data, and then we'll do another one a bit more quickly for a choropleth map.
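Saving a leaflet map as an HTML page can be done with saveWidget from the htmlwidgets package, since leaflet maps are HTML widgets. A minimal sketch; the file name and view are arbitrary choices.

```r
library(leaflet)
library(htmlwidgets)

# A bare map: the default tile layer with an arbitrary starting view
m <- leaflet() %>%
  addTiles() %>%
  setView(lng = 0, lat = 51.5, zoom = 6)

# Write the map to an HTML page. selfcontained = FALSE keeps the JavaScript
# and CSS dependencies in a folder next to the file instead of inlining them.
out_file <- file.path(tempdir(), "my_map.html")
saveWidget(m, file = out_file, selfcontained = FALSE)

# The page can then be opened in any web browser, e.g. browseURL(out_file)
```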
This is where we're going to use the polygons, and we're going to colour them based on some item of data that we have associated with them.

This is our first demonstration, a very short one using the googleVis library in R. I'm just going to load the googleVis library. We're going to use data which is provided with the library itself. The data set is called Andrew, and it relates to Hurricane Andrew of 1992, which blew in from the Atlantic onto the coast of North America. I'm just going to run all of this code in one go and see what it produces. As you can see, as soon as the code has run, it has opened up a web browser for me and mapped the whole series of data points. If I click on individual ones, it tells me the state of Hurricane Andrew at that point. Out in the Atlantic here, it was still a tropical storm, with a pressure of 994 millibars and a speed of 60 knots. Each one of these points has several items of information associated with it, and you'll notice that the map moves by itself as I click on the points, to centre the point I've clicked on. The background map has been downloaded from Google, the points have been added to it, and then it's been displayed as an HTML page in the web browser.

So back to the code. The code which actually draws the map is the gvisMap function, and you can see I've essentially given it two parameters here: one called LatLong and one called Tip. The rest of these options are essentially default options, so we won't go into them. But what I want to show you is the data for Andrew, which is now a data frame up here. If we have a look at it, we can see the LatLong column, which we use; this is essentially just a combination of the latitude column and the longitude column. It's done this way because that's just the way googleVis expects you to provide latitude and longitude.
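The shape of the data gvisMap expects can be sketched as follows. Rather than relying on the exact columns of the Andrew data set, this builds a small made-up storm track: the location is one "latitude:longitude" string per row, and the tip is an HTML string in which <BR> starts a new line in the pop-up. showTip and showLine are gvisMap options for showing the tip and joining the points; plot() would open the result in a browser and needs an internet connection.

```r
library(googleVis)

# Made-up storm track in the spirit of the Andrew data (values are illustrative)
track <- data.frame(
  lat      = c(25.0, 25.5, 26.0),
  long     = c(-70.0, -72.0, -74.0),
  category = c("Tropical Storm", "Hurricane", "Hurricane"),
  pressure = c(994, 980, 960),
  speed    = c(60, 80, 110),
  stringsAsFactors = FALSE
)

# gvisMap takes the location as one "latitude:longitude" string per row
track$LatLong <- paste(track$lat, track$long, sep = ":")

# The tip is HTML: <BR> puts each item on its own line in the pop-up
track$Tip <- paste(track$category,
                   paste("Pressure:", track$pressure, "mb"),
                   paste("Speed:", track$speed, "kt"),
                   sep = "<BR>")

storm_map <- gvisMap(track, locationvar = "LatLong", tipvar = "Tip",
                     options = list(showTip = TRUE, showLine = TRUE))

# plot(storm_map)  # opens the map in a web browser
```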
The other thing I want to show you is the Tip column, which is actually the category, the pressure and the speed combined together as a single string, separated by these <BR> tags. This is what is going to appear as the data in the pop-up when you click on a point. The reason these <BR> codes are in here is that this isn't just text: it is HTML code, which gets interpreted as such. <BR> has the effect of putting the data on new lines, which is why we saw it as three separate lines and not just one continuous string. Other packages use similar methods: you set up what you want to display as an HTML string, and that HTML gets interpreted and placed appropriately on the HTML page in which the map is included.

So we'll leave that one behind us now and go on to our leaflet demonstration. What we're going to do here is slowly build up a map of points; this is really just to demonstrate how you build things up. Again, I'm going to start off by loading the leaflet library. The first call to leaflet has no parameters; it's just leaflet by itself. If I call that, all I end up with is essentially an empty canvas down here, and the only clue that I've done anything is a slight change in colour and the fact that it says Leaflet down here in the bottom corner. The next thing I want to do is add a map to this, and I'm going to do that using the addTiles function. You can see how these calls build up using the pipe operator from magrittr, which just chains the different calls onto each other. At the end, I just display whatever's in m. So here I'm just adding a map, and the default map is this continuous map of the world. It's got the country outlines, but nothing else of any use yet.
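The first two steps of the build-up can be sketched like this. Each pipe step returns an updated map object, so the map can be assembled one call at a time.

```r
library(leaflet)   # leaflet also re-exports the %>% pipe from magrittr

m <- leaflet()             # an empty map widget: no tiles and no data yet
m <- m %>% addTiles()      # add the default OpenStreetMap tile layer

# Typing m at the console (or print(m)) displays the map in the viewer/browser
```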
The next thing we're going to do is say where we want the centre of this map to be, and we're also going to specify a zoom level, which determines how much of the map we see and the degree of detail. This map itself is a genuine map, in that I can zoom in on it. If I zoom in on the UK, you can see that as I get to greater zoom levels I get more detail, just like you do on any map on a website. Now we've set a specific zoom level and a specific centre point, so when I run this I get a zoomed-in map, which happens to be centred on Greenwich in London.

There's still no data on this map, so next I'm going to add a little marker to the map with a pop-up value, which is just going to say "Hello Greenwich". I've also changed the zoom level slightly. So now, in the middle of Greenwich, I've got my one little marker, and if I click on it, it says "Hello Greenwich". The next point I want to make is that the pop-up values we use don't have to be simple text like "Hello Greenwich": we can actually put HTML in there. So here I've changed it to slightly more sophisticated HTML with two lines: "Hello Greenwich" in bold, and an HTML link as well. If I run this, I end up with the same map and the same point, but now if I click on it I've got "Hello Greenwich" in bold and, on a new line, a hyperlink. That's a real link: if I click on it, it works just as you'd expect and takes me to a website. I'll close that down and go back.

Now, putting on single points like this doesn't really make a great deal of sense. What we're typically going to do is read data from a file, where the records in that file include the longitude and latitude and the data about that location that we want to put on the map.
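The Greenwich steps can be sketched as a single pipeline. The coordinates are approximate values assumed for this sketch (they are not given in the webinar), and the link URL is a placeholder.

```r
library(leaflet)

# Approximate coordinates for Greenwich (assumed, not taken from the webinar)
greenwich_lng <- -0.0015
greenwich_lat <- 51.4779

# Pop-up text can be HTML: a bold line, a line break, then a link.
# The URL is a placeholder.
popup_html <- paste0("<b>Hello Greenwich</b><br/>",
                     "<a href='https://example.com'>A hypothetical link</a>")

m <- leaflet() %>%
  addTiles() %>%
  setView(lng = greenwich_lng, lat = greenwich_lat, zoom = 15) %>%
  addMarkers(lng = greenwich_lng, lat = greenwich_lat, popup = popup_html)
```

leaflet passes the pop-up string through as HTML, which is what makes the bold text and the link work; if pop-up text comes from data you don't control, htmltools::htmlEscape() can be used to escape it first.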
So here I'm going to read a file which has Aberdeen postcodes in it. I'll run that and show you what this Aberdeen data frame looks like. I've got my postcode here, a few more little bits of information and, crucially, a latitude and a longitude, because I need those in order to put the points on the map. Now, instead of giving individual points to addMarkers, I'm going to tell it to use the data in the Aberdeen data frame. For the longitude, I want it to use the column called long; for the latitude, the column called lat; and the pop-up is the value of PC, which was the first column we saw. Bear in mind this data frame has 4,991 observations in it. If I run that, we get a very cluttered map of markers, but if I click on any individual marker it gives me the postcode for that particular marker. These are all different postcode areas.

So far, so good. What if we want to put more than one bit of information on the map? Well, as we suggested, we need to change the pop-up value from that simple postcode into a more sophisticated bit of HTML. I'm not going to try to explain all of the HTML, but this is just a standard way of creating a table; there are plenty of HTML tutorials online if you need to know more about how it's set up. For this first example, I'm hard-coding everything: I've just read the first line of the Aberdeen file and manually typed in "postcode" and the first postcode value, then "name" and the first name, then the admin code, and so on. This is really just a single demonstration. If I run the pop-up value line, it just sets that variable; it doesn't do anything much on its own.
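A sketch of mapping every row of a data frame with column formulas. The file name in the comment is hypothetical, and the three-row data frame is a made-up stand-in with the same kinds of columns as the Aberdeen data.

```r
library(leaflet)

# In the webinar the data are read from a file, e.g. (hypothetical file name):
# aberdeen <- read.csv("aberdeen_postcodes.csv", stringsAsFactors = FALSE)
# Small made-up stand-in with the same kinds of columns:
aberdeen <- data.frame(
  PC         = c("AB10 1AA", "AB10 1AB", "AB10 1AD"),
  name       = c("Place 1", "Place 2", "Place 3"),
  admin_code = c("X01", "X01", "X02"),
  lat        = c(57.146, 57.149, 57.147),
  long       = c(-2.096, -2.097, -2.098),
  stringsAsFactors = FALSE
)

# Passing the data frame to leaflet() lets the ~column formulas refer to it:
# one marker per row, with the postcode as its pop-up
m <- leaflet(aberdeen) %>%
  addTiles() %>%
  addMarkers(lng = ~long, lat = ~lat, popup = ~PC)
```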
But when I use that pop-up value in my map, again with just a single point, and click on that point, I now get all of the information that I put into the pop-up HTML string. I did slightly change the setView position here, just to match where I knew that point was going to be on the map.

Now we're going to do something very similar, but this time, instead of a single point, we're going to use all of the points from the Aberdeen data set. The table structure is going to be the same, but instead of hard-coding the values, I'm going to take them from the Aberdeen data frame. Other than that, this is very much the same as the previous example. When I run this, I end up with my very cluttered map again (we'll deal with the cluttering at the end), but now all of these individual points show all three pieces of information in the same format. Note also that you can change the background colour if you want to; you can set it to whatever you like.

The next thing we can do, instead of coding all of this in the call to addMarkers, is add the pop-up value as a new column in our Aberdeen data frame. The HTML is the same as we've just seen, but instead of putting it onto the map directly, I'm adding it as a new column in the data frame. That will obviously increase the size of the data frame, which may not be a good thing for you, but it does tend to make the code a lot cleaner, because now all we have to give for the pop-up is the name of the pop-up column we've just created.
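Storing the pop-up HTML as a column can be sketched with a single vectorised paste0() call, which builds one HTML table per row. The data frame is a made-up stand-in for the Aberdeen data, and the table layout is one simple choice among many.

```r
library(leaflet)

# Made-up stand-in for the Aberdeen data
aberdeen <- data.frame(
  PC         = c("AB10 1AA", "AB10 1AB", "AB10 1AD"),
  name       = c("Place 1", "Place 2", "Place 3"),
  admin_code = c("X01", "X01", "X02"),
  lat        = c(57.146, 57.149, 57.147),
  long       = c(-2.096, -2.097, -2.098),
  stringsAsFactors = FALSE
)

# Build one HTML table per row. paste0() is vectorised, so this single call
# produces a pop-up string for every row of the data frame.
aberdeen$popup <- paste0(
  "<table>",
  "<tr><td><b>Postcode</b></td><td>", aberdeen$PC, "</td></tr>",
  "<tr><td><b>Name</b></td><td>", aberdeen$name, "</td></tr>",
  "<tr><td><b>Admin code</b></td><td>", aberdeen$admin_code, "</td></tr>",
  "</table>"
)

# The map call now only needs the name of the pop-up column
m <- leaflet(aberdeen) %>%
  addTiles() %>%
  addMarkers(lng = ~long, lat = ~lat, popup = ~popup)
```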
If I run this, including that last m, I get an identical map with identical information on it; the only difference is that this information is now stored as part of the data frame. If I look at the data frame now, I've got seven variables, and the last one is the string representing the pop-up value, which is an HTML table.

Finally, we need to deal with the cluttering problem. Instead of addMarkers, we can use the function addCircleMarkers. It's only slightly different from addMarkers, in that it has a radius to say how big you want your circles to be, and here we also pass it clustering options. There is a convenient default clustering function, markerClusterOptions, which I'm not going to try to change; I'm just going to let it do its own thing and create this clustered map. If I run this, we get the same background map, but instead of all the individual markers appearing at once, they've been clustered together into regions based on their density. When I hover over one of these circles, it shows me the outline of the region covered by it, and if I click on a region, it zooms the map in and expands the cluster into smaller clusters or even individual points. If I click on this point, that is one of the individual points, and it shows me the information for that point. And that is it for the pop-up version of leaflet.

The other thing we need to look at is the choropleth version. Well, I say version; it's not really a different version, just a different set of calls, and, more importantly, a different type of data set when we put the data onto the map. I'm just going to clear what I've run so far, so you can see what we're doing.
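The clustered version differs only in the marker call. A sketch with a made-up stand-in for the Aberdeen data; the radius of 6 is an arbitrary choice.

```r
library(leaflet)

# Made-up stand-in for the Aberdeen data
aberdeen <- data.frame(
  PC   = c("AB10 1AA", "AB10 1AB", "AB10 1AD"),
  lat  = c(57.146, 57.149, 57.147),
  long = c(-2.096, -2.097, -2.098),
  stringsAsFactors = FALSE
)

# addCircleMarkers draws circles with a radius in pixels; clusterOptions
# groups nearby markers into clusters that split apart as you zoom in
m <- leaflet(aberdeen) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~long, lat = ~lat, radius = 6, popup = ~PC,
                   clusterOptions = markerClusterOptions())
```

addMarkers accepts the same clusterOptions argument, so clustering does not by itself require switching to circle markers.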
In this little file, again, I'm going to run the code one piece at a time. I'm going to start off by loading the leaflet library and another library called rgdal. When I run that, of course, nothing much appears to happen. The reason for rgdal is this function here, readOGR, which knows how to understand shapefiles. What we're going to do is read a shapefile into a variable called England. readOGR takes one parameter which is the name of the folder where the shapefile is, and another parameter called layer, which is the name of the shapefile itself without the .shp extension. So we read that into England, and what we end up with is something called a Large SpatialPolygonsDataFrame, which has 533 elements and is 6.8 MB in size. If we expand it and have a look inside, there's quite a lot in there, but the part I want to bring to your attention is this first part, data, which is actually a data frame of 533 observations of three variables (a name, an alternative name and a code for each constituency). This data frame represents the contents of the .dbf file that forms part of the shapefile folder. The rest of it, the polygons, is, as you might guess, a list of 533 different polygons, and each individual polygon has a whole series of coordinates in it. This first polygon has 833 points, and these points are the coordinates which are going to be mapped out and joined up to form an enclosed polygon. Now, the downside of this data is that the coordinate system used by this shapefile is not longitude and latitude.
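rgdal, which provides readOGR and writeOGR, has since been retired from CRAN, so this sketch swaps in the sf package, whose st_read takes the same folder (dsn) and layer (file name without .shp) arguments. To stay self-contained, it first writes a tiny made-up shapefile to a temporary folder; square() is a hypothetical helper. In sf, the .dbf attributes appear as ordinary columns next to a geometry column rather than in a separate data slot.

```r
library(sf)

# Hypothetical helper: a square polygon (closed ring) with its lower-left
# corner at (x0, y0), in metres on the British National Grid
square <- function(x0, y0, side) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
                        c(x0, y0 + side), c(x0, y0))))
}

# Tiny made-up layer: two areas with a code and a name (EPSG:27700)
demo <- st_sf(code = c("C01", "C02"),
              name = c("Area A", "Area B"),
              geometry = st_sfc(square(530000, 180000, 1000),
                                square(532000, 180000, 2000), crs = 27700))

# Write it out as a shapefile folder (demo.shp, demo.dbf, demo.shx, demo.prj, ...)
shp_dir <- tempfile("shapefile_")
dir.create(shp_dir)
st_write(demo, file.path(shp_dir, "demo.shp"), quiet = TRUE)

# Read it back: dsn is the folder, layer is the file name without .shp
# (the rgdal call was of the form readOGR(dsn = shp_dir, layer = "demo"))
areas <- st_read(dsn = shp_dir, layer = "demo", quiet = TRUE)

# The .dbf contents, as a plain data frame without the geometry column
print(st_drop_geometry(areas))
```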
It's an Ordnance Survey-based map, so it uses the Ordnance Survey system of eastings and northings, which are these values here. Before we can use this in leaflet, we're going to have to convert it into longitude and latitude, and that is the point of these two lines of code here. I'm not going to go into detail about what it all represents, but essentially, if you have eastings and northings as your coordinate system and you want longitude and latitude, as we do, these are the two lines of code to run. At the end of that, we can see down here that I've now got values of longitude for the various points in this polygon, and of course they're all very similar, because in terms of longitude and latitude the points are very close together.

The next thing we need before it's worthwhile drawing a map is some data to put on the map. I could use one of these existing variables, but they come with the shapefile, and you tend to want the shapefile only for the polygons. The data you're going to want to use is your own data, so what we need to do is add data into this data frame. This is where the fact that it is just a standard R data frame works in our favour, because we can use any of the usual techniques for adding data or columns of data to an R data frame. In this particular case, I'm going to take advantage of the fact that one of the things stored in each polygon is its area, down here. All I'm going to do is add that area into the data frame and scale it up to make the numbers look more reasonable. That's what this sapply function does for me, but you can use any other method of adding data into this data frame. Okay, so now you can see I've got this extra variable called area, and that's what I'm going to use to colour my map. Let's scroll this up.
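The two steps above, converting eastings/northings to longitude/latitude and adding an area column, can be sketched in sf (again swapped in for rgdal/sp; the layer and the square() helper are made up). One design point: this sketch computes the area while the data are still in the metre-based British National Grid, so the result is in square metres, scaled here to km². After conversion to longitude/latitude, a flat-plane area would be in squared degrees, which is hard to interpret. In sp, a per-polygon area is also stored in each Polygons object's area slot, e.g. sapply(England@polygons, function(p) p@area).

```r
library(sf)

# Hypothetical helper: a square polygon in British National Grid metres
square <- function(x0, y0, side) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
                        c(x0, y0 + side), c(x0, y0))))
}

demo <- st_sf(name = c("Area A", "Area B"),
              geometry = st_sfc(square(530000, 180000, 1000),
                                square(532000, 180000, 2000), crs = 27700))

# Add our own column: area in km^2, computed in the projected (metre) system
demo$area <- as.numeric(st_area(demo)) / 1e6

# Convert eastings/northings (EPSG:27700) to WGS84 longitude/latitude (EPSG:4326)
demo_ll <- st_transform(demo, 4326)

# Coordinates are now longitudes (X) and latitudes (Y)
head(st_coordinates(demo_ll))
```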
I'd point out at this point, though I'm not going to actually run it, that writeOGR, the counterpart of readOGR, can be used to write a shapefile of your own. If you think about it, I've added my own data here and converted the coordinate system to one I can use, so it might be worthwhile saving this to a file for future use. I'm not going to do it in this specific case, but it's just a simple call to writeOGR with the SpatialPolygonsDataFrame variable you want to write and a location to put it (along with a layer name and a driver, such as "ESRI Shapefile").

The next thing I'm going to do is set up the colouring system for my map. I'm going to create a variable called pal and give it a palette ranging from yellow through to blue, based on the area value and split into 10 bins. How many bins you have is up to you, and so is the colour scheme, but as I said earlier, you probably don't want to over-clutter the map with too many bins. When I run that, nothing appears to happen, and then I'm ready to draw the map itself. This time, instead of just calling leaflet without any parameters, I'm going to tell it that the data I'm interested in is this variable called England, the Large SpatialPolygonsDataFrame, and then I don't have to keep specifying it as I go through.
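Putting the palette and the map together, here is a minimal choropleth sketch using leaflet's colorBin with a yellow-to-blue palette. The layer is made up and built with sf (swapped in for rgdal), square() is a hypothetical helper, and the view, colours and opacity values are arbitrary. Note that colorBin's default pretty = TRUE rounds the cut points with pretty(), so the number of bins you get can differ from the number you ask for; pretty = FALSE uses evenly spaced breaks, and colorQuantile is an alternative when you want equal-sized groups.

```r
library(sf)
library(leaflet)

# Hypothetical helper: a square polygon in British National Grid metres
square <- function(x0, y0, side) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
                        c(x0, y0 + side), c(x0, y0))))
}

# Made-up layer of three areas on the British National Grid (EPSG:27700)
demo <- st_sf(name = c("Area A", "Area B", "Area C"),
              geometry = st_sfc(square(530000, 180000, 1000),
                                square(532000, 180000, 2000),
                                square(535000, 180000, 3000), crs = 27700))
demo$area <- as.numeric(st_area(demo)) / 1e6   # km^2, before re-projecting
demo_ll <- st_transform(demo, 4326)              # leaflet expects lon/lat

# Yellow-to-blue palette over the area values; pretty = FALSE gives exactly
# the requested number of evenly spaced bins
pal <- colorBin(c("yellow", "blue"), domain = demo_ll$area,
                bins = 3, pretty = FALSE)

m <- leaflet(demo_ll) %>%                  # data given once, used by the ~formulas
  addTiles() %>%
  setView(lng = -0.08, lat = 51.51, zoom = 12) %>%
  addLegend("bottomright", pal = pal, values = ~area,
            title = "Area (km^2)", opacity = 1) %>%
  addPolygons(fillColor = ~pal(area),      # fill colour from the palette
              color = "blue", weight = 1,  # outline colour and line width
              fillOpacity = 0.7, popup = ~name)
```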
addTiles is the default, but I'm setting the view and a zoom level so that I've got a specific part of the world, hopefully most of England. I'm adding my legend to the bottom right of the map, using the pal colour system I've just set up, based on the area from the data in England, with a title and an opacity. Then the important part is adding the polygons. Again, the polygons' fill colour is going to be based on the area and the colour scheme I've set up in the pal variable. This colour value down here is just the colour of the lines which outline and separate the polygons; in this case it's blue. The pop-up is going to be the name, which is taken from this data frame, and the name is in fact the English parliamentary constituency. If I run all of this, after a while, once it has drawn all 533 polygons, I get a nice big map of England, colour-coded by the area of these constituencies. You can see that in the London area they're all yellowish; these are the smaller areas. As we go out to the darker colours, say in Cumbria and Northumberland, the areas are far larger. When you consider that parliamentary constituencies are drawn up to have approximately equal numbers of people in them, what we've essentially got here is a population density map of England: obviously far more densely populated in London, Birmingham and Manchester than in Cumbria and Northumberland. Of course, you'd be putting your own data on this, with your own legend and so on. And that ends our last demonstration. Thank you for watching and listening to this webinar on putting data on maps.