Hello everyone, and welcome to this webinar about creating top metric maps. I'm Marguerita Strauler, an outreach officer working for the UK Data Service, and presenting today is Oliver O'Brien, a data scientist at the Consumer Data Research Centre, the CDRC, based at University College London. I'll now pass on to Oliver.
Thank you very much. I am going to present creating top metric maps, a technique that I've used myself for my work here at the Consumer Data Research Centre. I'm going to use a number of simple tools, and hopefully by the end of this session I will have successfully created a top metric map and will be able to display a PDF of the map to you.
I'm first going to discuss the tools that I'll be using in this webinar, and then what exactly top metric maps are and how they relate to other demographic maps of socioeconomic data. I'm going to talk about how I typically obtain the data and geodata for such a map, and I've prepared a few links for the example map that I'm going to step through. I'm then going to step through the process of identifying the top metric, which I'll be mapping later in the webinar. I'm then going to discuss cartographic considerations, the things we need to think about when producing a map that is fair, is accurate, and represents the data in an appropriate way. I'm then going to use QGIS to visualize it and create a final map, and then, as Marguerita mentioned, we'll have a Q&A at the end of this session.
So the tools I'll be using for today's webinar are, first, Google Chrome. I'll use that to show a few examples of web maps that have been created by me and others in my team, and also to download the data and geodata that we need. Of course, you can use any modern web browser, such as Firefox or Safari, as well as Chrome.
I'm going to use Microsoft Excel to process the raw data and create the metric. However, you can of course use any popular alternative spreadsheet software, such as Google Spreadsheets or OpenOffice. It's possible that the formulas I'll use in this webinar will be a little bit different in Google Spreadsheets, OpenOffice and similar products, but their documentation should hopefully help, and you should be able to recreate my method in those.
And then finally I'm going to use QGIS, formerly known as Quantum GIS, which is a geographic information system that allows me to combine the geodata and the metric data together and produce a rudimentary map of that data. I can then use its publishing features to create a better looking map. Again, of course, there are alternatives to QGIS, such as ArcGIS. I'm using QGIS because I'm presenting on a Mac and I'm comfortable using a Mac, but any modern GIS software such as QGIS or ArcGIS will be able to map this sort of data quite easily.
So I'm going to introduce some regular demographic mapping first, which is the DataShine project. I'm just going to go there now. DataShine is a website which was created by me and a colleague here at the Consumer Data Research Centre a few years ago. It takes data from the 2011 census and maps that data in a relatively simple way, as what's known as a choropleth map. So this is your standard choropleth map: each area is shaded in a particular color representing its value. In this particular case, we're mapping the proportion of people living in each area who have what the Office for National Statistics calls professional occupations, that is doctors, lawyers, dentists and scientists. And I'm using a simple scale here where the lowest values are shaded red and the highest values are shaded green.
We basically built DataShine to be able to easily show the huge range of statistics that are available from the 2011 census. So, like I say, here I'm showing what's known as a choropleth map, which is a simple map where areas are colored according to their value. What the DataShine project did is add labels and housing outlines, which makes the map look more like a familiar map. In this particular example, I'm looking at central London, and you can see how the river, the location of parks and the labels help make it look like a normal map. But this is just a simple example of a map of demographic data.
Now, geodemographics are a little bit more sophisticated. What these do is take sets of data such as, in this case, second addresses, and use what's known as clustering to try and identify areas which are similar to each other when you look at a very large number of these variables. In this particular case, the map is highlighting that second address residencies are very high in central London. If I go to another part of the country, such as Cornwall, we'll find that second addresses are also very high, but maybe for different reasons. So if you combined that information with further demographic information, you'd be able to know a little bit more about the characteristics of the population of the area.
A project called the 2011 OAC maps these using the same interface and the same mapping style as DataShine. This is a complex geodemographic map that uses, I think, around 60 or 70 variables all clustered together, and from that it identifies around eight groups and several subgroups in order to characterize, in a single expression shown in the summary at the bottom left, the sorts of people and characteristics of people who live in that area. So that is an example of a complex geodemographic map. Top metric maps sit a little bit in between the two.
So we are focusing on just one metric, but instead of simply showing low and high values of that metric, we're essentially showing the most common characteristic in the area for that metric. To help explain that, I have this example here, so I'm just going to move forward to the second languages map. This was created by Neil Hudson, a research analyst at Savills, the estate agency, but it was more of a personal project for him. He was very interested in trying to represent some of the huge number of data sets coming out of the 2011 census on a map, and he created this really nice map.
What it does is highlight, for each small area, in this case in London, the language after English which is most spoken by the population. For each small area, we know the numbers of people who speak a certain language as their main language. If we remove English from that and then map the second language, this is what appears. He used a threshold of 5%. Thresholds are quite important for top metric mapping, and I'll go into detail about thresholds a bit later. But his threshold is 5%, and his spatial areas were relatively large ones, and the combination of those two allowed this map to be produced.
It's a really nice map, because it shows a surprising, or perhaps unsurprising, degree of spatial autocorrelation, in other words, areas which are near each other having the same result. And it's quite a nice map showing some of London's linguistic and cultural changes. Essentially, what we're going to do in this webinar is produce a map using a different data set, but mapping that sort of data. That is what I have termed a top metric map, and that's what we'll be mapping in this presentation. So top metric maps are a form of geodemographic map. I've called them pseudo-geodemographic maps here.
But they are simple maps in that they are mapping just one demographic characteristic, as opposed to the many demographic characteristics of a geodemographic map. Yet they're more sophisticated than just showing that here's a high proportion of people who speak, say, Turkish, and here's a low proportion of people who speak Turkish. Instead, we're picking out one result for each area, the most significant result, and showing it on the map.
So this is a map very similar to the one that I'm going to produce shortly, although the spatial units are slightly different. This is what I've termed the top employment or top industry map. One thing I should mention is that this map, and a map similar to the second languages map, are available on our mapping platform, CDRC Maps, at maps.cdrc.ac.uk. If time allows, I'll show a few more examples at the end of this webinar.
You can see here that I've mapped, in quite small detail, as some of these areas are very small, the industry which the largest number of people living there work in. But again, there's a threshold here, so an area is only shown where more than 20% of people living in that area work in a single industry. The categories of industry I've used are the standard ones which the Office for National Statistics has adopted. You can see the key on the right, although it's at a small resolution here. The result for central London is this strong yellow color, which is professional, scientific and technical occupations. You also have a lot of orange, which is the financial and insurance industry, and then finally a few places with red, and red is education. One particularly interesting result for education stands out at Stamford Hill in north London, where the local population has a strong tradition of homeschooling. And again, although most people, of course, don't live where they work, more people are likely to live near where they work than far away.
And that's the reason why, even though we're effectively mapping industry here, using home locations where people live is still a valid method for doing that. The map we will create will not be quite as sophisticated in terms of the actual display of the data, but it should hopefully show the same trends. This is just another example, up in Edinburgh, where the accommodation and food service industries dominate in central Edinburgh, and you have a mixture of education and professional, scientific and technical among people living in south Edinburgh.
This is another example, the top country of birth map, where we've eliminated the home nations for each part of the map and mapped everywhere else where, I think, the threshold here is 8%. Here we're showing Liverpool, and it's showing that Liverpool has a number of areas where people who were born in Northern Ireland now live, and they're showing up in this light green color. There's also a significant population of people who were born in China living in Liverpool, and that population is shown on this map in red. Again, it's only where more than 8% of the people who live in a small area were born in China that it appears on this map. We've used gray to show where no population has more than 8%.
Okay, so this is the more practical part of this webinar. I'm going to talk through the process as I create a top metric map using my employment data set. The first step is to discover the data that you're interested in. You may already have a good idea and a good data set that you want to map, and I should mention there are only a few data sets that are suitable for this kind of mapping. But one thing I've generally done when I've been trying to work out what top metric map should be shown is to use the DataShine website, because DataShine will show the sorts of categories that things are broken out into. So here's a good example. I've landed on method of travel to work.
You've got 10 or 12 categories. In some areas of the country, the populations in certain categories will be quite high, and that potentially makes for quite a compelling top metric map. Sure enough, if you go to CDRC Maps, you'll see a map of exactly this, method of travel to work. I've also got method of travel to work excluding car, because the car dominates as a method of travel to work in most parts of the UK outside of London. That again reveals new insights about how different cities move.
But in this particular case, the data that I'm interested in is industry. If you go to industry, you'll see here the categories of industry. There's something funny going on here, which I'll mention in a moment, but in general you can see there are around 20 to 25 categories, and I think this may make a very interesting map. So, for instance, let's just choose one of these, financial and insurance activities. You'll see there's a strong result in the West End, central London, the City and Canary Wharf, but a low result elsewhere. If you wanted to find out what would be a higher result elsewhere, you would need to manually select all of the maps and try to find out where the hot spots are. Essentially, we're going to create a map which will map all of these at once and show the top result.
So in this particular case, we'd like to look at this one. One thing we can do is go and get the data directly from a website called Nomis. Now, I could click on it straight from here in DataShine, but here's how to find it from the front page of the Nomis website. Nomis is a website which allows easy download of Census 2011 and other data sets. It's billed as the official labor market statistics website. I'm going to show how we get that data from the 2011 census. So let's list the tables for the release.
And I know that the data I'm interested in is what's known as Quick Statistics. These are statistics where only one variable is considered. You have some more sophisticated ones where, for instance, you see how employment varies by age or by gender, but this particular one is the one we want. I'm going to go to this one here, QS605EW, which is the code for industry.
Now, instead of doing a sophisticated bespoke query from the Nomis website, I'm just going to download all the data, because I essentially want to produce a map of the whole of England and Wales. Incidentally, data for Scotland and Northern Ireland is generally available on different websites. This one generally only has England and Wales data. So what I'm going to do is download the entire table for all areas.
Now, because of the size of the files, Nomis splits Output Area files into the 11 regions of England and Wales. Sorry, the 10 regions of England and Wales. So I'm not going to do that. Even though the map I showed earlier on was at Output Area level, I'm going to choose one geography up from that, what's known as Lower Super Output Areas, because I know that those will work better for this demonstration. The file sizes will be smaller, but crucially it allows me to have just one file to download. Potentially, if you were following this exercise now or in the future and you used Middle Super Output Areas, you might get a different looking map, because the data is aggregated to a different level.
So I'm going to download this now, and it's a nice and easy format to download. It's simply a CSV file. Give Nomis a few seconds to download it. There we go. It's downloaded to a file called bulk.csv, which is a four megabyte file. Let's go to it and have a very quick look in a text editor. Yeah, that looks like a good CSV file. Lots of numbers.
Crucially, it has these codes here, and these are the codes that we're going to link with our geodata a little bit later. So I'm going to open this file in Excel. There it is. Okay, I'm just going to put that to one side for now, because the second form of data we need is the geodata.
The geodata I'm going to use is from the ONS, that's the Office for National Statistics, Geoportal. If you've used the Geoportal before, you may be aware that it was hard to navigate. The website had very many sets of data and it wasn't particularly easy to find things. However, in the last few months the website has relaunched, and it is now much, much better in terms of being able to find the data that you need. So I'm going to go to it now. It's geoportal.statistics.gov.uk. Because the data we downloaded was for Lower Super Output Areas, we need to get the Lower Super Output Area boundaries. Now, it has ones for multiple years, so I'm going to search for LSOA boundaries 2011. If you search for just LSOA, that will still work, but you'll have to look through more data sets, whereas with this search we get just nine.
For the purposes of making sure that this webinar runs smoothly, I'm going to choose the smallest possible file, which is the super generalized clipped boundaries. These are fine if you're showing a map of the whole of London, the whole of Birmingham, or the whole of England and Wales. If you want a very detailed map and you want to see the precise boundaries between different Lower Super Output Areas, then I recommend that you download the full clipped boundaries or the generalized clipped boundaries, but not the super generalized ones. But anyway, we'll go with this one because the file size is smaller and it works quite well for these purposes. So I'm going to download a data set here. There's a number of formats available, and I'm going to download the shapefile. Okay, so that's downloaded as a zip. Let's go to that.
And you've got your standard shapefile format, which is a folder containing six files that together hold the data of the shapefile. So that's all the data that we need for this webinar demonstration.
The next bit is analyzing the data in Excel to identify what I'm terming the top metric. So I'm going to go to Excel, and there's a number of steps I'm going to do in order to prepare this for use. I'm going to apply a threshold once I've identified the top metric, and I'll save it as a CSV, which will allow QGIS to read it in a straightforward way. There's a slight extra thing I need to do because I'm using a Mac. So this is the list of steps I'm going to do in Excel now. Basically, I'm going to augment the spreadsheet by adding a few extra columns for analysis. These columns will identify our so-called top metric, and then I'm going to truncate the spreadsheet so that I'm left only with the areas that I want to map.
The first step is to add a total row. Just to talk about the structure, each row represents a single Lower Super Output Area. There's around 35,000 rows in this spreadsheet. If I was using Output Areas, it'd be around 180,000, so this will make the formulas run faster. There's four columns to start with, including our crucial identifier column. There's the descriptive name for the area, there's a date, and then finally there's a rural/urban categorization, which we don't need. There's a totals column here, and these columns should all sum to it, except they don't quite, because the way this particular data set works is that C, which is the manufacturing industry, has been further broken into a number of subgroups. So if I added these all up for this one here, the total would be more. In fact, yes, there's 1,438 people, whereas the actual total is 1,310. That's because the people in this column have also been represented in these columns here. Now, for the purposes of this, I'm going to remove these columns.
But first of all, I'm going to sum all the columns, because we want to understand what the total numbers are, rather than just looking at the numbers for Darlington, which may not be representative of the rest of England and Wales. So I'm going to sum from this one here. There we go. We've got a sum of 26.5 million people. That's the 26.5 million people who, I think, were in employment on the 27th of March 2011 and were between the ages of, I think, 16 and 74. There's a precise categorization of the people who appear in this data set, but that number sounds about right if you consider it's essentially about 50% of the population of England and Wales.
So now I'm summing all of the columns together, and you can see these numbers are interesting. There's some quite low ones here. Many of the manufacturing industry categories are quite small. For example, there's only 70,000 people in the wood, paper and paper products manufacturing industry these days. Some are quite big, and the biggest one is four million. The crucial thing is that a lot of these totals are certainly the same order of magnitude. This is good. It will make for a good top metric map, because it means that different areas are likely to come top for different kinds of industries, rather than one industry dominating the entire map, which would not make for an interesting map. There are a few funny small ones here, such as activities of households as employers, and activities of extraterritorial organizations and bodies. That's basically people who work for the UN, the European Union, and other bodies which would not fit well into the other categories.
But anyway, this looks good. Because the total number of people in manufacturing is 2.3 million, and we've got some categories that are not broken out which are larger than that, I'm going to get rid of those subcategories. I'm just going to do that now.
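The double counting described above, where the manufacturing sub-groups repeat people already counted under C, can be checked with a short script. This is only a minimal sketch using the Python standard library: the column names, the counts, and the rule that sub-group codes look like a letter followed by digits (for example `C10-12`) are all assumptions made for illustration, not the real headings in the downloaded file.

```python
# Toy row for one area. Column names and counts are made up for illustration.
row = {
    "All categories": 100,
    "A Agriculture": 5,
    "C Manufacturing": 20,
    "C10-12 Manufacturing: Food": 12,      # sub-group of C (assumed naming)
    "C13-15 Manufacturing: Textiles": 8,   # sub-group of C (assumed naming)
    "G Wholesale and retail": 45,
    "P Education": 30,
}
TOTAL_COL = "All categories"


def is_subcategory(name):
    """Assumed convention: a sub-group code is a letter followed by a digit."""
    code = name.split()[0]
    return len(code) > 1 and code[0].isalpha() and code[1].isdigit()


def drop_subcategories(area_row):
    """Keep the total and the top-level categories, drop the sub-groups."""
    return {
        name: count
        for name, count in area_row.items()
        if name == TOTAL_COL or not is_subcategory(name)
    }


# Summing every category column double counts the manufacturing sub-groups.
naive_sum = sum(v for k, v in row.items() if k != TOTAL_COL)

# After dropping the sub-groups, the categories add up to the total again.
cleaned = drop_subcategories(row)
clean_sum = sum(v for k, v in cleaned.items() if k != TOTAL_COL)

print(naive_sum, clean_sum, row[TOTAL_COL])
```

Checking that the remaining columns sum to the totals column is a quick way to confirm that no overlapping categories are left before looking for the maximum.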
Okay. Next, I'm going to clean up the column titles slightly, because these are going to appear on our map and also in our legend. All of these have this "Industry:" at the beginning and "measures: Value" at the end, which we don't need. So I'm just going to remove the bits at the beginning, and similarly I'm going to remove the bits at the end. Okay. So now we have nice looking titles. You can always edit these titles further if you want to, because some of them are still quite long.
Now I'm going to add a number of columns to do our top metric identification. The first one will be a max value column, and that is simply going to identify the maximum value of these ones here. Obviously, we don't include the total, because our total will always be the maximum. So this is displaying the number which is the maximum value. There's our 4.2 million, but you can see that for these lower level areas, the max value is a certain percentage of the full population, and we'll actually identify that percentage shortly for our threshold analysis.
The next bit is the max category. This is a slightly more complex bit of formula, so it's just possible that it will be different if you use a different spreadsheet package. What it does is identify the column heading which, for the currently selected row, has that maximum value. We do that by doing an offset from the cell one to the left of our first one, so just to the left of the A agriculture one there. We don't want to move off that row. We always want to use that row at the top, because that's where the titles are. For the column, we want to identify which column has that value, so we need a reference to the column of the value itself. We take our value, which in this case is Z2 for this specimen one, and we search for it across a whole range, which goes from F2 to Y2. So that's all of these ones here.
And finally, and actually quite importantly, the match type is an exact match, so we have to put a zero there. Otherwise Excel unfortunately defaults to one, which is less useful. And there you go. For this one, for the whole of England and Wales, the wholesale and retail trade, repair of motor vehicles industry category, at 4.2 million, is our maximum category. But we're not particularly interested in the maximum across the whole of England and Wales, so let's just fill down. And here we go. You can see we've got quite a lot of this wholesale and retail one, but we've got a good mix of the other ones as well. We've got education, manufacturing, and a few other ones here and there appearing, financial services and suchlike as well.
Finally, we're not actually interested in areas which have a very diverse range of industries, because we're trying to highlight the top industry here. We want to truncate and only highlight the areas which have a significant industrial population, one that is larger than all the others by quite some way. So we need to apply a threshold, and to do that we calculate the percentage. That is simply our maximum value divided by our sum. You can see that it averages 16% across all of England and Wales, so we basically want to include only things which are above that 16%.
So let's sort our entire range. That takes a moment. Okay, let's sort again so that the highest value comes first. There you go. Now, there's an interesting result which has already appeared here, and we don't even need to map this to see it, which is that there are certain areas which have very, very high amounts of public administration. The category is public administration and defense, compulsory social security, and it includes military bases and military barracks. I bet you almost all of these Lower Super Output Areas will be ones which include military barracks.
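The three helper columns described above (max value, max category, and percentage) can also be computed outside a spreadsheet. From the description, the spreadsheet formula for the max category appears to be of the form `=OFFSET($E$1, 0, MATCH(Z2, F2:Y2, 0))`, but that is a reconstruction and the exact cell references are an assumption. The sketch below does the same thing in plain Python with made-up area codes and counts, and it assumes the sub-groups have already been removed so that the categories sum to the total.

```python
def top_metric(counts):
    """Return (category, max_value, share) for one area's category counts.

    `counts` maps a category name to the number of people in it. The share
    is the maximum value divided by the sum of all categories, which equals
    the totals column once overlapping sub-groups have been removed.
    """
    total = sum(counts.values())
    # max() returns the first category holding the highest count, which is
    # the same tie-breaking behaviour as an exact MATCH in a spreadsheet.
    category = max(counts, key=counts.get)
    value = counts[category]
    share = value / total if total else 0.0
    return category, value, share


# Hypothetical areas and counts, for illustration only.
areas = {
    "AREA001": {"Education": 50, "Manufacturing": 30, "Financial": 20},
    "AREA002": {"Education": 25, "Manufacturing": 25, "Financial": 50},
}

results = {code: top_metric(counts) for code, counts in areas.items()}

for code, (category, value, share) in results.items():
    print(code, category, value, f"{share:.0%}")
```

Sorting the results by share, highest first, reproduces the sorted view used to spot the unusually concentrated areas.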
And you can see Richmondshire. In the northeast there's a number of very large bases, such as Catterick Garrison. And I strongly suspect that for King's Lynn and West Norfolk, it's quite possibly some of the big US or RAF bases there. So already we can see there are some interesting results appearing here, and we definitely want to include these ones.
Now, if I scroll down the range, you see that roughly halfway down, so that's the median, the value is about 17%. We know our average is 16%. So let's map everything where more than 20% of the people who live in that area are working in one particular industry type. This threshold is entirely up to you, as the person carrying out the top metric mapping. I'm going to apply that cut-off now, and I'm going to do that simply by deleting all the other rows. That will reduce it from 34,000 rows down to about 7,500 rows.
So let's save this, and I'm going to save it as a CSV. I'm going to call it top industry 20%, just to know that we're only including the 20% ones there. Yes, that's fine. Now there's one extra step. If you're using a Mac, QGIS is unhappy if you give it a file which has Mac line endings, so I'm going to change this to Unix line endings, but Windows line endings will work fine as well. This is a step you only need to do if you're using a Mac. So that's our top metric identified.
Now I'm going to talk a little bit about some of the considerations we need to bear in mind when creating a map. Essentially there are three different kinds of skills that we need here. We need the skills of the data scientist to be able to manage the data, which is what we've just done, and also to discover the story, so to have a good idea that there is a map of interesting data waiting to be made. But we also need to consider the demographic geographer's role. We need to consider that it needs to be a representative map.
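The cut-off and export steps described earlier in this passage, keeping only areas above the 20% threshold and saving a CSV with Unix line endings, can be sketched as follows. This is a minimal illustration using Python's standard `csv` module: the column headings, area codes and shares are hypothetical, and the function name is made up for this example.

```python
import csv
import io


def write_top_metric_csv(rows, out, threshold=0.20):
    """Write areas whose top share exceeds `threshold`; return how many were kept.

    `rows` is an iterable of (area_code, category, share) tuples.
    lineterminator="\n" forces Unix line endings regardless of platform.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["geography code", "max category", "max share"])
    kept = 0
    for code, category, share in rows:
        if share > threshold:  # strictly more than the threshold
            writer.writerow([code, category, f"{share:.4f}"])
            kept += 1
    return kept


# Hypothetical results, for illustration only.
rows = [
    ("AREA001", "Education", 0.50),
    ("AREA002", "Wholesale and retail", 0.17),  # below 20%, so dropped
    ("AREA003", "Financial", 0.21),
]

buffer = io.StringIO()
kept = write_top_metric_csv(rows, buffer)
print(kept)
print(buffer.getvalue())
```

When writing to a real file rather than an in-memory buffer, open it with `newline=""` so that Python does not translate the line endings chosen by the writer.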
The problem with using groupings in this way is that we're somewhat at the mercy of the data provider. If a data provider decides to group certain industries together and break out other industries, then this technique is probably going to highlight those industries that are grouped together. So you have to be somewhat aware of the underlying data and make sure that you don't miss any interesting stories or trends simply because you are dealing with a data set that is aggregated in a certain way. That is something to bear in mind when using these data sets. I'm using this example because it does produce an interesting map and it does show the trends that we expect, but it is quite possible, if you don't apply the groupings or disaggregations in the right way, that you end up with a map that's not really telling you an interesting story, which is the purpose of this exercise.
And then finally, there is a role as a digital cartographer. You need to use a good color ramp, a good set of colors, and to make sure you're highlighting interesting values without inadvertently biasing the result by using colors that carry certain meanings. What I generally do is use brighter colors for the more obscure industry classifications, because they appear more rarely. So there's the category of wholesale and retail trade, repair of motor vehicles, and I would probably use quite a subtle color for that one because it appears in so many areas. But if you are, say, a data journalist, or you're otherwise interested in highlighting a particular industry, you might want to use a brighter color for that one. Similarly with the thresholds: I've already applied this 20 percent threshold, which basically means that I'm not going to show data for around 80 percent of the country, but you might want to change that to show more or less. So this is what I'm mentioning here. You want to have a fair, representative map.
You need to have an appropriate data set, one which is grouped in a way that shows the result you want to show, but is also fair and not inadvertently biasing the map. Then there's the threshold, as I've just discussed. It's a balance between not exaggerating and still showing an interesting story. This graphic here shows the difference between applying a 5 percent and an 8 percent threshold to the top country of origin outside of England for Oxford. You can see that the 5 percent map on the left appears to show all sorts of interesting clusters of people from overseas, including, for instance, the Polish community, shown in brown, down in the Cowley area. The Chinese community is shown in red, and people who were born in the U.S. are shown in yellow. But if you apply an 8 percent threshold, you see that those areas are much smaller. The difference between 5 percent and 8 percent is marginal, but it really makes a big difference in the kind of map that you get. That's why you need to investigate thresholds quite carefully when producing a map.
As I mentioned with colors, you should perhaps consider color-blindness. Some people cannot distinguish between certain colors and shades. And finally, you need to think about how you're going to contextualize a map. At the beginning of this, I showed maps using a technique where the building outlines are superimposed on top of the map so that you can only see areas which are built up, and there are also labels. For the purposes of this one, I'm going to use an OpenStreetMap background, but there are many ways to contextualize your map and produce a proper map from the data.
I'm going to go into QGIS. This is QGIS, and I'm going to load the data. The first thing to do is to load in our boundaries, so let's open up that file that we downloaded earlier on. Here it is. These are the 34,000 Lower Super Output Areas for England and Wales.
Again, if you are producing a map for the whole of the UK, then you'll need to join that together with the equivalent data sets for Scotland and for Northern Ireland, because they define populations and boundaries in slightly different ways. One thing to check at this point is that the coordinate reference system is good. Now, a quirk in QGIS means that it's not displaying that it's using the British National Grid, but basically it is, and therefore the map of England and Wales looks okay. But if you first loaded another data set which was in so-called WGS84, so just simple latitude and longitude, and then loaded this in, England and Wales would look squished. That's just a quirk of projection, and you would need to re-project it to get a map that looks correct. In this particular case, this is the first file I've loaded, and therefore it's using the grid projection. So we've loaded the geodata.
Now, again, this bit of the technique is very different in different GISs, but the way you do it in QGIS is to load in what's known as a delimited text layer. So let's just do that now. Let's go to our downloads, and here is our top industry 20% CSV that we were looking at earlier on. Here's a preview. It looks okay. There is no geometry in here.
We now join one data set onto the other, and the way to do this in QGIS is to go to the layer properties. It's a rather large, ungainly window, but you then choose joins. The layer that we want to join onto this one is our only other layer, the top industry one. We're going to join it on the geography code, and that matches the LSOA11 code. And there we go. A quick check of the attribute table, and you can see that for some areas we have further information, which is our industry information from the other file. Most of these are blank, and those are the ones which fell below our threshold, so they won't appear on the map. So let's apply some styling.
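The join just described behaves like a left join: every boundary keeps its row, and areas that fell below the threshold simply have blank attributes. The sketch below illustrates that behaviour in plain Python rather than through the QGIS interface. The area codes and category values are hypothetical, and the function is an illustration of the idea, not how QGIS implements joins internally.

```python
def left_join(boundary_codes, metric_by_code):
    """Attach the top category to every boundary; None where nothing matches.

    `boundary_codes` stands in for the code field of the boundary layer and
    `metric_by_code` for the thresholded CSV, keyed on its geography code.
    """
    return [(code, metric_by_code.get(code)) for code in boundary_codes]


# Hypothetical codes and values, for illustration only.
boundary_codes = ["AREA001", "AREA002", "AREA003", "AREA004"]
metric_by_code = {
    "AREA001": "Education",
    "AREA003": "Financial",
}

joined = left_join(boundary_codes, metric_by_code)
blank = [code for code, category in joined if category is None]

print(joined)
print(f"{len(blank)} of {len(joined)} areas have no top metric and stay unshaded")
```

Counting the blank rows after the join is a useful check: the number of matched areas should equal the number of rows in the thresholded CSV, and a mismatch usually points to a problem with the code field used for the join.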
So I'm going to use a categorical style. We go to Style and change this to Categorized. We want to categorize it on the maximum category, which is there. This next step is a purely cartographic, stylistic tweak, but I'm going to change the borders. You could get rid of the borders altogether, but I think it's potentially interesting to show areas which are in cities and have lots of internal borders but have the same industry, and compare those with rural areas where there may only be one Lower Layer Super Output Area, but it covers a large area. So I'm going to actually leave these in, but I'm going to change the outline width to a very small number, 0.12 millimeters, and that means that when we show it at a small scale, so it's zoomed out, you don't just have a mass of border lines obscuring the actual color. And then I'm going to use random colors, which is there, and let's do a classification. Finally, I'm going to uncheck this blank one, which represents the areas for which we don't have data. Now, if you were creating the final map at this point, you may want to go and tweak the colors used. You may want to show particular ones in particular colors. You may want to, as I say, use a muted color scheme for ones which appear a lot, but for now I'm going to just use the default ones. So let's have a look at the map. Here we go. So this is looking interesting. You can see that there are areas which are spatially clustered. This area here, which I know even though I haven't put a contextual map on yet is the Wash area in East Anglia, is showing a strong result for wholesale and retail trade; repair of motor vehicles. If you go to London, you can see those patterns we saw before. Here financial and insurance activities are shown as pink, and we've got our professional, scientific and technical activities as the sort of turquoise green. There are a number of colors which are quite similar to each other, which is why you might want to tweak the colors.
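One way to avoid near-identical random colors is to space the hues evenly around the color wheel. The sketch below uses only Python's standard `colorsys` module; the function name, the saturation and brightness values, and the category list are assumptions for illustration. It is not a QGIS feature, and it does not by itself address color blindness.

```python
import colorsys


def distinct_colors(categories):
    """Map each category to a hex color with evenly spaced hues."""
    n = len(categories)
    palette = {}
    for i, name in enumerate(categories):
        # Hue steps evenly from 0 to 1; saturation and value are fixed,
        # slightly muted so large areas do not dominate the map.
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.65, 0.9)
        palette[name] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return palette


industries = [
    "Financial and insurance activities",
    "Professional, scientific and technical activities",
    "Education",
    "Transport and storage",
]
for name, color in distinct_colors(industries).items():
    print(color, name)
```

The resulting hex values could then be typed into the symbol color picker for each category by hand.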
The other thing is, as you can see, the boundaries here are extremely generalized and very simplified, and they don't necessarily reflect the actual geography of the area. However, you can just about make out the familiar wiggle of the River Thames. If you want, you can apply labeling, and that means that a label saying what it is would appear on the map above each one, or you can just have a few being labeled. However, I'm going to use the legend instead of labels for this map because it looks better, and also because QGIS's labeling is still not particularly useful for this particular map. Okay, so those were the steps on this slide here, and then finally we're going to create our final map. To do that we're going to add an OpenStreetMap background, we're going to use QGIS's Print Composer, and we're basically going to create a simple PDF with a legend. So let's do that now. To get OpenStreetMap in QGIS, you have to install a plugin called the OpenLayers plugin, and you can do that from within QGIS. Once that is installed, it gives you a large number of maps, including Google Maps, Bing Maps, MapQuest, Apple Maps even, and Stamen Design's maps. However, it uses a JavaScript trick to get the Google and Bing maps in, which means that they won't necessarily print out on your PDF. Therefore, I'm going to use OpenStreetMap, because that allows me to pull the OpenStreetMap data directly into QGIS rather than using its JavaScript trick. So let's do that. Now, by default it's placed over the top of our data. We don't want that, so slip it underneath. Okay, that's great, but the problem here is that OpenStreetMap is a little bit bright and vivid, and the forests look very similar to some of the colors here, and also, where we have results, we can't see the labels underneath. We can fix both of those by using transparency. So I'm going to set the transparency of our data layer to around 50%, which is here.
And similarly for OpenStreetMap, I'm going to do the same thing. I'm going to set the transparency, which is here. It goes to about 50% as well, and that's the final map you end up with. So you can see both the names of cities and towns and suchlike, and the data itself. So this is a good mix, and you can see that yes, that area we looked at before was Spalding and Boston, in East Anglia, and you can see there's a good result there, a high result for wholesale and retail trade. If we go to these areas here, this is Lakenheath, a well-known military base for the U.S., and yes, that color actually is activities of extraterritorial organizations and bodies. So that potentially makes sense, as people on U.S. military bases. Cambridge is almost entirely one color, which is education, which you might expect. Cambridge has a famous university, and Oxford is the same. And there is a result around Heathrow. Around Heathrow, transport and storage predominates, and as you might expect with many, many cab firms and suchlike, that is the right result. So this is a good map in that it is showing us data that we'd expect. It shows a wide variety of different data in different places. There are a few more human health and social work activities in this part of Wales. There's a strong manufacturing area, yes, just south of Liverpool, going toward Manchester, and also in Stoke-on-Trent there. Manchester itself is showing less of a tradition and more of a mix of industries. It's more fragmented. So these are the results that we expect. So I'm going to produce a map of the final result. We're going to produce a map of London. I'm going to zoom into the London area, and I'm going to choose a new print composer. We don't need to give it a name. The first thing to do is to put the map on. So let's add a new map here. This is our virtual sheet of paper, and I'm going to add a legend as well, and that goes here. So let's put a legend here.
Now you can see there's a slight problem with this. This is a very long name for the legend. So I can actually just edit all the legend entries. I'm going to delete the name of that layer, and this looks much better now. So let's put it maybe there. There you go. Here's a not particularly interesting map, or rather a not particularly nice-looking map, but you can spend a lot of time tweaking and adding various different elements to your map. One thing that is always quite important if you're using datasets like this is to include attribution. In this case, we need to attribute the data from the ONS, which is Crown copyright, but we also need to attribute OpenStreetMap: copyright OpenStreetMap contributors. The data is open and free to use as long as it's attributed. That is the main requirement of both the official government datasets and the OpenStreetMap dataset. And so that is attributed. And then finally, we can produce a PDF. That's just a warning that the PDF might look a bit funny, but that's okay. So let's go to Downloads, where everything else was, and we will call it top metric map London. And then finally, we can have a quick look at our map. And there it is. You can print out this map, and it is a nice sharp map as you zoom in. It's proper vector, with proper text, so it will print very nicely. That's our data there. So that was a quick summary of the technique that I used to create the top metric maps that appear in CDRC Maps. As I have a couple of minutes before I answer any questions, I'm just going to show a couple of examples of top metric maps in the CDRC Maps platform. As I say, we initially used QGIS in the way that I've just demonstrated in order to find potentially interesting maps, as a demographic geographer. And then once we were happy that we had a good dataset, we would add that to CDRC Maps. So we have a number of them, and the top industry one is here.
I'm focusing on London because it's where I'm based right now, but this map covers the whole of England and Wales. Actually, this particular one we extended to cover Scotland and Northern Ireland as well. It is generally more involved if you're creating a map of the whole of the UK, because the Scottish datasets are normally the same as or very similar to the English and Welsh ones, but sometimes they're not, even to the extent that sometimes the data looks identical but the order of the columns is different. So you have to be quite careful when combining the England, Wales and Scotland datasets together, but in this case we've been able to do that. Scotland also uses different populations for its small geographies. So again, that's something that you have to bear in mind. The one I mentioned earlier on was the top method of travel to work, ex-car. For Edinburgh, you can see that it is dominated by people on foot in the centre and buses or cars on the outside. If you go to London, London's top method of travel to work, ex-car, is dominated by the tube or train, which I've combined together here. For London we can actually include the cars, because even then cars aren't really used to travel to work in London unless you live beyond inner London. Other ones we have are the top country of birth, as I mentioned, and then other similar maps which are not, strictly speaking, top metric maps but are still interesting sorts of demographic maps, such as a very simple population density one, which just shows that in London there's this huge area of the Lee Valley which is almost entirely unpopulated, occupied by industrial units, and the City of London itself is also very low in population. We also have central heating type. This is a top metric map. It's showing the top type of central heating according to the 2011 census.
In most areas of the UK it's gas, but as soon as you move beyond the gas network it's oil. There are also many city centre developments which use purely electric heating these days, and then there are other little quirks in places. Here in north-east Leicester the top type is "other source", so there's clearly an interesting power supplier for the housing development in that area. So that concludes the main part of our presentation. If you're interested in the techniques that I have described in terms of creating maps such as CDRC Maps and the DataShine project, I wrote a paper with my co-author Dr James Cheshire back in 2015 called "Interactive mapping for large, open demographic data sets using familiar geographical features", in the Journal of Maps. It's an open access publication, so you can download it from that URL. If you're interested in using QGIS to create maps like this, then the QGIS project itself has excellent documentation. There are also a number of blogs. The CDRC project has a data blog, which is at data.cdrc.ac.uk slash blog, and that is actually an aggregator of several other blogs from researchers working on CDRC projects, such as myself. I also have my own blog, oobrien.com, where I sometimes document procedures like the ones I've discussed today. And then finally, stepping away from this, there is Mapping London, which is not so much about creating maps but demonstrates the huge and wide variety of maps available for London, including many data maps. If you are particularly interested in mapping data sets, which you likely are because of the content of this webinar, then be sure to click the data tab on mappinglondon.co.uk, and that gives you a large number of examples of interesting data maps, and I think I may have sneaked one of my top industry or other top metric maps onto our website. So thank you very much for listening. I hope that was interesting