So this webinar is called Using Open UK Data Service Census Support Datasets in Open Source GIS Software. As Jill said, my name is James Crone. I work for UK Data Service Census Support up here in Edinburgh. I'm actually part of EDINA, within the Information Services Group, where we provide and support various online services dealing with geospatial data.

So what we're going to look at is the range of open data that is available for download from UK Data Service Census Support. I'm also going to look at other sources of open geospatial data that can be combined with UK Data Service Census Support data, from other organizations like the Ordnance Survey or the Office for National Statistics. Then I'm going to provide a brief introduction to open source GIS software before doing specific demonstrations of using Census Support data with the PostGIS spatial database as our data store, and using the QGIS desktop GIS application to do some mapping and a bit of spatial analysis. In particular, we're going to create choropleth maps, cartograms and flow maps using QGIS.

I'm just going to give you a bit of background about UK Data Service Census Support. We're part of the main UK Data Service. Within Census Support, we provide access to, and user support for, data from the last five UK population censuses: the decennial censuses from 1971 to 2011 across the whole of the UK. The majority of the data provided by UK Data Service Census Support is available as open data. Whereas in the past some of the data was restricted to academics, now most of it is open data. As well as the core census data, we also provide various other non-census datasets, which can give extra context when you're analysing the census data. If you've not been to our website, there's a link at the bottom here, which you can look at later on.

So what is open data? Some brief descriptions.
Basically, it's any data that anyone can access, use and share. It should ideally be provided in a common, machine-readable format. And what's special about open data is that it must be licensed so that people can use that data, remix it and combine it with their own stuff. That's just a definition from the European Data Portal. Do a Google search for the European Data Portal and you can find information about them. Within UK Data Service Census Support, all our open data is provided under the Open Government Licence, which is the UK licence for open data. It's called the OGL. Under the OGL, whenever you re-use the data you should acknowledge the source of the data and, where possible, provide a link to the OGL.

In terms of the types of census data that are provided by UK Data Service Census Support, it breaks down into four key data types. There's the aggregate census area statistics data. These are the tables, so this is the core census data from 1971 through to 2011, and there are lots and lots of variables and combinations of data you can get about the individuals who were captured on census day. There's the census interaction data, which is flows between places. So that could be commuter flows (where do people live and where do they work?) as well as migration. There's census microdata, which is detailed and longitudinal information about people. And as well as that, we have supporting datasets. These include boundary datasets and geographic lookup tables. They allow people to map the census data, to do spatial analysis, and also to use geography as a way of linking that census data to other datasets.

Most people who use UK Data Service Census Support may have seen the InFuse and Casweb applications. These are the UK Data Service Census Support applications that allow people to download the aggregate data.
InFuse will allow you to download the 2001 and 2011 data across the UK. The people at UK Data Service Census Support have done a lot of work on InFuse, so it allows you to download data from across the UK, and they've done a lot of work to harmonize the data from England, Wales, Scotland and Northern Ireland. So in some respects, InFuse is quite unique and powerful. To get the earlier data, from 1991 back to 1971, you have to use the earlier incarnation of the application, called Casweb. To access the flow data, there is a separate application called WICID, and that will allow you to download flow data. Again, that's the commuting and migration information that's been captured as part of the census.

We also have boundary download applications, which will allow you to download the supporting data. So here we have the Boundary Data Selector application, which will allow you to download census boundaries that you can use to map some of our census datasets. We also have an Easy Download application, which just provides pre-canned versions of the boundaries. You have less control: you can't specify which geographic area you want, you just get the entire country. Again, in InFuse and the Boundary Data Selector, most of the data available is OGL. With WICID, some of the data is OGL, but other data is more restricted, depending on the level of geography of the flows.

Another source of data that you can use from other organizations is the Office for National Statistics Open Geography Portal. They allow you to download boundaries, as well as postcode directories. I think the emphasis at the ONS is to provide access to contemporary datasets. So you can see here, for example, that you can get 2017 data for clinical commissioning groups. That's a health geography, which you might be able to use as contextual information for our census data.
The thing to watch for with the Open Geography Portal is that it has less support for older census boundaries, such as those for 1971 and 1981. So in that case, you're still better off getting that data from UK Data Service Census Support. Another great source of open data is the Ordnance Survey itself. The Ordnance Survey is Britain's national mapping agency, and they have an open data initiative through which you can access a whole variety of geospatial datasets, including backdrop mapping, postcode boundaries and gazetteers. They also have free versions of some of their high quality datasets, which are really useful for providing background context to the census data.

I'm going to give some background to GIS. GIS stands for Geographic Information System, and it's basically an information system for modelling the real world. There tend to be two distinct views of GIS: vector GIS and raster GIS. Vector GIS models the real world in terms of points, lines and polygons. Raster is a regular grid; you can imagine it could be an aerial photograph or a digital terrain model. These are different ways of storing geographic data, and you can then run different types of analysis and manipulation on them. Typically, a GIS may end up as a desktop application, which allows you to store that GIS data and manipulate it. So, for example, you could capture points, lines and polygons, create maps from that data, or do some sort of analysis. This is the QGIS application that we'll be using today with some of our census data. GIS is a big world. There are a lot of different technologies, so we're just going to show some of them.

Specifically, if you want to look at how census data is stored within a GIS, here's a typical aggregate data table at the top left. We have these things called geographic identifiers.
These uniquely identify each of the small area geographies, which could be output areas, for which the census statistics were produced. Here we have three different census variables: total population, males and females. That just tells us the number of people that are male or female within that small area. And then we can also get the digital boundaries. These are the digital boundaries that represent the actual census output areas on the ground. By relating the census statistics to the boundaries, we can create maps and do spatial analysis, as well as using the power of geography to link in other geographic datasets. As they say, location brings context. So we're linking the stats to the boundaries, and the geography is the common key for bringing in other datasets. In the GIS world, you tend to come across a lot of common dataset formats, like the shapefile, and CSV files to store the attributes.

So what is open source GIS software? Well, if we first look at what open source software itself is, it's computer software made available with a license in which the copyright holder provides the rights to study, change and distribute the software to anyone and for any purpose. And it's often developed in a public, collaborative manner. This is in contrast to non-open source software, which a commercial company may develop, doing all the bug fixing and testing internally and not allowing the community to feed into that development. So open source GIS software is just a special flavor of open source software targeted towards the manipulation of geographic information and data.

Within the open source GIS community, one of the major players is an organization called the Open Source Geospatial Foundation, OSGeo, which is a not-for-profit organization whose mission is to support the collaborative development of open source geospatial software and promote its widespread use.
They're like an umbrella organization. There are a number of different software applications that sit within them, and they provide support and look after the development of these applications. They also run a very good conference every year, which tends to move around the world. And they have local chapters, so for example there's a UK chapter which has its own conference. Again, if you're just getting started with open source GIS software, their website is a great place to start.

There are different types of open source GIS software, and these are aimed at different types of user or application. From the top to the bottom, there are the core geospatial libraries. These are what developers would use to write other types of software and to add new features to that software. There are the desktop applications, such as QGIS, which allow end users to manipulate GIS data and create maps. There is web mapping software, which allows people to do online mapping of the data. And then these two at the bottom are more related to the actual management of the data itself. So there are metadata catalogs. Metadata is data about data: it's how you describe that data and how you make it discoverable by other people. And again, there are open source GIS applications which help with that sort of thing.

Today, we're going to look at two main open source GIS software applications. We're going to look at using PostGIS as a spatial database in which to store some of our census data, and then we're going to use QGIS to connect to PostGIS, do some mapping and create various types of visualizations of our census data.

So just to start off, let's look at what PostGIS itself is. PostGIS adds GIS functionality and data types to the PostgreSQL database. PostgreSQL itself is a type of database.
It's an open source database, and it comes with some support for geometries, but it doesn't have true GIS functionality as part of it. So the idea is that you have PostGIS on top of PostgreSQL, and then you get the full ability to store GIS data in PostgreSQL. You can also do all your GIS data processing and analysis using spatial SQL, so you can write SQL to query that spatial data in PostgreSQL and PostGIS, and then you can have other applications that talk to your PostGIS database in order to access your GIS data. As I said, PostGIS is part of the OSGeo umbrella of software.

In terms of how you go about getting PostGIS, the prerequisite is a PostgreSQL installation. You can install PostgreSQL and PostGIS on a number of platforms, including Windows, Mac and Linux. On Windows it's actually pretty easy to install. PostgreSQL itself comes with an executable installer that you can just run to install a PostgreSQL database on your Windows desktop, and then there's a thing called Stack Builder, which allows you to install additional parts, such as PostGIS, on top of that PostgreSQL database. So it's actually fairly straightforward to install PostGIS on your Windows machine. I think for Linux you can make it as difficult as you want, depending on how you want to install it. And again, for Macs, there are different options. I've got PostGIS installed on this Windows laptop, which we'll be using for the course of the webinar and the demonstrations.

In the first demonstration, I'm just going to load a shapefile of boundaries into PostGIS. From Census Support I've downloaded some Scottish council areas, and what I'm going to do is load them into a PostGIS spatial database that we have access to. As I said, there are lots of different methods of doing this. I'm just going to look at one method, which is a nice easy way of doing it. And then once we've done that, we'll see how the data is stored in PostGIS.
So let me just drop out of the presentation here. Let me first open pgAdmin, which is the administration tool that lets you administer PostgreSQL databases. There are a bunch of different databases I have on this machine. I've already set up a database called james, because my name is James. Within this I've created various schemas. Schemas are just a way of partitioning the data in the database. And you can see that there are no tables loaded yet. So let's load some data.

I'm looking for my PostGIS tools. When you install PostGIS on Windows, it gives you a PostGIS shapefile import/export manager tool, which will allow you to upload shapefiles into your PostGIS database. That is what this tool is. On Linux, or I think on Macs, you don't get this nice GUI tool. You get command line tools which do a similar sort of thing. The first thing I have to do is tell the tool where my database is. So let me just do that now. Okay.

There are two tabs here: an import tab and an export tab. I want to use the import tab because I want to import the shapefile. So I'll just navigate to my data. I've got my desktop, and I've got two shapefiles here. The one I want is this Scottish one. Now I have to set two options. I first need to set the SRID, which is the spatial reference identifier. That just tells PostGIS what the spatial reference system is: is it British National Grid, or is it data for America or Australia? It just tells it the spatial reference of that data. I know this is British National Grid, and this uses a code of 27700. I also have to say which schema I want to put the data into. That's the partition in the database where the data is to be stored. I've got a UKDS one here, so that's what I want to use. I've confirmed that, and that's all I need to do. I could add multiple shapefiles here and have them queued up. I'm just going to add one.
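For anyone working on a platform without the GUI loader, the import described above can be approximated with the `shp2pgsql` command line tool that ships with PostGIS. The following is only a sketch that assembles the command as a string. The shapefile name, the `ukds` schema, the table name and the `james` database name are assumptions based on what was said in the demonstration, not the exact values used.

```python
import shlex


def build_import_command(shapefile, schema, table, database, srid=27700):
    """Return a shell pipeline that loads a shapefile into PostGIS.

    -s sets the SRID (27700 is British National Grid) and
    -I asks shp2pgsql to create a spatial index on the geometry column.
    """
    loader = ["shp2pgsql", "-s", str(srid), "-I", shapefile, f"{schema}.{table}"]
    client = ["psql", "-d", database]
    quote = lambda args: " ".join(shlex.quote(arg) for arg in args)
    return quote(loader) + " | " + quote(client)


# Hypothetical file, schema, table and database names.
command = build_import_command(
    "scotland_council_areas.shp", "ukds", "scotland_council_areas", "james"
)
print(command)
```

The printed pipeline could then be pasted into a terminal. Keeping the SRID as a parameter makes it easy to reuse the helper for data in a different spatial reference system.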
And I just run this import. You can see that, very quickly, it's done something. If I go back to pgAdmin and do a refresh, you can see that it's now loaded the data as a new table. If you look at the table, you can see how it has been created. We've got various columns here. We've got an identifier column. We've got a geographic ID column, like in the presentation. That's the unique ID for the geographic area, which tells us how we can relate the census stats to it. The name is the place name of that geographic area. But importantly, we have this geometry column, and that's the GIS part. That's the geometry, the polygon extent of that area, in this case of the council area. And it's through that that we can map the data and do spatial analysis. The import has also added an index to our data, and that just means that the features in that table can be accessed quickly. Just check. So that's great.

Because it's a database, we can query the table just as you would any other sort of dataset in any other database. The pgAdmin tool comes with a GUI that allows you to type in SQL queries, as you might in Access or something. But what PostGIS provides is all these spatial functions that allow us to manipulate the spatial data itself. I can just run a query. If I run that, you can see it pulls back results from the table. Again, you can see I've got the geographic identifier, which uniquely identifies each record, and the name tells you where it is. And because I've run this function called ST_AsText, it's written out as text the centroid point of that polygon, worked out from the geometry. So you can see we're purely using PostGIS to do GIS-type functionality on our data. We could have done this in a GIS application like QGIS or ArcGIS, but it's really nice to just get your data purely from SQL by running queries. So that's a really nice feature. I'll just go back to the presentation.
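The exact query typed in the demonstration is not captured in the transcript. A plausible reconstruction, assuming the centroid came from PostGIS's `ST_Centroid` wrapped in `ST_AsText`, is held in the `QUERY` string below; the column and table names in it are hypothetical. The small pure-Python functions next to it are only a sketch of what those two functions do for a simple polygon, so the idea can be tried without a database.

```python
# Hypothetical reconstruction of the kind of query run in the demo.
# Column and table names are assumptions, not taken from the webinar.
QUERY = """
SELECT geo_code, name, ST_AsText(ST_Centroid(geom)) AS centroid_wkt
FROM ukds.scotland_council_areas
ORDER BY name;
"""


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring of (x, y) tuples."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring if needed
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross  # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def point_wkt(x, y):
    """Format a point as well-known text, the format ST_AsText returns."""
    return f"POINT({x:g} {y:g})"


# A made-up 4 x 2 rectangle standing in for a council area polygon.
rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
print(point_wkt(*polygon_centroid(rectangle)))
```

In a real workflow the database does this work for every row at once, which is the attraction of spatial SQL; the Python version is here only to show what is being computed.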
So the next thing is, how do we actually view our data from PostGIS? To do this, we can use the QGIS desktop GIS application. Again, like PostGIS, it's really easy to install QGIS. It comes with executables. They're quite hefty downloads; I think it's about 300 megabytes on Windows. Then you just run the executable and it will install QGIS on your Windows computer, or your Mac, or your Linux machine. And then you get this really cool, nifty desktop GIS application. I've already pre-installed it, so I'll just fire up QGIS now. Let's wait for QGIS to open. There you go. I've got version 2.18. QGIS is quite actively developed, so you might find the version number increases quite rapidly. I think the next major release will be QGIS version 3, which will be really quite nice in terms of what it does.

Okay, so QGIS has opened. There are various menus at the top. I've got the main map window here and various panels on the side. We just want to add some data from our PostGIS database to QGIS. We have different types of data we can add from the left here, and for PostgreSQL and PostGIS we have the big elephant symbol. I just need to tell QGIS where my PostGIS data is. I've already set up a connection here. I'll just edit that. This just tells QGIS where my data is. It's okay. I'll just connect and give it my password. Okay. And I simply pick my table, which I added to PostGIS using that tool. And there we go. There's our PostGIS data within QGIS. I can click on the various features of the boundaries. I can see the properties. Or I can open the actual table and see the entire table. So that's great. We can have our data stored in PostGIS, and we can access it from QGIS.

One of the common tasks within GIS, especially when using census data, is that you've got your boundaries, and then you might have aggregate data in a CSV table.
And you want to be able to join them so you can map the census data by geography. This is the task of joining data. In your boundaries you have a unique identifier, and in the CSV data you have the same unique identifier. You simply want to join the two tables. We can do that in QGIS quite easily, which I'll just show. Basically, you go to the properties of the layer.

Let's first add the CSV file. So I first have to add a CSV file of my census stats. Again, I use a QGIS dialog here. I have to pick no geometry, because the stats don't contain a geometry column. If I open the attribute table, you can see all the census stats. You can see this geographic identifier, which uniquely identifies each row. Again, you've got the name and you've got various stats from left to right. Here we've got population for 2011. I think this is data to do with housing. So we have the number of people who rented from a local authority, from a housing association or from a private landlord, or who were owner-occupiers.

We simply want to join those stats to the boundaries. So we go to the properties of the boundaries. There's a joins section here, and we simply want to add a new join. We select the data we want to join to the polygons, which is our CSV of census stats, this one here. And we just have to tell QGIS which columns contain those same geographic identifiers. Okay. We just apply that and OK it. And now we can see that the stats have been joined to the polygons. Because this data is in PostGIS, you could also load your CSV directly into PostGIS and just do the join in Postgres itself. But some people find it easier to use the GUI in QGIS to do this. There are just different ways of doing the same thing. So that's great. We've joined our stats to our boundaries. And now we maybe want to look at doing some actual visualization.
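The join that QGIS performs here can be sketched outside the GUI as a simple keyed lookup. In the example below the identifier column name `geo_code`, the area codes and every number are made up for illustration; the point is only that each boundary record picks up the statistics row that shares its geographic identifier, and a boundary with no matching row is left without stats.

```python
import csv
import io

# Made-up boundary attributes; in practice these come from the boundary layer.
boundaries = [
    {"geo_code": "A01", "name": "Area one"},
    {"geo_code": "A02", "name": "Area two"},
    {"geo_code": "A03", "name": "Area three"},
]

# Made-up census statistics in CSV form, with no geometry column.
STATS_CSV = """geo_code,name,pop11,owned,rented_la
A01,Area one,1000,600,250
A02,Area two,2000,900,700
"""


def join_stats(features, stats_file, key="geo_code"):
    """Attach each stats row to the feature that shares its identifier."""
    stats = {row[key]: row for row in csv.DictReader(stats_file)}
    joined = []
    for feature in features:
        match = stats.get(feature[key], {})
        # Only add columns the feature does not already have.
        extra = {col: val for col, val in match.items() if col not in feature}
        joined.append({**feature, **extra})
    return joined


joined = join_stats(boundaries, io.StringIO(STATS_CSV))
for record in joined:
    print(record)
```

Values read with the `csv` module arrive as strings, so they would need converting to numbers before classification or mapping.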
You could do the visualization non-geographically: you could create bar charts, or pie charts, or pictograms. But because we've got the data geographically, we want to do some sort of spatial mapping, such as a choropleth map. Here's an example for London. I think this shows the percentage of people working more than 49 hours per week in London, from the UK 2011 census. You can see that the highest percentage is in the City of London. That's all the bankers working long hours in central London, whereas as you get towards the suburbs, people are working fewer hours.

But before we do this, we're going to cover some mapping fundamentals. The first thing to look at is how we classify data. We have a range of data values, and classification is simply simplifying that range of data into categories in order to allow us to recognize patterns in that data. So here we have a bunch of data from 3 to 93, and we just want to simplify that data into five classes. So I have a class from 3 to 20, then 21 to 38, 39 to 56, 57 to 74, and 75 plus. We've simplified the range of data into five classes. And obviously, by choosing a different number of classes and a different way of allocating the data to those classes, we'll produce different sorts of maps.

The other thing is styling data. Again, here are our five categories, and we just choose the colors to apply to those categories in order to map them. And again, we have all kinds of choice here. We can have a linear ramp going from light to dark, so small data values have a light value and large ones have a darker value. There are various choices here. This ColorBrewer tool is really nice for trying to pick the optimal color style to apply to your data. I think QGIS has plugins that will allow you to use ColorBrewer, so that's something to look for.
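The five classes on the slide (3 to 20, 21 to 38, 39 to 56, 57 to 74, 75 plus) are what an equal interval classification of the range 3 to 93 gives. The following is a minimal sketch of that classification together with a simple light-to-dark grey ramp. The function names and the particular grey levels are illustrative choices, not what QGIS or ColorBrewer use internally.

```python
def equal_interval_breaks(minimum, maximum, classes):
    """Return the class boundaries for an equal interval classification."""
    width = (maximum - minimum) / classes
    return [minimum + i * width for i in range(classes + 1)]


def classify(value, breaks):
    """Return the zero-based class index for a value."""
    for i in range(1, len(breaks) - 1):
        if value < breaks[i]:
            return i - 1
    return len(breaks) - 2  # everything else falls in the top class


def grey_ramp(classes):
    """Light-to-dark grey hex colors, one per class (low values are light)."""
    shades = []
    for i in range(classes):
        level = round(230 - i * (230 - 40) / (classes - 1))
        shades.append(f"#{level:02x}{level:02x}{level:02x}")
    return shades


breaks = equal_interval_breaks(3, 93, 5)
colors = grey_ramp(5)
for value in (3, 20, 21, 74, 75, 93):
    index = classify(value, breaks)
    print(value, "-> class", index + 1, colors[index])
```

Swapping `equal_interval_breaks` for a quantile or natural breaks function would change which areas share a color, which is why the choice of classification method changes the resulting map.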
So what we're going to do is create a choropleth map using QGIS. Again, a choropleth map shades areas according to the value of a variable. From QGIS, you just use the properties of the layer. We go to Style, and we want the Graduated option. Then we pick the variable we want to map. Let's use diff, which I think is the difference in mortgages between 2001 and 2011. I just classify that, and I just apply it. And there we have our choropleth map.

This being QGIS, a desktop mapping application, you can actually create a hard copy print of this. There's a print composer in QGIS. It basically allows you to lay out your final maps, which you might want to include in a report or something. So let's just change to portrait and add a new map. You can see it dumps the view from our main QGIS window into this layout, and you can play around with this a bit. You can do things like re-center that. And we can add some map essentials like a key, a north arrow, a scale bar, et cetera. We just drag this on, so there's a scale bar. We also need a key so that we can tell what the various colors mean: the color legend here. Give the map a title. We just drag on a text box and tweak the font.

And because our census data is under the Open Government Licence, we have to write an attribution statement that indicates where the data came from. When you download data from the UK Data Service, it will always come with a terms and conditions document, and that will tell you the attribution statement you have to use. So here it is. From the Boundary Data Selector, it's called terms and conditions.html. I'll just open that with Firefox. You can see you've got this attribution statement here, and I'll just copy it into QGIS. So let's add some more text. That's another text box. Let's increase the size of that so it's a bit more legible.
You can stick that at the bottom. So it's not massively exciting, but we can just export that as a PDF or something. So let's see. It's created something. Great. That's our map, and we can dump it into a report we're writing or something.

So choropleth maps are one way of visualizing data. Another way of visualizing census data is a thing called a cartogram. With the cartogram, instead of just shading the areas by the variable, we actually manipulate the areas themselves according to that variable. There are two types of cartogram: a distance cartogram and an area cartogram. A classic example of a distance cartogram is the Tube map, in which the distance between the stations is not the true distance. It's more like the time between those stations, so stations that look relatively close to each other on the map may actually be much further apart. It doesn't represent true geographic distance. With an area cartogram, we simply distort the actual geometries according to the variable being shown. So here the big circles are large populations and the small ones are small populations. But we still maintain the relationships between the areas. We still see that Texas is next to Louisiana, and Washington, Oregon and California still sit beside each other.

Cartograms are quite heavily used for socio-economic data. They featured quite heavily in the media publication of data related to the last EU referendum. The Guardian created these cartograms showing the result. And they are widely used for census data as well. There's a great book called People and Places by Danny Dorling and Bethan Thomas, which uses cartograms extensively to show census stats for the 2011 census. It's really quite a different way of viewing the data. Fortunately, QGIS comes with a plugin which will allow you to create these sorts of cartograms for the census data.
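The idea of resizing areas according to a variable can be sketched with a much simpler technique than the one a cartogram plugin is likely to use. The example below makes a non-contiguous area cartogram: each polygon is scaled about the mean of its own vertices so that its area becomes proportional to its value. It does not keep neighbouring areas joined together, which a contiguous cartogram does, and the two squares and their values are made up.

```python
import math


def ring_area(ring):
    """Unsigned area of a simple polygon given as unclosed (x, y) vertices."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def scale_ring(ring, factor):
    """Scale a polygon about the mean of its vertices."""
    cx = sum(x for x, _ in ring) / len(ring)
    cy = sum(y for _, y in ring) / len(ring)
    return [(cx + (x - cx) * factor, cy + (y - cy) * factor) for x, y in ring]


def resize_by_value(areas):
    """areas maps a name to (ring, value); returns name -> resized ring."""
    total_area = sum(ring_area(ring) for ring, _ in areas.values())
    total_value = sum(value for _, value in areas.values())
    resized = {}
    for name, (ring, value) in areas.items():
        target = total_area * value / total_value
        # Scaling lengths by f scales area by f squared.
        resized[name] = scale_ring(ring, math.sqrt(target / ring_area(ring)))
    return resized


# Two made-up unit squares: B has three times the value of A.
areas = {
    "A": ([(0, 0), (1, 0), (1, 1), (0, 1)], 1),
    "B": ([(2, 0), (3, 0), (3, 1), (2, 1)], 3),
}
resized = resize_by_value(areas)
print({name: round(ring_area(ring), 3) for name, ring in resized.items()})
```

After resizing, the total area is unchanged but B covers three times the area of A, which is the property that makes densely populated small areas visible.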
So we're just going to look at doing that now, because that's a different way of viewing the data than a choropleth map. We're basically going to create this sort of thing on the right here using our QGIS plugin, and then you can compare it with your choropleth map. You can see the problem with the choropleth map is that the areas of large population, like Edinburgh and Glasgow, are swamped by the geographically bigger areas. With the cartogram, they will grow, and it's a lot easier to see where the major areas are, so these white bits here. It does distort the geography quite a lot, but it still maintains relationships between neighboring areas. So it's still quite a nice way of viewing the data.

So we'll just look at how we do that in QGIS. I'm going to use the same dataset, and what I'm going to do is save it to PostGIS so I can use it in this plugin. To do that, there is a feature in QGIS called the DB Manager, and that will allow us to save our joined data to PostGIS. I can do this. I have to tell it what data to save. I'm going to call this Scotland with stats. And what I can do is just get rid of these, because I'm done with them. I'll just re-add that PostGIS data. It's this new one here. I've not lost any data in it. You can see we've got the stats we joined before.

This is quite a powerful feature of QGIS: as well as the standard functionality that you get when you download it, there's also a range of additional functionality. They call them plugins, which are additional community-developed features that individual people have created to extend the functionality of QGIS. So it makes it really quite powerful. One of the ones we're going to use is a cartogram one. To install plugins, you've got this plugin manager, and it tells you all the plugins you've installed. You can see I've already installed this cartogram one. It has to connect to some external website, so sometimes it can be quite slow.
It's thinking about it. If you click on the links, it'll tell you various things about the cartogram plugin. So this is the sort of thing we can create using this plugin. That's great. I've already installed it, so to access it I go to the Vector menu. When you install a plugin, it adds different menus. So I'm going to create a cartogram, and I'm going to use the same variable as before, which I think is the diff one. Then it will sit there and create our cartogram, and it may finish at some point. Let's have a look. Yep, that's finished. So here's our cartogram. Again, we can style it up and see what's going on. So there's our styled cartogram, and it's just an alternative way of viewing the data. Some people don't like cartograms. I think they're a nice way of viewing the data.

In the final bit of the demonstration, we're going to look at creating flow maps. As I said at the beginning of the presentation, the UK Data Service Census Support application WICID will allow you to download flow data, and that's commuting and migration. We're just going to use that. We've got another QGIS plugin that allows us to create flow maps from this data. So here's a flow map. You can see that this is showing the flows from Leeds in terms of where people work. They'll be living in Leeds and working in these other local authorities, and you can see that most people who live in Leeds and work outside it will migrate to Bradford or Wakefield to work, and fewer will migrate to York. These are quite a nice way of just exploring the flow data.

I'm going to use some different census data for this. I'll go back to QGIS and get rid of these two datasets. And because we're running quite short of time, I'm just going to add these as a shapefile and a CSV file. So back to my desktop. These are the boundaries. These are English and Welsh Census Merged local authorities.
And I'm just going to add a CSV file of the actual flow data. If I open up the flow data, you can see we've got an origin and a destination for each flow. So that just tells us that, for these 16,000 people, in terms of where they work and where they live, they live in this local authority and they work in this local authority. And all we want to do is try to visualize this dataset by creating a flow map.

So again, I'm going to use a plugin to do that. I'm just going to open the plugins again, and QGIS is thinking about it again. I'm using this plugin called Oursins, which I think is French for sea urchins or something. If you look at the home page, it gives you the developer documentation. It's all in French, but you can basically tell what it's doing. It's creating these flow maps, and there's an origin and destination for each flow. So that's the sort of thing we want to create. Again, I've already got it installed, so I can just go to flow maps, and I just have to tell it the origin and destination of the data I want to map. If I do this now, you can see it's created some flows for Leeds.

Because I don't have flow data for the entire country, I just want to isolate the local authorities that we do have data for. To do that, I'm going to do some spatial analysis and simply select all the polygons which intersect with the flow lines. So the spatial query tool, that's the one we want. We just want to get QGIS to select all of the local authority districts which intersect with the flow lines and create a new selection. You can see that it's done that. I'm just going to create a new layer. That didn't work. I'll just try that again. That's worked this time. So there we have our smaller selection of local authorities, and again I can tweak the properties of this data to make things nicer. So I can do things like show the labels of the features.
So that's great, that's our flow map. The majority of people in Leeds who don't both live and work in Leeds are migrating for work to Bradford and then to Wakefield, and far fewer are migrating for work to Doncaster, York or Harrogate. So that's just one way that you can create a visualization in QGIS of the flow data.
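What a flow map plugin does with the origin and destination columns can be sketched in a few lines: each origin and destination pair becomes a straight line between the two area centroids, carrying the flow count so the line can be styled by size. This is not the plugin's own code, and the area codes, coordinates and counts below are made up for illustration.

```python
def flow_lines(flows, centroids):
    """Build one WKT line per origin and destination pair, largest flow first.

    flows is a list of (origin, destination, count) tuples and centroids
    maps an area code to an (x, y) point. Flows that stay within one area,
    or that reference an unknown area, are skipped.
    """
    lines = []
    for origin, destination, count in flows:
        if origin == destination:
            continue
        if origin not in centroids or destination not in centroids:
            continue
        (x0, y0), (x1, y1) = centroids[origin], centroids[destination]
        lines.append(
            {
                "origin": origin,
                "destination": destination,
                "count": count,
                "wkt": f"LINESTRING({x0} {y0}, {x1} {y1})",
            }
        )
    return sorted(lines, key=lambda line: line["count"], reverse=True)


# Made-up centroids and flows between three hypothetical areas.
centroids = {"A": (0, 0), "B": (10, 0), "C": (0, 5)}
flows = [("A", "B", 500), ("A", "C", 1200), ("A", "A", 9000)]

lines = flow_lines(flows, centroids)
for line in lines:
    print(line["origin"], "->", line["destination"], line["count"], line["wkt"])
```

Dropping the within-area flow mirrors the map shown in the demonstration, which only draws people who live in one authority and work in another.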