Good afternoon, everybody. I hope you're all well. Prior to Nigel's piece on spatial ecology and COVID-19, which uses census data, I'm going to give you a very quick whistle-stop tour of UK census data. I'm the head of the aggregate data unit within the UK Data Service and I've been involved with census data since 1995, so quite a long time. I did a placement year from university, the University of Manchester, in the computing centre, and then I'm glad to say that they took me back on and I've been working on census data ever since. So if we could have the next slide, please.
So we're going to start with a quick quiz. In 2011, how many people in England and Wales identified their religion as Jedi? And if we get the next slide, we should have a way of recording this. So if you go to www.menti.com and use the code 4-8-6-8-4-6-0-0, we should get some answers. Okay, so we'll have a look at how you've got on with that. So census data is classic aggregate data. It's counts of things, people, aggregated to a geography. Amazingly, the majority of people got that right. It is 176,632, and all of these people live amongst us. Next slide, please.
So the question on Jedi is probably not particularly useful to us, unless you're planning on founding a Jedi church. But what if the diagram above was your local area and the numbers represented households, where you can identify single-person households, people living alone? And then if you can add in the characteristic that the person living alone is over the age of 60, using that information we can build a picture of where in the country we could target resources for looking after vulnerable people who might be shielding from COVID-19. Next slide, please.
So the census of population in the UK is held every 10 years. The first modern census in the UK was in 1841, and the data that the census agencies hold is kept completely anonymous and secure for 100 years, and they are extremely rigorous in that security.
We have digitized census data from 1971. The slide says 1961 is ongoing: that data is almost ready for release, and we will have it as soon as it's available. Is the 2021 census the last ever census? They say it's going to be, but then they said that the 2011 census was going to be the last one. So we don't know. I personally think that there will be a 2031 census. Next slide, please.
We talk about a UK census, but there is no such thing as a UK census. We have different nations and different governments, and each of those governments drives the ideas for its census. In England and Wales we have the ONS. In Scotland we have NRS, and in Northern Ireland we have NISRA. They all have different but similar geographies. They ask different questions at times. They look for different outputs from the census. However, there is a lot of harmonization that goes on. The census agencies talk to each other and they do plan to have harmonization across most subjects. Next slide, please.
So the 2021 census for England and Wales was held on the 21st of March. They've had a 97% response rate for that census, which is unprecedented. I guess it shows that the online website they built for recording it was actually very good. They had a completion time for each census return of just 23 minutes. And what we've got going on now, several months after the census, is a census coverage survey. So they're looking to see how well it was covered, what sorts of people responded and who didn't. We have also got a census quality survey going on, so they will randomly go and talk to people to see how they felt about filling in the census, and also so they can cross-reference and see whether what people wrote actually tallies with what the questioner can find out. And then the ONS has just closed its output consultation. Next slide, please.
So a very similar thing is going on in Northern Ireland. I think statistical disclosure control.
They are still working on the best methods for ensuring that the data, when it is released, is not disclosive, so that you can't identify individuals from it. And the next one, please.
Scotland. Scotland haven't yet had their census; that'll be taking place on the 1st of March. They suspended their census due to COVID. They have had a very successful rehearsal, including a rehearsal of the data processing. And there's a lot of work going on to ensure that people are not counted twice or not counted at all. So if they were in England and they moved to Scotland, they don't want them to be on both censuses. If they were in Scotland last year, or yesterday, and they move to England tomorrow, they won't be on any census. So they want to make sure that people are counted somewhere.
The next steps: lots of Cs, sorry, lots of Cs. So they're going to compile. 3% of the census wasn't completed online. That's a very small proportion, but it's still a lot of paper. So they're going to compile all the paper census returns. They're going to clean the data. They will, for instance, remove duplicate returns. It does happen that someone in the house completes a paper form and sends it in, and then someone else does it online. They're going to complete the census. They want to have 100% coverage, so they're going to fill in the gaps where there are missed, invalid or inconsistent responses, and estimate those who haven't responded using statistical methods. They're going to cross-validate, so they'll be using alternative administrative data to QA the responses. Consultation: they've consulted with local authorities, charities, community groups, commerce and ourselves on how we'd like the outputs to appear. And there's been a lot of work going on on confidentiality, so they're working tirelessly to protect anonymity. Next slide, please. Oh, sorry, no, we'll stay on that slide, sorry.
So the outputs themselves should start to appear a year after census date. We haven't got a definite date for that. The first sets of figures will be headline figures. After that, we will get the data in tranches, with the simplest data first and the more complex, multivariate data towards the end. They're looking at a year to two years to get all the data out. The data will appear in ready-made tables, which are the traditional census tables; in tables that flex, so tables that you can tweak slightly; and there will also be the ability to build your own tables. So you'll be able to specify the variables you want. It'll then analyze the census database, make sure that none of the data you want is disclosive and, if it is, blur that data, and then it'll output that to you. The outputs are also going to be linked to administrative data to fill in gaps or add extra value to the census outputs. Next slide, please.
So what can census data tell us? All sorts. I'm not going to go through that list, but it's massive amounts of information. It's gold-standard social survey data. It covers 100% of the population. No other social survey comes even close to that. It's for a single point in time, so it starts out extremely useful, but after 10 years you've got to start to think about how populations and lives have changed. Those categories of census data can be broken down into subclasses, and they can be mixed with other topics. We have a number of different census products. So we have the aggregate data. We've got geographical boundary data so you can make maps. We've got flow data so we can look at how people travel to work and where they travel to work. We can look at how people are migrating within the UK, and from the UK to abroad and vice versa. We've got microdata as well, so we can start to look at individuals, anonymized individuals. Then we've got some derived data as well.
So we've got some deprivation data, for instance, and spatio-temporal data. So in the past we've had things like POP247, which looked at daytime and nighttime populations. Next slide, please.
So in 2021 the census changed. It changes every time we do it, but in 2021 we've got new questions on gender. Northern Ireland didn't ask the gender question. This is a voluntary question about how you identify your gender, as opposed to the sex that was registered at birth. Next slide, please.
We had a question on sexuality in all the nations. Again, a voluntary question. That had four categories: straight or heterosexual, gay or lesbian, bisexual, or other sexual orientation, which is a write-in. Next slide, please.
In England, Wales and Scotland we had a question on veteran status. So this is people who were in the armed forces but are no longer there. There are separate questions on armed forces personnel. This wasn't asked in Northern Ireland because it's still a sensitive topic. Next slide, please.
We've got a new question on health conditions in Scotland and in Northern Ireland. In England and Wales they just asked, do you have a long-term limiting illness, and left it at that. In Northern Ireland and Scotland, they're asking what that illness is. For Northern Ireland it's a tick list; in Scotland it's a write-in. So that might be very telling with long COVID, for instance. We look forward to seeing the results from that. Next slide, please.
What the census can't tell us: there's no information on wealth or income. So we derive our deprivation data using information like overcrowding, access to cars and vans, and socioeconomic grouping. There's no personal identification. They use data blurring and obfuscation to make sure that, hopefully, individuals cannot be identified. Well, as far as I know, no individual has ever been identified from census samples.
So in theory, if you're a hit man and your target is of Chinese origin, living in Belfast but following the Sikh religion, it might be possible that they are the only such person in their particular output area. So what they will do is add a random number to that count of one: one, two or three. And they might also swap the records with those of another, similar type of output area. Next slide, please.
Census geography in this country is a little bit of a mess. In fact, geography in this country is a little bit of a mess. Because we have four different nations and four different governments, we have four different sets of geographies. They all share the same building block, which is the output area, which in England and Wales is about 125 households. However, in Scotland it's about 50 households. So there is some difference there. The output areas are built after the census has been taken, because they are designed to be socioeconomically homogeneous. They're designed to have similar population sizes. They're designed not to spill over obvious geographical impediments, so motorways, for instance, rivers, etc. They are aligned to local authority boundaries, so they don't spill over into different local authority areas. From the output area, we create a set of geographies called super output areas: lower and middle super output areas in England and Wales, data zones and intermediate geographies in Scotland, and in Northern Ireland they just have lower super output areas. These geographies are supposed to change little over the years, so that you can use them for comparison with older censuses.
None of those geographies particularly relate to anything real. No one gives their output area as their address. No one sends their local taxes to a data zone. So on top of those we also have real geographies of regions, counties, local authorities, wards and electoral divisions, which have different names in all of the different nations.
So there are no regions, counties and local authorities in Scotland. They have council areas. There is no postcode geography in the census outputs. There is an interface that we've created for matching and converting between geographies, and that is called GeoConvert; the address is there on the slide. I think we might talk a bit about that later. Okay, next slide, please.
So the boundary data, the data that we can use to create maps: we have all that at a place called UK Borders at Edinburgh, which is part of the UK Data Service. We've got data all the way back to 1981 in mapping formats: shapefile, KML, CSV, all the most used mapping formats. One of our older interfaces, Casweb, actually has 2001 and 1991 data and boundaries bundled together. So if you get data out of there, you can have it in a mapping format all together, and you don't have to do the work of joining your mapping data together. Next slide, please.
We have a number of different interfaces to the census data, which has come about because we've been given money at different times to create new interfaces for new censuses. There's InFuse for data from 2001 and 2011, and that includes a combined UK data set. There's Casweb for older census data. And there's DKAN, which is currently only 2011, but we are loading 2001 into it at the moment. That's for bulk downloads of whole tables for whole nations, geographies, or whatever you want. That's for if you're doing bulk analysis. Next slide, please.
Deprivation data. There aren't any questions on income or wealth, so we use data on room occupancy, home ownership, tenancy, car availability, employment status, etc. to create a number of indices: Carstairs, Townsend and the IoD. They all use slightly different recipes for creating these indices. We have all that within GeoConvert, so you can add deprivation data to your census data if you wish, or you can add it to postcode geography as well. I think that's all I've got to say about deprivation data.
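
To make the idea of a deprivation "recipe" concrete, here is a minimal sketch in the style of the Townsend index. The talk does not give the recipes, so everything specific here is an assumption: the four indicators, the equal weighting, the log transform on two of them, and the made-up area codes and percentages. It is an illustration of the general approach, not the method GeoConvert uses.

```python
import math
from statistics import mean, pstdev

# Hypothetical area-level percentages; codes and values are invented.
areas = {
    "A": {"unemployed": 4.0, "no_car": 20.0, "not_owner": 30.0, "overcrowded": 2.0},
    "B": {"unemployed": 12.0, "no_car": 55.0, "not_owner": 70.0, "overcrowded": 9.0},
    "C": {"unemployed": 7.0, "no_car": 35.0, "not_owner": 45.0, "overcrowded": 4.0},
}

INDICATORS = ["unemployed", "no_car", "not_owner", "overcrowded"]
LOGGED = {"unemployed", "overcrowded"}  # assumption: skewed indicators are logged


def z_scores(values):
    """Standardise a list of values to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def deprivation_scores(areas):
    """Sum the z-scores of each indicator; higher means more deprived."""
    codes = sorted(areas)
    totals = {code: 0.0 for code in codes}
    for indicator in INDICATORS:
        raw = [areas[code][indicator] for code in codes]
        if indicator in LOGGED:
            raw = [math.log(v + 1) for v in raw]
        for code, z in zip(codes, z_scores(raw)):
            totals[code] += z
    return totals


scores = deprivation_scores(areas)
```

Because each indicator is standardised before summing, the scores are only meaningful relative to the set of areas they were calculated over, which is one reason different indices give different rankings.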
If you want to match geographies: it tends to be that if you're taking social survey data or you're giving questionnaires to people, you're asking for postcodes. If you then want to match postcodes to census geographies, we have a tool called GeoConvert, which uses Royal Mail address points within geographies to calculate an area's population and then apportion those up to larger geographies. You have to think about what you're attempting to do. If you're trying to aggregate data up to very high levels, there can be very large errors. Also, postcodes change a lot, and we do have information in there about when postcodes start and when postcodes terminate. The Royal Mail re-uses postcodes, so you just have to be careful when you're matching to have a look and think, oh, there's an Aberdeen postcode here, but it's coming out in a Portsmouth census geography. We've got lots of documentation on GeoConvert about how you can use it. Next slide, please.
I think I've already mentioned DKAN, the bulk data. We've created it around census themes, it's very easy to search, and it has lots of metadata in there as well, and as we say, it's expanding all the time. I think, yeah, that's it, if we have any questions.
So, what we're going to move on to is the ecological analysis. In starting to talk about this, I'm aware that people may be at very different stages in terms of understanding what underpins these methods. I'm going to go through some things which some of you may find easy and some of you may find quite complex, so I'm happy to answer questions as we go along, specifically about the contents of this, but there'll be an opportunity at the end to wrap up as well. So, if I don't get to your question during the session, we'll cover it at the end of explaining this, before we get into the practical work.
So, first of all, spatial data is a particular form of data, and when you look at the files you've got, there are a number of different elements to those files. So, I'm just going to talk about the key things. The first is the projection, the second is shape, and then the kind of data we attach to places.
So, first of all, the challenge we have is that we're taking something round, the earth, and translating it into something flat. And what that means is, depending on where you look at the earth from, you get very different pictures. So, we tend to look at maps of Norway from somewhere above the equator, so Norway looks quite small, but if we moved our point of view round to above Norway, it would become much larger. So, the projection system is where we derive our view from, and the spatial data we're using derives that view from something generated by ONS, which allows us to see the UK more or less from above when we map it.
The second thing that's there is our shapefiles, which are the way that we create boundaries on a map. In effect, it's a series of dots, and we go through joining up those dots to create spatial areas.
And then the final thing we're going to cover is data types. An areal unit is a polygon, a shape, and they're given codes and names within the data. So, when you download census aggregate data, you will have codes that attach it to a particular geometry. In this example, we're looking at lower layer super output areas. We've got some data attached to them. And when we create a shapefile, we can attach to that the values associated with each areal unit. The second type of data is point data, and this is used to generate points on a map. The two examples there are actually tram stops in north Manchester. So, we can create an absolute place where those are and locate them on maps as well. So, those are the two types of data we use.
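
As a rough illustration of the last two ideas, boundaries as joined-up dots and values attached to areal units by code, here is a minimal sketch in plain Python. The area codes, names, coordinates and the `pct_private_rented` field are all hypothetical; real boundary files carry official codes and projected coordinates, and you would normally let GIS software do this join.

```python
# Hypothetical boundary records: a code, a name, and a closed ring of
# (x, y) points that are joined up in order to draw the polygon.
boundaries = [
    {"code": "X001", "name": "Area one",
     "ring": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]},
    {"code": "X002", "name": "Area two",
     "ring": [(1, 0), (3, 0), (3, 1), (1, 1), (1, 0)]},
    {"code": "X003", "name": "Area three",
     "ring": [(0, 1), (3, 1), (3, 2), (0, 2), (0, 1)]},
]

# Hypothetical downloaded values, keyed by the same area codes.
values = {"X001": 12.5, "X002": 30.1}


def ring_area(ring):
    """Area enclosed by a closed ring of points (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def attach_values(boundaries, values, field):
    """Copy each boundary record and add the value matching its code.

    Areas with no matching value get None, so gaps stay visible.
    """
    joined = []
    for record in boundaries:
        out = dict(record)
        out[field] = values.get(record["code"])
        joined.append(out)
    return joined


joined = attach_values(boundaries, values, "pct_private_rented")
areas = {record["code"]: ring_area(record["ring"]) for record in boundaries}
```

Keeping unmatched areas as `None` rather than dropping them makes it easier to spot a mismatch between the codes in the data and the codes in the boundary file.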
If you are mapping individuals, you could similarly put that data onto point data.
Dave talked a little bit about output area geography, which we're going to focus on. In the example here, I just want to show the differences between the levels of output area geography. In the map at the top, we can see Greater Manchester, and this is showing the proportion of households in the private rented sector. We can see a proportion of around 30% in Manchester, which is the long, thin local authority area. When we move down to the next layer of geography, which is the middle layer super output area, there are 57 areas and we can begin to see patterns of concentration. The darker part in the lower part of the diagram is around the university, where there's lots of student accommodation and student halls. And the part further up is near the city centre, where, again, there's quite high concentration, quite high density. As we move down to lower layer super output areas, we can see this becoming more granular, and the values now go between 5% and 90%. And finally, in output areas, we can see there are 1,530 and you can get some real levels of detail. So, we've got some areas with nobody living in the private rented sector, up to 97%. So, that's something about the geographies available, and using these techniques you'd have to choose your geography depending on what you are linking to.
So, in terms of COVID data, COVID case rates are published by Public Health England. As well as the daily ones which you may see in the news, there's a consolidated publication of all the positive results from the last week, released each week. And that's held at middle layer super output area. So, that's the data we're going to be using, because we're constrained by the level at which COVID cases are published.
Vaccination data isn't available at that level, so I couldn't include that within the data source. And we're using Greater Manchester. You could cover this across England, across a region, et cetera, but Greater Manchester seems like a reasonable size to give an example of how to use this data.
One of the things I've provided is two sets of shapefiles, and I wanted to show you the difference between them. On the left, we have a physical map which is showing the COVID cases per 100,000 population at MSOA level in the first week of October 2021. As you can see, that gives you a picture which tells you something about where COVID cases are. But if we look on the right, the boundaries have been resized to reflect the population density. So, those where the population density is higher have become much larger. You can see that Manchester has grown significantly in that map and some of the outer areas have shrunk quite a lot. So, if you pick out parts of that map, you'll be able to see both of those. Now, they both have their purposes, so I've given you both options. If you're writing for a policy audience or a more populist audience, then the physical geography probably works best. But if you're trying to show the extent, I suppose, of whatever variable you're mapping, then the cartogram has some definite advantages.
So, I'm going to move on now from those mapping basics into some of the principles of spatial analysis. I'm going to go through these in a bit of detail. There's more on the slides, and I suspect this is the part where some people might want to stop and ask questions. But these principles are what you'll be applying in the workshops.
So, the first thing is what's called the modifiable areal unit problem. What this means is that, depending on where you draw boundaries, you will get different values. So, in the example here, you can see we've got a square with nine areal units.
To the right of that, we've shifted that over to three units. Below that, we've got a different set of boundaries, and the final option is another set of boundaries. And if we look at what that creates, the red dots are showing where there are what look like errors between the spaces. So, depending on the boundaries, you're seeing quite different results. I suppose this randomness is one of the things that output areas were designed to try and overcome, by grouping areas of similar characteristics. But one note of caution here is that, though that's true of the output area, it's not true of the things that are built up from them, because those are built contiguously. So, we may well introduce error by going up from output areas to lower layer and middle layer super output areas. So, that's what's called the modifiable areal unit problem. It's something you might need to consider when looking at any analysis that uses spatial techniques.
The next thing is spatial dependence. So, if you think about where you live... sorry, I'll just answer Paul's question here, which is asking whether it is correct to assume that any physical representation of a place is at least slightly distorted. Yes, that's true, just because of the perspective that's taken on it. We can get close to representations we think are okay, but the representation we use for Greater Manchester is not taken from above Greater Manchester. It's a UK-wide one.
So, spatial dependence is where there's a relationship between things. I suppose if you think about the towns or cities or villages or countryside you live in, you would probably think of certain places as having characteristics that are similar. So, if you know Manchester, you would know that there are different parts of Manchester, and some are regarded as affluent, not many of them. Some are regarded as places where there are lots of students. Some are regarded as places where there are quite high rates of poverty.
What that suggests is that within those areas there is a kind of clustering effect. So, that's what spatial dependence is. If places around you are like the place you're in, then there is a level of spatial dependence. The second concept is heterogeneity. This is something I think that geographers and people doing spatial analysis in London find quite a lot: what you're seeing is very different areas very close together. So, it's an issue that will affect any calculations you do.
So, just to give you an example of spatial dependence, on the left-hand side we have again a set of areal units, with rates of burglary. The lightest shading is low rates, the medium shading is medium rates, and the darker shading is high rates. If we look at income per household, we might expect there to be some correlation. So, if you look at number six, there are high rates of burglary and income per household is high. So, why isn't that happening in number 16? One explanation might be that the higher rates of burglary surround a cluster of low rates of income. That's not definitive, but it's something that you might wish to explore when looking at that breakdown.
Whereas on this pattern, which is fertility levels and female labour force participation, there doesn't really seem to be a correlation at all. There are some patterns there, but there's enough going on to suggest otherwise: take one and eight, for example, where high rates of fertility correspond to low rates of female labour force participation in one and high rates in eight. Now, this might be because we haven't included something that would help us to explain female labour force participation, or it might just be a factor that isn't affected by where you are.
So, moving on from there, what we try and detect is whether there's autocorrelation.
So, whether there's clustering of high or low values, or negative clustering of neighbours with high and low values, like a chessboard pattern. And the effect of that autocorrelation in the error term in regression is to make estimates of the t-test values unreliable. Positive spatial autocorrelation will inflate the value of the R-squared statistic and negative spatial autocorrelation will deflate it. So, those are the kinds of problems that come from a non-random distribution caused by spatial autocorrelation.
So, in order to work out whether there is spatial autocorrelation, we need to generate some kind of value for an areal unit and its neighbours. And there are different ways of doing this. One is based on areas that are next to each other, so contiguity. One is based on distance. So, you take the centroids of the polygons, the shapes, and calculate a distance. If you say within 10 miles of this shape, then all of the units within that distance will be regarded as neighbours. And there's a set of procedures that work through and detect which areal unit should have which particular neighbours. Or you can limit the number of nearest neighbours. Now, just to say here, I've used this technique for a number of years. I tend not to use distance because it does cause issues with more densely populated areas. So, I tend to use contiguity, and for the workshop we will use a contiguity measure.
So, looking at those, there are two commonly used ones. On the left we can see a first-order rook. This is based on chess: a rook can move up and down and sideways. So, what you do is take the areal units above, below, to the left and to the right. And the second is the first-order queen, which takes all touching units. So, it takes all of those that are adjacent to the unit you're looking at. You could also limit the number of neighbours.
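
The rook and queen rules are easiest to see on a regular grid. The sketch below is only an illustration under that assumption: real areal units are irregular polygons, and software such as GeoDa works out contiguity from shared boundaries rather than grid positions. The function name and the `(row, column)` labelling are made up for the example.

```python
def grid_neighbours(rows, cols, rule="queen"):
    """First-order neighbours for every cell of a rows x cols grid.

    rule="rook"  keeps cells sharing an edge (up, down, left, right).
    rule="queen" keeps every touching cell, including the diagonals.
    """
    if rule not in ("rook", "queen"):
        raise ValueError("rule must be 'rook' or 'queen'")
    result = {}
    for r in range(rows):
        for c in range(cols):
            found = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbour
                    if rule == "rook" and dr != 0 and dc != 0:
                        continue  # skip diagonals under the rook rule
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        found.append((rr, cc))
            result[(r, c)] = found
    return result


rook = grid_neighbours(3, 3, rule="rook")
queen = grid_neighbours(3, 3, rule="queen")
```

On a 3 by 3 grid the centre cell has four rook neighbours and eight queen neighbours, while a corner cell has two and three, which shows how units on the edge of a study area end up with fewer neighbours under either rule.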
So, it will cut them down by some calculation. So, that's the first order. If you were looking at large geographies with lots of units, you can also go up to second order, which would move these out by another area. So, on the rook side it would go up one more, left one more, right one more and down one more. And with the queen, it would move out another block.
So, having generated that, we can then look at spatial autocorrelation. I've put the formula there; I'm not going to go through it. But Moran's I was developed for this, and what it returns is a value between 1 and minus 1. It's a diagnostic that tells you whether there is spatial autocorrelation. So, 1 is complete positive autocorrelation and minus 1 is complete negative autocorrelation. A value of 0 or close to 0 would imply there's no autocorrelation. And what we get is a plot. In effect, what this plot is doing is taking the value for the areal unit and the average value for the weighted areas. In this example, this is a first-order queen contiguity weighting. So, it's taking the mean of the variable across those neighbouring units and then plotting the two against each other. What this is suggesting is that similar places are grouped together. So, the direction is positive, suggesting positive spatial autocorrelation.
The next thing we can do is look at how this is happening in each areal unit. So, again, there's a formula, for the local indicators of spatial association, which are used to generate LISA maps. So, let's have a look at them. What this is showing on the left-hand side is a map of areas. I haven't put the variable on here, which is a bit poor on my part. This is a physical map of Greater Manchester, and it's showing the types of areas that are grouped together. So, those areas that are red are high values of this variable, surrounded by high values in the neighbouring areas.
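
For anyone who wants to see the global statistic worked through, here is a minimal sketch of Moran's I on a small grid. It makes several assumptions that are not from the talk: a regular 4 by 4 grid, rook contiguity (chosen so that a chessboard pattern is a clean test case), row-standardised weights, and invented values. GeoDa does the equivalent calculation for you with whichever weights you create.

```python
def rook_neighbours(rows, cols):
    """Rook neighbours (shared edge) for each cell of a regular grid."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [(a, b) for a, b in candidates
                            if 0 <= a < rows and 0 <= b < cols]
    return nbrs


def morans_i(values, nbrs):
    """Global Moran's I with row-standardised weights.

    I = (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2
    where m is the mean and S0 is the sum of all weights.
    """
    n = len(values)
    m = sum(values.values()) / n
    dev = {k: v - m for k, v in values.items()}
    cross = 0.0
    s0 = 0.0
    for i, js in nbrs.items():
        if not js:
            continue  # a unit with no neighbours contributes nothing
        w = 1.0 / len(js)  # each unit's weights sum to 1
        for j in js:
            cross += w * dev[i] * dev[j]
            s0 += w
    ss = sum(d * d for d in dev.values())
    return (n / s0) * (cross / ss)


nbrs = rook_neighbours(4, 4)
# Alternating high and low values: every neighbour is the opposite.
chessboard = {(r, c): float((r + c) % 2) for r in range(4) for c in range(4)}
# High values on the left half, low values on the right half.
split = {(r, c): 1.0 if c < 2 else 0.0 for r in range(4) for c in range(4)}

i_chess = morans_i(chessboard, nbrs)
i_split = morans_i(split, nbrs)
```

The chessboard pattern comes out at minus 1 (up to floating-point rounding), the case of complete negative autocorrelation, while the split pattern gives a clearly positive value because most units sit next to units like themselves.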
So, you can see the clusters of red around the outside to the north, and in part of the other boroughs there. And the blue is low surrounded by low. So, this is where there are low values of the indicator surrounded by low values. I think this was an indicator I generated for the training programme, which is based on a hard-working index. Then there are a few light blue areas, which are low values surrounded by high values, and there are a few pink ones, which are high values surrounded by low values. So, that's the map on the left, the LISA map. Quite commonly, that's produced with an indicator of statistical significance. So, the darker the green shading, the more statistically significant the values are. They're quite often published together, and the colours are a kind of convention that is used in producing and talking about these maps.
So, when we move on from there to multivariate analysis, we have a basic linear regression model, which is a constant, a coefficient for each of the variables selected, and an error term. When we think about spatial autocorrelation, we think that the error term may vary between the spatial units. So, in the example here, survival probabilities in the event of a heart attack are likely to depend on the distance to the hospital. We could also take into account the severity of the heart attack. You could also let the intercept vary in the model. So, what we get to is that, after including all the variables that we think are meaningful, we may still get a significant result in the Moran's I. So, that's the global measure. And what might be causing that is spatially correlated variables that are omitted, spatially correlated errors in variable measurement, and things we haven't considered, interactions we haven't considered. To address this, we can use two models. We can use a spatial lag model, which takes account of spatial dependency, or a spatial error model, which takes account of spatial heterogeneity.
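
For reference, the three models can be written out as follows. This is the conventional textbook notation rather than anything shown in the talk: y is the dependent variable, X the explanatory variables, W the spatial weights matrix (for example, row-standardised queen contiguity), and the Greek symbols are the usual names for the parameters, which are assumptions about labelling here.

```latex
% Standard (non-spatial) linear regression
y = X\beta + \varepsilon

% Spatial lag model: the spatially lagged dependent variable Wy is a regressor
y = \rho W y + X\beta + \varepsilon

% Spatial error model: the spatial structure sits in the error term
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

Setting the spatial parameter to zero in either spatial model gives back the standard regression, which is why the diagnostics can be read as tests of whether that parameter is needed.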
So, in thinking about spatial lag, we use a lagged value of the dependent variable. We take the dependent variable we're looking at, and we look at the lag value. That's the mean value for the areas around the one you're looking at, in the case of queen contiguity. And the result that comes out is a parameter called rho. So, if you do a spatial lag regression, you'll get a parameter called rho. And in terms of spatial error, the results are put in a term called lambda. Those are both estimated by maximum likelihood.
So, in terms of deciding whether there's spatial autocorrelation, you've got some diagnostics. In deciding which approach to use, you can do this either through theoretical considerations or through exploratory data analysis. Within the regression module in GeoDa (this is also available in R and Stata, at least), there are diagnostic tests that give the Lagrange multiplier and the robust Lagrange multiplier, which identify whether there's lag and/or error. If we assume that there is no spatial autocorrelation, we would use a standard regression model. So, that is the process we'll go through later on. We then look at whether the Lagrange multiplier for lag is significant or the one for error is significant. If one is significant and the other isn't, it would point us to one of the models. If both tests are significant, we can use the robust statistics and, again, choose one or the other. There are some cases where both will be significant, in which case you may well end up presenting all three sets of results: your standard regression model, spatial lag model and spatial error model.
So, we're going to move on to the workshop in a bit. We're going to have some questions. But what we'll be doing is using GeoDa to load the data, explore some of the descriptive statistics available within GeoDa, produce weights, and then explore the clustering and sparsity using LISA maps, exploring spatial autocorrelation, and then moving on to regression. Now, a note of caution here.
I suppose in designing this workshop, we were thinking that the 2021 census is going to give us some really good data to be able to explore elements of this. But in developing this, I've used the 2011 data. So, quite a lot of that data may well have changed. So, the nature of places, the kind of people that live there, et cetera, may well have changed. And in testing these results, a number of the coefficients don't give you significant results. So, what we're doing in the workshop is really exploring. Now, the two examples I've given you in the workbook do produce some significant results and do have something to discuss. But I think it's fair to say that I would assume that using 2021 data, when it's available, is much more likely to predict levels of COVID cases. The other thing to say is that the case data that we're using covers 13 months. So, from October 2020 through to October 2021. And the way that COVID has changed over that time has been quite significant. So, there have been what are commonly called hotspots. So, I was teaching at another university in October 2020. And the areas where the students lived became a particular hotspot at that time. The cases went through the roof. And I think that movement of students to university was quite significant in many places last October. Whereas, at different times, it will be related to different reasons. So, the early cases were very much associated with health staff. And if we isolate where health staff live, we might get some indicators of that as a part. Okay. It doesn't look at this stage like we have any questions. Now, you're going to have to help me here and tell me if you can see the screen. Because what I'm going to do next is to move on to quickly demonstrating GeoDa. So, okay. So, this is the interface. Open this. So, this won't be sized. GeoDa, if your sight isn't the best, has a strange menu.
But to start with, it has a menu across the top, and a number of icons which relate to that menu. So, the first thing I'm going to do is to open a shapefile. So, I go to File, open shapefile. The workbook that you should have access to, and we'll drop the link again in chat, will take you through these steps yourself. So, I'm going to open the cartogram of the MSOAs in Greater Manchester. So, you can see this is now a map of Greater Manchester. I've just highlighted one of them. So, what this is doing on the left-hand side, it tells you how many units there are, 346 MSOAs. And as you can see, it's a cartogram. So, the areas have been leveled out in terms of size. So, the next thing I'm going to do is just explore some of that data. So, I can go in and look, for example, at a histogram, which will give me the frequency distribution. And it comes up with all the variables. So, let's just pick May 21. And that shows me the distribution. So, we can see that there's a few areas with quite high rates, but there's a lot with quite low rates in May 21. I can't see a map right now, Nigel. Sorry if I'm misunderstanding, but I can just see the menu up close. Can you see the histogram? It says histogram Crate May 21, but then it's just a menu below it. Right, okay. I'll go back. Right, can you see my screen now? Yeah, I can see the map and it's a bit more zoomed out as well, which is good. We can see it really clearly. Okay. So, I'm going to have to flip between the menu and so on. So, the next thing I did was then go into Explore and look at the histogram. And I picked the rate for May. And what that gave me was another pop-up window. And I'll enlarge that a bit. Can you see that? Yeah. Okay. So, as I was saying, that's showing a number of areas, more than 200, with low numbers of cases, and just a few with quite high numbers of cases. So, there's one over here with 570, I think, if I'm reading that right.
Now, I'm going to shut a couple of these windows down just to go to the next one. Okay. So, back in my GeoDa menu, the next area of exploring you can look at is a box plot. So, that is similar. It gives you a distribution. I'll do it again for the same value. And it gives a simple box plot of one variable. So, you can't put multiple variables in here, just one. So, it gives you a way of exploring the data to see how it's distributed in terms of values. And you can see the outliers at the top. I'll close that again. You can compare two variables. So, if I was to take that rate in May and compare that to the proportion of students, that shows if there's any relationship. And it's suggesting there's no relationship between May 21 case rates and the proportion of students in an area. But if I was to do that with the October 20 case rates, the picture is quite different. So, if you can see those two scatter plots, you can see in October 2020, there was an issue associated with students. So, those areas with high proportions of students appear to have high levels of cases, whereas there's apparently no relationship by the time we come to May 2021. Okay. So, I'm going to close down those two scatter plots. So, that is probably about it for the other plots there. You can explore them. The next stage would be to go into Tools and to generate your weights. Now, I've already done that, but I will show you the screen just to show you how you do it. So, you create it, you add in an ID variable, and then you select what type you want. So, if I click on Queen Contiguity, I'm going with those right next to it, create it, and I can save that as a file. And now, I can use that when I move on to looking at the next stage. So, that's a one-off stage. So, now I can move into these, which are the diagnostics. So, I can look at the Moran's I, which we talked about. So, this will plot a variable and say, is there any relationship with places around it? So, let's pick a different one.
Let's go for January 2021. And this is plotting the rate in January 2021 in an area against the weighted rate in the areas surrounding it. So, we put in a Queen's weight, Queen's Contiguity weight. So, this is saying there is some positive autocorrelation between these, and it's giving a Moran's I value of 0.22 something. So, I should make that a bit bigger again. So, that's plotting those variables. So, back into GeoDa, if I go into Space again, I can also look at the local area. So, if I look at the local Moran's I for the same area, it gives me a set of options. So, I can produce the significance map, that was the green one to show statistical significance, the cluster map, and the Moran's I scatter plot. So, if I just click on OK with that, I get three different maps. So, let's have a look at them each in turn. So, three different figures. So, this is the clustering of COVID-19 cases in January 2021. So, we can see some red places where there seem to be clusters of quite high rates, some blue places where there's quite low rates, and then the light blue where there's low surrounded by high, and the pink where there's high surrounded by low. And a large part of the map is not significant. And we could look at the statistical significance of those as well. So, you could present those two maps together, potentially, if you're explaining what you've done. And last I've got the Moran's I scatter plot, which we saw before. So, we can explore those different variables that we might think about in a model, using those different techniques. Finally, we could look at Methods, and where we get here is the regression. So, I'm going to pick a different time. So, the first thing I do, can you see this, by the way? The regression screen. Yeah. So, on the left, I've got a set of the variables. So, if I pick a dependent variable of January 21, which we were just looking at, and I'm going to try using the concentration of people who work in health, education, and emergency services.
So, at the moment, I've got no weights file. So, if I click that on, I get the weights file. And I'm running, first of all, a standard regression. So, this just runs a classic regression, and produces an output file with those coefficients. So, this is telling me, up here, that this is not explaining an awful lot of variance, 2%, that there's a positive effect for health, but it's not statistically significant. There's a negative effect for education, it's not statistically significant. And there's a negative effect for the emergency services, which is statistically significant. Now, thinking about what these coefficients mean, what they mean is that for each unit increase in the value of this variable, the effect on the COVID case rate is minus 42.79 in this case. So, for each percentage point increase in the proportion of blue light people living in a neighborhood, there's a reduction of 42.7 or 42.8 in the case levels. Okay, so that's the first part of the model. Now, the second part of the model here gives you the diagnostics. So, what this is saying is the lag model is statistically significant. So, there is a spatial lag effect, and it's also saying the error model is statistically significant, but neither of the robust tests is significant. So, what we can do with this is to go back to the regression model that we've got here, and we could try the spatial lag effect and see what happens. And we get another regression report for this. And if we look at the results, we've got some changes here. So, introducing spatial lag leaves health with very similar results, still not statistically significant. It shows that education is now statistically significant and has a negative effect. And those working in emergency services is statistically significant, but the effect has reduced. So, I'm just going to park that one over here and try spatial error and just see what happens with that.
So, what this has done is change those results again, and it's given you a lambda effect, but it's saying the only significant effect is education, people working in education. Now, I kept this fairly simple. I think if I was thinking about this, I might be thinking about how I could isolate health staff who are at the front line from people who are managing, etc. So, I might be using the NS-SEC social classes to try and isolate people in particular occupations. So, the census gives a broad occupational space of health. It gives three or four different categories, and I've combined them here. So, in effect, what I've done is run through the ways that we'll be using GeoDa.
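For anyone who wants to see roughly what sits behind those GeoDa steps, the sketch below reproduces the first few of them in plain NumPy: queen contiguity weights, the global Moran's I, the four LISA quadrant labels, and a standard regression. It is only an illustration under stated assumptions. It uses a toy regular grid rather than the Greater Manchester MSOAs, the data are simulated, and the function names (queen_weights, morans_i, lisa_quadrants, ols) are made up for this example. It does not do the permutation-based significance testing, the Lagrange multiplier diagnostics, or the maximum likelihood spatial lag and error models.

```python
import numpy as np


def queen_weights(nrows, ncols):
    """Row-standardised queen contiguity weights for a regular grid.

    Assumption: a grid of cells stands in for real areas such as MSOAs.
    Queen contiguity means cells sharing an edge or a corner are neighbours.
    """
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < nrows and 0 <= cc < ncols:
                        W[i, rr * ncols + cc] = 1.0
    # Each row sums to 1, so W @ y is the mean of the neighbouring values.
    return W / W.sum(axis=1, keepdims=True)


def morans_i(y, W):
    """Global Moran's I for values y and weights W."""
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)


def lisa_quadrants(y, W):
    """Label each area by its Moran scatter plot quadrant (no significance test)."""
    z = y - y.mean()
    lag = W @ z  # mean deviation of the neighbours
    return np.where(
        z >= 0,
        np.where(lag >= 0, "high-high", "high-low"),
        np.where(lag >= 0, "low-high", "low-low"),
    )


def ols(y, X):
    """Standard regression: returns (coefficients incl. intercept, residuals)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta, y - X1 @ beta


# Simulated example: 100 areas on a 10 x 10 grid (all values hypothetical).
rng = np.random.default_rng(0)
W = queen_weights(10, 10)
students = rng.uniform(0, 30, size=100)                  # % students per area
rate = 200 + 5 * students + rng.normal(0, 20, size=100)  # case rate per area
beta, resid = ols(rate, students)
print("OLS intercept and slope:", beta)
print("Moran's I of the residuals:", morans_i(resid, W))
print("Quadrant counts:", dict(zip(*np.unique(lisa_quadrants(rate, W), return_counts=True))))
```

Checking Moran's I on the residuals is the same idea as the diagnostics step in the demonstration: if the residuals are still clearly autocorrelated after the standard regression, that is the cue to look at the spatial lag or spatial error model in GeoDa, R or Stata.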