Hello everyone, and welcome to this webinar, Exploring diet and obesity in children using geodemographic classifications. I'm Margarita Sarraulo and I'm an outreach officer working for the UK Data Service. Presenting with me today is Michelle Morris, a research fellow at the Consumer Data Research Centre within the University of Leeds. Here is a quick overview of the webinar today. I will introduce phase two of the Big Data Network very quickly, along with the Consumer Data Research Centre, and then Michelle will tell you more about her research on diet and obesity in children, including the data and methodology that she used. We will finish off with some questions. The ESRC has created a Big Data Network, which is divided into three phases. As part of phase two it supports the establishment of centres with a focus on business and local government data. These three centres are the ESRC Business and Local Government Data Research Centre, the Consumer Data Research Centre and the Urban Big Data Centre. They make data routinely collected by business and local government organizations accessible for research. They are of benefit to data owners and to society, and they also ensure that individuals' identities are always safeguarded. You can find out more on their websites. The UK Data Service has been funded by the Economic and Social Research Council to support phase two of the network and to enable researchers to make the most of these data for knowledge exchange and impact. Our aim is to unify data discovery across the Big Data Network, to support phase two data collections, to encourage the sharing of information and expertise across the data research centres, and to coordinate user training and capacity building in big data analytics for researchers using the data. And just a few words about the Consumer Data Research Centre.
It was established to contribute towards ensuring the future sustainability of UK research using consumer data, and it combines expertise from the University of Leeds, University College London, the University of Liverpool and the University of Oxford. It has a searchable data catalog and it offers open, safeguarded and secure data, covering a range of topics concerning the characteristics, constraints and outcomes of consumption. They also offer training and other services, for example training on data analytics in R and GIS, and you can find out more on their website, cdrc.ac.uk. And now I'll hand over to Michelle and she'll tell you more about her research.

Yep, okay. Hi, my name is Michelle Morris and I'm a University Academic Fellow at the University of Leeds, and I'm going to talk to you about some of my work today, which is exploring diet and obesity in children using geodemographic classifications. The research that I do at Leeds within the Consumer Data Research Centre is all related to how we can use new data for different types of health research. Okay, so first of all, this is an overview of what I'm going to talk about for the next 30 minutes or so. I'm going to start with some background information about why diet and weight status are so important for health, then move on to the geographies of diet and health, and then talk about some of the methods in my research, the data that I've used and the results for both diet and weight status in children. At the end I've got a discussion which is going to be interactive, and we're going to use some polling questions. The first polling question is actually now, because I just want to get a feeling for what you think about some of these questions. So what do you think is the main modifiable risk factor for non-communicable diseases? These include things like type 2 diabetes, cardiovascular disease and cancer.
Maybe the title of the talk gives it away, but 83% of you have said that you think diet is the leading modifiable risk factor, with 14% saying physical activity and 3% saying smoking. That's really good, because diet is up there as the leading risk factor, very closely followed by physical activity. It was only a few years ago that smoking was the leading risk factor, but with all the different anti-smoking campaigns we've actually made a big public health shift in the UK, and now we need to look at how we can do that with diet. So I'll move on with the talk. Okay, so like we've just been saying, diet is the leading risk factor for a number of different non-communicable diseases, as shown on the slide, and one of these is overweight and obesity. Overweight and obesity is a huge problem in the UK and throughout the world. The direct cost to the NHS of overweight and obesity is £5.1 billion, with a wider cost to society of an additional £11.5 billion. This doesn't take into account the personal burden. There's lots of evidence to suggest that overweight and obese people have a significantly reduced health-related quality of life, so there's a lot of personal cost in addition to the monetary cost documented above. So why is this important in children? 10% of four to five year olds in the UK are obese, but by the time we're adults 67% of men and 57% of women are classed as either overweight or obese. Obese children are very likely to go on to become obese adults, and it's not just that they have obesity as a problem in adulthood but also potentially all those other non-communicable diseases that were on the previous slide. Independent of whether obese children become obese adults, they're more likely to have these different types of non-communicable disease. So I've got another question now for everybody: do you think where you live is a risk factor for these different types of health outcomes?
This is an easy one, it's just yes or no on the poll. Okay, I'll just leave that for a couple more seconds; we're up to 90% of people voting. Okay, so everybody has said yes, they think where you live is a risk factor for different health outcomes. I just want to talk about two different components of where we live. First there's the environmental composition. Think about yourself, your own knowledge and your own beliefs. When it comes to obesity, essentially it is an individual's responsibility: if you move enough and don't eat too much, and keep that energy balance, you shouldn't become overweight or obese. But in real terms it's much more complex than that. It's all about your surroundings. So we go from the orange in the middle, which is your own beliefs, and then think about your family, friends and colleagues around you who have an impact on your life; then, at a larger scale, what facilities are in your workplace or school and the clubs around you; then the different community relationships between organizations; and at the national level, the policy guidance and national measures that are put in place to promote healthy behaviors. So that's the environmental composition, and how the different scales might interact to affect the way that you behave and therefore your likelihood of becoming overweight or obese. But then we've also got the environmental context, which in this case I'm saying is the specific geographic location where you live. If you live next door to a fast food outlet, compared to next door to a leisure center, you're likely to have different influences around you. To take an example: if you're living with family and friends who are really fit and physically active and eat a healthy diet, you're more likely to do the same, whereas if all your family and friends would rather go to the pub to socialize and have fish and chips, then again you're likely to do the same.
As part of this talk I want to propose that geodemographic classifications go some way to accounting for both the composition and the context of the environment. So now it's time for my next poll question, so hopefully everyone's still awake. I just want to gauge whether you're familiar with geodemographic classifications and how they're typically used. Again it's just yes or no, and it lets me gauge how long to spend on the next slide. Okay, so we've got 80% of votes now and we're quite split, with about 70% not familiar with geodemographics and 30% who are, so I'll spend a few minutes on the next slide explaining this a bit more. Geodemographic classifications combine demographic characteristics with small area geography. If you think about the census data, there are hundreds of different questions collected about you, and in the geodemographic classification they've pulled out about 60 of those characteristics and used them to segment the population into like groups. Those characteristics are all tied to a small area geography of about 250 people, so it really is just your immediate neighborhood. There are open source and commercial versions of geodemographic classifications, and they're predominantly used for marketing purposes, targeting certain types of the population for selling certain types of products, so they're really quite underused in health.
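As a rough illustration of the segmentation idea just described, the sketch below groups a handful of made-up small areas by clustering. It is not the method behind Cameo or the Output Area Classification: the area names, the three variables, their values and the choice of a plain k-means with farthest-first seeding are all assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical small areas with three invented characteristics:
# [% terraced housing, % manual occupations, % households with young children]
areas = {
    "Newcastle area A": [70, 55, 30],
    "Leeds area B": [65, 60, 28],
    "Sheffield area C": [72, 58, 33],
    "Leeds area D": [10, 8, 12],
    "London area E": [15, 5, 10],
    "Bristol area F": [12, 10, 14],
}


def standardise(rows):
    """Z-score each column so no single variable dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, mus)]
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from the seeds chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, move each
    centroid to the mean of its members, stop when nothing changes."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                  for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([mean(col) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


labels = kmeans(standardise(list(areas.values())), k=2)
for name, group in zip(areas, labels):
    print(f"{name}: group {group}")
```

In this toy data the three areas with more terraced housing, manual occupations and young children fall into one group even though they are in different cities, and the remaining three fall into the other. Real classifications use far more variables, many more areas and more groups.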
On the main difference between the commercial and open source classifications: Cameo is a commercial classification, which is what I'll talk about in this presentation, and the main open source geodemographic classification is the Output Area Classification, which is developed just using the census data by the Office for National Statistics. All of the methodology for how that has been developed is available online, so you can go in and see exactly which census variables they've used and what type of clustering techniques they've used, and then you can see radial plots for which characteristics are most dominant in each of the clusters. The commercial classifications don't just use the census data; they also use data they've collected themselves from different types of surveys, so it might be information about which newspapers you read, what types of TV programs you've signed up for and various other types of information, along with any information they've got about your income. So they're potentially much richer, but they don't publish the methods that they've used to generate the clusters. Both of these types of classification are built in a hierarchical structure, which is the case for many classifications, not just geodemographics. For Cameo, which I'm going to present today, at the top level there are 10 groups. So, for example, if you've got some blue collar workers who live in terraced housing with young children in Newcastle, they'll be put in the same group as blue collar workers living in terraced houses with young children in Leeds, Bradford and Sheffield. We can identify that those are similar types of people, and we know exactly where they live if you were going to plot them on a map. But when we're talking about these blue collar community people, we don't say that it's Joe Bloggs living at such and such an address; we just know that they're one of 10 different types of people and that they live in this unit of 250
people. So once you append just the 10 IDs to some data that you've got, or to people's postcodes, it becomes unidentifiable again, because you could be anybody; it's just one in 10 different groups of the population. Then in Cameo, those 10 groups break down into a number of smaller groups so that you can pinpoint more specific characteristics. That's the same with the Output Area Classification, but the Output Area Classification has got three tiers in the hierarchy, so when you get down to the lowest level you've got some real key characteristics which are driving those types of groups. Okay, so to think about this application in the context of big data and the Big Data Network, which this presentation is part of a series for: with the increasing volumes of big data, new data, all different types of data, and it's not just about the size, we can hopefully gain an improved understanding of people's behaviors. With more volumes of data we can hopefully also generate some better geodemographic classifications, where we can group people into more pinpointed types of classification. To go hand in hand with that, we're working with increased computing power, and we've got improved secure data infrastructures so that we can host, or potentially host, more sensitive data in a safe way and keep that anonymized. We've also got more power for matching to these different types of alternative data sets that are becoming available. But what's really of interest to me is the fact that we can take all this data and all this computing power and actually apply it to real-world problems, in this case overweight and obesity. So just to move on, then, to give you a little bit of background on some work that I've done previously using data on adults. This chart that you can see here uses data from a cohort of 35,000 women in the UK. They were recruited 20
years ago and we've continued to follow them up for their health outcomes. For things like different types of cancers there's a latent period, so we collected their data at baseline and we're still getting information every three months about whether or not they've developed cancer. This chart is just a cross-section of the dietary information that we collected 20 years ago. You can see here the groups on the right hand side: blue collar communities, city living, countryside, prospering suburbs, constrained by circumstance, typical traits and multicultural. These are the 2001 Output Area Classification groups, which is the open source geodemographic classification built from the census, and they relate to the different colored bars on the chart. The seven groups across the bottom are dietary patterns. These were derived from a cluster analysis of the food frequency questionnaire data that we collected from these women. On the far left hand side is the monotonous low-quantity omnivore: these people eat lots of sugar, white bread and chips, and it's also the poorest quality diet. The next one along is the health conscious diet, which is the healthiest diet in these groups; there are lots of fruits, vegetables and whole grains. Then we've got a traditional meat, chips and pudding eater, which is somewhere in the middle of the healthiness scale. The higher diversity traditional omnivore is healthier than the traditional meat, chips and pudding eater because they've got more fruits, vegetables and whole grains, and they've also just got a wide diversity in their diet. We've got a conservative omnivore: these people don't eat very much of anything and just have quite a repetitive diet, but it's healthier than that monotonous low-quantity omnivore. Then we've got two types of vegetarian diet. These people don't eat meat, but the low diversity vegetarian has a much narrower group of foods that they eat compared to the high diversity vegetarian, who has a full
range of all different types of pulses, lentils, fruits and vegetables, whereas you can imagine that the low diversity vegetarian has lots of jacket potatoes and baked beans, which is exactly what I did as a teenager being a vegetarian. So we can see these seven different dietary patterns, and I just want to highlight this really unhealthy dietary pattern, which is actually the cheapest dietary pattern as well. There's some work that we've done that shows that this is half the price of the health conscious dietary pattern for people to consume per day. The groups that have got real spikes there are the blue collar communities and those constrained by circumstance. The blue collar communities are the people who are working in manual jobs, and constrained by circumstance includes people who are living in public housing or old people's homes, in an area that is constrained because of their circumstances. So they're the two least affluent groups in this classification and they're having the poorest diet. If we then look at the health conscious pattern, you can see that it is the least popular diet by far, and within it there's less of a range between the different groups, but you can see that a higher percentage of those in city living and in the countryside are consuming this dietary pattern. All the differences between these groups are highly statistically significant because the numbers in the sample are so big. The last thing I just want to point out is that for the low diversity vegetarian group the highest percentage consumption is in multicultural areas and also in city living. What's interesting is that you might well expect people from multicultural backgrounds to be having a diverse vegetarian diet, because of cultural beliefs perhaps, but in this cohort 99% of the women are white and the 1% who weren't recorded as white didn't report their ethnicity at all. So it's not that the women who eat these
diets are themselves of multicultural origins; they're just living in areas that are highly multicultural, and it's showing in their dietary patterns, which is particularly interesting when we think about that environmental composition in the ecological model which I showed earlier on. Then if we think about weight status in adults, I just want to highlight these two columns. Again across the bottom we've got those Output Area Classification geodemographic groups, and in the colored bars we've got the percentage of obese, overweight, normal weight and underweight. In the blue collar communities and constrained by circumstance groups we're seeing a much higher percentage of women who are overweight and obese. So although that geodemographic classification, in this case the Output Area Classification, is built from 41 different variables from the census, none of which had anything to do with diet, weight or health, we're pulling out these really key differences when we combine it with cohort data that collected information on diet and weight status. So then, to move on to the children. Before we start, I just want us to bear in mind that geodemographic classifications are derived primarily from data collected on adults, and to think about whether or not they're going to be sensitive to the behaviors of children. Moving on to the methods, the work that I'm showing you is an example of combining different data sources to add new insight into real-world problems. It's a cross-sectional analysis of two different data sets, and I'm going to present some descriptive statistics and results from a regression analysis. In all there are three different data sets that I've used. The first one is Cameo data from Callcredit Information Group. This is a commercial geodemographic classification; it uses the postcode of the people in my two cohorts and links it to a Cameo group, and I've just used the 10
groups at the top level of this classification. These are the 10 groups: first of all I've got the business elite, then prospering professionals, flourishing society, and you can read on to the bottom there. From one to ten it's approximately decreasing on the social gradient, with those who are less affluent in the lower groups, like cash conscious communities, on a budget and family values, and with the business elite and the prospering professionals being the more affluent. But this is much more than just an index of deprivation, because it includes a whole wealth of other data. The second data set is the one which contains information about children's diets. For this research I've gone around and effectively borrowed data sets from colleagues who have collected data for a different purpose, and then profiled it by the geodemographic groups to see whether we can gain additional insight. So this is a secondary analysis of some data which was collected on a National Institute for Health Research project in collaboration with the Royal Horticultural Society. The data was actually collected as part of a school gardening intervention project, and there's a reference at the bottom of this page if you want to go and look at the study itself. It was a randomized controlled trial where they split schools into different groups: to receive a gardening intervention delivered in the school, one delivered by the Royal Horticultural Society, or no project at all. They wanted to see whether, if children grew their own fruit and veg, it increased their fruit and vegetable consumption. At baseline they collected information on the children's diets, and then again at follow-up, and it's that baseline information that I'm using in this research. The children's dietary data was collected using a validated tool called CADET, the Child and Diet Evaluation Tool, so it's a type of
questionnaire where we ask the children what they've had to eat and when they've had different types of food. In the cross-sectional sample I've used there are 873 children. They were all based in London and the data was collected between 2010 and 2011. It was 873 who provided complete home postcodes, and those are the ones that I've used for this analysis. The third data set is the one about childhood obesity. This is data which was collected by a colleague as part of the Rugby League and Athletics Development Scheme, which was a collaboration between Leeds City Council and Leeds Beckett University. My colleague collected body mass index data from over 14,000 children, all with a valid postcode, and the data was collected between 2005 and 2007. Whenever I present this I'm really impressed, because these 14,000 children were all measured by one person, Claire Griffiths, as part of her PhD, so that is a lot of measuring of children. So those are the three different data sets that we're going to talk about. Using the postcode, I've linked the Cameo ID into both the diet data set and the obesity data set, and here are some of the results. In this table we've got the number of children in each of the groups. You can see at the bottom I've got a line for the total, which is 873 who had a complete postcode, and the whole of the cohort was 1,554, so there is a slight difference between those who provided a postcode and those who didn't, but the results above are just for those who provided a postcode. Those in the prospering professionals group had the highest number of portions of fruit per day, compared to those with family values who had the lowest: that's 1.7 portions of fruit compared to 3.6, so more than double the portions of fruit. Then if we look at the portions of vegetables, the highest consumption of vegetables, 1.3 portions per day, was in the paying the mortgage
group, compared to the lowest being in family values again. So already we can see there's a different sort of behavior here around vegetables compared with fruit, which is consistent with other research. Now these are the results of a regression analysis where the baseline vegetable consumption is the dependent variable and the 10 Cameo groups are the independent variable. All the models I'm going to show you are controlled for clustering by school, and the plot is showing you the coefficient in grams per day of vegetables, with the 95% confidence interval in the error bars. The red line in the chart is the reference category, which is the business elite group, so all of these are showing whether, compared to the business elite, they've got higher or lower vegetable consumption. While you're all looking at it and wondering whether there's any difference: there's no difference. All the confidence intervals are crossing zero; there's just no difference across any of the groups on vegetables. It looks as though the family values group is having the lowest vegetable consumption compared to the business elite, and we can see that that's about 40 grams per day of vegetables, which is half a portion, but when we're controlling for the differences by school we're not seeing any significant difference. If I move on to fruit, this is showing more of a trend. This is the same as before, where we've got baseline fruit consumption in grams per day by Cameo group, with business elite as the reference category. In this one we can see again that all the confidence intervals are crossing zero and that there's no significant difference between the different groups, but this time there appears to be a trend, in that compared to the business elite and then the prospering professionals everyone's having slightly less fruit, but there's nothing significant going on. If I move on to weight status, we're starting to see a different picture. If we look at the business elite group, we can see that 17 percent of the children are either overweight or
obese, and then if we compare that to the cash conscious communities we're looking at 28 percent, which is more than 10 percentage points more children who are overweight and obese. I also just want to highlight the on a budget group: they've got 26 percent overweight and obese and also the highest amount of underweight. So using this geodemographic classification to profile weight status, we're starting to see that certain groups really do have more of a problem than others. Next, these are the regression results using BMI standard deviation score. To measure overweight and obesity in children we use body mass index, which is weight over height squared, and then the standard deviation score about that, because the children are changing ages and it takes into account age and gender. The BMI categories on the previous slide use the International Obesity Task Force cut points to define the different groups of weight status, whereas in this regression analysis I'm using the continuous BMI SDS, again comparing to the business elite, which is the red line, and you can see the coefficients in the dots with the confidence intervals about them in the bars. So here, compared to the business elite, everybody has got a higher BMI SDS, and we can see that as you come down the chart, with the exception of the two groups at the bottom, they're more likely to have a higher BMI SDS. Only the first two groups have the confidence interval crossing zero, which are borderline significant results, whereas all the others have a significantly higher BMI SDS. Just to show you the actual values for the regression analysis: here we can see that for the cash conscious communities the coefficient is 0.35, which is 0.35 standard deviation scores higher than the business elite. You can also see that the first two groups are significant at the 10% level and all of the others are significant at 5%, so these differences
are key, and they're also important because if children could all reduce their BMI SDS by 0.15 you'd be seeing clinically significant differences across the population. So I just want to conclude that part of my talk by saying that, from the analysis that I've done, diet in children doesn't appear to vary by Cameo group when we use fruit and vegetables as indicators, but weight status does vary, and quite significantly. For the last part of my talk I want to discuss some of these results, and I'd like to make that quite interactive, so I'm going to use some more polling questions to pull apart some of the results that I've presented, and hopefully to generate some questions that you might want to ask a bit later on. So if you could answer some of these questions on the poll. Okay, you've got a few options here for what you think might be the main cause of obesity in children, given what I've just shown you, so if you could have a look at these and cast your votes, that would be great. I'll just leave it a few seconds for anybody that's been snoozing slightly to have a look at the questions and see what you think. You're doing well, 70% have voted already, so thank you all very much. I'll just leave that a couple more seconds. All right, okay, I'll share those results so that you can all see them. You can see there that 46% of you think that the parents' socioeconomic status is the most important, with 31% saying diet and then 12% physical activity and the environment. So that's quite a spread, which I think is probably quite typical of some of the evidence as well, in that different people do think that different things are important. Okay, we'll just hide that and move on. So on that, then, do you think that we need to investigate further the impact of physical activity? If you could all vote, hopefully you should have yes or no options up there. Okay, 83% of people have
voted. I'll just leave that a couple more seconds. All right, okay, so 72% of you think that we need to look more into physical activity as well; I completely agree. Okay, so my next question: given that the geodemographic classifications are based primarily on adult data, do you think that's limiting the results that we're seeing for children? Okay, 60% of people have voted; I'll give it a few more seconds. Right, okay, so 83% of you have voted: 52% have said yes and 48% have said no, so I'll just share that. Again, part of me thinks that if we were able to get that sort of data specifically about children to generate a child geodemographic classification, then you'd think we would get more accurate results. But diet and obesity are such familial behaviors, in that there's one shop for the household and you're likely to be consuming and behaving in the same way as your family, so actually that adult data is possibly a better indicator than something specifically for children anyway. So you can completely see why all of you in the audience are split almost exactly halfway. I've just got one more question for you now, but there's no slide for it, it's just on the poll, so I'll just hide that and ask you the last one. This one is a bit more about dietary assessment: do you think that the problem is actually to do with the measurement of the diet? Can you all see that yet? People are voting now. So do you think that the reason we're not seeing differences in diet by geodemographic classification is actually because we've not measured it very well, or the children haven't reported it very well, or a combination of all of them? Okay, so 60% of you have voted. All right, okay, so we've got 87% of votes, brilliant. Okay, I'll just close that now. So 77% of you are saying that you think it's the limitation of dietary assessment, and you might be very right with that. The CADET tool that was used for
assessing diet is arguably the best tool available in the UK for collecting dietary data on children, but maybe as we're moving forward with different types of data we should look at things like SenseCam cameras, with pictures taken of the children's food, and analyze that instead. There are lots of different new ways of measuring diet, each with their limitations but also with their benefits. So to recap on the things that I've just discussed with you: you could argue that diet's not the problem, and some of you said that socioeconomic status and physical activity are a bigger problem. Again, the reporting and measuring of diet is potentially the reason we're not seeing anything in diet. So we can explore new ways of collecting data about diet, and we need to look more into physical activity and/or sedentary activity in children, because the study that I've shown you doesn't take into account physical activity alongside what the children are eating. And I think that while it might be nice to have a child geodemographic classification, given the limitations with data and the probably only small gain from having a child-specific over the adult-specific classification, it's not something that we really need to pursue straight away. So the next steps for me with this research are to look at physical activity and sedentary behavior, to look at purchasing records instead of self-reported dietary measures, and to try to explore how consumer records link to health data. This could be store loyalty card purchases, or collection of till receipts, or when people are using preloaded food cards in work or university type environments, so watch this space. I would just like to acknowledge some of my colleagues who collected this data. Dr. Claire Griffiths from Leeds Beckett University collected the data from the RADS study, which is the obesity data, and Dr.
Meaghan Christian, at Leeds Beckett University again but formerly at the University of Leeds, collected the data on the children's diets as part of the Royal Horticultural Society project. And thanks to Callcredit Information Group, who provided the Cameo classification for this research. Then before I open up for questions, I just want to use this to plug a little bit of work that I'm doing in the ESRC Strategic Network for Obesity. We've set up a strategic network for investigating how we can use big data and new forms of data to understand the environment and how it affects or influences obesity. If you go on the Consumer Data Research Centre website, in the research area, you can find all the information about this network, which I'm the director of, or you can follow us on Twitter, which is @obesity_network, or you can follow me at Dr. Shalem.

So I'd like to thank everyone for attending this webinar. I hope you found it interesting. Thank you for interacting with us using GoToWebinar, and thank you to Michelle for telling us about your research. Bye-bye!