Okay, I think we're coming up to time for the second set of parallel streams. So if you have a question, put it in the Q&A. If you want to make a comment to some or all of the people here, then put it in the chat. So the first paper is quite a big collaboration. The three people involved are James Miller, who is a senior researcher in the Scottish Government's Office of the Chief Economic Advisor. He's had policy roles across the UK in DWP and Social Security, and he's project lead for this work stream on social returns of education. Zoe Mackay, who is also contributing, is a research coordinator at Skills Development Scotland. And Gillian Wiley is an evaluation research executive within Skills Development Scotland. So I'm going to pass over to them to present. You've got about 20 minutes with five minutes for questions.

Can someone let me know if that's showing as full screen?

That looks good to me, James.

Okay, well, good afternoon, everyone. So to start us off, then, I'd just like to set a bit of context for the work. In Scotland, education is a devolved area, and that results in a delivery and funding environment that differs a little from that elsewhere in the UK. Educational funding in Scotland is delivered through investment by the Scottish Government and also two enterprise and skills agencies. Those are Skills Development Scotland, which is a national agency responsible for delivering things such as careers, skills and training services, and the Scottish Funding Council, which funds Scotland's further and higher education institutions. The Scottish Government and both of its agencies work together to support Scotland's education and skills system, and also to collect data and evidence to show that education funding is contributing to its various policy goals. And that's really what today's presentation is going to be focusing on.
That is, how we evidence the impact of education in Scotland, how we determine whether the funding from government and its agencies is contributing towards various policy goals, and how we've been using the APS to help us achieve that. So in recent years, there's been growing policy and ministerial interest in developing a more comprehensive understanding of the outcomes from education and skills investment. And this ambition has been reflected across various policy documents from the Scottish Government over the last few years, such as our 15 to 24 Learner Journey Review and also our Enterprise and Skills Review. And it was recognised that the evidence base that's available to policy makers around the impact of education and skills investment could be better. We currently have a fairly limited understanding of the economic impact of skills acquisition. And we do collect some evidence around non-economic outcomes from education for particular learner groups, such as modern apprentices. But large evidence gaps remain across other qualifications, and in general we lack that comprehensive understanding of the total returns to education investment. This growing demand for evidence on the return from education investment also sits alongside another area of interest, which is wellbeing policy, and the general move towards incorporating wellbeing as a key part of policy development in Scotland. So to address some of these evidence gaps, the Enterprise and Skills Strategic Board was created a couple of years back, and we'll refer to that as the ESSB from now on. The ESSB's objective was to align and coordinate the activities of Scotland's enterprise and skills agencies and government. And one of the metrics that the ESSB's efforts will be judged against is Scotland's ranking by the Organisation for Economic Co-operation and Development on various outcomes, including equality, wellbeing and sustainability. Scotland currently sits mid-table for many of these.
And the ultimate ambition of the ESSB is to develop that comprehensive understanding of the return on education investment, not just to the exchequer and to the economy, but also to the individual and to society as a whole, and in doing so to allow for more targeted policy interventions. So that's a brief summary of the background to this work. Now just to take a quick look at the breadth of the education and skills system in Scotland as it stands. Across the various universities and skills agencies, Scotland invests around £2 billion per year in education. That goes towards supporting 34,000 modern apprentices, 120,000 college students and 140,000 university students. And the question then is what is the impact of this investment on our various stakeholders, who you can see on the right there, be that society, individuals, employers, and also the universities, colleges and apprenticeship providers. So our project takes a dual approach. We have two work streams operating in parallel. The first one is what we call the economic work stream, and that's looking at the various economic returns. The second one, which will be the focus of today's presentation, is our non-economic work stream, which looks at returns to individuals and to society at large. The project, which began in 2019, set off with an ambition to understand all of the non-economic outcomes to learners and society. We identified some of those outcomes through initial engagement with learners themselves, and those were things such as confidence, improved health and developing meta-skills. And we would do that by carrying out a social return on investment analysis, or SROI. An SROI, which is similar to a more traditional ROI, aims to identify, measure, and eventually value in monetary terms the non-economic returns from a particular policy or investment.
As COVID-19 took hold, it placed not only physical restrictions on face-to-face research, but also posed an ethical challenge as to whether we should proceed with the work. The entire education landscape was, of course, changing very rapidly at that point. Learners were experiencing a lot of uncertainty around their futures, so it was not clear that it would be ethically sound at that point to ask them about their futures. And of course, we could also expect responses to the research to be heavily influenced by COVID, and no one knew at that point whether the changes that had taken place as a result of COVID would be temporary or permanent. As a result of that, a step change took place, and the decision was made to pause the primary research and to temporarily narrow our scope to look at a measure we could evidence using existing secondary data, and that was wellbeing. And that's actually where the Annual Population Survey comes into the picture. The most readily available measure of wellbeing within the secondary data sets that we had access to within that time period was the APS. And the APS, of course, measures wellbeing using the four ONS wellbeing questions, which are listed at the bottom of the slide, and which cover anxiety, happiness, satisfaction, and how far individuals feel their life is worthwhile, each on an 11 point scale. So this slide gives an overview of the data that we used as part of phase one. As I said, the APS was the primary data source, and that gave us a sample of just over 59,000 respondents. There are a couple of caveats worth noting for this data set. Primarily, it was not possible to say for certain whether the respondent had completed their post school education in Scotland or elsewhere in the UK. So we had to use two rules to identify who to include within our sample. Firstly, if a respondent had any form of Scottish school level qualifications, then we assumed that they would also have completed their higher education in Scotland.
And secondly, if a respondent had no qualifications at all, then we would include them in our analysis if their current residency was within Scotland. So as I said, those rules gave us a sample size of around 59,000 respondents. We analysed those respondents based on three different levels. That is whether they did or didn't have a qualification. The second level was whether they had a school or post school level qualification. And then the third level, we broke down qualifications into individual groups that we had sufficient sample sizes to analyse. And also it's worth saying that we complemented our data analysis with two additional surveys. So that's the apprentice wellbeing survey and the graduate outcomes survey. And these allowed us to look at wellbeing returns from two groups that are a little bit harder to isolate within the APS. But all of the data that I'm going to be covering today will be from the APS. So now taking a look at our APS sample, there's a couple of things to be mindful of as we go through the results. So we can see on this chart here, most of our respondents had some form of qualification. However, around a fifth of our respondents had no qualification at all, which is slightly higher than what we would expect based on Scottish education data. And then if we have a look at the age profile of the sample, it tends towards older age groups with over half of the sample being aged over 50 and around 80% being aged over 35. And when we consider that alongside the second table up there, age finished education, which shows that most people finished their education before the age of 24, we can assume that the majority of our respondents didn't acquire their qualifications under the current education and skills system in Scotland. And then finally, there are some groups which are poorly represented or not represented at all. 
So these include ethnic minorities, so white respondents were slightly overrepresented in our sample, and also certain groups of learners that I previously mentioned that we can't isolate within the data, and that includes college students, adult learners, care experienced people, that sort of thing. Understanding how we can incorporate such groups will be part of the future phases of our research. It's also worth saying that all of the data we're looking at is of course taken pre-coronavirus, so it doesn't take into account any changes to education since then, such as a move to online learning. And of course at this stage, we're only able to look at correlations; identifying causal relationships will be a future phase of the work. So moving on to the findings then. We started off by looking at the education and wellbeing relationship more generally, and this slide shows the relationship between those who have no qualifications and those who have any, and we can see that those who have any qualification report higher wellbeing across all four wellbeing questions. Then, when we break it down by level of qualification, we can see a slightly more mixed picture, with not a huge difference between those who have school and those who have post-school level qualifications, but the difference between those who have no qualifications and those who have any remains. And then finally, when we break it down by qualification type, we can see a slightly clearer picture starting to emerge, where as the SCQF level of the qualification increases, in general so too do reported wellbeing scores. We can also see, looking at the red line, which shows the Scottish mean wellbeing score, that almost all respondent groups report above average wellbeing, with the exception of post-school non-university qualifications. So that's things such as trade apprenticeships, whose holders report slightly below average wellbeing.
So as part of the work, we also carried out a regression analysis to better understand the qualifications and wellbeing relationship. A green box here shows a significant positive association, a red box a significant negative association, and a grey box no significant association at all. There are a couple of things worth highlighting. Perhaps unsurprisingly, those possessing any form of qualification reported improved wellbeing, and this was true for university qualifications in particular; however, university qualifications also showed increased anxiety compared with school level. We also found that non-university qualifications have few significant correlations with wellbeing, except for worthwhileness. And then finally, when we compare wellbeing outcomes with the general Scottish population, we found that in general, higher and higher levels of university qualification go along with higher and higher wellbeing. In addition to looking at the education and wellbeing relationship more generally, we also wanted to understand the benefits experienced by different learner groups, which could, for example, help us target investment towards learner groups who derive less benefit from education. The first breakdown we looked at was by sex, and we found that for both males and females, acquiring any form of qualification was associated with higher wellbeing scores compared to those without qualifications. Also, both males and females were likely to report higher levels of anxiety if they had an undergraduate degree. And then finally, female respondents tended to report higher anxiety than male respondents, irrespective of their qualification level, but they also reported higher levels of worthwhileness. There could be many drivers behind the patterns observed, some of which we touched upon in our Phase 1 reports, but one explanation could be that these differences reflect the difference in subjects studied by males and females.
So we know, for example, that social services and healthcare are more female-dominated areas, and results from our other datasets suggest that that's a framework that is associated with higher levels of worthwhileness. Differences between subjects could also be linked to things such as labour market pressures and limited job opportunities within those areas. The second breakdown we looked at was differences in learners across age groups, and there were slightly fewer differences here than by gender, but in general, older respondents tended to report higher wellbeing than younger respondents, and those aged over 50 in particular reported higher worthwhileness as their qualifications increased. There were also relatively few statistically significant results for those aged 25 to 34, which suggests that potentially other factors could have a greater influence on wellbeing for that particular group. And then finally, almost all age groups reported an increase in anxiety when moving from school to university or from first to higher degrees. The final learner group that we looked at was disabled and non-disabled learners, and we found that whilst people with a disability reported increased satisfaction as qualification level increased, they still reported lower wellbeing scores than non-disabled respondents. In fact, those with a disability appeared to report a larger increase in wellbeing scores between qualification levels than non-disabled people across all levels of qualification. However, the findings showed that those with a disability still reported lower wellbeing than non-disabled respondents no matter what the qualification level was, and also consistently below the Scottish average. Again, there could be a couple of reasons why these effects are being observed. It could be that the causal effect of education is greater on disabled learners, or it could be that having a qualification is more meaningful for people who are disabled.
Again, further work in future phases will explore this in particular. But as I said, nevertheless, a gap in wellbeing remained between people who were disabled and people who were not, which raises an important question as to what the skills system can do in the future to address this. And that's all from me, so I'm going to hand you over to my colleague, Zoe, who's going to take us through some more findings. Over to you, Zoe.

Yep, thanks, James. So we also looked at associations between wellbeing and other factors like degree subject choice. In the graph on the left, we can see that respondents with a STEM degree reported higher wellbeing scores than those with an arts, humanities, and social science degree across all four measures. And the table on the right shows the difference in wellbeing scores for those who studied particular subjects compared with the Scottish average. So we can see here that those who studied subjects like arts, languages, and linguistics at university tended to report below average wellbeing scores, while those who studied things like business and financial studies, medicine, architecture, and dentistry reported wellbeing scores above the Scottish average for all four measures. Next slide, please. We also looked at the relationship between employment status and wellbeing and found that, for those who were economically inactive, this seemed to vary by reason. So you can see here, those who were economically inactive due to disability had some of the highest anxiety and the lowest happiness scores. Next slide, please. We also looked at the association between wellbeing and income using the measure of hourly pay. So here we can see that those who earned more typically reported higher wellbeing scores, though with a slightly more mixed picture for anxiety. And finally, we found that those in receipt of any state benefit tended to report lower wellbeing scores.
So here we can see that those in receipt of any state benefit tended to report lower wellbeing scores than those who weren't claiming state benefits. And those who were claiming state benefits also reported wellbeing scores below the Scottish average for all four measures, which you can see in the table at the bottom of the slide. Next slide, please. Thank you. So to sum up our findings, overall respondents with any form of qualification tended to report better wellbeing scores, and from our initial exploration of wellbeing and qualifications, we've identified a few main findings. In terms of age, the most statistically significant results were found for the 35 to 49 age group, suggesting qualifications might provide greater wellbeing benefits at this age, while the low number of statistically significant results for younger age groups might suggest other factors are more influential. In terms of gender, females tended to report higher anxiety than males across qualification levels, but also higher worthwhileness. And in terms of disability, those with a disability reported poorer wellbeing scores than non-disabled respondents, irrespective of qualification level and consistently below the Scottish average. But the fact that respondents with a disability tended to report a larger difference in wellbeing scores between qualification levels could suggest that having a qualification is possibly more impactful for people with disabilities, though further work would be needed to look at this in more detail. We've also looked at the relationship between wellbeing and other factors, like degree subject and employment status, with a few findings that I went through in the previous section.
But just to sum up the results, it's important we recognise that, although we have identified some interesting trends, the limitations of the research to date don't allow us to establish a causal relationship between changes in wellbeing and gaining qualifications. But that's something we hope to be able to look at further in the next stage of the research. So I'll now hand over to Gillian, who's going to talk through some of our plans for the next phase of the project.

Thanks, Zoe. Okay, so I'm going to finish off by covering what happens next for the project as we round off phase one and look towards what we're calling phase two. What we're planning is extending our approach to look beyond wellbeing to a range of other non-economic outcomes for both individuals and for society. The blue diagram on the left shows the non-economic outcomes for individuals as a result of participating in education and training. It includes personal wellbeing, as James has gone through, and also meta-skills, which are skills needed for the future; fair work, which is based on Scotland's Fair Work framework and includes things like job security and fulfilment; and broadening horizons, which means people have a clear idea of what opportunities are available to them. And we've also included health, self-confidence, and relationships with others. Then the diagram on the right shows outcomes for society, which include social attitudes; civic engagement, so things like voting behaviours and volunteering; productivity; inequalities, because we're aware that education can possibly entrench existing disadvantage in society; crime rates; and health on a societal level too. We've highlighted in circles the outcomes that we believe the APS will be able to help us explore in the next phase. These outcomes were created following a literature review, where we found evidence of a link between these outcomes and education.
And they were finalised based on their importance in current government policy, and were then subsequently validated by stakeholders through round-table meetings, and also a small number of engagements with current learners, and that was prior to the pandemic. So go to the next slide please, James. Okay, so how can the APS help us? The APS allows us to explore the different pathways that are important to our project by being able to split by qualification type, and it's also got large enough sample sizes in Scotland to allow us to do this. It also contains variables that allow us to explore many of our outcomes of interest, and that's particularly true for fair work and health. And it also contains additional characteristics in the secure version of the dataset that allow us to explore particular groups in society, and it can also be linked with geographical data, such as the Scottish Index of Multiple Deprivation, SIMD, which will allow us to further investigate the relationship between socio-economic status, education and social outcomes. Next slide please, James. So how are we going to explore both the individual and societal level outcomes further? There are a number of things that we're currently working on. We've produced an evidence review and logic model which maps all of our individual and societal level outcomes to indicators, and then to what secondary data is available for us to explore each of these indicators further, and we've also highlighted any data gaps. Through that evidence review, there were two main datasets that were identified, and those are Understanding Society and the APS. So we're currently exploring how a longitudinal dataset like Understanding Society might be useful for our project, and might also help us address the issue of causality that we've experienced in phase one.
And we've also just got access to the secure version of the APS, and that will allow us to explore these outcomes by more characteristics than we were able to for our wellbeing analysis in phase one of the work. We're also having regular meetings with our expert panel, which includes subject experts from academia and the public and third sectors, and again we're getting really useful feedback on our approach from these meetings. We're currently conducting secondary data analysis and also making plans for primary data collection. That's to fill the evidence gaps that we identified in our evidence review, but we're also hoping to use some more qualitative research methods to further explore the issue of causality that we haven't been able to address through secondary data analysis. So I think that's it from us, James. So thank you for listening and happy to take any questions.

Okay. Well, thank you very much. We might have a few minutes at the end if people have any more thoughts. Just keep popping them in the Q&A. Bruno's presentation is a video, so I'm just going to share the screen and put that on in a minute.

Hello, everybody. Thank you very much for taking the time to listen to my presentation. Thank you to the UK Data Service for having spaces like this, especially for students like me. Having the chance to present our work is, I think, the only way forward in our own research, along with having the chance to listen to feedback and comments from people that are more senior than us. So thank you very much. Right. Let me share my screen and let's start with the presentation. Right now, you should be able to see my screen. I'm going to be presenting my work on inequality of opportunity in the United Kingdom: a supervised learning approach, more specifically, a conditional inference tree. Now let me say before I start that this is work that I submitted for my dissertation. My master's of research dissertation here at the University of Glasgow.
Almost five months ago. This is not something that I'm working, exactly something that I'm working on for my PhD. However, please, if you have any comments, if you have any feedback, I would be more than happy to receive them. So there's my email if you want to like to drop me an email or say anything at the end of the presentation in the Q&A session. So it's been a challenge to go back to and see what I've done five months ago. But I think it was a really interesting experience. Okay. Okay, a brief overview of how I'm going to structure my presentation. Firstly, I'm going to tell you what's my motivation of my research in general. Then briefly introduce the theoretical framework and the methodology that I use for these empirical work. Afterwards, I'm going to present the data and the implementation of the algorithm. I want to try to be as concise as possible with the algorithm. But again, if you have any questions, please feel free to let me know and I can address them at the end of the presentation. Finally, getting to the results and what's the opportunity structure that I found for the UK. And again, briefly commenting on the performance of my model. So the motivation of my research in general and I think most people would agree with me, stands around this sentence. So inequality might be one of the most important issues of the 21st century. Inequality in any of its forms, so either income, wealth, education or even healthcare, I think is one of the most relevant topics to be working on nowadays. If we look at the trends of global inequality, as most people would know, the between country inequality shows a decrease in the trends. And this is mostly explained by very poor countries like China or Vietnam growing at much faster rates than other regions. However, the within country inequality shows a very different trend and obviously it depends on which country we're looking at. 
For example, for countries like the US or some European countries, the within country inequality has shown an increase. Now, specifically for the UK, the dynamics again are very different depending on what we're looking at. If we look at the net household income, so this is income after taxes and transfers, we see that there's indeed a decrease in inequality. However, if we step backwards and only look at the individual gross income, particularly earnings, so that's the income for the employed, we see that actually income inequality has seen an increase in the past 20 years or so. Now, this is in actual fact what's motivating this particular work. And this is something that I like debating with many people that I know. To what extent is the idea of meritocracy valid? We have people arguing, okay, I am where I am today because of my efforts, and that's totally valid as well. However, do we need to step back and look at the other circumstances as well? Like where you were brought up? Where were you born? What kind of household you had when you were 10, 15 years old? So the question is, do circumstances matter for the different attainments, either educational, occupational, or income? So the research questions for this work are: are individuals in the UK constrained by circumstances? If so, which are the most relevant? And does inequality of opportunity translate into income inequality? Now, I'm going to go back to these research questions at the end of the presentation and see if we were able to address them or at least hint at them and shed some light on them. The theoretical framework that I use for this work is John Roemer's approach to equality of opportunity. Now, the main idea behind this is that people's attainments depend on both efforts and circumstances. So what an individual is actually held accountable for will depend on these two.
Now, I know that this discussion is very philosophical, so it's very difficult to draw the line and to actually identify what an individual is held accountable for and what not. And I think this is one of the main drawbacks of the theory. However, let's just take this framework and see what we can do with it. Now, the whole idea is that Roemer puts forward a principle of levelling the playing field. So let's define a group of individuals with the same circumstances as a type. Roemer argues that we should be, as a society, compensating individuals that come from more disadvantaged backgrounds, so from very low circumstances backgrounds. In this sense, of course, for an individual what matters is the level of effort they put into their activities. However, for a society to compensate, what matters is the degree of effort. So let me just give an example to identify what Roemer is talking about. Let's say we have two different types of individuals in the society, type one and type two, and let's say I can actually measure effort on a scale from one to ten. So type one individuals present a distribution of effort that goes from, let's say, one to six with a median of three, and then we have type two individuals with a distribution of effort that goes from three to nine with a median of five. Now for the society to compensate, we cannot just look at the level of effort and say, okay, people that were actually giving a level of effort of five should be getting the same rewards. This is not correct. Why is this not correct? Because obviously for someone of type one, being well above the median effort within their type's distribution signifies a lot more effort than the same level does for someone of type two. So this is just a brief comment on why we are looking at degree and not level of effort. Now, as I said before, the limitation of the theory, first of all, is that it's a largely normative approach, and so it's very difficult to actually measure effort, measure circumstance and so on.
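As an illustration of the level-versus-degree distinction in the example above, here is a minimal sketch in Python. It treats the degree of effort as a person's rank within their own type's effort distribution, which is one common way to operationalise the idea and is an assumption here, not something stated in the talk. The effort scores are invented to match the ranges and medians quoted in the example.

```python
from bisect import bisect_right


def degree_of_effort(effort, type_efforts):
    """Share of the person's own type whose effort is at or below theirs (0 to 1)."""
    ranked = sorted(type_efforts)
    return bisect_right(ranked, effort) / len(ranked)


# Hypothetical effort scores on a 1-10 scale (illustrative only).
type_one = [1, 2, 2, 3, 3, 3, 4, 5, 6]  # ranges from 1 to 6, median 3
type_two = [3, 4, 4, 5, 5, 5, 6, 7, 9]  # ranges from 3 to 9, median 5

same_level = 5  # the same *level* of effort in both types
degree_in_type_one = degree_of_effort(same_level, type_one)
degree_in_type_two = degree_of_effort(same_level, type_two)

print(f"type one: level 5 sits at rank {degree_in_type_one:.2f} within the type")
print(f"type two: level 5 sits at rank {degree_in_type_two:.2f} within the type")
```

The same level of effort maps to a higher within-type rank for type one than for type two, which is the reason the argument compares degrees rather than levels.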
And obviously this is not a whole, complete theory of distributive justice, because it is not telling us anything about what the circumstances are that matter for society and how to compensate for them. So again I'm going to just take this framework as very normative, and then ideally with this work I'm just trying to identify what circumstances the data is showing me in the case of the UK. Now the methodology: for people that are not very acquainted with regression trees, it is just a type of supervised learning. It is supervised because I'm going to be inputting covariates to my algorithm, and the whole idea is to predict what the outcome for an individual is going to be. Now I'm not forecasting unobservables. What I'm doing is basically splitting my sample into the training and the testing data sets: I'm fitting my algorithm on the training set and then I'm predicting the outcomes in the testing set. Okay. So ideally I would like to divide my sample into non-overlapping groups, and this is the whole idea of regression trees.
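The train-and-test procedure just described can be sketched in a few lines of Python. This is a one-split "stump" on synthetic data, not the speaker's model: the squared-error split criterion, the covariate names `a` and `b`, the 70/30 split and every number below are assumptions chosen only to show the mechanics of fitting on one part of the sample and predicting on the other.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_binary_split(rows, covariates, target):
    """Pick the binary covariate whose two groups have the lowest total SSE."""
    best = None
    for c in covariates:
        left = [r[target] for r in rows if r[c] == 0]
        right = [r[target] for r in rows if r[c] == 1]
        if not left or not right:
            continue  # a split needs two non-empty, non-overlapping groups
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (c, total)
    return best


# Synthetic sample: covariate "a" shifts the outcome, "b" is pure noise.
rng = random.Random(0)
rows = []
for _ in range(200):
    a, b = rng.randint(0, 1), rng.randint(0, 1)
    rows.append({"a": a, "b": b, "y": 2.0 + 0.5 * a + rng.uniform(-0.1, 0.1)})

# Split the sample, fit on the training part, predict on the testing part.
rng.shuffle(rows)
train, test = rows[:140], rows[140:]
split_var, _ = best_binary_split(train, ["a", "b"], "y")
group_mean = {g: mean(r["y"] for r in train if r[split_var] == g) for g in (0, 1)}

stump_mse = mean((r["y"] - group_mean[r[split_var]]) ** 2 for r in test)
overall = mean(r["y"] for r in train)
baseline_mse = mean((r["y"] - overall) ** 2 for r in test)
print(split_var, round(stump_mse, 4), round(baseline_mse, 4))
```

Each leaf predicts the training mean of its group, so comparing the test error against a no-split baseline shows how much the partition explains out of sample.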
Now the way I can actually divide my sample and make the splits varies. One would typically look into minimizing the mean squared error. However, I'm going to be working with conditional inference trees, which are more statistical in nature, and if you have a look at the bottom of this slide, what I'm basically going to be doing is testing partial hypotheses of distributional independence. So I'm going to take a covariate, condition my outcome distribution on it, and test whether that conditional distribution is the same as the unconditional distribution of Y, that is, whether the outcome is independent of the covariate. Now the way I can test this varies as well, and I'm going to go back to this when I explain the algorithm to you later on. So for this work I used the Labour Force Survey, hence why I'm here presenting my work, and I used the third quarter, corresponding to July to September 2019. Why did I use 2019? That was the latest dataset that I had before Covid hit, and I didn't want Covid to get involved in my estimations. And I had to use the third quarter, as people that have worked with the Labour Force Survey should know, because the third quarter is where I have variables on social mobility. Now most of my work is following very closely a paper by Brunori et al. (2018); it's a working paper from the World Bank. So most of the things that I'm going to be doing here, all the trimming of the data and the restriction to working-age individuals from 30 to 59 years old, follow what they did in their paper. Now this is a much broader paper compared to mine. They made an analysis for many European countries, and they used a dataset called EU-SILC, which has homogeneous data for the European countries, and which I believe was fed by another survey called Family Resources in the UK. So instead, just focusing on the UK, I'm using the quarterly Labour Force Survey. The target variable that I'm going to use for my analysis is the average gross hourly pay, which is, as we know, calculated by the ONS from the responses of the individuals that actually complete
the survey. I'm going to be using 99.5% of the distribution, just to not deal with outliers, and I'm left with 6,853 observations. So even though the LFS has more than 80,000 observations, because of all the trimming and because I'm working with working-age individuals, I'm only left with roughly 7,000 observations, which is still a very good number. So I'm just going to move forward with the presentation. Now, this is how the distribution of hourly wages looks for the period that I analysed. What I did here was round them to two decimal places and then bin them in intervals of 50 pence. So if we have a look at this first interval here, I have people earning from 9 pounds to 9.50. Now, the distribution looks like something that we would expect: most observations are around the minimum wage, the median is just about the minimum wage, and then it's right-skewed, as again we would expect. So that's not the median; the dashed line is the mean. For the algorithm, because I didn't want to use this very spiky distribution, I transformed my target variable into logs, so all the analyses that we're going to see going forward are with logs of hourly wage. The covariates that I picked for the analysis were chosen both because other empirical works on inequality of opportunity have already used them and because of the theoretical approach that I had, which is Roemer's approach to inequality of opportunity. So I have gender, I have some demographics like nationality, ethnicity and religion, and then a very interesting one that I found for the UK, and I believe mine is the only empirical work working with this variable, which is the number of GCSEs a child takes at the age of 16. For those that are not very acquainted with the education system in the UK, GCSEs are exams that children take at the age of 16, for which they prepare for roughly two years: they pick them at the age of 14, they work for two years, and they sit the exams at the age of 16. I think GCSEs are English; in Scotland I
believe they're called National 5s, but they're still considered in my dataset under this variable. Now, five of them I believe are compulsory, so children in the whole country have to take at least five GCSEs; however, we have some children taking more than 10. Bear in mind that I'm working with the number of GCSEs, and not the performance on GCSEs, to capture circumstances. Again, why I picked GCSEs and not A levels is something that we can have a chat about at the end of the session, or you can drop me an email and we can discuss it as well. Then we have health condition as a variable, which is a long-lasting health condition, and then three different variables to capture social mobility: household composition, main earner, and occupation of the parent. I think it says parent, but actually it is the father. Now, unfortunately I didn't have education of the father, but occupation of the parent is what I have, so I'm going to be working with this. Four minutes. So this is the distribution of hourly wages in logs for males and females. The blue line is males, the orange one is females, and that's the difference that we have in means. I'm going to do the same analysis for the number of GCSEs: the orange line describes people that took fewer than five GCSEs at the age of 16, and the blue line people that took more than five. And this is the distribution of hourly wages decomposed by the occupation of the parent: if your father was a manager or professional you would be in the blue distribution, and if your father belonged to a different occupation you would be in the orange one. So essentially what I'm going to be testing here, statistically and in a recursive way, is whether these distributions are indeed different. So the algorithm works as follows. As we always do, we choose a significance level alpha and then the maximum depth of the tree, so how much splitting I would like my tree to have.
Following that, I will test the null hypothesis of distribution independence. I'm going to have this set omega, which is the set of all the possible realizations of my covariates X. So let's say my set of covariates is only gender and occupation of the father being professional or manager; then my set omega would be gender male, gender female, occupation of father manager, occupation of father professional. So I'm going to be testing distribution independence for every element in that big set omega. I'm going to take the p-values of those tests and then pick the element with the smallest one. Now, if that p-value is greater than the significance level, or if I've already reached the maximum depth of the tree, then I'm going to exit the algorithm. If not, I'm going to pick that variable as a splitting variable. The next step is checking where to draw the line using that splitting variable. So, for example, for occupation of the father: once I pick occupation of the father as a relevant covariate to explain the variance in hourly wage, I need to decide where that threshold is going to be drawn. Is it going to be drawn at managers? Is it going to be drawn at managers and professionals? Managers, professionals and technicians? And so on. That's what I do in step number four. If, again, that new p-value is greater than my significance level, then I'm going to dismiss the split and exit; otherwise I'm going to keep it, and I'm going to repeat the steps until I reach the maximum depth of the tree. Now, this is the opportunity structure that I found for the UK. Something that I forgot to mention, so let me quickly go back to step number two: the way I'm testing for, sorry, distribution independence varies as well. In one case I used t-testing, so testing for differences in means. In the second part of my paper I tried Kolmogorov, which goes beyond the mean and tests independence on the whole distribution. Now, the results that I'm going to be
presenting here are based on t-testing. So what's shown in ellipses are the variables that my algorithm picked as splitting variables, variables worth mentioning because they explain the variance in hourly wage. What's included in the branches of the tree is the threshold that my algorithm drew for the variables. In these cases it's just binary options, so it's either male or female (I'm running out of time); same with GCSEs, it's either more than 5 or fewer than 5. But for some variables I have different splitting thresholds. And then finally, the square boxes here are hourly wages, so the average hourly wage within that group, and the number of observations for that subgroup. So if we have a look at the opportunity structure in the UK, the first thing that we observe is that for those taking fewer than 5 GCSEs at the age of 16, long-lasting health conditions and nationality seem to matter. Interestingly, only for females, on this left-hand side of the tree, occupation of the father seems to be a relevant variable that explains the variance in their income. If we have a look at men, so here, men with long-lasting health conditions that took fewer than 5 GCSEs would be earning on average 15.15 pounds an hour, which is greater than 15.04, which is how much females that took more than 5 GCSEs at the age of 16 earn, so allegedly more educated. However, this group of females have parents that come from more disadvantaged backgrounds, because their professions are not managers, professionals or technicians. And then the most disadvantaged group in my sample would be non-British females that took fewer than 5 GCSEs at the age of 16. This subgroup is earning on average 9.49 pounds per hour, which is less than half of what the most privileged group in this sample is earning, which is 20.21, and these are men that took more than 5 GCSEs at the age of 16. Now, this is how the distribution of errors looks. What I did here was predicting, as I was telling you at the start
of this presentation: I predicted the outcomes for the testing sample and then compared them to the true values. This is how it looks: these are deviations of my predicted value from the actual value. Again, everything is in logs, but it seems to be around 0, so the differences are not that big. And finally, I'm going to wrap up because I'm running out of time. I think there's so much more to be done. I know there are so many endogeneity issues around this that I did not deal with, and so many fixed effects that I did not deal with. I think identifying the number of GCSEs and occupation of the father as circumstances is not trivial, especially in terms of social mobility. Now, with respect to the methodology, I think I had a fair out-of-sample accuracy of the estimations, given the sample size was roughly 7,000 and the non-comprehensive set of covariates in the data. I would have liked to have many more variables to predict with and to use in my algorithm; however, I didn't. But I think overall it was not that bad. I'm going to wrap it up here, because otherwise I'm going to run over too much. Thank you very much. I'm going to stop sharing. Okay, so thanks very much. See you all. Thank you very much, all of you, especially the speakers. Thank you. Okay, bye. Thank you. Bye.
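The splitting procedure described in the talk (choose alpha and a maximum depth, test each covariate, split on the smallest p-value, stop when nothing is significant) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the speaker's code: the function names, the 0/1 coding of covariates, the `min_leaf` parameter and the synthetic data are all hypothetical. It also merges the variable-selection and threshold steps into one scan, uses a large-sample normal approximation in place of the t-test mentioned in the talk, and applies no multiple-testing adjustment.

```python
import math
import random
from statistics import fmean, variance


def mean_diff_pvalue(a, b):
    """Two-sided p-value for equal means (normal approximation to Welch's t-test)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 1.0 if fmean(a) == fmean(b) else 0.0
    z = (fmean(a) - fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def best_split(rows, y, min_leaf):
    """Scan every covariate and threshold; return (p, name, threshold) with the smallest p."""
    best = None
    for name in rows[0]:
        # Candidate splits are "<= threshold" versus "> threshold" (assumes ordered codes).
        for threshold in sorted({r[name] for r in rows})[:-1]:
            left = [yi for r, yi in zip(rows, y) if r[name] <= threshold]
            right = [yi for r, yi in zip(rows, y) if r[name] > threshold]
            if min(len(left), len(right)) < min_leaf:
                continue
            p = mean_diff_pvalue(left, right)
            if best is None or p < best[0]:
                best = (p, name, threshold)
    return best


def grow_tree(rows, y, alpha=0.01, max_depth=3, min_leaf=30, depth=0):
    """Split recursively while the best test rejects at level alpha (min_leaf must be >= 2)."""
    leaf = {"n": len(y), "mean_log_wage": fmean(y)}
    if depth >= max_depth:
        return leaf
    split = best_split(rows, y, min_leaf)
    if split is None or split[0] > alpha:
        return leaf  # no significant split: stop here
    p, name, threshold = split
    node = {"split_on": name, "threshold": threshold, "p_value": p, "n": len(y)}
    for side, keep in (("left", lambda v: v <= threshold), ("right", lambda v: v > threshold)):
        idx = [i for i, r in enumerate(rows) if keep(r[name])]
        node[side] = grow_tree([rows[i] for i in idx], [y[i] for i in idx],
                               alpha, max_depth, min_leaf, depth + 1)
    return node


# Synthetic data, invented purely for illustration (not Labour Force Survey data).
random.seed(0)
rows, y = [], []
for _ in range(2000):
    r = {"female": random.randint(0, 1),
         "five_plus_gcses": random.randint(0, 1),
         "noise": random.randint(0, 1)}
    rows.append(r)
    y.append(2.5 + 0.30 * r["five_plus_gcses"] - 0.15 * r["female"] + random.gauss(0.0, 0.4))

tree = grow_tree(rows, y, alpha=0.01, max_depth=2)
print(tree.get("split_on"), tree.get("threshold"))
```

The result is a nested dictionary. Each leaf carries the subgroup size and the mean log wage, which corresponds to the information shown in the square boxes of the tree in the talk.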