Okay, so hello everyone, welcome to day two of this introduction to quantitative time diary analysis short course, or workshop, jointly organized by the UK Data Service and the Centre for Time Use Research. So UKDS is the training branch and it's based at the University of Manchester, and CTUR is based at University College London. My name is Pierre Walterie, I am a research associate at UKDS and also a research fellow at CTUR. So I am going to, hopefully, take you through some more exciting demonstrations of things that we can do with time diary research, time diary data, sorry. So, quick reminder: last week we covered three main topics. The first one was that I tried to sketch the origins and milestones of time diary research, or at least some of them. Then I presented, and spent some time showing, how time diary data and most diary surveys are structured, what the data structure and survey design are. And the last thing we did was to start putting our hands to work with the estimation of the duration of activities and the probability of engaging in activities using the MTUS, that is the Multinational Time Use Study. Okay, so what are we going to do today? Today, apart from this recap, I wanted to start with a second exercise similar to what we did last week, but with another type of data, looking at travel this time. Then we will look at a number of important issues, for any survey research really, but in particular for time diary research, which is data quality. So how can we diagnose whether we are working with good quality data, and if there are issues, how can we deal with them; and also inference and weighting. Then we will spend a significant amount of time looking at tempograms, which are a way of visualizing activities throughout the day. And then further topics will be about sketching how to use multivariate modelling techniques that some of you may already know, such as regression, with time diary data.
I will just cover this briefly due to time limitations. And the last part of today will be spent discussing and exchanging about opportunities for research with time diary data, maybe based on your own ideas and interests. Okay, so I will stop sharing just for a second. As Emma kindly reminded you, feel free to ask questions during the presentations in the Q&A box, and I will try to catch them as you put them in. But before I start and we delve into this practical, I wanted to ask if anyone had questions or comments based on last week's workshop, so feel free to switch on your camera and unmute yourself and ask. We are not that many today anyway, so I think we can manage this flexibly. Okay, now I am going to share in the chat the link to the Dropbox folder. I will start with a second example. So just in case it's not clear enough: if you have not downloaded the data yet, please download it from the link I've shared before. I'm going to go through and demonstrate some coding; the code for this demonstration is contained in the workbook HTML file, which is also in the Dropbox folder. If you open it with any browser, you'll be able to follow what I am going to demonstrate now. Feel free to shout for help if you have any problem while doing that. Okay, I've shared my screen again. So I thought it would be interesting to try to do the same type of computation as we did last week, but this time looking at another type of data, and maybe not just looking at the main activity variable, but also some background diary variables, as you're going to see. So, a reminder of how this works:
You can work alongside this demonstration by having RStudio open, copying and pasting the syntax from the HTML workbook into your own RStudio session, and then you will see it working for yourself on your computer. Obviously, as we have done before, you will need to change the working directory to the one you are using on your computer. Okay, so we start as we did last week, first by loading the necessary R packages: dplyr for data manipulation, alongside tidyr; ggplot2 for nice plots; and haven for importing SPSS and Stata datasets. There will be a couple more packages that we will need and load further down the line. Okay, so once we have set the working directory, we can, just like we did last week, load the episode file, which I chose to store in an object called ep. I also removed a couple of variables that we are not going to need, to keep the data frame as small and tidy as possible. And then the day-level data, which is contained in a separate dataset, the MTUS teaching .dta file. With these two data frames in memory we have all the data we need for today. As last week, we can also create the study variable. You could argue that I could merge everything and work with one dataset, which is possible, but I prefer to keep things separate, which as a consequence means that I need to create the study variable in both datasets, as I will use it later to merge some of the data. Okay, as I was suggesting, we are going to work with modes of travel, which is an interesting feature of time diary research, especially given the growing interest in, the current concerns about, sustainability, because we can follow the way people travel for different reasons. And we also have records of the mode of travel, the way they choose to travel when they do, whether it's by car, by walking or by public transport, for example. So, in the MTUS, and I'm going to show this quickly.
In the MTUS, travel is coded this way. If you look at the categories here, they range from main activity codes 62 to 68. The codebook presents these as if they were separate variables, but in the long-format MTUS that we are working with, they are simply codes: if the main variable is coded 62 it refers to this one, 63 to this one, and so on and so forth. Here we have the reasons for travel, whether it is travel for work, education-related, childcare-related, etc. I was also looking for a description of the modes of travel; it's not in this document I think, so I will show it a little bit later, but they usually include travel by car and travel by other means. This is the mtrav variable in the MTUS. Actually, if I do a variable search: so these are the modes of travel that are recorded in the MTUS. By car, truck, or anything that's basically an individual means of transport burning petrol; public transport; walking, cycling or another mode of travel requiring some physical activity; and then unspecified travel. So with that knowledge in mind, we are able to record the time people spend travelling per day. Here I chose to keep things simple by using a dichotomous variable: travel by car or similar versus any other means of transport, so including public transport and active travel. So I am first flagging car-related travel episodes as trc.t in the dataset, and travel by all other means as tro.t. The first part of the condition flags the travel episodes: main greater than or equal to 62 and smaller than or equal to 68, in both cases. Alongside that, mtrav is equal to 1 for cars, and mtrav is greater than 1 and smaller than or equal to 5 for all other means of transport, as we have seen in the codebook.
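The flagging step just described can be sketched like this in R, on a small invented episode data frame. The variable names main, mtrav and time follow the MTUS conventions mentioned above, but all the values here are made up for illustration:

```r
# Invented mini episode file: main = activity code, mtrav = mode of travel,
# time = episode duration in minutes (values are made up for illustration)
ep <- data.frame(
  main  = c(62, 63, 111, 64, 68),
  mtrav = c(1,  2,  -8,  4,  1)
)
ep$time <- c(20, 15, 480, 30, 10)

# Flag car travel (mtrav == 1) and other-mode travel (1 < mtrav <= 5)
# within travel episodes (62 <= main <= 68); non-matching episodes get 0
ep$trc.t <- ifelse(ep$main >= 62 & ep$main <= 68 & ep$mtrav == 1, ep$time, 0)
ep$tro.t <- ifelse(ep$main >= 62 & ep$main <= 68 &
                   ep$mtrav > 1 & ep$mtrav <= 5, ep$time, 0)

ep$trc.t  # 20  0  0  0 10
ep$tro.t  #  0 15  0 30  0
```

Note how the non-travel episode (main = 111) and the episode with a missing mode (mtrav = -8) both get a zero, so they contribute nothing to the daily totals computed next.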
And I'm just using the base R ifelse condition, which is the most straightforward way of specifying a condition in R. As we did last week, if the condition is met then the variable records the duration of the episode, and it is zero if the episode doesn't meet the condition for selection. So once we have done that, we can now compute the daily total for each of these two types of travel. I'm calling these two variables trc.d and tro.d, and they simply record the sums of the two variables we have just created. You will notice, if you're not familiar with R, that I'm using a grouped command here, because obviously the sum has to be computed for each diary, that is for each day within persons. Okay, so now I have these trc.d and tro.d variables and I want to have a look at them. Of course, you can ask for descriptive statistics, but it's also nice, for example, to look at a histogram. In R it's really simple to get a histogram of a variable: you just need to type hist followed by the name of the variable you want a histogram for. You will notice here, and maybe there is a link with the conversation that happened with one of you earlier, that since trc.d and tro.d, our daily total time spent travelling variables, are aggregate variables, we only need one row per day. And this is how I select one row per day, epnum being the episode number in the episode dataset. Also, given that there are a lot of participants who, as we already know, do not travel on a given day, I wanted to look at the distribution of the duration of travel just for people who travel, so this is why I also specify that the histogram will only apply to people who have travelled on the day. And freq = FALSE is just a way of asking for probabilities rather than numbers of observations.
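A minimal sketch of the grouped daily totals, the one-row-per-day selection and the histogram, again with invented data. In base R, ave() plays the role of the grouped sum; the workbook itself uses dplyr's group_by() and summarise(), but the logic is the same:

```r
# Invented episode-level flags for two diaries (id = diary identifier)
ep <- data.frame(
  id    = c(1, 1, 1, 2, 2),        # diary (person-day) identifier
  epnum = c(1, 2, 3, 1, 2),        # episode number within the diary
  trc.t = c(20, 0, 10, 0, 0),      # car travel episode durations
  tro.t = c(0, 15, 0, 30, 25)      # other-mode episode durations
)

# Daily totals: sum the episode flags within each diary
ep$trc.d <- ave(ep$trc.t, ep$id, FUN = sum)  # 30 for diary 1, 0 for diary 2
ep$tro.d <- ave(ep$tro.t, ep$id, FUN = sum)  # 15 for diary 1, 55 for diary 2

# Aggregate variables need only one row per diary: keep epnum == 1,
# and plot travellers only, as densities rather than counts
d <- ep[ep$epnum == 1, ]
hist(d$tro.d[d$tro.d > 0], freq = FALSE)
```

Because the daily total is repeated on every episode row of a diary, selecting epnum == 1 is what prevents diaries with many episodes from being counted several times.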
Here I don't spend time asking R to display a nice title for the histogram, because we're just interested in the data, but this gives a brief overview of the distribution of these variables. The first one is the duration of travel by car or similar, and the other one, I remind you, is the duration of travel by other means of transport. The vast majority, and it's maybe something you've heard before, the vast majority of such journeys are really short journeys: you can see here that they seem to involve durations of less than 50 minutes per day. So that is a first overview of our travel duration by mode of transport. Of course, we usually want to go a little bit further, and one of the things we may want to look at is summary statistics, or point estimates, such as the mean duration. I can very easily do that now that I have computed my daily totals: I can simply ask R to compute the mean for me, and that's what I get here. Again, in order to make sure that my results are not biased by the different numbers of episodes between people, between diaries, I only compute this mean keeping one line, one record, per day. So I end up with an estimate of about 14 minutes on average for car journeys, against 22 minutes on average for travel by other means of transport. I'm just moving my camera a bit here. Is there a problem? Yes; sorry, I didn't see your question earlier. If it's telling you that it cannot find study, it means that you probably did not run the code that created the study variable. It's at the top of the exercise here, so make sure that you have run these two lines, which create the study variable in your datasets; the way I read your error message is that R cannot find the study variable. Okay, so we now have these variables.
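The one-record-per-day logic behind the overall mean can be sketched like this (invented numbers; in the real exercise the totals come from the episode file):

```r
# Two diaries of three and two episodes: the daily total trc.d is repeated
# on every row of a diary, so averaging all rows would bias the estimate
d <- data.frame(
  epnum = c(1, 2, 3, 1, 2),
  trc.d = c(30, 30, 30, 0, 0)
)

mean(d$trc.d)                # 18: biased, diaries with more episodes weigh more
mean(d$trc.d[d$epnum == 1])  # 15: one record per diary, the estimate we want
```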
So now that we have these means, the next stage is to see if we can find interesting comparisons according to research questions we may be interested in, such as, for example: are there country differences in the amount of time that people spend travelling using different modes of travel? And you are probably not going to be surprised to discover that it is indeed the case. With the data that we have, it is really easy to compute such differences, using syntax that is very similar to the one we've used here. So I'm again starting from the episode file, again selecting only one record per day, and then I'm asking R to group by study, as some of you tried to do earlier, study being basically a combination of country and year, and to compute respectively the mean and the median of each duration. Why do we want both the mean and the median? Because, as you may know if you're a little bit familiar with statistical analysis, means tend to be sensitive, and sometimes over-sensitive, to extreme values. So having a couple of people in your sample who have a very long travel duration on a given day may skew your results a little bit too much, whereas the median, which is less sensitive to extreme values, gives a better overview of what's going on in the data. The table here shows what we get, and we get indeed interesting results. Focusing on the means here, we can see that, as you may have expected, the US is the country with the longest duration of travel by car or similar, and the country with the least amount of time spent travelling by means other than car. Spain is the country with the lowest time spent in cars, and the Netherlands the country with the largest amount of time spent on modes of transport other than the car. And of course, you may have heard about the number of people cycling in the Netherlands.
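The by-country means and medians can be sketched with base R's aggregate(); the study codes and durations below are invented, and the actual workbook uses dplyr's group_by() and summarise(), but the result has the same shape as the table being discussed:

```r
# One record per diary, with an invented study (country-year) variable
d <- data.frame(
  study = c("US", "US", "ES", "ES", "NL", "NL"),
  trc.d = c(40, 20, 10, 0, 15, 5),     # daily car travel, minutes
  tro.d = c(0, 10, 20, 40, 30, 50)     # daily other-mode travel, minutes
)

# Mean and median of each duration, per study
aggregate(cbind(trc.d, tro.d) ~ study, data = d,
          FUN = function(x) c(mean = mean(x), median = median(x)))
```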
Now, I think it's important here to go back to a topic we covered last week, which is that when we look at time diary data, we always have to make a decision as to whether we look at the full sample, which is what we have done here, or whether we only want to look at participants. The median here, with all these zeros, shows that there are lots of people who have not travelled on the diary day, especially travelled by means other than car. We could decide that, because that's our research interest, we only want to look at the people who did travel on the day. But if that were the case, then each column here would be computed on a different sample, so we would not really be comparing the same people. That would be the cost of such an approach. On the other hand, instead of having atypically small estimates of travel duration, we would have, in the same way as we did last week with the duration of time spent on paid work, values that are probably much closer to what most people actually spend on travel. Okay, any questions about this? Okay. In the same way as we compute durations, we can of course also compute probabilities. Once we have durations, it's always easy to compute probabilities, defined as having a duration greater than zero. This is what I do here in the next code chunk. Again starting from the episode file, and again keeping only one record per day, I'm grouping the data by study, and then I'm asking R to compute, and it's a little bit of a trick, the mean of the condition that the duration is greater than zero, which gives the probability. It happens that R can compute that all in one go, which relieves me from having to compute separate probability variables and then aggregate them. There are different ways of doing this: you could also ask for a cross-tab, for example, but I am doing it this way today.
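The "little trick" works because R treats a logical as 0/1, so the mean of the condition is exactly the share of days with any travel. A sketch with invented data:

```r
# One record per diary; probability of travelling = mean of (duration > 0)
d <- data.frame(
  study = c("US", "US", "US", "US", "ES", "ES"),
  trc.d = c(40, 20, 30, 0, 10, 0)
)

tapply(d$trc.d > 0, d$study, mean)  # ES 0.50, US 0.75
```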
Okay, so similarly as before, we have a nice table that allows us to compare the probability of using each of the modes of transport on the diary day. And again, you won't be surprised to see that the US is the country in the sample with the largest probability of people reporting having travelled by car on any day, a 0.6 probability, and that compares to a country like Spain, where the probability is significantly smaller, 0.42 per day. And if we look at the other means of transport, similarly, in the US you are very unlikely to be travelling by anything other than a car, and in other countries, especially again Spain, you are much more likely to. Of course, we are not just talking here about using other, organized, modes of transport; it's also about walking. There is more to this than just the hypotheses that I can make on the fly with data aggregated for sometimes really large countries; proper significance analysis needs to go beyond eyeballing such data, but still, it's interesting to compare differences at that level. Now, before moving on to another topic, and I think this is a good way of covering again the issue of merging data, we can look at person-level information, according to which we can draw comparisons. I thought it would be interesting to look at gender differences in this amount, or probability, of travel by mode of transport. In order to do that, I need to merge together the duration variables I have created with individual-level information. There are different ways of doing this: you can either create a new dataset... Yes, there's a question. Okay, so going back to this: these are our probabilities. So here, as I was saying, there are different ways of merging data. We can, as I've done here, add to the day-level data, sorry, the data that I had opened separately before.
You can add the aggregate time estimates computed in the episode dataset to the day-level data, or one could also decide to put everything into a new data frame. Oh yes, actually this is what I did here: I didn't add it to the day-level data, I created a new dataset rather than adding the variables to the existing one. That's the nice thing about R: you can have as many data frames as your computer's memory allows for. Anyway, I am indeed merging my computed durations here. You will see that I'm only selecting from the episode dataset the things that I really need: the identification variables, study as computed previously, household identification, person identification, diary identification, and then the two daily totals that I computed. The first four variables are only going to be used for the purpose of matching with the day-level data. And to keep things simple here, I am only keeping the observations for which there is a match; if for some reason there's something that doesn't match, I am asking R to drop it, which is something you need to think about. You need to decide whether it serves your research purpose when you do this with your own analysis. As we did last week, I created a clearer gender variable here. And once this is done, I can, this time using two grouping variables, study as before and gender, produce the mean duration of travel by mode of transport and by gender. Just out of curiosity, I also created a ratio variable, which measures the duration of travel by other means divided by the duration of travel by car. This ratio will be greater than one as soon as the duration of travel by other means is greater than or equal to the duration of travel by car. Okay, so the table here shows the results, and as before we have interesting contrasts. Would any one of you like to comment, instead of me doing all the talking? What can you see from this table?
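The merge and ratio steps just described can be sketched in base R with merge(), which by default keeps matched records only. The identifiers and values here are invented stand-ins; the real code matches on the study, household, person and diary identifiers:

```r
# Invented day-level totals computed from the episode file
agg <- data.frame(id    = 1:4,
                  trc.d = c(30, 0, 20, 10),   # car travel, minutes
                  tro.d = c(0, 40, 10, 30))   # other-mode travel, minutes
# Invented day-level file with a sex variable (1 = man, 2 = woman)
day <- data.frame(id = 1:4, study = "UK", sex = c(1, 2, 1, 2))

m <- merge(day, agg, by = "id")   # inner join: unmatched rows are dropped
m$gender <- factor(m$sex, levels = 1:2, labels = c("Man", "Woman"))

# Mean durations by study and gender, plus the other-to-car ratio
res <- aggregate(cbind(trc.d, tro.d) ~ study + gender, data = m, FUN = mean)
res$ratio <- res$tro.d / res$trc.d
res
```

With dplyr, the same join would be left_join() or inner_join() depending on whether unmatched day records should be kept; the base merge() above mirrors the "keep only matches" choice made in the demonstration.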
Okay, so one of the things we can see in the table is something that has been documented in the literature looking at gender differences in paid work, and gender in general, especially in Western countries, which is that women tend to spend less time than men travelling. More often than men, women have family responsibilities and look after children, and as a result either do not have paid work, or if they do, have part-time work, especially in countries like the UK; and even if they have full-time paid work, they tend to look for jobs that are nearer to home so that they can still come back easily to look after children coming home from school, if need be. That is one of the ways researchers explain the shorter duration of time spent commuting, but not just commuting, by women. So that's one of the interesting things we can see in these results. I didn't look at the male to female ratio within country, which would be something interesting to look at, so you could do it for yourself, to see whether the level of gender differences is broadly similar between countries, or whether some countries are more unequal in terms of the duration of men's and women's travelling time. The last column, the ratio, confirms things that we've seen before. The US is the country with the lowest ratio, which means that the mean time spent travelling by car relative to other means is by far the largest, hence the smaller ratio. There is almost no country where the duration of car travel is smaller than that of non-car travel; the only exception is women in Spain. The ratio of non-car versus car journeys in Spain tends to be largest for journeys carried out by women.
So this is really just an illustration of how time use data can be utilised to document interesting research having to do with travel, sustainability and gender differences. This is quite interesting, because with these car timings you can't really tell if it's people in their cars travelling long distances, or if they're in their cars stuck in traffic, or can you? Just curious, because it might actually be that even though the other modes of transport are used less... I don't believe it, because with public transport you still have to wait, but it could be that it's actually a more efficient way of travelling, because you have your bus lanes or your undergrounds or something else. But probably not, maybe not. I think that there are certainly many, or at least several, ways in which this rough analysis could be refined a little bit. So I don't think it's possible to look at this directly, but first of all, if you look at commute versus non-commute, that could be a first way of looking at usual travel versus maybe less usual travel; and, assuming, okay, not all unusual travel is long travel, but travel for leisure, that is not for shopping, may be a way of trying to capture longer journeys. And then, yes, there is also the possibility of looking at the proportion of long journeys, so you could look at the proportion of someone's long journeys that are done by car rather than by other means of transport. So there are ways of playing a little bit with the data, but it is true that, formally speaking, you can't determine whether people are stuck in traffic. Although, actually, yes, what you can look at is the time of the travel. Again, it's not a perfect instrument, but say a person reports a long travel episode between five and six in the afternoon: then it's more reasonable to infer that this could be a work commute, and maybe more reasonable to assume that there has been some congestion involved. But of course it's
speculation, a little bit. I am starting again with this homage to Salvador Dalí, and the way that time always escapes, a little bit, our attempts at measuring it. Yes, I need to cover now a number of topics that have to do with the way we do statistical inference with time diary data, given its specific nature. Okay, so, a recap. Some of you, or maybe all of you, may have come across this at some point or other in your career, but I'm just restating it here to make sure that we are all on the same wavelength. When we do data analysis with survey data, it is very often because we would like to find out about a population, a specific population, from which the sample we are working with is drawn. Without going into all the statistical theory that underpins this, one of the main reasons samples, certain samples, allow us to infer things about a population is that they were collected at random and are large enough, that is, they are made of a large enough number of individuals or units. However, we also know that most samples, if not all, have some degree of bias. The bias comes from the fact that not everyone is equally likely to be part of the sample, as in, to have been sampled: there are groups that are less likely than others to be selected for sampling. And even if that were not an issue, we also know that some types of people, for example younger men, are less likely to take part in surveys, either because they are not interested or because they have a lifestyle that makes it more difficult to actually contact them; contrast that with older people, who have a more regular lifestyle, for example. And of course this is just one example among other groups that are less likely to be sampled. Okay, so there are all sorts of techniques that are used by statisticians and survey designers to try and alleviate these issues: stratification, disproportionate sampling and clustering are some of these tools. So when
one computes estimates from samples, one has to take account of this. Even if it's often not shown, because for teaching purposes we want to go to the essential and not into all the details of estimation, when working with survey data, at the very least, all estimates should use weights. What are weights? Weights are variables that allow us to compensate for the fact that some people in the sample, people of certain groups, were less likely to have been selected, whereas others belong to groups that were more likely to be selected. So weights, in a way, allow estimation to give more importance to people who we know are underrepresented, and less to those who we know are overrepresented. But that's only one part of it, because weights allow us to compute more representative point estimates, single values of characteristics one is interested in, such as a mean, for example; but we also want to know about the uncertainty, the precision, of our estimates, and this is why the best estimation techniques we can use try to take into account not only the weights but also survey design characteristics. I'm not going to go into the details of this, it's just a reminder, and of course doing this is not always possible, so in most cases people will just use weights when computing estimates. So this was for social surveys in general; now, what happens when we are trying to conduct inference with time use surveys? And this is where I am going back to the question that was asked by Elena, I think, I hope I'm not getting the name wrong, Elena, sorry. First of all, we need to think about the nature of our population. In a typical social survey, the population of reference tends to be either people or households; here, what we are actually sampling is days of people. This is so because in most surveys we collect several days per person, and even in the case of one day per person, we are still collecting a sample of days. Therefore
we are working with a population of days. Now, there are immediately a number of issues that potentially arise from that. The first one is: when were the days collected? In the case of high quality time use surveys, such as the UK 2014-15 one, the survey was conducted during a full calendar year, between 2014 and 2015, which means that in theory the days that were collected are representative of the days of British people, British people defined as aged 8 and over, because this was the population of reference that survey was designed for; so the days of British people during that period, during that year. And even more specifically, two such days were collected per person, so these were representative of weekdays on the one hand and weekend days on the other. Now, that's interesting in theory, but there are a number of caveats. One caveat, I haven't put it in the slides, is the fact that not all surveys have enough means, enough funding, to conduct a study over a period of a full year. There are many cases where the time use survey has been conducted over a single month, for example, or alternatively where only one day per person has been collected. All of this has an impact on what claims can be made about the representativeness of the data. If your time use data has been collected in November, it means that in theory the inferences you can make are only valid for what happens in November. These sorts of caveats always need to be taken on board when conducting analysis with time diary data, and it's even more so when we are working with a comparative database such as the MTUS, which is made of a very large number of sometimes very different studies with different designs. So it's important not to become too enthusiastic too quickly; it's important to curb our enthusiasm if we find interesting results with the MTUS, just to make sure that we are not comparing apples with pears, depending on the survey designs. However, it remains that if the right
conditions are met, and if we are working with data that is good enough, we can still make claims about having results that are representative of the way people spend their time, as in the days that were spent on some activities, for a certain population or during a certain period, as in historical time. Okay, so that's for the principle; but in practice, in the same way as one has non-response for traditional socio-demographic characteristics, we also know that people do not actually necessarily fill in their time diary on the day that was allocated to them. If everyone did it, and therefore if everyone filled in their diary on the day randomly allocated by the study, then one could reasonably expect a relatively balanced distribution of days of the week, and of months in the case of a year-long survey. But that's not the case, because for all sorts of reasons people do not necessarily fill in their time diary on the allocated day, and other reasons may also play a role. Still, with a view to being able to make inferences about a given time period in a way that is robust, we want to have estimates that give equal weight to every day of the week, and every week of the period of reference. This is why time use survey designers compute an extra layer of weights, which are diary weights. We have seen that in social surveys, weights are designed as a way of compensating for non-response at the individual level, or unequal probabilities of selection; but here we have a diary-level weight that takes into account the fact that we want, ideally, to be able to equalize the distribution of days, and if possible months, in our data. And with that in mind, I will show in a moment how this translates; once these weights are applied, alongside the first-level weights that I mentioned, one is able to make more robust inferences about the duration or probability of activities for a given population. So, this last point I have already made here: the inferences we
make with time diary data are limited to the fieldwork duration, the time of the year during which fieldwork has taken place. But the main issue with time diary data, I would say one of its main limitations, is not so much, in my view, the fact that there can be a limitation in the time of the year to which we can infer; it has more to do with the fact that within-person variation is limited. As you can imagine, it would be very costly to ask people to fill in a time diary for more than a couple of days. This is why most surveys settle for two days per respondent; some go as far as seven days per respondent, and this is the case of our Dutch sample here in the MTUS data, but they hardly ever go any further than that. Which means that if we are interested in studying within-person variation in the time spent on activities, or in the probability of activities, then things can become a little bit trickier. With time diary data it is possible to compute estimates, for example, of the amount of time people spend on physical activity at the aggregate level, at the population level, with the caveats that I have highlighted; but if one is interested in studying what makes someone engage in physical activity on a certain day rather than another, that's usually very tricky to conduct with standard time use data. You can of course, if you have funding, collect your own data, in which you collect more days, possibly on a smaller sample, but standard time diary data would limit your ability to do these types of things. Okay, any questions in relation to this? Okay. So another topic that we need to cover quickly before going back to R is another issue related to data quality. It's not uncommon to find problems related to the quality of time diaries. Of course it varies very much from one study to the other: certain studies have very good standards of completion and checking of the data, others less so, and it's sometimes up to you as a
researcher to check that a number of things are as they should be. I am going to go through a number of common parameters through which quality has been assessed in time diary data, in particular in the MTUS. In the MTUS, it was decided that in order to be labelled as good quality, diaries needed to contain at least 7 episodes. Diaries from people who, for some reason, did not record at least 7 episodes are considered bad diaries, and there is a flag in the data that allows these respondents, or at least their diaries, to be excluded, because it is likely that they did not properly record what they were doing during their day. Another component of what makes a bad diary, according to the MTUS definition, is the extent to which, in addition to having few episodes, some key activities are missing. Of course, it happens to anyone to skip meals, or even to have sleepless nights; but if this comes together with a small number of episodes, and perhaps other indications, then it may be a sign that the person who filled in the diary did not do it seriously, and it is best not to include that diary in the analysis. To put this in context, and I think I have mentioned this before, 15 episodes per day is the average we observe people fill in, with some people recording many more than that. So if your numbers of episodes begin to drop below 10, that is a sign that you need to question the quality of your data. I will also mention, although we won't have time to demonstrate it, the case of incoherence in the data. You can have enough episodes, and the key episodes can be present, but the data may still not add up properly. There may be a hole, by which we mean that the end time of an episode does not coincide with the beginning time of the next one; or the time at which the diary begins may not be properly recorded; or there may be overlapping episodes, which can also happen. Diagnosing this usually means manually looking at the data and trying to work out what went wrong; it is a little bit of an art rather than something you solve purely with code.

In terms of what can be done to check the quality of the data, one of the first things to do is to count the number of episodes within each diary; then to check that the total amount of time reported for activities adds up to 1440, which, I don't need to remind you, is the total number of minutes in a day; and finally to check that the beginning time of a given episode is identical to the end time of the previous one, for each series of episodes on a given day. But as I've just said, apart from these few checks, the rest of the checking unfortunately needs to be done case by case. From my experience of time diary research, there are often diaries with issues, but not that many of them; the vast majority of diaries in secondary datasets tend to be of overall good quality. Are there any questions before I switch to the demonstration?

Okay, so I am going to demonstrate some code in relation to what I have just presented; back to RStudio and back to our MTUS sample. As I may or may not have hinted at, producing robust estimates with survey data in general is a whole topic in itself, and I don't have time to cover it here. My main recommendation, and I would say the same for any survey, is: if you are using R and trying to do inferential analysis, please refer to the survey package, which has the most comprehensive set of functions for robust inference from survey data. I won't have time to demonstrate analysis with the survey package here.
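For those who want a concrete starting point, here is a minimal sketch of what a survey-package workflow might look like. The tiny data frame, the weight name propwt and the work_mins variable are invented for illustration; with real MTUS data you would plug in your own extract and, where known, its design variables such as strata and clusters.

```r
# Hedged sketch: design-based weighted estimation with the survey package.
# 'mtus', 'propwt' and 'work_mins' are made-up stand-ins for a real extract.
library(survey)

mtus <- data.frame(
  propwt    = c(0.8, 1.2, 1.0, 1.1),  # diary-level weight
  work_mins = c(480, 0, 300, 420)     # minutes of paid work per diary
)

# Declare the design: independent sampling here (ids = ~1), weights = propwt.
des <- svydesign(ids = ~1, weights = ~propwt, data = mtus)

# Weighted mean with a design-based standard error:
svymean(~work_mins, des)
```

The point estimate matches an ordinary weighted mean; the gain from svymean is the design-based standard error that plain weighted functions do not give you.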
You have enough information from this course, though: if you have conducted that type of inferential analysis with survey data before, you can apply it to time diary data, or to aggregated time diary estimates. So please use the survey package if you can. If you can't, then, as I'm sure you're already aware, there are a number of R-specific ways of weighting estimates, but they come with the usual warnings about the validity of your standard error and confidence interval estimates. Two common functions, for example, are wtd.mean and wtd.var from the Hmisc package.

I wanted to show with real data how diary weights work in practice, and to do that I will use the MTUS, and then an example from another dataset as well. To start with the dataset I shared with you, we can easily look at the distribution of diary days; obviously, given that we know these were separate studies, looking at country or at study would give the same result. I am using the xtabs function here rather than simply table because it allows using weights, in a limited way. The first cross-tab shows that we do not have a balanced distribution of days, except for the Netherlands, which from the start had an equal proportion of diaries for each day. That may be because this is, if I'm correct, a 7-day sample: people were asked to fill in the diary for a full week, and it may be that the depositors retained only those who completed the whole week, not those who failed to complete their diary every day. Don't forget also that this uses US day coding, so day one and day seven are respectively Sunday and Saturday; and you won't be surprised to hear that a larger proportion of people fill in the diary at the weekend rather than during the busy week.

With this information in mind, we need to find a solution. In the MTUS, the variable that is used for weighting is called PROPWT. It is not just a diary weight: it is a weighting variable that incorporates the original sampling and non-response weights from each study, to which a diary-level layer has been added. To see how it works, we simply reproduce our xtabs command, and the trick for weighting data with xtabs in R is to specify the weighting variable on the left-hand side of the formula: so here PROPWT is on the left-hand side, with the two variables, country and day, on the right-hand side. I get the same table as before, but you can see that now all days have an almost identical proportion, with a tiny number of exceptions; in all five surveys the distribution of days is now similar to the one we previously had only for the Netherlands.

That is our first stage, but in an ideal world we would also like something similar for months of the year, and this is where we hit a limitation that is due to data collection. As the next table shows, there is only a limited number of studies for which data was collected throughout the year: in practice, only the French, UK and US data were collected over a full calendar year; the Dutch data was only collected in October of that year, and the Spanish data was collected every three months. As you can imagine, collecting continuous data across the year is an expensive undertaking. In a year-long time use study, you can also have weights that equalise the probability of selection of months within the survey. Here I am demonstrating some code that, unfortunately, you won't be able to reproduce with the workshop files, because it uses the 2014-15 UK time use data; to be fair, you will be able to reproduce it if you have access to the UK Data Service data, which is not necessarily very difficult to get if you are affiliated with a higher education institution, whether in the UK or abroad.
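To make the left-hand-side trick concrete, here is a small hedged sketch on invented data: three over-represented Sunday diaries are down-weighted so that each day ends up with an equal share. The toy data frame and weight values are made up; in the MTUS you would use PROPWT and the real day variable.

```r
# Hedged sketch of how a diary weight rebalances the day distribution.
# 'toy' is invented: Sundays are over-represented, the weights compensate.
toy <- data.frame(
  day    = c("Sun", "Sun", "Sun", "Mon", "Tue"),
  propwt = c(0.5,   0.5,   0.5,   1.5,   1.5)
)

unweighted <- prop.table(xtabs(~ day, data = toy))        # simple counts
weighted   <- prop.table(xtabs(propwt ~ day, data = toy)) # weight on the LHS

round(unweighted, 2)  # Sundays dominate (0.6)
round(weighted, 2)    # even shares (about 0.33 each) after weighting
```

The only change between the two tables is moving the weighting variable to the left-hand side of the xtabs formula, exactly as in the demonstration.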
What I am demonstrating here is what the weight does. First I look at the unweighted diary day variable, that is, the day on which the diary was completed, before and after weighting; you can see that the result is similar to what we saw before with PROPWT in the MTUS. Then I do the same with month of the year, and you can see that in the original data a few months, October and November, were overrepresented. After applying the weight, we have a fairly balanced distribution of months of the year. With that type of weight applied, and supported of course by the fact that the data was indeed collected throughout the year, we can then make inferences about the time spent on activities in a given year, for the population of that country. Any questions? The message is: whenever you are computing estimates from time diary data with a view to making claims about the population, you definitely need to include these weighting variables in your commands, PROPWT in the case of the MTUS.

Okay, now I will quickly go through the practical side of episode quality. Just a reminder of what a series of episodes looks like in a long-format dataset: we have country, survey year, household number, person number, diary number, then the episode number, here running from 1 to 19, the duration of the episode in minutes, the start and end times, and the activity, coded according to the MTUS categories. Why am I showing you this? Because this is the first way of looking at whether something is wrong with a diary: one of the things we want to be true is for the ending time of an episode to be identical to the beginning time of the next one. So if there is something wrong in your data, or if you have identified a problematic diary, one of the things to check is whether these times match each other, and whether there is something you can do about it. Sometimes it is easy: it can be that people were a bit optimistic, gave too many minutes to, say, the penultimate activity, and then found, when filling in the last line, that they did not have enough time left, so they left it as it was; that is something you can correct, or impute, by hand.

Then there is some code here to check the number of episodes per person per day. We follow the same logic as before: we create a variable, which we call maxep, that gives the maximum of the epnum variable, which is simply the index, or episode count, within each diary. If I ask for a table of this, you can see that there are indeed a few very problematic diaries: 39 diaries with just one episode, which is clearly a problem, and a few with a really large number of episodes. Another way of visualising the distribution of episodes across diaries is, again, a histogram: you just ask for maxep, still with one record per day, and that gives you an idea of what it looks like; you can see that we are not far from the 15 episodes per day I was referring to earlier, maybe even a little more. What if I want to identify the problematic diaries? A quick and dirty way of doing this in R is to ask for a table of diaries for which maxep is smaller than 7, the kind of compact command one can easily write in R, and I have added prop.table here to get a proportion. You can see that, fortunately, only a tiny proportion of the sample, 0.005, falls below the threshold, which is good; and looking at the other side of things, diaries with 15 episodes or more make up 83%, which sounds reassuring.
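The routine checks described here can be sketched roughly as follows, on an invented long-format episode file. The variable names (diary, epnum, start, end) are assumptions standing in for whatever your extract uses, and times are counted in minutes from the diary start.

```r
# Hedged sketch of three routine diary-quality checks on toy episode data.
library(dplyr)

ep <- data.frame(
  diary = c(1, 1, 1, 2, 2),
  epnum = c(1, 2, 3, 1, 2),
  start = c(0, 480, 960, 0, 700),
  end   = c(480, 960, 1440, 700, 1440)
)

checks <- ep %>%
  group_by(diary) %>%
  summarise(
    n_episodes = max(epnum),                            # enough episodes? (MTUS flags < 7)
    total_mins = sum(end - start),                      # should add up to 1440
    gaps       = sum(lead(start) != end, na.rm = TRUE)  # each end = next start?
  )
checks
```

On this toy file both diaries pass the 1440-minute and no-gaps checks, while the episode count would flag both as suspiciously short under the 7-episode rule.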
You can also look at diary quality per country, and I am not trying to show off here with the UK, but it is true that, according to that definition, the UK sample has the best quality, as opposed to the US and French data. Okay, so that's it for the data quality topic. Of course, as you do your own research you will develop your own syntax, your own way of working through these issues, but I have tried here to identify the main things to think about when working with this data.

So now that we have covered that, not boring but necessary, I would like to go back to the analysis of data, and more specifically I wanted to talk about tempograms. Sorry for this mishap; I am going to full screen. So, tempograms. As we have seen, there are different ways of describing activities and episodes. The simplest are point estimates: durations of activities, or probabilities of engaging in activities. But after a while we may want to look at things that are a little more complex, or that give a little more information than means and probabilities, however interesting those may be. This is why, while still being descriptive analysis, the idea of mapping the activities of a given population throughout the typical day becomes of interest. The principle of tempograms, of such a mapping, is simply to ask: what is the proportion of people, or rather of diaries, in each activity at each time of the day? What are people reporting in their diaries at 10 o'clock on a typical day? This also lets us look at how activities tend to be sequenced, or ordered, throughout the day, and that is what we are trying to do in a visual way with tempograms. So this plot here is a shorthand way of answering the question: what were men doing on a typical weekday in the UK in 2014-15? We have already seen such a plot before. We have on the x-axis the time of the day, starting from 4 a.m. and running until 3:59 a.m. the next day; on the y-axis the proportion of people; and then the proportion of people in each of these 8 activities. In pink we have sleep and self-care; I apologise, by the way, if my description of colours is not entirely accurate, because I am colour-blind, but I hope this remains understandable. We have paid work here, in violet leisure activities, and reproductive work, or caring, in darker green. So if we look at this, what can we say? We can say that men tend to sleep, unsurprisingly, until 6 a.m. for the vast majority of them, and that most start to sleep again from 11 p.m. or midnight onwards, with a small proportion reporting sleep or self-care during the day. It is not just sleeping, I should not get carried away with sleep: people also spend time on self-care and on breaks, for example around lunchtime and, unsurprisingly, after the working day from 4:30 onwards; the big drop in paid work really occurs from 7 or 8 p.m. onwards. It is of course interesting to see the amount of time paid work takes during the day, especially for men: the big chunk of paid work occurs in the morning and lasts until 8 p.m.
And when they are not doing paid work, men tend to be engaged in reproductive work or shopping. Reproductive work is basically looking after the household, what is usually called household chores, and you won't be surprised to hear that on weekdays it tends to take place towards the end of the afternoon and in the evening. Of course, it can be interesting to look at more complex ways of dividing one's day, and you can produce your own recoding to suit your research interests, but I was showing this as a way of illustrating the logic of a tempogram.

There are different ways of computing tempograms; the most intuitively straightforward, which I will demonstrate, is computing them from time slot data. A time slot dataset is one in which you have a record for each of the 144 ten-minute slots in a day, so what we in fact plot is the proportion of people in each activity for each of these 144 slots. You won't be surprised to hear that the way to create a tempogram bears similarities with what we have done before: first we decide on the definition of the activities we are interested in; then we recode the episodes into these categories; then we compute, for each slot, the proportion of diaries in each activity group; and finally we plot the result using area plots. Fortunately, in R we have advanced plotting functions, in the ggplot2 package, that make our life easier. We can then do this by, for example, gender and day of the week, and that is what you see here: on the left-hand side we have the plot I showed earlier, men on weekdays, and now we can compare weekdays and weekends. We can see the immediate difference in the larger area occupied by leisure activities throughout the day; that is clearly one of the main stories when you compare weekdays and weekends for men: more leisure, more reproductive work, and still some, but much less, paid work.

And of course we can see that the differences between genders are also really marked: the area for paid work is always larger for men than for women, even on weekdays, and that reflects the fact that women do more reproductive work, are less engaged in paid work, and more often work part-time, since we are in the UK. The fact that women do more reproductive work and less paid work suggests at the same time that the total amount of work they do is no smaller, and maybe even larger, since there is more caring as well; as a result they have less, or at least not more, leisure time than men, or at least they do not have a higher probability than men of engaging in leisure activities, and that remains true at the weekend. It is interesting to compare here the area representing the probability for women to engage in... oops, what have I done; I think I jumped a slide; yes, the probability of engaging in reproductive work here. I just wanted to demonstrate the type of contrast we can look at.

So what we are going to do now is compute something similar, but looking at differences between countries. I am going back to RStudio, or rather back to my workbook. I am asking those of you who are a little less familiar with R to bear with me: if you feel a little lost, you will have the opportunity to reproduce this by yourself after the session. In order to produce output that I felt would be interesting, I had to use syntax that is a little more complex than what we have done so far. To compute the data necessary for time slots, we need at some point to convert our episode data into a time slot dataset, and to do that we will need a special function, uncount from the tidyr package, which duplicates rows according to a variable. In a sense, we are going to ask R, later in the code, to duplicate the rows of each episode according to its duration.
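As a rough illustration of what uncount does, here is a toy example; the column names and the ten-minute resolution mirror the description above, but the data is invented.

```r
# Hedged sketch of tidyr::uncount on toy episode data: each episode row
# is repeated once per 10-minute slot it covers.
library(dplyr)
library(tidyr)

ep <- data.frame(
  diary    = c(1, 1),
  activity = c("sleep", "paid work"),
  duration = c(30, 20)  # minutes
)

slots <- ep %>%
  mutate(n_slots = duration / 10) %>%  # 10-minute resolution -> slot count
  uncount(n_slots)                     # duplicate each row n_slots times

nrow(slots)  # 5 rows: 3 sleep slots + 2 paid-work slots
```

An episode of exactly ten minutes, one unit at the diary's resolution, stays as a single row; longer episodes get one row per slot, which is what turns an episode file into a slot file.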
So the first thing I do here is check the quality of my data. Why am I doing that? Because the first time I simply tried to create tempograms with the five countries, the plots looked really twisted, and I had to look carefully into the data to find what was wrong; it turned out that some country data had issues. I also know from the start that, for technical reasons, we can't plot the Dutch and the US data with the same syntax, because the sample designs are too different. So we are left with the UK, France and Spain, and I am going to look at activities similar to those I showed you for weekdays. The first thing I do is check the number of episodes, that is, the maximum episode number, the total amount of time, and whether it indeed adds up to 1440 minutes per day. In order to make sure I am not altering my initial data, I am also creating a new dataset with the episode data, which is an easy precaution to take. The other thing I do is merge in, from the day-level data, the weighting variable PROPWT, which the episode data does not include, because I want my tempograms to be computed with weighted values. So I now have a dataset that contains the quality-check variables I created plus the PROPWT variable, and I can check the data quality. I can see that there is an issue with the minep variable: not all diaries start with episode number one. For real-world research I would advise you to look in depth into why that is, but for the purpose of this exercise I will just delete them, especially given that they are a really small proportion, a little less small in the UK, but still less than 1% for the two other countries. Then there are the diaries whose activities do not add up to 1440 minutes per day, which particularly matters for tempograms. We need to deal with those too, and again I would advise you to look into what is going on with the data if possible, even if it takes a significant amount of time; but here, for the purpose of this exercise, I will just drop them. It is important to keep in mind that the proportion of such diaries is not small: between 37% in France and 16% of the diaries in the UK.

Once we have done these checks, we can go to the next stage and create the third dataset. The last thing we need before we proceed is a slot count variable, which is basically the episode duration divided by 10, because we need a number of ten-minute slots. I am creating the slot dataset from the episode one, dropping the bad-quality diaries I defined earlier, and this is where I use the uncount function, which duplicates the rows of each episode. Episodes that last one slot, that is one unit at the diary's ten-minute resolution, won't be duplicated; only episodes longer than ten minutes will. So I now have this slot file that I can start looking at. If you want a nice, neat display, you will need to convert some of the data into a plain data frame, so that the haven attributes do not show up, but I am asking you to trust me that the data is as it should be. What has obviously changed is that, as in any time slot dataset, you now have 144 lines for every single diary in the dataset; your dataset has become much bigger, which is also a reason why time slot datasets are not so commonly used. A quick quality check: are all the diaries made of 144 records? Yes they are, which is good, and that is indeed the case for all countries. So we have data that is hopefully good enough, and we can start working on the tempogram.
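That final completeness check can be sketched in a couple of lines; the slot data frame here is invented, standing in for the slot dataset just created.

```r
# Hedged sketch: verify that every diary contributes exactly 144
# ten-minute slot records (toy data with two complete diaries).
library(dplyr)

slot <- data.frame(diary = rep(c(1, 2), each = 144))

check <- slot %>% count(diary)
check                            # one row per diary with its record count
stopifnot(all(check$n == 144))   # all diaries complete
```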
Here is how I did it. First of all, I created a label vector simply containing the descriptions of the activities people were engaged in, similar to those in the plots I showed earlier: sleep and self-care, work and schooling, reproductive work, and so on. Next, I recode the activity variable in the data so that I have the matching categories; you could do it by hand, specifying the character strings directly instead of using a predefined label vector, but it would be a little messier. I can then define this variable as a factor and reorder its levels so that they follow the order from the earlier plot. Why do I use a factor? Because it makes it easier to define an arbitrary order; otherwise, by default, the categories always end up sorted alphabetically by R.

Now for the bulk of the computation. As I said, I chose to go for something a little more complex than a single tempogram, so that we get interesting results: I used a loop so that I could compute, in a single operation, the tempograms for each of the three countries in the sample. In a sense, what I am doing is the following. I am creating a list, this special and very handy type of R object that can store all sorts of things, including data frames; this list, which I call tmpr, will be used to store the three sets of tempogram data we are going to create. The first level of the loop is across countries, simply defined by the values of the country variable; within it, I create a matrix in which the tempogram data will be stored; and then I iterate across the lines of the diaries, asking, for each of the slots ranging from 1 to 144, to compute the proportion of diaries reporting each activity. The engine, the core of this computation, is something we have done before: it is not strictly speaking a cross-tab, but I am using the xtabs function because it allows for easy weighting. It is just a one-way frequency table of the 8-category activity variable created earlier, weighted by PROPWT, converted into proportions with prop.table, and then rounded so that it is more human-readable; and I compute this for each line of the time slot data, for each country. As you may imagine, this is computationally intensive, so depending on how powerful your computer is it may take a little while. I am pretty sure there are other ways of doing this, and if you have suggestions for more efficient code I am more than happy to take them into account, but I found that doing it this way shows well what is actually going on under the hood. Once this has been computed, the right names are allocated to the right summary values, and each of the 144 sets of estimates is numbered post hoc.

That is the first part, where we compute the sets of proportions of activities engaged in, per time slot and per country. Now we need to go to the next stage, which is about plotting this. You will notice that, although I was not strictly obliged to do it this way, I have one loop across countries for computing and another loop across countries for plotting; I could have kept plotting within the same loop, but I didn't want to make things too complicated. If we want to use ggplot, we need to convert our data, which has in the meantime become wide format, because we have lines of proportions for the variables of interest, into long format, using the melt function from the reshape2 package. This way we will have data that is immediately usable by ggplot.
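The core of the loop, a weighted one-way frequency table turned into proportions, can be sketched on toy data for a single slot; the activity labels and weights here are invented, and the real code repeats this for all 144 slots and each country.

```r
# Hedged sketch of the loop's engine: for one time slot, the weighted
# proportion of diaries in each activity, via xtabs + prop.table.
slot10 <- data.frame(                 # diaries observed at one slot
  activity = c("sleep", "sleep", "paid work", "leisure"),
  propwt   = c(1.2, 0.8, 1.0, 1.0)
)

p <- round(prop.table(xtabs(propwt ~ activity, data = slot10)), 2)
p  # weighted shares summing to 1
```

Storing one such vector per slot, stacked into a 144-row matrix, gives exactly the per-country tempogram data described above.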
How do I do this? I am creating a new object, but this is a temporary object: it is overwritten at each step of the loop, and it stores the matrix converted to long format, using time as the id variable. I rename the second variable for easier handling later on. Then comes the trickier bit: this line here is about making sure that the plot represents the proper time of the day. We know that the diary day starts at 4 a.m., so the time values, the way we computed them, do not start at zero, and times that run past midnight need to be recoded accordingly. The next line is probably the scariest-looking of all, but it is actually simply converting the time expressed in minutes into hours and minutes, so that we have a more intelligible x-axis for people who are used to clock time. Finally, I make sure that the factor delivers the proper labels for the activities; in other words, the proportions of the activity variable created earlier will be displayed with the labels defined before. That is the first stage of the plotting: we have prepared the data, in a tmp.g object.

Now for the plotting proper. I am creating the plot using the ggplot function. You may or may not be familiar with ggplot; it has to be thought of as a series of layers that you add successively to a plot object, each one bringing it closer to the shape you want it to have. The first stage is the data, and then the basic characteristics, or in ggplot jargon the aesthetics, of the plot: I specify that x will be the time variable, in other words my time slots, and y the variable called value, which is the default name for the variable created by the melt function; I could have spent time changing that name, but I haven't. Second stage: this is where I tell ggplot that on top of this I would like an area plot, so I add geom_area, and I need to specify in the aesthetics that the categories of the activity variable will be the basis for colouring the areas, stacked, meaning of course that they are put on top of each other. The rest is a series of options that make the graph clearer: strictly speaking they are not indispensable for the plot to be correct, but they make it much easier to read, such as labelling the x and y axes, setting the title, and adjusting font and panel background characteristics. You will notice that, if you want, instead of having the plots show up immediately, you can save them using the ggsave command, as a PNG file for example.

So let me recap before looking at the results. What have we done? We first created a time slot dataset, in which we recoded the activity variable into a number of categories of interest. We then computed, for each slot, the proportion of diaries falling into each category, ending up with a summary value per time slot, activity and country; and I did that by looping across the lines of the diaries and across countries. In the second stage I plotted these, again looping across countries, by converting the results I had stored in a list into a long dataset, cleaning up the units, especially the time units, so that they match what is actually measured in the data (starting at 4 a.m., expressed in hours and minutes rather than just minutes), and then plotting, for each country, these melted, long-format data, with time on the x-axis, the proportion of diaries in each activity on the y-axis, and colours separating the categories. And this is the result: we have three plots, as one could expect.
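Putting the plotting stage together on toy data, a minimal stacked-area sketch might look like this; the three slots, the two activities and the output file name are invented, and the real plot uses 144 slots and the full label set.

```r
# Hedged sketch of the plotting stage: a stacked area plot of toy
# long-format proportions over three time slots.
library(ggplot2)

tmp <- data.frame(
  time     = rep(1:3, each = 2),
  activity = rep(c("sleep", "awake"), 3),
  value    = c(0.9, 0.1, 0.5, 0.5, 0.1, 0.9)  # shares sum to 1 per slot
)

g <- ggplot(tmp, aes(x = time, y = value)) +
  geom_area(aes(fill = activity)) +              # stacked areas, one per activity
  labs(x = "Time of day", y = "Proportion of diaries")

ggsave("tempogram_demo.png", g, width = 6, height = 4)  # save instead of displaying
```

geom_area stacks by default, which is why the shares for each slot must sum to one: the coloured bands then fill the whole vertical space, as in the tempograms shown.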
plot here is very similar to the one I showed during my presentation and I'm surprisingly so because this is the data, the time program for men and women before it was just men during one week days in the UK so that's the UK data and now we can compare the UK data with French and Spanish data and it's actually quite interesting but if I'm just going to compare French and Spanish data there is quite a number of differences so would anyone of you be prepared to comment on these differences big lunch breaks for the French and not only it is big but actually it's interesting what do you mean by big well they stop working with the UK it's kind of a small I don't know it seems like the time that they are breaking from work in the UK but actually I'm not sure now that's correct yes exactly that's interesting so what it shows is that there are more reports of lunch breaks in French than in the UK definitely but when you look at the width, the temporal width we don't measure duration so we can't really make inference about duration but it's roughly happening during it's not that the French are having lunch break at any point it's just that they seem to have much more of them or many more of them seems to have their lunch break at the same time dinner break so that's from Sadaf yes there's a dinner break so I suppose there are two ways of looking at it we can also scroll up to the Spanish data which is kind of not as radical as the French one but still closer maybe to that which here the story may have is also the fact that there's a it's not my field but it's something I've heard reported in research the fact that there's a gradual disappearance of traditional meal in the UK by comparison with continental Europe and in that way the UK would tend to follow a little bit more the lead of the US so that's clearly one of the interesting difference between the two countries another one of course is maybe the time at which people or how people organize their leisure there seems to 
There also seems to be, maybe as a consequence of sharing meals, some reproductive work going on more intensively in the French than in the British data, and leisure time is organised in different ways. Of course it would also be really interesting, although here I did not want to overcomplicate things, to look at gender differences between these countries; I'm pretty sure that would make interesting research as well. And there's always a balance to strike: the more specific the group you look at, the more interesting the results potentially are, but also the fewer observations you have and the more imprecise your estimates become. So it's a balancing exercise that you have to settle for yourself, depending on what your research interest is. That's maybe one way of mapping people's activities during the day. I'm going to ask if there are any questions or comments, apart from how the French have their meals; I need to clarify that I'm not French. Any comments or questions? Yes, I am fully aware that it's not easy code; the only thing I can tell you is that if you try it again and again, you're going to see it gets easier. Maybe I wanted to seize the opportunity before I move to the last slides: are any of you engaged in, or interested in, time diary research? What is your research interest? Have you already worked with time diary data, or are you planning to, and on what kind of topic? A couple of things quickly, just in passing; I probably need to share my screen again. Okay, back to me. I'm just going to say a few words about this; you can go through the slides by yourself. Just to mention, for those of you who are interested in the area of paid work, some time use surveys, including the UK ones, have been collecting data about what they call work schedules. A work schedule is
a separate, rather rudimentary time diary in which people are only asked to report whether or not they were doing paid work. The specific feature is that it no longer covers just two days: it covers a full seven-day week. So it's a kind of compromise, minimising respondent fatigue by only asking for a full diary on two days, while still collecting some data about the structure of people's working life over seven days. And of course, if you're interested in paid work, that's a great instrument, because it provides a full photograph of someone's work week, in a much more precise and better way than if you were, as is traditionally done, simply to ask questions such as what is the duration of your work week, or do you sometimes work out of hours, et cetera. Here we have an instrument that in theory offers researchers the possibility to study a real work week. So that's how it was recorded; sorry, the image quality is not great, but it gives you an idea. To minimise respondent fatigue, people were just asked to draw a line over each work episode on each day, and this was then coded separately. What you see is the very basic way of plotting such a work schedule; obviously it becomes more interesting once you start looking at gender or other types of differences, but it's already a great and potentially interesting instrument to look at. Okay, the main two things I wanted to mention here are basically about the next steps of time diary analysis, and, for your records, if there is anything specific you want to ask me, don't hesitate to send me an email and I can provide a more specific answer. So of course, what we have done so far still falls under the label of descriptive analysis: we have simply described characteristics of our sample, and made some timid attempts at inferring univariate or bivariate characteristics of the population.
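As an aside, the basic work-schedule plot described a moment ago could be sketched along these lines. The data here are simulated and every name is hypothetical; the idea is just one horizontal stripe per respondent, with dark cells marking the hours reported as paid work.

```r
# A very basic plot of simulated work-schedule grids; all names are
# hypothetical. Each row is one respondent, each column one hour across
# a 7-day week (168 cells); TRUE means "doing paid work" in that hour.
set.seed(2)
n_people <- 30
week <- matrix(FALSE, nrow = n_people, ncol = 7 * 24)
for (p in seq_len(n_people)) {
  start <- sample(7:10, 1)                 # workday start varies by person
  for (day in 0:4)                         # Monday to Friday only
    week[p, day * 24 + start:(start + 7)] <- TRUE
}
# One stripe per respondent, dark cells = working:
image(x = seq_len(ncol(week)), y = seq_len(nrow(week)), z = t(week * 1),
      col = c("grey95", "navy"), xlab = "Hour of the week",
      ylab = "Respondent", useRaster = TRUE)
abline(v = (1:6) * 24, lty = 2)            # day boundaries
```

With real work-schedule data the matrix would come from the coded episodes rather than being simulated, but the plotting step is the same.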
But a lot of us social science researchers want to go a little bit beyond that and do some multivariate analysis, so I wanted to make a few comments about what applies, and what to look out for, when doing multivariate analysis with time diary data. As you may already be aware, the two easiest ways of thinking about outcome variables from a time use perspective are either the durations we computed before, that is, having the daily duration of a given activity as the outcome variable, or having the probability of engaging in an activity on a given day as the outcome variable. And if you are familiar with the regression or GLM framework, you can see how this translates: a linear model in the case of durations, or a logit or probit model in the case of the probability of engaging in an activity. I would encourage you to keep a number of things in mind, such as the fact that the distribution of durations may have a very different shape depending on the activity. It's really worth checking whether the activity you are planning to model can actually be approximated by a normal distribution; if it can't, other regression techniques, such as Poisson regression for example, may be better. An issue you will have to deal with, which is partly related, is what you do with the zeros. If you are modelling sleep, or time spent eating, most people on most days will report engaging in these activities. However, once you start looking at activities that are a little less common, and that starts with paid work, you are going to find that your data contain a non-negligible number of zeros. That has both a substantive and a methodological impact. The substantive question is how you take this non-participation into account in your theoretical framework; from a technical point of view, of course, a bimodal distribution with a spike of zeros at one end will affect your regression model.
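To make this concrete, here is a minimal sketch of both modelling routes on simulated diary-level data. Every variable name is invented, not taken from any real survey, and it assumes the sandwich and lmtest packages are installed for the cluster-robust standard errors that come up in a moment.

```r
# A sketch of the two modelling routes on simulated diary-level data;
# every variable name here is invented, not taken from any real survey.
library(sandwich)   # vcovCL(): cluster-robust (sandwich) variance estimators
library(lmtest)     # coeftest(): coefficient tests with a custom vcov
set.seed(3)
n_people <- 500
d <- data.frame(
  pid    = rep(seq_len(n_people), each = 2),      # two diary days per person
  female = rep(rbinom(n_people, 1, 0.5), each = 2)
)
# Daily minutes of paid work, censored at zero so the data contain
# a non-negligible share of zeros (the issue discussed above):
d$work_min <- pmax(0, 200 - 80 * d$female + rnorm(nrow(d), sd = 150))
d$any_work <- as.integer(d$work_min > 0)

# 1. Duration as outcome: a linear model of daily minutes of paid work
m_dur <- lm(work_min ~ female, data = d)

# 2. Participation as outcome: a logit model of doing any paid work that day
m_par <- glm(any_work ~ female, data = d, family = binomial(link = "logit"))

# Diaries are nested within people, so cluster the standard errors by person:
coeftest(m_dur, vcov = vcovCL(m_dur, cluster = ~ pid))
```

The same `vcovCL` call works for the glm model; swapping `family = poisson` into `glm` gives the Poisson alternative mentioned above for skewed durations.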
There are different ways of dealing with that, and I'm not going to go through them all. If there are economists among you: selection models are one of them, in which you model separately, as part of a single framework, the probability of participation and the amount conditional on participation; but you can also simply decide to look only at participants. Another issue is the clustering of observations. As we have seen, the basic unit of observation of time diary data is the diary. Depending on what you're looking at, and depending on the survey design, you may or may not have several diary observations per person. If you're looking at sleep, or eating, again activities that are carried out on most days, you will end up, in the case of the UK data, with two observations per person most of the time, which means that your standard errors are going to be clustered within people. What to do about that? Again there are different ways, and it's probably always best to start with the simplest one and then explore whether it suits your purpose; the simplest way, in the example I've shown, would be simply to compute models with robust standard errors. Another thing worth checking is the actual number of observations for which you really have two relevant diaries per person. Looking at paid work again: even if in theory you have two diary days per person, most people do not work both on a weekday and at the weekend; some do, but their number may be negligible with regard to the whole sample size, so the issue may be ignorable. It's just something to investigate and to keep in the back of your mind: potentially there's an issue there with the clustering of observations, and of course if we're working with seven diaries it becomes even more important. So, how to fit a linear regression model in R? I'm assuming that you have come across the lm or glm functions from the stats package already, and there's a way
of computing sandwich estimators for the standard errors (and sorry for the typo on the slide) when doing the modelling. So that's maybe a first broad overview of the multivariate analysis landscape. It is very easy to compute models of some of the variables we computed here: for example, an lm or glm model with the total amount of time spent working on a day as the dependent variable and gender, say, as an independent variable. Okay, so that's a quick word about regression. Now, another research avenue that has been pursued by some researchers is to try and create typologies; sociologists like creating typologies of behaviour. One of the common tools for building typologies relies on a combination of sequence analysis and cluster analysis, and the work schedules I demonstrated earlier provide an example of that. If you define sequences of activities and decide to group them into clusters using cluster analysis, say the sequences you are interested in are sequences of paid work versus anything else, or paid work versus leisure versus reproductive work, then you end up with a series of sequences that you can group into clusters, and these clusters can then become outcome variables for further analysis. There's an old but still interesting paper by Lesnard and Kan, I think from 2011, in which they did precisely that with UK work-schedule data; I believe there's been a recent update to their paper, and there's an R package that does precisely this kind of analysis. So, I realise that we are running out of time, and we could keep on talking for a long time. First of all, are there any questions before we conclude for today? I will stay a couple more minutes, but please help us design future events by filling in the evaluation form that I think Emma has shared in the chat, before you leave. And anyway, thank you for taking part in this workshop; I really enjoyed interacting with you.
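For those who want to try the sequence-and-cluster route, here is a minimal sketch using TraMineR, one R package that supports this kind of sequence analysis. The sequences are simulated and all state codes and object names are made up, so treat it as a template rather than a recipe.

```r
# A sketch of the sequence-then-cluster approach on simulated schedules;
# state codes and object names are made up for illustration.
library(TraMineR)
set.seed(4)
states <- c("work", "leisure", "repro")  # paid work / leisure / reproductive
# 100 respondents, one state per hourly slot over a 24-hour day:
seqs <- as.data.frame(t(replicate(100, {
  s <- sample(states, 24, replace = TRUE, prob = c(0.2, 0.5, 0.3))
  s[9:17] <- sample(c("work", "leisure"), 1)  # a contiguous daytime block
  s
})), stringsAsFactors = FALSE)
sq <- seqdef(seqs, alphabet = states, states = states)  # sequence object
# Pairwise optimal-matching distances with a constant substitution cost:
dmat <- seqdist(sq, method = "OM", indel = 1, sm = "CONSTANT")
# Group the sequences into clusters; the clusters can then be used as
# outcome (or explanatory) variables in later models:
cl <- cutree(hclust(as.dist(dmat), method = "ward.D2"), k = 3)
table(cl)
```

The two-stage logic is the point: first a distance between whole days (or weeks), then a clustering of those distances into a typology.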
and I hope this will serve you well for the rest of your research. Feel free to be in touch if you have specific questions that I haven't covered in the workshop, but I will stay a couple more minutes now, for those of you who have the time, if you want to ask any extra questions or make extra comments.