It's two minutes past two, so we can probably make a start. Okay, so hello everyone. Welcome to day one of this Introduction to Quantitative Time Diary Analysis workshop, which is jointly organized by the Centre for Time Use Research and the UK Data Service. My name is Pierre Walthery. I am a research fellow at the Centre for Time Use Research and a research associate at the UK Data Service. We are going to go through a number of things today, but the first thing I wanted to check with you is that you have signed the licensing agreement, so that you can use the MTUS data that we are going to use for the practical exercises later on. So if you haven't, please do it now. I am going to share the link again in the chat. If I can find it, yes. So I am sharing this again. So that is the form that you need to fill in to be able to use the data for the exercises. Again, if you haven't filled in that form, please fill it in now so that you can legally use the Multinational Time Use Study data for the exercises. So the course material will consist of presentation slides that I will show and comment on this afternoon. And then when we reach the practical, you will be able to download some data, which I will share maybe during the first break today, data that you can then try to work with on your own computer. There will be a workbook and, more generally, the course material available on GitHub. I will share the link as well during the break, in about 45 minutes, so that you can download what you need onto your computer. Okay, so that's it for the practical aspects of what we are going to do today. So now I am going to go a little bit more in depth into what we will be doing today. Okay, so as you are aware, this is a two-day course. So today is, unfortunately, maybe the more boring of the two days, because this is the day during which I am going to talk the most.
It's unavoidable that some background and some of the milestones of time use research are part of what is being told to you today. But we will promptly move to more practical hands-on sessions in which you can discover and play with time diary data yourselves. Today, as I said, I will probably spend the first session mapping out the origins and milestones of time diary research, and then we'll have a short break. And then I will talk more specifically about the structure and design of time diary data as it is available in most common time diary surveys. And then in the last session, or maybe already during the second session if I spend less time on it, we will start computing simple time use estimates. Next week, we will mostly spend our time on the laptops, computing slightly more advanced data visualizations, tempograms, looking at specific time diaries that are called work schedules, and also examining issues around data quality and weighting. And if we have the time, we'll also attempt some multivariate modeling using time diary data. So this course is targeted at intermediate users. So we will be using R for the demonstrations, as well as assuming some basic knowledge of statistical concepts. However, I won't have the time to explain how to use R. So if you're not familiar with R, bear in mind that you will have the time, maybe after the session, to try to reproduce what I will be demonstrating. However, the syntax that I'm using is certainly not very difficult. Okay, so that's all for this starting point. Now I will jump immediately into the first presentation. So I am assuming that everyone can see this properly. Give a shout if you have issues seeing the slides. If you have questions about what I'm going to talk about during the presentation, please type them in the Q&A space of Zoom. And I will try and answer them as I see them. Great. Okay.
So to start with time diary or time use studies, we probably need a reminder that humility is something quite important. Time, by comparison with other phenomena we are trying to measure and capture with quantitative social science, is maybe the one that lends itself the least easily to such quantification. And this image is of course inspired by Salvador Dalí. Okay. So I am basically going to talk about three things here. First, a brief history of time use studies. Then I will present the main features of time use surveys and time diaries as used in most quantitative studies. And then I will show, as an example, the typical kind of analysis that we can do with time use data. Okay. So we'll start maybe with a big question. What is time? I'm sure that physicists have their own answer to that question, but from the point of view of social science, there haven't actually been that many attempts at theorizing time. One of the most famous people who has studied time from a sociological theory point of view is Barbara Adam. And she has proposed that time is, or should be considered as, implicit knowledge. It's something that we all know about, but we actually struggle a little bit to define or explain what precisely we mean, understand, feel, experience. However, she proposed two kinds of approaches to time. The first one is what she calls delimited time. Basically, that is a time that we can measure and divide, or that we have, in the course of the recent history of humanity, increasingly measured and divided by clocks, calendars and other devices. Delimited time is, at least with the rise of Western civilization, increasingly seen as universal and following invariable rhythms. So each day has the same length, and months are the same everywhere.
And there is also the idea that time is a limited and often commodified resource, especially of course in capitalist societies, in which speed and optimization, doing the most with the time we have, are increasingly valorized. On the other hand, there's also what she calls contextual time. And contextual time is basically the time that we experience in a more implicit way, without really being able to measure it as precisely as we do with a clock. And it has different aspects. So there's the idea that there's a natural dimension to it. We all experience seasons, we all experience planetary cycles and we are all subjected to them. There's a direction, which is always the same and which is irreversible. And also, contextual time is not unique. There are different experiences of it depending on who we are. Of course, we can see different cultural experiences of time across societies, but also between other beings. Even if you don't think much of trees, think of the experience of time from the point of view of a tree as opposed to that of a butterfly, for example. So all these beings have different experiences of time. We are not going to talk much about trees today, but it was just a way to put into perspective what we are trying to work with here. Needless to say, time use research is mainly, or only, preoccupied with delimited time, even if it's only one aspect of time. Okay, so how has time been studied in sociology? So time has been studied either as an intrinsic object of study, time for the sake of it, so to speak, which is probably not the most common way you may have encountered it if you have a social science background. So there's been some work in social theory looking, for example, at the acceleration of time in modern societies, or at social times and social rhythms. What are the times we share with others, and what is the trend in our societies with shared time? Is there more or less of such shared social time?
And also, maybe looking at the classics of sociology, there's also been a discussion of the role of measuring time as part of the ongoing rationalization process of modern society. I can see that someone is asking for the slides, so I could probably share the GitHub page where they are. I'm going to share it in the chat. Okay, so you should see the two files that I'm using for the slides on this GitHub repository if you want to see them now. Okay, I'm going back to the presentation. Okay, so as I said, there is time as an intrinsic object of study, and then, and this is where we are going to spend a little bit more time, there is time as a tool to investigate other social issues. In a way, time as an instrument. We study time as a way to get a better understanding of other social issues. You are, I'm sure, aware of what I mean here. So a broad area in which a lot of effort has been spent looking at time is productivity. Organizational studies: how can we maximize the production or productivity of workers? One of the precursors of such a study of time was, in the 1920s, the scientific organization of labor. So people who were trying to time every single gesture workers were making in large-scale manufacturing factories. There are more subtle ways of looking at that nowadays, but this is still a preoccupation across some areas of organizational studies. Time is also a way to look at inequality. So there's the issue of time poverty: people who are able to buy other people's time, who themselves may not have enough time to do the essential reproductive work in their lives. And also an area that has been written about a lot in relation to time is the gender division of labor within the household. And we'll definitely come back to this later. But for a long time, these attempts or interests were limited to anecdotal or limited evidence, because of a lack of systematic large-scale empirical data.
So what we can see, and it's partly mirroring what I've just said, is that there was a growing interest in the systematic study of the way people spend their time over the course of the 20th century. So I will refer you to the book written by Gershuny maybe later on, if you want to go into the details, but just a few milestones. So in the early 20th century, one of the first systematic time diary studies that can be identified was one looking at peasant households in Russia, in which people were already trying to understand how farmers were spending their time in order to help modernize agricultural production. Similar studies, but with another scope, were carried out by the Fabians in London, with a view to understanding how women in poor households, who were seen as mostly responsible for the household at the time, were spending their time, and how maybe things could be improved for them by amending the way they spent their time. Further down the road, similar studies were carried out by Soviet economists, at the same time as the scientific organization of labor was progressing in capitalist societies. The US Department of Agriculture, the USDA, spent some time looking at how farm workers spent their time, actually around the time of the Great Depression, and also, later, educated women. And so this was the idea of understanding changes in gender roles in the household. In a more market research type of approach, the BBC conducted listener surveys as early, as I understand, as the 1930s. The objective was obviously to see how people would spend their time in order to best tweak and tailor the programs that were being produced. And not to mention the pioneering Mass-Observation studies that were carried out in the UK, in which a large portion of the population was asked to fill in a diary of what they did during a single day.
So these are just a series of examples, anecdotal maybe, but showing that there's been an increase in interest in time diaries, in the way people spend their time, over the course of the 20th century. So as I've already said, organizations, governments and corporations did that in order to better understand non-monetary economic productivity, to better understand and control labor force behavior, sometimes also to feed into central planning of the economy in the case of socialist economies, but it was not just limited to socialist or communist economies. Also to understand consumer behavior, and to investigate social issues, as was the case with the Fabians. And if we really want to go back to the founding fathers of sociology, we could see that as part of the continued rationalization process: controlling, improving, optimizing modern societies. Okay, but now enough of sociological theory, let's look at the empirical studies as they developed over the course of the 20th century. So the really important milestone in the development of time diary studies was Alexander Szalai's The Use of Time. Some of you may have heard of it. It was really a pioneering study conducted in the 1960s. You have to imagine, it was in the middle of the Cold War, but still this researcher, who was actually a Hungarian mathematician, decided to set up a consortium of research teams in 12 countries on both sides of the Iron Curtain, with a view to studying the way people in mostly urban households spent their time. It is pioneering because of this large-scale multinational approach, and it is also pioneering because this is how the now broadly followed structure of time diaries was invented. So time diaries in which people are asked to record what activity they do at each time point, where they carry out the activity and with whom, over a 24-hour period.
So that was a study carried out in the 1960s, and the book, The Use of Time, was published in the early 1970s. The second important milestone in the study of time use was the creation in the 1980s of the Multinational Time Use Study by Jonathan Gershuny, who is still co-director of the Centre for Time Use Research. And to this day it remains the main source of harmonized time use data. This basically comprises 55 years' worth of studies, about a million time diaries in 30 countries and 70 surveys. I'll come back to this in a little while. But it's by far not the only comparative study that is around. There is the United Nations' ICATUS, which is growing in importance and recognition. And there's the Harmonised European Time Use Survey. Each one has a slightly different purpose. I'm going to come back to this in a moment. And then apart from these multinational efforts to design harmonized comparative data, you also have a lot of single-country time diary studies that are regularly conducted. The main or the largest ones are probably those from countries such as the US, with the American Time Use Survey, or India, with the Indian Time Use Survey. But there are lots of other countries in which time diary studies are conducted. I may be able to show a few things here. So that's the website of the Multinational Time Use Study, just to show you if you're interested. So that gives you an idea of the countries that are covered by the harmonized MTUS data. And a large chunk of these studies are readily downloadable if you're interested. Another one, I think I had the... Here's the... That's not what I wanted to show. Yes. So that's the page of HETUS. So HETUS is basically the European Union's version of a harmonized time use survey. It's a series of definitions and common norms that EU member states have to follow when producing time diary data.
And I think there's a requirement for EU member states to produce time use studies at least every 10 years, which increasingly gives rise to an interesting repository of comparative data. OK. So I've just shown the web page of the MTUS. So yeah, this is just a focus on the most important facts about it. I'm not going to spend more time on that. So now, OK, we've had a broad overview of what kinds of time use studies have been conducted so far. But what do time use surveys actually look like? And to talk about this I will take the example of the 2015 UK Time Use Survey, because it's a relatively good quality study and also it's closer to us. So the 2015 UK Time Use Survey, whose data you can download freely from the UK Data Service, is made up of about 16,000 diary days produced by 10,000 respondents in 4,000 households in the UK. So everyone in the sampled households who was aged eight and above could fill in the survey. In order to have data that is representative of the whole year, the survey was conducted throughout the whole of that year, so not just over a couple of weeks or months. And as a way to get a first understanding of what time diary surveys are about, they are basically made up of two components. On the one hand, there's an individual or person-level survey, in which the traditional questions people are asked in surveys are also put to respondents, such as what is your age, what's your job, et cetera. And on the other hand, there is indeed a time diary. And for the time diary, people were asked to fill in, at intervals as fine as 10 minutes, what they were doing over two 24-hour periods. That means one weekend day and one weekday, both of which were randomly allocated. So from the point of view of the time diary, the unit of observation is therefore not the person anymore, but the day. So this 10-minute time slot is also called the resolution of a time diary. Not all time diaries have the same resolution.
Some are less precise than others. So there are time diaries that have a 15-minute resolution, for example. So now, what is recorded in time diaries? So the first thing that's being recorded is, as I'm sure you are already aware, what people are actually doing. So what their activity is. And usually most time diaries ask about the main and the secondary activity. So what are you doing? But if you are doing something else at the same time, what are you also doing? Main and secondary activity. I'm eating dinner and watching TV. Or I am talking to my child at the same time as stroking the cat. The second set of information is about the context of the activity. So where was it taking place? Was it at home, at work, somewhere else? Indoors, outdoors. Another important piece of information that is common to almost all time use surveys is co-presence. Who else was present while the respondent was conducting this activity? So was it family? And if so, who? Children. People from outside the household. Or even coworkers. So these four elements, main activity, secondary activity, location and co-presence, are really the core information that is provided in most time use surveys. And any unique combination of these is, by convention, called an episode. Each time a unique combination of these changes, you move to another episode. I will show that a little bit later. Back in 2015, the UK Time Use Survey innovated with a number of features, by also recording whether activities were conducted while using a device. So it was not just were you also looking at a device while doing the activity, but were you conducting the activity through a device? This was the time when one was beginning to really look at the way we were using such devices as mobile phones, for example. And then another important kind of data that was collected was immediate well-being, something that was developed by Kahneman among others. So a measure of enjoyment.
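As a minimal sketch of the episode notion described above: a new episode starts whenever the combination of main activity, secondary activity, location and co-presence changes from one time slot to the next. The course itself uses R; Python is used here purely for illustration, and the slot records, field names and the `to_episodes` helper are invented for the example rather than taken from any survey file.

```python
from itertools import groupby

SLOT_MINUTES = 10  # assumed diary resolution

# Hypothetical slot-level records for part of one diary, one tuple per slot:
# (main activity, secondary activity, location, co-presence)
slots = [
    ("watching TV", "eating", "home", "alone"),
    ("watching TV", "eating", "home", "alone"),
    ("watching TV", "eating", "home", "child"),  # co-presence changes
    ("watching TV", None, "home", "child"),      # secondary activity changes
    ("washing up", None, "home", "child"),       # main activity changes
    ("washing up", None, "home", "child"),
]

def to_episodes(slots, slot_minutes=SLOT_MINUTES):
    """Collapse runs of identical consecutive slots into episodes."""
    episodes = []
    start = 0
    for combo, run in groupby(slots):
        duration = len(list(run)) * slot_minutes
        episodes.append({
            "start": start,
            "end": start + duration,
            "duration": duration,
            "main": combo[0],
            "secondary": combo[1],
            "location": combo[2],
            "copresence": combo[3],
        })
        start += duration
    return episodes

episodes = to_episodes(slots)
for ep in episodes:
    print(ep)
```

In this toy diary the six slots collapse into four episodes, even though the main activity only changes once.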
So in addition to the activity that they filled in, people were also asked to rate the level of enjoyment they experienced while carrying out this activity, which opens the door to a really interesting field of study of immediate well-being, as opposed to the traditional life-satisfaction-based measure of well-being. And of course, one of the key things about this and other time use surveys is that they rely on a harmonized nomenclature. So at the time, the diary was filled in by hand, so people were literally writing on a form what they were doing, and this was then coded by the survey company. But when coding the activities recorded by people, they were following a harmonized nomenclature, which was indeed the harmonized European one, HETUS, that I've already mentioned. So that's what the pen-and-paper diary looks like. And it's indeed drawn from the example that was provided to the respondents of the 2015 UK Time Use Survey. So you can see that it's basically like a table, with each 10-minute slot appearing as a line and each one of the dimensions of the time diary that I've presented appearing as a column. So that's the column for the main activity, here for the secondary activity. That's the device column, and co-presence, etc. And you can see that the last column, with a range from one to seven, is where people were asked to rate their level of enjoyment of the activity they were carrying out. So something interesting here, which you may have already understood, is that obviously the fact that we have a 10-minute resolution doesn't mean that people fill in something new every 10 minutes. Some of the episodes or activities last more than 10 minutes, and there can be a difference in the degree to which respondents conscientiously fill in that time diary. So we have people who are a bit lazy and who don't fill many activities in their diaries.
For example six or seven: I just got up, had breakfast, went to work, etc. Or, by contrast, people who are very, very detailed and provide sometimes 25-plus episodes and activities in their time diary. So what are the common estimates that researchers derive from time diaries? So the most common one is, as I'm sure you can gather, the time we spend on activities, the duration of activities. And of course that opens the way for really interesting comparisons between genders, between social groups, of the duration of some activities by comparison with others. Then there's the sequencing, that is, the order in which activities take place throughout the day, which can also be an interesting field of study; think about meals, for example. And then you can also look at the probability of activities occurring on typical days given the characteristics of some people. How likely are some people to be working on a Sunday, for example? And then yes, you can also, as I've said, compare the importance of activities between people, in terms of occurrence or amount of time spent. And of course change over time between surveys, since we have now been collecting time diary data for some time. So this is maybe a very basic example of the sort of thing you can look at or research with time diary data. So what is it? It is presenting some data for the UK across a time frame of a little bit more than 50 years, showing the amount of time women and men respectively spent on four broad types of activities: sleep and personal care, leisure activities, paid work and unpaid work. And what does it show? Well, I don't know what comes to your mind first. The first thing that came to my mind when I saw that for the first time was that things look pretty stable. We don't seem to have changed very much in the amount of time we spend sleeping, or even on leisure and recreation activities.
On the other hand, there's a really, really major difference in the way men and women spent their time in paid work back in 1961, and of course the major change here is that men spend less time on paid work than they used to. Women spend clearly more time in paid work than they used to. But that is not at all fully compensated by a decrease in the unpaid work that they do. So, yes, this is just a write-up of what I've explained here. Yes, so that's basically the conclusions I've just described. The fact that we are looking at activities over 24 hours means that whatever we look at will always add up to 1440 minutes. Okay, so I will now present another type of analysis, which looks more at the sequencing of activities throughout the day. And this time we are using a slightly more complex typology of activities. We are moving from four activities to eight activities. Okay, so that's what is called a tempogram, and the good news is that we will get to do something like this. So what does it show? The x-axis here, the horizontal axis, shows the time of day. And the vertical axis is the proportion of people engaged in each of the eight activities that are described here. So these two plots here show these proportions over the 24-hour period, respectively for men and women, in 1961 on a weekday. And of course you can see that the main difference that was really visible is the way women were spending basically half of their day, during daytime hours, doing housework, cooking and cleaning, as opposed to men almost exclusively doing paid work or, if younger, education.
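The two kinds of estimate described above, mean time per activity (which always sums to 1440 minutes) and the tempogram (the share of people engaged in each activity at each time of day), can be sketched as follows. This is only an illustration under stated assumptions: the three diaries are invented, the activity labels are simplified, weights are ignored, and the course itself does this in R rather than Python.

```python
from collections import Counter

SLOT_MINUTES = 10                       # assumed diary resolution
SLOTS_PER_DAY = 1440 // SLOT_MINUTES    # 144 slots in a 24-hour diary

def make_diary(blocks):
    """Build a slot-level diary from consecutive (activity, minutes) blocks."""
    diary = []
    for activity, minutes in blocks:
        diary.extend([activity] * (minutes // SLOT_MINUTES))
    assert len(diary) == SLOTS_PER_DAY, "a diary must cover exactly 24 hours"
    return diary

# Three invented diaries; each is a sequence of blocks from the diary start.
diaries = [
    make_diary([("sleep", 480), ("paid work", 480), ("unpaid work", 120), ("leisure", 360)]),
    make_diary([("sleep", 540), ("unpaid work", 420), ("leisure", 480)]),
    make_diary([("sleep", 420), ("paid work", 540), ("leisure", 480)]),
]

def mean_minutes(diaries):
    """Unweighted mean minutes per day spent on each activity."""
    totals = Counter()
    for diary in diaries:
        for activity in diary:
            totals[activity] += SLOT_MINUTES
    return {activity: total / len(diaries) for activity, total in totals.items()}

def tempogram(diaries):
    """For each slot, the proportion of diaries engaged in each activity."""
    shares = []
    for slot in range(SLOTS_PER_DAY):
        counts = Counter(diary[slot] for diary in diaries)
        shares.append({a: n / len(diaries) for a, n in counts.items()})
    return shares

means = mean_minutes(diaries)
tempo = tempogram(diaries)
print(means)
print(tempo[50])  # shares of each activity in the 51st slot of the day
```

Plotting `tempo` as stacked areas, with the slot on the horizontal axis, would give the kind of figure shown on the slide; with real data each diary would also carry a survey weight.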
Such plots are also interesting because they show the rhythm of the day. So you obviously do some sleep and personal care at the beginning of the day, then you engage in some activities, then you have a break, and then, whoops, you start engaging in activity again, and then in the evening you spend more time either, unfortunately for women at that time, in housework and unpaid work, or indeed leisure. And what is also interesting is the way that leisure was structured. So men spent more time watching TV, basically, whereas women were also engaging in other types of leisure activities. Now let's jump to 2015, and what can we see? Well, there's a bit of a change, as you can guess. I won't go through all the details here because I'm running a little bit behind, but I think the main lesson here is that, if there's still a discrepancy between the amount of time and the propensity for men and women to be engaged in paid work at any point, the differences are narrowing. So that mirrors in a way the other plot I showed earlier, the vertical bar plot. The other thing is, as you can see, there's been a bit of a collapsing of the neat bumps that we could see for 1961, which means that the typical days of people are becoming more and more heterogeneous: people invariably having their break at lunchtime, or for the same amount of time, etc., is decreasing. And also you can see here that leisure takes place at other times of the day than just the evening, narrowly defined, as was the case in 1961. OK, so to summarize this overview: I've shown that over the course of the 20th century there's been a clear rise of interest in trying to understand the way men, women and households spend their time. There's been increasing time diary data collection, especially from the 1960s onwards, and we now have a significant amount of data that allows us to do some interesting comparative time use research. So time diaries, as I said, record
primary and secondary activity in their context, location and co-presence, and the first observation we can make from looking at some of these comparative data over time is that we can see both stability and change in broad daily types of behavior, a big area of change being the division of paid work between genders. So I will stop this presentation here, and maybe I should stop sharing now. I am going to move to the second presentation, in which we are really going to look a little bit deeper into the structure of time diary data, and you are going to see that some of the questions you were asking are hopefully answered in this presentation. OK, so given that this is a type of study with a number of specificities in its data and recording, there is some specific vocabulary that is usually used by time use researchers. So we'll look at three key notions. Activity, and we have just talked about that: an activity is simply what someone records in the time diary as their main or indeed secondary action. And I will show an example later, but you can have two ways of recording this activity: in the pen-and-paper diary, people just write down what they are doing and then it is coded, or, in more advanced types of online time use surveys, you select an item on a menu or you tap it on the screen of your device. And yes, multitasking is usually recorded in time use studies: people are offered the option of filling in up to, sometimes, three simultaneous activities, and it is usually also left to the respondent to decide which activity is the main versus the secondary one. There can be some social desirability there, because when you have a father who says that he is looking after a child and watching TV, it may be a bit difficult to tell what is really the primary and what the secondary activity. Sometimes there is some room for interpretation there. OK, so that is an activity. An episode, as we discussed, is a unique combination of the four main types of variables that are
collected by time use surveys, that is, primary activity, secondary activity, co-presence and location. And just to show an example, here in the slide I show how you move from one episode to the other. For example, between episodes 1 and 2 the activity is not actually changing. So I can start by reporting that I am watching TV while eating crisps, alone, at home, so that is one episode. But as soon as someone joins me, as soon as my son, for example, joins me in the room, then that becomes another episode, even if my main activity, watching TV, doesn't change. And then of course if the main activity, or for that matter the secondary activity, changes as well, then that is again another episode. And the duration of episodes varies from a very short amount of time to much more. You have respondents that report very long episodes of paid work, for example, and of course one of the episodes that very commonly lasts for several hours is sleep, sleep at night obviously, because nothing changes much during people's sleep time. And then the third concept is that of the time slot. A time slot is simply the minimum duration of an episode, the resolution of the time diary. Okay, now this plot here is a way of illustrating the data structure of time diary surveys in a simple way. And of course we are talking here about the time diary data, because as I've said earlier, time use surveys also have a person-level questionnaire component that is more akin to traditional surveys. So here, starting from the most specific level, we have the time slots we have talked about. But these time slots are embedded within episodes, and then the episodes are embedded within the days of the respondent, and the days that are part of the time diary, the diary days, are themselves also embedded within people, within a respondent. Most of the time you have two days per respondent; less often you have one day, or sometimes you may have seven days. There are a number of time use surveys that have been asking
respondents to keep a time diary for seven days. And of course respondents are also embedded within households. And depending on the design of the time diary survey, the last three levels can be identical or can differ. If I am correct, the American Time Use Survey looks at one day per respondent, so in that sense the data is collected at person level, because we only have one day per person. And in the case of surveys in which only one person per household is asked to fill in the survey, then these three levels coincide: we have a single day for a single person within a household. In the case of the UK Time Use Survey, as mentioned earlier, these are all different, because most of the time we have several people interviewed per household, and for each one of them we have two days of data. So that's the basic structure of time diary studies. OK, so now in terms of files. As I said, we have different ways of recording time diary data, but in most cases, alongside the time diary, there's always, in one form or another, an individual file of person-level data in which more or less basic socio-demographic characteristics are recorded. And then some surveys may also provide day-level files, which in the case, for example, of the Multinational Time Use Study are called aggregate files. So in such files each line of the data set records a day, which means that there are often two, and sometimes more, lines per respondent. And these data sets usually comprise some pre-computed variables, so typically the time spent on some activities, so that researchers can just compute estimates from the data without having to compute the durations themselves. And they may also include some questions that were asked to the respondent about the day. So for example, some time diary surveys ask: was this particular day for which you filled in the diary a rushed day, or did you feel stressed, or was this a normal day, a typical day? So these are day-level variables
that would also be in such a day-level file. Older versions of time use data may also include time diary data in wide format; I will come back to this later. So the two types of file that one most often comes across with time use data are the individual file, which is normal or classic survey data, and the day-level file, with some pre-computed aggregate variables as well as day-specific information.

OK, and now a third type of data structure and file that you will find yourself having to deal with when downloading time use survey data is the episode-level file in long format. The long format simply means that, by contrast with the day-level file, the unit of observation, the line, is an episode. Therefore you can have several, sometimes many, such rows in your data set per diary day, and by extension per respondent. Depending on the person, as I've already mentioned, you can have more or fewer episodes reported on a typical day, but on average people report about 15 episodes. This long file format is quite intuitive, but it requires more storage space and computing power than data in the wide format, and that's the reason why it wasn't commonly used in older surveys. The wide format is basically a format in which each episode is recorded as a variable, as opposed to a row in the data set.

The table there shows, in a really summarized way, what an episode file in long format looks like. We have episode numbers here, we have the person number, the day number, the duration of the episode and obviously what the activity consists of, as well as the start and end of the episode. So what do these start and end numbers mean? They are simply cumulative minutes, starting from zero and running until 1440, which is the basic way of calculating time in a time diary survey. The first episode is some sleep, 6 hours of sleep, that's 360 minutes, followed by a 20-minute shower and a half-hour breakfast. I
didn't show co-presence, but we could have had a difference in co-presence during these times. So that's a typical episode-level file in long format, and we will be working with such a file in a moment.

Another way of working with time diary data consists in simply dealing with a slot-level data set, in which each line of the data set is a 10-minute time slot, if 10 minutes is the resolution of the time diary survey. These are not that common, because they require a lot of storage, more than episode files, and they are not strictly necessary except for some specific types of analysis. In such a data set each time diary is comprehensively recorded, so for each time diary we have 144 lines corresponding to the 10-minute slots, and yes, a slot ID is necessary. Another thing to keep in mind, and that's not specific to time slot data, is that time use surveys usually consider that the day begins at 4am and ends at 3:59am. This is because, for some people who have certain types of work schedule, it is easier to consider the late night as part of the previous day rather than the new one.

OK, so these were the most common types of data structure. For the record, I will also mention the more historical data structure, which is the wide format. It has a structure that is similar to the aggregate file in the sense that each line represents a day, but by contrast with the long format, either time slots or episodes are represented as variables. To consider the time slot case, we would have a first variable for the activity in time slot 1, and so on until the activity for time slot 144. That means you would need 144 variables to record the primary activity, 144 variables for the secondary activity, 144 variables for co-presence and 144 variables for the location of the activity, which may be a bit cumbersome and less intuitive to understand than a time diary in long format.
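To make the three layouts concrete, here is a minimal sketch in base R. It is not taken from the course workbook: the toy data frame and every name in it (`episodes`, `slots`, `wide`, `slot_len`, the `other` activity that pads the day to 1440 minutes) are hypothetical and only mirror the sleep, shower and breakfast example above. It assumes a 10-minute resolution and a diary day that starts at 4am.

```r
# Toy episode file in long format: one row per episode (hypothetical values).
# start and end are cumulative minutes since the diary day began (0 to 1440).
episodes <- data.frame(
  person   = 1,
  day      = 1,
  epnum    = 1:4,
  activity = c("sleep", "shower", "breakfast", "other"),
  start    = c(0, 360, 380, 410),
  end      = c(360, 380, 410, 1440),
  stringsAsFactors = FALSE
)
episodes$duration <- episodes$end - episodes$start

# Slot-level file: repeat each episode once per 10-minute slot it covers.
slot_len <- 10
n_slots  <- episodes$duration / slot_len
slots <- episodes[rep(seq_len(nrow(episodes)), times = n_slots),
                  c("person", "day", "activity")]
rownames(slots) <- NULL

# Slot ID, numbered 1 to 144 within each person-day.
slots$slot <- ave(seq_len(nrow(slots)), slots$person, slots$day,
                  FUN = seq_along)

# Wide format: one row per diary day, one activity variable per slot.
wide <- reshape(slots, idvar = c("person", "day"), timevar = "slot",
                direction = "wide")

# Clock time at which each slot starts, assuming the diary day begins at 4am.
day_start   <- as.POSIXct("2000-01-01 04:00", tz = "UTC")
slots$clock <- format(day_start + (slots$slot - 1) * slot_len * 60, "%H:%M")

nrow(slots)  # 144 rows for one diary day
dim(wide)    # 1 row, 2 identifiers plus 144 activity columns
```

With a secondary activity, co-presence and location added, the wide version would grow by 144 columns for each of them, which is the cumbersomeness mentioned above.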
A modified version of the same logic consists, instead of having time slot variables, in having episode variables: you would have the activity for episode one, and so on until the last episode. Of course, in that case the number of variables with non-missing data would vary from respondent to respondent, because the number of episodes varies between respondents and time diaries. But given the way computers work, it is usually faster for a computer to deal with time diary data in wide format. It matters less now that computers have become more powerful, and that's probably why it's used a little bit less.

OK, I think this is something I've already shown in a way. The typical variables you will find in most time diaries are variables that record the primary and secondary activity; they will have different names depending on the data producer. In the case of episode files, because unlike time slot data the duration is not implicit, you will have the episode duration. You will have the time, so usually start and end time, or incremental time if it's time slot data; co-presence; location; and then also, if it hasn't been removed for protection against data disclosure, day of the week, month, calendar date, and whether this was filled in as a first or second diary. And as I already said, there are other things that have been collected more recently: information about whether people felt rushed on the day, enjoyment, device use, and then work-specific data structures that I will cover next week.

And now let's move on to the way activities have been standardized. As I said, until recently activities were recorded on pen-and-paper diaries, which was historically a costly way of conducting a survey. This is also part of the reason why you have comparatively fewer time use surveys than other surveys: they are more costly to administer. Nonetheless, it's indispensable to
standardize activities if you want to do robust and comparative research, and so there is a proliferation of norms and guidelines. You could argue that there are Western nomenclatures, such as those of the MTUS, HETUS or the American Time Use Survey, and others that are a little bit different, in order to reflect the more diverse ways people spend their time outside of the Western world.

To give an example, this is the beginning of the HETUS classification. You will note that, as with other nomenclatures, you can have different degrees of precision. You can look at three-digit activities, in which you have a relatively detailed description of your activity: washing and dressing, using a personal care service, etc. Then this is one digit, which is other personal care here. No, that was two digits, sorry; the one digit is personal care, full stop. So that's an example coming from the Harmonised European Time Use Survey.

By contrast, this is a clip from the Indian time use nomenclature. It is interesting because it shows how much more detail is spent on recording agricultural activities, given of course the importance of agriculture for the Indian economy, still now. It is something that you will not find, certainly not in that degree of detail, in the Harmonised European Time Use Survey, for example.

There is a question from Carol: what things would be included in personal care services? Well, I suppose that would be things where you pay someone to take care of you. For example, that would probably be going to a hairdresser and things like that, or if you are an elderly person and you have paid-for care at home, that would also be under that category. If you go through the original HETUS document, there's obviously much more detail than I'm giving here.

OK, then another example, from the American Time Use Survey. Again, the ATUS has this interesting characteristic that it's much more detailed than the European versions, or the MTUS version for
that matter, going as far as kissing hello or goodbye to someone, or meditating, which is something you usually don't find in most time use surveys. So it has a really fine-grained, specific definition of activities, but of course that supposes that you have enough respondents who are filling in that level of detail in their diary. By contrast, the MTUS is made of fewer categories, simply because the main goal of the MTUS is to harmonize as many national time use studies as possible, and that comes at the cost of the precision of what is being measured. So that's an example from the MTUS documentation of the type of activities that you are looking at, or that you can compare across the countries that are part of the MTUS. And of course, as a time use researcher, there's a trade-off to be made between really specific, high-quality data, which you will probably only find by looking at a single or limited number of countries, and broader, vaguer data, which on the other hand allow you to compare countries, and maybe also countries over time.

Before I move on to how to work with the data, I also wanted to show you an example of an online time diary, if I can find it. The Center for Time Use Research, as well as other research centers, have been experimenting with other ways of administering time diary surveys. So that's just an example of an online time diary survey here. It's not really the latest version of this online instrument, because this one was designed before people started looking at mobile-phone-based time diary instruments. You can see from the shape of the screen that it's designed for laptops, but there are now mobile phone versions of comparable instruments, and they are really versatile. The latest version, or a recent version, of the UK time use study is currently being administered, and has been collected, using such an instrument. Of
course there's a limitation here, as you can guess: if you are no longer able to write in what you are doing, then you are more constrained, so the diversity of what people are filling in in their time diary is constrained.

So, back to the presentation, and to the end of the presentation now. In a very concrete way, how do we produce time use estimates from time diaries? The simplest way, if you are not very familiar with working with data or if you just want some quick estimates, could be to simply look at the pre-computed aggregate variables in the survey you are interested in. They may already have been computed for you by the data producers, and you just need to look at means or distributions for these variables. But if you want to compute your own, and that's what most researchers will want to do, then that usually entails four steps. First of all, you need to recode or flag the time use activities you are interested in in the original episode file. As you can imagine, given the large number of activities recorded, one needs to simplify the original data one way or another so that one can produce intelligible results. Once this is done, the second step consists in summing the time spent on these activities over the 24-hour period, or over another period of time if that's your research interest, so within day and maybe also within person. At that stage you are already able to compute descriptive statistics. The next step could consist in merging that diary-level data, whether it's the episode or the day file, with the person-level information you are interested in, to then produce comparisons, for example by gender, age, social status, etc. So these are basically the four typical steps one would follow when producing time diary estimates, of durations in this case.

So that is the end of the second presentation. Before we move on to the practical, I would like to ask you
again: would you have any questions? Is there anything that is unclear?

Hello again everyone. So has everyone managed to get access to the data? The way I suggest we work is the following. As you've seen, the data itself consists of two data sets, a day-level and an episode-level data set, and we will be working with them using R syntax. What we are going to practice now is described in a workbook that is available on GitHub Pages, and I have shared the link here. I am going to share my screen again. So that is the workbook, which is basically annotated R code that I am going to demonstrate now. If you are not familiar with this type of format, each box here which has R syntax can be copied simply by clicking on the little notepad icon here. If you click here, you can then paste it readily into your own R interface or editor. I am going to demonstrate this using RStudio, and as I said in the workbook, I am keeping things simple, so I am just working with a two-window version of RStudio. I am leaving the other two aside for now so that we can just focus on the actual syntax and the output. So is this clear for everyone, or are there any issues?

OK, so let's start. The first bit is uncontroversial if you are familiar with R. We just clean up our workspace here, making sure we remove every object, and then we load the libraries that we are going to need, which are dplyr as the data manipulation library, ggplot2 for nice plots, and haven for importing SPSS or Stata files. Most of what we are going to do today is relatively basic, so for the rest it relies on base R functions and libraries. The only bit that is a little bit more difficult is setting the working directory so that it suits your own computer. In my case, on the Windows computer I am working on now, this is where my working directory is, and you will notice that on my computer the data sits within another data folder here, but you can adapt it to your own. So I am going to
run this. I am just checking that it has done what I was asking, with getwd: yes, that is the right directory. The next stage consists in opening the episode data set. So mtusteachapp.dta: the .dta means it is a file in Stata format. What we have here is a set of nested, or embedded, commands. The core of the command is a haven command, read_dta, that converts the Stata data set into a haven object, but as we want to keep things simple, we are converting this haven object, for now, into a plain data frame. OK, so that is the first thing. But then we also want to clean up the data a little bit; we want to keep things as simple as possible. So, using the dplyr pipe here and the select function, we remove variables that we are not going to need: the wave indicator, because we don't have longitudinal data; core25, which is a nomenclature of activities with a smaller number of categories than the main MTUS one; and, given the type of analysis we are going to do, we also remove children. And it is hopefully opening the data. If I am too slow or too fast for you, feel free to do things at your own pace, following the instructions in the workbook.

OK, so now I have the data and I can have a quick initial look. So what do I have? What is the size of this episode data set? I have 1,800,000-plus episodes and 20 variables. So there is a relatively large number of observations for the number of variables, which is not uncommon for data in long format. The next thing we can do, and I suppose everyone has their own habits, is maybe to look at the variables that we have. For example, if I type names on the episode data, that gives me the names of the variables in the data set, in the order in which they are stored in the data. If by contrast I type ls, that gives me the alphabetical list of the variables. What else may we want to look at? Well, we may want to have an initial feel for the data, so let's go for it. And indeed we can see that these are the first 6 lines of the
data. So what can we see? We can see that these first 6 episodes come from the Spanish study from 2009. They are, unsurprisingly, all about the same person and the same day: household ID, person ID. If you aren't sure about the meaning of these variables, get hold of the variable description I have provided with the data. Then there is the year the diary was collected, and that requires a little bit of comment, but let's just go to the main episode variable. It takes a little bit of time, because there are obviously lots of observations. Let's just show a basic frequency table of observations for the main activity. Even if you're not looking at the codebook, you can guess that the most common activities, such as sleeping, are likely to be code 2, because these are the ones which have the largest number of observations. So that's the main variable. If you're a more seasoned user of R and haven, and you haven't converted the haven object into a data frame as I've done here, you may be able to visualize the Stata labels as well, with plain English labels.

OK, so that's the main activity. I would be curious to see if this is going to work. haven offers a way of optionally visualizing the levels of a variable, or factor variable. So I'm using here the same function as I used previously on the whole data set, but for just the main activity, and also incorporating the labels. Using as_factor, as opposed to as.factor, which is a base function, allows you not only to convert a variable from a haven object into a factor, but also to use the Stata value labels as levels of the factor. That's for the technical bit, but as a result you can see what the first six episodes for that person correspond to: sleep and naps, and then meals, wash, self-care, and then some walking, which is not surprising. OK, so if you want more details on the variable, you can ask R to show you its class, and it
will show you the different components of the object, the last one being the actual numbers, and the other two corresponding to the value labels and the variable label. For the anecdote, I think there's an option... if I'm correct, yes: if you specify "both" as an option to as_factor, then in addition to the value label it also gives you the underlying numerical code. That is sometimes an easy way of getting some codebook information without looking at the documentation. For example, you can see that walking is activity code 43.

OK, now let's keep on exploring. Yes, it's MTUS data. So what countries do we have? There are different ways of exploring data; I'm pretty sure you have your own. And if you find better ways of doing things than what I'm doing in the practical, with more efficient coding, feel free to suggest them as well. I would be more than happy to discover new ways of coding. OK, so let's look at the country variable in the data. That just shows the number of observations we have for each country. If we have a look at the variable, it's a character variable, so it's not a factor; it's an alphanumeric variable. And we have a lot of observations from France, Spain and the UK, and a bit fewer from the US and the Netherlands.

OK, country. But what else can we look at? We may want to look at what country data we have for each year. It's quite simple; I can just copy the code here. Let's just do a cross-tab. Again, there are different ways of doing cross-tabs in R, but I'll just use this one, using the xtabs function. In the xtabs function you specify your variables using a formula format, so you need a tilde, and the variable or variables are on the right-hand side of the formula. So here it's a cross-tab of country by year for the data set, and we can see that yes, some data is more recent than
other data. The data for the UK actually comes from the UK Time Use Survey I have talked about, but you can see that there are different years for the same study. This accounts for the fact that, even if a study is seen to be taking place in a given year, the actual fieldwork may span more than one calendar year. So the 2015 UK time use study, for example, actually took place, I think, between July 2014 and July 2015, even if it's called the 2015 study. So now I propose to create a study variable in the data to allow us to uniquely identify each study. I'm creating a study variable which is the concatenation, using the paste function, of respectively the country and the survey variable. I think this is a separator; it can be a hyphen or just a space. And I can look at the result, and now we have a clearer way of identifying studies.

OK, now that we have a unique study identifier, something else we may want to look at is the number of days each study encompasses. So I'm just doing a cross-tab here of id, which is the variable that records the diary. If you have a single-day diary, it's a constant, 1, and if you have a two-day diary, your first diary day will be coded as 1, your second as 2, etc. What we can see from here is that the number of diary days varies between studies. We have two countries for which there was only a single diary, Spain and the US; we have France and the UK, which have the more common two-day diary that I mentioned before; and interestingly, the Netherlands has gone for the full seven days, so a week-long diary. Of course, having more days comes at the cost of having fewer observations for a given day, and this is why the number of observations for each day is clearly smaller than in the other countries.

OK, so now we have a sense of the number of days we have. From here, I can show you that with R there is an easy way of getting the
frequency... not the frequency, the proportions or percentages for these results. If you embed the xtabs function into a prop.table function, specifying the option 2 for column percentages, and then multiply the results by 100 to turn them into percentages, that gives you, for each country, the proportion of observations for each day. And I can round that to a single decimal to make it neater. In most cases, except for France, the days have been observed in a fairly balanced way, so you have about the same proportion for each day.

OK, let's move on. As I said earlier, we may or may not want to use the more complex functionalities of the haven package, but if we do, we can produce a relatively neat overview of a person's day. Here the code simply creates new versions of the main activity variable, the location, and a simplified version of co-presence, alone, which indicates whether the respondent was alone or not when she or he conducted the activity, using as_factor with the Stata value labels, which makes them easier to understand. If I do that, then I can request the data again, and that makes it easier to read the day of a person. There are more illustrations here of such exploration of diary data. So that's basically using the print function; I think the print function is not strictly necessary if you work in the console, but it shows the first 20 rows of the data. As you can see, the first 19 rows actually show the full day of that person, and that gives maybe a first-hand sense of how the day of a person can be visualized in a time diary. So I can see that the person sleeps, obviously, then has some food, does some washing, and then cleans, works somewhere, prepares a meal, and then engages in some leisure activity, prepares another meal, watches a film or TV, and then goes back to sleep. That's a schematized day of
that person. Now, OK, so this is just an illustration really. There are lots of ways of exploring data frames, and I'm sure you have your own. What I want to illustrate in the time that we have left is a first go at estimating durations from that time diary data. In order to do so, we are going to work with the amount of time people spend doing paid work. I will spare you having to look at the MTUS documentation: the activity codes we are interested in when looking at paid work are codes 7 to 13, covering respectively paid work in the main job; paid work at home; second or other job not at home; unpaid work to generate household income; travel as part of work; work breaks; and other time at the workplace. Following what I've said earlier, we first need to tag or recode the episode-level data so that we are able to identify these paid work episodes. And yes, I know that in the example here I am using all the codes between 7 and 13 as paid work, even if 10 is not paid work, but for the sake of this exercise I will consider number 10, unpaid work to generate household income, as paid work. Feel free to exclude it in your own syntax if you prefer.

OK, so how do I do this? There are different ways of doing it, but in a simple example such as this one I am just using the base ifelse function. I am creating an episode-level variable here which takes the value of the variable time, which is the episode duration, if the main activity code is between 7 and 13, and 0 otherwise. So it's basically a variable which records the duration of work-related activities. I am just pasting it here and running it. OK, let's just check. Yes, a relatively low mean, but it has created a variable. Why is it so low? Well, the possible explanation is that we are working with episodes, and episodes of many people, not all of whom
are actually working on the diary day, so we have lots of zeros. A way of checking whether the work-related duration we have coded is credible is to only look at the summary for those who actually did report some work on the diary day. Still, it's quite small. Why is it small? In addition to what I have said, it's also small simply because we are looking at the mean of individual episodes: it's an episode-level mean, or distribution if you want. What we are trying to do here is to compute the day-level time spent in paid work, and that requires slightly more advanced functions, so I will rely on the group_by function from the dplyr package, which is quite a nice function. So what do I do? I start here from my episode data set and then group it by a number of variables: the household, the person, the day. Basically I'm grouping it by unique days, and obviously that will have different meanings depending on the way the data was recorded. In some countries that will indeed differentiate between these different levels, and in others it won't make a difference, because there was only one person per household with one day. OK, so after grouping, the mutate command creates a new variable, which I call WKB, which sums, for each specific diary day, the durations of paid work that we tagged earlier. So I'm just running this. That was quick. And yes, I can already ask for a quick summary. Oops, no. OK, so I have done something wrong here. Can any of you identify what I have done wrong? Feel free to add it in the chat. Does anyone want to comment? What's wrong here is the fact that I am still working with the episode data. At the end of the day it may not make a huge difference, but I am actually computing a mean over the whole series of episodes, whereas the mean that I have computed is a day
level mean. So I need to tell R that I want the mean computed using only a single row per day. There are different ways of doing this. One way, since we have the value of this total duration repeated for each episode, is to just retain the first episode for each day. Oops. Oh yes, now I need to see how it's called. Actually, yes, I know what I did wrong: I forgot to specify that it was... OK. So I can see here that it's slightly different, and that's a correct way of computing a mean of an aggregate variable using day-level data. So we have an average daily amount of paid work of about two hours per day. That's not very much.

Before I go further down that road, I want to show how this can be plotted. I show some code in the workbook where I store the results of a computed mean in an object that I can then plot. I start with the episode data; then, as I've done using the base R syntax earlier, I filter so that I only keep the first episode, really the first row, for each diary day (which row doesn't matter, because we only want the mean computed for that aggregate variable). Then I group the results by study and ask R to compute the mean working time per day, and that's stored in an object that's called res, so that's the results. We can see some slightly different daily means: the US clearly has the longest working day and the UK the shortest on average, with the Netherlands somewhere in the middle. This type of data can easily be plotted using the base R barplot function, where I'm just asking to plot the data that's identified by "all", and also specifying a title, which is the main option, "daily working time in selected countries", and xlab and ylab for the horizontal and vertical axis labels respectively. And yippee, I have got a graph, a bar graph, representing these differences. So it's a first way of visualizing duration data. But still, you're going to tell me, and I wouldn't blame
you for that, that it doesn't seem like a realistic way of looking at paid work if we only have two hours on average per day. There are two issues related to this. The first is that we need to make sure that we differentiate the results by the relevant units of observation. So far we have looked at the mean duration of work for any day, but of course a typical week is made of weekdays and a weekend, and at the weekend people work less than on a weekday. So what if we look at the results for weekend versus weekday? This is what the code below does. It creates a weekend versus weekday variable, and please note that the coding of the day of the week in MTUS follows the US convention, so the first day of the week is a Sunday. Then the lines below, which I've just put in a single chunk of code, compute separate estimates of mean working time the same way we've done it before: filtering so that epnum is equal to one, then for weekdays, grouped by study, and then the mean of that duration; so for weekdays on the one hand and for the weekend on the other. We are computing these two estimates and adding them as extra columns to the data frame, the object containing the results we have already created. So if I run this code and look at the results, you can see really stark differences, unsurprisingly, between weekdays and the weekend. Again, we can try and plot this data, using again the barplot function. You will note that, for reasons too complicated to explain here, it's easier if the row names contain the name of the study, as opposed to a separate variable, given how barplot functions. So I'm just pasting the code here. That's what I've said: I'm using row names for the identification of the study, and I'm just creating a bar plot here. Oops, why is it not... Well, you have to trust me that the
code works when I run it as part of this workbook. So that shows basically the weekend versus weekday differences by country. You will see that the main European countries do not have very different durations; the US, by contrast, stands out with longer working days. But now another aspect we want to have a look at... Oops. Why do we want epnum equal to one? Selecting epnum equal to one matters because we are computing daily means, right? We have created the daily total variable in an episode data set, so in effect we still have a data set with a variable number of episodes per diary, per person, etc. So if I want to compute a mean that gives the same weight to everyone, irrespective of the number of episodes in their diary, then I need to select only one observation per diary, which carries the total duration of paid work for this diary. Not doing that may affect the value of the mean that's being computed, given the different numbers of episodes per diary in the episode data set. So in effect, for the duration of that computation, I am turning the episode data set into an aggregate data set.

OK, so now the second thing we need to think about, and that's really something quite fundamental when working with time diary data, is what we are interested in, and there are two options. The first option is: I want to compute estimates that reflect the whole of my data set. Why would I want to do that? Because if I compute estimates with all the data available, and all the diary data specifically, then I can compute estimates that neatly add up to a full day, as I showed earlier with the comparison of typical days between 1961 and 2015. But then, depending on the activity one looks at, the estimate may be affected by the number of people who are not carrying out that activity. It doesn't really matter, maybe, in the case of activities everyone is doing, typically sleeping, eating, self-care, if everyone has an equal probability of having a
shower every day. On the other hand, in the case of activities that are less common, that not everyone does, this begins to affect the extent to which the means we are computing are informative or typical of what people actually do. So in the case of paid work, to go to the bottom of things, we have the option of either, as I've done so far, looking at the mean of paid work for everyone in the sample, or only for those who reported paid work on the diary day. The positive thing if I look only at people who reported paid work is that I begin to have estimates that are closer to typical working days as one would expect them. On the other hand, it means that I am not comparing the same sample each time, because obviously I am only taking the mean for people who reported paid work. To cut it short, I am repeating the same series of computations here. I'm creating a new object with paid work duration that I'm calling resw, and I'm following the same logic as before, but adding an extra condition: I only want to compute the mean for those observations where WKB, our daily total of paid work, is greater than one. So everyone who didn't report work, for example a lot of people on Sundays, will not be taken into account. The first column will be these people on any day, then these people on weekdays, and then these people at the weekend. Then I am following the same syntax as before to create bar plots, and here we finally see results that are beginning to be a little more realistic. The mean duration of the working day has jumped to between 450 and 500 minutes for most countries, that is roughly seven and a half to a little over eight hours of paid work per day, which is much more realistic. And of course the reasoning I'm following here applies to any activity you may be interested in researching: having to make the choice between only looking at participants, that's the jargon
we would use for that, versus everyone in the sample. Okay, are there any questions so far? So now I will quickly move to the last bit. Yes, we have 10 minutes left. I will probably not have the time to do the very last one, but you can work on that on your worksheet, and I will take questions next week. If we are able to compute estimates of duration, then it's actually not very difficult, at least in R, to compute from there estimates of the probability of participation. The probability of engaging in paid work on a given day is simply the mean of the condition WKB greater than 0. So we can similarly compute a dataset which includes these results, using the same logic. I'm creating manually, so to speak, an object and specifying the columns: the first one is the probability of engaging in paid work at all on a diary day, and the second and the third are the same on weekdays and at the weekend. Each time, from the object created by the summarise function of dplyr, I am selecting only the column that contains the result, so that I can neatly pile them up into this single object. So what does it look like? Of course it's all grouped by study as previously. These are probabilities of engaging in paid work by day of the week, as in weekday versus weekend, and if we multiply them by a hundred you can read them as percentages. You can also see that there are interesting differences between countries: people in the US are clearly more likely to be engaged in paid work at any point during the week, and a little less so, but still more, at the weekend. So we have these results, and again, if we want to, we can plot them in a bar plot showing how likely respondents in each country were to engage in paid work. Of course this is just the beginning. Here I'm
just sketching how really basic computations can be made, but what we are really interested in as researchers is how these things differ by variables of interest: gender, socio-demographic characteristics. I have five minutes, so I will go quickly through this. How that translates into a typical research workflow is, first of all, you need to get the day-level or person-level data, depending on what you have, which I'm doing here by opening the MTUS teach int dataset, which has this data, and storing it in an object that I call D. For the sake of simplicity I'm also creating exactly the same study variable, resulting from the concatenation of country and survey. I can see here that I have 88,000 observations and more variables. Okay, so we have our individual data. The rest is partly a matter of taste. There are people who prefer to create a small number of variables and add them to the episode file, others who just want to merge everything into a single large dataset. There's no really right or wrong way of doing things. What I've done in the workbook is simply to add the day-level duration of work to the existing diary day-level file. In other words, I'm adding my WKB variable to the D dataset that I've just opened, and storing this as a new object that I call DT. I'm using the base R function merge, which has a very straightforward syntax: merge data frame A, I mean D, with the second one, using study, household ID, person ID and diary identification as matching variables, and I'm also asking it not to retain unmatched observations. It should work. Okay, it's done what we wanted. We can check: it has dropped a few observations which had data that didn't match, and it has kept the variable we wanted to add. Yes, we now have our WKB variable in that dataset. Okay, so we now have
a day-level dataset that has a duration of paid work that we have computed ourselves. What if we want to look at gender differences in paid work by country and by weekday versus weekend? I suggest here that we create a new gender variable which is a little more explicit than the one contained in MTUS. It's quite straightforward: I'm just adding the value labels, so that the variable now appears directly as male and female. There we are. Then I am going back to estimating paid work, now using maybe a more formal approach. I'm looking at the mean of paid work for people of working age in most countries, so 16 to 65. I am still grouping my results by study, obviously, but now also by gender and by whether it's a weekend or a weekday, and simply asking for the mean. Needless to say, keep in mind that in some cases it may be more relevant to ask for the median, or another measure of central tendency, instead of the mean. So that computes the values of interest; that can be ignored. I can look, and I have a series of values, some of which differ rather markedly from the others. I suggest we look at ggplot, which is one of the jewels in the crown of R. It's a really advanced and really good plotting function, and we'll use it for displaying our data here. If you're not familiar with ggplot, it works as a series of layers. You specify what is called the aesthetic, aes, the main parameters of your graph: the X variable and the Y variable, X being as before the study and Y being the time duration. I'm also asking it to differentiate by the gender variable that I've created, so in effect it will be using different colours. I then specify that I want my plot to be a bar plot, with the bars side by side. This line is not very important, it's just where I manually specify the colours for greater readability. I'm asking for an inverted graph,
that is, a horizontal bar graph as opposed to a vertical one, so that the labels here display in a nicer way. You can try without it and you'll see what the result looks like. Finally, I'm asking to have one facet, one subplot, for each category of the weekend versus weekday variable. And that is what the result looks like. We can see, as can be expected, relatively clear, sorry, marked differences in the duration of paid work of people who did work on the day. So that doesn't take into account gender differences in economic activity, but still, even when looking at people who worked on the diary day, we can see, as can be expected, differences in the duration of paid work throughout the day. This is obviously related to what we've seen earlier about gender differences in the amount of unpaid work that men and women do: women do more unpaid work and therefore have less time to do paid work. This translates into shorter working days, which can be formalized as part-time work. The Netherlands is one of the countries with the largest proportion of part-time workers, and that shows clearly here. Interestingly, these differences are less marked for people working at the weekend. Okay, so I'm going to stop here for today. Before I close the session I wanted to ask: are there any questions on the code I have demonstrated? We will be exploring this further next week, looking at more specific things. If you have any questions, feel free to send an email, and I will discuss them at the beginning of the next session. And as I said, now that I can see that we have a smaller number of people, if you have a substantive research interest or a project you are thinking about or already working on, feel free to drop me an email. I would be more than interested if you were presenting it or talking about it a little, nothing formal, at the end of the session next week. Also, this won't be recorded, so there is no need to be shy
about being recorded: this last session will not be recorded. I'd be more than happy to hear about people's research interests. Yes, Emma reminds us that it would be nice if you filled in the evaluation survey; that will help us plan future events. I hope I will see you again next week at the same time, so the 17th at 2 o'clock, when we will do more exciting things, tempograms for example, and proper survey estimation with time diary data, among others. I hope you have a nice evening, and see you next week. Thank you.
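As a recap of the estimates discussed in this session, here is a minimal sketch in base R. It is not the workbook code: the toy data are made up, and the column names (`study`, `diary`, `epnum`, `weekend`, `paid`, `time`, `wkb`) are assumptions chosen for illustration rather than the exact MTUS variable names. It shows the three ideas together: building a daily total on episode data, keeping one row per diary before averaging, and contrasting the mean for everyone with the mean for participants and the participation rate.

```r
# Toy episode-level data: one row per episode, several episodes per diary.
# All names and values are hypothetical; this is not MTUS data.
ep <- data.frame(
  study   = factor(c("A", "A", "A", "A", "A", "B", "B", "B", "B")),
  diary   = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
  epnum   = c(1, 2, 3, 1, 2, 1, 2, 1, 2),
  weekend = c(0, 0, 0, 1, 1, 0, 0, 0, 0),
  paid    = c(0, 1, 0, 0, 0, 1, 0, 0, 0),               # 1 = paid work episode
  time    = c(600, 480, 360, 900, 540, 300, 1140, 720, 720)  # minutes
)

# Daily total of paid work, repeated on every episode row of the same diary.
ep$wkb <- ave(ep$time * ep$paid, ep$study, ep$diary, FUN = sum)

# Keep one row per diary so that every diary has the same weight in the mean,
# whatever its number of episodes.
dd <- ep[ep$epnum == 1, c("study", "diary", "weekend", "wkb")]

# Mean over everyone in the sample, zero-work days included.
mean_all <- tapply(dd$wkb, dd$study, mean)

# Mean over participants only: diaries reporting some paid work.
part <- dd[dd$wkb > 0, ]
mean_part <- tapply(part$wkb, part$study, mean)

# Participation rate: the mean of a logical condition is a proportion.
p_part <- tapply(dd$wkb > 0, dd$study, mean)

# Pile the results up in one object, with the study as row names.
res <- data.frame(
  everyone        = as.vector(mean_all),
  participants    = as.vector(mean_part),
  p_participation = as.vector(p_part),
  row.names       = levels(dd$study)
)
print(res)
```

Because the study is stored in the row names, the two duration columns can be passed to `barplot()` after transposing, for example `barplot(t(as.matrix(res[, 1:2])), beside = TRUE)`. The same pattern extends to weekday and weekend estimates by subsetting `dd` on `weekend` before calling `tapply()`.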