Hello, this is the first in a series of three videos on running many models within R. In this video, we're going to look at how to create a grid of specifications for the different models that we want to run. We're going to be using the tidyverse package to do this. The tidyverse is actually a package of packages. It includes several popular packages such as dplyr and ggplot2. But the tidyverse is also a design philosophy. As a result, the packages within the tidyverse share common features and work very well together. If you don't know much about the tidyverse, I've included some links to resources with this tutorial. I've also created a worksheet which you can work through alongside these videos. The worksheet includes fully annotated code.
Let's quickly load the tidyverse package into R using the library(tidyverse) command. If you've not already installed it, you'll have to install it before using the library(tidyverse) command.
We're going to be using a simulated dataset in this tutorial. The dataset contains data from four pretend cohort studies representing four generations: baby boomers, Gen X, Millennials and Gen Z. I'm just going to download this data from the internet, load it into R and assign it to the name df. I've put the data on the OSF so you can download it yourself. This dataset contains repeat measurements of BMI at each decadal sweep in each of the four studies.
You'll notice I've used the pipe operator here, which is used extensively within the tidyverse. It allows you to chain commands together, and it can be read as "do this, then this". So here we've taken the URL and piped it into the readRDS command to download that dataset, which is in RDS format, into R and assigned it to the variable df. Let's have a quick look at df using the structure function, str. Here we see that df is a tibble with the dimensions 80,000 by 11.
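The load, pipe, and inspect steps just described can be sketched as follows. This is only an illustration under stated assumptions: the OSF URL is not given in this transcript, so a small simulated tibble written to a temporary RDS file stands in for the download, and the column names (cohort, id, fup, bmi) are guesses at the spoken names.

```r
library(tidyverse)

# Hypothetical stand-in for the tutorial data. The real file is on the OSF;
# its URL is not reproduced here, so we simulate a tiny long-format tibble
# with two follow-ups per person. Column names are assumptions.
set.seed(1)
df <- tibble(
  cohort = rep(c("boomers", "gen_x"), each = 4),
  id     = rep(1:2, each = 2, times = 2),
  fup    = rep(c(15, 25), times = 4),
  bmi    = rnorm(8, mean = 25, sd = 3)
)

# Write the stand-in to a temporary RDS file, then read it back with a pipe:
# "take this location, then read it with readRDS".
tmp <- tempfile(fileext = ".rds")
saveRDS(df, tmp)
df <- tmp %>% readRDS()

# Inspect the structure: class, dimensions, and column types.
str(df)
```

With the real data, str would report the 80,000 by 11 tibble mentioned in the video rather than this 8 by 4 stand-in.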
A tibble is a type of data frame, which is a rectangular data structure like a spreadsheet or a dataset in Stata. It has rows, which are observations, and columns, which are variables. df is a long format dataset where each individual has a measurement of BMI for each follow-up that the survey was run for. So for instance, for the baby boomers, we've got observations for each cohort member from follow-ups at ages 15 to 65. We're going to be using this dataset to examine the association between cognitive ability and later BMI.
Let's have a quick look at the raw data, sampling 20 observations at random using the sample_n function. Let's make this a bit bigger. You'll see that the 11 variables are cohort, id, and fup, which is the age of follow-up, so 15 to 65 for baby boomers. We've got age, which is the age at BMI measurement, and bmi, which is the BMI. Then we have three measures of cognitive ability: non-verbal ability, verbal ability and vocabulary. Each of these, by the way, was measured in adolescence. We also have three further variables that we use as control variables in our regression models. One is an indicator for whether the person is female, coded one or zero. Then we've got two more: family class, which says whether someone was from a non-manual or a manual socioeconomic background, and childhood BMI, which was measured prior to the measure of cognitive ability.
Now the current data are perfect for adopting a many-models approach. We have data from four different cohorts, each of which measures BMI at multiple different ages. We also have three different ways of measuring cognitive ability. Now the field of cognitive epidemiology has found widespread associations between cognitive ability and multiple measures of health.
Dominant theories in the field would predict that cognitive ability should have a causal effect on BMI, and that an association between the two should be seen regardless of the age of follow-up, the year one was born, or the measure of cognitive ability that was used. It is therefore worth running a many-models approach with combinations of cohort, age of follow-up, and measure of cognitive ability to examine how robust associations are. These theories would also predict that we should find an association regardless of whether we control for childhood BMI or not. So we'll add this to our analysis too.
Our specification grid will contain all the information we need to run a particular model: the specific cohort and follow-up to take the data from, the particular set of control variables to use, and the particular measure of cognitive ability whose association with BMI we examine. We'll first create a few objects that we're going to use to construct the specification grid.
First we're going to create a list that contains the sets of control variables. This list will have two elements. One will just define a basic model with controls for age, sex, and family class. The second will add further control for childhood BMI. So let's just write that out now. We'll paste this across. The basic model just has those three covariates, and the second one has those three covariates plus child BMI. We've called these models basic and child_bmi. Let's just quickly run that and save it to the name mod_covars. Then we can look at what mod_covars looks like. It's a list of two elements, basic and child_bmi, that just contains those strings. Because mod_covars is a list, we can access its elements using the double square bracket notation, putting within those square brackets the name of the element we want to pick out.
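A minimal sketch of the list described above. The exact strings used in the video are not shown in this transcript, so the variable names (age, female, family_class, child_bmi) and the choice to store each covariate set as a single formula-style string are assumptions.

```r
# Assumed representation: each covariate set is one string that could later
# be pasted into a model formula. The strings themselves are guesses.
mod_covars <- list(
  basic     = "age + female + family_class",
  child_bmi = "age + female + family_class + child_bmi"
)

# Print the whole list: two named elements, each holding one string.
mod_covars

# Double square brackets with a name pull out a single element.
mod_covars[["basic"]]

# One way such a string might be used later (illustrative only):
# build a regression formula for a chosen cognitive ability measure.
as.formula(paste("bmi ~ cog_verbal +", mod_covars[["child_bmi"]]))
```

Storing the covariate sets under names means the specification grid only needs to carry the short name (basic or child_bmi), with the full set looked up when a model is run.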
So if we want to get the set of variables for the basic model, we can just write basic in quotation marks within the double brackets, and that will give us back the string. So here we go. This is the sort of code we'll be using later to construct our models.
Next, we want to get the set of cognitive ability variables. We'll use the str_subset function to subset the column names of df to those beginning with cog, because we named all of our cognitive ability variables cog_ followed by something else. So the syntax is str_subset, then the strings that we want to subset, in this case the column names from df, and then a regular expression. The caret sign (^) anchors the pattern to the start of the string, so we get back any name that starts with cog_. So we get back cog_nonverbal, cog_verbal, and cog_vocab. Let's just save this as cog_vars.
Okay. So as mentioned, we want to run separate models for each cognitive ability measure, each set of control variables, and each cohort and follow-up. The cohorts differ in the set of follow-ups used to measure BMI. So we're going to be using the distinct function from the tidyverse to get the combinations of cohort and follow-up that are actually observed in the df data frame. So we pipe df into distinct, and then we just name the columns that we want to get the distinct values of. We get back a table of 16 rows, which has the different combinations of cohort and follow-up that are observed in the dataset. So we have ages 15 to 65 for baby boomers, 15 to 55 for Gen X, 15 to 35 for Millennials, and 15 to 25 for Gen Z.
These are just two of the things that we want to loop over. We also want to add two new variables for the cognitive ability measure and the covariates that we're going to use. So to combine that information and get the full set of specifications, we're going to use the expand_grid function, which produces a table with all the combinations of the inputs that have been provided to it.
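The two lookups described above, picking out the cognitive ability columns and finding the observed cohort and follow-up combinations, can be sketched on a tiny made-up tibble. The column names and values here are assumptions chosen to mirror the spoken names; the real df has 80,000 rows.

```r
library(tidyverse)

# Tiny stand-in data with assumed column names; the values are made up.
df <- tibble(
  cohort        = c("boomers", "boomers", "boomers", "gen_z", "gen_z"),
  fup           = c(15, 15, 25, 15, 15),
  bmi           = c(21.0, 23.5, 24.1, 20.2, 22.8),
  cog_nonverbal = c(0.1, -0.3, 0.1, 0.5, -1.2),
  cog_verbal    = c(0.4, 0.0, 0.4, -0.2, 0.9),
  cog_vocab     = c(-0.5, 0.7, -0.5, 0.3, 0.1)
)

# Keep only the column names that start with "cog_".
# The "^" anchors the regular expression to the start of each string.
cog_vars <- str_subset(names(df), "^cog_")
cog_vars

# One row per cohort/follow-up combination that actually occurs in the data.
df %>% distinct(cohort, fup)
```

Using distinct rather than crossing every cohort with every follow-up age avoids specifying models for combinations that do not exist, such as a follow-up at age 65 for Gen Z.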
So we can pipe the result from distinct straight into expand_grid. Then we'll just add cog_var = cog_vars and covars = names(mod_covars). And now we get back a table that is 96 rows long. We've got each combination of cohort, fup, and cog_var. You can see we've got cog_nonverbal, cog_verbal, and cog_vocab. And then we've also got each value of covars as well, so basic and child_bmi.
Okay. So now we have these 96 rows. We can see that, for instance, the first row would specify a model using data at age 15 for baby boomers, where cognitive ability is measured using the non-verbal measure and the covariates are just those from the basic model. In the second model, we would use the same data and the same measure of cognitive ability, but we would add childhood BMI as a covariate as well.
Before saving this specification grid to an object, I just want to do one last thing, which is to create a new column to identify each row. We're going to call this new column spec_id. It will just be equal to the row number, so one for the first row, two for the second, and so on. There we go. We're going to pass this spec_id to a generic function that allows us to select a row, and we'll do this rather than pass all the information from the other four columns in separate arguments, which can get cumbersome. Let's just save that to mod_specs, and quickly check mod_specs is saved correctly. Great.
OK, so that's the basic way of getting a specification grid. In a few lines of code, we've been able to specify 96 models, looping over the cohort, the follow-up, the cognitive ability measure, and the covariates that we add to the regression.
Thank you.
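Putting the steps from this video together, the specification grid can be sketched as below. This is a self-contained illustration rather than the worksheet's code: the cohort and follow-up table is typed in directly from the ages described in the video instead of being derived from df, and the object names, column names, cohort labels, and covariate strings are assumptions.

```r
library(tidyverse)

# Inputs to the grid (names and strings are assumptions).
cog_vars <- c("cog_nonverbal", "cog_verbal", "cog_vocab")
mod_covars <- list(
  basic     = "age + female + family_class",
  child_bmi = "age + female + family_class + child_bmi"
)

# Stand-in for df %>% distinct(cohort, fup): the 16 observed combinations,
# written out from the follow-up ages described in the video.
cohort_fups <- tibble(
  cohort = rep(c("boomers", "gen_x", "millennials", "gen_z"),
               times = c(6, 5, 3, 2)),
  fup    = c(seq(15, 65, 10), seq(15, 55, 10),
             seq(15, 35, 10), seq(15, 25, 10))
)

# Cross every cohort/follow-up row with every cognitive measure and every
# covariate set, then number the rows so each specification has an ID.
mod_specs <- cohort_fups %>%
  expand_grid(cog_var = cog_vars, covars = names(mod_covars)) %>%
  mutate(spec_id = row_number())

mod_specs

# Selecting a single specification by its ID.
mod_specs %>% filter(spec_id == 2)
```

The grid has 16 x 3 x 2 = 96 rows. Because the last input to expand_grid varies fastest, rows one and two differ only in the covariate set, which matches the ordering described in the video.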