Hello, thank you for attending this lightning talk at the R/Medicine Virtual Conference. My name is Ariel Mundo and I will be talking about generalized additive models for longitudinal biomedical data. A little bit of background about me: I am a PhD candidate in the Department of Biomedical Engineering at the University of Arkansas, and my research studies how cancer is affected by treatment over time. The material for this talk can be found in detail in a preprint from our lab on bioRxiv; that paper covers the material I am going to talk about today, but with in-depth theory of generalized additive models and a workflow for GAM selection in R using biomedical data. The slides and the code for this talk are also available at this link, which I will also put in the chat. The motivation for this talk is that I have analyzed longitudinal data throughout my PhD training; in this case it is simply repeated measures on some subjects in multiple groups across time. Longitudinal studies are powerful tools because they allow you to see the evolution of an effect over time. Pediatrics, cancer, and nutrition are just some of the areas that use longitudinal studies. How do we analyze longitudinal data? In biomedical research we tend to do the following: we take repeated measures, then we fit a repeated measures ANOVA, and then we do post hoc comparisons. Or a linear mixed model can be fitted as well, followed by post hoc comparisons. To illustrate this I would like to use some simulated data. From this paper here I have tumor volume trends over time for two different treatment groups. The paper only gives us a mean value, so I will use simulation to generate points, or observations, at each time point. In that way we can recreate the trend of the data, and then we have a dataset we can analyze and use for our purposes in this talk. So we fit the repeated measures ANOVA, where we are trying to explain the volume of the tumors across the different groups over time.
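A minimal sketch of this simulation-plus-ANOVA step in base R might look like the following. The mean trends, sample size, and noise level here are made up for illustration; they are not the values from the paper:

```r
set.seed(1)
days  <- c(0, 3, 7, 10, 15)
# hypothetical mean tumor-volume trends for the two treatment groups
means <- list(g1 = c(100, 130, 190, 280, 420),
              g2 = c(100, 115, 145, 175, 210))
n_subjects <- 10

# generate noisy observations around each group's mean trend
sim <- do.call(rbind, lapply(names(means), function(g) {
  do.call(rbind, lapply(seq_len(n_subjects), function(s) {
    data.frame(subject = paste(g, s, sep = "_"),
               group   = g,
               day     = days,
               volume  = rnorm(length(days), mean = means[[g]], sd = 25))
  }))
}))
sim$subject <- factor(sim$subject)
sim$group   <- factor(sim$group)

# repeated measures ANOVA: group, time, and their interaction,
# with subject as the error stratum
rm_fit <- aov(volume ~ group * factor(day) + Error(subject / factor(day)),
              data = sim)
summary(rm_fit)
```

From here, post hoc comparisons would typically follow, for instance with a multiple-comparisons package such as emmeans.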
And this is an interaction ANOVA. If we run the ANOVA we see that there is significance across time, across each treatment group, and for their interaction. And if we do post hoc comparisons, they tell us that there is significance, apparently, at day 7. But what if we actually plot the model over the data? Here the model is the two lines you see, one line for each group, but the points are not following those lines; said another way, the model is not following the trend of the data. Especially at day 15, notice that the fit for group 1 is fairly off. What's going on here? Well, our repeated measures ANOVA is nothing more than a linear model that uses an intercept, some fixed effects, linear slopes, and an error term to try to explain an observation. So basically it is just a line. In this graph I have here from a paper, you can see that the trend of the data is very linear, so a linear mixed model, or our repeated measures ANOVA, is going to do a good job of capturing the trend of the data and giving you reliable inference. But in biomedical research things don't always look linear; most of the time they don't. I have a couple of examples here. In this paper, longitudinal data was collected across different optical markers. And here on the right is another paper that has data on tumor volume across time. In each of these graphs the trend of the data is not linear: you see that the signal goes up and down, and it is really not following a linear pattern. So if linear models such as repeated measures ANOVA or linear mixed models are limited for analyzing this data, what can we use? We can use generalized additive models, or GAMs for short. The construction is going to have a response we are trying to explain using an intercept, but now, instead of using lines, we are going to use something called a smooth function that allows you to follow the trend of the data, and we are also going to have an error term. To make this visual, I have a graph here.
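As a toy illustration of what a smooth function is, a smooth can be built in base R by taking a weighted sum of spline basis functions, in the same spirit as the sine-function example on the slide. The basis dimension here is arbitrary, and `lm()` estimates unpenalized weights, whereas a GAM would additionally penalize them:

```r
library(splines)  # ships with R

x <- seq(0, 2 * pi, length.out = 200)
y <- sin(x)                # the "true" function we want to recover

B   <- bs(x, df = 6)       # six spline basis functions
fit <- lm(y ~ B)           # estimate a weight for each basis function

smooth <- fitted(fit)      # the weighted sum of the basis functions
max(abs(smooth - y))       # the smooth closely recreates sin(x)
```

The key point is that nothing about the weighted sum forces a straight line: with enough basis functions, the smooth can take essentially any shape the data trace out.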
Panel A shows a function; this is our original function, a sine function. In panel B we have something called basis functions. The basis functions can be used to construct, in a piecewise manner, the function from panel A. So we have four basis functions, which are going to be penalized according to the matrix that appears in panel C. You will see that basis number two and basis number three are heavily penalized, and because of that their shapes change significantly. But when all of these basis functions are added together, they are able to recreate the trend that the function in panel A has. So panel E has the actual smooth, which is made up of the different basis functions, and it is actually following the trend of the function in panel A. This is pretty useful, because then you can follow the trend of the data, and it can be non-linear; it can be of any shape. Now, how does a GAM look for the simulated data presented before? The syntax is here; the details are in the preprint from our lab, but basically we are again fitting an interaction GAM for the tumor volume. If we then plot the model and the data, we see that the model is actually following the trend of the data. Each line is the trend per group, that is, the smooth that was fitted over time for each group, and it is following the trend of the data. This allows you to have reliable inference from your data. Now, for GAMs there are no p-value-based post hoc comparisons as there are for repeated measures ANOVA, but we can still do pairwise comparisons, and the way we are going to do that is to compute the difference between the smooths and examine its confidence interval to see how different they are. You can think about it this way: if two things are similar, then their difference is going to be zero, and if the difference is away from zero, then that is probably because the difference is significant.
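A sketch of what this could look like with the mgcv package follows. The data are re-simulated here with made-up values, the column names are hypothetical, and the smooth-difference computation mirrors a common `type = "lpmatrix"` approach rather than being the talk's exact code; the preprint has the full workflow:

```r
library(mgcv)

set.seed(1)
# hypothetical longitudinal data: two groups, repeated measures over days
sim <- expand.grid(subject = 1:10,
                   day     = c(0, 3, 7, 10, 15),
                   group   = c("g1", "g2"))
sim$group  <- factor(sim$group)
sim$volume <- with(sim, 100 + ifelse(group == "g1", 18, 6) * day +
                          rnorm(nrow(sim), sd = 25))

# interaction GAM: one smooth of day per group
# (k is limited by the number of unique days)
fit <- gam(volume ~ group + s(day, by = group, k = 5),
           method = "REML", data = sim)

# pairwise comparison: difference between the two group trends, with 95% CI
newd <- expand.grid(day   = seq(0, 15, length.out = 50),
                    group = levels(sim$group))
Xp <- predict(fit, newd, type = "lpmatrix")
d  <- Xp[newd$group == "g1", ] - Xp[newd$group == "g2", ]
diff_est <- drop(d %*% coef(fit))
diff_se  <- sqrt(rowSums((d %*% vcov(fit)) * d))

# days where the interval excludes zero suggest a significant difference
excludes_zero <- (diff_est - 1.96 * diff_se > 0) |
                 (diff_est + 1.96 * diff_se < 0)
```

Note that `d` differences the full linear predictor, so the comparison includes the parametric group effect as well as the smooths, which is what you want when asking whether the two groups' trends differ at a given day.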
So here we have a pairwise comparison, and whenever the confidence interval, which is the yellow stripe, does not cover zero, we can tell that there is a significant difference. We can see that it starts happening around day three, and this is consistent with the data: from the plot, both tumor volumes are pretty similar for the first three days, but after that they diverge, and this is what the model is telling us. So this allows you to have reliable inference from your data. GAMs are also advantageous because they can use different covariance structures, which repeated measures ANOVA cannot; they can work with missing observations; and, if needed, you can use different types of splines, such as Gaussian process or thin plate splines. In conclusion, I'll finish by saying that it is always important to do a visual exploration of your data before fitting any model. GAMs are really useful because they allow you to fit non-linear responses over time, which occur a lot in biomedical research, and it is the same idea as fitting a repeated measures ANOVA or a linear mixed model, but instead of using a line we are using a combination of splines to construct a smooth, in that way following the trend of the data. And, as usual, p-values can be very misleading when you are doing a statistical analysis. I'd like to acknowledge the work of the co-authors on the paper from our lab regarding GAMs, Dr. Tipton from Mathematical Sciences and Dr. Timothy Muldoon from Biomedical Engineering at the University of Arkansas; Silvia Canelón, on whose theme my slides are based; Alison Presmanes Hill, who suggested the font, Atkinson Hyperlegible, which I really like and used in this presentation; and our funding agencies. I also have here the references for the different graphs I have used in this presentation, and I'll be more than happy to take any questions you might have. Thank you.