This tutorial is about a package called metagam, which does meta-analysis of generalized additive models. The motivation for the package is that combining data from multiple studies is crucial for scientific progress, especially with high-dimensional data as in brain imaging or genetics: by combining data from different studies, we can increase both statistical power and predictive accuracy. However, sharing data can be quite challenging in practice, for example due to privacy concerns, because participants might not have consented to having their data shared, or for more practical reasons, like different data formats being used.

One way of circumventing this problem is to do a meta-analysis. For example, if there are G different groups, each of which has a data set, we can let each group g fit the model to its data and obtain parameter estimates, and then share the parameters rather than sharing the data, finding the joint estimate as a weighted sum of the individual estimates. A problem with this is that meta-analytic methods require parameters, and those parameters should be essentially identical in their interpretation across studies. However, many problems call for semi-parametric methods, which are better able to estimate nonlinear effects. For example, consider the regression model below, where we have an intercept β0, but rather than a slope β times an exposure x, we have an unknown function f of x; this is what makes the model semi-parametric. The question is then: how do we combine the estimates f̂_g from the G different groups?

One model class that we use in the paper and in the package is generalized additive models (GAMs). A GAM estimates the semi-parametric, or non-parametric, function as a weighted sum of spline basis functions, where each basis function receives a weight, which we call α̂; the k-th basis function gets the weight α̂_k. However, the basis functions often need to depend on things like the range of x, and these will by necessity differ between groups. This makes the spline weights estimated in each group non-comparable, so they cannot be meta-analyzed directly as if they were standard regression coefficients.

One motivating example comes from neuroimaging, where we studied how sleep quality is associated with the lifespan development of the hippocampus. We had data from six European partners, but we had problems getting permission to share the data so that we could analyze it all in a single place. What we did instead was to have each group fit a model of the form y = f1(age) + f2(sleep quality) + ε, where y is hippocampal volume and ε is a residual term. The estimated functions from each group look like this: some groups, like Cam-CAN and LCBC, have data from the whole adult lifespan, from 20 years up to 90, whereas, for example, the Oxford cohort only had people between roughly 60 and 85 years of age. But we got these functions. Each group then shares a function which can return predictions from its estimated f̂ when you plug in new values of the predictor x. The only things needed for this are the point estimates and the covariance matrix of the spline weights, together with knowledge of the basis functions that were used.
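In symbols, the setup just described is roughly the following (a sketch in generic notation, not taken verbatim from the talk's slides):

    y = β0 + f(x) + ε                    semi-parametric model: f is an unknown smooth function
    f̂_g(x) = Σ_k α̂_gk b_gk(x)            group g's GAM fit: a weighted sum of spline basis functions

Group g shares its estimated weights α̂_g, their covariance matrix, and the basis functions b_gk; together these are enough to predict f̂_g(x) at any new value of x.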
So, in principle, the groups can even use completely different models; one group can use cubic regression splines and another quadratic splines, for example. The meta-analytic estimate f̂ is then a weighted sum of all the individual estimates, f̂(x) = Σ_g w_g(x) f̂_g(x), where the weights w_g(x) are typically closely related to 1 divided by the squared standard error at the particular point.

The naive way to do this in R would be for each group to fit a model; mgcv is the most used package for generalized additive models, so group g would fit a model mod_g using its data and then send this model to whoever does the meta-analysis, who would predict from mod_g on some grid of values common to all the studies, defined as the points where we want to see the predicted estimates, and then combine these predictions. However, each model mod_g would be full of individual participant data, so fitting a model and sending that model to somebody would also send all the individual participant data, unless you are careful, because those data are typically part of the R objects that make up the models.

That is the first place where the metagam package comes in. Each group still fits its own model, but we have a strip_rawdata() function, which works both for mgcv and for gamm4, that you apply to the fitted model. So first you fit the model, say with the gam() function, then you apply strip_rawdata(), and that function removes everything that is on the individual level, keeping only the few things needed to create the predictions we need. Then you can share this stripped model with the rest of the world, and the only things you expose are your coefficients, their covariance matrix, and the basis functions that you used.

In the next step, we combine the models from the groups. First we make a list of the stripped models: say three groups have fitted the model on their data, then we create a list of those three fits. Then we actually compute the meta-analytic estimate by calling metagam() on the models, again passing the common grid, that is, the predictor values where we want to look at the meta-analytic estimates. A minimal end-to-end sketch in R is given at the end of this section.

Here is the result from the neuroimaging example. To the left we have the effect of age on hippocampal volume, and we see quite a lot of agreement between the estimated functions. The reason we also have a mega-analysis here, that is, an analysis of the pooled raw data, is that in the end we were actually allowed to share the data in this study, although that happened after we developed this package, so we were lucky enough to be able to compare. And we see that even though the confidence bands do not overlap completely, the overall trend is very similar. For the effect of sleep, the estimates are very close to zero, both in the meta-analysis and in the mega-analysis.

The metagam package also supports post-fit analysis. We have a summary() function and a plot() function, as most modeling packages in R do, and we also have something called dominance plots, which we can produce with plot_dominance(). A dominance plot shows how much each study contributes to the overall meta-analytic fit as a function of some predictor variable of interest. Here we see it for age, for example, and we see that at around 70 years of age, many different studies contribute quite a lot, because that is where most, in fact all, of the groups in this case had data.
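To make the workflow concrete, here is a minimal sketch in R, following the package's documented workflow. The simulated data sets (via mgcv::gamSim) are stand-ins for the real cohorts, and the smooth terms and grid size are arbitrary choices for illustration:

    library(mgcv)
    library(metagam)
    set.seed(123)

    ## Three simulated "cohorts" standing in for the real studies.
    datasets <- lapply(1:3, function(i) {
      gamSim(eg = 1, n = 400, scale = 2, verbose = FALSE)
    })

    ## Step 1, at each site: fit a GAM locally, then strip all
    ## individual-level data. What remains is essentially the spline
    ## coefficients, their covariance matrix, and the basis definition.
    models <- lapply(datasets, function(dat) {
      fit <- gam(y ~ s(x0, bs = "cr") + s(x1, bs = "cr"), data = dat)
      strip_rawdata(fit)
    })

    ## Step 2, at the analysis site: combine the stripped fits over a
    ## common grid of predictor values for the term of interest.
    meta_fit <- metagam(models, terms = "s(x0)", grid_size = 100)

    ## Post-fit analysis.
    summary(meta_fit)
    plot(meta_fit)                # meta-analytic smooth with confidence bands
    plot_dominance(meta_fit)      # each study's contribution across the predictor
    plot_heterogeneity(meta_fit)  # between-study heterogeneity along the grid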
At 20 years of age, by contrast, the LCBC and Cam-CAN data completely dominate, because they had the overwhelming majority of the data for those ages. We can also look at how different the estimated functions are, again as a function of some predictor variable, here age. That we can do with the plot_heterogeneity() function, also shown in the sketch above, and we get a plot like this.

To summarize, the metagam package offers distributed fitting of generalized additive models, removal of raw data from model objects, and meta-analytic combination of fits that comes close to what would be obtained with access to the complete data, which we typically do not have, unfortunately. It also offers convenience functions for visualization and for statistical summaries. Some future directions include extensions to more classes of nonlinear models, Bayesian simulation, and the development of new algorithms for computing p-values of the meta-analytic estimates. So, thank you.