I would like to invite our first speaker, Jouni Helske, to take it away please. Hi, my name is Jouni Helske. I will talk to you about Bayesian inference of non-linear and non-Gaussian state space models in R using the bssm package. So what are state space models? State space models (SSMs) are a large class of statistical models, including, for example, structural time series models, ARIMA models, and generalized linear models with time-varying coefficients. In general, we have some observations y1 to yT, where the subscript denotes time or some other ordering variable, and where yt follows a conditional distribution gt given alpha t. Here the alpha t are called latent states, which likewise run from 1 to T, and they follow a transition distribution pt, which gives the distribution of the next state given the current state. Both the observations yt and the states alpha t can be multivariate, and both distributions pt and gt can depend on some hyperparameters theta. Our interest is in the joint posterior distribution of the states alpha and the parameters theta given our data y. Predicting future observations and states, as well as interpolating missing observations, is also straightforward within this SSM setting.

But wait, what about KFAS? Some of you might be familiar with this other state space modeling package in R, so what's the difference with bssm then? Well, KFAS was designed from the perspective of maximum likelihood estimation, whereas bssm aims at Bayesian inference, although maximum likelihood estimation is also possible. Second, bssm supports a wider variety of models, and instead of simple importance sampling, bssm uses particle filtering, which scales better with the number of time points. bssm is also easier to maintain and extend in the future; on the other hand, creating models is currently a bit easier using the formula syntax of KFAS, but you can actually convert KFAS models to the bssm format with the helper function as_bssm, which can be useful when constructing complex models.

bssm is for Bayesian estimation of state space models, so how do we estimate these? There are two special classes of SSMs. The first is the case where the distributions of the observations and the states are both linear Gaussian; the other is where the states are categorical, but I will not discuss that here because it is not supported by bssm. What is special about these models is that the marginal likelihood can be computed in an analytically tractable way. For the linear Gaussian case, we can use the Kalman filter algorithm, which gives us the marginal likelihood of y given theta. This marginalization of the latent states alpha can be used to construct an efficient Markov chain Monte Carlo (MCMC) algorithm: we run MCMC targeting only the marginal posterior of the hyperparameters theta, and, given samples from this marginal posterior, we simulate states from the so-called smoothing distribution, which is the conditional distribution of all the latent states given all the data and the current theta. Because theta is often low-dimensional, simple adaptive random walk Metropolis typically works well.
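In formulas (a hedged recap using the talk's notation):

```latex
% General state space model, t = 1, ..., T:
y_t \sim g_t(y_t \mid \alpha_t, \theta), \qquad
\alpha_{t+1} \sim p_t(\alpha_{t+1} \mid \alpha_t, \theta).

% Linear Gaussian special case, for which the Kalman filter yields the
% marginal likelihood p(y \mid \theta) analytically:
y_t = D_t + Z_t \alpha_t + H_t \epsilon_t, \qquad
\alpha_{t+1} = C_t + T_t \alpha_t + R_t \eta_t, \qquad
\alpha_1 \sim N(a_1, P_1).
```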
For Bayesian inference of general state space models, things are more complicated, because the marginal likelihood is no longer analytically tractable. Instead, we could consider at least three different options. The first would be to forget the special nature of the states, treat the states the same as the thetas, and sample both using some general MCMC machinery provided, for example, by BUGS or Stan. Unfortunately, this is often inefficient due to the strong correlation structures and the high dimensionality of the alpha states. The second option is to leverage approximate methods, such as the Laplace approximations used in INLA. This is often fast, but biased by construction; although the bias can often be negligible in practice, it is hard to quantify the amount of bias in a specific application. The third option is to use so-called pseudo-marginal MCMC, or particle MCMC, methods, where the marginal likelihood is replaced by an unbiased estimator from a particle filter. This is asymptotically exact, like the first option, but often computationally heavy, as we need to run a particle filter at each MCMC iteration, possibly with a large number of particles. It can also be a bit tricky to tune the MCMC algorithm and to choose a good number of particles for efficient inference.

So, IS-MCMC for state space models. What bssm does is combine the fast approximations and the exact methods by first finding an approximate marginal posterior of the thetas, and then correcting it with importance-sampling-type weights. We call this the IS-MCMC method. First, assume that we can compute an approximation p-hat of the marginal likelihood for a given theta. Then we can run MCMC targeting the approximate marginal posterior of theta, where the true likelihood is replaced by its approximation. Then, for each theta from this approximate posterior, we run a particle filter, which gives us samples of the states alpha and unbiased estimates of the likelihood. So in the end we have weighted samples from the joint posterior, where the weights correspond to the ratio of the true and approximate likelihood terms. This works really well for the models supported by bssm, but of course, in general, the approximation should be good enough, as in typical importance sampling. For the post-correction, bssm uses by default a particle filter called ψ-APF, which again leverages the approximate model computed earlier, leading to a particle filter that in many cases needs only a few particles, making it computationally efficient. Other particle filters could also be used, for example the basic bootstrap particle filter, which is also implemented in bssm. Note that the post-correction only needs to be done for each accepted theta, independently, so it is trivial to parallelize efficiently.

Okay, so what kinds of models does bssm support? First we have linear Gaussian models, where the observations y are a linear combination of the states, plus a Gaussian error term and an optional intercept term, and similarly the states depend on the states at the previous time point. Different models are defined by specifying the model components D, Z, H, C, T, R, a1, and P1. These are vectors, matrices, or arrays, depending on whether the model is univariate or multivariate and on whether the components depend on time. Often we know the structure of some of these components, while others depend on the parameter vector theta. We can build these models in bssm using several functions: for example, ssm_ulg defines a general univariate linear Gaussian model, and bsm_lg can be used to define a structural time series model, where the unknown parameters correspond to the standard deviations of the noise terms or to the regression coefficients of the explanatory variables X.
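As a hedged sketch of what such a call might look like (the prior-helper names follow the talk; the exact arguments and prior values are illustrative, not checked against the bssm docs):

```r
library(bssm)

# Local linear trend model for a univariate series; the unknown
# parameters are the noise standard deviations, with half-normal priors.
y <- log10(JohnsonJohnson)
model <- bsm_lg(y,
  sd_y     = halfnormal(init = 0.1,  sd = 1),   # observation noise sd
  sd_level = halfnormal(init = 0.1,  sd = 1),   # level disturbance sd
  sd_slope = halfnormal(init = 0.01, sd = 1))   # slope disturbance sd

fit <- run_mcmc(model, iter = 20000)
summary(fit)
```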
Non-Gaussian observations: this is another class of models supported by bssm. Here the observations are non-Gaussian, while the states are still linear Gaussian. Model building is similar to the previous case; we just now have to define the distributions of the observations as well. bssm currently supports Gaussian, Poisson, binomial, negative binomial, and gamma models, and you can have multivariate models with mixed distributions as well. For these we can use a Laplace approximation for efficient approximate inference, or exact inference based on the IS-MCMC approach. Essentially, we iteratively find a linear Gaussian model which has the same conditional mode of the states as the original model.

Here is an example of a simple bivariate Poisson model, where we assume that both time series are generated by the same latent state process alpha. I first simulate some data, and then I define two R functions, which are used within the ssm_mng function that defines the whole model. These two R functions, prior_fn and update_fn, define the log prior density and how the model components depend on the parameters theta. Even though these are R functions used within the compiled C++ code, we only need to call them once per MCMC iteration, so the overhead is minor compared to the actual likelihood computations. These functions, together with the definitions of the fixed model components such as Z and T, are then given to the ssm_mng function, which returns a model object usable as input for the other functions of the package.

In addition, bssm also supports two further model types: nonlinear models, and models where the state equation is defined as a continuous-time diffusion. For nonlinear Gaussian models, the approximation is based on mode matching, similarly to the non-Gaussian case, but this time using the extended Kalman filter and smoother; unbiased estimation is again possible using a particle filter. For models where the state equation is defined as a continuous-time diffusion, we assume that we have observations at integer times. Here the approximation is related to the coarseness of the time-discretization mesh: the finer the discretization, the more computationally demanding the particle filter is, so we can use a coarser approximation in the first phase and then do the post-correction using the final mesh. These models are quite general in the way the nonlinear functions are defined, so we cannot use R functions to define them, as then we would need to go back and forth between R and C++ within the particle filtering. Instead, users should define these models using the C++ templates provided in the vignettes.
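Returning to the bivariate Poisson example above, here is a heavily hedged sketch; the constructor's argument names (init_theta, prior_fn, update_fn, and friends) are my recollection of the ssm_mng interface and should be checked against the bssm documentation:

```r
library(bssm)
set.seed(1)
n <- 100
alpha <- cumsum(rnorm(n, sd = 0.1))                  # shared latent random walk
y <- cbind(rpois(n, exp(alpha)), rpois(n, exp(alpha)))

# log prior density of theta (here theta is the log of the state noise sd)
prior_fn <- function(theta) dnorm(theta, 0, 1, log = TRUE)

# map theta to the model components that depend on it
update_fn <- function(theta) list(R = array(exp(theta), c(1, 1, 1)))

model <- ssm_mng(y, Z = matrix(1, 2, 1), T = matrix(1), R = matrix(0.1),
                 distribution = rep("poisson", 2),   # one distribution per series
                 init_theta = c(log_sd = -2),
                 prior_fn = prior_fn, update_fn = update_fn)
```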
Now, as an example, I'll show how to analyze yearly drownings in Finland from 1969 to 2019. We have data on yearly drownings, which we assume follow a Poisson distribution conditional on the latent level mu, the average summer temperature, and the yearly population. The latent process mu is assumed to follow a random walk with a random slope, and the random slope is again defined as a random walk. So we have three hyperparameters, the regression coefficient beta and two standard deviation parameters, and the latent state vector alpha t contains the level and slope terms. To estimate this model with bssm, we can build it using the bsm_ng function, where we define our data, the population size as an exposure, the Poisson distribution, the predictor variable summer temperature, and finally some priors. For the priors of the basic structural model we can use the helper functions normal, for the regression coefficient, and gamma, for the standard deviations, where we first define the initial value and then the parameters of the prior. Then we just call the function run_mcmc with a certain number of iterations and the number of particles used in the post-correction phase. By default this uses IS-MCMC, but fully approximate inference is also possible, as well as ordinary particle MCMC methods and their delayed-acceptance variants. We then see from the summary of theta that the temperature has a small effect: a one-degree (Celsius) rise in average summer temperature leads to about 10% more drownings. And finally we have a figure of the intensity exp(mu), the number of deaths per 100,000, which shows that drownings have decreased drastically in recent decades: in the 1970s we had around six drownings per 100,000 per year, and now only around two.
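Putting the example together in code, roughly as in the package's own drownings example (a sketch; the dataset name and the exact prior settings are as I recall them and are illustrative):

```r
library(bssm)
data("drownings", package = "bssm")

model <- bsm_ng(drownings[, "deaths"],
  u = drownings[, "population"],           # exposure
  xreg = drownings[, "summer_temp"],       # predictor
  distribution = "poisson",
  beta = normal(init = 0, mean = 0, sd = 1),           # prior for beta
  sd_level = gamma(init = 0.1, shape = 2, rate = 10),  # prior for level sd
  sd_slope = gamma(init = 0.01, shape = 2, rate = 10)) # prior for slope sd

# IS-MCMC by default; 'particles' sets the particle count for post-correction
fit <- run_mcmc(model, iter = 20000, particles = 10)
summary(fit)
```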
Thank you for listening; here are some references. Please check out the package on CRAN if you're interested in state space modeling. Thank you.

Thank you. I think we have time for a few questions; if you have any, please type them in the Q&A. I see that one question was already answered by Jouni in writing, so I'll go ahead and read it out: models where the states are categorical are not supported, so is it possible to support them in the future? Not really, but it's a good question. It's quite a different methodology for categorical settings, so that would need a separate kind of interface. In a sense you could do it, but it's a slightly different problem. I'm doing that kind of thing for another package, so maybe in the future I will talk about that. I don't see any more questions typed in, but one more from me: I didn't get a sense of how intense these computations are. Do you make use of any kind of parallel computation techniques? Of course, some of it is inherently sequential, but is there any chance for parallel computation? Sorry, could you repeat? Is there any scope for parallelism or anything like that? Parallelism, yes: basically I'm using OpenMP for parallelizing the post-correction phase. The MCMC itself is hard to parallelize because it's inherently sequential, but the post-correction phase you can easily do in parallel. Thank you. Let's see, we've been going well on time, so I'll just wait a few seconds more to see if any more questions come up; as I said, if you have questions, please don't be shy about typing them in the Q&A. All right, thank you very much for your talk.

Our next speaker is Rachel Wilkerson, who will be speaking on bnmonitor: checking the robustness and sensitivity of Bayesian networks. Please go ahead. Thank you. Hi, I'm Rachel Wilkerson, and I'm speaking today on a package that I worked on with my colleagues, which provides diagnostic monitors for Bayesian networks that check their robustness and sensitivity. If you think about it, when you run a linear regression model you almost always turn around and check your residuals, but this isn't yet common practice when we have Bayes nets. Our package is designed to make that process easier, and there are two main components to it. The first is a set of diagnostic-monitor functions that can be used, at increasing levels of fineness, to check how accurate the forecasts that flow from a model are; these work specifically on Bayes nets that are learned from data. The second is a set of sensitivity functions, which take the probability distributions underpinning your Bayes net and check the effects of changing them slightly, that is, how sensitive the network is to those changes. The monitors work for Bayes nets learned either from data or from elicitation, and the data can be either discrete or continuous.

As a motivating example to showcase our package, we worked with a data set from the UCI repository called the Pima Indians data set. It has information on 392 women, discretized into binary variables: the number of times a woman was pregnant, plasma glucose concentration, diastolic blood pressure, skin fold thickness, two-hour serum insulin, body mass index, diabetes pedigree function, and ultimately whether or not she tested positive for diabetes. When we think about our Bayes net, we are often looking at the diabetes test as the outcome variable, and we'll test that with our monitors. I want to take a moment to acknowledge that we chose this data set because it best showcases what our monitors are good for, but I'm using this data without the explicit consent of, or compensation for, the original participants, who actually refer to themselves as the Akimel O'odham; there's a really fascinating paper by Joanna Radin that gets into the ethics of using this data set far removed from the original context in which it was collected. So I wanted to take a moment to recognize that, and with that I'll dig into our robustness monitors.

The largest monitor that we have is the global monitor, which looks at your network as a whole. To illustrate it, I've looked at two sample Bayes nets you might get if you fire up the bnlearn package: I used a hill-climbing algorithm to find the structure of one, and the max-min hill-climbing algorithm for the model on the right, which gives two network structures. The global monitor is most useful for determining the differences between two models. We just run global_monitor, and the only other parameter besides our directed acyclic graph and our data set is alpha, commonly set to the number of levels of a node in the data set; since all of our variables were discretized to binary, here alpha is 2. For each vertex, the global monitor computes that node's contribution to the log-likelihood, and then you can compare the corresponding nodes across the two models and come up with a Bayes factor per node. The other thing you can use the global monitor for: after applying some of the finer monitors that I'll discuss in a minute, you can go back and recompute your global monitor to see how the model as a whole has changed.
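A hedged sketch of this comparison (the data frame name pima is hypothetical; global_monitor's arguments are as I recall them from the package and should be checked against its docs):

```r
library(bnlearn)
library(bnmonitor)

# learn two candidate structures, as in the talk
dag1 <- hc(pima)    # hill-climbing
dag2 <- mmhc(pima)  # max-min hill-climbing

# per-node log-likelihood contributions; alpha = 2 since all nodes are binary
global_monitor(dag = dag1, df = pima, alpha = 2)
global_monitor(dag = dag2, df = pima, alpha = 2)
```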
The rest of the monitors we look at from here are derived from the prequential framework, first developed by Phil Dawid around 1992 and refined in subsequent papers. Those all focus on the quantity p_i: the predictive density from the Bayes net learned from all the previous i-1 observations, so we learn the whole DAG from those and then predict the next observation. We can think of the score function we take from that as the level of surprise at seeing the next observation. The index y_j ranges over all the possible values that a particular node can take, so bold y would be, say, positive/negative for our diabetes node, and here we're looking at the logarithmic score function; there are lots of other score functions we could have used, but here we stick to the log score. In our diabetes example, predicting the next iteration means, on a sample of our data set, learning from the first nine observations and then trying to predict the tenth.

Once we have our log score, we need to think about standardization. There are two main standardization techniques. The first is relative standardization, which works when we take a ratio of two quantities, as we did for the global monitor. The monitors we look at next use absolute standardization, where we compute a z-statistic using the following expectation and variance: we take the predictive probabilities and multiply them by the log probabilities, summing over all the possible values y_j, which gives the expectation for the next forecast; we compute the variance in a similar fashion, and finally get a test statistic that tells us how surprised we are to see each next observation. The benchmark we use is that anywhere the absolute value of the z score is outside 1.96 can be an indication of poor model fit. We'll look at some examples of what this looks like, but that's the main hallmark of when you might want to check what's going on with your model.
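Written out (a hedged rendering; p_i denotes the one-step-ahead predictive distribution learned from the first i-1 observations):

```latex
% Log penalty (surprise) for observation i:
S_i = -\log p_i(y_i).

% Prequential expectation and variance over the possible values y_j:
E_i = -\sum_j p_i(y_j) \log p_i(y_j), \qquad
V_i = \sum_j p_i(y_j) \bigl(\log p_i(y_j)\bigr)^2 - E_i^2.

% Standardized monitor; |Z_i| > 1.96 flags possible poor fit:
Z_i = \frac{S_i - E_i}{\sqrt{V_i}}.
```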
The sequential framework just means sequential predictions, and the first examples of sequential monitors are the node monitors, which come in two flavors: marginal and conditional. First we'll look at the marginal node monitors, which check how appropriate the probability distribution is for a given node. In our package bnmonitor these are implemented with seq_marg_monitor: you call it with the name of a node, and it generates a graph showing how close the monitor comes to the trouble boundaries, the red dashed lines indicating z equal to plus or minus 1.96. For the node for the diabetes pedigree function, this monitor seems to be performing well; it doesn't come anywhere close to the problem areas, so it looks like a good fit. Now let's look at our main outcome node of interest, diabetes, whether somebody has a positive or negative diabetes test. Here we notice that for the later observations, once we hit around observation 300, there is some indication that the probability distribution isn't as good a fit. There can be a lot of reasons for this: maybe there's some underlying dependence that's not captured in the data; maybe people are coming into the clinic with their friends, and they all share some latent traits that aren't clear from the data we've collected.

So those are the marginal monitors. The conditional node monitors are very similar, except that when they get to that next observation y_i, they also pass in evidence on all the other variables for that i-th iteration, which gives us another indication of how good the probability distribution is. Here's an example of a node monitor that is showing some issues: this one is modeling pregnancy, and we notice that it's doing very badly for the first half of the data, so we might want to unpick that with our finer monitors.

The next monitor goes down another level, to figure out what's really happening with that probability distribution: the parent-child monitor. This uses the same log score we were looking at before, except we add an extra parameter: a specific set of values of the parent nodes. We use the same standardization as on the previous slides. Here we dig further into what's happening with the pregnancy node, looking specifically at where age is low: what's going on for young women? Sure enough, our monitor reveals specifically poor forecasts of pregnancy for young women. At that point you could go back to the study authors and say: why don't we investigate what's happening with young women, because this part of the model isn't being predicted very well. That sums up our robustness monitors.
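A hedged sketch of these node-level monitors (node and level names are hypothetical; the function signatures are my recollection of bnmonitor and should be checked against its documentation):

```r
library(bnmonitor)

# marginal node monitor; the plot shows the z statistics with +/-1.96 bands
m <- seq_marg_monitor(dag = dag1, df = pima, node.name = "Diabetes")
plot(m)

# parent-child monitor, conditioning on a parent configuration (low age),
# to localize the poor pregnancy forecasts
pc <- seq_pa_ch_monitor(dag = dag1, df = pima, node.name = "Pregnant",
                        pa.names = "Age", pa.val = "Low", alpha = 2)
plot(pc)
```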
With that, we turn our attention to the sensitivity functions. For those, we take our DAG and our set of conditional probability distributions, which together give us the model we'll call G. We can split the nodes into observation and evidence variables, O and E, and our sensitivity functions tell us something about the probability distribution of the entire model for the observational variables given the evidence variables: we want to know how this overall probability varies once we start tinkering with the conditional probabilities that compose the model. This question was first posed by Chan and Darwiche: what changes in the conditional probabilities would make the global probability of our observational variables, given our evidence, equal to a specific value?

Let's see how that is implemented in our package. First, in our diabetes example, we look at how the marginal probability of a positive test depends on the probability that glucose is high. Our function in bnmonitor is called sensitivity: we pass our BN object, our node of interest, diabetes, and specify that we want to look at a positive diabetes test and high glucose. Then we compare that with the conditional probability of a positive test given a low level of insulin, and we want to know how that varies when the probability of a high level of glucose changes. These are the two plots we generate, and looking at them side by side, we see in both charts that as the probability of a high glucose level goes up, the probability of a positive diabetes test also increases; for the conditional-probability chart on the right, the increase is very nonlinear, which is what we expect from the results of Coupé and van der Gaag.

The next function we look at in our sensitivity suite answers questions like: how do changes in the probability of high glucose affect the overall probability distribution of the BN? There are lots of different metrics for this, but here we're interested in the Chan-Darwiche (CD) distance, which we've implemented in the function CD. When we run it, the resulting plots show that the CD distance is smaller for changes conditioned on high BMI than for high glucose, comparatively; so this enables us to measure the distance between the probability distributions.

The last main function in the sensitivity part answers questions like the following. Say we want to know the probability of a positive diabetes test given high blood pressure. We can get this from the gRain package just by using the querygrain function, which generates a chart showing those probabilities. But suppose we take this back to our experts, and they look at it and say: actually, we think this probability, 0.3814 down in the corner, should be at least 0.4. We might want to know what would need to happen to the probability distributions to bump that up, and what effect that would have on our global model: what configurations of the conditional probabilities would make this possible? The function we introduce here is called sensquery: we specify the value we want our model to take, and sensquery runs through the possibilities and returns the settings under which this would be the case, giving us all the plausible scenarios.
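The probability query described above is standard gRain; a minimal sketch (pima_bn is a hypothetical fitted bn.fit object, and I won't guess at sensquery's exact signature here):

```r
library(bnlearn)
library(gRain)

g <- as.grain(pima_bn)  # convert a fitted bn.fit object to a gRain network
g_ev <- setEvidence(g, nodes = "Pressure", states = "High")
querygrain(g_ev, nodes = "Diabetes")  # P(Diabetes | Pressure = High)
```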
So that's what I have. Thank you to my co-authors, who did all the sensitivity functions: Manuele Leonelli at IE University in Madrid, and Ramsiya Ramanathan, who I believe just graduated with her MSc from Bologna. That's what I have today, thank you.

Thank you very much, that was quite interesting. I'm just looking at the chat and the Q&A to see if there are any questions, and I don't see any, but I had one: you mentioned something about elicitation, so in any of these examples, was there any kind of prior elicitation? No, here we mostly used the diabetes data set just because it was available, but I have worked with some elicitation data. For the robustness monitors you need some sort of online setting, with data coming in that you can check against, but you could use the originally elicited probabilities; and for the sensitivity functions, all you need are the conditional probabilities, so you can plug those in and see how that works. We didn't use that for this specific example, but you could. Thank you. You had a question? Yes, I was going to ask about the ordering of the data. I understand if you have a sequence of data coming in, but if you have static data, as in this diabetes example, how do these robustness checks depend on the ordering of the observations? It does; it matters quite a lot. Yeah, I think that's where the nuance is. Honestly, it was hard to find an example, because so many of the data sets people use are in the end synthesized, so you don't get what you see in real life, where weird things happen, people come in with their friends, or there's some ordering effect. The other thing you can do, which I don't show here, is take some covariate and order the observations according to that covariate, which gives you an extra way to figure out where the model is going wrong. Good, thanks. Any further questions? We're doing well on time, and as a reminder, the panelists will come back together at the end for further questions, so please don't be shy about asking; I'm pleased that some questions are being asked of the speakers. Thank you, Rachel.

So I will move on to our next speaker, Vittorio Orlandi from Duke University, who will be speaking on FLAME, interpretable matching for causal inference. Please go ahead. Hi everybody, I'm Vittorio. I'll be talking about the package FLAME, which does interpretable matching for causal inference; this is joint work with the Almost Matching Exactly, or AME, lab at Duke University. So let's get started. Causal inference seeks to quantify the effect of some treatment, which here I'll take to be binary, on some outcome Y. One way of framing the problem is to consider two potential outcomes for each unit i: Y_i(1), which denotes the outcome under treatment, and Y_i(0), which denotes it under control. Crucially, only one of these is actually observed, because you can't both be treated and not treated; the other, which we call the counterfactual, has to be estimated from the data. This can be complicated when you have observational data, because your covariates might be confounders. Suppose you're trying to figure out the effect of some drug, where whether or not you receive the drug is the treatment, on some health outcome. It might be the case that people who are sicker in general, as captured by the covariates, are more likely to take the drug, so a naive analysis might conclude that the drug is actually harmful, when really this selection issue is biasing your estimates. You need to adjust for this somehow, and one approach to doing so is called matching.

To start thinking about this, imagine that we have a treated unit i, and in our data set we can find a control unit k that has identical covariate values: an identical twin. Intuitively, Y_k, the observed response of unit k, is a really good estimate of the unobserved counterfactual of unit i: because they're identical in their covariates, you expect that k looks a lot like what i would have looked like had it not been treated. This would be great if we could always find such units, but exact matches in high-dimensional settings are quite unlikely (as a side note, they're impossible whenever you have continuous covariates, but that's not something I'll talk about here), so we have to settle for some sort of approximate equality. Now the question is how to determine when units are close enough to match, and our approach takes the form of an optimization problem we call Almost Matching Exactly. We'll unpack it bit by bit, starting with the second line.
Given a treated unit i, we find units k that may not match exactly on all the covariates, but match exactly on a subset of covariates selected by theta. Theta is a binary vector of zeros and ones, with zeros where you're not enforcing exact matches and ones where you are, and the circle here denotes an elementwise, or Hadamard, product. So you're finding units that match exactly on a subset of the covariates and that also have the opposite treatment, because you're trying to estimate counterfactuals; and out of the many covariate subsets that might satisfy this, we choose the one with the highest importance, where importance is determined by the covariate weights w. Implicitly, what we're doing by solving this problem is defining a distance metric that determines when units are close enough to match. It prioritizes matches on relevant covariates through w; this ensures that, in a medical example, we don't match on hair color at the expense of matching on something much more medically relevant like sex. In addition, it matches exactly where possible, which gets us closer to the gold standard of finding identical twins in the data that I discussed on the previous slide. In practice, we iterate over covariate sets, over these thetas, starting with the more important ones: we take a theta, match any units we can exactly on those covariates, then choose another theta, match any remaining units exactly, and iterate until stopping. Our algorithms just tell you precisely how to do that iteration.

The first algorithm is called DAME, and it solves the AME problem on the previous slide exactly, for each unit, so it gives you the highest-quality covariate subset on which matches can be found for each unit. Even though there are 2^p possible covariate subsets, there is an efficient solution via a property called downward closure, which essentially tells you that you can search the space in a smart way: you start with the best possible covariate vector, which is a theta of all ones denoting exact matching, then go to the next most important covariate vector, and then the next; and because you search in this order, as soon as you find a match for some unit, it is guaranteed to be the best possible match. FLAME approximates the solution found by DAME via a backward stepwise selection procedure: at each iteration you eliminate an entire covariate. That is, at the first iteration you try to match exactly on all the covariates; then you ignore the least important one, match on the remaining ones, and keep going.

Now, in practice no one gives you a nice little covariate vector describing the importance of the variables in your data set. So what we do is run a machine learning algorithm on a separate holdout set of data that we are not going to match, and whenever we consider a theta, a vector of covariates, we compute its predictive error (PE), which is the error in using that covariate set to predict the outcome. This determines the next covariate set to match on: we choose the theta that yields the lowest predictive error.
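In symbols, a hedged rendering of the optimization problem described above:

```latex
% Almost Matching Exactly: for a treated unit i with covariates x_i and
% treatment t_i, pick the most important covariate subset (theta) on which
% a unit with the opposite treatment matches exactly; \circ is the
% Hadamard (elementwise) product and w the covariate importance weights.
\theta^{*}(i) \in \arg\max_{\theta \in \{0,1\}^{p}} \; \theta \cdot w
\quad \text{subject to} \quad
\exists\, k :\; t_{k} = 1 - t_{i}
\;\text{ and }\; \mathbf{x}_{k} \circ \theta = \mathbf{x}_{i} \circ \theta .
```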
Okay, so on to the package. The package implements these two algorithms via the functions FLAME and DAME, which match input data under a wide variety of specifications, some of which I'll go over here. They call efficient bit-vector routines for making matches, so you're not doing something really slow like looping over units and covariates when trying to find matches, and they return S3 objects of class ame (for almost matching exactly) with print, plot, and summary methods that I'll discuss. Installation is standard; I just want to point out that there are two different GitHub links out in the wild, one a mirror of the other, so you don't have to worry if you see multiple, they're both okay.

To illustrate the package, I'm going to look at some data released by NCHS in 2010 on neonatal health outcomes in the NICU, and in particular at the problem studied by Kondracki (2020) on the effect of extreme smoking on birth weight, where extreme smoking is defined as the mother smoking at least 10 cigarettes per day throughout her pregnancy. In this analysis I use a subset of half a million observations, but all the covariate information, which includes the sex of the infant, the races of the parents, previous cesarean deliveries, and others.

If you download this data set onto your computer, the first thing you'll notice is that there is a lot of missing data, and when you're trying to match you need to decide what to do with it. There is a missing_data argument to FLAME and DAME that determines how missingness is handled in the data you want to match, with three options. The first, maybe the most obvious, is to drop the units with missingness: if a unit is missing anything, it is not matched. The second imputes the missing values via mice and then matches on the completed data set. The third keeps missing values in the data but doesn't allow matches on them, so two units will only match on a covariate set if neither of them has missing values for any of those covariates. There is an analogous argument for dealing with missingness in the holdout data set used to compute the predictive error.

Speaking of computing the predictive error: this is a very important part of the algorithm, because it determines your covariate importances, which determine the covariate subsets you match on, and ultimately affects the matches you make. There are two implemented options for computing the PE: ridge regression via glmnet, with cross-validation to choose the regularization parameter, and gradient boosting via xgboost, with cross-validation to choose a variety of other parameters. But you can also supply your own function, if you think it's better suited to the data at hand, or if you want something faster that doesn't do as much cross-validation; there are details on how to write these in the docs, and the sketch below shows a function that just uses a linear model to compute the predictive error.
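A hedged sketch of what such a user-supplied function might look like; the interface (a fit function plus a matching predict function, passed via arguments I recall as user_PE_fit and user_PE_predict) should be checked against the FLAME documentation:

```r
# fit a plain linear model for the outcome given a candidate covariate set
my_PE_fit <- function(X, Y) lm(Y ~ ., data = as.data.frame(X))

# corresponding predictions on new data
my_PE_predict <- function(fit, X) predict(fit, newdata = as.data.frame(X))
```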
We can put some of these pieces together to show a full call to FLAME to match the natality data; calling DAME is identical, you just use that function instead. Going through it briefly, bit by bit: in the data argument we pass the data to be matched, which here I've already loaded into a data frame called natality. You can pass a separate holdout set for computing the predictive error; here, a portion of the matching data is split off and used to compute the error, and is itself not matched, so you're not double dipping or anything. You can decide whether to match with or without replacement, specify the names of the treatment and outcome columns, say how you want to deal with missing data if it's present and how to compute the predictive error, and lastly there's a flag telling it to estimate conditional average treatment effects from the matches. A CATE is essentially the effect of the treatment on a subset of the population: on units with certain covariate values.
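A hedged sketch of the call just described (the column names are hypothetical, and the option strings for the missing-data arguments are my assumptions, to be checked against the docs):

```r
library(FLAME)

fit <- FLAME(
  data = natality,                          # data to be matched
  holdout = 0.1,                            # portion split off to compute PE
  replace = FALSE,                          # match without replacement
  treated_column_name = "extreme_smoking",  # hypothetical column name
  outcome_column_name = "birth_weight",     # hypothetical column name
  missing_data = "keep",                    # keep NAs, but don't match on them
  missing_holdout = "drop",                 # drop incomplete holdout rows
  PE_method = "ridge",                      # or "xgb", or a user-supplied pair
  estimate_CATEs = TRUE)                    # estimate CATEs from the matches
```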
As I mentioned, once you run this you get an object of class ame. When you print it, you get some information about the number of matches made and an estimated treatment effect: here the estimate is around -180, in grams, which corresponds to around -6 ounces; so, as expected, the effect of extreme smoking on birth weight is negative. There are also some details about how missingness, if present in the data, was handled.

There is a plot method that produces one of four plots. The first shows the number of covariates that different units matched on; here we see that the overwhelming number of units matched exactly on all covariates, which is great, because matching on more covariates means the matches are of higher quality. We might be concerned if the majority had matched on only five covariates or so, if those were not very useful for predicting the outcome. The second plot shows the estimated conditional average treatment effects against the size of the corresponding matched group, a matched group being essentially the set of all units matched to one another; larger matched groups have more stable CATE estimates, so this can be useful to see whether larger matched groups have systematically different estimates. The third plot is an estimated density of the CATE estimates; here it's nicely unimodal and symmetric, but skew or multimodality would suggest that there are subpopulations in your data experiencing a heterogeneous treatment effect, which can be important to follow up on and study in more detail. And lastly, there's a heat map of the covariates dropped at different iterations: at iteration one it's blank, because we match on all covariates; at iteration two we don't match on pre-pregnancy hypertension; and so on and so forth. The variables dropped last, such as mother's race here, are those determined to be most important for predicting the outcome by the method used to compute the predictive error.

You also have a summary method, which gives additional information on the number of units matched, average treatment effects, and matched groups, including high-quality examples, where quality is determined by units matching on many covariates and having large matched groups. We can visualize one of these using the MG function, which takes at minimum two arguments: the unit whose matched group you want, which I'm here taking from the summary object, and the object of class ame. If we look at the matched group, we see, perhaps unsurprisingly, that all the covariates are the same; and since I did show you that lots of the units matched exactly, this may be a little underwhelming. But other matching methods don't actually guarantee that you match units that are similar in their covariate values, and that's really what makes this method interpretable. It's really important, because the grouping of these units is the basis for the treatment effect we estimate: you can show a matched group to a domain expert, and it's very easy for them to confirm that yes, these units really are similar in meaningful ways, and it's okay to estimate a treatment effect based off of them.
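A short usage sketch of the methods just walked through (the unit id passed to MG is arbitrary here):

```r
print(fit)    # matches made and the ATE estimate (about -180 g above)
plot(fit)     # the four diagnostic plots described above
summary(fit)  # units matched, treatment effects, example matched groups
MG(1, fit)    # matched group of unit 1: a unit id plus the ame object
```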
So, just to wrap up: FLAME and DAME are scalable algorithms for observational causal inference. They use machine learning on a holdout set to learn a distance metric that prioritizes matches on more important covariates; the resulting matched groups are interpretable, because matches are made directly on covariate values, and they're high quality, because those covariates are the ones most important for the outcome. Future work on the package will focus on a database implementation (FLAME is pretty scalable, as I said, I ran it on this data set with half a million observations, but the full data set doesn't fit in memory, and the database implementation will handle that) and on algorithms for mixed continuous and discrete data. Thanks very much; here's a link to the web page and a QR code for the same thing, where you can find the papers, links to the GitHub, user guides, and things like that. If you or your collaborators are more comfortable in Python, there are packages for that too, and a lot of other good information of that sort. Just to conclude, we came up with these methods because we think they can be really useful to individuals in a wide variety of fields analyzing the effects of certain treatments on certain outcomes in their own data, so we'd love it if you used the packages. If you have any questions, feel free to reach out to me; if you don't have any questions but you're still using it, also feel free to reach out. Thanks very much for listening.

Thank you. I don't see any questions in the Q&A... oh, there is one, good. Kristof asks why there are two arguments, missing_data and missing_holdout: can they be different? Yes, they can. missing_data corresponds to how you deal with missingness in the data you're going to match, and missing_holdout handles missingness in the data used to compute covariate importance, the predictive error. As an example of why they might differ: first of all, you don't have the option of keeping missing values in the holdout data set, because you need to run some sort of regression on it. But even otherwise, maybe you really don't want to drop units with missingness from the matching data, because you find it crucial to match as many units as possible, so you impute them; while at the same time you're pretty confident that a smaller subset of your data is sufficient for estimating the outcome well in the holdout set, and so you drop those units for computational-efficiency reasons. So there are different reasons to set them differently, but one governs missingness in the data to be matched and the other in the data used to compute covariate importance.

Let me just make a comment: I often do some matching myself, so it would be interesting to see what kind of matches this algorithm gives compared to others; I'm really curious to check this out. Yes, you should definitely check out the papers. I'm of course very biased, but I think there are some very compelling simulations on how this can do compared to methods like propensity score matching. Propensity score matching does have nice theoretical properties, but the fact is that if you look at some of the matched groups, you are matching together units that look wildly different; there's nothing in propensity score matching that prevents you, when trying to estimate some medical treatment effect, from matching together a 20-year-old and an 80-year-old. That might be totally fine statistically, even if a doctor would never say, yes, these units are similar, so we're going to compare them to estimate the treatment effect. So that's a nice feature of our method. And I also think that when you do a match, it's nice to see exactly what was matched on, so you can actually pick those things out; it almost seems like a basic step, but sometimes it gets lost in the wash. Yeah, absolutely, that's a very good point. So thank you very much, and I want to say that I think all of you have done wonderfully in terms of time, so you have really made my job a lot easier. I'll just wait a second more to see if there are any questions. Okay. Now, I forget whether the panel happens now or after the sponsor talk; I think it happens after the sponsor talk, so I'll go ahead and introduce our next speaker, Mehar Pratap Singh. He is, if I'm not mistaken, the CEO of ProCogia, and they have been a big supporter of the R Consortium, which has done wonderful work in supporting R activities; so, as an R user and participant, thank you for the sponsorship. I think he has a video presentation, so please go ahead.

Hello everyone, I hope you have enjoyed the conference and are ready for the weekend as the conference comes to a close; I sure have enjoyed the talks. A brief introduction about me: I am the founder and CEO of ProCogia. We are a data science consulting company heavily involved in the R community, and I also sit on the R Consortium board as well as the Infrastructure Steering Committee. I will be talking about the work that the R Consortium is doing, how we are being the invisible hand in the community and helping to connect the dots; I would also like to share why I decided to join the R Consortium. The R Consortium fulfills a unique need in the growing data science space: R language resources are critical tools in the data-driven economy, but where do you find the resources, which ones are best to leverage, and which need strategic support and funding? That's where we have stepped in. But let me also talk about how we do it. At the ground level we have an Infrastructure Steering Committee, a marketing committee, and working groups. The R Consortium sits under the umbrella of the Linux Foundation and is governed by a board that consists of representatives of the member organizations. Ultimately, the R Consortium is here to support the R community and to promote, develop, and extend the reach of R. And while we are talking about members: our members, leading institutions and companies, are dedicated to the use, development, and growth of R; they are working together with us to ensure the future of R, and we couldn't be prouder.
Let us take a moment and think about why they choose to do so. Strategic funding of R is not something that individuals can do, no matter how capable, and it's not possible for shareholder-beholden corporations either. That is why, with its vendor-neutral, open governance, the R Consortium has continued to be supported by our members over the years; and if you happen to be part of one of those organizations, we thank you very much for the support. From a high-level view, you can see some of the big ways we are involved in the R community: our grants to R infrastructure projects, repositories, specialists, events, and communities, not just locally but worldwide. So far we have invested more than one million dollars into these various initiatives. These funds support projects from small conferences and R development to large-scale projects such as the R Validation Hub, and I'm sure each one of you has experienced the work we are supporting, either through an R-Ladies group, an R user group, or some other brainchild of the Consortium: we foster the community, the R meetings, and those gatherings. This past year was a hard year by any measure, and the fact that we are not meeting in person and are talking virtually is a testament to that; companies struggled to find a way to move forward during a global pandemic. Our open-source governance and foundation model has been uniquely positioned to benefit the worldwide community of users, maintainers, and developers of R software; as an example, R continued to shine as a tool to effectively map and organize data on COVID.

You can see these activities and projects on the screen. They are run by various parts of the R Consortium: some by the Infrastructure Steering Committee (ISC), which evaluates, funds, and manages individual as well as team technical projects; a marketing committee that helps us identify, organize, and support various events where we can collaborate on common objectives and initiatives; and, in addition, working groups, which consist of members as well as non-members and help us run top-level projects. I will give examples of all of these shortly, but let me pause here and talk about two examples that are used heavily, day in and day out, by all of us. The first is the DBI package. Some of the first grants that the R Consortium made went to the DBI project, run by Kirill Müller, to improve R's database interfaces. The DBI project, and the resulting eponymous R package, substantially improved R's low-level connectivity to several databases, through packages including RPostgres, RMariaDB, RSQLite, odbc, and bigrquery. The impetus behind the project was a long-term effort toward distributed statistical computing.
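As a concrete illustration of the common interface DBI standardizes across those backends, a minimal sketch using RSQLite (not from the talk, just a usage example):

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")  # swap in any DBI backend
dbWriteTable(con, "mtcars", mtcars)
dbGetQuery(con, "SELECT cyl, AVG(mpg) AS avg_mpg FROM mtcars GROUP BY cyl")
dbDisconnect(con)
```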
The second example I want to talk about is R-hub. The initial infrastructure investment of the R Consortium was made primarily with the idea of R-hub in mind. It is a collection of services to facilitate R package development, it has become ubiquitous, and it is free for all members of the community. The most prominent service, the R-hub builder, complements the services offered by CRAN by allowing package authors to check their packages on several platforms, against multiple versions of R; additionally, it allows package developers to build their package binaries, and the binaries for all source dependencies, on several platforms and R versions.

These are some examples of how the ISC has helped, and of how we have come together as a community to achieve so much. Here I show you a small sample of the projects we funded in 2020, and an example of the projects we are funding right now in 2021; I'm pretty sure that, just as we talked about DBI and R-hub, we might be talking about some of these projects in the future. Now, coming over to the working groups, one example that is very dear to me, and that I am closely involved in, is the pharma-related working groups and how holistically we are trying to solve the pharma industry's problems. Just to showcase how we organize: we are currently working with government agencies like the FDA at the top, providing a vendor- and company-neutral environment, while at the same time collaborating on and cultivating industry-wide initiatives. You are more than welcome to be part of these working groups: if your company is part of the R Consortium, great; even if it's not, we welcome all. In a more detailed view, the working groups are all working in tandem toward the common goal of developing and nourishing R for activities in pharma, with initiatives ranging from IT and infrastructure to compliance, training, and education, all working coherently and cohesively toward the holistic goal of making R more successful in the pharma world. I hope this gives you a window into the work being done by the R Consortium as we support the growth of R, through the ISC examples and the working group examples; and of course, the R Consortium is one of the sponsors of this useR! conference as well.

Now that you know about the what and why of the R Consortium, let us also talk about the dots, the various dots that we need your help to connect. In my mind, the dots of any community have a few characteristics. First, the community is organized around a common goal, and we all know how strong the R community is, how aligned it is, and how we collectively want the success of the language. Second, members interact frequently: the events, the local events, and the various forums allow us to engage and interact very closely. Third, members contribute toward the common good: the various working groups, ISC projects, and conferences provide us a platform to work toward those common goals. And last but not least, the community really recognizes those contributions: we really appreciate all the hard work being done by the volunteers and the Consortium, all of them. And not least, the useR! organizing committee and all the volunteers have helped us do this virtually; they deserve an applause. Now let us also talk about you, and how you can help us. You can help us by joining the R Consortium, and by asking your organizations and companies to join, be it your university, your college, or the place you work at; please help us by joining and supporting us. This allows you to further demonstrate your commitment, and theirs, to the R community. So let's become friends: follow us, reach out to us, contact us through LinkedIn, Twitter, or the website. I really look forward to connecting with you, and I also look forward to meeting you in person next year. Thank you so much.

Thank you very much. I think at this point all the panelists are available for any further questions the group may have.
By the way, I also wanted to add my thanks at this point to the useR! organizers and all the people behind the scenes who have done tremendous work; I don't know whether they have slept. I looked at Stephen's checklist, for example, and I told him I don't know how he keeps it up; it's unbelievable how many things he's doing at the same time. So thank you to all of them. And with that, I open the floor to questions for the others. Actually, I have one question from the chat: do you have any statistics on R-hub usage? I know that it has gone up tremendously, but it's never clear, for example, how many VMs are allocated to it, how much of the cloud resources are being used, and so on; it would be nice to have some statistics on that, so people know numerically what's happening and how many packages are checked per day. Unfortunately, I don't have an answer, but I can get back to you on it. All right. At this point I also wanted to mention, as I put in the chat, that we have a Slack channel where you can continue the discussions with the speakers and on other topics: it's called talk_stats_data_analysis_2, and it's a channel in the useR! 2021 Slack, so you can join it and continue your discussions there. I also wanted to say one more thing, which is to thank all the speakers. I was a little worried about how this was going to go, but you all made it very easy for me, so thank you very much for excellent talks; I enjoyed them very much.