Right, and I'm going to share my screen; can someone confirm whether you can see the shared screen? Okay, thank you. So today we have this workshop, our guide to targeted maximum likelihood estimation in medical research. I am from the University of British Columbia, from the School of Population and Public Health, and my research area is, generally speaking, machine learning and causal inference. Since I have an interest in both machine learning and causal inference, this TMLE, or targeted maximum likelihood estimation, is very relevant for my research. We also have Hannah; she is a graduate student in my lab. Hannah, do you want to introduce yourself? Sure, yeah, I've just completed the first year of my MSc here at UBC in Population and Public Health, and I've been focusing on TMLE recently in my research. Definitely an interesting field.

Right, so in terms of today's workshop, let me go through the goals first. What we will do is try to get an introductory explanation of, or some ideas about, targeted maximum likelihood estimation. But before we go there, we will need to explain some of the relevant methods, because TMLE is built on other methods that are already available. These methods are G-computation and inverse probability of weighting. So in this workshop we will start with G-computation, then we will introduce inverse probability of weighting, then we will transition into targeted maximum likelihood estimation and show the steps. Before we show all of the steps, we will also introduce some of the software packages that are already out there that can help you implement this method. In terms of the analysis, we are simply going to use one particular epidemiologic data set to explain all of these concepts. We are not really going to focus on theory; we are going to focus more on the implementation side. So the philosophy of this workshop is basically a code-first philosophy, where we will show the code and the implementation details and do the analysis of a real data set using that code. All of the code I have used to generate the results is already available in this workshop material. For those who have joined later, I am pasting the link for the workshop materials again.

Alright, so in terms of the prerequisites, generally I am expecting that you have a basic understanding of the R language and some general understanding of multiple linear regression. Familiarity with machine learning and some of the epidemiologic concepts will be helpful, but those are not required; I'm going to explain them in brief. Even though this is a workshop on causal inference, I'm not really expecting anybody to have a deep understanding of causal inference or advanced statistical methods. This is the first workshop that we are delivering on this particular topic, so this is the first version of this document. Obviously, if you spot any error or have any comments about this document, please feel free to reach out: you can go to my website just by clicking here, and you can email me from there. And if you like this tutorial and want to cite it somewhere, here is how you can cite it. So before I move on to chapter one, I just wanted to get a sense of where the participants are coming from.
Would it be okay for the participants to just type in the chat box where you are coming from, which institute or which city? University of Colorado. Alright. Oh, I see someone from Ottawa. Australia. Okay. New York. Netherlands. Okay. Alright. South Africa. Alright.

Alright, so even if someone is late, that's probably not a big deal, in the sense that we are recording this session, and we should be able to post the recording after the event so that you can review the materials if you want to, or, if someone is not joining live, they can also view the materials. Just let me check one thing. I see the recording button is on, so it should not be a big deal if someone is joining late. For those who may have joined late and have not seen the materials yet, here is the material. So today we are going to use some of these materials; can you just mention whether you can see the chapter one that I'm showing right now? Yes, okay.

Alright, so let me start with chapter one, and then we will go on with the rest of the chapters. My general plan is that we have three hours, but we do not necessarily have to use the full three hours. We have eight chapters in total, and within these three hours or less we will try to cover all of them. After I cover each chapter I will pause for questions from the audience, and if there is any question I can try to answer it; or, if you do not want to wait, you can always type your question in the chat box as we go. At the end of each chapter I will pause, read the questions, and try to answer them one by one.

Alright, so let us begin. In this particular workshop we are going to use the right heart catheterization (RHC) data set, which is openly available on the Vanderbilt Biostatistics website. In this data set, they have a procedure called right heart catheterization, which is basically a monitoring device for measuring cardiac function. In 1996, Connors and colleagues published an article in JAMA where they examined the association of right heart catheterization use with a number of health-related outcomes. One of those health-related outcomes was length of stay, which was measured on a continuous scale. In our workshop today, we are simply going to focus on the relationship between RHC use and length of stay as the outcome. We are going to assess the relationship between these two with the understanding that we also have a number of adjustment variables, and we will try to adjust for those adjustment variables to get a better estimate of the effect of RHC use on the length of stay in hospital. All of the code that you are seeing, for all of the analyses that I have done, is visible here; if you want, you can hide the code if it gets annoying to you. But generally speaking I will show the code, with the understanding that you understand R, and show all of this reproducible code. In the first step, after downloading the data, I'm just going to prepare the analytic data so that I can use it for my analysis.
So in this particular step, what I'm basically doing is recoding and restructuring some of these variables so that I can get an analytic version of the data, and I will be using this analytic version of the data for the rest of this workshop.

For this workshop, there are three notations that I'm going to use repeatedly. The first is the exposure status, which is denoted by A; in this particular example, RHC use is our A variable, the exposure status. Then we have the outcome, Y, which is the length of stay in hospital in our example. And we have a number of adjustment variables, or covariates, which are denoted by L, and we show the covariates below. In this particular data set we have 49 covariates, such as disease category, cancer, cardiovascular event, income, body weight, and so on. That makes 51 variables in total: one variable for the exposure status, one for the outcome, and 49 for the covariates, the L variables.

All right, so regarding the Connors paper, by the way: if you just click on this link, it will take you to the reference, and you can download the Connors paper for free from ResearchGate. If you download that paper, you will see there is a Table 1 in it. Here, what I have done is basically recreate that Table 1 using the tableone package (CreateTableOne), with some of the important demographic and disease variables that we have in the data set. If you look at the data set, you will see there are more than 2,000 subjects who were treated with RHC and more than 3,500 observations, or subjects, who were not treated with RHC. In a general sense, if you look, for example, at the age distribution in both of these categories, the proportions shown in the parentheses generally look very similar. That kind of gives you a sense that the data set we are dealing with is mostly balanced, but we will say more about that a bit later.

We also looked at the length of stay variable and tried to compare the numbers with the paper. When we take the mean length of stay for the people who received RHC, it was 24.86, and the mean for those who did not receive RHC was 19.53. Interestingly enough, when we looked at the paper, the numbers were slightly different. I don't know whether the data was later modified somehow, or whether another variable, ICU versus hospital stay, was supposed to be used; we don't know. But generally speaking, I'm going to work with this particular data set, where the means are about 19 versus 24 for the outcome under the two exposure groups. I also calculated the medians, and they are also slightly different: the paper reported medians of 13 and 17, but when we calculated them they were 12 and 16. I'm just saying this so you know that we are dealing with a slightly different data set, and so that we can manage our expectations about how closely we can match the results.
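As a rough companion sketch (not the exact code from the workshop materials), the Table 1 and the crude outcome comparison described above could look like this in R, assuming an analytic data frame called analytic.data with exposure A (RHC use), outcome Y (length of stay), and a few illustrative covariate names:

library(tableone)

# Table 1 stratified by exposure, in the spirit of the Connors paper (variable names are illustrative)
tab1 <- CreateTableOne(vars   = c("age", "sex", "race", "cat1"),
                       strata = "A",            # RHC use vs no RHC use
                       data   = analytic.data)
print(tab1, smd = TRUE)                         # show standardized mean differences by group

# Crude comparison of the outcome by exposure group (means and medians)
tapply(analytic.data$Y, analytic.data$A, mean,   na.rm = TRUE)
tapply(analytic.data$Y, analytic.data$A, median, na.rm = TRUE)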
Alright, so in terms of finding out how useful RHC is, we can first run a regression without adjusting for any other variables. In that crude analysis, we simply run a linear regression model where Y is the outcome and A is the exposure variable, and when we run it we get a coefficient of 5.3, with a reported confidence interval. Now we use the same regression but add all of the 49 baseline confounders that we have and run it again. This is the outcome regression, where Y is the outcome, A is the exposure variable, and we also include the 49 covariates. Now the coefficient of A, the RHC use, turns into 2.9. So previously, in the crude analysis, the coefficient was 5.3, but after adjusting we get 2.9; we are obviously seeing some change when we adjust for the 49 variables we have.

So what is the thing that everybody should do when they do a regression analysis? They should check the regression diagnostics. So we checked them. This is a plot of residuals versus fitted values. There is obviously some pattern showing, whereas in diagnostic plots we generally expect no visible pattern: the data points in this plot should look as random as possible, which is clearly not the case for this particular regression. We also checked the QQ plot, which deviated from normality to some extent, and the studentized residuals versus the fitted values; again, there is some pattern. There are some other plots as well, so generally speaking, when we ran the multiple linear regression, the diagnostics did not look so good.

What we are trying to do is get a sense of the treatment effect estimate, right? Whether there was any difference between RHC use versus no RHC use. In Connors' paper they did a propensity score matched analysis. Just a short note: we are not going to cover propensity score analysis in detail in this workshop, because for TMLE we basically need G-computation and inverse probability weighting, so propensity score matching is not at the top of our list. But for those who are more interested in propensity score matching, there is another workshop of mine called "Understanding propensity score matching"; you can check out the materials for that workshop here. I also have a YouTube video recording of that workshop from another conference, so after this workshop feel free to take a look. Here, I'm just going to briefly talk about what the Connors paper did, and we will see whether we can replicate similar results or not. To get the propensity score estimates, what we generally do is fit the exposure model first. Here we have fitted the exposure model, and the fitted values from that exposure model are our propensity scores. Using the MatchIt package we simply run the propensity score analysis, and the diagnostics for a propensity score analysis are very easy to check: you simply plot the propensity scores for the treated versus the untreated.
Here, treatment equal to one means RHC use and treatment equal to zero means no RHC use. In the unadjusted sample, where there was no propensity score matching, you can see the distributions of the propensity scores were different. But when we do the propensity score matching (this was a one-to-one matching), the distributions are very similar, so the diagnostics for propensity scores are much easier to check compared to the diagnostics for a regression. This is a very big table showing a lot of numbers, but just look at how many of the covariates were balanced and how many were not: none of the covariates were unbalanced. So all of the covariates in this propensity score analysis are balanced, and you can see this better in a plot called a love plot. In the love plot, you can see the unadjusted standardized mean differences versus the propensity score matched standardized mean differences. The blue ones come from the propensity score matching, and none of them go above or below the 0.1 cut point. So we are happy with the matching.

Once we are happy with the matching, we can simply check the cross-tabulated means of the outcome in the no-RHC group and the RHC group, and we can do a test to see whether they are different. By this test, they are different. Interestingly enough, if you go back and look at the Connors paper, they found a different p-value. This is not unexpected when you are using one-to-one propensity score matching, where there is a lot of variability and matching is done somewhat randomly. There was also a difference between the analysis Connors did and the analysis I have done, because I used a caliper option here, so that is maybe one reason why we're seeing a slightly different result. Okay, so that was just checking the mean of the outcome in the exposed versus unexposed groups. We can also estimate the treatment effect using a regression; we do not have to adjust for any other baseline confounders or covariates anymore, because we have already matched the data. In the propensity score matched data, we get a treatment effect estimate of three point something. This is not the core part of the workshop, but I just wanted to show you the coefficient that we get from the propensity score matching. Other papers have used the same RHC data; for example, the Keele and Small paper published this year in The American Statistician also estimated the treatment effect, and their point estimate was slightly different from ours: theirs was about two, whereas our point estimate from the propensity score matching was about three. They used a different method, and that method's name is targeted maximum likelihood estimation using super learner. So in this workshop we are going to learn how to use this targeted maximum likelihood estimation method with the super learner. At this stage you can see some of these relevant messages.
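As a rough sketch of the analyses just described (crude regression, covariate-adjusted regression, and one-to-one propensity score matching with MatchIt), again with analytic.data, Y, A, and L1, L2, L3 standing in for the real variable names and the 49 covariates; the caliper value here is illustrative, not necessarily the one used in the workshop:

library(MatchIt)

# Crude analysis: Y (length of stay) on A (RHC use) only
fit.crude <- lm(Y ~ A, data = analytic.data)

# Adjusted analysis: add the baseline covariates (coefficient of A was ~2.9 in the workshop)
fit.adj <- lm(Y ~ A + L1 + L2 + L3, data = analytic.data)
plot(fit.adj)                                  # regression diagnostics (residuals, QQ plot, ...)

# One-to-one nearest-neighbour propensity score matching with a caliper
m.out <- matchit(A ~ L1 + L2 + L3, data = analytic.data,
                 method = "nearest", ratio = 1, caliper = 0.2)
summary(m.out)                                 # balance table before/after matching
plot(summary(m.out))                           # love plot of standardized mean differences

# Outcome analysis in the matched sample: no further covariate adjustment needed
matched.data <- match.data(m.out)
lm(Y ~ A, data = matched.data)                 # estimate was ~3 in the workshop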
Before I move on to G-computation, the next chapter, I just wanted to ask whether there is any question from any of you. You can either type in the chat or, if you can, unmute yourself.

Alright, so for the propensity score matching, there are many different ways you can do the matching. In this particular workshop I have shown a one-to-one matching, but variable-ratio matching is also available. You can also do optimal matching, which tries to take more observations from both sides, the treated and the untreated, to get a better estimate. Obviously, as you can see, one-to-one matching also reduces your data set to a large extent, and many people might not feel comfortable with that, so they usually prefer a higher ratio. What is the recommended ratio? That is a hard question to answer, because it depends on the data, but generally speaking, as high as possible given the context of your data is encouraged. Alright, so just to give you a high-level view of the difference between propensity score matching and TMLE: in propensity score matching we are basically focusing on the exposure modeling. If you go back to the propensity score modeling approach, you can see it here: in the propensity score, what I'm doing is using A as the outcome of the exposure model, and I will go into more detail when I talk about inverse probability of weighting in chapter four. Generally speaking, in propensity score matching we focus more on the exposure modeling, but in TMLE what we do is first fit a G-computation model and then use the propensity score to recalibrate that model. We are going to talk more about that in later chapters.

Alright, so if there is no other question, let me move on to the next chapter, which is going to be our first step towards targeted maximum likelihood estimation. To explain some of these ideas, I will first reduce the data set so that I can fit the whole thing into one table and explain the ideas I want to explain. Just a recap: we have approximately 5,700 subjects, one outcome variable (length of stay), one exposure variable (the RHC use status), and 49 covariates, some of them demographic variables and some of them clinical variables. This is the data set we are going to continue to use. So, just so that I can explain my ideas here, instead of using 5,700 subjects I'm focusing on only six subjects. Of these six subjects, four are female and two are male. We also have a lot of covariates, and we do not want to spend a lot of time at the beginning on all of them, so let us focus on only one covariate, sex, in this small data set. So this is a very small data set consisting of only six subjects.

Let me introduce some new notation that we are going to use in this G-computation chapter. Previously we talked about A being the exposure status and Y being the outcome. Now we introduce two new notations. The first is Y(a = 1), Y with a = 1 in parentheses, which means the potential outcome when the subject was exposed. What does that mean in our particular example? Length of stay is our outcome, so Y(a = 1) means the length of stay when RHC was used by a patient. Similarly, Y(a = 0) means the potential outcome when not exposed, that is, the length of stay when RHC was not used.
To make the ideas a bit more concrete, I'm going to show the data set. Just to recap, we have this small data set in the following format: an L column (sex), an A column with the RHC status, and a Y column with the length of stay, where length of stay is a continuous variable, A is a binary variable, and sex is also binary in this particular data. From this formulation of the data, we are simply going to convert this Y into the Y(a = 1) and Y(a = 0) formulation. The idea is that we are splitting the Y column into two different columns: the outcome when the exposure was applied and the outcome when the exposure was not applied. What does that look like in our data set? We restructured the data; do not pay too much attention to the code, paying attention to this table will be sufficient. In the original data, the first subject was a male subject who was not exposed to the RHC treatment, and the length of stay was nine days. So in our new formulation of the data, this observation of nine days is moved to the Y(a = 0) column. We do the same for anybody who did not receive RHC: we move this observation here, and we move this one here; the observation of seven days is moved to Y(a = 0). Similarly, for those whose exposure status was one, the outcome of 45 days when that subject was exposed goes into the other column, Y(a = 1).

If you wanted to calculate the treatment effect estimate out of this particular table, you would take the average of all of the observed values under Y(a = 1), which is 36, then take the mean of all of the observed values under Y(a = 0), which is 18, and if you subtract these two you get a treatment effect of 18. One important point about this table is that you cannot calculate the treatment effect by subtracting the two columns row by row, because only one of the two outcomes is observed in the data set for each subject: you cannot have the same subject who both has the exposure and does not have the exposure. That is why we often call this type of notation counterfactual notation, because we cannot observe both; one of them is counter to fact. And when we only use the observed data, we cannot calculate the treatment effect for each individual, but we can obviously calculate a treatment effect estimate by taking averages of the observed values under each of these columns.

Think of it this way: you can take the mean of the observed values, but some of the cells in each column are unobserved. So one thing you can do is treat this as a missing data problem and try to impute the values, so that you can get a treatment effect estimate for each individual. That is what we are doing in this particular table, where I simply imputed the value 36, the mean of the observed values, into all of the cells where we did not observe any value. I do the same for the unexposed: 18 was the mean outcome, and I simply impute 18 in all of those cells.
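Here is a toy illustration in R of the splitting and imputation just described. The numbers are made up for illustration; they are not the six subjects from the workshop table:

toy <- data.frame(sex = c("M", "M", "F", "F", "F", "F"),
                  A   = c(0, 1, 0, 1, 1, 0),
                  Y   = c(9, 2, 7, 45, 60, 30))

# Split the observed Y into the two potential-outcome columns
toy$Y1 <- ifelse(toy$A == 1, toy$Y, NA)    # Y(a = 1): observed only for the exposed
toy$Y0 <- ifelse(toy$A == 0, toy$Y, NA)    # Y(a = 0): observed only for the unexposed

# Crude approach: difference of the means of the observed values
mean(toy$Y1, na.rm = TRUE) - mean(toy$Y0, na.rm = TRUE)

# One step better: impute sex-specific means, treating sex as a confounder
m1 <- ave(toy$Y1, toy$sex, FUN = function(x) mean(x, na.rm = TRUE))
m0 <- ave(toy$Y0, toy$sex, FUN = function(x) mean(x, na.rm = TRUE))
toy$Y1.imp <- ifelse(is.na(toy$Y1), m1, toy$Y1)
toy$Y0.imp <- ifelse(is.na(toy$Y0), m0, toy$Y0)
mean(toy$Y1.imp - toy$Y0.imp)              # average of the individual treatment effects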
What is wonderful about this is that I can now estimate the treatment effect at the individual level; I can get a treatment effect estimate for each of the participants. Okay, so is that the best we can do, just imputing the mean, or is there anything else we could do? Notice that we had, for example, male and female subjects, and it is certainly possible that the sex variable is a confounder. In that case, just imputing these overall mean values would not necessarily get rid of confounding in any sense. A better method would be to take the mean values for male participants when they were treated and the mean values for female participants when they were treated, and impute the values for males and females separately. So 2 was the mean for males and 52.5 was the mean for females. If we have a male participant, we simply impute this 2 here, but if we have a female participant, we impute the mean of the observed female participants' outcomes. That is what we are imputing here, and it is obviously one step better than just imputing an overall mean. We can do the same for Y(a = 0), and we can still calculate the treatment effect estimate. We will now have two different treatment effect estimates, but we can simply take an average of them to get the average treatment effect.

All right, so now I have explained how you improve your imputed values when you take one confounder into consideration. You can imagine that we have other covariates in our analysis, and some of them could be confounders; for example, we could have age, income, race, and disease categories as confounders. If we wanted to do the imputation in this stratified way, that would take a long, long time, and in some cells we might not even have a person to represent the mean, so it does not necessarily solve the imputation problem. But we have a better tool we could use, and that tool is regression, because regression is basically a generalized method for taking conditional means given many covariates. So instead of using these one-by-one, sex-specific means, we can build a regression function, and based on that regression function we can impute what we think the outcome should be under the exposed condition and what it should be under the unexposed condition. In this particular setting, I hope I was able to motivate why it is necessary to use the regression mean: we have a lot of covariates here.

So basically what we are doing is building a regression function where we have the observed Y (the original Y, before splitting), the exposure variable, and all of these baseline confounders, and once we have that formula we fit the linear regression and obtain all of the coefficients. Using this regression fit, we can try to obtain what the outcome would be if everyone was treated. Basically, in the original data set I replace the observed A status, the exposure status, with RHC (A = 1) for every participant, and then I use that new data in the prediction model to get the predicted outcome as if everybody was treated. This is the "as if" type of concept; that's why it comes from the counterfactual notation.
When we look at the mean of the predicted outcomes as if everybody was treated, the mean is slightly over 20. We can put all of these numbers in a table, where all of these people are under treatment; this is an "as if" type of statement. Here we have the original RHC status, but that is not what we are using: we are using "what if everybody was treated", and then these would be the outcomes. Similarly, we can do the same for "what if everybody was untreated": what would the outcome be then? We estimate the mean of the predicted values and get a mean estimate of about 20, and we can put all of these numbers in the table again, for when nobody was treated. So now, under this setup, we have all of these observations predicted from that regression where everybody was treated, and here we have everybody untreated, from the same regression; we are just changing the A column. If we change that column, we get these predictions, and if we subtract each pair of values we get a treatment effect of 2.9 for everybody, because everybody's prediction is coming from the same regression.

The process we went through is known as G-computation. I have shown the steps, but I have not explicitly said what the steps are, so let me show you. Before that, let me explain that in a general association measure analysis, what we do is take the mean among persons who are exposed versus the mean among persons who are unexposed, and take the difference between the two. But when we use this G-computation formula, what we are getting is "what if everybody was treated" versus "what if everybody was untreated", and then we take the contrast of those two means. So for G-computation, what we are doing is: first, fit a regression; that is the first step. In the second step, we replace the observed A values with one (the whole A column becomes one) and get the predictions. In the third step, we replace the original A with zero, meaning no RHC, and get the predicted outcomes. Once we have these two sets of predictions, we take the mean of each, take the difference, and that is how we get the treatment effect estimate.

Okay, so far we were working with only six observations, so let me show how it works with all of the observations. In this setting, I'm fitting that regression where Y is the outcome and A and L are the input variables. Once we fit the regression, we generally call it the Q regression, and we will see this again later. With this regression fit, we use a new data set in which we set all A equal to one and get a prediction, then we do the same but set A equal to zero and get the predicted outcome; then we take the difference of the two means, and that gives us the treatment effect estimate. In our particular case, when we used the whole data set, we got a mean estimate of 2.9.
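As a rough sketch of the four G-computation steps just listed, plus the bootstrap confidence interval discussed next (variable names and the number of bootstrap replications are illustrative):

# Step 1: fit the outcome ("Q") regression
fit.Q <- lm(Y ~ A + L1 + L2 + L3, data = analytic.data)

# Step 2: predict as if everyone was treated (A = 1)
data.A1 <- analytic.data; data.A1$A <- 1
pred.Y1 <- predict(fit.Q, newdata = data.A1)

# Step 3: predict as if everyone was untreated (A = 0)
data.A0 <- analytic.data; data.A0$A <- 0
pred.Y0 <- predict(fit.Q, newdata = data.A0)

# Step 4: take the mean difference (~2.9 in the workshop)
mean(pred.Y1 - pred.Y0)

# Bootstrap the whole procedure to get a confidence interval (percentile version)
set.seed(123)
boot.est <- replicate(250, {
  d  <- analytic.data[sample(nrow(analytic.data), replace = TRUE), ]
  f  <- lm(Y ~ A + L1 + L2 + L3, data = d)
  d1 <- d; d1$A <- 1
  d0 <- d; d0$A <- 0
  mean(predict(f, newdata = d1) - predict(f, newdata = d0))
})
quantile(boot.est, c(0.025, 0.975))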
But if you look at the standard deviation of those individual differences, it is very close to zero, because there was no variability: remember, the treatment effect we computed in the data set was 2.9 for everybody. So with the G-computation method we can estimate the treatment effect, which is 2.9 in our example, but we do not necessarily get a good estimate of the variance; all of these numbers are very close to 2.9. If you really want a confidence interval for the treatment effect estimate, we generally run a bootstrap. When we run the bootstrap, we see there is some variability in the estimates, and from the bootstrap you can get a confidence interval: using the normality assumption you get a confidence interval of about 1.3 to 4, and using a percentile-based bootstrap estimate you get about 1.5 to 4.5. As you may remember, the data was not really that far from normality, which is probably why the normality-based and percentile-based results are somewhat similar.

We will probably take a break for about 10 minutes, but before that, if there are any questions about G-computation, I can try to answer them before the break. I think there are two questions in the chat. Okay, so the first question is: how robust are these methods against data not missing at random? As you know, in this setting we are relying on the model specification, and we are hoping that our model specification is correct. What do I mean by model specification? For example, when we fit the first step of G-computation, we are assuming a model, and when we assume a model, we are assuming that we have specified it correctly. If we fail to do that, if our model is not correctly specified, the results are going to be sensitive to that misspecification, and that is basically one of the bases for our next chapter. So thank you for that question. The next question is: what are the consequences if the linear model assumptions are not met when we are predicting the exposure effect for everyone? Basically the same idea. Everybody is picking up on the same concept: if this model is wrong, how can we believe the predicted outcomes, based on which we are getting the treatment effect estimate? If this model is wrong, we cannot rely too much on the predicted values, and we cannot believe too much in the treatment effect estimate that we get out of this G-computation. That is basically one of the motivations for moving on to machine learning methods when fitting G-computation, and I'm going to cover that in the next chapter. Okay, so: is there a problem regarding overfitting, since it seems the predictions are made on the same data the model was trained on? Yes, this is a very relevant question and a very good segue to the next chapter, where we are going to talk about overfitting as well. So thank you for that question. Is there any other question? If not, let us meet in 10 minutes. In my Pacific time it shows 12:48 now, so at 12:58 we will come back and start from chapter three. Thank you.

Hi everyone, welcome back. Let me share my screen first. Alright, so let me give an overview of what we have covered so far.
At first we talked about the RHC data set, and we showed some initial propensity score estimates from that data analysis. Then we introduced the G-computation method and explained the steps required for it. We also explained that G-computation is fine for getting the treatment effect estimate, but for the confidence interval you need a procedure like the bootstrap. So now we are going to start the next chapter, which is G-computation using machine learning methods.

As you can imagine from some of the last questions and answers, G-computation is a method that is highly dependent on the parametric assumptions we are making, or on the model that we fit and from which we make the predictions; this model is the core of everything. So getting the model right is very important for G-computation. But one of the problems with parametric regression is that you, as the analyst, have to know exactly what the confounders are, what the interactions are, what types of polynomials to use, and so on. We know that some machine learning methods can use their flexibility to automatically detect some of these non-linearities and non-additivities; for example, some of the tree-based methods have this additional advantage. But one of the problems with machine learning methods is that, due to their slow convergence and their non-parametric nature, the coverage probabilities are often very poor, and the standard errors you get out of those methods, even with a bootstrap, are not really that reliable. So in this chapter we are just going to focus on estimating the treatment effect, and we do not care too much about the confidence interval, because you will see later that in TMLE there is a different procedure for finding the confidence interval.

Before I move on, I just want confirmation from the audience: can you still see chapter three on the screen, or is the sharing not working? Okay. All right, so now that we kind of know G-computation, remember there were four steps: the first step was fitting the regression, then getting the predictions for the exposed and the unexposed, and then getting the treatment effect. In the first step we are talking about getting the model, and how can we get a better model? As I have just explained, I can try to replace the linear regression with a machine learning method, with the hope that the model specification would be detected automatically. A very popular machine learning algorithm is XGBoost, which is essentially a gradient boosting algorithm. It is a winning algorithm in many Kaggle competitions, so it must be a good algorithm. We are going to use this algorithm, and one of its advantages is that it is a tree-based, gradient boosting method, so it is certainly possible for this type of algorithm to automatically detect what types of interactions, transformations, and polynomials are helpful in identifying the correct specification of the model. So that's what we are going to do.
The xgboost package basically requires you to convert the data set into a matrix format, so we extract the model matrix and set particular values of the tuning parameters it expects. For example, one of the parameters is max_depth, the interaction depth, and here I'm saying that I want up to 10-degree interactions, something like 10-degree polynomials, so that it has a lot of power in trying to identify the patterns in the data set. I'm not going to talk much about the other parameters in this workshop. So what I have done is use the xgboost package and the xgboost function to get the fit. From that fit I get the predicted values, and when I plot the densities of the observed and predicted values, you can see the algorithm is so good that there is almost no distinction between the observed values and the predicted values. But one reason for this is that we are using the same data for building the model and for estimating the prediction error, so when we compute the root mean squared error, we get a very low root mean squared error. When we use the same data to build the model as well as to assess the predictions, this is known as the overfitting problem, where the results are too optimistic; the predictions are too optimistic and often unrealistic.

One of the known methods to deal with, or combat, these overly optimistic results is cross-validation. I hope you have a sense of what cross-validation is; if not, let me give a very brief explanation. In cross-validation, we split the data into a training part and a test part. In terms of model building, we first fit the model in the training data: for example, the XGBoost model would be fitted only in this training part of the data. Then in the test data we assess the loss function, for example the RMSE, to get a non-optimistic version of the error estimate. Then I shuffle the data so that a different part of the data is now the test data and the remaining part is the training data; I build the model again using the training data and get the root mean squared error on the test data, using the model I just built. So I get a new root mean squared error estimate from the part of the data where the model was not built. In this way, if you have, for example, a three-fold cross-validation, you will have three different splits of the data into testing and training, and you will get three different root mean squared errors. We can simply average these three to get an estimate of the average RMSE, and that is a much better way to assess how the model will predict in future data, or data the model has not seen.
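As a rough sketch of the plain, un-cross-validated XGBoost fit described above, which illustrates the overly optimistic in-sample error (the data frame and column names are again stand-ins):

library(xgboost)

X <- model.matrix(~ . - Y, data = analytic.data)   # exposure + covariates as a matrix
y <- analytic.data$Y

fit.xgb <- xgboost(data = X, label = y,
                   max_depth = 10,                  # very deep trees, as in the workshop
                   nrounds   = 100,
                   objective = "reg:squarederror",
                   verbose   = 0)

pred.in <- predict(fit.xgb, newdata = X)            # predicting on the training data itself
sqrt(mean((y - pred.in)^2))                         # in-sample RMSE: too optimistic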
If you want to use this cross-validation in R, there are many ways to do it, but I'm simply using the caret package, where I specify how many folds of cross-validation I want. For the purposes of illustration, I'm going to use a three-fold cross-validation, and I also specify the parameters that I want for the XGBoost algorithm. One of the nice features of the caret package is that it helps to fine-tune some of the parameters that you are not sure about. For example, remember that previously we set the max interaction depth to 10, which was overfitting our data. So I wanted to see which other parameter values I could use that would not overfit the data as much. I use values from 2 to 10 in steps of 2, and I want to see which gives me a better root mean squared error. In the caret package I use the train function to run the XGBoost method (called "xgbTree" in caret), I pass the tuning grid of parameters I have just specified, and I specify the type of cross-validation I want in trainControl. So in the train function I specify all of the cross-validation and the fine-tuning parameters, and it gives me a fit showing the root mean squared error associated with each of the max_depth values I set. In terms of root mean squared error, the smaller the better, right? When we were using a max depth of 10, it gave a higher root mean squared error, but when we use a max depth of 2, meaning only two-way interactions, the root mean squared error is the smallest. What the caret package does is automatically select the final values used for the model: you can see the max depth is selected as 2, because that gave the best root mean squared error out of all of the parameter values we set. So basically what I'm saying is that the caret package is a nice way to do this cross-validation as well as all of the fine-tuning you want, to obtain the best tuning for your model. Even though XGBoost is a complicated model, the caret package can accommodate these newer models in its framework and give you the best estimate of the tuning.

Once you use the caret package and know the best tuning, you can get a prediction out of that caret fit of XGBoost. When you plot it, you can see that now there are some deviations between the original Y and the predicted Y. And when you estimate the root mean squared error, it is not as small as before: previously it was below one, which was not very realistic, but now that I have built my model using cross-validation, where the model was built on some part of the data and the performance measures were computed on another part, I get a more honest version of the model and more honest performance measures. All right, so that was step one of the G-computation: building the model. I have just replaced the linear regression model with the XGBoost model, which is a more flexible, non-parametric method for getting the predictions.
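A rough sketch of the caret-based tuning just described: three-fold cross-validation with max_depth varied from 2 to 10. Apart from max_depth, the grid values are illustrative defaults, not necessarily the ones used in the workshop:

library(caret)

ctrl <- trainControl(method = "cv", number = 3)     # three-fold cross-validation

grid <- expand.grid(max_depth        = seq(2, 10, by = 2),
                    nrounds          = 100,
                    eta              = 0.3,
                    gamma            = 0,
                    colsample_bytree = 1,
                    min_child_weight = 1,
                    subsample        = 1)

fit.tuned <- train(Y ~ ., data = analytic.data,
                   method    = "xgbTree",
                   trControl = ctrl,
                   tuneGrid  = grid)

fit.tuned$bestTune                                  # max_depth = 2 gave the lowest CV RMSE here

pred.cv <- predict(fit.tuned, newdata = analytic.data)
sqrt(mean((analytic.data$Y - pred.cv)^2))           # a more honest RMSE than the overfitted fit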
Once I have the model, I go to the second step of G-computation, which is to replace all of the observed A values with one and get the predictions. I also go to the third step, replace all of the A values with zero, and obtain the predictions. The last step is taking the difference between these two and getting the mean. When we do that, we can see that the estimate is somewhat higher: the treatment effect estimate, now that we are using the XGBoost method, is about 4. So that was one method we used to do G-computation with machine learning.

Machine learning has many different methods; XGBoost is not the only one. There are other popular methods, such as regularized models or lasso models, which are also very popular for dealing with non-linearity, variable selection, and so on. This is something we also wanted to test, and fortunately the glmnet package, which fits the lasso, has a cv.glmnet function that automatically does the cross-validation for us. Once we fit the model with three-fold cross-validation, we have the first step of G-computation: instead of XGBoost, we simply use a lasso method to get our new model. Once we have our new model using the lasso, we replace the A values with one to get the second-step G-computation predictions, and we replace the A values with zero to get the third-step predictions; once we have the predictions, we can easily estimate the treatment effect in step four of the G-computation method. But now you see the treatment effect estimate is 2.7. Remember what the treatment effect estimate was when we were fitting XGBoost: it was about 4, and now it is 2.7. That means that when you change your machine learning method, it explores different patterns in the data, and the average treatment effect you get out of a particular machine learning method can be slightly different.

So which result should we now believe, 4 or 2.7? And this is the same problem for any other machine learning method: if you use a bagging method, a random forest, any other CART-type method, or anything else, it might give you slightly different estimates of the treatment effect, but the treatment effect estimate is the only thing we are trying to get out of this G-computation. So how can we do better than just choosing one method at random from our machine learning toolbox? One technique we can use is that, instead of using only one machine learning method, we can get an ensemble version of the machine learning predictions. What that means is that instead of relying on only one machine learning method, I will rely on multiple machine learning methods that I feel are suitable for modeling my data, and then I will try to combine all of these predictions into one prediction, so that I can get a better prediction for my G-computation treatment effect estimate.

All right, before I move on to the next part, I just wanted to pause and take any question from the audience. Is there any question about what I have covered in this chapter three? You can type it in the chat box. There is one question in the chat about cross-validation. Okay.
So the question is: despite using CV, are we still predicting on the same data used to build the model? Yes, that is exactly right. There are other versions of cross-validation, such as cross-fitting, that use an outer layer of folds in which the predictions are made, and using that we can certainly extend this to a part of the data the model has not seen; that would be a more honest version of cross-validation, but I will talk about that at the very last part of this particular chapter. Thank you for the question: it is true that even though we are doing cross-validation, eventually we are, generally speaking, using the same data to build the model and to get the predictions, but that is still better than using just the one data set without cross-validation. Yes, you are right that we are predicting counterfactual outcomes, that is, the outcome under each of the exposures for each participant; that is also correct. That is one of the reasons why we need to use the whole data set inside the cross-validation in this particular example. Any other questions? Okay, if not, let me move on to the ensemble version of the machine learning methods, which is known as the super learner method in the causal inference literature.

In this section I'm going to cover the super learner, which uses cross-validation to find a weighted combination of the estimates from different candidate learners, to get a better prediction than any single particular learner. For example, if you just use XGBoost you have one particular column of predictions; if you just use the GLM you have another column; and if you use the lasso method you have yet another column of predictions. What the super learner is going to do is take some sort of weighted average to get a new column of predictions, which will take the best part of all of the predictions we have so far.

Let me explain that in a step-by-step fashion. For running a super learner, we follow four steps. First, we identify the candidate learners: which candidate learners do you think are most suitable for finding the relationships in your data set? You pick all of those methods; obviously, the more methods you have, the more computing time you will need, so you have to be a bit judicious about selecting which methods are probably going to work better for you. In the second step, you choose the cross-validation K for getting the cross-validated loss function. You can choose K based on the amount of data you have: the general rule is that with a large amount of data you need a smaller K, and with a smaller data set you need a larger K to build the cross-validation and get honest predictions from it. The third step is selecting a loss function for the meta-learner; the meta-learner is something I'm going to explain a bit later. And in the fourth step we find the super learner prediction, which is just going to be that one column combining the contributions of all of the candidate learners' predictions. There are two versions of the super learner prediction, which I will also go into in more detail a bit later. In terms of identifying candidate learners, one very important aspect of the super learner is that you do not rely on just one type of method.
Do not pick only methods that are tree based, for example; other types of methods could bring in other strengths. For instance, parametric models could bring you some efficiency, tree-based methods can capture what happens at different interaction depths, and something like an SVM could capture additional kinds of transformations, and so on. So generally speaking, as long as your computing time allows, or computing resources are available, try to choose a variety of candidate learners; do not just stick to parametric or non-parametric methods alone. Also, just choosing a model is not enough: you should also do fine-tuning to find the best tuning parameters for that model. As you can see, this is not just a matter of choosing a bunch of models; it is also a matter of fine-tuning and finding the best combination, or parameter grid, for each model. Just for demonstration purposes, in our example I'm going to use the linear model, the lasso model using glmnet, and XGBoost.

In the second step, choosing the cross-validation K, just to reduce the computing time I'm using K equal to 3 (or, as the package calls it, V equal to 3) cross-validation. Then I select a loss function for the meta-learner, which I'm going to talk about a bit later; for that meta-learner I'm going to use a loss function such as method.NNLS (non-negative least squares). So what I do is fit the SuperLearner function, where I specify the number of folds for the cross-validation, the super learner library (the three candidate learners we have chosen), and the loss function. The super learner method does the cross-validation to find the estimates for the meta-learner, but before that it also reports the predictions from each of the candidate learners you have chosen. In our candidate learner list we only had three methods, GLM, glmnet, and XGBoost, and after the cross-validation and after running the super learner, it gives you the predictions from all three candidate learners. You could run them separately on your own, but the super learner automatically gives you these prediction columns, and it also gives you the cross-validated risk estimates, from which you can get a sense of which model is performing better. Lower is better, so here glmnet is performing better than any of the other methods used in this particular super learner.

Once we have the predictions and the cross-validated risks, we have two choices. The first choice is to get the super learner prediction from the discrete super learner. What the discrete super learner means is: forget about the methods that are not performing well in terms of the cross-validated risk; just pick the method that is doing best, the one with the minimum cross-validated risk, and choose that column as your discrete super learner prediction.
So that is how this works: you choose the learner with the lowest cross-validated error and take its prediction as your best estimate. The other method is known as the ensemble super learner. What it does is take all of these predictions, so this is Y-hat one, this is Y-hat two, and this is Y-hat three, and then build a regression where the original Y, the observed outcome, is the outcome and these three columns are the input columns. You then fit a non-negative least squares to get the coefficients, and you can scale the coefficients so that they sum to one, giving you the coefficients for all three of the Y-hats. You can see, when you scale them, that about 93% of the contribution to building the prediction comes from the lasso (glmnet), only about 6% comes from XGBoost, and GLM is actually not contributing anything. What you then do is multiply these coefficients by your predicted values to get a new column. After multiplying, you get this new column, and once you take the row sums you get the optimal combination, the weighted average or weighted sum, of all of these different predictions. And if you also pull the super learner predictions directly from the output of the SuperLearner fit, you will get exactly the same estimates you obtained by doing the hand calculation.

So basically there are two ways for the super learner: one is that you choose the learner with the best cross-validated risk and take the prediction from only that method. In our previous example, the glmnet method's cross-validated risk was the lowest, so in the discrete super learner we would choose only that column. Alternatively, you can do an ensemble super learner, where you take a weighted combination of all of these predictions to get one column of predictions, and that is called the ensemble super learner. Anyway, there are these two ways, but we are going to focus on the ensemble super learner, where we take this weighted average of all of the predictions from all of the candidate learners.

Once we have this prediction column, then it is very easy: we go to step two of the G-computation to get the prediction where A is one, and step three where A is zero, and in step four we simply take the mean of the treatment effect estimate, which is the difference between the two predictions from steps two and three. We get a treatment effect estimate of 1.91, which is quite different from what we got before: remember, for XGBoost the treatment effect estimate was about 4, and for glmnet it was about 2.7, but for the super learner, when we use the ensemble method, we get a treatment effect estimate of 1.9. There are some other details about how to choose K, what type of cross-validation to choose, and what to do with dependent samples, which I have listed on this particular page. But generally speaking, we can see the general pattern here: we first try to identify a model that works well for us; in this particular case, we have chosen a super learner that combines the predictions from the candidate learners that we think are useful in modeling our data.
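As a rough sketch of the super learner just described, followed by G-computation steps two to four using its ensemble predictions (the covariate set X here is simply everything except Y, and the data frame name is illustrative):

library(SuperLearner)

Y <- analytic.data$Y
X <- analytic.data[ , setdiff(names(analytic.data), "Y")]   # exposure A plus the covariates

sl.fit <- SuperLearner(Y = Y, X = X,
                       family     = gaussian(),
                       SL.library = c("SL.glm", "SL.glmnet", "SL.xgboost"),
                       method     = "method.NNLS",   # non-negative least squares meta-learner
                       cvControl  = list(V = 3))

sl.fit$cvRisk    # cross-validated risk of each candidate learner (the discrete SL picks the smallest)
sl.fit$coef      # ensemble weights given to each learner

# G-computation steps 2-4 with the ensemble super learner predictions
X1 <- X; X1$A <- 1
X0 <- X; X0$A <- 0
pred.Y1 <- predict(sl.fit, newdata = X1)$pred
pred.Y0 <- predict(sl.fit, newdata = X0)$pred
mean(pred.Y1 - pred.Y0)                              # ~1.9 in the workshop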
To get the treatment effect from the model, we simply get the prediction where all exposures are set to one, then get the prediction where all exposures are set to zero. We take the difference between these two sets of predictions, and then we take the mean to get the treatment effect estimate. So that brings me to the end of G-computation using machine learning. Before moving on to the next chapter, IPTW, is there any particular question about this chapter? You can type it in the chat box. Alright. So obviously you can see that when you are using a model that combines predictions from different learners, the treatment effect estimate we got from xgboost, or from glmnet, is slightly different from the treatment effect estimate we got from the super learner. So the question is whether there is an intuition behind what is happening, or why we are not getting something in between. That kind of depends, because this is really a multi-step process. In one step, if we are just talking about the super learner predictions, when we fit the super learner with the ensemble method we get something like 14.5 for the first patient, and the individual learners gave something very similar, around 14.53 and 14.59. So in terms of the predictions, the super learner prediction was indeed somewhere in between, because it is taking a weighted average. But when we move to the next step of estimating the treatment effect, the relationship is not linear anymore. Okay, so the next question is: could you please compare and contrast the treatment effect obtained from G-computation versus propensity score matching? Alright, so in G-computation what is happening is that we are relying on the outcome model, and we try to build that outcome model to the best of our ability. But in propensity score matching, we are relying on modeling the exposure instead. And this question is actually very relevant for the next chapter, so let me dive into the next chapter on IPTW and then I can explain this answer in a bit more detail. Any other questions from this chapter? Okay, if not, let me move on to the IPTW chapter, which is very similar to propensity score matching: it also uses the propensity score, so there is some similarity between IPTW and propensity score matching, because we first build a propensity score model and then deal with the treatment effect estimate in different ways; propensity score matching is one way, IPTW is another. Alright, so in this propensity score world, instead of focusing on how to make our outcome model better, we focus on how to make our exposure model better. The focus is slightly different: instead of focusing on how to best model the outcome, we first focus on how to best model the exposure. And this exposure model is the propensity score model that we will see now.
So once we have that exposure model, we then focus on balance, and only then do we think about outcome modeling; but the focus is on the exposure model. If we do inverse probability of treatment weighting, there are four steps to follow. In IPTW the first step is always to fit a propensity score model, or the exposure model, where we are modeling A, not Y. So we are modeling the probability of getting RHC, not the length of stay. Once we get the propensity scores from this model, we use a simple formula to convert the propensity scores to IPW, or IPTW; I use IPTW and IPW synonymously in this example. Once we have that, we use the IPW as weights in the sample, and then we check the balance in the weighted sample. Once we are happy with the balance, we do the outcome modeling. We will show later what happens if we are not satisfied with the balance, but for now let us just go through the first four steps. In the first step, I build the exposure model; the outcome of this model is A, the exposure, and I am still using the same confounders or covariates that I have in the data set. So in the propensity score formula that I am going to use, the exposure is on the left-hand side. In this propensity score model we can obviously try to make the model better by adding interactions, polynomials, or some kind of transformation if that is helpful. We then run a logistic regression model, because we are dealing with a binary treatment here, RHC versus no RHC. And then we can look at the fitted propensity score model. But interestingly, I do not really care about the odds ratios or the coefficients from this propensity score model; those are not important. What is important is the prediction that I am getting, and also checking the balance. The coefficients are not that interesting. The only thing that may be worth looking at is that sometimes the confidence intervals are very wide, and then your propensity score estimates can get very unstable; that is probably the one reason to check the propensity score model fit, but otherwise the coefficients are really not that interesting. Generally speaking, and you will see this later as well, we denote these propensity scores as g. Previously, when we were dealing with G-computation, we said that we denote the G-computation predictions as Q; the propensity scores are known as g. Can you excuse me for one minute, I just need to drink a bit of water. Alright, I am back with my water glass. So, as I was saying, when we were talking about G-computation, the predictions were named Q, but now that we are talking about the propensity score, the predictions are usually known as g. When we have fitted the propensity score model, we can simply get the predictions out of it, and we can check whether they are bounded between zero and one, because these are probabilities: the probability of getting the treatment, given the covariates.
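A minimal sketch of that first IPTW step, again with hypothetical object names (analytic.data, covariate.names):

```r
# Step 1 of IPTW: fit the exposure (propensity score) model; A = RHC is the outcome here
ps.formula <- as.formula(paste("RHC ~", paste(covariate.names, collapse = " + ")))

ps.fit <- glm(ps.formula, data = analytic.data, family = binomial(link = "logit"))

# The coefficients are not the point; the predictions (propensity scores) are
analytic.data$PS <- predict(ps.fit, type = "response")
summary(analytic.data$PS)   # probabilities, so bounded between 0 and 1
```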
We can also cross-classify the propensity scores by treatment value: for no RHC the propensity scores range from about 0 to 0.95, and for RHC they go up to approximately 0.96. But a better procedure is to check the density plots and see whether they overlap, and I think that is very important for any propensity score method. Then, once we have the propensity scores, we can simply use them to calculate the inverse probability weights using a simple formula. And once we have converted to inverse probability weights, we can check the balance. There are a couple of ways to check balance, but the most popular is the SMD, the standardized mean difference. We check whether the standardized mean differences are less than or equal to 0.1; if a standardized mean difference is greater than 0.1, we generally say that we do not have balance for that covariate. We can check the standardized mean differences, or balance, using the cobalt package, and you can see there is a big table listing balanced, not balanced, and so on. What is important is that at the end of the table it says there was no covariate or covariate category that was imbalanced, so we are reasonably happy with our analysis. You can also see the same thing in a plot, known as the love plot. In this love plot, the blue points are the adjusted or weighted SMD estimates and the red points are the unadjusted ones, before weighting. You can see the red points go beyond the 0.1 SMD line, but the blue points remain within 0.1. So we are happy with the weighting: once we have done the weighting, all of the covariates are balanced. This is a scenario similar to what happens in a randomized clinical trial. And since we are happy with the covariate balance in the weighted data, what we can do in the weighted data is simply run a crude model; we do not care about further adjustment anymore because we are happy with the balance. And we can see the treatment effect estimate we are getting is about 3.0 in our data set. So that is a quick rundown of how we would run an inverse probability weighted analysis to get the treatment effect estimate. Similar to G-computation, you can imagine that when we are building this propensity score model we are basically using a logistic regression, and we have about 49 covariates. But instead of using logistic regression, it is certainly possible to use a machine learning method, or better yet a super learner, to get the propensity scores and then the treatment effect estimates. That is the main theme of the next chapter. But before I go to the next chapter, is there any question I can answer in between? You can type it in the chat. Okay, so for the question about the density plot: you try to get a sense of whether the densities overlap, because checking this numerically is not very easy. In this density plot, as you can see, there is good overlap in the middle; at the very ends there might not be as good an overlap. But generally speaking, if the propensity scores are not very close to zero or very close to one, it is not really a big problem. The one thing you should definitely look at is the summary of the inverse probability weights. For example, here the maximum weight is around 63; that means there was one person who was given the weight of about 63 persons in the same data set.
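Before continuing with that question about the size of the weights, here is a minimal sketch of steps two to four described a moment ago, continuing from the propensity score sketch above; the cobalt calls use the package's usual formula interface, and in practice you would also use robust (sandwich or survey) standard errors for the weighted outcome model.

```r
library(cobalt)

# Step 2: convert propensity scores to inverse probability of treatment weights
analytic.data$ipw <- with(analytic.data,
                          ifelse(RHC == 1, 1 / PS, 1 / (1 - PS)))
summary(analytic.data$ipw)

# Step 3: check balance in the weighted sample (SMD <= 0.1 as the rule of thumb)
bal <- bal.tab(ps.formula, data = analytic.data,
               weights = analytic.data$ipw, method = "weighting",
               thresholds = c(m = 0.1))
bal
love.plot(bal)   # the love plot of standardized mean differences

# Step 4: if balance is acceptable, a crude outcome model in the weighted sample
out.fit <- glm(Y ~ RHC, data = analytic.data, weights = ipw)
coef(out.fit)["RHC"]   # around 3.0 in the workshop run
```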
So a maximum weight of around 63, given that we have a data set of close to 6,000 patients, is probably not that problematic. If our data set were 5,000, say, and the maximum weight were 5,000, that would mean one person was given the weight of the entire rest of the population in the data set, and that would not be good. But here one person carries something like one or two percent of the total weight, and that is not really a weight I would worry about too much. Alright, so the second question was: should we always use stabilized weights? Generally speaking, in terms of the theory, one of the advantages of using stabilized weights is that you get a more efficient standard error, and consequently the confidence interval will be more reasonable. If you are using unstabilized weights, it is certainly possible that you run into a problem of very large weights, and that will impact your treatment effect estimate and the confidence interval estimate. So yes, if possible, use the stabilized weights. Let me move on to the last chapter that I am going to cover; after that we will take a break and then Hannah will cover chapters six and seven, the TMLE and software parts. Let me give a quick rundown of IPW using machine learning. Basically, the difference between chapter four and chapter five is that we are replacing the logistic regression step with a machine learning method. It could be xgboost, it could be the lasso, but we are just going to stick with the super learner algorithm that takes the weighted average of all of the candidate learners. So the one thing we change in the IPW modeling is that we use the same data, but instead of relying on the logistic regression we work with glmnet as well as xgboost, build the super learner, use that super learner to get the predictions, which are the propensity scores, and then we get the treatment effect estimates. Let me run through the process one more time: step one is the exposure modeling, step two is the conversion to IPW, step three is checking balance, and step four is the outcome modeling. In the first step, instead of a logistic regression model we use a super learner; in the second step we convert the propensity scores that we got from the super learner to IPW; and in the third step we check the balance. Interestingly, if you are using a number of very powerful machine learning methods, sometimes getting this balance can be very hard. You can go to the output from the cobalt package and see: these are not covariates per se, these are categories of the covariates, and 59 of the covariate categories were balanced, but there were at least nine categories that were not balanced. You can see that from the plot as well: the dotted line is our SMD of 0.1, and there were nine blue points outside of that bound, so there is some residual confounding, you may say, in this version of the weighted analysis.
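Going back for a moment to the question about stabilized weights, here is a sketch of how they can be computed with the same hypothetical objects; the weighted outcome model would then be fitted with sw instead of ipw.

```r
# Stabilized weights: the numerator is the marginal probability of the observed treatment
p.treated <- mean(analytic.data$RHC)

analytic.data$sw <- with(analytic.data,
                         ifelse(RHC == 1, p.treated / PS,
                                (1 - p.treated) / (1 - PS)))

summary(analytic.data$sw)   # stabilized weights should average close to 1
```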
So there are a couple of steps you can take if you are not happy with the balance in the inverse probability weighted data. You can go back to the propensity score modeling step and try to include more interactions, change the model formula, and so on. But since you are already using the super learner, which is already using xgboost and all of these different combinations, that is probably not going to help much. The alternative is that when you do the outcome modeling, instead of fitting a crude model you fit an adjusted model, where you adjust for the baseline confounders again in the outcome model, even though you are using the weighted version of the data. When you do that you get a point estimate of 2.9, and you can do it by hand or you can use a package; there is the WeightIt package, and you get the same estimate from it, which is going to be 2.9. I think there is a clarification from Hannah that you can see in the chat. Is there any particular question about this IPTW chapter? Yes, that is exactly correct: when you are using IPTW you can think of it as creating a pseudo-population, similar to our counterfactual idea, in which there is no confounding anymore. If there were no confounding anymore, then we could use the crude estimate in the outcome model. But since in our last example, when we did the balance checking, we saw that some SMDs were higher than 0.1, there are a couple of things we could do: we could adjust for everything, which is what we did here, adjusting for all of the baseline confounders, or we could adjust only for the nine covariate categories that were imbalanced. That is supported by the citation given here, and it addresses some of the residual confounding concern you may have about this analysis. Alright, so that is the end of chapter five. Okay, there is a question: does this method also apply to other matching techniques, like coarsened exact matching? I think coarsened exact matching is a somewhat different method that I do not want to cover in this particular workshop. There are many different types of matching algorithms, and you should read the theory a bit closely to understand what is possible with each of them, so unfortunately I am not going to address this question here. Any other question about this chapter? If not, let us take a nine or ten minute break, come back, and then Hannah will start her discussion of TMLE in chapter six. So see you in nine minutes. Can you confirm that you can see my screen, please? I can, thank you. Alright, so now that we have covered our outcome models, like those we used in G-computation, and our exposure-based models, like the propensity score methods we talked about, we are going to talk about something called doubly robust estimators. Doubly robust estimators use information from both the exposure and the outcome models.
This allows them to provide us with a consistent estimator if either the exposure or the outcome model is correctly specified; a consistent estimator here means that, as the sample size increases, the distribution of the estimates gets concentrated near the true value of the parameter that we are estimating. These doubly robust methods can also provide an efficient estimator if both the exposure and the outcome model are correctly specified; efficient here means that our estimator makes the best possible use of the data to approximate the true value of the parameter we are interested in, in terms of the loss function that we have chosen. So essentially the point of these doubly robust estimators is that we have two chances to get our model specification correct. In the methods we discussed previously, we only ever had one chance to specify our model correctly; here we have a backup system, where if either of the models we have specified is correct, we still get a consistent estimator. TMLE, targeted maximum likelihood estimation, is one of these doubly robust methods. Essentially, it uses the propensity score, or exposure model, to improve on an initial estimate that we got from our outcome model, essentially our G-computation step. So why do we use TMLE? Not only is it doubly robust, but it also allows for the use of data-adaptive algorithms like machine learning without sacrificing interpretability. What I mean by that is that machine learning is only used in the intermediary steps to develop our estimator, rather than performing the estimation directly with machine learning, and that means we can construct 95% confidence intervals and such that are valid for statistical interpretation. And as we covered before, the use of machine learning can help us mitigate model mis-specification. TMLE has been shown to outperform some of the other methods we have talked about, particularly in sparse data settings. Luque-Fernandez and colleagues made a really nice tutorial for TMLE and the steps of TMLE in R, but they did it for a binary outcome. We have a continuous outcome with our RHC data set, so there are a couple of steps we need to add at the beginning and at the end to make sure the method can handle our data. The first step for us is the transformation of our continuous outcome variable; this is the step you can skip if you have a binary outcome. Step two is making the predictions from our initial outcome model, so essentially G-computation. Step three is making predictions from our propensity score model to get our propensity scores. Step four is estimating something called a clever covariate, H, and step five is estimating a fluctuation parameter, epsilon; I will go more into what these mean in a bit. Step six is then actually updating our initial outcome model predictions, the targeting step, using this clever covariate and the fluctuation parameter. Once we have our updated predictions, step seven is finding our final treatment effect estimate. Step eight is transforming back to our original outcome scale, which again can be skipped for a binary outcome. And step nine is the usual confidence interval estimation, which we will also go into in a bit. So we are going to go through all of these steps one by one using the RHC data set that we presented in the previous chapters.
But just as a reminder, the exposure that we are considering is RHC, right heart catheterization, and the outcome of interest is length of stay in the hospital. So, step one: the transformation of our outcome Y. In our example the outcome is continuous, and in TMLE it is recommended that we rescale the outcome so that it lies within the range of zero to one. The reason behind this is that we are going to make predictions for each of our observations based on the covariates, and we want these predictions to lie within the range of our sample data; we do not want to make predictions that fall outside of that range, because we want this to remain something called a substitution estimator. To do that, we can use a logistic loss function when we are doing our estimation: the logistic loss essentially makes sure that the predictions stay within the range of zero to one. But for that to be possible, the data we pass in also has to be within the range of zero to one. So that is the reasoning behind this step. As you can see, in our untransformed data the Y values lie between 2 and 394, so they are clearly not within the zero-to-one range. To transform this data, we do a standard min-max scaling: for each observation we subtract the minimum of our sample and divide by the whole range. And then, if we check, our range is indeed between zero and one. What we do next is our initial G-computation estimate. To do this, we first have to construct our outcome model and make our initial predictions. To construct our outcome model we are going to use the super learner, which we went over a couple of chapters ago. We use the super learner again because it does not require us to make assumptions up front about the structure of our outcome model or the structure of our data, and it helps us avoid model mis-specification. To fit the super learner, we pass in our transformed outcome as the Y value, the exposure and the covariates as X, and we define our super learner library; here we have used the same library as in previous chapters. The important argument here is the method argument: we specify that we want to use a logistic loss function so that the predictions stay within the zero-to-one range. Once we have our super learner, we can make our predictions, and here is a summary of the predictions we get initially; keep in mind this is on the transformed scale, not our original scale of days. And a note, I think we went over this before: this Q0(A, L) notation is often used to represent these predictions, where A is your exposure and L are your covariates. Now comes the part that is similar to G-computation: we need to get predictions for all of our observations under treated and untreated. First we set all of the exposures in our data set to one, so treated, and make predictions on that; then we do the same thing for untreated, setting all exposures to zero and making predictions on that data set. And then we can already get an initial treatment effect estimate, just by taking the difference between these predictions for exposed and unexposed individuals. Our mean here is around 0.007, but again, keep in mind this is still on the transformed scale.
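Continuing the hypothetical objects from earlier, steps one and two might look roughly like this. One simplification: to keep the logistic-type (quasi-binomial) fitting of a bounded continuous outcome unproblematic in this sketch, a small GLM-based library is used, whereas the workshop itself reuses the same learners as before; glm will warn about non-integer outcomes, which is expected here.

```r
# Step 1: rescale the continuous outcome to [0, 1] (min-max scaling)
min.Y <- min(analytic.data$Y)
max.Y <- max(analytic.data$Y)
analytic.data$Y.scaled <- (analytic.data$Y - min.Y) / (max.Y - min.Y)

# Step 2: initial outcome model (the G-computation part) via a super learner,
# with a logistic-type loss so that predictions stay within [0, 1]
Q.library <- c("SL.glm", "SL.mean")          # deliberately small library for this sketch
Q.fit <- SuperLearner(
  Y = analytic.data$Y.scaled,
  X = analytic.data[, c("RHC", covariate.names)],
  SL.library = Q.library,
  family = binomial(),                       # logistic loss for the bounded outcome
  cvControl = list(V = 3)
)

QA <- Q.fit$SL.predict                       # predictions at the observed exposure
data.A1 <- analytic.data; data.A1$RHC <- 1
data.A0 <- analytic.data; data.A0$RHC <- 0
Q1 <- predict(Q.fit, newdata = data.A1[, c("RHC", covariate.names)])$pred
Q0 <- predict(Q.fit, newdata = data.A0[, c("RHC", covariate.names)])$pred

mean(Q1 - Q0)   # initial, untargeted estimate on the transformed scale (~0.007)
```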
Step three, then, is our propensity score model. So, at this point we have our initial estimate already, and we want to perform our targeting steps; for this we need to calculate our propensity scores. To do that, we are again going to use a super learner, but this time we pass in the exposure as the dependent variable and all of the covariates as the independent variables. We use the same super learner library, and again we keep the logistic loss function as the method. The predictions we get from this super learner are our propensity scores, and these are represented with the g notation: g(A_i = 1 | L_i), the probability that the exposure of the i-th observation is one, given that observation's covariates. We can also estimate the probability of the exposure being zero, so being unexposed, given the covariates, as just one minus this propensity score. Here we look at the ranges of these propensity scores for the unexposed group and for the exposed group, and as you can see they overlap almost entirely. But you also get an idea that those who received RHC were, according to the model, more likely to receive RHC, and those who did not receive RHC were more likely not to receive it. So that is a fairly good amount of overlap. Step four is then estimating this thing called a clever covariate. I am not going to go into too much detail about the theory behind the clever covariate, but essentially semiparametric theory has shown that using this covariate in the next steps lets us do the targeting step, where we move from our initial estimate closer to the true value of the parameter we are trying to estimate. The clever covariate is defined using the propensity scores that we just calculated: for exposed individuals you get one over the probability that they were exposed given their covariates, and for unexposed individuals you get negative one over the probability that they were unexposed given their covariates. Something to note about these clever covariates: you can use them either in the combined form, the entire formula, so that you have one clever covariate, or in the split-up form, where you use each part of the equation as a separate clever covariate, so you end up with a two-component clever covariate, one component for exposed individuals and one for unexposed individuals. The two-component version is usually recommended, because it allows a bit more fine-tuning: you are targeting the exposed and unexposed individuals separately. The combined, one-component covariate can be useful if you want to use the clever covariate as a weight rather than as a covariate in the coming steps; I will mention that again in a bit. From now on we are going to show both the two-component and the one-component clever covariate versions for all of the coming steps. Here is a quick comparison of the two: up here we see the combined, one-component clever covariate, and down here we have the two-component clever covariate, with the unexposed component and the exposed component, and you can see that the ranges of these vary a little bit. The next step is estimating epsilon. Epsilon is something that we call a fluctuation parameter.
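Continuing the same sketch, steps three and four: the propensity score super learner and the clever covariate(s). Here the exposure is binary, so the earlier three-learner library can be reused; all names remain hypothetical.

```r
# Step 3: propensity score model with a super learner (exposure A as the outcome)
g.fit <- SuperLearner(
  Y = analytic.data$RHC,
  X = analytic.data[, covariate.names],
  SL.library = SL.library,
  family = binomial(),
  cvControl = list(V = 3)
)
g1 <- as.vector(g.fit$SL.predict)   # g(A = 1 | L), the propensity score
g0 <- 1 - g1                        # g(A = 0 | L)

# Step 4: the clever covariate(s)
A  <- analytic.data$RHC
H1 <- A / g1                        # component for the exposed
H0 <- -(1 - A) / g0                 # component for the unexposed
H  <- H1 + H0                       # combined, one-component version
```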
And this fluctuation parameter essentially represents how big an adjustment we want to make to the initial estimate. Depending on how we have defined our clever covariate, so whether we used the one-component or the two-component version, epsilon ends up being a scalar or a vector with two components. We can estimate it through maximum likelihood estimation, using a model with an offset that is based on our initial predictions from the G-computation step, and the clever covariates as independent variables; or, as mentioned before, you could use the clever covariate as a weight in this maximum likelihood estimation, but we are going to show how to use them as covariates. If we have the two-component epsilon and clever covariate, then our regression function to estimate epsilon looks like this: we have the clever covariate for exposed individuals and for unexposed individuals in the model, and we are using a binomial family, so a logistic regression. The coefficients of these clever covariates are our epsilons: this one is the epsilon for exposed individuals, and this one down here is the epsilon for unexposed individuals. For a one-component epsilon it is similar, but we only have the single clever covariate in our logistic regression function; instead of the two separate ones for exposed and unexposed individuals we just use the same one, and we end up with only one coefficient, so one epsilon. You can see that these estimates are actually pretty different between the two-component and the one-component versions, but at the end we will see that, in the grand scheme of things, it does not make much of a difference in the final estimate, at least in our data set. The way we do the update is this: for all of our observations, we take the initial predictions made for the exposed individuals and, in the two-component version, update them using the epsilon for the exposed and the clever covariate for the exposed in this function. Then we do the same for the unexposed individuals, with the unexposed version of epsilon and the clever covariate. Essentially, on the logit scale, it is our initial prediction plus epsilon times the clever covariate, and that gives us our updated predictions; but again, we are still on the transformed scale here. If we only have one epsilon it is very similar: all we do is use the same epsilon and the same form of clever covariate in both of these functions, so we update the exposed group and the unexposed group with that single epsilon. So now we have all of our updated predictions from the outcome model, and we can calculate the average treatment effect in the same way that we have always done, taking the difference between the predictions made for our exposed group and our unexposed group. This is the same regardless of which format of epsilon we have. For the two-component epsilon we get an estimate of around 0.007, and for the one-component epsilon, same formula, we also get around 0.007. So on the transformed scale you can see that it really does not make much of a difference in our data set which of these epsilon versions we use. And then our last step is rescaling back to our original outcome scale. This again is the same no matter which epsilon format you have: you just multiply by our original sample range.
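Steps five to eight in the two-component version, continuing the same hypothetical objects; the one-component version would use H in place of H1 and H0. Again glm will warn about non-integer outcomes, and if any prediction sat exactly at 0 or 1 it would need to be bounded away from those values before taking qlogis.

```r
# Step 5: estimate epsilon by logistic regression with the initial (logit-scale)
# predictions as an offset and the clever covariates as the only terms
fluc.data <- data.frame(Y.scaled = analytic.data$Y.scaled,
                        QA = as.vector(QA), H1 = H1, H0 = H0)
eps.fit <- glm(Y.scaled ~ -1 + offset(qlogis(QA)) + H1 + H0,
               data = fluc.data, family = binomial())
eps <- coef(eps.fit)                # eps["H1"]: exposed, eps["H0"]: unexposed

# Step 6: update the initial predictions on the logit scale
Q1.star <- plogis(qlogis(Q1) + eps["H1"] * (1 / g1))
Q0.star <- plogis(qlogis(Q0) + eps["H0"] * (-1 / g0))

# Step 7: updated treatment effect estimate, still on the transformed scale
ATE.scaled <- mean(Q1.star - Q0.star)

# Step 8: rescale back to the original outcome scale (days)
ATE.scaled * (max.Y - min.Y)        # around 2.7 to 2.8 in the workshop run
```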
And we get around 2.73 for the two-epsilon version and around 2.8 for the single-epsilon version. This is now in units of days, so this is essentially the increase in length of stay, in days, that is expected when a patient receives RHC versus when they do not. Lastly, we can also do confidence interval estimation. Since the machine learning algorithms were only used in the intermediary steps, rather than estimating our parameter of interest directly, these 95% confidence intervals can be calculated and give us valid inference. I am not going to go into the formula and its details, but essentially, based on semiparametric theory, a closed-form variance formula has already been derived for this procedure, so you do not need a time-consuming bootstrap procedure or anything like that. In your own time you can go through the code and see exactly what that formula is; I will just show you the results. For our two-component epsilon version this is the confidence interval, and below is the one for the single epsilon, and they are fairly similar, again for our data set. And that already wraps it up for TMLE. So I will take any questions if there is anything in the chat. If there is nothing, I think I will move on to the next chapter. Yeah, so in this chapter I just wanted to talk about some of the pre-packaged software options in R, so libraries that you can use to perform some of the things we have talked about. The first package I want to talk about is tmle. This package handles both binary and continuous outcomes, and it uses the SuperLearner package to construct both of the models, just like we did in the steps in the previous chapter. The default super learner libraries for the outcome and propensity score models are a little different, so I have listed those here, and of course it is possible to specify a different set of learners if there is something else you want to use; these can be specified with the Q.SL.library argument for the outcome model and the g.SL.library argument for the propensity score model. One important thing to note about this package is that the outcome Y is required to be within the range of zero to one, so the data you pass in already has to be transformed, and afterwards, when you get the estimate from the function, you have to transform it back to the original scale yourself; it does not do that for us. Now here is a demonstration of using this package. The first step, just as before, is transforming our outcome data to fall within the range of zero to one, the same as we did in the previous chapter. Then you specify your super learner library; for the sake of comparison I have specified the same library that we used in chapter six. Then you call this tmle function, making sure to pass in your transformed data, and your super learner libraries if you have changed something there. The fit looks a bit like this, and if you take a summary of the fit you can see the coefficients of all the candidate algorithms you specified when you called tmle. This one is for the estimation of the outcome model: these are the coefficients, so that is the combination of the algorithms it has chosen. And then for the treatment model we can see the same thing, the coefficients used in the treatment model. We can also get our ATE estimate directly from this fit; keep in mind that this is still on the transformed scale.
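A minimal sketch of the packaged version just described; the Q.SL.library, g.SL.library and g1W arguments are the tmle package's own, while the data objects are the same hypothetical ones as before. Remember that Y has to be passed in already scaled to [0, 1].

```r
library(tmle)

W <- analytic.data[, covariate.names]   # the covariates

tmle.fit <- tmle(
  Y = analytic.data$Y.scaled,           # outcome already rescaled to [0, 1]
  A = analytic.data$RHC,                # binary exposure
  W = W,
  Q.SL.library = SL.library,            # learners for the outcome model
  g.SL.library = SL.library             # learners for the propensity score model
)

summary(tmle.fit)                       # learner weights for the Q and g models
tmle.fit$estimates$ATE$psi              # ATE, still on the transformed scale
# (propensity scores computed elsewhere could instead be supplied via the g1W argument)
```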
So we have to perform the transformation back to our original outcome scale still, and we end up getting 2.87, which is pretty close to what we got in our step-by-step version in the last chapter. Another nice thing is that you can get the confidence interval directly from the fit; but again, note that you have to transform it back to the original scale, otherwise your confidence interval is not going to make much sense. So this is the final estimate we get from this package, with its confidence interval. Then just a couple of notes about this package. Again, it does not scale the outcome for you; but if you forget to scale your outcome, or the algorithm is dealing with outcome or variable types that it is not expecting, it does give really helpful error messages. And basically all of these steps are nicely packed up into this one tmle function and it is very easy to use; but I did want everyone to go through all of the steps by hand first, because otherwise it is difficult to understand what is going on behind the scenes. I have also linked some of the resources that were helpful for this package. Another side note with this tmle package: if you have previously calculated propensity scores for your data, then instead of having tmle rebuild a whole propensity score model with the super learner inside the function, you can pass in your propensity score predictions directly. Say you want to use propensity scores that were predicted in a different way than with the super learner; for example, here we are showing it with the propensity scores that we calculated using the WeightIt package. You can pass those propensity scores in with the g1W argument, and the tmle function will use them as its propensity scores. If we use that and transform back to our original scale, we end up with a result of 3.1, so obviously a bit higher than what we got with the super learner for the propensity scores; but if you have other methods you would like to use for the propensity scores, that is possible, and it reduces computation time if you already have them from somewhere else. The second package that I wanted to talk about is sl3. It is not a TMLE package, so it does not implement all of the TMLE steps; it is just for creating super learners. sl3 is a newer package, and it is designed to be more customizable than the SuperLearner package that we used in chapter six and that tmle uses. It implements discrete and ensemble super learning: discrete is the one where it just chooses the best-predicting algorithm from our specified list, and ensemble super learning is where it returns a linear combination, or some kind of combination, of the algorithms that we have specified. With sl3 it is a little more complicated than using the SuperLearner package; there are a couple more steps that we need to do, which is what makes it so customizable. The first thing that we have to do is make an sl3 task. This sl3 task essentially keeps track of the roles of the variables in our problem: you pass in the data, the covariate names and the outcome, pass that to this make_sl3_Task function, and you end up with a task that looks like this, with your list of covariates and your outcome specified. The next step is to actually create our super learner; a sketch of the full sl3 workflow is shown below, and I will walk through the pieces one by one.
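Here is a minimal end-to-end sketch of that sl3 workflow, again with hypothetical object names; the individual pieces (task, learners, stack, meta-learner) are explained next.

```r
library(sl3)

sl3_list_learners("continuous")   # learners available for a continuous outcome

# 1. Define the task: which columns are covariates and which is the outcome
task <- make_sl3_Task(
  data = analytic.data,
  covariates = c("RHC", covariate.names),
  outcome = "Y"
)

# 2. Initialize the candidate learners and collect them in a stack
lrn_glm    <- make_learner(Lrnr_glm)
lrn_glmnet <- make_learner(Lrnr_glmnet)
lrn_xgb    <- make_learner(Lrnr_xgboost)
stack      <- make_learner(Stack, lrn_glm, lrn_glmnet, lrn_xgb)

# 3. Choose a meta-learner: NNLS for an ensemble SL, or a CV selector for a discrete SL
meta_ensemble <- make_learner(Lrnr_nnls)
meta_discrete <- make_learner(Lrnr_cv_selector)

# 4. Build the super learner, train it on the task, and predict
sl     <- make_learner(Lrnr_sl, learners = stack, metalearner = meta_ensemble)
sl_fit <- sl$train(task)
preds  <- sl_fit$predict(task)
```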
So again, just as for the SuperLearner library, we have to specify a selection of machine learning algorithms that we want to include as candidates. But this time we also have to specify a meta-learner, which is itself a machine learning algorithm that the super learner will use to combine, or choose from, the candidate algorithms that we have provided. sl3 has a really nice function where you can see all of the different machine learning algorithms that are available, the sl3_list_learners function; you can pass in either continuous or binary, depending on your type of outcome, and you get a nice list of available algorithms for that outcome type. Whatever algorithms you choose, these then need to be initialized first using the make_learner function, and then they need to be collected in something called a stack. The stack is made with the same make_learner function, but you pass in the Stack argument up front. Then, to actually make our super learner, we pass in our list of learners, which is our stack, and we also pass in our meta-learner, which is initialized similarly to how the other algorithms were initialized. I have shown two different options here: the first one is the ensemble option, and there are different meta-learners that you can use for that; the second one is an example of a discrete super learner, where we use the CV selector to pick the single best algorithm. Now that we have our super learner initialized, we train it on the sl3 task that we created in the very first step, and then we can make our predictions for the outcomes. Just for the sake of comparison, we have again shown the mean difference between these predictions for exposed and unexposed to get our treatment effect estimate, and that is around 5.33 for this package. That is a bit higher than what we would expect based on what we have gotten from other algorithms and other packages, but we did have to tweak a few things in terms of the parameters of our candidate algorithms to be able to run this within a reasonable time frame, since these super learner fits can be very computationally intensive; so that could be what is contributing to this being quite high. A couple of notes about the sl3 package: it is pretty easy to implement and to understand the structure, at least for a basic implementation; it has a very large selection of candidate algorithms and a really nice way of looking at all of those different candidates; and it has a very different structure from the SuperLearner library, with a lot more steps, but it is very customizable, so I think it is more likely to be usable for a variety of different scenarios. I have also listed a couple of helpful resources for this package if you want to look at them in your spare time. The next thing is this table representing all of the different estimates that we have made in this workshop with all of the different methods we have looked at. We start with the adjusted regression, go through propensity score matching, G-computation with and without machine learning, IPW with and without machine learning, and then TMLE and the sl3 package. And at the very bottom we also have the comparison with the Keele and Small paper. You will notice that a lot of these point estimates are quite close; a lot of them are around 2.93 or so.
And especially these TMLE point estimates are very close to our adjusted regression estimates, so it could well be that our adjusted regression actually gives quite a good estimate here. Another thing to note is that our TMLE confidence intervals are much narrower than what we got for the adjusted regression, so there is still that benefit to using TMLE. Another thing to note is that for the pure machine learning methods, the machine-learning variations and the sl3 package results, we do not have confidence intervals, because there is no real theoretical basis for valid confidence intervals for those. That is another huge benefit of the TMLE procedure: we can get valid confidence intervals and still use machine learning. Another thing to note is that in this Keele and Small paper, their point estimate is quite a bit lower than most of what we have gotten from our methods; but they also used a different ensemble of learners in their TMLE than we used in ours, so that could definitely contribute to the difference. The last thing I will mention is some other packages that might be useful in your research. ltmle is a package for TMLE with longitudinal data, so that could be useful if you deal with those longitudinal scenarios. tmle3 is still under development, but it goes hand in hand with the sl3 package, in that it is built to be more customizable and uses sl3 in its super learner implementation. And there is also a package for another doubly robust method, essentially a doubly robust, augmented version of inverse probability weighting, so that could be useful as well. And there are lots of other packages related to this topic: if you follow this link, or look on CRAN or on GitHub, you can see lots of those as well. That's it, so if there are any questions on this software chapter, I will take those. And I just wanted to get a general sense: when would you use, say, the SuperLearner package versus the sl3 package; what is the main difference that jumps out for you? The main difference for me is that you have all of these different steps of creating the sl3 task, initializing the learners and making the stack, and then I guess the meta-learner is a big difference: you do not have the same explicit choice of meta-learner algorithms in the SuperLearner package. So the fact that you have a meta-learner and can use different algorithms to optimize between the candidate algorithms that you specified, that is a big difference. Yeah, I think those are the main differences, just the number of steps with the task and the initialization and such, and then also the meta-learner. So it is kind of more complicated, but with the benefit of being more customizable. Exactly. Okay, so if there is any question, please let us know in the chat box; otherwise I will move on to the last chapter of this tutorial. Can anyone confirm that you can see chapter eight? Okay. So far we have talked about finding the best model to predict your outcome or predict your exposure; in TMLE, we try to combine both of those ideas into one framework. But there is a difference between what we mean by model mis-specification and unmeasured confounding. Say, for example, you have a number of covariates: you do not necessarily know the best possible transformations, or combinations of interaction terms or polynomial terms.
Then these tree-based methods, or machine learning methods in general, are usually very helpful. But if you have unmeasured confounding, a confounder that was not measured at the design stage, then no matter what type of machine learning method you are planning to use, it is not really going to help you much with that. If you cannot adjust for it, you do not really have a way to reduce the bias due to that particular component. In this particular plot you can see that there are many different ways a third variable could affect the relationship between the exposure and the outcome. In this tutorial we have only talked about confounders, but the covariates we collect during data collection could play other roles. For example, L is a confounder, because it affects A and it also affects Y. But C is something that is affected by both A and Y, a collider, and we certainly do not want to adjust for something like C. A variable that is a risk factor for the outcome only usually does not change the bias very much, but adjusting for it reduces the variability, so if you have a risk factor for the outcome it is usually helpful to include it in the model. If you have something like E, which is an effect of the outcome, you should not adjust for it. If you have a mediator variable, it should not be treated as a confounder; mediation analysis has a separate framework of its own. If you have an instrumental variable, which affects Y only through A, then adjusting for it can actually amplify your bias. And if you have a noise variable, which has nothing to do with either the exposure A or the outcome Y, adjusting for it will only increase the variance. One other possibility is that you have an unmeasured confounder U that you have not measured, but there is a proxy variable P that you can find in your data set or link in from another data set; then it is certainly possible to adjust for the proxy P instead of U, but that will still be subject to some measurement error bias. So the main point I want to convey is that so far we have just talked about this L variable, the confounders that we have adjusted for and tried to deal with in our various fancy analyses; but if you have other types of variables in your data set, it is probably best to talk with the subject matter experts to determine whether each variable is a confounder or a risk factor for the outcome, and then you can adjust for it. Otherwise, if a variable plays a different role, you should think more judiciously about whether to adjust for it in your analysis. Now, the general difference between the TMLE framework and the other frameworks is this: when you are running a logistic regression or a linear regression, or any other machine learning method such as xgboost or lasso or whatever you are running, you basically have an outcome variable and everything else is considered an input variable. There is no distinction between the age variable and the RHC variable. But in causal inference, what we are primarily interested in is the relationship between RHC and Y, whereas in a standard parametric regression, as well as in machine learning methods, RHC is no different from any of the other covariates you are entering into the regression or the machine learning method.
Whereas when you are dealing with a TMLE-type method, it uses the outcome modeling, the Q predictions, and it also deals with the exposure modeling through the g predictions, and there is the extra targeting step, through which the RHC effect is handled on a different footing compared to all of the other covariates that you have. That is why, for someone who is interested in causal inference, and primarily in the relationship between an exposure and an outcome variable, it is very useful to use this type of TMLE framework to get a better understanding of that relationship, while all the other variables are just background variables that we adjust for; primarily we are interested in the relationship between the exposure variable and the outcome variable. And the reason someone would consider SL, the super learner, is that, in general, as you have seen in our examples, different prediction methods tend to give you different predictions, and that generally impacts your treatment effect estimate. Instead of relying on one method, the super learner relies on many different methods, and that is why it is highly encouraged to choose as diverse a set of candidate learners as possible, so that you get a diverse set of predictions from which the ensemble super learner can build a linear combination, and so that you get a more stable version of the treatment effect estimate. The last point I want to make quickly is that just because the TMLE method originated in the causal inference literature does not mean that simply using TMLE gives your treatment effect estimate a causal interpretation. To give the treatment effect estimate a causal interpretation, there are a couple of assumptions that you need to satisfy. And when I talk about assumptions, it does not mean that I just wish those assumptions were true; I talk to subject area experts to figure out whether these assumptions are plausible for the type of data that I am dealing with. These are assumptions such as conditional exchangeability, positivity and consistency. Something like conditional exchangeability or consistency is not something you can test from your data; positivity, to some extent, you can. For something like consistency, whether your treatment is well defined or not, you have to talk with the subject area experts to get a good sense of whether there is a consensus or agreement among the experts about that. In terms of this particular workshop, as you have seen, the focus was purely on implementation, and we have just shown a data analysis going through the different steps of TMLE as well as some of the related methods such as IPW and G-computation. But there is a lot of theory behind these methods that we did not cover within this workshop. If you want to get a better understanding of these methods after the workshop, there are some key articles I recommend, and some of these articles we used to build this tutorial; I would highly recommend reading at least one of these first two and one of these second two to understand super learning and TMLE better.
If you want to learn more, there are some additional references that I have listed that will be very useful for understanding why these TMLE-type methods are useful and how they are derived, and all of the relevant details. If you are interested in more workshops like this on TMLE, there are a couple of workshops at the Society for Epidemiologic Research and at the University of Washington that you could try to join. If you want to look at free resources on YouTube, there are actually a number of very good ones, mostly coming from Mark van der Laan's group and colleagues: some introductory materials, some more theoretical materials, some applied talks, and there are also some blogs that are very helpful in explaining the ideas behind TMLE and super learning. And at the end of this document, you also have the list of references we have used; you can look into them as well, and generally speaking these references are cited within the tutorial itself. So with that in mind, I think this is the end of our workshop, but I am happy to take any questions from the audience; you can type them in the chat box if there is any question. You're welcome. Thank you. Thank you for bearing with us for the last three hours. Alright, so if we do not have any other questions, then this is the end of the discussion. You have the materials, and those materials will stay live; if you have any questions after reading or revisiting the materials, you can always reach us. All of the contact details should be on the front page of the tutorial that we have shared. Thank you very much, and I will stop the recording and end the session now. Thank you.