So thank you for coming to this demo of the aorsf package. I'll get us started. First, just to get started by saying hello: my name is Byron. I'm an assistant professor of biostatistics and data science at Wake Forest University School of Medicine. And I have the slides available online for you today. I highly recommend getting the slides open in your own browser, because later, when we go into the live coding section, you'll be able to click on separate code blocks. I'll show you where to click. But when you click, it will automatically put that code onto your clipboard so that you can just transfer it right from the slide over to a local RStudio session. So for today, before we get to the cool part, I'd really like to go through a little background. There's a lot of jargon with random forests, and it wouldn't be right to just jump into the software and start using the jargon without at least trying to give a little bit of support and context for these terms. That being said, if this is your first time learning about random forests, I would like to apologize, because this will go too fast and may seem like too much information at once. But the background section will still be better than no background at all. And so once we've covered a little bit of background, I'll transition over to the demo of the aorsf package, which focuses on oblique random survival forests and risk prediction. So we'll be covering how to fit models with that package, how to interpret those models, how to run a benchmark if you'd like to, and how to extend those models for your own tailored purposes and the ways that you'd like to grow an oblique random survival forest. So starting with the background, we're just gonna touch on machine learning. It's a very broad topic that kind of comes in two flavors: there's supervised learning and there's unsupervised learning, and we'll focus on supervised learning today.
In particular, I'll try and touch on what labeled data is, the prediction problem that we wanna focus on, how we engage with that prediction problem — which is learners — and then the specific type of prediction that oblique random survival forests do, which is risk prediction, which basically means we can make predictions for censored outcomes. So we'll try and give the background on all that jargon. All right, so starting out with supervised learning, we have labeled data. This is a little diagram of a data set, and I just want you to kind of imagine that each row of this data set is giving you information from a particular person, and you can also imagine that each column of the data set is a different variable, right? So what makes a data set labeled is that some of the variables are designated as predictors. That's what this X variable is here. And at least one variable is designated as an outcome, and that's what this Y variable is over here. Usually predictors are easy to see, easy to measure; they don't require too much money to obtain, and the outcome might require time or might require more money to measure. And so that's kind of why we wanna predict the outcome. So the general problem framework is: you're given the X information and you wanna predict the Y information. And the general way to approach this problem is you use a learner. Learner is a jargon term in machine learning, and as far as I know, it's just synonymous with prediction model. So if you hear somebody say learner, what they mean is prediction model. And if I'm wrong about that, please correct me in chat, but I think over the years that's kind of how I've come to understand it. So what a learner essentially tries to do is estimate this function F that can go from the information in the X variables and map that over to the Y variable. So F is this function that literally maps from X to Y, and we try and estimate that with a learner.
And of course the learner also assumes that this function exists, but that's just technical background. So now we can talk about two specific types of learners relevant to the demo. The first is the decision tree, and then the second will be the random forest, which is an ensemble of trees, or a big set of trees. So first let's just discuss what a decision tree is. Decision trees are a type of learner that works through this mechanism called recursive partitioning. And we're gonna engage with that through some pictures instead of trying to give a too-technical explanation. So I'm gonna go through a couple of pictures where we're gonna use a decision tree to do classification of different penguin species. And we're gonna be using this nice dataset — it's publicly available, it's very clean and fun to work with — that has Chinstrap penguins, Gentoo penguins, and Adélie penguins. And so we're gonna try and predict species using a decision tree. So this is what the penguin dataset looks like if you visualize it using flipper length on the X axis and bill length on the Y axis. You can see that in this space there are sort of three different groups, and those groups are basically the species of the penguins. You've got Chinstrap, Gentoo, and Adélie here. And if I were to fit a decision tree to these data, it would start by partitioning the data. So what I mean by partition is that it creates two subsets — two subsets at a time. And whenever it does this, it creates two non-overlapping subsets that, if you put them together, give you back the original space. So here's its first partitioning step. It's created one set on the right and one set on the left. And you can see that it's done this by splitting the data right at a flipper length value of 207. And so over here you've got basically all Gentoo penguins, and then over here you've got all the Adélie penguins and the Chinstrap penguins.
And the reason the subsets work like this is that the decision tree tries to maximize the difference between the two subsets it's creating in terms of the expected outcome. So the expected species on the right-hand side is clearly Gentoo, and on the left-hand side it's clearly not Gentoo. And if I were to allow the decision tree to grow another partition, this is what it would come up with. Now it's partitioning based off bill length, but only on the left-hand side of the figure. And the bill length cut point is 43. And so now, in this subset up here on the top left, you can see it's almost nothing but Chinstrap penguins, and then down here it's almost nothing but Adélie penguins. So these three regions are called leaves of the decision tree. It isn't immediately clear why they're called leaves, but when you draw the same graph as a binary tree, you can see that the regions at the bottom are essentially just the three regions that are shown here. Now you can kind of see why they're called leaves — although this is a tree that's been flipped upside down. This formulation as a tree is the reason why people call it a tree. So if you wanna compute predictions with a decision tree, the idea is you start at the top of the tree with whatever data you're trying to predict, and then you follow the instructions until you get to an ending point, a leaf in the tree. So let's just say that we started out up here and our flipper length is less than 207, so we come down here, and our bill length is, I don't know, 50. So we end up in this leaf. So now our predicted value is gonna be given to us by the leaf in the tree. So the predicted species is Chinstrap, but the actual predicted probabilities are listed out here. So we have a 6% predicted probability of Adélie, 92% of Chinstrap, and 2% of Gentoo. And in case you're wondering why that is, this is the leaf that we ended up in.
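As a hedged sketch of the tree just described — the rpart and palmerpenguins packages are my assumption, since the talk doesn't name the code behind the figures — the classification tree could be fit like this:

```r
# Sketch only: fit a classification tree to the penguins data using
# rpart and palmerpenguins (package choices are assumptions, and the
# exact cut points rpart finds may differ slightly from the slides).
library(rpart)
library(palmerpenguins)

# Keep rows with complete values for the variables we use.
penguins_cc <- penguins[stats::complete.cases(
  penguins[, c("species", "flipper_length_mm", "bill_length_mm")]
), ]

fit_tree <- rpart(species ~ flipper_length_mm + bill_length_mm,
                  data = penguins_cc)

fit_tree                              # prints the recursive splits
predict(fit_tree, head(penguins_cc))  # per-species predicted probabilities
```

The predicted probabilities returned for a new penguin are exactly the leaf summaries described above: the species proportions among training penguins that landed in the same leaf.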
And you can see in this leaf that 92% of the training data that ended up in this leaf of the tree were Chinstrap penguins, 6% — these three dots here — were Adélie penguins, and 2% — this one dot here — was Gentoo. So the predicted value of the leaf is basically just a summary of the training data that were in that leaf when the tree was grown. So those are the basic mechanics of a decision tree. And now we're ready to talk about the random forest. So the random forest is a big set of decision trees. It works like a committee. When the random forest wants to make a prediction, it gets a prediction from all of its decision trees, and then it does something to kind of make a majority vote, or average them together, or take some summary of the predictions from the decision trees. So what makes the random forest a little interesting is that it grows the trees with a little bit of randomness. And it's a little counterintuitive, because when you grow trees with randomness, you actually make the individual trees less good at prediction, but you make each tree more independent from the other trees. So to get into a little bit of detail about what actually is randomized when you grow the decision trees in the forest: you get tree-specific bootstrap replicates of the training data, right? So you start with your training data, and you just take a bootstrap sample of it, which gives you about 63.2% of the original data in your bootstrap sample. And then you have a little bit of data that's not in the bootstrap sample, which is called out-of-bag data. And this concept will come back up when we talk about the stuff in the demo, so I'm just planting that seed there. And then you also have random subsets of predictors being used every time decision trees in the forest grow new branches, right? So randomness will make the trees more independent, but it also makes the individual trees weaker.
And so it's surprising to see that the random forest actually does give more accurate predictions than a single tree in most cases. And I have a little example here that can help you convince yourself that this is true. We'll just skip over it in the interest of time, but I can also show you with some visualizations how a random forest does a really nice job of computing predictions. So here you can see the decision boundaries from a standard, non-random individual tree, and it does a nice job of predicting our penguin data. Now I'll show you what a decision boundary looks like from a tree that has been grown with this randomization, and it actually does a little bit worse of a job of creating a decision boundary here, because you can see that it's not quite correctly classifying these purple points, whereas the original decision tree did. And the reason this randomized tree isn't correctly classifying those points is probably because those points weren't included in its bootstrap replicate of the training data. So it didn't even see those points when it was being grown. Now I'll show you the decision boundary from the random forest. So this is an ensemble decision boundary. It's just taking the predictions from all of those randomized trees that it's grown and averaging them together. And what's surprising is that it does such a good job. The decision boundary looks very in tune with the data; it doesn't seem to be overfitting, as we see here. I can see somebody has joined and is asking about a link to the content — thank you for joining, by the way. So now let's just touch on what an oblique tree is. We've kind of covered the basics of decision trees, and decision trees can be either axis-based or oblique. An axis-based tree is gonna be drawing these lines in our predictor space that are perpendicular to the axis of the predictor that we're splitting on. And so you can see an axis-based tree will kind of make splits like this.
We have these lines that are drawn perpendicular to the axis of the predictor. And an oblique tree is different from an axis-based tree because an oblique tree uses a weighted combination of predictor variables, instead of a single predictor variable, when it partitions data. So instead of using X1 being less than some cut point as a splitting mechanic, we actually use something like a constant times X1 plus a constant times X2 being less than a cut point. And this leads us to get decision boundaries that look like this. We have oblique decision boundaries, because the boundaries that you're seeing are neither perpendicular nor parallel to the axes. All things considered, I'm not the biggest fan of the term oblique — I don't know what the right name would be, though, so it's just what we call it. So when you hear the word oblique, you can just think, oh, these are more flexible decision boundaries, because we're using linear combinations of variables to make splits instead of just a single predictor. And so when you look at the decision boundary from an oblique tree, you can see how the boundaries are no longer perpendicular to the axes that we're splitting on. And you may also kind of think to yourself, this is maybe what a decision tree would look like if Pablo Picasso had designed it. And I was curious about this. I wanted to go over to an AI image generator and see if it would give me back basically an oblique decision tree's decision boundary if I typed in something like "a scatter plot in the style of Pablo Picasso." And in my opinion, it's pretty similar — pretty similar. But what's really neat is that when you ensemble a whole bunch of predictions from oblique trees, you get back a decision surface that looks very reasonable, very smooth. In fact, this is the big thing that distinguishes axis-based trees from oblique trees. Oh, I see a question in the chat about linear subspace trees. I'm not the biggest expert on those.
I'm not gonna dive too far into it, because I'm not an expert and also because I don't wanna eat up my whole hour with other topics. But I'm gonna pivot back over to the slides. So the decision boundary that you're seeing here is a lot smoother than the axis-based one. And that has a lot to do with the fact that the decision boundaries from the individual trees are not perpendicular to the axes. So another thing to consider is censoring. This is why we have to talk about risk prediction instead of just straight probability prediction. So censoring means that we have incomplete data about the outcome. And this happens when you start a study and you have an outcome like the development of, let's say, hypertension, and you just wanna follow people and see when they develop hypertension. So it's a time-to-event framework. So over the course of the study — let's just say the study lasts 10 years — you're gonna have some people who have the event, and they might have the event before 10 years elapse. So here's somebody who has the event at two years. But then you have other people who decide that they don't wanna be a part of the study anymore, so they drop out, or you just lose contact with them. And they don't have the event up until the point where you lose contact, but you really don't know whether they have the event during the rest of the 10 years or not. And so this is what we have to deal with when it comes to computing predictions. Oh, I see that the speaker sound is echoing a lot. I'm not quite sure why that is. Is anybody else getting that audio feedback? An echo in the room? Okay, I guess I'll continue. It seems like it's good enough. So anyway, if we were asked the question, can we predict risk of hypertension in the next two years, we could theoretically use this person's data, because we know that they don't have hypertension in the first two years.
But if we're asked to predict risk in the next five years, we have to be very careful about how we use data from somebody who was censored at year two, because we wouldn't wanna assume that they didn't have the event during the three years that we weren't in contact with them. So this is where risk prediction comes into play, and — not to go too far into the details — we have to be very intentional about how we use censored information so that we don't bias our predicted probabilities of the event happening. So with risk prediction, we just have to be aware of the fact that we wanna make a prediction about the event occurring in a specific time span, and we have to sort of specify what that time span is. All right, so this is where random survival forests come into the mix. Random survival forests are designed to deal with censored outcomes, and they can give you risk predictions. So the way these work is that their trees are basically the same as any other decision tree, but in the leaves of the tree, we fit Kaplan-Meier curves — basically predicted survival curves — to the training data that are in that leaf. And this is how we can get predicted risk. Somebody will give us a time, right? So if we get a time of 200, we go into this node, and we see that at a time of 200, almost none of the observations in this node have had the event yet. And so that's how we can get these predicted probabilities at specific times. And then we aggregate them in the normal way, and we get an ensemble prediction from the forest. All right, so now we can put these concepts together and talk about the model that aorsf fits, which is the oblique random survival forest. The background of this is that I wanted to fit an oblique random survival forest back in 2018. I couldn't find code to do it. So I wrote something a little rough, and I thought it worked well enough. So I wrote a paper, and I wanted to share the idea and also share the code.
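To make the leaf mechanic concrete, here is a minimal sketch of a Kaplan-Meier curve fit with the survival package — fit to the whole pbc_orsf data rather than to a single leaf's subset, purely for illustration:

```r
# Sketch: the kind of Kaplan-Meier estimate that a random survival
# forest fits inside each leaf. Here it is fit to all of pbc_orsf
# instead of one leaf's training subset, just to show the mechanic.
library(survival)
library(aorsf)  # provides the pbc_orsf data used later in the demo

km <- survfit(Surv(time, status) ~ 1, data = pbc_orsf)

# Predicted survival probability at a given time, e.g. t = 200 days;
# the predicted risk at that horizon is one minus this probability.
summary(km, times = 200)
```

In the forest, each leaf carries its own curve like this, and a prediction at a requested time is read off the curve and then averaged across trees.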
And the basic idea was: while we're growing the tree, we'll just fit a Cox model to the data in each decision node and then use the predicted values from that Cox model as the linear combination of predictors. And that works okay. It ended up being pretty good at prediction in a general benchmarking test. So around 2020, the code that I wrote was picked up, and it was used in a separate study to develop a heart failure risk prediction model, and the oblique random survival forest did well in that study, which was very exciting. But it also helped me realize, as I was watching the person I was collaborating with use my code, that they had trouble. And of course they didn't complain about it or anything like that, but I was able to see that my code was becoming a bottleneck in their analysis, because it ran slow and it was not as accessible as I would want it to be. So I realized after that that there's this sort of rule in the R universe that I wasn't quite aware of. There are a lot of really good R packages out there, but if you look at the R packages that are used most frequently, it's the ones that are fast. The fast R packages really have a lot of traction, and I don't think that's a problem. I think having your work run fast, allowing you to make adjustments, updates, and tune things, is good. So I realized I needed to rewrite my code and come up with ways to make it efficient and a lot more accessible for users. So I went through this process — it took a while — but I rewrote my code and I made it into a new R package called aorsf. It went through a review process by rOpenSci, which I found extremely helpful, and it was published in the Journal of Open Source Software — that's what JOSS stands for right here. And I ran a much larger benchmark on it to kind of understand how good it was at making predictions. The benchmark is currently sitting on arXiv as a preprint. I invite you to check it out.
It has lots of details — a lot more detail than we'll get into here — but it's hopefully pretty comprehensive. And as an aside, I named the R package aorsf because, a long time ago, when my dad was developing his own software, he called it AORSA. So I thought it'd be kind of funny if we basically shared the same acronym. So this gets us to the point now where we can start doing the package demo. I hope the background was helpful. It definitely was a fast race through all the topics, but now we can at least have a little bit of shared vocabulary. And so here we can start. The first thing you might wanna do is install aorsf. And to do this, you can run install.packages, because aorsf is on CRAN. And now I'm just gonna point out: you can open these slides up on your home computer — the link to the slides is in the chat — and then, when you have them open, you can click on this little button on the right-hand side of any of these code blocks, and it'll automatically copy the code. And so then you can come over to an RStudio session and just paste the code in. I hope that'll be a nice, convenient feature for folks to use — a really nice feature brought to you by another R package, by the way. So you're gonna wanna have aorsf downloaded, and you're probably also gonna wanna have the tidyverse. We'll use the tidyverse kind of throughout to just do things. And the next step in the demo will be to briefly look at the data that we're going to be working with. So there's a data set in the aorsf package called pbc_orsf. It's a very slight modification of the pbc data from the survival package. And you can see, when we look at it, it's got 276 observations and 20 columns. One column is an id column — that's not gonna be used for prediction. And we have a time column and a status column. These are the time until the event, and then whether the event happened or whether the person was censored.
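The setup steps just described might look like the following (a sketch, not the exact code from the slides):

```r
# Install aorsf from CRAN, plus the tidyverse for wrangling and plots.
install.packages(c("aorsf", "tidyverse"))

library(aorsf)
library(tidyverse)

# pbc_orsf: 276 observations, 20 columns. The id column is not used
# for prediction; time and status encode the censored outcome.
as_tibble(pbc_orsf)
```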
And then we have all these other columns that are potentially used as predictors. So now we're ready to fit an oblique random survival forest. And to do this, we use this function called orsf. And it's only gonna require two things: it requires a data set and it requires a formula. And you'll see on the next slide there are actually a number of different ways to specify the formula, because I wanna let people use the syntax that they like. So we'll come over to this next slide. If you are a long-term R user, you've probably used coxph at least once. And you'll know that when you use coxph, you specify the outcome with this Surv object — so it's capital S, Surv — and you put the time variable in and then the status variable. And you can do that if you're using orsf; there's no need to use this time-plus-status syntax unless you like it. You can also specify your formula using the standard Surv object. And then you can use most of R's formula shortcuts on the right-hand side. And what I'm showing you here is you can use this dot shortcut. That means: give me all the predictors, or give me all the variables in my data set that aren't on the left-hand side of my formula. And then you can also use this minus sign to say, except for this predictor. So our formula here means: give us all the predictors, except for stuff on the left-hand side, and except for this predictor called id, because we don't wanna use the id as a predictor. And then of course we just specify the data after that. So there are a couple of different ways you can specify models with the orsf function. And now I'm just gonna flip over to RStudio for a second to kind of walk through how this code will run. I'm setting my random seed to be 329, basically so you can get the same fit as me. And I'm running the orsf fit right here. And so I can just print it out. You can see the printout comes in a similar format to the randomForestSRC package. Back over to the slides. So also, if you like using tidymodels, I'm right there with you.
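Put together, the fitting step looks roughly like this (the seed matches the one mentioned in the talk; the rest is a sketch):

```r
library(aorsf)
library(survival)  # for Surv()

set.seed(329)

# Surv() syntax, with the dot shortcut and `- id` to drop the id column:
fit_orsf <- orsf(data = pbc_orsf,
                 formula = Surv(time, status) ~ . - id)

# An equivalent formula style supported by orsf():
# orsf(pbc_orsf, time + status ~ . - id)

fit_orsf  # printout similar in format to randomForestSRC
```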
I think tidymodels is a very nice set of R packages. And so as of version 0.2.0 of the censored package, you can now select aorsf as an engine. So if you wanted to set up a modeling pipeline with tidymodels and you wanted to use aorsf, this code is kind of how you could do that. You specify a random forest, and then you set the engine as aorsf, and then you're gonna have to set the mode to be censored regression, because that's all that aorsf currently does. I do think it'd be great to make it work with classification and standard regression as well; I just need to set aside six or seven months to do that. But once you have that random forest specification set, you can then go ahead and use the parsnip fit function to fit the aorsf model. And you'll see — okay, now you get the same printout, you get the same aorsf model, but now it's wrapped up in the parsnip modeling attributes. So it's a new addition to the tidymodels universe. And I'm excited to see how the censored package evolves, because I think they're doing work on it now to integrate it more into the main part of the tidymodels universe. All right, so now we can fit an oblique random survival forest. And we'd like to kind of interpret it and get a sense of, you know, what are the variables that are important here, and how do those variables relate to the predicted risk from the aorsf model? And to give a little bit more background, I just want to touch on this term right here: expected risk. So what I mean by expected risk is actually partial dependence, which is a little bit of a jargon term, but it's a standard thing that people use to look at the behavior of a prediction model — especially a prediction model that's considered black box, like a random forest. So what it means is we want to understand how a particular predictor variable relates to the predicted risk from a model.
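The tidymodels route just described might be sketched as follows, assuming censored version 0.2.0 or later:

```r
# Sketch of fitting aorsf through tidymodels (censored >= 0.2.0).
library(parsnip)
library(censored)
library(survival)
library(aorsf)  # for the pbc_orsf data

orsf_spec <- rand_forest() %>%
  set_engine("aorsf") %>%
  set_mode("censored regression")  # the only mode aorsf supports today

# parsnip's fit() returns the same aorsf model, wrapped in parsnip
# modeling attributes.
fit(orsf_spec, Surv(time, status) ~ . - id, data = pbc_orsf)
```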
So we have a model that uses this predictor, and we want to see how changing that predictor will change the model's expected prediction. So we set up this procedure where we take our training data and we set every value of our predictor to be a specific value — like we set the value of bilirubin to be 0.8 — and then we compute our predictions for all of the observations in our training data with bilirubin set at 0.8, and we take the average of those predictions. And we can take the average by using the mean, or the median, or a specific percentile, and then we present that summary. And we do this for several different values of bilirubin to kind of understand what the expected predicted risk is as bilirubin changes and everything else stays the same. So that's what orsf_summarize_uni is going to do for us. We supply an aorsf model — that's the first thing we need to give it — and then we tell it how many variables we want to summarize, and then it's going to spit back out a little summary table with a section for each variable. And I'll show you how this looks in our RStudio session. So here I'm going to run orsf_summarize_uni on our fit, and notice I'm saying n_variables = 10 here, because I want to actually show you how that looks. So when I say n_variables = 10, I get back this output, where it's kind of like a table with different sections. Each predictor variable has its own section, and the sections are ordered from the most important variable to the least important variable. And within each section, you can see, for the variable taking a specific value like 0.8, what's the expected risk — sorry, the expected predicted risk. And what you can do is just look at this little table and get a sense of how the predicted risk changes with respect to that predictor.
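That summary can be produced like this (a sketch using the same seed as the talk):

```r
library(aorsf)
library(survival)

set.seed(329)
fit_orsf <- orsf(pbc_orsf, Surv(time, status) ~ . - id)

# One section per predictor, ordered from most to least important,
# showing expected predicted risk across values of that predictor.
orsf_summarize_uni(fit_orsf, n_variables = 10)
```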
So when bili is 0.8, the mean predicted risk is 0.23, and when bili is 1.4, the mean predicted risk is 0.25 — a little bit of an increase, something to notice — but then when bili is 3.5, the expected predicted risk is 0.36. That's a big jump, right? So you can understand from that that the expected predicted risk is really increasing a lot when bilirubin goes from 1.4 to 3.5. And so you can scroll down through here and see the other variables that are ranked as important according to this model and get a sense of how the predicted risk changes with respect to those predictors. Okay, back over to the slides. I'll also point out that orsf_summarize_uni currently has a bug on some operating systems. So if you do get an error when you run this, it's most likely because of that bug. And long story short, there's a null value in my code where there should be a false value. And I fixed that in the development version, but I didn't want to push that to CRAN just before this demo, because I was worried that I would inadvertently break the whole package. I know that kind of sounds paranoid, but I thought it'd be safer to just let the bug exist. It's only on some operating systems, and it'll be fixed soon enough. So if you get an error with orsf_summarize_uni, you don't need to worry. orsf_summarize_uni is actually based on these functions, which do not have bugs in them as far as I know. These are the partial dependence functions, of course. So aorsf has these functions to give you a lot more control over how you compute the expected predicted risk, or partial dependence. There are three different functions you can access: one that uses in-bag data from the decision trees, another that uses just the out-of-bag data from the decision trees, and another that uses data that would be new to the decision trees, like testing data or your external held-out validation data. All right, so these functions work with two inputs. The first input is going to be an aorsf model.
All right, so I'm just going to supply our orsf fit to this one. Then the second input is a specification of your predictors, which I abbreviate as pred_spec. So what you want to do here is supply a named list, and the named list will have variables as the names, and then the values will be whatever values you want to compute an expected risk at. So when I say bili = 1:5, that means that I want to compute expected risk at a value of bili equal to one, two, three, four, and five. And so now I get back a very similar table, but I have a lot more control over this table. And so now I'm going to talk about this glaring thing that I haven't mentioned yet, which is pred_horizon. You can see that pred_horizon is sitting in this output data set, and each value of pred_horizon is 1,788. So remember when I mentioned censoring, and how we had to directly specify what time span we were going to predict the event to occur over? We haven't been doing that so far, and aorsf has been noticing that we haven't been specifying any time interval to predict the event's risk over. And so it's kind of just picked one for us. And the way this works is, if we don't set a prediction horizon, the aorsf functions will set it for us, and they'll just use the median follow-up time in our training data. And I'll just let you in on a secret — oh, I can recommend a paper for discussing — oh, sorry, I got distracted by a question in the chat. I could definitely recommend a good paper discussing this partial dependence. I think the idea was originally brought up in — what was it called — the greedy function approximation paper; it was like the original paper on gradient boosting. I'll look for a link for it and try and send it out soon. But switching back over to this pred_horizon thing. So yeah, anyway, aorsf will pick pred_horizon for us.
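The partial dependence call just described might look like this sketch; with pred_horizon left unset, aorsf falls back to the median follow-up time:

```r
library(aorsf)
library(survival)

set.seed(329)
fit_orsf <- orsf(pbc_orsf, Surv(time, status) ~ . - id)

# Out-of-bag partial dependence for bilirubin values 1 through 5.
# The pred_horizon column in the output defaults to the median
# follow-up time in the training data, because we did not set it.
orsf_pd_oob(fit_orsf, pred_spec = list(bili = 1:5))
```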
We should actually specify pred_horizon ourselves, though, because that'll make our output a lot more interpretable and kind of allow us to put better context around our predictions. So we're going to use pred_horizon here to make expected risk more interpretable, by saying that we want to predict risk at one year after baseline, two years after baseline, three years, four years, and five years after baseline. And we're also going to expand a little bit on how we're using this partial dependence computation by saying that we want to predict risk for men and women separately. So you can see I've just modified pred_spec to take sex as the variable and then given it the values of m for male and f for female. And then I'm also going to specify pred_horizon. So the time scale is in days here, so I'm taking 365 days times the values of one, two, three, four, and five. And then, when I get this result back, I'm going to use the tidyverse to put the data into a nice format where it's easy to print out and see all the information. And so when you run this, you should get values similar to mine — or identical, if you use the same random seed as me. And you're going to see that the output will show you the prediction horizon — 365, 730, et cetera — then it will show you a column for men and a column for women. And in these columns, you'll be seeing the predicted risk at the corresponding prediction horizon. And I also just added this column here, right? You can see I created a column called ratio, which is the predicted risk for men divided by the predicted risk for women. And the reason I added this was because you can see over time that the two groups sort of have a time-varying risk — or rather, the ratio has a time-varying value. The two groups start out at very similar risk, but over time, men get a slightly higher risk than the women. And you can also kind of bring this out by specifying a finer grid of prediction horizon values.
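A sketch of that computation, with the reshaping done in the tidyverse (the output column names I use here, like `mean`, are assumptions about aorsf's partial dependence output rather than something stated in the talk):

```r
library(aorsf)
library(survival)
library(tidyverse)

set.seed(329)
fit_orsf <- orsf(pbc_orsf, Surv(time, status) ~ . - id)

# Expected risk for men and women at 1 through 5 years after baseline
# (the time scale of pbc_orsf is days).
pd_sex <- orsf_pd_oob(fit_orsf,
                      pred_spec = list(sex = c("m", "f")),
                      pred_horizon = 365 * 1:5)

# One row per horizon, a column each for men and women, plus their
# ratio. Column names are assumptions about the pd output format.
pd_sex %>%
  select(pred_horizon, sex, mean) %>%
  pivot_wider(names_from = sex, values_from = mean) %>%
  mutate(ratio = m / f)
```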
So instead of just doing values of one, two, three, four, and five years, you can set the prediction horizon to go from one year up to five years and just take steps of 25 days. So then you get back a data set that you can plot, and I'm using ggplot2 here, just kind of using the default themes and creating a line plot. So this line plot is showing us the expected predicted risk for men and women. Men are this line on top, women are on the bottom. It's time since baseline on the x-axis and expected risk on the y-axis. And you can see visually that the curves are separating. I tend to think this is a nice thing with oblique random survival forests, because if you were to fit a proportional hazards model, you'll know that proportional hazards models assume that these effects are not time-varying by default. You can set them up so that they do have time-varying effects, but then you'd have to know which effects are time-varying, and you may not know that. Whereas with random survival forests, this is done for you, and you don't have to know ahead of time which effects are gonna be time-varying and which ones are not. So this is kind of a neat thing with the random survival forest. So now we can also look at this partial dependence output with a fixed prediction horizon. And this allows us to kind of look at the expected risk profile over multiple variables. So here we're just gonna fix the prediction horizon at the median follow-up time. And we're gonna investigate how these three variables may or may not interact with each other. So we've got bilirubin, and we've got edema, which is a categorical variable with three levels, and we have treatment, which is a categorical variable with two levels. And you can see here we're kind of just using a shortcut by saying I wanna look at edema at all of its categories, and I'm doing the same thing with treatment.
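A sketch of that finer grid and the resulting line plot, again assuming the pbc_orsf data and default ggplot2 themes:

```r
library(aorsf)
library(ggplot2)

set.seed(329)  # arbitrary seed
fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id)

# prediction horizons from 1 year up to 5 years, in steps of 25 days
pd_fine <- orsf_pd_oob(
  object       = fit,
  pred_spec    = list(sex = c("m", "f")),
  pred_horizon = seq(365, 365 * 5, by = 25)
)

# expected predicted risk over time since baseline, one curve per sex
ggplot(pd_fine, aes(x = pred_horizon, y = mean, color = sex)) +
  geom_line() +
  labs(x = "Time since baseline, days", y = "Expected predicted risk")
```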
And then I'm just setting bili to go from one to five, but I wanna have 20 different bili values because I wanna look at this in a figure. And so now I'm gonna run this, using this prediction specification, with the same function as before, orsf_pd_oob. And now we're gonna get a figure, and I'll probably switch over to RStudio in a second, because I think it's kind of an interesting figure to look at, and we can just make sure that the code runs. But when you create this figure, you can see that we have about three different groups here in terms of predicted risk. We've got this group right here, this group in the middle, and this group on the bottom. And these groups are really determined by edema status. So edema has a pretty big impact on predicted risk. If edema is zero, you're in this group down here; if it's 0.5, you're in the middle; and if it's one, you're up here. But as bilirubin increases, you'll notice that this middle group actually starts out closer to the low-risk group, and then, by the time bili is up to five, you actually see that this middle group is a lot closer to the high-risk group. So long story short, bilirubin is modifying the relationship between the edema value of 0.5 and predicted risk. And so you could think of this as a two-way variable interaction, and the oblique random survival forest has automatically picked up on this and is showing it to us in this output from the partial dependence function. And so again, this is nice, because if you're fitting traditional models, you don't necessarily know where the two-way interactions exist, but you can fit a random survival forest and it'll find them for you. And then you can in turn go look for them in the interpretation of that random survival forest. So now we're gonna pivot and talk a little bit about another way to interpret the random survival forest, which is variable importance.
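Here's a sketch of that three-variable specification. Using levels() is my stand-in for the "all of its categories" shortcut mentioned above; orsf_pd_oob computes expected risk over the full grid of the supplied values:

```r
library(aorsf)

set.seed(329)  # arbitrary seed
fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id)

# fix the horizon at the median follow-up time, then look at
# bilirubin, edema, and treatment together
pd_3way <- orsf_pd_oob(
  object = fit,
  pred_spec = list(
    bili  = seq(1, 5, length.out = 20),  # 20 bilirubin values from 1 to 5
    edema = levels(pbc_orsf$edema),      # all three edema categories
    trt   = levels(pbc_orsf$trt)         # both treatment groups
  ),
  pred_horizon = median(pbc_orsf$time)
)
```

Plotting expected risk against bili, colored by edema and faceted or line-typed by trt, reproduces the three-group figure described next.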
There are three ways to compute variable importance with the random survival forest in aorsf. I'm gonna try to cover the bare bones of the details; I just wanna be aware of the time and not spend too much of it on background. So the first way is to compute ANOVA importance. And this comes out of a great paper called "On Oblique Random Forests" by Menze and colleagues. It was published around 2011, and there are references for it in the documentation for aorsf. The basic idea is, when you compute linear combinations of predictors, you're gonna find a p-value for each predictor, and you'll keep track of the p-values corresponding to all the predictors, right? So every predictor has a whole bunch of p-values, and you can simply compute the proportion of times that the p-value for a predictor was very low, and that proportion is gonna be the importance of the predictor. So the idea is, if a predictor always has a very low p-value, it's probably important. And if it's always got a not-so-low p-value, then it probably isn't that important. And so obviously this is gonna have some limitations, but one thing that's great about it is it's very fast. Another way to compute variable importance is the more traditional method of permutation. This is where you've got a forest that you've already fit, and you know its prediction error. So now you're gonna permute a predictor, and then you're gonna reassess what the prediction error is after the predictor has been permuted. So what that looks like is, we have a fitted forest here, and I'm gonna permute the value of flipper length. So here we go, it's been permuted. And now we can see from the picture that we actually have a lot of misclassified points. All these Gentoo penguins that were originally classified as Gentoos are now being classified as chinstraps, and that's incorrect.
So we can see that permuting this variable has really messed up our random forest's prediction accuracy, and we can quantify how important that variable is by how much worse the prediction accuracy got. And I'm being a little hand-wavy here; this isn't exactly how it works with the picture, but it's the concept. So another way to compute variable importance with aorsf is to use something called negation importance. And this is similar to permutation, but it has a little bit of a different philosophy. For each predictor, instead of permuting the values, we're gonna go into the forest itself and multiply that predictor's coefficients by negative one. So if it's in a combination of predictors where its coefficient is, like, 1.6 or something, then that's gonna be converted over to negative 1.6, and what this effectively does is reverse the slope of the decision boundaries in the trees. So again, a little hand-waving here, but what this conceptually means is that we have a fitted forest here, and instead of permuting the values, we're actually gonna flip the decision boundary. And when we do this, we can see that we again have a lot of misclassified points. And because we flipped all the decision boundaries just based off of flipper length, we can conclude that flipper length is an important variable. So all three of these methods are accessible in aorsf. There's a family of functions called orsf_vi, for variable importance, and you run those functions on a fitted orsf model like I'm doing here. And you can also just specify, when you fit the orsf model, what type of importance you want to compute. All right, so there's two ways of going about it. Okay, so we're now at the point where we can ask a question.
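Here is roughly what those two ways of going about it look like in code, as I understand the aorsf API: negation and permutation importance can be computed from an already-fitted forest, while ANOVA importance is tallied while the forest grows, so it's requested up front:

```r
library(aorsf)

set.seed(329)  # arbitrary seed

# negation and permutation importance, computed from a fitted forest
fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id)
orsf_vi_negate(fit)   # flip each predictor's coefficients, measure the damage
orsf_vi_permute(fit)  # permute each predictor, measure the damage

# ANOVA importance: proportion of times each predictor's p-value was low,
# recorded while the trees are grown, so ask for it when fitting
fit_anova <- orsf(data = pbc_orsf,
                  formula = Surv(time, status) ~ . - id,
                  importance = "anova")
orsf_vi(fit_anova)
```

Each call returns a named numeric vector of importance values, sorted so the most important predictors come first.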
We see that we can fit these orsf models and we can interpret them, but it's kind of important that they give us predicted risk that is somewhat accurate, because there's not much point to interpreting predicted risk if the predicted risk isn't a good predictor of what's actually going to happen. So I've run a larger benchmark in my preprint on arXiv, over 35 different prediction tasks. But in the demo here, we're gonna look at a smaller benchmark over 11 different prediction tasks. And you will be able to run this on your own systems if you want. But just a fair word of warning: you'll have to install mlr3 and the associated R packages for survival analysis with mlr3, and some of these are not on CRAN. So it'll just take a little bit of extra legwork to get them up and running on your system. Luckily, you can also just follow along with the slides and take my word for it that I'm not making up the results, which I'm not. So what you'll see here is all the packages that I need to load for this benchmark. And what you'll see up here is kind of what the benchmark is going to do. We're gonna use mlr3, and we're gonna be comparing the prediction accuracy of models from aorsf to the prediction accuracy of some random survival forests that use axis-based trees. In particular, we're gonna be using the randomForestSRC package, which does survival, regression, and classification forests. And then we're also gonna be using axis-based random survival forests from ranger. So you may have used these packages before and you may not have, but the general idea is that these are very widely used R packages for random forests, and they're quite good, very good random forest packages. So we're gonna run an experiment where you basically fit a model with each of these three different learners, and then we validate that model on some held-out testing data. And now I'm just gonna flip through some slides to show you where I'm getting that data from, right?
So first we'll use our pbc_orsf data, and you can see that I'm making this thing called a task. That's because the general syntax of mlr3 is one where you create tasks for prediction. So that's why I'm using this task name. I've got another task with data from a Veterans Administration lung cancer trial, and this data actually comes directly from randomForestSRC, so it's pretty easy to make a task with it. There's some other data that I'm pulling out of the OpenML library. OpenML is a very handy R package; you can just grab data sets that are publicly available and download them from the OpenML website. Another data set from OpenML, lung cancer. Another data set, this one's coming out of, I'm not quite sure, out of the survival package, and this is from a cancer trial. This cancer trial actually had two survival outcomes, and so we're making a separate prediction task for each of those two outcomes. And now we're gonna put all the tasks together in a list, and you can see at the bottom of the list there's a couple more pre-made tasks, and these are just tasks that are already available with mlr3. So we've got 11 different prediction tasks in total, and we're gonna set up a little benchmark where we go through each task separately, and with each task we're gonna be running five-fold cross-validation. So we start with the full data and split it up into a training set and a testing set. On the training set, we'll fit three models: one with aorsf, one with randomForestSRC, one with ranger. And then each of these models will make predictions for the testing set, and we'll evaluate how accurate those predictions are. And so here we're saying these are the learners that we wanna use. And then we're gonna say, here's how we wanna evaluate the accuracy of those learners. We'll use a Graf score, which is also a Brier score. I hope you've heard of at least one of those.
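The benchmark design described above can be sketched with mlr3 roughly like this. To be clear, this is a compressed stand-in, not the slides' exact code: I substitute two of mlr3proba's pre-made tasks for the full list of 11, the learner keys are the ones I believe mlr3extralearners uses, and the slides' calibration-slope measure may differ from the surv.dcalib measure shown here:

```r
# mlr3proba and mlr3extralearners are required; some pieces are not on CRAN
library(mlr3)
library(mlr3proba)
library(mlr3extralearners)

# stand-ins for the full list of 11 tasks built on the slides
tasks <- list(tsk("gbcs"), tsk("grace"))

learners <- list(
  lrn("surv.aorsf"),   # oblique random survival forest
  lrn("surv.rfsrc"),   # axis-based trees, randomForestSRC
  lrn("surv.ranger")   # axis-based trees, ranger
)

# each learner on each task, with five-fold cross-validation
design <- benchmark_grid(
  tasks       = tasks,
  learners    = learners,
  resamplings = rsmp("cv", folds = 5)
)

bm <- benchmark(design)

scores <- bm$score(
  measures = list(
    msr("surv.graf"),    # Graf / integrated Brier score: lower is better
    msr("surv.cindex"),  # discrimination: higher is better
    msr("surv.dcalib"),  # a calibration measure (see caveat above)
    msr("time_train")    # seconds spent training each model
  )
)
```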
I don't have enough time to really go into what they mean, but I'll just say lower values are better for the Graf score. We're computing a C-index, which is gonna tell us how well a model discriminates between cases and non-cases. We're gonna compute a calibration score, and this is a slope that we want to be equal to exactly one; the closer to one, the better. And then we're also gonna keep track of how long it takes to train our models, because as I mentioned before, people prefer not to use slow software. So faster software is nice, assuming it works. And then we actually create our benchmark; that's what this code is gonna do. And then we'll run the benchmark and pull out the scores of these three different models. So when we do this, we can summarize some results with this bit of code here. You can see this is basically just tidyverse stuff. And here's the summary. All right, so this is where we're actually able to compare the prediction performance of our different models. In the Graf score column, we can see the expected Graf score over all of the prediction tasks. The average for aorsf is 0.143, and over here it's 0.146 for randomForestSRC and 0.156 for ranger. So remember, for the Graf score, lower is better, so aorsf gets a little bit better score here. For the C-index, higher is better. So aorsf gets a score of 0.734, randomForestSRC is just a little bit below that, and then ranger is just a little bit below that. Next up is the calibration. The closer to one, the better; a perfect calibration gets a value of one for the calibration slope. And aorsf is coming in at 0.994. randomForestSRC comes in at 0.985, a little bit further away from one, and then ranger comes in at 1.07. And then, very last thing, I realize I'm just reading a table to you, but all of this is valuable information.
So I'm just gonna kind of cover it. The last thing is the time to train. So here you can actually see aorsf is efficient, which is great, because oblique trees are actually very hard to fit efficiently. And that's been the main challenge of developing aorsf: how do I fit these trees in a way that won't take days to finish? And if you're interested in exactly how aorsf does do this quickly, the arXiv preprint has quite a lot of details on that. I'm not gonna go into it here, because this is just a demonstration of aorsf and how to use it. But I find it interesting. I will say the caveat here is that mlr3 has forced these learners to run on a single processor, and ranger and randomForestSRC are both designed to be run in parallel. Oh, good question about the scale. That's in seconds, I believe. So on average, that's the number of seconds it took to fit a model. Thank you for that question, it's a very good question. And so the comparison here is a little bit biased, because I didn't write aorsf to use parallel processing. I probably will at some point, but ranger and randomForestSRC expect to do parallel processing, and when they do, they run very efficiently, more efficiently than aorsf in many ways. So it's a little bit biased. I wouldn't take this as the final say on the efficiency of these R packages, but it does at least show that aorsf is not slow. So now we'll just briefly touch on how you can take aorsf and tailor it a little bit, so that if you don't like the prebuilt ways that aorsf finds linear combinations of predictor variables, you have a lot of control. You can change that and modify it to do exactly what you want. So when you fit orsf models, you can supply this control argument. And the control argument should be the output of one of these different control functions. And there's four functions available right now. The default one is the fast version.
So this one just runs the fastest. And then, instead of fitting this sort of partial Cox regression model, you can fit a penalized Cox regression model, which runs substantially slower, but it does perform better in some cases. And the one that I find the most interesting is the orsf_control_custom function. I'll show you how to use this in a second; it allows you to create your own function for finding linear combinations of predictors, and then you just supply that function to orsf, and it takes it and uses it for you. Right, so here's how that works. Let's say that I wanted to make an oblique random survival forest, and instead of finding this linear combination of predictors with Cox regression, I just wanna find some random coefficients and combine my variables that way. So I'm gonna make a function called f_rando for making random linear combinations. And all this function does is return a matrix of values from a random uniform distribution, and then that matrix will be used as the coefficients for my linear combination. So then I take this function and pass it into the orsf_control_custom function, which I then pass into the control argument of orsf. And now I get back my customized oblique random survival forest. And there's a lot of different things you can try with this. Maybe you don't wanna do random coefficients; maybe you actually wanna apply principal component analysis and use that as a way to find a linear combination of predictors. And so I'm not the biggest expert on principal component analysis, but I can at least write a function that uses the prcomp function and then pulls out one of the principal components from that. And here I'm actually pulling out the second one, because when I experimented with this, the first one was more or less dominated by one variable. So it was like, that's basically just fitting an axis-based forest, but the second component was more, you know, a good mix of variables.
So that's why I'm using the second component. But then I just pass this in, and now I'm using principal component analysis to fit an oblique random survival forest. And so, you know, you can write these functions yourself, and you have a good bit of control over how the random survival forest works. And of course, the thing that I'm not really mentioning is that it's very easy to cause your R session to crash if you send R functions into C++. So when you do this, there are some functions within aorsf that'll test your function out. And if your function looks like it's gonna cause your R session to crash, it won't be sent into C++. Instead, you should get back a message that kind of says, here's what your function appears to be doing that's gonna cause the R session to crash, and here's how you can hopefully modify it and make it work better. And I saw a little question in chat, so I'm just gonna read that real quick. The question is, would there be a certain function to use that would make a standard random survival forest? And there are a couple of ways that you might be able to get at that. So you could kind of just return a matrix where all the coefficients are zero except for one, and that would give you a standard random survival forest, because you're just using a linear combination that removes some variables by multiplying them by zero. Or I think you could also fit the oblique random survival forest with a value of one for mtry, so that you're actually creating linear combinations of variables, but there's only one variable. That's a trivial linear combination, but you would end up with something that was a lot like an axis-based forest that way. And so, great question.
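Sketches of those two custom functions, following the pattern in the aorsf documentation for orsf_control_custom: the function receives the node's predictor matrix x_node (plus outcomes y_node and weights w_node, unused here) and must return a single column of coefficients, one per predictor:

```r
library(aorsf)

# random coefficients: one uniform draw per predictor in the node
f_rando <- function(x_node, y_node, w_node) {
  matrix(runif(ncol(x_node)), ncol = 1)
}

# principal components: keep the loadings of the second component
# (the first tended to be dominated by a single variable)
f_pca <- function(x_node, y_node, w_node) {
  pca <- stats::prcomp(x_node, rank. = 2)
  pca$rotation[, 2L, drop = FALSE]
}

# supply either function through the control argument of orsf
fit_rando <- orsf(pbc_orsf, Surv(time, status) ~ . - id,
                  control = orsf_control_custom(beta_fun = f_rando))

fit_pca <- orsf(pbc_orsf, Surv(time, status) ~ . - id,
                control = orsf_control_custom(beta_fun = f_pca))
```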
So now, if you ever wanna tune your random survival forest, this is kind of the way that I think about doing it: just by comparing the accuracy of the forest based off of its out-of-bag predictions, right? So here I've just got three different approaches to fit this random survival forest. One is the default approach, one is using random coefficients, and another approach is to use principal component analysis. And I'm pulling out the out-of-bag predictions from these three separate fits, right? So I'm creating a list of risk predictions, and then I pass that to the Score function, which is a function that exists in the riskRegression package. And it does a very nice job of evaluating the accuracy of risk predictions. And so now you can just kind of get a look at how the three different approaches did. The default approach has an AUC of 90.8. The other approaches do surprisingly well. I'm very surprised that you can just take some random coefficients and get back a pretty good AUC. That's interesting to me. It's also interesting that this principal component analysis approach does almost as well as the approach of using Cox regression. In fact, I don't have it in my slides, but there are a couple of data sets where the principal component analysis is better. So I'm very interested in what people may think of when it comes to new ways to find linear combinations of predictors, because there is a good amount of variability in the performance of the oblique random survival forest that's explained by how you're actually computing the linear combinations. So thank you for sticking with us till the end. This is just a list of what we covered. I would like to acknowledge the funders of this research; it's been really nice to have support and time to work on this. And of course, the collaborators who have helped along the way are all pictured here. Really great group of people. And so we're right at the top of the hour.
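That out-of-bag comparison can be sketched as follows. The pred_oobag and pred_horizon elements of a fitted orsf object, and the riskRegression Score call, are as I understand them from the aorsf README; the custom functions are redefined here so the snippet is self-contained, and AUC values will vary with the seed:

```r
library(aorsf)
library(riskRegression)
library(survival)

set.seed(329)  # arbitrary seed

# custom linear-combination functions, as sketched earlier in the talk
f_rando <- function(x_node, y_node, w_node) {
  matrix(runif(ncol(x_node)), ncol = 1)
}
f_pca <- function(x_node, y_node, w_node) {
  stats::prcomp(x_node, rank. = 2)$rotation[, 2L, drop = FALSE]
}

# three fits: default (Cox-based), random coefficients, and PCA
fit_cph   <- orsf(pbc_orsf, Surv(time, status) ~ . - id)
fit_rando <- orsf(pbc_orsf, Surv(time, status) ~ . - id,
                  control = orsf_control_custom(beta_fun = f_rando))
fit_pca   <- orsf(pbc_orsf, Surv(time, status) ~ . - id,
                  control = orsf_control_custom(beta_fun = f_pca))

# out-of-bag risk predictions from each fit, scored at the
# default prediction horizon (the median follow-up time)
risk_preds <- list(cph   = fit_cph$pred_oobag,
                   rando = fit_rando$pred_oobag,
                   pca   = fit_pca$pred_oobag)

Score(object  = risk_preds,
      formula = Surv(time, status) ~ 1,
      data    = pbc_orsf,
      summary = "IPA",
      times   = fit_cph$pred_horizon)
```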
If anybody would like to email me with questions, I'm just gonna type my address into chat. Please feel free to send me an email if you have any questions, and maybe we have time for some questions here. But otherwise, thank you for coming.