Hi, my name is Beth Atkinson and I'm a biostatistician at Mayo Clinic, where I've had the good fortune of working with Terry Therneau for a number of years. I've learned a lot about survival analysis from him, and today I want to focus on additions to the survival package that allow users to analyze multi-state models using extensions of familiar functions.
First, a reminder of what survival analysis is. Survival, or time-to-event, analysis focuses on a specific type of longitudinal measurement where subjects are followed for a period of time until they have an event of interest or until their follow-up is discontinued, in which case they are considered censored because the event has not yet been observed. The diagram on the left shows the follow-up of patients from baseline until event, where the dark circles indicate the event and the lines without circles are censored. Special methodology is needed to analyze this data because of the censoring.
Another way to think about this data is to draw a diagram with two boxes. Subjects are in the first box until they experience the event, when they move to the second box. The boxes typically are called states and the arrows are called transitions. Classical survival analysis, then, is really just the simplest type of multi-state model, with two states.
The data format for classical survival analysis often consists of one row per subject with a time variable and a status variable indicating event or censoring, plus covariates. More complex models, such as those with time-dependent covariates or multiple events of the same type, use the counting process style of input, which includes a start time, stop time and status for each interval. The status variable can have the values 0/1, FALSE/TRUE or 1/2, where the second level indicates that an event was observed.
One of the main functions in the survival package is survfit, which is used to create Kaplan-Meier curves. These curves are a measure of absolute risk.
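As a minimal sketch of the two data formats and a Kaplan-Meier fit (this is not the example from the slides): the code below assumes the lung dataset that ships with the survival package, where status uses the 1/2 coding, and uses sex as the grouping variable. The counting-process values in `toy` are made up purely for illustration.

```r
library(survival)

# Classical format: one row per subject with a time and a status.
# In `lung`, status uses the 1/2 coding: 1 = censored, 2 = dead.
fit_km <- survfit(Surv(time, status) ~ sex, data = lung)
print(fit_km)   # per-group n, number of events, and median survival
plot(fit_km, lty = 1:2, xlab = "Days", ylab = "Probability of being alive")

# Counting-process format: (start, stop] intervals, each with its own status.
# Toy values: one subject followed from day 0 to 30 with no event,
# then from day 30 to 80, ending with an event.
toy <- Surv(c(0, 30), c(30, 80), c(0, 1))
print(toy)
```

The same `Surv()` call covers both formats: two arguments give the classical time/status form, three give the start/stop/status form.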
In the example on the slide, if the event is death then the default plot shows the probability of being alive at a given time, stratified by group. The other key function is coxph, which is used to fit Cox models. Cox models are a measure of relative hazard, so in this example the risk of experiencing death when in group 2 is about half the risk for somebody in group 1. There are also a number of standard helper functions, such as plot, print and summary, associated with these core functions.
The key questions that we often ask are: What's the probability of being alive at five years? What is the median length of time until death? How long on average do people stay alive during the first five years? What risk factors increase your risk of death?
Now let's transition to multi-state models. The goal Terry had when modifying the survival package was to make probability-in-state curves as easy to create as Kaplan-Meier curves, make fitting multi-state models as easy as fitting a Cox model, and provide access to other estimates of absolute risk such as the restricted mean time in state.
So what exactly is a multi-state model? It's simply a framework for modeling a process where subjects transition from one state to the next. As stated before, classical analysis, shown in the upper left, is a process where subjects start in one state and can make one transition to one other state, and there are only those two states. The upper right panel is one way of thinking about multiple events of the same type. A subject has zero events, then possibly they transition to having one event, and only then are they at risk for a second event, and so on. The bottom row includes events of more than one type. Bottom left is an example of a competing risks analysis, where all subjects start in the left state and each subject can make a single transition to another state, but there are multiple other states.
Sometimes all the events are of interest, and sometimes the focus is primarily on just one type of event, such as here, where the main event is cardiac death. If a subject dies of some other cause in this example, they are no longer at risk for a cardiac death, which is different from censoring. Finally, there's the full multi-state model, where subjects can start in any of the states and can move between states depending on the scenario of interest. These models can get very complicated, and often we've ended up examining different scenarios for the same study.
I'll demonstrate multi-state analysis using the myeloid dataset, which is a trial with two treatments and the states of entry, complete response, relapse and death. The initial dataset includes one row per subject with total follow-up time and an indicator for death or censoring, plus times until complete response and relapse if they occurred.
With multi-state models, I strongly recommend first drawing a state diagram. This diagram shows all the possible states and the various transitions that are allowed. As I stated before, it's often the case that you want to create different models, and the diagrams can help you sort out what it is that you're really thinking about and examining. The survival package includes the function statefig, which helps draw the boxes and arrows. Essentially, you create a matrix where the rows are the from state and the columns are the to state, and a 1 indicates that you can transition from one state to the next.
Data setup is the most important step, and there are certain requirements for multi-state data. Specifically, there must be a subject ID, and the endpoint, instead of being a numeric 0/1 value, must be a factor. The first level of the factor indicates no transition. In certain scenarios, you may have subjects who begin the study in different states, and that's perfectly fine. Here the tmerge function is used to create the desired dataset.
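The state diagram and data setup described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the slides: it uses the column names of the myeloid data in the survival package (id, trt, sex, futime, death, crtime, rltime), while the object names `connect`, `mdata` and `code`, the state labels, and the rule that death wins when events tie are choices made here.

```r
library(survival)

# State diagram: rows are the "from" state, columns the "to" state,
# and a 1 marks an allowed transition.
states  <- c("Entry", "CR", "Relapse", "Death")
connect <- matrix(0, 4, 4, dimnames = list(states, states))
connect["Entry",   c("CR", "Relapse", "Death")] <- 1
connect["CR",      c("Relapse", "Death")]       <- 1
connect["Relapse", "Death"]                     <- 1
statefig(c(1, 2, 1), connect)   # 1, 2 and 1 boxes per column

# Split each subject's follow-up at complete response and at relapse.
mdata <- tmerge(myeloid[, c("id", "trt", "sex")], myeloid, id = id,
                death   = event(futime, death),   # sets the full follow-up
                cr      = event(crtime),
                relapse = event(rltime))
attr(mdata, "tcount")   # bookkeeping on how the intervals were created

# Combine the three 0/1 indicators into a single factor endpoint; the first
# level means "no transition". Death takes priority if events tie (assumption).
code <- with(mdata, ifelse(death == 1, 3, ifelse(relapse == 1, 2, cr)))
mdata$event <- factor(code, 0:3,
                      labels = c("censor", "CR", "relapse", "death"))
head(mdata)
```

The result has one row per interval, with `tstart`, `tstop`, the subject `id`, and the factor `event`, which is the shape the multi-state functions expect.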
tmerge can be called sequentially or with the calls grouped together, as is shown here. The first new event, called dead, specifies the full length of the follow-up and the ending status of 1 or 0, indicating death or censoring. The next event looks at crtime and splits the subject's follow-up if there is a time listed. The new variable cr is created, indicating that there's an event at the end of this new interval. Finally, rltime, the relapse time, is used to further split the follow-up and create the relapse variable. The tmerge function can also be used to create time-dependent covariates, using a call to tdc instead of to event. The tmerge function attaches an attribute to the data frame so that the summary of the dataset shows useful information about how the time intervals were created; more information about interpreting this summary is available in one of the survival package vignettes.
One of the newer functions is survcheck, which checks your data to make sure that it's ready to be used in analysis. The transitions matrix summarizes the number of subjects moving from one state to the next, and it can be helpful to compare it with your diagram and make sure that things match up. This also provides a quick check to make sure that people aren't making illegal moves, such as from death to another state. I've also used this to better understand how many subjects are transitioning between the various states and whether there are really small sample sizes for a given transition. For instance, there are 20 subjects who move from the initial state to relapse, and in order to have a relapse they would have needed to be in a remission state at some point. So perhaps these are subjects that should be examined more closely.
Another check is to make sure that all the subjects are following general rules such as those listed here. The survcheck object includes the IDs for those subjects that have issues.
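A minimal sketch of the survcheck step, under the same assumptions as the talk's setup (the myeloid columns from the survival package; `mdata`, `code` and the event labels are names chosen here, not taken from the slides):

```r
library(survival)

# Rebuild the multi-state data: one row per interval, factor endpoint.
mdata <- tmerge(myeloid[, c("id", "trt", "sex")], myeloid, id = id,
                death   = event(futime, death),
                cr      = event(crtime),
                relapse = event(rltime))
code <- with(mdata, ifelse(death == 1, 3, ifelse(relapse == 1, 2, cr)))
mdata$event <- factor(code, 0:3,
                      labels = c("censor", "CR", "relapse", "death"))

# Check the data before fitting anything.
check <- survcheck(Surv(tstart, tstop, event) ~ 1, data = mdata, id = id)
print(check)        # transition counts and visits per state
check$transitions   # from-state by to-state table, to compare with the diagram
check$flag          # counts of rule violations found in the data
```

Comparing `check$transitions` against the state diagram is a quick way to spot transitions that should not exist or that have very few subjects.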
The analysis functions, mainly coxph and survfit, also use the survcheck function behind the scenes and will stop if there are errors, so running it first helps prevent frustration. Finally, it is sometimes helpful to know just how many subjects experience a given state multiple times. The summary below lets us know that 151 subjects had a complete response, a relapse and a death, but no subjects had repeat visits to any given state.
The questions that can be answered with these multi-state models are simply extensions of what we already asked with classical survival analysis, such as: What is the probability of being in a given state at five years? What is the median length of time in one state until moving to the next? How long on average do people stay in a certain state during the first five years of the study? And what risk factors increase your risk of moving from one state to the next?
In order to create probability-in-state curves, you just need to call the familiar survfit function. Again, remember that the major difference is that there must be an id argument and the event must be a factor. The default plot shows all of the states except the initial one, so you can specify that you want to see that one too. One of the major differences with these curves is that they can go up and down as subjects move in and out of various states. They indicate the probability of being in each state by time. Here the solid line is treatment A, and the plot shows that those subjects are more likely to be in the death state and less likely to be in the complete response state than subjects in treatment B.
Here's code used to create a ggplot version of the curves. Note the initial call to survfit0, which adds the starting time to the data for all of the groups. Then I used the broom function tidy to create a data set that can be plotted. For this example, I plotted each state in a different facet.
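As a rough base-graphics sketch of the probability-in-state curves (not the ggplot code from the slide): the data are rebuilt with the names assumed earlier (`mdata`, `event`), and the colour and line-type choices assume that the curves are ordered with treatment varying fastest within each state.

```r
library(survival)

# Rebuild the multi-state data: one row per interval, factor endpoint.
mdata <- tmerge(myeloid[, c("id", "trt", "sex")], myeloid, id = id,
                death   = event(futime, death),
                cr      = event(crtime),
                relapse = event(rltime))
code <- with(mdata, ifelse(death == 1, 3, ifelse(relapse == 1, 2, cr)))
mdata$event <- factor(code, 0:3,
                      labels = c("censor", "CR", "relapse", "death"))

# Same survfit call as for Kaplan-Meier, plus id and a factor endpoint.
sfit <- survfit(Surv(tstart, tstop, event) ~ trt, data = mdata, id = id)
sfit$states   # the unnamed initial state is labelled "(s0)"

# noplot = NULL asks for every state, including the initial one.
# Assumed ordering: one colour per state, one line type per treatment.
plot(sfit, noplot = NULL, col = rep(1:4, each = 2), lty = 1:2,
     xlab = "Days from enrollment", ylab = "Probability in state")

# Probability of being in each state at one year, by treatment.
summary(sfit, times = 365)
```

Unlike Kaplan-Meier curves, each of these curves can rise and fall, and at any time point the probabilities across states sum to one within a treatment group.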
Note that here the initial state was also plotted, showing how quickly subjects move out of that state.
The print of the survfit object using the rmean option shows the restricted mean time in state. Here I've set it to 3 times 365 because time was measured in days; the 3 indicates that the calculation should be made using the first three years of follow-up. So subjects in treatment A spend on average 0.5 years out of three in the initial state, and subjects in the treatment B group spend only 0.41 years.
Finally, there are multi-state models used to analyze the relative hazard for the transitions between states. The basic call is as simple as a regular Cox model. In fact, you could fit separate Cox models for each transition and get the same results. Here I've used the matrix = TRUE option in the coef call just to print out the coefficients compactly and save space. The print of the cfit object looks like a regular Cox model for each of the six transitions and is shown on the next slide.
In this matrix of coefficients, each column corresponds to a transition, shown at the bottom. State 1 is the initial entry state, state 2 is complete response, state 3 is relapse and state 4 is death. So 1:2 is the transition from the initial state to the complete response state, and 1:4 is the transition from entry to death. Here is a portion of the full printout of the coxph object, showing very similar results to what you would see with a classical survival analysis. The tidy function can be used to create a data frame of the coefficients and transitions for plotting or for other summaries, so that's always a useful tool.
Another possibility is to constrain certain transitions to have the same coefficient.
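The restricted mean time in state and the basic multi-state Cox fit described above can be sketched as follows. This is an illustration under assumptions, not the slide code: `mdata`, `sfit` and `cfit` are names chosen here, and treatment and sex are used as the two covariates.

```r
library(survival)

# Rebuild the multi-state data: one row per interval, factor endpoint.
mdata <- tmerge(myeloid[, c("id", "trt", "sex")], myeloid, id = id,
                death   = event(futime, death),
                cr      = event(crtime),
                relapse = event(rltime))
code <- with(mdata, ifelse(death == 1, 3, ifelse(relapse == 1, 2, cr)))
mdata$event <- factor(code, 0:3,
                      labels = c("censor", "CR", "relapse", "death"))

# Restricted mean time in each state over the first 3 years (time is in days).
sfit <- survfit(Surv(tstart, tstop, event) ~ trt, data = mdata, id = id)
print(sfit, rmean = 3 * 365)

# Multi-state Cox model: same call as a regular Cox model, plus id.
cfit <- coxph(Surv(tstart, tstop, event) ~ trt + sex, data = mdata, id = id)
print(cfit)   # one block of coefficients per transition
cfit$cmap     # map of which coefficient belongs to which transition
```

The restricted means are reported in days here; dividing by 365 gives years out of the three-year window.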
For instance, you may believe that for any transition to death the risk factors will have the same effect, and so you can constrain them to have a common coefficient. So here I've used 1:4 plus 2:4 plus 3:4 with both covariates, and I've said that they should have a common coefficient. So whereas before the treatment coefficients were -0.097, -0.653 and -0.3, now they are all -0.294.
Because this was fit using coxph, most of the extra functions can also be used, such as cox.zph, which tests the proportional hazards assumption. In this particular example there's no evidence that the proportional hazards assumption is violated.
One of the advantages of fitting these models simultaneously is that you can then create predicted probability-in-state curves for different scenarios. The survfit object from the Cox fit can be treated as though it's a matrix, with rows corresponding to the new data that I've created (here just two rows, for treatment A and treatment B, looking only at females) and columns for each state. Shown here is code for plotting death using the original data and using predictions from the model. The observed and predicted lines match fairly well, given that the predicted curves were computed just for the females.
There are two other main packages in R for multi-state analysis, mstate and msm. For the most part, the functionality of the mstate package is now available in the survival package, and there's a vignette on Terry's GitHub site that shows an example of the correspondence. The msm package provides functions for fitting continuous-time Markov and hidden Markov multi-state models where the transition times aren't exactly known, and these scenarios are beyond the scope of the survival package.
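The constrained fit, the proportional hazards check and the model-based predictions described above might be sketched like this. It is a simplified illustration, not the slide code: only treatment is used as a covariate (the talk used treatment and sex), and the names `mdata`, `cfit_sep`, `cfit_common` and `newdat` are choices made here. States are numbered in the order entry = 1, CR = 2, relapse = 3, death = 4.

```r
library(survival)

# Rebuild the multi-state data: one row per interval, factor endpoint.
mdata <- tmerge(myeloid[, c("id", "trt", "sex")], myeloid, id = id,
                death   = event(futime, death),
                cr      = event(crtime),
                relapse = event(rltime))
code <- with(mdata, ifelse(death == 1, 3, ifelse(relapse == 1, 2, cr)))
mdata$event <- factor(code, 0:3,
                      labels = c("censor", "CR", "relapse", "death"))

# Separate treatment coefficient for every transition.
cfit_sep <- coxph(Surv(tstart, tstop, event) ~ trt, data = mdata, id = id)

# One shared treatment coefficient for the three transitions into death.
cfit_common <- coxph(list(Surv(tstart, tstop, event) ~ trt,
                          1:4 + 2:4 + 3:4 ~ trt / common),
                     data = mdata, id = id)
coef(cfit_sep)
coef(cfit_common)   # fewer coefficients: the death transitions share one

# Test of the proportional hazards assumption.
print(cox.zph(cfit_sep))

# Predicted probability-in-state curves for two hypothetical subjects.
newdat <- data.frame(trt = c("A", "B"))
csurv  <- survfit(cfit_sep, newdata = newdat)
csurv$states
```

Sharing a coefficient trades flexibility for precision: the common estimate borrows information across all transitions into death, which helps when some of those transitions have few events.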
I've been working with Terry Therneau and another colleague, Cindy Crowson, on a new survival book, and the focus is on multi-state models. The book has also led to a number of other new functions, beyond those that I described in my presentation, that are worth checking out. Overall, I've found that the most difficult part of working on multi-state projects is creating the right data set. In general the analysis tools are easy to use, though interpretation of the results still takes some time and thought. It's an analysis that I expect to be seeing a lot more of as researchers start asking more complex questions. Thank you for your time.