So, we continue with more talks by our doctoral students, the ESRs. The next one is Bowen Fan from ETH Zurich. I'm his advisor, and Bowen will talk about his work on predicting recovery from multiple organ dysfunction in pediatric sepsis patients. Bowen, the floor is yours.

Thanks, thanks a lot for the introduction. As Karsten said, my name is Bowen Fan. I'm ESR2, from the Department of Biosystems Science and Engineering of ETH Zurich. I started on the first of March 2020, and this is one of our recent works, published at ISMB this year.

First of all, I would like to say thank you to all the collaborators, as this project is a joint project between ETH Zurich and clinical institutes: the Ann & Robert H. Lurie Children's Hospital of Chicago and the University Children's Hospital Zurich. It would not have been possible to get this work done without their full support, so I would like to say thank you here.

There are two main focuses of our project. The first one is pediatric sepsis, and I would like to give you a proper definition of what sepsis is. Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response to infection, and it can progress rapidly, leading to severe organ damage and death. On the right-hand side, you can see a fact sheet on sepsis from the World Sepsis Day initiative: there are 47 to 50 million cases per year and at least 11 million deaths per year, and one in five deaths worldwide is associated with sepsis. Sepsis is also the number one cause of death in hospitals, the number one cause of hospital readmission, and the number one cost in health care. Up to 50% of sepsis survivors suffer from long-term physical or psychological effects. What is even worse is that sepsis disproportionately affects children, as 40% of sepsis cases are children under five years old. That is why pediatric sepsis definitely warrants our additional attention.

The second focus is multiple organ dysfunction syndrome, short MODS. MODS is defined as two or more concurrent organ dysfunctions, and it is highly likely to develop from sepsis in pediatric patients. Compared to regular sepsis cases, MODS comes with even higher morbidity and mortality. In this study, we use the organ dysfunction definition from the International Pediatric Sepsis Consensus Conference, which covers six major organ systems: the respiratory, cardiovascular, central nervous, renal, hepatic and hematological systems. For those patients who exhibit MODS at sepsis onset, it is much less likely that they recover to a rather mild state like zero or single organ dysfunction. Even with treatment, and even for those who survive MODS, there may still be lifelong consequences for the patients themselves and their families. So for those patients who are already in MODS at sepsis onset, it is important to investigate their chance to recover to a better state like zero or single organ dysfunction. This is the main motivation behind our study: an early prediction of recovery from MODS in pediatric patients with sepsis. Hopefully, with this work, we can give clinicians the opportunity to take extra care and enable timely interventions.
Here we employ machine learning models for this prediction. Although there are already many studies focusing on the early recognition of sepsis, the prediction of MODS recovery in sepsis patients has never been attempted with machine learning, and the current standard of care in clinical practice is still rule-based. With this work, we hope to develop a framework that allows for personalized prognosis for pediatric patients with sepsis.

This work is mainly developed on the Swiss Pediatric Sepsis Study, short SPSS. SPSS is a retrospective, national, multi-center cohort study including children up to 17 years old with blood-culture-proven bacterial infection, and it involves 10 major pediatric hospitals in Switzerland. The SPSS database includes electronic health records at a daily level, which keep track of the most abnormal measurements within every 24 hours. These records span up to six days after blood culture sampling, which we take as the day of sepsis onset.

A recent systematic review shows that for the early prediction of sepsis, more than 85% of studies only validated their models on one single database using a cross-validation scheme. Even though their models performed quite well in their own settings, the performance was never verified on another database. For clinical research, external validation is extremely important for assessing a model's generalizability. That is why we managed to get access to another pediatric sepsis patient database, from the Ann & Robert H. Lurie Children's Hospital of Chicago, short LCHC here, for the purpose of external validation.

Let's have a proper definition of MODS recovery. We formulate this as a binary classification task: we predict the patient's outcome using the data from the first day of sepsis onset, with a time horizon of seven days. Here I show a few examples. Day zero indicates the day of sepsis onset, and we have data until day six, so that is one week, and we predict the state on day six. The blue arrows indicate the MODS state, and the red arrows indicate the MODS-free state. The first one is a recovery example: the patient was in MODS, got better after day three, and eventually, on day six, was MODS-free. The second one is a non-recovery example: the patient started with MODS, got better after day two, but eventually turned into MODS again. The third one was excluded because the patient already started MODS-free, so it is not considered in this study. One thing to note is that if the patient deceased before day six, we still consider the patient a non-recovery.

After removing invalid samples, for the SPSS data set we have 138 non-recovery samples and 118 recovery samples, which translates to a positive class prevalence of 46.1%. For the external validation set, we have 210 non-recovery samples and 181 recovery samples, which gives a very similar positive class prevalence, and the cohort sizes are of the same order of magnitude.
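To make these labeling rules concrete, here is a minimal sketch in Python. The helper `n_organ_dysfunctions` and the patient dictionary layout are hypothetical stand-ins for the cohort data, not the authors' actual code.

```python
from typing import Optional

def n_organ_dysfunctions(patient: dict, day: int) -> int:
    """Hypothetical helper: count concurrent organ dysfunctions (out of the
    six IPSCC organ systems) recorded on a given day; day 0 = sepsis onset."""
    return patient["organ_dysfunction_counts"][day]

def label_recovery(patient: dict) -> Optional[int]:
    """Return 1 (recovery), 0 (non-recovery), or None (excluded)."""
    # Only patients in MODS (two or more concurrent organ dysfunctions) at
    # sepsis onset are part of the task; MODS-free patients are excluded.
    if n_organ_dysfunctions(patient, day=0) < 2:
        return None
    # Death before the end of the one-week horizon counts as non-recovery.
    if patient.get("deceased_day") is not None and patient["deceased_day"] <= 6:
        return 0
    # Recovery = zero or single organ dysfunction on day six.
    return 1 if n_organ_dysfunctions(patient, day=6) <= 1 else 0

# Example: in MODS on day 0, MODS-free by day 6 -> recovery (label 1).
example = {"organ_dysfunction_counts": [3, 3, 2, 2, 1, 1, 0], "deceased_day": None}
print(label_recovery(example))  # 1
```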
With the help of the clinicians, we also collected relevant variables for this prediction task, including vital signs, laboratory test results, clinical scores, chronic disorder information, and demographics; there are 44 of them in total. The data are only collected on the first day of sepsis onset, so there is no risk of future information leakage.

Here are the results. For the internal validation, we implemented and compared several machine learning models: logistic regression, support vector machine, random forest, multilayer perceptron, and light gradient boosting machine. We also implemented a clinical baseline model using a decision tree based on the PELOD-2 score (the second version of the Pediatric Logistic Organ Dysfunction score), which has been verified to be a very good indicator of mortality and organ dysfunction in large pediatric patient databases. We adopted a nested cross-validation scheme that splits the data into training, validation and test sets, repeated for 10 independent rounds; model development and hyperparameter tuning were done on the training and validation sets, and we only report the performance on the test set (a small code sketch of this scheme is shown below).

On the right-hand side, you can see the internal validation results. We use two metrics: the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). All five machine learning models perform comparably well, and all of them outperform the clinical baseline model by a large margin. The winner among the five is the random forest model, with an AUROC of 79.1% and an AUPRC of 73.6%. That is why we chose the random forest as our main model for the later external validation. Note that the random forest is a highly nonlinear model that tends to overfit and may not generalize well, so external validation is particularly important for it.

Then we did the external validation; for completeness, we did it in both directions, training in one center and testing on the other. Here you see two plots. The red curves with the uncertainty bands are the internal validation results over the 10 rounds of cross-validation, and the blue curves are the external validation results, meaning the model was trained on one center and tested on the other. Basically, we can see that the generalization of this random forest model is quite good: it is successful in both directions, with an AUROC over 75% in all cases and an AUPRC over 70% at a positive class prevalence of around 46%. If we look at the high-recall region, which is of more clinical interest because most of the recovery events can be captured: at a recall of 80%, meaning four out of five recovery events are correctly predicted, we still have a precision of around 60%, meaning that out of every five positive predictions the model makes, three are correct. So this random forest model has both high predictive performance and high generalizability for this MODS recovery prediction task.
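Here is a minimal sketch of the nested cross-validation scheme mentioned above, using scikit-learn. The feature matrix `X`, labels `y`, and the hyperparameter grid are placeholders, not the study's actual data or settings, and the 10-fold outer loop is used as a close analogue of the 10 independent train/validation/test rounds described in the talk.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 44))    # stand-in for the 256 patients x 44 features
y = rng.integers(0, 2, size=256)  # stand-in for the binary recovery labels

outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
grid = {"n_estimators": [200, 500], "max_depth": [None, 5, 10]}

aurocs, auprcs = [], []
for train_idx, test_idx in outer.split(X, y):
    # Inner loop: hyperparameter tuning uses only training/validation data.
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(RandomForestClassifier(random_state=0), grid,
                          cv=inner, scoring="average_precision")
    search.fit(X[train_idx], y[train_idx])
    # Performance is reported only on the held-out outer test fold.
    prob = search.predict_proba(X[test_idx])[:, 1]
    aurocs.append(roc_auc_score(y[test_idx], prob))
    auprcs.append(average_precision_score(y[test_idx], prob))

print(f"AUROC {np.mean(aurocs):.3f}, AUPRC {np.mean(auprcs):.3f}")
```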
Later, we also analyzed whether the random forest model can predict MODS recovery earlier than one week, i.e., one to five days in advance. Here we show the results in terms of the relative AUPRC for the different time points: the relative AUPRC is the absolute AUPRC normalized by the positive class prevalence on each day (for example, an AUPRC of 0.70 at a prevalence of 0.46 gives a relative AUPRC of roughly 1.5). Using this metric, we can make a rather fair comparison between different time points.

We developed the random forest model on the SPSS data set and evaluated it on both data sets. You can see that for the SPSS data set, the performance keeps going down as we increase the time horizon, while for the LCHC external validation, the performance goes up a bit in the first two days and then drops from day three onwards. On day six, external and internal validation achieve very similar performance. This is somewhat expected, as for a shorter horizon we should have better performance. Due to the temporal limitation of the data sets, we could only do the prediction for the first week; for a longer horizon, like two or three weeks, we would probably expect somewhat lower performance. However, MODS in pediatric sepsis patients is highly dynamic, with progression or recovery usually seen within the first few days of admission, and the persistence of MODS over a whole week is already a bad sign that is highly associated with high mortality and morbidity. Since we also want a model with both high predictive performance and high generalizability, we chose day six as our main focus.

Another thing we desire is high interpretability of the model. That is why we chose to use SHAP values to explain the risk factors discovered by the random forest model. I have already lost count of how many times you have seen this, so hopefully you will not get too tired of it. On the left-hand side, you see a beeswarm plot showing the top 10 features, and on the right-hand side, the full names of these top 10 features. How do you interpret this beeswarm plot? Each dot represents one patient; a red dot indicates a higher value of the feature and a blue dot a lower value. The closer a dot is to the right-hand side of the x-axis, the more the feature is driving the model to give a positive prediction, and the closer to the left-hand side, the more it is driving the model to give a negative prediction. For example, if we look at the top feature, the lowest oxygen saturation level, there are a lot of red dots on the right, which means that for patients with a high oxygen saturation level, the model tends to assign a high chance of recovery. For the second one, lactate, we see that the red dots are located on the left-hand side of the x-axis; heightened lactate values usually indicate that organ damage is already going on, and that is why the model thinks these patients are not going to recover from MODS.

What we found interesting in this plot is that two organ systems, the cardiovascular and the respiratory system, are critical for this MODS recovery prediction, because most of these top 10 features are more or less associated with these two systems. So in clinical practice, we could probably pay more attention to patients with these two types of organ dysfunction, rather than treating them equally compared to patients with other types of organ dysfunction.
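As a rough illustration of how such a beeswarm plot can be produced, here is a minimal sketch using the `shap` library. It assumes an already fitted scikit-learn `RandomForestClassifier` called `model` and a pandas DataFrame `X` with the 44 features; this is not the authors' actual analysis code.

```python
import shap

# TreeExplainer computes exact Shapley values efficiently for tree ensembles
# such as random forests.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# For binary sklearn classifiers, shap_values may come back as one array per
# class (older shap versions); take the positive (recovery) class then.
if isinstance(shap_values, list):
    shap_values = shap_values[1]

# Beeswarm summary plot: one dot per patient and feature; color encodes the
# feature value (red = high, blue = low), and the x-axis position shows how
# strongly that feature pushes the prediction towards recovery.
shap.summary_plot(shap_values, X)
```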
Finally, I would like to summarize my presentation. In this study, we developed a machine learning framework for predicting MODS recovery in pediatric sepsis patients one week in advance. We conducted comprehensive experiments to show that the proposed model not only predicts MODS recovery with high accuracy, but also transfers across different clinical sites to unseen patient data, even in an intercontinental setup. Moreover, the predictions of the model are interpretable from a clinical point of view, which means the model we developed here could provide more insights to clinicians and be more understandable to them. We also believe that our model has a certain clinical utility, in that it could assist clinicians with better patient assessment and triage on day zero, the day of sepsis onset. We are now working on something else using the same data set: we include the genomic and proteomic profiles of these patients and try to discover novel clinical phenotypes of pediatric sepsis. By characterizing patient subgroups, we hopefully can contribute to better personalized treatment. That is all for my presentation; if you want further information on the paper, you can scan the QR code here. Thanks a lot, I'm ready to take questions.

Thank you, Bowen. Are there questions for Bowen? Tobias.

Yeah, thanks, Bowen, for the great talk. One question regarding the clinical interpretability that you mentioned at the end: how would this look? Is it these graphs that you show, or is there more behind it? How would the clinician interpret the results?

I think this would probably help to support their decisions. For example, for a patient with low oxygen saturation or a high lactate value, they should pay more attention.

But if you have a certain patient, it is most probably the case that some points will be in the blue and others in the red, right? Otherwise it would be a clear decision. So if I get it right: if I see a patient whose points are always in the red, then it is clear, and as a clinician I can also see how the system comes to its conclusion.

Yeah, we actually also provide some examples of how the model makes its prediction for individual samples; it's not shown here, but it's in the paper. It's like a score for each individual patient, which I think is much easier to interpret, not at the cohort level but at the individual level.

Did you have many more features, given that you said these are the top 10?

Yeah, we have 44 features in total.

Okay. And did you also quantify how much each feature contributes to the solution? Can you say, for example, that with these 10 features you capture pretty much, I don't know, 99%?

Oh, yeah, actually, I think we didn't quantify how much exactly the top 10 contribute to the model's prediction. That's a good point; we could probably include this in future work.

But you talked mainly about the first two, right?

I only talked about the first two here to showcase how the model takes these features into account when making its prediction.

Overall, if you look at the resulting performance of the system, you already showed that it basically doesn't matter which machine learning method you use, it's always better than the clinical baseline, right?
But do you think there's a way to push the performance much further? Because, at least at first view, there were no significant differences between the individual methods, right?

Yeah, that's right.

And the performance is below 0.8, right? So it's not super reliable. Do you think there's a way to push it further?

I think, yeah, of course, for machine learning models the primary way is always to have more data. For these pediatric sepsis patients, we only have a cohort size of less than 300, which is pretty little for such data-driven approaches. If possible, we could add more data, but that could be very difficult. Or we could do a more exhaustive search, but that could also raise overfitting issues.

Okay, thanks a lot. Leslie.

Thank you for your talk. I have a question on slide 11. You trained on the SPSS; could you explain again why, for the Chicago data, the relative AUPRC first increases and then decreases? What is the interpretation of that?

It is a little bit difficult to interpret; it's kind of a mystery why that phenomenon happens. We didn't really try to interpret it at the beginning, because it is an external validation and we could not access the data ourselves; we just passed the model on and let it run. So if anyone could give a proper guess of why this is happening...

It's an interesting artifact then. Okay, thank you.

Thanks. I have one more question. We always took for granted in this project that the sepsis had been confirmed as a bacterial infection. Is this actually known at the point in time at which we would make our prediction? It is an inclusion criterion for this data set, but it is not totally clear to me that this is determined at the time at which we make the prediction. For this bacterial infection, do we know at which time point it is typically confirmed? Is it maybe three days into the stay or something?

Yeah, the thing is that for this data set we use blood-culture-proven bacterial infection: we do the blood culture sampling, and once we find the bacteria, we mark this as the day of onset.

Exactly, and that's the inclusion criterion of the study. But I think an open point is when exactly this is determined. Is it maybe determined after we make our prediction? Because then there would be another, latent set of patients on which we could also make our predictions, but which do not fulfill this inclusion criterion.

I think that would be an extremely interesting external validation question: to test the model on patients who are at risk of developing sepsis, for example, and see whether your model predicts them to recover, because their recovery likelihood will be higher, I guess, as it is not yet certain whether they have sepsis or not. Are other data sets like that available? That is the question.

Maybe I can answer. There is now a national data stream for pediatric research in Switzerland, headed by one of our collaborators on the paper, and they are going to collect data like that, an intensive-care data set for pediatric patients. I think you could do this there.
I mean, what this data set represents is a very clean data set, in the sense that the sepsis has really been confirmed to be connected to a bacterial infection. That is not necessarily the case for all patients that are labeled as septic in a database. And then it's exactly the question of how much this generalizes if you train on high-quality but expensive-to-obtain labels.

I was just wondering if this Swiss pediatric sepsis data set is public.

So far it is not, due to privacy issues. And the external validation set is also not accessible to the public. Not even to us.

Maybe just an additional comment: what you said is extremely important when we think about the translational perspective of these modeling efforts. These models tend to be applied to more poorly defined samples, and therefore I think it is crucially important to establish this right away, to determine what the operational window of these models is. What also came to my mind: have you looked into decision curve analysis to establish the value of your model? So basically doing a net benefit analysis: how much does a clinician or a patient gain from a certain prediction, taking into account the costs and the opportunities of a prediction?

This is actually a retrospective study, so it is a bit difficult to really compare against clinical decisions. Hopefully we can do a prospective study, really put this into clinical practice, and compare against the true decisions made by the clinicians, as this clinical baseline is still only a proxy for us.

Hey, thanks a lot, Bowen, that was a great presentation. I was just wondering, because you were talking about how you had to basically hand in the model and get the predictions back, without access to the data, for this external validation: could this be a nice use case for something like federated learning or swarm learning, which we had presented in the network early on? Maybe using something like that, you could collect data from multiple sites.

Yeah, I think that is a very good point. In this case, it was extremely difficult for us to communicate anything for the model development. For the evaluation as well, if we could do this anonymously using federated learning, I think it would be very useful, also for other collaborations in the medical field. Thanks.

Thank you a lot. If I may add, this project was interesting in the sense that, for us, this was really the first example of what you could call federated learning: we sent the models to them, and they ran the models there. In all our other projects, we negotiate data access, at some point we get several data sets, we harmonize them, and then we run our analysis. This one was really decentralized: we only sent the code, and we never got the data in this project.

Good. There is time for more questions; we have one minute. Who was first? I'll start in the back.

Do you know if there is any stratification in your sample, especially regarding the kind of bacterial strain that causes the sepsis?

I didn't really get it.

Do you know if there is maybe a strain-specific effect, in terms of the features that you can get from your patients, that could have an impact on the performance of your model in the end, or help you to interpret it?

So whether we could stratify for the infection that is present.
I also thought about this, actually. Due to the very limited cohort size, we eventually did not do this subgroup analysis or any stratification of patients based on the site of infection or the pathogens. And no, we didn't use those as features, because they may not be available for the external validation.

I am coordinating the adult sepsis study in Switzerland, and we looked into this point very much. The cohorts are too small to stratify for the type of bacterial infection, for the pathogen that causes the sepsis, within a realistic time horizon; a bigger collection of data sets than just the Swiss one is needed. Juliana.

So, as Bowen said, the size of the data set did not allow us to stratify, but what we did do is look for statistical enrichments of certain strains in, for example, the false positives or false negatives, just to see whether for some strains the model was struggling more than for others. But we didn't find anything exceptional.

Thank you. And a final question by Giovanni.

Thank you for the talk. In addition to my question, apologies if I missed it, but why is the rate of sepsis so much higher in children? Is it because the immune system is not yet developed?

The immature immune system, and also more general factors.

So my question is, since somebody mentioned the model is not 100% reliable, there will be situations in which you get some predictions wrong. In a clinical setting, not all errors are made equal. In this case, would it be worse to get a false positive or a false negative? What would be the consequences of both?

I think that's a very good question. In this case, we are doing recovery prediction, not mortality prediction, so I would say it is better to give a false negative. For the negative examples, we would still pay attention to those patients, because the model thinks they will not recover; if they eventually recover anyway, that is good, and if not, we can still give them additional attention.

Thank you. Thanks a lot. Good. Thank you again, and we move on to the next speaker.