What we should do now is two things simultaneously, because you and Marijn have a project together which you started. So Marijn, can you join us here, because you have prepared a little bit of a presentation for that as well. And it answers, that's the second thing that we do, one of the questions that came up in the chat, namely: do you have anything for master students to do? Because I know this is a project that involves master students. Very smooth transition, yes. We have a shared research project in collaboration with the ETC Hospital, and its research is based on not one but two master theses from the DSS students, Hannah and Gerrit. If you're watching, thank you very much for the collaboration. So let's get into it. Basically the goal of this project was to predict patient deterioration using data science methods. Deterioration here concerns patients that are already in the hospital and that will at some point deteriorate, meaning an ICU admission, so emergency care. And of course this needs to be unplanned, because sometimes people are admitted directly to the ICU, and those we exclude. And we define this with a 24-hour prediction horizon. Now, current clinical practice is based on a score called the MEWS, the Modified Early Warning Score. This is a score that is taken ideally every 24 hours, or maybe even every nurse shift in the hospital, and it includes quantitative vital signs and clinical observations. One of the important things we found in the data set from the ETC is that there is a column called the gut feeling of the nurse, which is actually quite telling as to which patients will deteriorate or not. And there's a second contender, which is the DI model, the Deterioration Index, which is generated by the electronic health record software; the system basically assigns these values in the background, and that is a proprietary model of the EHR company. So here is an outline of the MEWS.
And you can see that there are a couple of columns here that indicate several domains where people can go off course. Green marks the expected values, yellow is slightly worse, and red is the no-go zone, basically. So we can see there is a breathing frequency, oxygen supplementation, heart frequency, systolic blood pressure, consciousness, et cetera. These scores are converted to ordinal ranks, so zero for no problems in the domain and up to three for increasing severity, basically following the ABCDE criteria of emergency care, leading to a maximum of 27 points, and that is really, really bad, of course. A traditional cutoff in the hospital is a MEWS higher than or equal to seven, which would trigger an intervention by what is called a rapid response team: some people come over and see what's what with this particular patient. Now, the data set that we worked with was provided by the data warehouse. We want to thank Inge and Hype there for working with us; they carefully shed light on all the blind spots that we had in the data and figured out what was going on. There's about 500 days of data here. The data collection period also included COVID, so there are some words about that maybe at the end. And as you can see, the majority of the patients fortunately do not deteriorate, but there is a very small minority that does. The balancing act we have to do here is to devise a model that raises the alarm only for those patients that are actually going to deteriorate, and ideally doesn't miss any of them, without leading to alarm bells going off all of the time. They call this alarm fatigue, something we learned about, and it would interfere with clinical practice, with what is basically possible to do in practice.
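The scoring logic described here, ordinal ranks per domain summed to a total, with a cutoff of seven triggering the rapid response team, can be sketched in a few lines. Note that the band thresholds below are illustrative placeholders, not the hospital's actual clinical cut-offs, and only three of the domains are shown:

```python
def score_band(value, bands):
    """Map a vital-sign value to an ordinal severity rank (0-3).

    bands: list of (low, high, points) tuples; first matching band wins.
    A value outside every defined band gets the worst rank, 3.
    """
    for low, high, points in bands:
        if low <= value < high:
            return points
    return 3

# Hypothetical bands per domain, for demonstration only: (low, high, points)
RESP_RATE_BANDS = [(9, 15, 0), (15, 21, 1), (21, 30, 2)]
HEART_RATE_BANDS = [(51, 101, 0), (101, 111, 1), (41, 51, 1), (111, 130, 2)]
SYS_BP_BANDS = [(101, 200, 0), (81, 101, 1), (71, 81, 2)]

def mews(resp_rate, heart_rate, sys_bp):
    """Sum the per-domain ranks and flag the traditional cutoff (>= 7)."""
    total = (score_band(resp_rate, RESP_RATE_BANDS)
             + score_band(heart_rate, HEART_RATE_BANDS)
             + score_band(sys_bp, SYS_BP_BANDS))
    return total, total >= 7  # True would trigger the rapid response team
```

A patient with all vitals in the green bands scores zero and raises no alarm; a patient far outside every band accumulates threes per domain and crosses the cutoff.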
So you can see here there were some exclusion criteria, including having COVID, so this model is fit only on non-COVID deteriorations, and a couple of others. There were two approaches, a machine learning based approach and a deep learning based approach. Gerrit was responsible for the machine learning approach, which focused on explainability and used a random forest as its main method. We can see here a list of features that were selected by a couple of different feature selection methods, and you can see that the MEWS components, so the traditional values, are actually all in there, I guess all of them. So they are of course very relevant to this prediction, and that came out as well. The results of this random forest analysis gave, in the best case, a model with an F1 score of about 27%, with a recall of 24%, so that is not too good actually, and a precision of about 30%. Interesting here is that the gut feeling indication alone, so just the nurse's evaluation, would already outperform the traditional MEWS or DI score. I think this is where I hand over to you. Yes. Yeah, so the second student, Hannah, focused on deep learning, neural network methods, and aimed to compare how they would perform using both static and dynamic data; in other words, would collecting dynamic data lead to improvements in performance? So one research question was: does it make sense to take into account, let's say, the history of the patient, so as much data as possible, from something approximating a time series point of view? And the second, an interesting sub-question, was: how does the class imbalance affect the performance, and should we do something about it in the training stage? So let me first focus on the data itself. The static model is fairly similar to what was just described.
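As a quick sanity check (this is not from the theses themselves, just the standard definition), the reported precision of about 30% and recall of about 24% are consistent with an F1 of about 27%, since F1 is the harmonic mean of the two:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# With the figures quoted in the talk:
f1 = f1_score(precision=0.30, recall=0.24)  # about 0.267, i.e. ~27%
```

The harmonic mean always sits at or below the lower of the two inputs, which is why a modest recall caps the F1 even with better precision.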
We collect data on a particular day and we try to estimate the risk of deterioration of that patient in the next 24 hours. The dynamic model was trained on a different input data set, namely a collection of the same measurements for the same patient, but three days in a row. So here I should already note that the results I will present are not directly comparable to the results that Marijn just presented, because of course the test set was different and the training set was different: here we only worked with those patients that were in the hospital for at least three days. As you can see here, we also have a very small event rate, but the event rate for the dynamic data set is now one percent rather than the 0.03 percent that Marijn presented. So patients who stay in hospital for three days apparently have a higher chance of deteriorating, which probably makes sense as well. Hannah tested various levels of, let's say, upsampling the training set, so that the training set included different event rates, and we will see in a few slides' time how they all performed. First, the more general results: we can see that the static MLP method already produced decent results, in terms of recall higher than MEWS, in terms of precision a little less than MEWS, but in terms of the F1 score slightly higher than MEWS. An interesting result for us is that Hannah's experiments also validated that MEWS outperforms DI, exactly as was the case in Gerrit's thesis. Like I said, there is no direct comparison possible between Hannah's results and Gerrit's results, but we've included them here as a reminder; the random forest actually performed very similarly to the MLP.
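The two data-preparation steps described here, building three-day windows for the dynamic model and upsampling the positive class to different event rates, could be sketched as follows. This is a hedged illustration under our own assumptions: the helper names, the per-day feature vectors, and the windowing convention are hypothetical, not taken from Hannah's actual pipeline:

```python
import random

def make_windows(daily_records, labels, window=3):
    """Build dynamic-model samples for one patient: `window` consecutive
    days of measurements, labelled by whether the patient deteriorates
    in the 24 hours after the last day of the window.

    daily_records: list of per-day feature vectors for one patient.
    labels: per-day deterioration-within-24h flags (0/1).
    Patients with fewer than `window` days yield no samples, which is
    why the dynamic cohort (and its event rate) differs from the static one.
    """
    samples = []
    for end in range(window - 1, len(daily_records)):
        x = daily_records[end - window + 1 : end + 1]
        samples.append((x, labels[end]))
    return samples

def upsample_minority(samples, target_rate, seed=0):
    """Duplicate positive samples until they make up target_rate of the
    training set -- one of several rates that could be compared."""
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    if not pos:
        return samples
    # Solve n_pos / (n_pos + len(neg)) = target_rate for n_pos:
    n_pos = int(target_rate * len(neg) / (1 - target_rate))
    boosted = pos + [rng.choice(pos) for _ in range(max(0, n_pos - len(pos)))]
    return boosted + neg

# Tiny demonstration: one patient, five days, deterioration on day four.
samples = make_windows([[1], [2], [3], [4], [5]], [0, 0, 0, 1, 0])
balanced = upsample_minority(samples, target_rate=0.5)
```

Only upsampling the training set, never the test set, keeps the evaluation honest with respect to the true event rate.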
What we can see, and what's very interesting to us, is that the RNN, so the dynamic model, performed very much better than the MLP, the random forest, MEWS or DI, but like I said, what should be taken into account is that the test set of course was different; still, the difference is quite stark. So to summarize the main findings: both types of deep learning models outperformed MEWS and DI, with the MLP very similar to Gerrit's results with the random forest, and a main conclusion would be that using dynamic data, if available of course, is better than using only static data. So if you have a patient history, concretely, it's better to use it than to just focus on the last measurements that you have. And then, concluding, we can say that in this case, when the data is available, the sequential models do outperform the static models, and hospitals can use this model to fine-tune their alarm rate, trying to maximize the recall for a given acceptable number of alarms, basically. But there are some open issues: we don't know too much about the calibration of relative risk, as we've just classified directly here. As mentioned earlier, validation on a different cohort is very much required, either a different hospital or a different sampling period of patients. A side finding was that this model, trained on non-COVID patients, performed much worse on COVID patients, because they have different vital signs related to deterioration and non-deterioration: a much lower saturation, for example, would be considered normal for a COVID patient and really not normal for a non-COVID patient. But we think and hope that this model might rather be used in a way to triage patients, so to indicate to the nurse what the ordering of relative risk is between patients, basically saying which patient you should see next with the available time and resources that you have.
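The two uses mentioned here, tuning the alarm rate to maximize recall within an acceptable alarm budget, and ordering patients by relative risk for triage, can both be sketched from a model's predicted risk scores. This is a minimal illustration under our own assumptions, not the hospital's actual tuning procedure:

```python
def pick_threshold(scores, labels, max_alarm_rate):
    """Choose the decision threshold that maximizes recall while keeping
    the fraction of raised alarms at or below max_alarm_rate.

    scores: predicted deterioration risk per patient-day.
    labels: true 0/1 deterioration outcomes.
    Returns (threshold, recall, alarm_rate) or None if no threshold fits.
    """
    best = None
    for t in sorted(set(scores), reverse=True):  # strictest threshold first
        alarms = [s >= t for s in scores]
        alarm_rate = sum(alarms) / len(scores)
        if alarm_rate > max_alarm_rate:
            break  # lowering the threshold further only adds more alarms
        positives = sum(labels)
        recall = (sum(a and y for a, y in zip(alarms, labels)) / positives
                  if positives else 0.0)
        best = (t, recall, alarm_rate)  # lowest feasible t = highest recall
    return best

def triage_order(patient_scores):
    """Order (patient, risk) pairs by descending risk: who to see next."""
    return sorted(patient_scores, key=lambda p: p[1], reverse=True)

# Tiny demonstration with five patient-days and a 40% alarm budget:
scores = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0]
threshold, recall, alarm_rate = pick_threshold(scores, labels,
                                               max_alarm_rate=0.4)
```

The triage view sidesteps the calibration issue raised above: even if the absolute risk estimates are poorly calibrated, a correct relative ordering is still useful to the nurse.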
And in that sense, basically, try to optimize the care as well. Well, that was quite interesting. There are a number of open issues that you mentioned. Are you continuing this? Do you need more master students for this? Yes, so all the master students watching, please join our project. There is an interesting upcoming data set that will hopefully use continuous monitoring of patient data, so every 15 minutes, with some smart patches, and we would really hope to be able to use that very rich data set to do more sophisticated time series models. Yeah, you mentioned also there's a difference between COVID and non-COVID, so you might be among the people who are happy with the COVID crisis because it gives you a lot of data, but probably it's not just COVID versus non-COVID, there are probably other categories as well. Do you have any insight into that? Some other students, outside of these two theses, have done some work with us on checking whether there are differences from one department to another at the hospital, and we've seen indeed that there are definitely different trends in different departments, but we haven't been able to quantify anything yet. It's ongoing work. Someone here asks: could bachelor students also work on this? Presumably yes. Presumably yes, yeah, so there might be projects there as well, and it might be nice preparation for a master project a bit later.