Hi everyone, it's an honor to be here today. First, I would like to thank the committee for selecting our paper as the best bioinformatics graduate paper this year. As mentioned before, this is a joint work of many people; I am only one of the five co-first authors and will present on behalf of all of us. Our paper is called "Early prediction of circulatory failure in the intensive care unit using machine learning", and it is published in Nature Medicine. It is a joint effort between ETH and the University Hospital in Bern, known as the Inselspital in German, so all the German speakers in the audience, please forgive my pronunciation. Two medical doctors from the University Hospital in Bern and many people from the Biomedical Informatics group and the Machine Learning and Computational Biology lab at ETH worked on it. I would also like to take this chance to thank everyone involved for their hard work and contributions, which made this paper possible.

Without further ado, let me share more details about the project. Here is a picture taken in an intensive care unit, also known as an ICU. Patients who end up in the ICU are critically ill and need to be closely monitored by many medical devices and closely observed by the clinicians. Each medical device measures a physiological signal such as heart rate, blood pressure or body temperature. The clinicians also take notes on what they observe and, if necessary, send the patient's blood samples, urine samples and so on for lab analysis. These patients are also given drugs to improve their physiological state. So over time, even for a single patient, we accumulate a large collection of time-series data: physiological variables, lab tests, observation notes and pharmaceutical variables.
If we left this large amount of data to human analysis alone, it would certainly overwhelm the clinicians, especially since they need to take care of several ICU patients at the same time. On the other hand, such a wealth of data has facilitated data-driven machine learning research on ICU-related clinical problems. However, a lot of the recent work has focused on predicting mortality or length of stay in the ICU, which is of little use in helping clinicians make further treatment decisions. Predicting organ system failure, in contrast, provides clinicians with more helpful information for their decision making, and organ system failures are not a rarity among ICU patients. There are organ system failures of different kinds: circulatory failure, which is associated with the heart; respiratory failure, which is associated with the lungs; neurological failure, renal failure, and so on. In our project, we focus on circulatory failure, and we have developed an early warning system for it. So yes, we not only want to predict it, we want to predict it early. Another aspect that motivated us to build such a system is to alleviate the alarm fatigue that results from simple abnormality detection by the medical devices. We call our system circEWS, short for circulatory Early Warning System.

Here is a diagram of the circEWS framework. It has multiple components, and I will explain them one by one. The data we use is provided by the University Hospital in Bern. It contains electronic health records from 2005 to 2016: about 50,000 patients, more than 7,000 variables and about three billion measurements. So it is really big data, and the highest measurement frequency is every two minutes. However, not all 7,000 variables are relevant to circulatory failure, and most of the important ones are only available after 2008, due to a medical system update in the hospital. So we applied a few filtering criteria.
In the end, we keep only the electronic health records from patients and variables that satisfy the criteria listed in the gray box here. After patient and variable filtering, we implemented variable-specific processing. One part of this is removing the different kinds of artifacts that we found in the data, including strange signals caused by medical devices accidentally detaching from the patient, data and information entered incorrectly by the clinicians, duplicated records and many others. Another specific processing step is merging drugs given in different forms, such as injection, infusion or tablet, into a unified variable. The last pre-processing step is variable merging, where we merge variables with similar medical concepts into a meta-variable, which generalizes better across different hospital ICUs; this also reduces the number of variables down to 209 meta-variables.

Data pre-processing is an important step towards better machine learning. It is often underestimated, but we spent a great amount of time and effort on developing a data pre-processing framework that takes raw ICU data, cleans it up and transforms it into a machine-learning-usable format. We also published the curated ICU data on PhysioNet for academic research use, and we call it HiRID. It has a higher time resolution than the existing public ICU datasets such as MIMIC and eICU.

After pre-processing, we can extract features and labels from the data. Since we are interested in predicting circulatory failure, we need to know what counts as circulatory failure. Based on their medical knowledge, the clinicians in the project provided us with a definition of circulatory failure events, and with that we annotate each time point of a patient's stay with whether the patient is experiencing circulatory failure or not. However, we don't want to predict what is happening now; we want to predict circulatory failure in advance.
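To make the variable-specific pre-processing concrete, here is a minimal sketch of two of the steps just described: artifact removal via plausibility ranges and merging of drug variables given in different forms. All variable names, ranges and values here are made up for illustration; they are not the actual HiRID processing rules.

```python
# Sketch of two variable-specific pre-processing steps (illustrative only).
# Raw measurements: (timestamp_minutes, variable_id, value)
raw = [
    (0, "hr",            72.0),
    (2, "hr",            300.0),   # artifact, e.g. device detachment
    (4, "hr",            75.0),
    (0, "noradr_inject", 0.5),     # same drug given in different forms
    (2, "noradr_infuse", 0.3),
]

# Step 1: remove artifacts using variable-specific plausibility ranges
PLAUSIBLE = {"hr": (20.0, 250.0)}  # hypothetical range for heart rate

def is_plausible(var, value):
    lo, hi = PLAUSIBLE.get(var, (float("-inf"), float("inf")))
    return lo <= value <= hi

clean = [(t, v, x) for (t, v, x) in raw if is_plausible(v, x)]

# Step 2: merge drug variables given in different forms into one meta-variable
MERGE_MAP = {"noradr_inject": "noradrenaline", "noradr_infuse": "noradrenaline"}
merged = [(t, MERGE_MAP.get(v, v), x) for (t, v, x) in clean]
```

The real pipeline applies many more rules than this, but the pattern is the same: per-variable cleaning first, then semantic merging into meta-variables.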
So the final label we actually use is whether the patient will have circulatory failure in the next X hours, and based on the experience of the clinicians we chose X to be eight, which is also how long a shift lasts in the ICU. Now that we have the labels, we can also extract the features. First we project all time series onto a five-minute time grid, and we impute the values at time points where the actual measurement is missing, using a data imputation scheme that I will not go into detail on here because of the time constraint. After that, we extract four types of features from the dynamic variables: shapelets, which are time-series motifs; instability history; multi-resolution summaries; and measurement intensity. Besides those features, we also have static information about the patient, such as age, height, gender and so on. In the end, each patient can be represented as a time series of feature vectors of dimension about 5,000.

For machine learning, we looked at four different models. One is a GBM, a gradient-boosted tree model. Another is the LSTM, a type of recurrent neural network that is suitable for modeling time series. We also looked at a plain decision tree and logistic regression. We found that the GBM performed best among these four models, and that is also what we use in the end in circEWS.

Before I move on to the result slides, I want to explain two important metrics that we look at in the project: alarm precision and event recall. The clinicians can select a threshold, and our system will raise an alarm whenever the score passes that threshold. An alarm is considered true when there is a circulatory failure event within the next eight hours, and false otherwise.
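The labeling step just described can be sketched in a few lines: given per-time-point failure annotations on the five-minute grid, the label at each step says whether a failure occurs within the next eight hours. The toy annotation sequence below is invented for illustration; it is not the paper's actual labeling code.

```python
# Sketch: per-time-point annotations -> early-warning labels.
# Time is on a 5-minute grid; an 8-hour horizon is 96 grid steps.
GRID_MIN = 5
HORIZON_STEPS = (8 * 60) // GRID_MIN  # 96

# in_failure[t]: is the patient in circulatory failure at grid step t? (toy data)
in_failure = [0] * 200
in_failure[150:160] = [1] * 10  # one failure episode

def early_warning_labels(in_failure, horizon=HORIZON_STEPS):
    """Label step t positive if any failure occurs in the next `horizon` steps."""
    labels = []
    for t in range(len(in_failure)):
        window = in_failure[t + 1 : t + 1 + horizon]
        labels.append(1 if any(window) else 0)
    return labels

labels = early_warning_labels(in_failure)
```

Note that every grid step in the eight hours before the episode becomes a positive example, which is exactly what lets the model learn to fire early rather than at onset.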
A circulatory failure event is considered captured by an alarm if there is at least one alarm within the eight-hour window before it, and it is considered missed if there is no alarm at all in that window. We also apply a silencing strategy to reduce the number of false alarms.

Another part of the experimental setup that we consider important, but that is often ignored in a lot of studies, is temporal splitting, where the test data and validation data are collected in later years than the training data. The logic behind it is that we want to evaluate a model trained on older data on data collected in the future, which is a more realistic setting than just random splitting. We apply this logic to the five temporal splits we use for hyperparameter selection and to the overall split used for the final assessment.

Now we move on to the results, but first I want to say that we developed two types of circEWS models: one we call circEWS, and the other circEWS-lite. circEWS-lite is a light version that only uses the 20 most important variables, which we found through feature analysis. We can also visualize the feature importance, as shown in the plot on the right, for interpretability, and this is very helpful for the clinicians to understand what is going on in the prediction. circEWS itself also does not use all 209 meta-variables; it uses the 112 most important variables, derived from the top 500 features. We compare both circEWS and circEWS-lite with a baseline that tries to mimic how clinicians would predict circulatory failure in advance in the clinic. The baseline is just a plain decision tree based on the variables included in the circulatory failure definition, namely mean arterial pressure, lactate and the vasopressors. From the precision-recall curves shown here, we can see that both circEWS and circEWS-lite outperform the baseline by quite a lot: at an event recall of 90%, we achieve an alarm precision of about 30%.
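The two event-based metrics I just defined can be written down directly. This is a toy sketch with invented alarm and event times, and it leaves out the silencing strategy and other details of the real evaluation; it only shows the windowed matching between alarms and events.

```python
# Sketch of alarm precision and event recall on a 5-minute grid (toy data).
GRID_MIN = 5
WINDOW = (8 * 60) // GRID_MIN  # 8-hour window = 96 grid steps

alarms = [40, 100, 300]  # grid steps where the system raised an alarm
events = [120, 500]      # grid steps where a circulatory failure event begins

def is_true_alarm(a):
    """An alarm is true if an event starts within the next 8 hours."""
    return any(a < e <= a + WINDOW for e in events)

def is_captured(e):
    """An event is captured if at least one alarm fired in the 8 hours before it."""
    return any(e - WINDOW <= a < e for a in alarms)

alarm_precision = sum(is_true_alarm(a) for a in alarms) / len(alarms)
event_recall = sum(is_captured(e) for e in events) / len(events)
```

Here the alarm at step 300 matches no event (false alarm), and the event at step 500 has no alarm in the preceding window (missed event), so precision and recall are both below one.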
What we also see here is that circEWS-lite achieves almost as high performance as circEWS while using only a very small subset of the variables. The advantage is that circEWS-lite is more transferable across different hospitals, and it requires lower computational cost as well, because it uses fewer variables.

Next we want to evaluate how many alarms per hour circEWS-lite generates, compared to a baseline method that mimics how alarms are usually generated in a real ICU setting. This baseline simply detects any abnormal value in some variables and raises an alarm whenever it detects an abnormality. For a fair comparison we apply the same silencing strategies, and we also let both methods use the same number of variables, so the any-abnormal-value baseline also uses the 20 most important variables. The baseline method has an event recall of 95.7%, while the standard circEWS-lite that we look at has a recall of only about 90%. So we adjusted the threshold of circEWS-lite a little, so that its recall also matches the any-abnormal-value baseline; in the middle you can see this adjusted circEWS-lite version. If you compare both circEWS-lite variants, at their different thresholds, with the baseline method, you see that the number of alarms generated every hour is significantly reduced. So this means the circEWS system can actually help alleviate alarm fatigue to some extent.

Next we want to see how early the alarms come prior to circulatory failure, because if all alarms were generated just a few minutes before circulatory failure, the system would still not be very useful. What we see here is that for standard circEWS-lite with 90% recall, we can still achieve 81.8% recall even two hours prior to circulatory failure. From the distribution of first alarm times prior to the event, we also see that on average circEWS can raise the first alarm about 2.5 hours prior to the actual circulatory failure.
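The threshold adjustment used for the fair comparison above can be sketched as a simple scan: pick the highest threshold whose recall still reaches the target, since a higher threshold means fewer alarms at the same recall. The scores and labels below are toy values, and recall is computed per time point here for simplicity rather than per event as in the actual evaluation.

```python
# Sketch: choose an alarm threshold that matches a target recall (toy data).
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.7]  # model risk scores
labels = [0,   0,   1,    1,   1,    1,   0,   1  ]  # 1 = failure ahead

def recall_at(threshold):
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return tp / sum(labels)

def threshold_for_recall(target):
    # Scan candidate thresholds from high to low; return the highest one
    # whose recall reaches the target (fewest alarms for that recall).
    for t in sorted(set(scores), reverse=True):
        if recall_at(t) >= target:
            return t
    return min(scores)

t = threshold_for_recall(0.8)
```

This is the same knob the clinicians would turn in practice: lowering the threshold trades more alarms for higher recall, and here we set it just low enough to match the baseline's recall.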
Our aim is that the system can help clinicians become aware of circulatory failure much earlier in advance; however, this still needs to be tested in real clinical settings.

Lastly, we want to analyze how transferable our model is to different ICU datasets, because we don't just want to build a system that is specific to one ICU; we want it to be applicable to a wide range of ICUs, not only in Switzerland but also in other countries. So we took a public ICU dataset called MIMIC, which was collected in the US. One of the big differences between HiRID and MIMIC is that MIMIC has a much lower time resolution. We compared the performance of circEWS-lite on both the MIMIC dataset and the HiRID dataset. These two plots may be a little hard to digest in a very short time, but the main message is that circEWS-lite fine-tuned on MIMIC performs better than a circEWS-lite entirely retrained on MIMIC, because the fine-tuned version has incorporated knowledge from both ICU datasets. We also discovered that the performance drop when going from HiRID to MIMIC is mainly due to the coarser time resolution of the MIMIC data.

To summarize, we developed a state-of-the-art machine learning system for predicting circulatory failure, and we also published the HiRID data for academic research use. The system's performance is good enough to be of clinical utility, because it can raise an alarm as early as about 2.5 hours prior to circulatory failure on average, and it is transferable to different hospitals and ICUs, even worldwide. The missing piece is that we still need to validate this model in real clinical settings, and that is challenging. What we are also currently doing is building similar models for other organ system failures, such as renal and respiratory failure. So, some final words.
This was a very complex project, in the sense that it is interdisciplinary, it uses data of a very large scale and complexity, and it involved many people. But it was an exciting journey for me to work with so many excellent people. I also got the opportunity to work closely with medical doctors and get their perspective on the analysis of the machine learning performance, and such an opportunity is not very common in a lot of ML-in-healthcare research. I am also happy to see that the system we developed has the potential to be deployed and hopefully make a positive impact. With that, thank you for your attention, and I am open to any questions.