The first speaker today is Florian Lipsmeyer. He studied bioinformatics and biomathematics at Bielefeld University and afterwards joined Roche as a senior scientist, becoming a principal data scientist in 2016. Since August 2018 he has been the Digital Biomarker Data Analysis Lead in pRED Informatics at Roche in Basel, Switzerland, so he is located in the same city as our department. Really happy to have you here, Florian, thank you very much for coming. Florian is going to talk about the various roles of machine learning in the development and use of novel digital health technologies. With that, Florian, the floor is yours.

The roles of machine learning around novel digital health technologies, or digital biomarkers. I want to quickly introduce what I mean by that, and also why we think digital health technologies and digital biomarkers will play, or are already starting to play, an important role in disease assessment, especially in a clinical study context, but also beyond.

For that, I would like all of you to think about your daily life and try to remember how you felt last week, a few days before that, and a few days before that. Now imagine you are someone who has a disease. What you can already see is that there are differences between days, and that is what all our measurements are about. You are living, as I am, 365 days a year, and depending on your disease you might have good days, bad days, and moderate days; your disease fluctuates from day to day in one way or another. Plus, depending on which disease you have, it might also progress over time.

In daily life, and especially in a clinical study context, what happens is of course that you go to a clinician, a physician, to be assessed: how is your disease developing, are you getting healthier or more ill? At this visit in a clinic you can do different types of assessments to get more information about the disease. That could be simple blood measurements, MRIs, or other things. But often, especially in neurological diseases, you will do assessments where different disease symptoms are evaluated by you performing cognition tests or different types of motor tests. The physician observes how well you do these things, and these motor tests are then rated on a severity scale, often from zero to four, for example. Or, vice versa, you are asked what kinds of problems you have in daily life: how difficult is it for you to do your housework, or just to drink, to get around, to meet friends, again on a scale of zero to four. These item scores are then summed up to a total score, and that is a proxy for your total disease severity.

As you can see from this picture, what you get there are really only spot checks, because the disease is variable, and it is a chance event whether you have a good, moderate, or bad symptom day when you visit the clinician. Plus, if you are asked how you have been feeling, as you maybe just noticed for yourself, it is really difficult to remember beyond a certain time horizon. There is a recall period, which again means that everything you can tell your physician is just another spot check of how you really feel over a longer course of time.
In a clinical study context, this adds inherent variability to all of these measures. And what you do not want in a clinical study is interpersonal variability in how disease is assessed, because it means you need more patients, or longer time horizons, to see whether a drug is working. You might have one arm where patients are not getting the drug and another arm where patients are getting it, and over time you try to observe whether one arm develops differently from the other. If the inherent variability is too big, you need to compensate by increasing sample size or study duration.

That brings us to digital health technologies, or digital biomarkers, and their mission: how can they augment what is currently done in the clinical study context? First of all, what do we consider digital biomarkers? Essentially, readouts from digital devices, often even consumer-grade devices like the ones depicted here: a smartphone or a smartwatch, for example, but they can also be more dedicated devices, such as ones measuring sleep. What all of these have in common is that they carry digital sensors producing continuous data streams. You might or might not know this, but your smartphone is continuously producing gyroscope and acceleration data, which is used, for example, to detect that you have turned your phone so it can switch between portrait and landscape mode for your photos or games. That is the type of data you see in the small visualization on the lower left, recorded while I am just walking down a corridor. From this type of data, your usual smartphone already gives you things like your daily step count or the distance walked. But you can get much more out of it; that is what this whole talk is about, and why ML and AI are, I would say, the method space necessary to really unlock this type of data.

Just to recapitulate why we currently have a challenge and why digital health technologies could offer a solution. First, frequency: as I said, symptoms fluctuate, and you simply cannot go to a clinic or to your physician every day to be assessed. Second, the precision problem: you have these low-resolution scales; if you rate something on a scale of 0 to 4, that is inherently low resolution. Then there is the question of accuracy. There is subjectivity in there: you as a patient might have a very different view on your disease than another patient, and a physician might have slight biases in how they rate certain things, and the physician also has good and bad days. Then there is the reliability component: different raters, as I said, consistently rate disease symptoms in different ways, so you might have these kinds of biases in there. Plus, if you go to a physician or into a hospital, you might notice that you yourself are often influenced by that setting: you are not your daily self in your home environment. This is about what we call ecological validity: how well does what you measure in the clinic reflect your performance in daily life, which in the end is what we all care about.
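To make the idea of "continuous sensor streams turned into readouts" concrete, here is a minimal, hypothetical sketch of deriving a simple daily readout (a step count) from a raw 3-axis accelerometer stream. This is not the production pipeline; the sampling rate, filter band, and peak thresholds are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def step_count(acc_xyz: np.ndarray, fs: float = 50.0) -> int:
    """Estimate steps from a raw 3-axis accelerometer stream (rows = samples).

    Illustrative only: real pipelines tune filters and thresholds per device.
    """
    # Magnitude removes dependence on phone orientation; subtract gravity (~9.81 m/s^2).
    mag = np.linalg.norm(acc_xyz, axis=1) - 9.81
    # Band-pass around typical walking cadence (roughly 0.5-3 Hz).
    b, a = butter(2, [0.5, 3.0], btype="band", fs=fs)
    gait = filtfilt(b, a, mag)
    # Each step shows up as a peak; enforce a minimum step interval of 0.3 s.
    peaks, _ = find_peaks(gait, height=0.5, distance=int(0.3 * fs))
    return len(peaks)

# Example: 60 s of synthetic 2 Hz "walking" plus noise.
fs = 50.0
t = np.arange(0, 60, 1 / fs)
acc = np.column_stack([np.zeros_like(t), np.zeros_like(t),
                       9.81 + 2.0 * np.sin(2 * np.pi * 2.0 * t)])
print(step_count(acc + 0.1 * np.random.randn(*acc.shape), fs))  # about 120 steps
```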
With different types of digital solutions you can try to overcome all of that: you increase the frequency by measuring daily, or even continuously; you get greater precision because you have continuous data streams that you can turn into continuous outputs; with these data streams you also have the possibility of better accuracy; and by deploying them consistently, with the same algorithms, you should get better reliability. Plus, by measuring at home you also get better ecological validity.

So what do we do about all of this, and how is this field developing? I think it is best shown by how we have developed over the last six or seven years. We started with a first proof-of-concept smartphone application, tested in a phase 1b study, where we deployed some simple measurements on a smartphone in a clinical study with Parkinson's disease patients and asked them to do certain tests. That worked very well, and from there we started to develop more of these solutions in different diseases. Since then we have collected data from more than 5,000 subjects at many different sites in many different countries, and as of roughly half a year ago all of this data adds up to more than 500 terabytes. That does not even include a special type of study we are also running, a bring-your-own-device study called Floodlight Open, where a few thousand additional participants around the world have simply downloaded the app and are collecting data for themselves, or in other study setups that have nothing to do with what we are doing. So this really is a way to collect additional data, and it is a lot of data, and you need to make sense of it.

To illustrate the different settings: we currently have solutions deployed with smartphones and smartwatches, but also with beacons and sleep mats (that is ballistocardiography), in Parkinson's disease, Huntington's disease, and autism spectrum disorder. We ask patients to do certain tests, but we also passively monitor their daily life: they carry the smartwatch or smartphone around, we collect sensor data, and we then look into their gait and turning patterns, their mobility, or their sleep, for example. With this type of data we really hope to get into the details of what can be measured about the disease, and I will give some examples on the next slides, as illustration, before we really go into the machine learning part.

One example is the hand turning test. We ask patients, every second day, to turn their hand as fast as possible a few times, which makes the smartphone rotate strongly, and you get a signal like this. Depending on how affected you are by Parkinson's disease, for example, you may have what is called bradykinesia, which makes you slower, reduces the amplitude of your movements, and also leads to hesitations in these movements. You see all of that here in subject B: lower amplitude, slower movement, and also clear signs of hesitation.
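As a sketch of what such a hand turning readout could look like computationally: a minimal, hypothetical example extracting turning speed and amplitude (the quantities bradykinesia reduces) from the rotation axis of a gyroscope. Function and feature names are mine, not the study's.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

def hand_turn_features(gyro_z: np.ndarray, fs: float = 50.0) -> dict:
    """Features from the wrist-rotation axis of a gyroscope during hand turning.

    gyro_z: angular velocity (rad/s) around the forearm axis. Illustrative only.
    """
    # Rotation angle by integrating angular velocity over time.
    angle = cumulative_trapezoid(gyro_z, dx=1 / fs, initial=0.0)
    # Split into half-turns at zero crossings of the angular velocity.
    crossings = np.where(np.diff(np.sign(gyro_z)) != 0)[0]
    segments = np.split(angle, crossings) if len(crossings) else [angle]
    amplitudes = [seg.max() - seg.min() for seg in segments if len(seg) > 2]
    return {
        "max_turn_speed": float(np.max(np.abs(gyro_z))),    # bradykinesia: lower
        "mean_turn_amplitude": float(np.mean(amplitudes)),  # bradykinesia: smaller
        "n_half_turns": len(amplitudes),
    }
```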
That was something very obvious. The next example shows why it is important to measure often, at home. On the x-axis you see the assessment of something similar in the clinic, and you see that there are a lot of patients who are rated by the physician, at that spot check in the clinic, as having no problems with hand movement. On the y-axis you see what we measure as maximal hand turning speed in daily life. Generally there is a nice correlation, but you also see outliers that are clearly behaving differently from what the physician has seen. This subject sits more in the low-severity group than in the high-severity group, and if you look into that with a video, you see that subject B actually shows maybe a little less speed and somewhat less amplitude, but not the hesitation you see in subject C, for example. Meaning this patient really behaves differently in daily life, and if you measure that often and confirm the finding, you can be fairly sure you are seeing a different disease picture than the physician saw. This is what I meant by inherent variability, and by ecological validity on the home side.

Here is another example from a different test, where we simply ask the patient to hold the smartphone still in the hand. These are Huntington's disease patients, who have involuntary movements called chorea, and then you can see something like this happening: a healthy control shows relatively little movement, but Huntington's disease patients can show quite a lot of movement, often starting with very subtle movements. That is where the higher sensitivity comes in, which you as a physician might have trouble seeing. What we could show is that, again, there is a lot of variability, but this test is generally well suited to assess the same type of symptom as assessed by a physician: with one single feature we already see quite a nice correlation, and quite similar results across the different studies where we have deployed this device. Still, this is all a single feature, so for the context of this presentation maybe a little less compelling. Just one last point to convince you why this can be important.
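The comparison of a clinic spot check against many at-home measurements can be sketched in a few lines: aggregate the repeated noisy remote measurements per subject (here with a median) before correlating with the single clinic rating. The data below are simulated stand-ins, and the aggregation choice is an illustrative assumption.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
subjects = [f"S{i}" for i in range(20)]
# One clinic severity item (0-4) per subject from the spot-check visit.
clinic = pd.Series({s: rng.integers(0, 5) for s in subjects}, name="clinic_item")
# 30 noisy at-home measurements per subject; speed decreases with severity.
daily = pd.DataFrame([
    {"subject": s, "max_turn_speed": 10 - 1.5 * clinic[s] + rng.normal(0, 2)}
    for s in subjects for _ in range(30)
])

# Aggregating many noisy at-home tests stabilizes the readout before
# comparing it with the single clinic spot check.
agg = daily.groupby("subject")["max_turn_speed"].median()
rho, p = spearmanr(agg, clinic.loc[agg.index])
print(f"Spearman rho = {rho:.2f} (p = {p:.4f})")
```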
This last example is again a single feature, from a U-turn test where we ask the patient at home to do a few U-turns, and in this example we measure the turning speed. This is from a multiple sclerosis patient in a study. Multiple sclerosis is a disease where you can have relapse events, meaning you can have certain spikes of disease where you suddenly get worse, and afterwards you may recover again to a certain extent. What you see here is this daily measurement: for about half of the observation period the patient is doing fine for their level of performance, and then suddenly something changes. In this study we luckily also asked the patients to report when they feel they are having a relapse event, and you see that right when they start to report it, we also see a clear, distinct pattern in this test. You can of course imagine that with more data you could home in on that much more precisely, do it automatically, and maybe detect it even earlier than it is currently reported; the patient really is performing worse. That, at least, is important to keep in mind for the rest of the talk.

So what do we want to do with these measurements? We want to develop new outcomes, new scores, that can be used in different ways: to assess drug efficacy, to track disease progression, or maybe also, in other contexts, to distinguish different disease populations for stratification purposes. There are basically two ways to do that. One is the completely data-driven way, which will probably resonate well with you as an audience: you use the data and try to be maximally sensitive, in various ways, so that you can measure disease and disease progression very early on. The other way, which we will not focus on much in this presentation, is the patient-driven way. Rather than selecting features from the data or doing a lot of fancy AI, you go to the patient and ask: what is really relevant for you in daily life, what is really your problem? Then you develop measures from the sensor data to quantify exactly that, try to maximize that, and get a more patient-centric view and score. These are not the same thing, because not everything you are able to measure is directly relevant to a patient in daily life: a patient might not care about a given problem, or might not even notice it, and yet you might be able to measure something very sensitive with it. Just to keep that distinction in mind.

All right, let us turn to how digital biomarkers can be influenced by machine learning and AI components. For me, this is really about optimizing the information extraction while keeping explainability of what we get. These are two inherent components that need to go together for digital health technologies and digital biomarkers, because what we are developing is for clinical studies, for patients, for regulators, for physicians, and all of them really want to understand what you are measuring. That is an important component to keep in mind, and it also influences how you need to design ML/AI components at different stages of the development of digital biomarkers. So what do I mean by optimizing information extraction while keeping explainability?
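The relapse example suggests a simple automated flag: compare each day's U-turn speed to the subject's own recent baseline. Below is a minimal sketch of such a rule (a rolling z-score), under assumed window lengths and thresholds; a real system would be tuned and validated against reported relapses.

```python
import numpy as np
import pandas as pd

def flag_relapse(daily_speed: pd.Series, window: int = 28,
                 z_thresh: float = -2.0) -> pd.Series:
    """Flag days whose U-turn speed drops well below the subject's recent baseline.

    Deliberately simple rolling z-score rule; all thresholds are illustrative.
    """
    baseline = daily_speed.rolling(window, min_periods=14).median().shift(1)
    spread = daily_speed.rolling(window, min_periods=14).std().shift(1)
    z = (daily_speed - baseline) / spread
    return z < z_thresh  # True on days that look like a sustained worsening

# 120 days of daily turning speed with a simulated relapse from day 80.
rng = np.random.default_rng(1)
speed = pd.Series(1.5 + 0.1 * rng.standard_normal(120))
speed.iloc[80:] -= 0.5
flags = flag_relapse(speed)
print(flags[flags].index[:3].tolist())  # first flagged days, near day 80
```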
I tried to visualize it in a simple plot. As I already mentioned, we are collecting a lot of sensor data: acceleration data, touchscreen data, microphone and audio data, GPS, Bluetooth, anything. What we want from this data, and what we also owe all our study participants who collect it for us, is to get the maximum disease information out of this raw sensor data. So we need some type of function that gives us that maximum disease information, and in practice that usually means you need to care about the quality and the context of the data, you need signal processing that gives you meaningful readouts from the data, and you need to aggregate them in some way to arrive at that maximum disease information. Vice versa, it is often also really valuable to be able to go from this maximum disease information back to the raw sensor data and understand where it is actually coming from. These are continuous data streams, and hence it is not always straightforward to understand what makes the signal we compute so unique: which features play a role, and where in the sensor streams do these features predominantly come from? That is also very important, and we should not forget about it. And of course it is always a question how this can be done; and even harder, what is maximum disease information? How can we tell that we have reached this goal of getting to maximum disease information? All of that I will try to address in the next half hour, to convince you that using ML/AI components in this digital biomarker context is necessary to get there, and also that AI is, from my perspective, something that can help us approximate what maximum disease information actually is.

Digital biomarkers can be a puzzle, and that is why I have depicted the different ways we currently use ML/AI as puzzle pieces in this quest for the best digital readout in a given disease. In the next slides I will show examples from all of these areas. There is data quality and data contextualization. There is feature-free deep learning for classification, which is probably what most of you care about. There is using features directly to predict clinical scores or do classifications. There is using AI components to develop what I term meta-raw data; you will see what I mean by that later. And there is going back from the classification or prediction result to the raw data.

Let us first start with what I mean by data quality checks. As you all know, with data it is garbage in, garbage out. So you should care about what you are collecting and what it actually means. And if you do what we are doing, which is deploying measurement solutions to patients in their home setting, you collect a lot of data, but in a remote, unsupervised setting. You do not really know whether someone is using what you gave them in the correct and best possible way, and even if they are, you might still get data that is sometimes simply not usable. So to start with, you need to care about data quality, and that begins with raw data quality.
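The pipeline just described (raw window, quality gate, signal-processing features, aggregate readout) has a simple conceptual shape in code. This sketch is purely schematic; every function is a placeholder standing in for a dedicated algorithm, not the actual Roche pipeline.

```python
import numpy as np

def quality_ok(window: np.ndarray) -> bool:
    # Placeholder quality gate: reject missing data and flat (dead-sensor) signals.
    return not np.isnan(window).any() and window.std() > 1e-3

def extract_features(window: np.ndarray) -> dict:
    # Placeholder signal-processing readout for one sensor window.
    return {"rms": float(np.sqrt((window ** 2).mean()))}

def disease_readout(windows: list[np.ndarray]) -> float:
    """Aggregate many daily windows into one (hopefully information-rich) score."""
    feats = [extract_features(w)["rms"] for w in windows if quality_ok(w)]
    return float(np.median(feats)) if feats else float("nan")
```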
Not every device is the same, and there may be faulty devices, so you need to care about the sampling frequency and deviations from it, and generally about the noise and bias of the sensors. You need to care about external influences: the way you behave impacts what you can measure. Some of you might have noticed that your step count is not always counting actual steps; in some instances you are doing something different, but the algorithm still counts it as steps. You might jump on a trampoline, do some sport, or do something else where the algorithm does not realize you are not walking but still counts steps. This can also be about sensor placement, and, very simply, about the correct execution of a daily test that you ask the patient or participant to do.

Here is a very, very simple example. We ask patients to do a symbol digit matching test on a smartphone screen, with which we want to assess their information processing speed. Patients see symbols on the screen, they have a mapping of symbols to digits, and they need to press the right number on the screen as fast as possible, many times, until the test time runs out. What can happen in these instances is, of course, that a participant simply does not interact with the screen, or does not do the test the way they are supposed to: they just tap through quickly to get the test over with for that day. So you need quality checks for that, for example to detect whether there is no screen interaction during the test, or whether the participant is just tapping through, meaning you see very low response times combined with very low accuracy, pointing to the fact that they are not doing the test according to the instructions. A very simple example, but something we do.

Let us turn to a much more involved example, something some of you might know if you use a modern smart device equipped with a PPG sensor; those are the green lights on the back of your smartwatch, in case you use one. These PPG signals try to measure heart rate and related quantities from changes in the light reflected by the blood vessels under your skin. In different disease contexts we are interested in measuring what is called heart rate variability, because heart rate variability can tell you something about the stress and anxiety state of a person at a given point in time. But, as you might notice if you use these devices, this signal is heavily influenced by what you are doing: if you do anything involving your hands and different kinds of movement, these measurements might either not be possible or become really, really wrong. So what you need is a quality assessment of what your signal must look like so that you can trust the eventual HRV readout. What we developed here is a kind of quality metric, using a machine learning approach, to assess from the raw data whether a later HRV readout is trustworthy or not. We did that by collecting gold-standard ECG data; that is the red signal you see here, with nice RRI signals, the intervals between successive heartbeats. That is the one thing.
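Going back to the symbol digit example just described, such rule-based quality flags are straightforward to express in code. A minimal sketch, with hypothetical thresholds and field names (not the actual production values):

```python
from dataclasses import dataclass

@dataclass
class SymbolDigitSession:
    response_times_s: list[float]  # one entry per answered item
    n_correct: int
    duration_s: float = 90.0       # assumed fixed test length

def quality_flags(s: SymbolDigitSession) -> dict:
    """Rule-based quality checks for a remote symbol digit matching test."""
    n = len(s.response_times_s)
    accuracy = s.n_correct / n if n else 0.0
    median_rt = sorted(s.response_times_s)[n // 2] if n else float("inf")
    tapping_through = median_rt < 0.4 and accuracy < 0.5  # fast and wrong
    return {
        "no_interaction": n == 0,          # never touched the screen
        "tapping_through": tapping_through,
        "usable": n > 0 and not tapping_through,
    }

session = SymbolDigitSession(response_times_s=[0.2, 0.25, 0.3, 0.2], n_correct=1)
print(quality_flags(session))  # flags this session as tapping through
```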
From the ECG you can then very reliably calculate HRV metrics, different metrics that approximate heart rate variability. In parallel, we collected this PPG data and also turned it into different heart rate variability readouts. Then we calculated the difference between the heart rate variability we get from the PPG and from the ECG, assuming the ECG signal is always correct. We extracted a lot of different features from the PPG signal and used them in a linear model with Lasso to develop a scoring metric. Just to show you an example: we had the participants do paced breathing, and you see variations in how the signal changes for ECG and PPG. For this example the two signals are very similar, which is nice and also explainable, because when you do paced breathing you usually do not use your hands. But if you type on a smartwatch or a keyboard, or just walk around, the picture already changes. So we looked into which model reduces the estimation error between what is measured by the PPG and by the ECG, and what window length you need to get a minimal difference between the two signals. That is, in the end, the machine learning model that we deploy on the data we continuously collect from patients, giving us a continuous readout of what is high- and low-quality data, and telling us where we can actually apply our dedicated HRV algorithms to calculate the readout and run statistical analyses on what changes for these patients over time: when do they have anxiety, how often does that happen, and how do drugs potentially influence these states? That was an example around quality, and I hope you see where quality is important.

Another way we use AI is for what I term, because I did not find a better word, meta-raw data. In many instances you collect raw data that is not directly suitable for what you want to measure. I make a distinction there between doing a little filtering to denoise the data, and turning your raw data into something else, which is still a type of raw data but closer to what you actually want to measure, and then calculating features from that. One example is sensor fusion, where you take the IMU data (acceleration, gyroscope, possibly magnetometer), fuse it with an algorithm, and obtain, for example, velocity or displacement instead. Another very simple example: you speak, you turn the audio signal into text, and then you deploy algorithms on the text data rather than on the audio files; very different types of assessments. Or, an example I will show in a moment: you collect video data, turn it into movement coordinates, and calculate certain assessments from those. That is something that has also been tried with smartphones and other smart devices, and which we have also investigated in the past. There is an open-source library available where researchers developed a deep learning application that tries to infer where you are looking on a smartphone while the camera is running.
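The Lasso quality model just described can be sketched as follows: regress the PPG-vs-ECG HRV error on PPG-derived features, then accept only windows whose predicted error is small. The features and threshold below are synthetic stand-ins, not the study's actual inputs.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training set: one row per PPG window, columns are PPG-derived
# features (signal power, motion level, beat detection stability, ...);
# the target is |HRV_from_PPG - HRV_from_ECG|, with ECG taken as ground truth.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))            # stand-in for real PPG features
y = np.abs(2 * X[:, 0] + X[:, 3] + rng.normal(scale=0.5, size=500))

# Sparse linear model: Lasso drives uninformative feature weights to zero,
# which keeps the resulting quality metric interpretable.
model = make_pipeline(StandardScaler(), LassoCV(cv=5))
model.fit(X, y)

# At deployment: predict the expected HRV error per window and only keep
# windows below an acceptance threshold (illustrative value).
usable = model.predict(X) < 0.8
print(f"{usable.mean():.0%} of windows accepted for HRV computation")
```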
They take pictures, or video stream data, of your left and right eye and of your whole face, plus a face grid, feed those into a deep learning application, and trained it on a lot of crowdsourced data from tests where dots appear at different positions on the screen and people follow them. The resulting solution is supposed to predict where you are looking on the screen. For us this is a potentially interesting application, because depending on which disease you are looking into, it might really matter. There are certain conditions, like autism spectrum disorder, where people inherently assess faces in a different way: people who are not on the autism spectrum usually tend to look more at the eyes, while certain subpopulations in ASD may tend to look more at the mouth, for example. That is something you could measure, and we wanted to see whether it is possible with such a solution. So we looked into how this works on a smartphone. We had participants do different tests: following dots, looking at eyes and mouth, and other patterns. Then we examined the direct output of this deep learning solution and different ways to optimize it, for example by adding a calibration task at the beginning and then applying linear transformations, or using support vector machine approaches to reduce the classification error. We also looked at different lighting conditions and so on, to see how robust it is. You see some of the results here; the rest you can find in the publication. You see that there are clearly ways to improve on these results, and from my point of view this is really an area where more research can and should be invested to get better eye tracking for such smart solutions.

The next piece is what I call data contextualization. Context, for me, means the circumstances, or the setting, in which a segment of raw sensor data was collected. Why is that necessary? Because we need to understand where the data was collected; otherwise we cannot know which algorithm to apply, or whether the data is trustworthy. So we need to somehow annotate this data with contextual information. But how is that possible if the data is collected in a remote setting all the time? It means, on the one hand, that we need this information to compute features directly with it, or to identify interesting data segments and then choose the right algorithm for them. An example is passive monitoring, where you continuously wear your smart device, we collect raw sensor data, and we then want to know what types of activities you have been doing. Have you been walking, sitting? Did you do that at home or outside? Did you have shaking-type events: chorea, or, in Parkinson's disease, what is called tremor, a kind of continuous, regular shaking of the hand that really impacts your daily life?
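Going back to the gaze calibration step mentioned above: one of the simplest corrections is an affine transform fitted by least squares on the calibration dots. A minimal sketch, with hypothetical function names; the study also explored SVM-based corrections.

```python
import numpy as np

def fit_gaze_calibration(predicted: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Fit an affine correction mapping raw gaze estimates to known dot positions.

    predicted: (n, 2) raw model outputs during a calibration task; targets:
    (n, 2) true dot positions. Returns a 3x2 matrix A with [x, y, 1] @ A ~ target.
    """
    design = np.hstack([predicted, np.ones((len(predicted), 1))])
    A, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return A

def apply_calibration(predicted: np.ndarray, A: np.ndarray) -> np.ndarray:
    return np.hstack([predicted, np.ones((len(predicted), 1))]) @ A

# Calibration dots at the screen corners and center (normalized coordinates).
targets = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], float)
raw = targets * 0.8 + 0.1 + np.random.normal(0, 0.02, targets.shape)  # biased model
A = fit_gaze_calibration(raw, targets)
print(np.round(apply_calibration(raw, A), 2))  # close to the true dot positions
```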
But you need to identify these events in your sensor data; otherwise you cannot apply the right algorithms to actually calculate anything from it. Other examples are audio, where you might want to identify different speakers, or classify different types of ambient noise to see whether the audio data you are getting is usable, or which part is usable; or even learn something about your social daily life: are you in a context with a lot of people and a lot of things happening, or are you always in quiet places?

Let me give you some examples of why AI is really what you need here. A first example comes from autism spectrum disorder, where we have a conversation test: we ask the caregiver and the patient to have an everyday conversation, and we want to see how that conversation actually goes. Is there something we can learn from it? For example, there is the hypothesis that there are differences in how turns are taken, that there may be interruptions, or differences in how monologues unfold. To study that, in a first phase you need to diarize the speakers, that is, determine who is talking when, from the raw sensor data. There are different ways to do that; one is to turn the raw audio into what is called a mel spectrogram and then apply deep learning solutions to separate the speakers. That is what we did in this context, and then we looked into what it tells us about the disease. What you can see, first, is that we get a good correlation between the true labels and what we predict, in terms of patient response duration and in terms of the proportion of speech from each participant; that is already great. What you can also learn is that this response duration, and the proportion of speech in such a conversation, really differ between participants with autism spectrum disorder and healthy individuals. So these measures in a conversation test seem to make a lot of sense: they differentiate, and with this type of solution you get really meaningful information, which in the future might be used to track how things change over time under certain therapies, whether drugs or behavioral therapy.

Another example, something we use across the board in many diseases where we collect continuous data, is what is called human activity recognition. For me this is one of those examples of what I call transferring learnings between diseases; later, at the very end, you will also see something that is really transfer learning in the deep learning sense, so please be patient with me. This is just from our first study: 45 patients, observed for roughly half a year, giving us about 33,000 hours of sensor data, roughly 1.2 terabytes. The problem is that this is unlabeled data from the daily life of the participants, and to get something meaningful out of it we need dedicated solutions. What we deployed is a deep learning solution which we trained on different openly available datasets, where we try to predict different types of activities like sitting, standing, walking, jogging, and so on.
You see a depiction here, in case you are interested, of the type of deep learning architecture used in that instance. You might then rightly ask: how do you actually validate something like that in a disease context? The original training was done with data that was out there, which was usually not from patients but from healthy human beings. What we could use in this context is that we have walking tests, balance tests, and other tests for a given disease, so we actually had snippets of ground-truth data that we could use to see how well this prediction model works, and it works really, really well, also in this disease context. We then deployed it over these long stretches of time, and from that we calculated, for example, the percentage of gait activity, or how often people stand up and sit down (sit-to-stand and stand-to-sit transitions), or we identified gait segments and then tried to detect when people are turning and how fast they turn in daily life. And you can see here that all of these measures, in daily life, really differentiate age-matched healthy controls from Parkinson's disease participants. So this is really going into daily life and measuring something meaningful, and ML/AI components are an integral part of getting to that type of data.

Why do I call this transferring learnings? Because in a second step we took what we did for the smartphone and developed something similar for a smartwatch, a wearable, and used it in a schizophrenia study context. There the focus is not on the walking activities, as before; instead we turned our attention to the non-walking activities, when people are gesturing in different situations, doing their housework, and so on, and developed what is called a daily gesture count. In the schizophrenia context, with participants with so-called negative symptoms (meaning these patients may be more apathetic and generally less communicative, which is measured by a clinical score called the BNSS), you see that if you compare that score to measures from daily life, patients with higher BNSS scores also show less gesturing in daily life. So what is assessed there in the clinical context is also reflected in patients' behavior in daily life. Here we transferred the solution to a different type of device and are getting a different type of meaningful information.

Now let us go back to Parkinson's disease and what this means there. We are also interested in these hand movements. I already mentioned the concept of bradykinesia, that patients slow down in their hand movements, and this of course also impacts their daily life. Going back to these passively monitored hand gestures, we can actually show that we can measure this with high test-retest reliability, we see a certain correlation with an established clinical score in very early patients, and then, in a study context where a drug is actually involved, what we could show is the following.
The group of patients taking that specific Parkinson's disease drug shows less of a decline in this hand gesture energy compared to the placebo patients who are not getting the drug. So here you really see that by using ML/AI we get at the context of what is actually happening, we measure something that might be meaningful for the patient, and then we can use that in a clinical study to see what is happening in patients' lives and how the drug might change it.

That brings us to a very traditional field: taking features, a lot of features, and predicting clinical scores or doing classification. There are different reasons to do this. First, you can use it to understand what different information you have in your feature space: not everything you measure is the same, but a lot of it might, in a disease context, capture similar things, so you need to understand the diversity of what you are measuring in the disease context. Then, of course, you want to see which part of the feature space is really predictive for traditional clinical assessments, and which part actually gives you novel information. You can use that, for example, to develop test combinations that serve different patient groups, or to increase your disease understanding: what are you measuring there that is distinctive for different patient groups?

Here is one example of how we do that. We collect, for instance, two-minute walk test data, where we ask the patients to walk for two minutes with their smartphone and smartwatch; this is in multiple sclerosis. What we want to assess is how impaired these patients are. It is a classification task where we want to see whether there is a difference in the gait pattern between healthy volunteers, mildly impaired MS patients, and more severely impaired MS patients. You see here a depiction of how we did that: there is a feature extraction step where we compute a lot of features from the raw sensor data, and then we develop different types of models with different cross-validation schemes. Something very important to keep in mind here: these patients do the test every day or every few days, so whatever cross-validation you do, you must split subject-wise and predict subject-wise. Do not allow cross-bleeding of information, with some data of a subject in the training set and other data of the same subject in the test set, because that can obviously lead to a lot of overfitting, which has been a problem in some past publications. What you can then see is that with different numbers of features, and different combinations of smartphone and smartwatch features, you can really increase precision and accuracy in this context, to differentiate healthy controls from mildly impaired and from more moderately impaired patients. I will come back to this example at the end, where we will use a deep learning solution to actually improve on these results.
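The subject-wise splitting point deserves a concrete illustration, because it is easy to get wrong with repeated measurements. A minimal sketch using scikit-learn's GroupKFold, with synthetic stand-in data and model: all repeated tests of one subject stay on the same side of every split, so the model is always evaluated on unseen subjects.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Hypothetical data: repeated two-minute walk tests, several per subject.
rng = np.random.default_rng(7)
n_subjects, tests_per_subject, n_features = 60, 10, 30
groups = np.repeat(np.arange(n_subjects), tests_per_subject)      # subject ID per test
y = np.repeat(rng.integers(0, 3, n_subjects), tests_per_subject)  # HC / mild / moderate
X = rng.normal(size=(len(y), n_features)) + y[:, None] * 0.3      # weak class signal

# Subject-wise cross-validation: no subject appears in both train and test,
# which avoids the "cross-bleeding" overfitting described above.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         groups=groups, cv=GroupKFold(n_splits=5))
print(f"Subject-wise accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```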
Another example is the draw-a-shape test, where we ask the participant to draw certain shapes on the screen, as you can see here, and then we extract different types of features. The traditional clinical test for hand impairment, for upper limb impairment in multiple sclerosis, is called the nine-hole peg test. That is something they do in the clinic: you have a board with pegs sticking in it, you take out one peg after another and put them into a container, and then you do the reverse; the time this takes is measured for your left and right hand, and a summary score of the total time is one important upper-limb readout in multiple sclerosis. What we did here is use our test to predict this nine-hole peg test time, because, as I said, we measure basically daily, so we wanted to see whether a daily surrogate score for the nine-hole peg test could be more robust and could be tracked over time. You can see that on a cross-sectional level we are generally able to predict these nine-hole peg test times quite well, and can then use this type of classifier for these activities.

This was all for a single type of assessment, but of course it gets really interesting if you start combining this type of information. What happens if you do different tests, or in other instances have different types of sensor data, and then combine them for classification or for estimating disease severity on a continuous level? We also developed machine learning frameworks for that; this is just one example. What we learn there is what distinct information we have in, for example, a balance test compared to a tapping test where you tap on the screen, a gait test where you walk up and down a corridor, a tremor test where you just hold the smartphone in your hand, or a sustained phonation test where you phonate continuously. And then, how can you combine this information, with different types of machine learning models, into something that either distinguishes healthy controls from, in this case, Parkinson's disease participants, or predicts their disease severity? In one example you see here, we use a ridge-regression-type approach to see how employing different variations leads to better or worse accuracy, and also sensitivity and specificity; which tests are really important in that; and how many days of data we actually need to aggregate to get good accuracy. In this example you see that collecting 10 days of data, aggregating them in a meaningful way, and putting all of that into the classifier leads to quite stable results. That is important, because it might mean your readout is not a daily readout but a 10-day aggregate readout, which you can then follow over time, and which is of course still much better than a clinical visit every few weeks or months. You can see here a prediction of a clinical assessment of the disease, the MDS-UPDRS, and our prediction matches quite well, given that the MDS-UPDRS is not just a motor score but also measures cognitive problems, other impairments in daily life, anxiety, and other problems, which are of course difficult to measure with such motor tests, as discussed here.
This brings us to something many of you might be more interested in, what I term feature-free deep learning for classification. This is one of the areas where I see tremendous potential for employing ML/AI in different contexts in the digital biomarker space. I keep mentioning this continuous data that we collect, but you have to keep in mind that the clinical studies in which we collect it usually do not have that many participants: imagine studies between 50 and, if we are lucky, 600-700 participants. Overall, that is not a lot of data for developing deep learning solutions. We do get a lot of repeated longitudinal data from these patients, which is nice, but it is still the same patients. In deep learning terms, you have the inherent problem of low subject count (sample size is not quite the right word, because of the repeated measurements) against a big need for enough information to build these deep learning solutions in a really good, unbiased way. That is where transfer learning can help a lot.

What we investigated here builds on the human activity recognition model I mentioned at the very beginning: a model initially designed to estimate different types of activities, which we developed using publicly available data from healthy participants, and which, as you have seen, generalizes quite well to our setting. You can take that thought further. Coming back to the other example I showed before, the two-minute walk test, you can start thinking about ways to combine the two; this is just one example, and there are other ways and other tests where you might do something similar. Both assessments are inherently about movement, but with different things in mind. So, first of all, rather than developing a feature-based two-minute walk test classification scheme, you could develop a feature-free deep learning solution for it. But given that in this instance we only had about 80-90 participants, half of them multiple sclerosis patients, that is not a lot for this type of sensor data if you want robust deep learning solutions. So we started to investigate a transfer learning approach, where we used our previously trained human activity recognition model and plugged it into another deep learning classification application, using the pre-trained information to help us better classify healthy controls versus mildly and moderately impaired multiple sclerosis participants in this two-minute walk test context. You have these pre-trained layers, tuned so that the convolutional layers extract the kind of features that are important for distinguishing different activities, and you can of course hypothesize that these convolutional features can also help in differentiating the walking patterns of multiple sclerosis patients. That is what we tried out here, and what you see on the lower right is a comparison of four different solutions.
First, as a baseline, there is the feature-based classical support vector machine solution, where you see the accuracy, the kappa, and the macro-F1 score, which is okay but of course far from perfect. Next you see what you get if you do deep learning directly on the two-minute walk test data: you cannot really improve on what you had before, and it might actually be worse; you might run into overfitting, depending on what you do. And then you see two examples where we used two different publicly available datasets to train a human activity recognition model, plugged it into the other deep learning solution, retrained it with part of the two-minute walk test data, and then predicted on the rest. There you really see a huge increase in all the scores measuring how well the solution works. So in this instance, transfer learning was the key to unlocking new information with deep learning.

This is, first of all, a nice result, but for me it is also strategically something that brings us closer to what I raised at the very beginning: what is actually the maximal information we can get out of such a two-minute walk test, given the sensor data we are collecting? That is the inherent question: with a given feature space, are we there yet? The features are explainable; we designed them, we used our knowledge of the disease to develop algorithms extracting information from the raw sensor data that we think is important in the disease context. And you see: if we combine the features we think are important, we get okay results; but if we use this transfer deep learning solution, we get much better results, really good results. What we learn from that is that there is a gap in our feature space: we are not currently at the maximum information content we could extract from this raw data. That matters, because we want to get the best information possible. So this is something I really propose to develop in the future: using such models to approximate the maximal information content wherever possible.

That brings me to the last part of my talk: going from the classification back to the raw sensor data. With these deep learning solutions there is now a development of new ways to unlock that black box, and one of them is layer-wise relevance propagation, where you take your classification result, go back through the different layers, and pinpoint where the important signal in your raw data actually is. And why is that important?
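Before turning to why explainability matters, here is a minimal sketch of the transfer learning setup just described, assuming a simple 1D-CNN backbone for human activity recognition. The architecture, layer sizes, file names, and class labels are all hypothetical illustrations, not the actual published model.

```python
import torch
import torch.nn as nn

# Hypothetical 1D-CNN backbone, pre-trained for human activity recognition
# on public accelerometer datasets (healthy subjects, activity labels).
class HARBackbone(nn.Module):
    def __init__(self, in_channels: int = 6):  # e.g. 3-axis accel + 3-axis gyro
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )

    def forward(self, x):                    # x: (batch, channels, time)
        return self.features(x).squeeze(-1)  # (batch, 64)

backbone = HARBackbone()
# backbone.load_state_dict(torch.load("har_pretrained.pt"))  # hypothetical weights

# Freeze the convolutional feature extractor: with only ~80-90 subjects we
# mainly retrain a small head for the 3-class walk-test problem.
for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, nn.Linear(64, 3))  # HC / mild MS / moderate MS
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 6, 512)                  # batch of 2-min walk-test windows
loss = loss_fn(model(x), torch.randint(0, 3, (8,)))
loss.backward()
optimizer.step()
```

A variant of the same idea is to unfreeze the convolutional layers after a few epochs and fine-tune everything at a lower learning rate; which variant works better is an empirical question per dataset.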
I mentioned at the very beginning, and over the course of the talk, that explainability is really key for us. We really need to understand where a good result is coming from, because that is what patients care about, regulators care about, and doctors and other researchers really care about. We are happy with such deep learning solutions in the first place, but they need to be either explainable, or, even better, usable to pinpoint the signals we have missed so far, so that we can then develop dedicated, traditional signal-processing-based feature algorithms that also extract that meaningful information and give us explainable readouts from the raw data. That is really one of the core principles, I think: we need this kind of pipeline to extract the maximum information content, and we should optimize this pipeline using novel ML and AI technology tools that help us go in the one direction but also go back in the other direction, thereby transferring learnings, in different ways, over the course of this endeavor between the machine and us. Both of us need to learn, and we need different types of solutions for transferring learnings from one disease to another, or from related information spaces to our specific disease problem. Human activity recognition, for example, was obviously not initially developed for us; it is used in many other contexts, for sports watches or for studying human behavior in social settings, and now we are using it in a disease context, which is a very different setting but still of high relevance. This is really where different research communities can join forces: decreasing the sample sizes that need to be collected in such settings by pooling information, reusing trained convolutional layers, leveraging deep learning, and then going back to pinpoint where the signal is coming from.
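Layer-wise relevance propagation needs a dedicated implementation per architecture; as a simpler stand-in that illustrates the same idea of mapping a classification back onto the raw sensor stream, here is a gradient-saliency sketch. It assumes the hypothetical transfer-learning model from the previous sketch; it is not LRP itself, and not the method actually used.

```python
import torch

def saliency(model: torch.nn.Module, window: torch.Tensor) -> torch.Tensor:
    """Map a classification back onto the raw sensor stream via input gradients.

    A simple stand-in for layer-wise relevance propagation: large
    |d score / d input| marks the samples and channels that drove the
    predicted class. window: (channels, time) raw sensor window.
    """
    model.eval()
    x = window.unsqueeze(0).clone().requires_grad_(True)  # (1, channels, time)
    scores = model(x)
    scores[0, scores.argmax()].backward()                 # gradient of winning class
    return x.grad.abs().squeeze(0)                        # (channels, time) relevance

# Usage with the model sketched above:
# rel = saliency(model, torch.randn(6, 512))
# rel.sum(dim=0) gives a per-timestep relevance trace to plot over the raw signal,
# pointing at the segments a dedicated feature algorithm should try to capture.
```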
This basically closes the picture, and shows again that if you put all these puzzle pieces together, you might be able, with ML/AI, to differentiate healthy controls from what are called prodromal disease stages: stages where patients are not yet classified as having the disease but may show its first indications. Maybe with the sensor data combined with ML/AI you can actually start to differentiate controls from patients who will soon develop a specific disease. Or, in another setting, you develop really sensitive measures that change over time, and use them to assess disease status and differentiate participants getting different types of therapies or drugs.

That brings us to one last example, from a Parkinson's disease study called PASADENA, and its year-one readout. You see here what it means to deploy such a solution: roughly 360 patients, more than 30 terabytes of collected raw sensor data, and roughly a million tests done in different ways: not only hand turning, but also tapping on the screen, the tremor test, the speech test, and other tests. We then distilled all of that data, using some of the methodologies I have shown here and beyond, to inform one decision: whether there are signs of drug efficacy. In this context we developed, from a feature space, a score using a machine learning method, built as a proxy of certain motor impairments in Parkinson's disease that were hypothesized to change quickly over time in that specific early disease context. On the left-hand side you see clinical study results from the MDS-UPDRS Part III bradykinesia items; that is really the clinician assessing the bradykinesia problems of the patient. On the right-hand side you see results from our digital score. In both instances, the clinician seeing certain things at the episodic clinical visit, and us measuring in a high-frequency setting where we aggregate data over two-week intervals and then calculate the score, you see a difference between participants getting the drug and not getting it. And you see that with our approach we actually shrink the confidence bands, and thereby of course get much more precise estimates of what is going on. By measuring more, we can also deploy different, better statistical models: in this case a linear mixed model with a random-slope approach, compared to an MMRM on the left, which cannot fully leverage this type of continuous data the way we can with more data available. We can thereby confirm, in an independent, at-home-measured way, what is seen in the clinic, which is of course important in a clinical study context.

As a recap, coming back to my vision: I mentioned that we want to get the maximum disease information out of this data, and I hope I could convince you that using ML/AI methods is a way to augment traditional signal processing in each of these steps to actually get to that maximum disease information output. Then there was the open question of how we actually get back from maximum disease information to the sensor data. As I already mentioned, if you use deep learning, you should unlock the black box and really understand where the result is coming from; there are new developments there which are important and which you should leverage. And you can approximate maximum disease information by assuming that a properly built deep learning model achieves near-optimal performance, then comparing it to the traditional features and traditional classifier you have developed, looking at the gap, and trying to understand it.
As a summary: I hope I could convince you that digital health technologies really offer great opportunities by creating a lot of continuous data; that traditional signal processing methods have their limits, because they are invented by us and we do not know everything; that ML/AI methods really are needed to augment the development of digital health technologies to maximize the value of the data; and that there are many different ways to leverage ML/AI in the course of this development. What might also be of interest for you: this is a very young and active field of research, and it comes with its own set of challenges. First, the usual problem: you have a low patient number compared to the complexity of the data; one solution could be intelligent forms of transfer learning. Then, something I would say is quite unique: you have these high-frequency repeated measures, and there are really limited solutions out there that can deal with them optimally, because methods usually assume one measurement per subject. And you collect this data in an unsupervised fashion, so you often lack ground truth, which is an inherent problem in itself. Lastly, as I mentioned, I really firmly believe that our digital biomarker results need to be explainable in the disease context: to subject matter experts, to regulatory agencies, and, very importantly, to the patient. That is something we really need to take care of and should not lose sight of when developing complex AI solutions. That brings me to the end of the talk, and means we have 15 minutes left for questions, which is great.

Thank you very much for an exciting talk. I was really glued to my screen here, because it is very exciting to see that Roche has these topics on their radar and is working to bring AI into healthcare. Maybe I want to start with a question that came in over Slido, on the role of sequencing data. I can read it out for you, and then we will handle some other questions. Oh, you see the question? Okay, perfect.

So the first question is how sequencing data could augment what we are doing in the digital biomarker context, to build even better models by including certain predispositions in the model from the very beginning. I think that is a great question, because it points to these different aspects of how you can look into a disease, and I would not even stop at sequencing data: there are also blood-based measurements in general, there are MRIs and other things you can do at a clinical visit to try to stratify your patient population at the very beginning. I think it really makes sense, in the broader context, to combine all this information into something that is tailored to understanding each patient in the best way possible. So yes, basically, it makes a lot of sense; but I would say it is really early days, and it is disease-dependent where this can lead to additional gains in measurement. Then the second question was whether digital biomarkers could be a stepping stone to preventive medicine, by detecting decreasing health long before diagnosis, and what our plans are.
As a summary, I hope I could convince you that digital health technologies offer great opportunities by creating a lot of continuous data; that traditional signal processing methods have their limits, because they are invented by us and we do not know everything; that ML/AI methods are really needed to augment the development of digital health technologies and maximize the value of the data; and that there are many different ways to leverage ML/AI over the course of this development. If this is of interest for you: this is a very young and active field of research, and it comes with its own set of challenges. First, a usual problem is a low patient number compared to the complexity of the data; one solution could be intelligent ways of transfer learning. Then, and this is quite unique I would say, you have these high-frequency repeated measures, and there are really limited solutions out there that can deal with them in an optimal way, because you usually assume one measurement per subject or probe. You also collect this data in an unsupervised fashion, so you often lack the ground truth, which is an inherent problem in itself. And lastly, as I mentioned, I firmly believe that our results from the digital biomarkers need to be explainable in the disease context, to subject matter experts, to regulatory agencies and, very importantly, to the patient, and that is something we really need to keep in mind when developing complex AI solutions. That brings me to the end of the talk and means we have 15 minutes left for questions, which is great.

Thank you very much for an exciting talk. I was really glued to my screen here, because it is very exciting to see that Roche has these topics on its radar and is working to bring AI into healthcare. Maybe I want to start with a question that came in over Slido, on the role of sequencing data; I can read it off for you. Oh, you see the question? Okay, perfect. So the first question is: how could sequencing data augment what we are doing in the digital biomarker context, to make even better models by having certain predispositions in the model from the very beginning?

I think that is a great question, because it brings in these different perspectives on disease. And I would not even stop at sequencing data; there are also blood-based measurements in general, MRIs and other things you can do at a clinical visit to try to stratify your patient population at the very beginning. I think it really makes sense in the broader context to combine all this information into something that is tailored to understanding each patient in the best way possible. So basically, yes, it makes a lot of sense, but I would say it is really early days for doing that, and it is disease dependent where this can lead to additional gains in measurement.

Then the second question: could digital biomarkers be a stepping stone to preventive medicine by detecting decreasing health long before diagnosis, and what are our plans? That is a very good question. First of all, I am working in a part of Roche which is all about drug development and clinical studies, so what I have shown you here is really about measurement in the first place: developing methods that can augment traditional methods, to have better ways to see how a disease is progressing and how drugs can impact that. But of course there are many diseases where it makes a lot of sense to start really early with these types of assessments. Take Parkinson's disease: by the time you get your diagnosis you have already developed some symptoms, which means a lot of things have already happened in the brain that are usually not reversible, so diagnosis is quite a late point in time to start drug treatment. It makes a lot of sense to try to move treatment earlier than the traditional point in time at which the disease can be identified. That of course still needs to be shown, but in that sense, yes, it makes a lot of sense to look into it. It is difficult, and it is something of interest for society as a whole, not something done by just one research institute or one pharma company. There is a really active arm of development in the research community looking into these problems in the cognitive and in the motor domain, and a lot of different big and small players are trying their best to piece out more information. If you google it, I think there have been a lot of publications in recent years exactly around that.

Okay, thank you very much. I think we also have a question in the Zoom, if I see that correctly. Giovanni, can you unmute yourself and ask the question?

Hi, thank you. First of all, I found the topic really fascinating and incredibly challenging. I have a general question. I can imagine there is an incredible number of challenges for this technology to be fully implemented, and one that I can imagine is convincing people to use it, that is, adoption. I can see a lot of people who would benefit from it but, for example, are not very adept with smartphones, or may be concerned about privacy. Could you say a couple of words, if you have any strategy in mind, about how to improve the adoption of these tools?

Yeah, there are so many different things you need to consider around adoption. I think first and foremost it comes down to two things: one is the data privacy component, and the other is the usability component, because both might prevent you from convincing participants to do these tasks, in the way you need, for a long period of time. In a clinical study context you are in a lucky situation in the sense that, via the research site, there is direct contact with participants, so it is easier to encourage them to do their best to collect the data; that is positive. And in drug studies there are of course more incentives for patients to contribute good data, since they are volunteering to receive a drug and want the best information from the study. But the other component is that if you want to use these types of solutions in the real world, as someone who is not in a drug study, you might want to use them when you have a disease, to have a better conversation with your doctor and show the physician what he is not seeing at a clinical visit.
In these instances, technology has been developed and is still developing to ensure data privacy and GDPR compliance, really making sure that this data is secure, and I think that is at the forefront of what needs to be kept in mind to convince all of us to collect this data, because it is data that tells someone something about a disease, so it is really private information. Then you need the right type of connections to whoever you want to share it with: optimally there will be developments where you can send this data to your doctor in a meaningful way, or see the data aggregated for yourself in an understandable way. These are really difficult things in a disease context, because not everyone has studied at a university or is well versed in biology; you have to develop something that is to a certain degree foolproof and understandable across a wide variety of educational and other backgrounds. And then it needs to be deployed on devices that are widely available and behave consistently: not everyone is using the newest high-end smartphone, there are less high-end devices, and all of them need to work. You also need to consider whether the context of people in Europe differs from that of people in the US, in Asia or in Africa when deploying these solutions. All of that is a challenge, and that is definitely something we are looking into.

Thank you, I think it is really interesting. Thank you. We have time for other questions as well; I am seeing that Deanne has also raised their hand in the chat, so please unmute yourself.

Yes, thank you for the nice talk. Actually my question was almost the same as Giovanni's. I was wondering whether you have any feedback on how open patients are to this type of new technology. Are they reluctant because they have to share their data, or are they willing to participate? But you kind of already answered it.

It is a really good question in general, and let me just share a few insights. As I already mentioned, in the clinical study context we see very positively how this is adopted. Of course, one challenge is that, depending on the disease and the age group, not everyone has a smart device for this, so what we usually do is deploy smartwatches and smartphones ourselves: we give them to the participants, which in this case also helps us ensure a certain consistency of the data. But there are of course also a lot of bring-your-own-device types of solutions out there, if you go into the smartphone world rather than using dedicated sensors like sleep mats or whatnot, and there have really been a lot of studies over the last few years deploying these types of things. One really important study is mPower, in the Parkinson's disease context, developed by Sage Bionetworks.
They could really see that a lot of people in the US, in this case healthy controls but also people with Parkinson's disease, were willing to try it out and collect a lot of data, tons of data. And with what we developed, I mentioned Floodlight Open, which is a downloadable solution for a study we conduct, we also see, even without much advertisement, still quite a pickup in usage. But as I also mentioned, it is really important to care about data privacy and about anonymization of the data. You have to think about ways to make the data available to patients, but also to researchers around the world, in a way that fulfils the different needs, and a lot of development is still happening there; data anonymization really plays a big role. I mentioned, for example, that we also collect audio data, speech data, which under GDPR is really private information, so you need to take special care to work with this type of data in an ethical and fully compliant way. If you go into these more open research scenarios, and this is where ML/AI can also play an important role, you will need to develop anonymization methodologies that on the one hand make sure that all identifying information about a given participant is filtered out, but on the other hand keep what is important in the disease context in the data. I firmly believe that this is another place where ML/AI will need to play a heavy role.

Thank you very much.

Okay, thank you very much. We have another question, I think, in the Zoom chat, by Emesha, I hope I am pronouncing that correctly. Please unmute yourself.

Thank you. Okay, so I have a more, let us say, technical question. During this data collection process you encounter sensor issues, or patients do not use their devices, so there is a lot of missing data, and usually in papers there is not much explanation of how this is dealt with. I would be curious whether there are certain approaches you usually take.

Yeah, again a really relevant question. There are of course different kinds of data loss, let us call it that: one might be a technical problem, another might be patients or participants not being adherent enough. First and foremost you need to find out what the problem is. Is it a device problem, or is it coming from the patient? That is inherently important, because it also determines what you have to do to understand what this missing data can do to your final readout, in terms of missing-at-random versus other causes of missingness. Is it because the device was malfunctioning, or is it because the patient was doing so badly that he could not do the test? It is really important to find that out, and that is one part of the answer. Methodology-wise there are then several steps. The first is aggregation, which I mentioned: the daily information in itself fluctuates, because you have good and bad days, while what we are interested in is your general disease state, and a very important question for a given disease is how much data you need to collect to estimate a good general disease state. That can mean, as in the Parkinson's disease example I showed, aggregating data over two-week intervals: you may not necessarily need data from every day, but you need enough data, otherwise the aggregate is not stable enough, and then we would say that for this two-week period there was not enough adherence for the given patient, so the data point is simply not there. In this way you already remove a lot of the missing-data problem, because you still collect enough data.
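A minimal sketch of this windowed aggregation with an adherence threshold, in pandas. The column names, the window length, and the minimum-days threshold are illustrative assumptions, not the studies' actual rules.

```python
# Sketch: collapse noisy daily measurements into two-week windows and
# mask windows with poor adherence (too few measured days).
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
days = pd.date_range("2024-01-01", periods=84, freq="D")
daily = pd.DataFrame({"day": days,
                      "score": rng.normal(50.0, 8.0, len(days))})
# Simulate non-adherence: the participant skips ~40% of days at random.
daily = daily.sample(frac=0.6, random_state=0).sort_values("day")

MIN_DAYS = 8  # assumed minimum measured days per 14-day window

agg = (daily.set_index("day")["score"]
            .resample("14D")
            .agg(["mean", "count"]))
agg["mean"] = agg["mean"].where(agg["count"] >= MIN_DAYS)  # mask sparse windows
print(agg)
```

Windows failing the threshold simply become missing aggregate values, which the mixed-model machinery discussed next can tolerate without explicit imputation.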
Still, you might lose some data, and then it becomes a question of what type of statistical methodology you use. Do you use imputation methods, or do you use methods that can inherently deal with missing data in different settings, such as linear mixed models or mixed models for repeated measures? These can inherently handle a certain amount of missing data without any imputation, because they carry the built-in assumption that different data points are connected in a meaningful way, and you can use that assumption so that, even if data is missing, information is carried over the course of time. But this is an active field of research, very important actually, and I think it is on the agenda of different research institutes; for example, DiMe, the Digital Medicine Society, a society built around these types of digital health technology tools, has the development of novel statistical methodologies on its agenda of future research projects.

Thank you. Thank you very much. If no one else has a question, I am also happy to release every one of you into the break. I think this went splendidly. Thank you so much for this exciting talk; it was really a pleasure to listen, and an excellent discussion. We will reconvene here at 10:30 with a talk by Sepp Hochreiter on modern Hopfield networks. So enjoy the break, and thank you again, Florian, for this exciting talk. I hope we can give you a round of virtual applause; maybe this works. I will also try clapping into the microphone, let us see whether this does something. Thank you so much, and see you soon after the break. Thank you.