Okay, so good afternoon everyone. As Carsten just said, I'm Masha, and I'm an MFBM ESR under the supervision of Antonio Artés at the Universidad Carlos III de Madrid. In this talk I would like to present the work I've done during the past three years. First, I will talk about the concept of digital phenotyping, its promises when applied to the mental health field, and the challenges that working with such data sources presents. Then, in the second part of the talk, I will show you some applications where we tried to predict mental health outcomes based on digital phenotypes. So let's start by defining the context. The use of digital technologies to better understand individuals, groups, or even society itself remains a key topic across different fields. As a result, several terms have emerged in these fields that sometimes generate hype or fear and overpromise things. For example, the terms digital trace and digital footprint, which are common in the humanities and the social and computer sciences, arose alongside the digitized or quantified self. More recently, medicine has introduced the term digital phenotyping, which promises the ability to track biological, physical, and behavioral traits using smartphones and fitness trackers. And digital phenotyping promises significant benefits when applied to the medical field. For psychiatry, for example, which has up to now mainly relied on episodic reports collected on site during appointments with the clinician, digital phenotyping offers a powerful approach to systematically detect behavioral states of patients, perhaps even subtype the current heterogeneous diagnostic categories, and measure outcomes. For neurology, where measuring cognitive function is quite expensive, it offers an inexpensive, ecological assessment tool in real-world settings. So digital phenotyping delivers a very rich data source.
It opens up a new field and new possibilities to reconfigure the delivery of healthcare, both from the patient's perspective and from the doctors' perspective. So the workflow of digital phenotyping consists of several steps, which can be broken down into two main parts: first acquiring the data, and then analyzing it. And as these data sources are very complex, multi-dimensional, and temporal, the most suitable technologies for them are machine learning applications. However, these technologies, even though they sound very promising, face a series of challenges, which can come from either the data collection or the data analysis perspective. When we talk about data collection in digital phenotyping, there are two categories: active data collection and passive data collection. Active data refers to anything that requires the user's input, for example filling in questionnaires, leaving diary entries, or rating their sleep. Passive data is generally anything that is collected by the sensors of the phone or another device without requiring input from the user. Now, the problem with active data collection is that users tend to consider it a burden over time, and it's very hard to keep them engaged in studies or in a clinical setting. Then, of course, there is the issue of privacy and data security. Some of the information that is collected can be directly linked to the person, such as their phone number in some cases, or even their location. So these are things that have to be handled using corresponding techniques. And when we go to the data analysis perspective, we can see that since we are using different sensors of these devices, the data is very heterogeneous: we have real-valued data and we have categorical information.
Also, due to sensor failures or other collection errors, sometimes the values that we record are outliers, as you can see, for example, on the bottom left plot for the step count. Sometimes we end up with values around two million, and it would be quite impossible for someone to perform two million steps in a day, right? So these are challenges in the data that we have to deal with. There are also a lot of missing observations, which can be due to sensor failure or to behavioral aspects of the patient: if someone turns off their phone for the night or for the weekend, then we don't collect data for a very long time, but we have no way of knowing why this is happening. Okay, so with these in mind, I would like to go to the second part of my talk, where I will show some applications. During my PhD I worked on three main topics in mental health: mood prediction, anxiety prediction, and, in the last project, functionality prediction using digital biomarkers. The data that we used for this work comes from two ongoing clinical studies in Madrid. For the mobile sensing data collection, the eB2 tool is used, while the clinicians use a tool called MeMind to record information about the patients and also their outcomes, for example questionnaires or evaluations. In the first project, we worked on a generic machine-learning-based approach for emotional state prediction, using only passively collected data from mobile phones and wearables and, as a target outcome, the emotions self-reported by patients. This topic is quite important in mental health because changes in emotional state, or long periods of a very negative emotional state, can be indicative of worsening, of disease evolution, or of the onset of depressive phases, for example. And if these are caught in time, then we can intervene and help patients before things get very hard.
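A minimal sketch of the kind of plausibility check described above for the step counts. The cap on daily steps is an illustrative assumption, not a threshold from the study protocol; implausible values are set to NaN so a later imputation stage can handle them:

```python
import numpy as np

# Hypothetical daily step counts; the 2,000,000 entry mimics the sensor
# glitch mentioned in the talk (two million steps in a day is impossible).
steps = np.array([4200.0, 8900.0, 2_000_000.0, 6100.0, 7500.0])

# Plausibility cap (assumed here for illustration): anything above it is
# treated as a collection error and marked missing (NaN).
MAX_DAILY_STEPS = 100_000
cleaned = np.where(steps > MAX_DAILY_STEPS, np.nan, steps)
```

Marking outliers as missing, rather than clipping them, keeps the error handling consistent with the missing-data treatment used later in the pipeline.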
For this project, we used the listed data sources: step count, distance indicators, whether the patient practiced sport during the day, how much they slept, and how much time they spent at home or using their phone, and then the self-reported emotions. We had a cohort of about 940 patients that had at least 30 days' worth of data recorded at that point, and we also required them to have recorded at least one emotion. To represent the emotions on a lower-dimensional level, we used this two-dimensional mapping that is commonly used in psychiatry, where we map the emotions based on their valence, that is, whether they are negative or positive, and based on their intensity, which is called arousal here. As you can see on the plot on the left, most of the reported emotions in the data set belong to the negative category, which is quite expected with mental health patients. In the overall data set we had over 170,000 entries or samples, and as you can see on this graph, the data was quite sparse; there were sometimes long chunks of missingness. Overall, about 80% of the observations were partial and only around 5% were complete, meaning that for a given day we had information for a specific patient for every variable that we were tracking. Moreover, because the emotions were self-reported, the patients were not forced or coerced in any way to enter them. Some patients, those who like to keep track of their mental health, tend to do it more often, others less, and we ended up with only about 10% of the data being labeled. So even though we started with a quite large data set, in the end we lost quite a lot of it.
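The complete/partial/labeled bookkeeping described above can be sketched on toy data. The matrix size, missingness rate, and labeling rate below are simulated for illustration, not the study's actual figures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy patient-day matrix: rows are days, columns are tracked variables
# (steps, sleep, time at home, ...); NaN marks a missing measurement.
X = rng.normal(size=(1000, 6))
X[rng.random(X.shape) < 0.5] = np.nan      # inject heavy missingness
labeled = rng.random(1000) < 0.1           # ~10% of days carry an emotion label

observed = ~np.isnan(X)
complete_frac = observed.all(axis=1).mean()   # every variable present that day
partial_frac = (observed.any(axis=1) & ~observed.all(axis=1)).mean()
labeled_frac = labeled.mean()
```

With per-cell missingness this high, complete days become vanishingly rare even though most days are partially observed, which mirrors the 5% vs. 80% split reported in the talk.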
To deal with the missing data, which was the first challenge we had to tackle here, we decided to use generative models, namely hidden Markov models and mixture models, which learn the underlying patterns in the data and can estimate the distributions even in the presence of missing data, by marginalizing out the missing variables. We trained these models in a semi-supervised setting because we wanted to force the states of the model to recognize the different emotional states that we were considering. This means that some of the states of the HMM would always emit negative emotions and some would always emit positive emotions, for example, and this way we managed to link the emotions better with the behavioral data. After we trained these models, we used them to impute the missing values before performing classification. We did this by finding the most probable state sequence for a patient's observation sequence based on their behavioral data, leaving out the emotion information, and then generating samples from the most probable states to fill in the missingness. Moreover, we also experimented with including the latent representation provided by these models as additional features for our classifiers. And we tried different temporal and non-temporal models, comparing classical machine learning methods such as logistic regression and support vector machines with RNN-based temporal models. We also defined a hierarchical regression model to try to better capture the individual differences between patients. And here is a short summary of the results: we achieved, in some cases, more than 70% area under the ROC curve in the three-class and five-class problems, and as you can see, including the posterior probabilities significantly increased the performance.
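The marginalization trick is easiest to see with a diagonal-Gaussian mixture, one of the two model families mentioned: for diagonal Gaussians, marginalizing out missing dimensions simply means dropping them from the likelihood. The sketch below is a simplification of what the talk describes; it fills missing values with the most probable component's mean rather than sampling from a semi-supervised HMM's state sequence, and all parameters are toy values:

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal Gaussian, evaluated only on the observed
    dimensions of x (NaNs are marginalized out, which for a diagonal
    Gaussian just means dropping those dimensions)."""
    obs = ~np.isnan(x)
    d = x[obs] - mean[obs]
    return -0.5 * np.sum(d * d / var[obs] + np.log(2 * np.pi * var[obs]))

def impute_with_mixture(X, weights, means, variances):
    """Assign each row to its most probable component (using observed
    dimensions only) and fill the missing dimensions with that
    component's mean."""
    X = X.copy()
    for i, x in enumerate(X):
        logp = [np.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        k = int(np.argmax(logp))
        miss = np.isnan(x)
        X[i, miss] = means[k][miss]
    return X

# Toy 2-component mixture in 3 dimensions (purely illustrative parameters).
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
variances = np.ones((2, 3))

X = np.array([[4.8, np.nan, 5.2],     # clearly the second component
              [np.nan, 0.1, -0.2]])   # clearly the first component
X_imputed = impute_with_mixture(X, weights, means, variances)
```

In the HMM version, the component responsibilities are replaced by the most probable state at each time step, and samples are drawn from that state's emission distribution instead of using its mean.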
We also found that by accounting for these individual differences using the hierarchical model we could gain a bit of performance, but the results were comparable. Then, in the second project, which was a collaboration with psychiatrists from Mount Sinai in New York, we aimed to evaluate the usage patterns of social media and communication apps by patients during the COVID-19 lockdown period in Madrid, so a relatively short period. We wanted to relate this to their anxiety, measured using the seven-item Generalized Anxiety Disorder scale (GAD-7). We had a small cohort of only 95 patients here from the mobile sensing data collection. As I said, we focused on social media and communication app use, and alongside these we wanted to include other covariates that were relevant for anxiety in this situation, such as information about whether the patients had been exposed to COVID, whether they were living with someone who was an essential worker, and their perception of threats regarding their jobs, such as losing their job, and so on. When looking at the data, we could already see that there were different patterns emerging in the temporal data, and there was also quite a non-uniform distribution in the other features that we considered. Given the small data set, we wanted to keep it simple and try to find a way to combine this multimodal data into a relatively easy and interpretable model. As I said, the target outcomes here were the GAD-7 scores that the clinicians recorded via phone calls after the lockdown. This is basically a seven-item questionnaire where patients score how they feel in certain situations. We defined the cutoff at 10, based on the literature, because it is a common diagnostic value for screening, and in agreement with the psychiatrists it seemed to be a good cutoff.
We applied a simple pipeline where we reused this idea of having a hidden Markov model modeling the temporal data. But here, because we wanted to reduce the temporal data into a feature vector, we aggregated the state posterior probabilities that we found for every sequence and concatenated them with the covariates that we were considering, to then perform the classification with a simple logistic regression. We tried more complex non-linear models, but there was no improvement, so the simpler model also gave us some explainability. Another interesting thing was that the three-state HMM captured quite interesting patterns in the data. I don't know how well you can see it up there, but we found, for example, that state three decoded the parts of the data where there was low communication app use and average social media use, while the other states were capturing the more extreme values. And when decoding the sequences, we saw that state three was usually linked to those periods where there were no observations for a longer consecutive time. I still have one minute, so I will quickly wrap this up. The last project I worked on was trying to predict declining functionality. Here we also wanted to opt for a multimodal approach, where we combined the temporal observations of mobility descriptor variables with some sociodemographic information. Talking to clinicians and following their advice, we included covariates such as patient age, gender, employment status, and cohabiting status. So again, we have the same overview of the data with a lot of missingness. Here the temporal sequences were represented as half-hour windows for every feature, so basically a day is captured in 48 of these slots.
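Assuming that "aggregating the state posteriors" means averaging them over the days of each patient's sequence, the pipeline above can be sketched as follows. A GaussianMixture stands in for the three-state HMM to keep the example dependency-light (hmmlearn's GaussianHMM exposes per-step posteriors the same way via predict_proba), and all data, covariates, and labels are simulated purely for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy per-day app-usage sequences for 40 hypothetical patients
# (e.g. minutes of social media vs. communication app use), plus a
# binary covariate block standing in for the COVID-related questions.
n_patients, seq_len = 40, 28
seqs = [rng.normal(loc=rng.choice([0.0, 3.0]), size=(seq_len, 2))
        for _ in range(n_patients)]
covariates = rng.integers(0, 2, size=(n_patients, 3)).astype(float)

# A 3-component mixture stands in for the 3-state HMM; it is fit on all
# patient-days pooled together.
gm = GaussianMixture(n_components=3, random_state=0).fit(np.vstack(seqs))

# Average the per-day state posteriors into one fixed-length vector per
# patient, then concatenate the covariates.
posterior_feats = np.vstack([gm.predict_proba(s).mean(axis=0) for s in seqs])
features = np.hstack([posterior_feats, covariates])

# Toy binary GAD-7 >= 10 labels (alternating, purely for illustration).
y = np.arange(n_patients) % 2
clf = LogisticRegression(max_iter=1000).fit(features, y)
```

Because the averaged posteriors still sum to one per patient, the resulting features read directly as "fraction of time spent in each behavioral state", which is what makes the linear model interpretable.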
Our outcome of interest was the mobility domain of the World Health Organization Disability Assessment Schedule, which asks patients whether they had difficulty performing a given set of tasks in the past 30 days; they then get an overall score, and the higher the score, the higher the level of disability. We again categorized this, and here the cutoff is at 25%, where we consider the healthy cohort to be people with less than 25% disability, and everyone else was put together in the impaired cohort. Most patients only had one entry; only about 400 had two, and out of these 400 patients only 50 actually had a change in the score during the period of observation. So it was a bit tricky. Here we applied a deep learning pipeline, where once again we used the previously described generative approach to first impute the missing values. Once we had the imputed temporal sequences, we worked with a pipeline constructed of basically some LSTM layers and attention. We included the attention layers because we wanted to gain some insight into the patterns that are important for patients, and we wanted to be able to compare which temporal patterns the model would pay more attention to, for example, for a healthy versus an impaired patient. After we extracted these encodings or embeddings of the temporal data, we concatenated them with the sociodemographic information to produce the predictions. Our preliminary results were somewhat low, but promising. We tested the model on a different cohort of patients, and it did not manage to generalize very well. But as you can see, for example, on those plots, the important patterns in the data vary between patients, and this is something that we still want to look into. Okay. I also had a couple of sentences about the ethical problems with this kind of research: we have to keep in mind that we are talking about the mental health of patients, and this is a field that can easily be abused.
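The attention step can be illustrated as a pooling operation over per-slot encodings. The sketch below assumes a standard additive-attention scoring function; the random matrix H stands in for LSTM outputs over the 48 half-hour slots of one day, and the weight matrices and demographics vector are purely illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_pool(H, W, v):
    """Additive attention over a sequence of hidden states H (T x d):
    score each time step, softmax the scores into weights, and return
    the weighted sum plus the weights themselves (the weights are what
    can be inspected to compare healthy vs. impaired patients)."""
    scores = np.tanh(H @ W) @ v      # one score per time step, shape (T,)
    weights = softmax(scores)
    return weights @ H, weights

rng = np.random.default_rng(0)
T, d, a = 48, 16, 8                  # 48 half-hour slots per day
H = rng.normal(size=(T, d))          # stand-in for LSTM outputs
W = rng.normal(size=(d, a))          # illustrative attention parameters
v = rng.normal(size=a)

context, weights = attention_pool(H, W, v)

# Concatenate with sociodemographic covariates (toy values: age, gender,
# employment status, cohabiting status) before the final classifier.
demographics = np.array([34.0, 1.0, 0.0, 1.0])
fused = np.concatenate([context, demographics])
```

Because the attention weights form a distribution over time slots, plotting them per patient gives exactly the kind of per-patient importance patterns mentioned in the talk.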
And there are certain aspects that have to be considered for safety. But as I'm running out of time, thank you very much. These are my collaborators. Thank you.

Would anyone like to start? Maybe I have a question. Our lab is working a lot on another type of time series, ICU data sets from intensive care, which seem very similar, at least on an abstract level, to what you're modeling. So have you thought about applying these techniques to intensive care data sets like MIMIC or eICU?

Well, actually, in a small experiment during my secondment at Siemens, I tried using the HMMs for electronic health record data, so lab measurements as well, and they did not seem to work that well, especially when the time between measurements is very large. And of course, the different lab values there were collected at different time intervals. Here the time intervals are more aligned, because we have this summary of daily or half-hourly behavior, so that makes it a bit easier. These time series are quite nicely behaved; they are regular frequency-wise, and even though there is a lot of missingness, in theory we always have these 48 time slots.

Good. Is there another question or comment? If not, then we thank Masha again for her talk.