but building successful collaborations with health data. Okay, thank you. So my name is Tempest van Schaik. I'm a machine learning engineer at Microsoft, working with healthcare data. Just the usual disclaimer: I'm going to be talking about my personal views, not necessarily the views of my employer.

So I'm going to talk about building successful collaborations with health data, because I have witnessed a growing interest in working with health data and in building machine learning projects with healthcare data. I'm hoping to give some advice and insights that you can use in your own projects, if anyone here is interested in health data. We'll first take a look at what makes healthcare data quite so tricky, unique and interesting. Then we'll look at the important point of data people like us relying on medical experts. And lastly, we'll look at some of the cultural differences between data and tech people on the one hand and healthcare people on the other, and how we can better understand those differences and play nicely together in our collaborations.

So first of all, let's take a look at what makes health data quite so tricky. The most obvious thing that comes to mind is its sensitivity. When we're working with this data, security is really important, as we saw in the talk prior to this one. It's really important to work securely with the data and to maintain the privacy of the people in your dataset. Often healthcare data is also tied to consent. I've experienced working with genomic data where someone had consented to have their data used for cardiovascular research, and we couldn't use that data for cancer research, because their consent had specifically outlined only one type of research. So that's another interesting aspect. And obviously the consequences of the decisions we make with healthcare data are extremely serious.
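Coming back to that consent point: as a minimal illustrative sketch (not from any real system; every field and function name here is hypothetical), consent scope can be made machine-checkable at the start of an analysis with a simple filter:

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """One participant's data plus the research purposes they consented to."""
    participant_id: str
    consented_purposes: frozenset
    data: dict = field(default_factory=dict)

def records_for_purpose(records, purpose):
    """Keep only records whose consent explicitly covers this study's purpose."""
    return [r for r in records if purpose in r.consented_purposes]

cohort = [
    Record("p1", frozenset({"cardiovascular"})),
    Record("p2", frozenset({"cardiovascular", "cancer"})),
]

# p1 consented to cardiovascular research only, so a cancer study
# may use p2's data but not p1's.
usable = records_for_purpose(cohort, "cancer")
print([r.participant_id for r in usable])  # prints ['p2']
```

In a real project this check would of course sit inside proper governance and access-control processes; the point is simply that consent scope can be encoded and enforced before any analysis begins.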
So if we use healthcare data to train a machine learning model to determine how long somebody should wait until they get their treatment, or how much we should spend on somebody's treatment, or which treatment they should get, those are obviously very serious decisions.

Data is not the new oil. There was this saying a couple of years ago that data is the new oil, and I think it's very regrettable. Health data in particular is very, very precious, and we're extremely privileged if we get the opportunity to work with it. I've noticed this very acutely when working with clinical trial data, where you feel very grateful for each participant in your trial and you really treat that data as a precious gift. In fact, the average phase-one clinical trial registered on ClinicalTrials.gov includes around 75 patients, each of whom costs around $15,000 to recruit and to look after on the trial.

To continue with the things that make health data tricky: it very often exists in a non-digital form, and we can't do exciting machine learning unless it has been digitized. That leads on to the quality of medical data. When it is digitized, it's often missing or poorly captured. But if we think about why this is the case, we may be a bit more sympathetic. As data scientists, it's frustrating to have missing values, but imagine a nurse in a busy ER: their priority is not entering data correctly, it's saving lives.

Labeling health data for the purpose of training is also particularly difficult. When we have images of cats and dogs, it's easy to label them as cat or dog, but when we have a panel of healthcare data, it's very difficult to define what makes someone healthy and someone not healthy. So labeling is often one of the most difficult parts of a health data project, and one of the only health outcomes that everyone can agree on is whether someone is alive or not.
Health data also requires a huge amount of expertise to understand; we can't just walk into it and get going. Here I'm showing that there is a huge number of published machine learning algorithms in existence. A smaller set of those are actually useful algorithms that solve a useful problem. An even smaller set are algorithms that make it into clinical practice, because there are a lot of barriers along the way, a lot of regulation, for example. And an even smaller set still are algorithms that are available in clinical practice and are actually used. That last gap often comes down to usability and user experience. I've had an experience where I was in a surgery, wearing scrubs, and I handed a medical device to a surgeon; they tried it out, didn't like it, and gave it back to me because it wasted a couple of seconds, and they just don't have seconds to waste. When you're designing apps for clinicians, if they're used to clicking once and you make them click twice, they may not actually use the app, because their time is very, very precious.

So we've had a look at what makes medical data and medical algorithms so tricky. Now I want to talk about how important it is for data people like us to rely on medical experts. I would say that we should be wary of health data publications that have no health experts on them. I've seen this even in journals like Nature, where computer scientists will get hold of some real medical data, do an analysis and publish their conclusions without ever consulting the relevant healthcare experts. Some of the healthcare experts I've spoken to have been really shocked and disappointed, because the results are not right; the computer scientists just did not have the context, and it's embarrassing. So let's all avoid doing that. However, that doesn't mean that you should just get a doctor on your publication for the sake of it.
This is a common question that gets posed to me by people who are starting their healthcare data projects. They say: "How do you recommend collaborating with doctors when the time comes to write the actual paper? I made a network that detects tumors, but I'm not sure how to split the paper with the doctor." This is the wrong approach. You don't just include the expert at the end; it's really important to have that domain expertise from the beginning. Another thing I would say is to make sure you're solving a real problem. If you had collaborated with a medical expert from the beginning, you could have checked: do they actually need a network that detects tumors? Is that a real problem, or one that you just assumed or made up?

So it's important for us data people to collaborate in a multidisciplinary team. It's not just us data people: there are data scientists, software engineers who actually put the models into production, and maybe user experience designers and product developers. Those are the tech people on the left. Then we have clinical researchers, who might be in a university or in the R&D department of a pharmaceutical company; people who play a data governance role, maybe regulators like the FDA or the institutional review board of a university. And then, on the other side of the wall, the healthcare professionals and patients, who are all stakeholders. I've drawn them on the other side of the wall because, as tech people, we often don't have direct contact with patients or even doctors; we often go through some sort of clinical researcher. I'm going to lump these stakeholders together as the tech people and the medical experts. When entering a collaboration, it's important that data scientists and software engineers are humble and respectful of domain expertise.
It has been quite interesting during this coronavirus pandemic to see that suddenly everyone is an epidemiologist. Software engineers and data scientists get hold of some code, run some models and then have strong opinions about epidemiology, sometimes going against experts who have been studying this field for decades. It's important for tech people not to judge a fish by its ability to climb a tree. When we enter into a collaboration, if we judge our collaborators, the medical experts, on their use of technology and fail to see the expertise that they bring to the collaboration, that is a big failure.

I've also noticed during the coronavirus pandemic that there has been a world-leading epidemiologist who has had his code criticized very publicly by software engineers, who say that his whole epidemiological model is wrong because they don't like the look of his code. Here I have a screenshot of a tweet saying that he should have used expressive class names; they don't like the way his arguments are passed; he has "amateur dev traits"; the code has a lack of modularity; and so on. But that doesn't mean that he's not an expert in his field and someone that we can learn from. That's not the right attitude to go into a collaboration with.

So I'm going to mention one project I've worked on recently with health data, called Project Fizzyo. Fizzyo is an ongoing study of children with cystic fibrosis. Cystic fibrosis is a very serious genetic disease, mainly affecting the lungs. The main way the disease is treated is for the affected children to do coughing and exhalation exercises every day to clear their lungs of mucus and actually cough the mucus up. It's a very demanding kind of physiotherapy to do.
So we have a collaboration between Microsoft, the UCL Institute of Child Health and Great Ormond Street Hospital, where a team at Microsoft developed custom pressure sensors to go into the device that the children do their exhalation physiotherapy into, as well as custom video games, with the pressure sensor actually controlling the characters in the games. Our hope is that gamifying this physiotherapy will make it less of a burden on the children, because some of the children actually dislike their physiotherapy more than they dislike having this very serious disease.

My team did the data analysis for this study and set up the pipeline for processing the huge amount of time-series data that comes out of it. What I've shown here are clusters of breaths: the child breathes, then pauses to cough, then does their breathing again, and so on. I think this was a particularly successful collaboration because the clinicians we worked with were highly engaged in the data science. As well as myself as a data scientist, we had software engineers who helped build the pipelines and put this into production, and experts in child health, in cystic fibrosis and in physiotherapy. They actually attended all of our daily engineering stand-ups and all of our weekly meetings too, which is incredible. I've never had such a huge amount of engagement from clinicians, and that was great, because we were able to involve them in every decision about every data point that gets excluded, and every feature that is used or not used in an analysis.

So now I'm going to talk about some of the cultural differences I've observed between tech people and medical experts, and give some advice about how we can all work well together. One major cultural difference is how we view technology, risk and progress.
I think as technologists we see new technology as something that will improve our lives, something exciting, something to work towards as a sign of progress. But I did a project with a medical regulator whose concern is patient safety, and they had a very different worldview, which was fascinating. When they approve a new medical device, they see the new accidents that will happen: new ways that the device could malfunction or be used in unexpected ways, a kind of flood of new accidents and new harms. They want to keep patients safe. That's a very different worldview, and I think it gives quite an interesting insight into the way this community thinks. "Move fast and break things" was a popular phrase in tech and startups, but you can see how different it is from the medical community, which moves very slowly, very cautiously, and does not want to break anything, especially in healthcare.

So how can we be good collaborators? I think medical researchers should be engaged, curious and open to learning, to trying to do something in a different way, to trying out new technology. Ideally they should be able to commit time; their time is very precious, but the more time they can put in, the better. I'd also like to see more medical organizations investing in in-house data skills. I don't mean doctors learning Python; I mean medical organizations, or indeed departments, hiring data scientists and data analysts, so that they can do things in a different way and take ownership of the analysis, rather than collecting data and handing it to another organization to analyze. And finally, and I can say this having spent time in academia, not letting perfect get in the way of progress. Often researchers are somewhat overwhelmed by the complexity of the disease area that they work in.
I feel like engineers are quite good at coming in, breaking down a problem and starting to build something simple first, just to get going.

For the tech people, I would say that we are responsible for using data in an ethical way. It's extremely valuable to have lawyers, sociologists and philosophers who are experts in ethics involved in this conversation, but it's also crucial that data scientists have a seat at the table, because we understand the data and the algorithms, their limitations and what they can do, better than anyone else. So it's really important that we're actively involved in this discussion and that we flag anything in our work which does not seem ethical. That really is our responsibility. We should be humble and respectful, and rely on domain expertise; engineers in particular should not be judging a fish by its ability to climb a tree. We should be inviting researchers to our meetings, into our planning documents and even into our code repos, so that they see what we're doing and are involved in our decisions. We should be patient in a tech-phobic environment, because we understand why the environment is conservative and cautious. Data scientists should be prepared for messy real-world data, and they should be prepared to balance statistical or machine learning metrics with what the clinician wants and what makes sense to the clinician. And data scientists should be obsessed with building something useful.

So hopefully that overview was useful. What I hope you take away from this talk is that medical health data is tricky for a number of reasons, that data people must include medical experts in their projects from the beginning, and that you can expect some cultural differences between collaborators on a project; if you recognize them, hopefully we can all play our role in collaborating nicely together. Thank you. You can find me on Twitter, and you can find out about my team and about our open jobs too. Thanks.
Thank you for that lovely question.