Brilliant, so my name is Wilhelm, I work at Babylon Health, and it's a real privilege to be here, so thank you very much for having me. To provide affordable and accessible healthcare to everyone on Earth: that is the mission of the company I work for. And it's been a real privilege over the last two years to spend the majority of my waking life trying to solve this problem. When I joined two years ago, we were a really small company, there were about 40 of us, and we really honed in on the accessibility part of that statement. Because at the time, if you were sick, you somehow had to physically transport your sick body to some health epicentre where there's a high concentration of healthcare professionals who can help you with your illness. And in an age where everything from information to cat videos was highly, highly accessible, it seemed well within our reach to make healthcare a significant degree more accessible by leveraging the internet, smartphones and modern technology. So we went for it, and in July 2015 Babylon launched its answer to the accessibility question: a telemedicine application that essentially allowed people to have a GP consultation with a doctor over the phone, plus the whole healthcare service that goes around that. After the team recovered from doing literally blistering hotfixes while our CEO was stepping up onto the stage at the Royal Albert Hall, everyone relaxed and patted themselves on the back for making healthcare a significant degree more accessible. But the affordability part is the tricky part, because two-thirds of the cost of healthcare goes to the salaries of doctors, nurses, other healthcare professionals and so on. So we needed to figure out how to solve this problem. If we want to make healthcare more affordable, we have to somehow reduce the cost of healthcare professionals. One way to do that is simply to increase the number of doctors.
And that seems like a reasonable approach. But some people estimate that we are 5 million doctors short of providing affordable and appropriate healthcare to everyone on Earth. And not just that: doctors are expensive to train, they take a long time to train, and it's a very difficult training process. So this approach just isn't going to work for us in the short term. So affordability is really tricky. We had made good progress on accessibility, and we just put our heads down, fixed bugs, added features and went on our merry way. But then one day Ali, our CEO, watched a movie. And ever since that fateful day, this question came up again and again, in meeting after meeting, as an idea started taking shape in his mind. They say it takes 20 repetitions before something starts to sink in, and after his relentless repetition the idea started to take shape in the imaginations of the engineers as well. And the idea is basically quite simple. We know the world is millions of doctors short, and doctors are inherently difficult to scale. But we know something that is easy to scale: machines and software and computers are trivial to scale. So if we could just figure out a way to put some of a doctor's brain inside a computer, just one computer, then we could scale it really easily and iterate on that. This was a really daunting idea: how are we going to do this, medicine is so complex? So before we did anything, we built a demo. We wanted to see how people would take to this idea, how people would feel about sharing their healthcare concerns with an app. I was on the iOS team at the time when we built this demo.
And while I was building this demo, I kept saying to Ali: Ali, I don't know how we're going to build this. The stuff we're suggesting we can do in this demo, I've got no idea how we are going to do it. But the reception of the demo was amazing. People loved it. People were so encouraged that there were people thinking in this direction, and by the implications it has for healthcare around the world. That gave us enough motivation and inspiration to step in, without knowing how we were going to do it, and see what we could do. So I'm going to take you through a bunch of the things we did and the challenges we had, and at the end I'm going to talk a bit about the role that Python played in this whole journey. At the end of each section I'll share some of the lessons we learned, because some things we did, we were like, wow, that works surprisingly well, and lots of things we did, we were like, wow, that was a really bad idea. So let's start with medical triage. For those who don't know what medical triage is: you arrive somewhere with a set of symptoms and someone figures out what you should do. Should you go to the hospital, go to the pharmacy, stay at home and monitor for a couple of hours? It's the step below diagnosis: just suggesting what you should do next. So we knew we had to do this. We knew we had to be able to take medical queries and suggest to people what they should do next. And building this medical triage system was a real roller coaster ride, because we started off thinking, oh, it's just a basic decision tree: we build up the tree with some doctors to help us, and then we walk down the tree until there are no more questions to ask and we can give the person the outcome at the end of that branch.
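The walk-down described above can be sketched in a few lines of Python. The nodes, questions and outcomes here are invented for illustration; they are not Babylon's actual tree:

```python
# Toy triage tree: internal nodes ask a question, leaves carry an outcome.
# All node names, questions and outcomes are made up for this sketch.
TREE = {
    "start": {"question": "Do you have chest pain?",
              "yes": "urgent", "no": "fever_check"},
    "fever_check": {"question": "Do you have a fever above 39C?",
                    "yes": "see_gp", "no": "stay_home"},
    "urgent": {"outcome": "Go to A&E now"},
    "see_gp": {"outcome": "Book a GP appointment"},
    "stay_home": {"outcome": "Stay home and monitor your symptoms"},
}

def triage(answers):
    """Walk the tree until a leaf; answers maps node name -> 'yes'/'no'."""
    node = "start"
    while "outcome" not in TREE[node]:
        node = TREE[node][answers[node]]
    return TREE[node]["outcome"]
```

In the real system, as the talk goes on to explain, each question also carried weights and intermediary logic, which is part of what eventually pushed the team towards a graph database.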
But then the roller coaster dipped, and it was like, oh, but this question is way more important than that question, and when these two questions are answered in combination they nullify each other, and our minds were just blown by the complexity of the medical sphere. Then we came up again, because we had some really good data scientists who helped us get past some of these problems. And we built it. Essentially, without going into the details of how we weight questions and that kind of thing, it is just a decision tree that you walk down, with some intermediary logic in between. We built the initial version in PHP with MySQL and it just wasn't working for us. It was clunky, it was difficult to change, and some of the changes were quite risky. So we scrapped the whole project and rebuilt it with Python and a graph database. This is a picture of the CMS that we used for this graph database, showing a small corner of this very, very large decision tree. So why did we choose a graph database for triage? I would love to say that there was no hype involved in our decision, but the truth is that at the time graph databases were all the rage. Even now there's a lot of hype about them, and they're solving interesting problems, and we were keen to get our hands on the technology. But there are some good reasons that a graph database is applicable for this use case. In this decision tree there's a massive sense of hierarchy: branches, subbranches, subtrees, et cetera. That kind of data model is normally a good indicator that a graph database might be applicable. The next thing is that a graph database has all the semantics we need to reason about a decision tree, straight out of the box. That CMS I showed you: we didn't build that GUI ourselves.
Those are just graph database semantics, where you have nodes and relationships, relationships have directions, all the things we have in our data model. So we didn't have to build a mapping from a relational database to our decision tree. And because the semantics mostly match one-to-one, some of the tooling that comes out of the box, the CMS being an example, was really helpful without a lot of work on our side. So what are some lessons learned? The first, and this is true of all new software technology: try not to get carried away by the hype. When these new technologies come out there's lots of attention, which means lots of hype on Twitter and so on, and if you make an uninformed decision on behalf of your company, it can end really badly. The second thing, and this is specific to graph databases: don't get carried away by the cool animations. I've never seen a technology sell itself so well, with such cool animations, that it looks like it's solving problems you can't even imagine solving in your traditional way of working. So when should you use graph databases? The single most important question you need to answer when considering a graph database is the following: are relationships the most important part of my data model? If the answer to this question is maybe, or sometimes, or I suppose so, you're probably going to have a bad time. But if relationships really are the most important part of your data model, there is just so much you can do with a graph database. The reason this question is important is that in traditional databases, like relational databases, relationships are represented and handled with joins. And joins can be expensive.
And I say can deliberately, because there are many kinds of joins: hash joins, merge joins, index joins, and other joins that I'm sure I don't know about. Databases have done a lot of work to optimise these processes, but to get the most out of them you often find yourself writing very specific queries, which makes your data model, and the integration with your database, quite brittle. With graph databases, because relationships are first-class citizens, as long as relationships remain the most important part of your data model, you're going to have a wonderful time. The second lesson we learned is that data modelling is extremely important. This is very true in NoSQL databases generally, but especially true in graph databases, and we unfortunately learned it the hard way. So we did our medical triage with a graph database, and because that worked so well, we also decided to use a graph database for our chatbot data model. What you can see here is a typical thing you would find in our database. The red node is a user. The green nodes are conversations. The blue nodes are elements inside a conversation, an element being basically everything that's in the conversation: mostly messages between users, as well as conversation status updates and the like. Ignore the purple nodes; we'll just call those hacked nodes for now. So what's wrong with this? The first thing, and you can't see it from the image, is that the node closest to the conversation node is the most recent element in the conversation, and the one at the end of the chain is the furthest in the past. The reason this is bad is that every time we add a new element to the conversation, multiple operations happen: we have to delete one relationship, change the status of another relationship, then create a new relationship. It's a really heavy and error-prone operation.
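To make the contrast concrete, here is a toy Python sketch of the two conversation models; the structures are invented for illustration, not Babylon's actual code:

```python
# Linked-list model: the conversation points at its most recent element,
# and each element points back at the previous one. Adding a message
# means several dependent mutations; in a graph database each one is a
# separate relationship delete/create, which is heavy and error-prone.
def add_element_linked(conversation, element):
    element["previous"] = conversation["latest"]  # re-point the old head
    conversation["latest"] = element              # swap the head pointer

# Append-only model: elements carry a timestamp and ordering becomes a
# query concern, so writing a message is a single insert.
def add_element_append(conversation, element, created_at):
    element["created_at"] = created_at
    conversation["elements"].append(element)

def history(conversation):
    # Oldest first, recovered by sorting rather than chain-walking.
    return sorted(conversation["elements"], key=lambda e: e["created_at"])
```

A crash between the two mutations in the linked-list version leaves the chain corrupted, which is exactly the failure mode the talk describes.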
And not only that, it exposed some bugs in Neo4j, which they have since fixed, but at the time, when you deleted relationships it didn't free up the disk space, and because we were deleting relationships like crazy, eventually our disk space got used up. So what else is wrong with this? You can't really see it here either, but the red nodes are users, and initially we wanted to represent Babylon, the chatbot, as a user, because it felt correct. Everyone who comes into our chatbot initially chats to the Babylon chatbot user, and sometimes a doctor comes into the conversation. But because most users were talking to Babylon, the Babylon user node had a relationship to every single conversation node. And that means every conversation node in our entire graph database was only one node away from every other conversation. In graph databases this is called a supernode, and a supernode is really bad. It's really bad for performance, and there are a bunch of ways to work around it, but none of them feels good, and you really just don't want to go down that route. And all of this we learned in production, which you really don't want to do, and we are way, way more hesitant now about adopting new technologies, making sure we do a proper investigation before we dive into something. Another reason data modelling is important is that in graph databases, migrations are not trivial by a long shot. What you see here should be one linear chain of elements in a conversation: you said this, Babylon said this, you said this, one chain. But what you can see is just craziness, nodes responding to themselves and so on. And this was the result of a Cypher query.
Someone committed a change to one of the queries and it messed up the way we were adding new elements to the conversation, and it was causing problems for users in production, so we had to fix it. Writing a migration script that fixes this is not easy, because the damage is indeterminate and it's just really tricky. So you really want to pay a huge amount of attention to your data models, so you never find yourself in this kind of situation. The third thing we learned is basically that the grass isn't always greener on the other side. Medical triage is perfect for graph databases, but the chatbot, not so much. So we're basically migrating the chatbot data model over to Postgres. And as we come back to Postgres, it feels analogous to the prodigal son returning: we've been off in Neo4j-land being promiscuous, and Postgres, who has always been solid for us and hasn't done anything wrong, is greeting us with open arms. Things that used to feel restrictive now feel amazing. Constraints on your columns are just blowing our minds; familiar querying strategies; sharding that is easy to do in relational databases; database migrations are amazing. All the tooling around it, like SQLAlchemy and the general tooling around Postgres and relational databases in Python, it's just so great to come back to all of that. So yes, the grass isn't always greener on the other side. Okay, so we have medical triage. Brilliant. We launched it and it did really, really well. We pitched it against real-life A&E nurses and doctors and it did really, really well. We were really proud of it and felt this massive sense of achievement. But, as Ali reminded us, we were still a long, long way from where we wanted to be.
So we knew we had to build a chatbot, and we have since found that building a chatbot is really difficult, and the reason is that natural language is really difficult. Natural language is one of those problems you approach already imagining it's going to be hard; you have this idea in your head that it's going to be really difficult. But as soon as your skates touch the ice, you realise you've totally underestimated how difficult it's going to be. The reason it's difficult is that the scope is just so huge. You start solving one problem and you realise, well, this implies this, and that implies that, and the next minute you're trying to fit the entire semantics and linguistic rules of a language, and everything that happens behind the scenes, into your head. The biggest problem we had to overcome initially was figuring out how to reduce the scope of natural language understanding to something we could take a bite at, ship to production, and serve our users with. So what did we have to do? The first realisation was that we are not interested in helping you decide whether or not you should grow a man bun. We aren't going to help you with that problem. We want to hone in on helping you with medical queries. So that's over there. Over here we have a set of services: we have this very well performing medical triage; we have a massive library of content about different diseases; and we have doctors who are ready to handle your query if our machine can't. So we have this very contained scope over here and these services over here. And when you put it like that, all of a sudden it becomes a machine learning problem, because we have an infinite space of inputs and a defined set of outcomes.
This is called intent recognition. We wanted to take the inputs and map them to some output we could act on, and if the mapping was not good, to some generic fallback that would redirect you to ask us medical queries. So if it's a machine learning problem, don't we need data? And medical data isn't easy to find. But we were really lucky, because we already had a system running in production called Ask, in our telemedicine application, where you could send a single message to a doctor and they would reply in a couple of hours with medical advice on what you should do. And the doctors and nurses who were doing this had the good sense to tag each question and outcome. We had thousands and thousands of these queries, so by the time we realised we wanted to do this, we had a really rich data source ready to train on. So: a rich data source, some machine learning scientists and engineers, that's all you need, right? Not quite, because the inputs coming in were still free text, and free text is inherently unstructured. We had to figure out a way to take free text and represent it in a way our models could understand. Our machine learning guys did a bunch of work building multiple different classifiers to transform the free text in slightly different ways: some were bag-of-words and word2vec approaches, others passed extracted medical entity concepts into models. At the same time, they built some classifiers around our content and our outputs to make those easier to map to. Essentially, what they are trying to do is take a sentence and put it somewhere on a map, and the hope is that sentences that are very similar will cluster together on that map, very much like points on a 2D map. The difference is that these maps had 380 dimensions.
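As a toy version of that mapping, here is a bag-of-words sketch in pure Python. The intents, example phrases and threshold are all made up, and the real system used learned embeddings (word2vec-style, hundreds of dimensions) rather than raw word counts:

```python
from collections import Counter
import math

def embed(text):
    # Crude bag-of-words "embedding": word -> count.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented intent "centroids", one example phrase per intent.
INTENTS = {
    "triage": embed("I have a headache and a fever what should I do"),
    "content": embed("tell me about diabetes symptoms"),
}

def classify(query, threshold=0.2):
    scored = {name: cosine(embed(query), vec) for name, vec in INTENTS.items()}
    best = max(scored, key=scored.get)
    # Low-confidence queries fall through to the generic fallback.
    return best if scored[best] >= threshold else "fallback"
```

The fallback branch is the key design point: when no intent scores well enough, the query is redirected rather than guessed at.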
So this is a simplified set with some test data. What you can see is that in the top left, the red ones are all about your neck or something like that. On the right we have some about hands and feet. And in the middle, those queries are about urination and pooing, I don't know what the politically correct way to say that is. But it's unclear to our models: is this about digestion, or your abdomen, or reproduction? So some parts are quite hairy, but you can see that some things cluster really well together. The guys iterated on their models and got to a place where they were doing really, really well: handling our queries with 95% accuracy and 99.8% safety. I'll explain what those percentages mean in a bit. And they used a bunch of different technologies that are available to you when you code in Python. They used TensorFlow. They used a combination of RNNs and other deep nets. And they got to a place where they were really happy with their entity recognition classifiers. So we were doing really well. We were taking some of these medical queries and just knocking them out of the park, handling them so, so well. But then we would get a really simple query and totally bomb out, and look so, so stupid. And then we would get another query that we had an obligation not to miss, but we didn't really have good data for these queries. So the engineers told the machine learning guys to pipe down, they'd had their time, we're just going to regex this problem. Well, we didn't quite do that. We used a tool called RiveScript, which is essentially a linguistic DSL that helps you construct linguistic rules to catch huge variations of sentences and map them to a specific outcome. So this is an example. This is a query about...
what looks like abuse or child abuse or something like that. And it handles a huge variation of sentences in this one rule and maps them to an outcome we want to provide to the user. We wrote hundreds and hundreds of these rules to handle those non-medical but important queries. We're actually busy moving away from that now, because the machine learning guys have had the chance to build things that are more generic, that don't need as much data, and that can handle these simple queries better than the rules can. We learned that testing chatbots is really difficult. Since doing this I've become really interested in how search engines test their search engines, because you just have an infinite space of inputs. Unlike a website, where you have a bunch of buttons and only a finite number of combinations in which they can be tapped, so you can write comprehensive tests that make you really confident your system is working, with an infinite input space you can make a change and not be sure what it has done to some corner of that massive, massive space. We couldn't come up with a single way of testing that gave us huge assurance, so we went for many, many layers of testing. The first thing we did was machine learning model validation, which is essentially putting some of your data aside as a test set, training your model on the training set, and then passing the test set to your model to see how well it's doing. That's where the 95% accuracy comes from. We then wrote thousands of RiveScript examples to test that linguistic DSL. We also created a client mock system that pretends to be a client and, with a BDD approach, tests loads of different interactions throughout the chatbot.
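A RiveScript-style rule is essentially a wildcard pattern mapped to one canonical outcome. A crude Python stand-in (the patterns and outcome names here are invented, and this is not the real DSL) looks like this:

```python
import re

# Each rule: a wildcard-ish pattern mapped to one canonical outcome,
# tried before (or alongside) the ML classifiers. Patterns are examples
# invented for this sketch.
RULES = [
    (re.compile(r"\b(i am|i'm) being (hurt|abused)\b", re.I),
     "escalate_to_safeguarding"),
    (re.compile(r"\bgrow (a )?man ?bun\b", re.I),
     "polite_refusal"),
]

def match_rule(message):
    """Return the outcome of the first matching rule, or None to fall
    through to the machine learning pipeline."""
    for pattern, outcome in RULES:
        if pattern.search(message):
            return outcome
    return None
```

The real DSL compiles far richer patterns (alternations, optionals, captured wildcards) and sorts rules by specificity, but the shape of the mapping is the same: many surface forms, one outcome.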
Before making major releases we did clinical validation tests, where we would get a big number of doctors into the office to use the chatbot, go through examples, and generate a report on how clinically safe it is. And we do post-market surveillance: when the product is out there, we have doctors whose main job is to take a sample each day, run through each example as if they were handling the query themselves, and decide at the end whether or not it was handled safely. That's where the 99.8% safety figure comes from. The second lesson we learned is that the chatbot is only as good as your content. We initially thought natural language was going to be really difficult, and it is, and we put huge effort into solving that problem. But when our first prototypes started coming out, we realised this doesn't feel good: even though our natural language understanding was really good, it didn't feel good, because our content wasn't optimised. We hadn't spent that much effort on our content, and we've learned since that you need to improve both at the same time, because good content without good understanding is really bad, but good understanding without good content is really bad as well. So now, as we improve the ML stuff, we're also improving our content, so that the whole chatbot gets better altogether.
Then I asked the machine learning guys: I'm doing this talk, what can I tell these people that you've learned, that doesn't give away the cool things you're currently working on that are still under IP control? And they said that, basically, standard classification doesn't scale the way we've done it here, and the sentence they told me to use is that information retrieval is better than classification. The reason classification doesn't scale is that each time we want to make a small change, we have to retrain all our models, and sometimes the new feature we want to add doesn't quite fit the way those models were constructed. It's just a very slow process, and we want to be able to iterate on our chatbot really fast. So, we have the natural language stuff and we have the medical triage; we're doing really well, but we had no way yet to communicate with mobile clients. We had this really complex stuff happening in the back end, but no uniform way to expose it to our mobile clients. So, and this is a simplified picture, there are more services around this, the dispatcher is where the understanding happens; we have leaflet and medical entity extraction services that it talks to, and we have the RiveScript DSL service. We needed to build an API that distilled all this complexity in the back end into something consistent for our mobile clients to understand. Again, this is massively simplified; we have about 35 services running in production. But it was this piece, taking all this complexity and mapping it into something simple for our clients, that was particularly difficult. And one of the lessons we learned is that 'just' was an evil word in this application, because we started off like: oh, we're going to build this chatbot, it's going to be easy, it's just request-response, we can just use HTTP, it's perfect. But it wasn't quite like that,
because we had to use WebSockets, and it wasn't just WebSockets, we also had to bring APNs and GCM in there. And initially we thought, oh, it's just one person talking to another person, but there are other people, and there are doctors, and there's authorisation of elements, and our heads just exploded with how much more complex this was than we thought it would be. A really useful tool that helped us and the clients build at the same time was Connexion, which is basically a wrapper around Flask that helps you generate Swagger documentation that your code depends on, which means your Swagger documentation is always up to date. How it works is that you build a Swagger YAML file. If you don't know what Swagger is: Swagger is a standard for API design and documentation, a standard way of representing what your API does, and you can write it in JSON or YAML. So you build your Swagger file, Connexion maps the endpoints in that Swagger file to the entry points in your code, and then it does lots of cool stuff for you. This is an example of our conversation model; you can see there are strings and bools and arrays referencing other objects. Connexion does that type validation for you, so you don't have to do it in your code. It also means that, because Connexion is doing that, you're really tied to keeping the documentation correct, because it impacts the way your code runs, which meant our documentation was always very, very good, which was very useful to our mobile clients. And what you get then is documentation, and not just documentation: you get an API client that your mobile clients can use to test out your API. So, cool. The last thing I want to talk about is the role Python played in all of this. I can really confidently say we wouldn't have done as much as we did, in the time we did it, without something like Python. And some of the lessons
we learned were not surprising but were really welcome and resoundingly true, and some of the lessons were quite difficult and not ideal. One lesson is that maths and Python are a match made in heaven. There are just so many amazing tools that take really complex things and make them, not easy, but really accessible, and there's a huge community around them, with loads of tools and tutorials on the internet to get you on board. So if you have maths-heavy workloads, Python is an excellent choice. And not only that: Python and the web are also a really good combination, because again you have all these amazing tools, everything from batteries-included frameworks like Django to simpler micro frameworks like Flask, and everything in between. You have native support for modern technologies like Apache Spark, database drivers, all of that; the support is just really solid, which makes you really confident about building some of these complex systems. One of the bad things we learned, though, is that if you have async-heavy workloads in Python, you're probably going to get caught out at some stage. Even if you know what you're doing, something is probably going to catch you out, because of all the specific knowledge you need about how async works in Python. This is a screenshot from Wireshark. And why were we inspecting packets, you might ask? No one wants to do this. But we had an issue where roughly one request in a thousand hit an SSL error: our load balancer was just throwing an error, and the SSL errors were different each time, and for the life of us we couldn't figure out what was causing the problem. So we went down into the packets to figure out what was happening, and basically what's happening here is
the client sends a hello message to the server, there's the certificate and key exchange, cipher spec and handshake messages, and then, right in the middle, the client just says hello again, and the receiving server is like, whoa, I wasn't expecting that: fail. We couldn't figure out why this random extra hello was getting into the middle of the SSL handshake, and we still don't know exactly why, but we did find out that there's a compatibility issue between multiprocessing and eventlet, and eventlet was the underlying async framework our web framework was using, and it was just causing some craziness. Another problem we ran into, when using aiohttp, was that once a week an internal server we had running would run out of memory and crash. There were some specific implementation details we hadn't completely read through or understood, where you had to release the response object if there was an error, otherwise the memory just hangs around. It's stuff you really don't want to be dealing with, but when you're doing lots of async work in Python, you're probably going to find yourself dealing with these kinds of weird things every now and again. So the last thing I want to say, and I don't mean to be controversial at a EuroPython event, is: don't look to Python for your main sense of engineering achievement. My relationship with Python is very much like an unstable romantic relationship, where I move from being infatuated to being totally in love on a daily basis, because I'll be using really cool stuff like generators and decorators and context managers and coroutines, and then I'll get something like this. This is just a function that has both a return and a yield statement in it, and if I were reading this for the first time, I would imagine that the yield statement would just get
ignored and it would just return three, and everyone's happy and it all makes sense. But it doesn't do that: it becomes a generator function and returns a generator. Okay, whatever, I can live with that. But if you call next on this generator, I guess, reading it for the first time, I would imagine that maybe the return is skipped and it yields two. But no: in fact it raises StopIteration with the value of the return. If you go and read, there's a PEP for this, PEP 380, and it's to do with coroutines and subgenerators and things like that. But to a new user this just says return doesn't mean one thing in Python. And it's stuff like this that, I have to keep reminding myself, gives Python some of its characteristics, some of its personality. This is a quote from the Zen of Python, and essentially it says practicality beats purity. This is really important to us, because we are way, way more concerned about making healthcare affordable and accessible than we are about doing it in the most pure way. And Python has that very same approach: it's way more interested in solving the problem and getting things done than in doing it in the most pure, special way, which is very different from when you go to the Scala guys or the Elixir guys. With that approach we've been able to build something really amazing that's affecting people's lives on a daily basis, and we're way more interested in that than in doing it in the most pure way, and Python allows us to do that, which is something really amazing to us. So that's it for me. What's next for us? We're moving into the diagnostic space; we're moving into a much more comprehensive health prediction and monitoring space; and we're making our chatbot international, which I can tell you is not trivial. Natural language for English is hard; natural language for Arabic, and for Chinese, is something else entirely. So that's me. Are there any questions?
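The return-and-yield surprise from the talk can be reproduced in a few lines:

```python
def weird():
    return 3   # looks like a plain function...
    yield 2    # ...but the mere presence of yield makes it a generator function

g = weird()            # calling it returns a generator object, not 3
try:
    next(g)            # runs the body until the return statement
except StopIteration as exc:
    print(exc.value)   # the return value rides on StopIteration: prints 3
```

This is the PEP 380 behaviour: `return value` inside a generator becomes `raise StopIteration(value)`, which is what makes `yield from` able to receive a subgenerator's result.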
We have time for just one short question, please.

Q: Really interesting talk. Did you, for your chatbot, take into account the social situation or medical history of the patient asking a certain question? Because it can have a lot of influence on the questions a doctor will ask once he knows the patient's history or social context.

A: Sorry, I didn't quite catch the question.

Q: For example, when a patient asks, I have had it, what should I do, it can really matter if the patient is a young boy who got it...

A: So, at the moment... let me think. Maybe two months ago, no, but recently, yes. Recently we take demographics, age and gender all into account in the outcome and the path you're going to go through. And what we're working on now is taking not just that, but everything we know about you, how many steps you take a day, the medication you take, into the advice we're giving you. We now have a pattern where we can start adding these more complex kinds of reasoning and deduction, so we're starting down that path.

Thank you very much, Wilhelm, give him a big hand.