Looks like we're live. Yeah, so welcome, everybody, to the last talk of this block. It's my pleasure to welcome Dr. Anton Becker. He's going to give a talk on building a radiology workflow manager with R.

Thanks for the introduction, hi, everyone, and welcome. I've already seen some great talks this morning, so I'm happy to continue, and I hope I'll deliver. First of all, a big thank you to the organizers and to the sponsors; it's been a fantastic event so far. I also need to acknowledge my colleagues and friends from the Sloan Kettering Radiology and Strategy and Innovation teams, especially those eight wonderful people who have massively helped get this project off the ground. I'd also like to thank our radiology faculty, our service chief, and our department chair.

A couple of words about myself. I've been a radiologist at Memorial Sloan Kettering since last year, so I'm an MD; I'm coming from the MD side, and this slide should just give you an overview. I learned R in the context of my research and scientific work, so I'm not a software engineer or a computer science person. And yet this program actually has more to do with engineering and a lot less with science.

Since I guess there are going to be only very few radiologists here, I want to take a few minutes to explain the radiological workflow. You don't need to study everything on this slide in detail; I just want you to take a couple of points away from it. The first is that it's not as simple as you might think to go from a medical image to the result: there are a lot of intermediary steps involved. The second point is in the far right circle here. I'm reading the chat right now, so I can maybe interact with you; we have pathologists on the call, very good. As you can see on the right side, this is the legacy workflow.
This was when we had actual films that were taken and then developed. In most Western radiology departments, this workflow doesn't really exist anymore. But there used to be a physical room where people had to go, take the films out of a physical folder, and then bring them to the light box and read them. As you can see here, the digital workflow we have today is modeled after this old workflow. On a technical level, there is no reason it needs to be like that. But I think this was done so that all the radiologists who were used to the old workflow wouldn't have to change it completely; they could just be told: whatever you did before physically is now done virtually on a computer.

So, first takeaway point: the radiology workflow as we know it today is essentially a legacy system modeled after an old film workflow. The advantage this gives us for this project is that those little parts you see there were usually introduced one after another. So oftentimes there are different software systems involved, and they all need to talk to each other. That means there is a wealth of APIs we can leverage. Some use HTTP, others use healthcare-specific standards like the HL7 standard or the DICOM standard.

Since this is an old workflow, it was all designed for the old radiology system. As radiology has evolved as a field, we have a lot of new problems, and in our opinion those new problems need new solutions. One of the new problems is that the size of institutions has increased constantly. About 15 or 20 years ago, Memorial Sloan Kettering had maybe 10 or 15 radiologists; now it's 160. Like a gourmet restaurant, if you want to deliver high-quality work, you can't just use the same tools and scale things up indefinitely. You'll never find a three-Michelin-star restaurant with 800 tables.
Because at that scale, you won't be able to deliver that kind of quality anymore. In our workflow, this would mean either huge, cluttered worklists where you have to search for exams, or a lot of small, predefined worklists with certain criteria that people would need to click through one by one. Either way, it introduces friction into the workflow and keeps it from running smoothly. Our solution is to use R to send the examinations directly to a personalized queue for every radiologist.

The second issue is subspecialization. Back in the day you had one radiologist: a brain scan, she would read it; a scan of the lungs, he or she would read it; you'd break your leg, the same radiologist would read that too. Today radiology is very subspecialized, as other fields have also moved toward more subspecialization. For example, at Memorial we have a team of radiologists that deals exclusively with cancers of the kidney, bladder, and prostate, and another team that deals exclusively with diseases of the chest: lung cancer, et cetera.

The last point is that whereas back in the day it may have been acceptable to wait a few days for the result of your scan, nowadays people want their results quickly. This has become very apparent in the COVID crisis: you do a COVID test and you want the results within a few hours; you don't want to wait two days for your test result. It's similar in cancer care: when you have a cancer and you're worried it might get worse or recur, you don't want to wait a week after the scan for the result, you want it quickly.

So we viewed this as an optimization problem. We optimize between the available radiologists on one side and the unread examinations on the other, using a custom Monte Carlo algorithm which we implemented in R.
If you think about this conceptually, the surface you see down here is the difference between the two distributions. Red means fewer exams are read by the appropriate subspecialized radiologists, and blue means a lot of the exams are going to be read by the appropriately subspecialized radiologists. The way you do it is basically to throw a whole bunch of marbles in here, look which marble lands on the lowest point, and that's your ideal point.

Here is a short schematic flowchart of how the program is actually set up. In this box you see IGNITE, which is the project name for our workflow manager. We run it on an RStudio Connect server; I'll talk a little more about that in a minute. We mirror most of our clinical systems in a SQL server just so we don't have to interact with all of them directly. It's not strictly necessary, but we do it this way. QGenda is our system for making the clinical roster, that is, for scheduling which radiologists work on which days; we mirror this in a SQL server as well.

The basic steps IGNITE performs: it takes all of the unread examinations from the SQL server and first checks which of those are still open. Then it assigns a so-called DMT, a disease management team; this is basically which subspecialization the exam should go to. And then we hand everything over to the assignment algorithms.

I just saw a question in the chat. First of all, QGenda: I think it's a great product, I love QGenda, but some people in the chat don't seem to like it. And no, it's not a gradient descent algorithm. In machine learning you have a huge search space, so you need very sophisticated algorithms to find that minimum point. We just use bootstrapping: we do random permutations, and that's enough.
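The random-permutation search just described can be sketched in a few lines of R. Everything here is illustrative: the DMT labels, the capacity column, and the cost function are my assumptions, not the actual IGNITE code. The shape of the idea is simply to draw random capacity-respecting assignments, score each one, and keep the best.

```r
set.seed(42)

# Toy data: unread exams with their disease management team (DMT),
# and radiologists with an expertise and a per-shift capacity
exams <- data.frame(
  exam_id = 1:12,
  dmt     = sample(c("GU", "Chest", "Neuro"), 12, replace = TRUE)
)
radiologists <- data.frame(
  rad_id    = c("A", "B", "C"),
  expertise = c("GU", "Chest", "Neuro"),
  capacity  = c(4, 4, 4)
)

# Cost of an assignment: number of exams read outside the reader's subspecialty
assignment_cost <- function(assign, exams, radiologists) {
  expertise <- radiologists$expertise[match(assign, radiologists$rad_id)]
  sum(expertise != exams$dmt)
}

# Monte Carlo search: throw "marbles" (random assignments) and keep the lowest
best_cost   <- Inf
best_assign <- NULL
for (i in seq_len(50)) {  # 50 iterations, as in the talk
  pool   <- rep(radiologists$rad_id, times = radiologists$capacity)
  assign <- sample(pool, nrow(exams))
  cost   <- assignment_cost(assign, exams, radiologists)
  if (cost < best_cost) {
    best_cost   <- cost
    best_assign <- assign
  }
}
best_cost
```

With a search space this small, a few dozen random draws reliably land near the optimum, which is exactly the point made above about not needing gradient descent.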
If you think about it, in machine learning, when you train a neural network, you're trying to find the minimum point in a landscape the size of the state of New York, so you're going to need a map for that. The search space in our case is maybe the size of the table you're sitting at right now, so you don't need a sophisticated algorithm; you can just sample randomly.

Our system has a couple of neat things implemented. The assignment itself goes through a very simple HTTP API; this is how we assign the exams to the different radiologists' personalized queues. One neat feature is that IGNITE will look up the next patient appointment. If a patient has an appointment with a clinician, say their oncologist, within the next 24 hours, then the radiologist who was assigned the exam will get an email notifying them of that. This is especially relevant if the appointment is within the next hour or two: the radiologist can then prioritize that read and have the results available before the patient sees their clinician.

A couple of details about the R implementation. We decided to write this as a package because it's portable and it ensures a certain level of code consistency, since we document every function. We also have some integrated quality control: first when we build the package, and second through Travis CI and testthat, to ensure that everything works the way it's supposed to. We group our code into four kinds of functions, which you'll see in a minute when I show you an example. There are core functions, which implement the optimization algorithms, and then a bunch of functions that just interface with the SQL servers and the various APIs.
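The 24-hour appointment check can be illustrated with base R. The column names and toy data are made up for this sketch; the real system pulls appointments from the mirrored clinical SQL server.

```r
# Assigned exams and upcoming clinician appointments (toy data)
assigned <- data.frame(
  exam_id    = c(101, 102, 103),
  patient_id = c("p1", "p2", "p3"),
  reader     = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
appointments <- data.frame(
  patient_id = c("p1", "p3"),
  appt_time  = Sys.time() + c(2, 60) * 3600,  # in 2 hours and in 60 hours
  stringsAsFactors = FALSE
)

# Join exams to appointments and keep those within the next 24 hours
upcoming  <- merge(assigned, appointments, by = "patient_id")
to_notify <- upcoming[upcoming$appt_time < Sys.time() + 24 * 3600, ]

to_notify$exam_id  # only exam 101 falls inside the 24-hour window
```

Each row of `to_notify` would then trigger one email to the assigned reader, for instance through RStudio Connect's built-in email delivery for rendered reports.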
We have certain auxiliary functions, which we use mostly for logging or for use with the pins package, and a couple of internal functions: smaller functions that make the code more readable or more elegant.

The second important part is internal lookup tables. You obviously need some a priori information in your program, for example how many exams the program is actually allowed to assign to a radiologist per shift. It also needs to know which radiologist has which expertise. All of this is done with a bunch of CSV files; some of them are also pulled directly from internal databases. And then we have the so-called modules, which is just a fancy name for R Markdown documents that are run at given intervals on the RStudio Connect server.

A couple of words about RStudio Connect versus a standalone server or some other self-made solution. I'm a big fan of RStudio Connect. We use it at Memorial for various analytics and dashboards, so it was already available before we started this project. The nice thing is it has a built-in pin board for use with the pins package, and it integrates very seamlessly with RStudio: whenever we update the package, a separate update script pushes all our modules from RStudio automatically, so the whole process is very painless. It also has nice high-level functionality to generate reports and emails and send them out to different people. The downside, of course, is that it's a proprietary product. There is a cost involved, and you might be able to achieve the same at a lower cost with a Linux server running only open-source software. On the other hand, that might be more time-intensive to set up and maintain.
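A lookup table of the kind just described might look like the following. The column names and values are hypothetical, purely to show the mechanism of keeping a priori information in plain CSV files.

```r
# Write a hypothetical expertise/capacity lookup table to a temp file
csv <- tempfile(fileext = ".csv")
writeLines(c(
  "rad_id,dmt,max_per_shift",
  "A,GU,30",
  "B,Chest,25",
  "C,Neuro,28"
), csv)

# Read it back as an ordinary data frame
expertise <- read.csv(csv, stringsAsFactors = FALSE)

# e.g. look up the per-shift cap for radiologist "B"
expertise$max_per_shift[expertise$rad_id == "B"]  # 25
```

Keeping this configuration in CSV rather than in code means non-programmers can review and update it, and the same tables can be regenerated from internal databases where needed.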
So my suggestion would just be to use whatever you have available and run with that.

Here we have an example of one of our R Markdown documents. This is a minimal example; the true module is a little more elaborate. First I load the ignite library, then magrittr for the pipes and the pins package for some logging. Whenever I don't use a package prefix, it's an internal ignite function. The first function sets the HTTP token for the whole session. With the next function we register our pin board on RStudio Connect. As you can see, it's all done in a very pure R way. The next line fetches all the completed examinations and then assigns the DMT to them. In the next line I get all the radiologists who are, say, on a CT shift that morning. Those are just two native R data frames. Then comes one of the core functions, the divide function, which divides everything up between the radiologists; here I can define the number of bootstrap, or Monte Carlo, iterations, 50 in this case. It generates a list of three data frames, and all the CTs that were successfully assigned end up in a separate data frame. The next function then actually communicates with the dictation system and does the actual assignment. Every exam that's been assigned gets a status variable set to OK, so I filter those out and pin them to my pin board. After that, other modules do different things with them; for example, another RMD picks up that pin, looks at which patients have an examination coming up, and does the email notification.

Some results, starting with the code base. The initial prototype we built over a weekend or a week. It had 18 files and about 150 lines of code.
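Since the slide itself isn't reproduced in this transcript, here is a schematic reconstruction of the minimal module just described. The function names follow the talk, but their exact signatures are my assumptions, and the ignite package is internal to MSK, so this is a sketch of the structure rather than runnable code.

```r
library(ignite)    # internal MSK package; not publicly available
library(magrittr)  # for the pipe
library(pins)

set_http_token()            # HTTP token for the whole session (ignite function)
register_connect_board()    # register the pin board on RStudio Connect

# Fetch completed exams and label each with its disease management team
cts <- fetch_completed_exams() %>%
  assign_dmt()

# Radiologists on, say, the CT shift this morning (a plain data frame)
rads <- get_radiologists(shift = "CT")

# Core function: divide exams among readers, 50 Monte Carlo iterations;
# returns a list of three data frames
result <- divide(cts, rads, iterations = 50)

# Talk to the dictation system and perform the actual assignment;
# successfully assigned exams come back with status == "OK"
assigned <- assign_exams(result$assigned)

# Pin the successful assignments for downstream modules (e.g. the
# appointment-notification RMD) to pick up
assigned[assigned$status == "OK", ] %>%
  pins::pin(name = "assigned_cts")
```

The point of the pin at the end is decoupling: each scheduled module reads the previous module's pin instead of re-querying the clinical systems.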
This is just to show that if you want to try something like this at your department, to see whether it's feasible, you can build a very fast prototype with R and quickly see whether it's possible or not. We've now been actively developing this program for more than three months. It consists of 66 custom functions, and the code base is still quite slim, less than 2,500 lines of code.

Looking at the complexity of the code: some of you might be familiar with cyclomatic complexity, a measure of how complex a function is. You can see that the divide functions, the core functions with the optimization implemented in them, are among the most complex.

So what is the actual effect on our clinical workflow? Looking at just the months of June and July over 2018 and 2019, you can see that the number of exams has steadily increased every year, following a trend already present in prior years. What has especially increased is the number of exams that were read slowly, meaning more than a day later. And you can see that while the number of exams still increased, we managed to read the exams a lot more efficiently in 2020, without every reader reading more exams. The number of exams read per reader actually stayed stable, so we didn't force people to read more; the work was just distributed more efficiently. This has also enabled something else: before we had the system, the first thing to be cut when there was an overload of examinations was admin and research time. This year we were actually able to give people a lot more of their admin and research time, which of course results in a lot of happy colleagues.

On that positive note, I also need to mention some of the challenges we have.
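For those unfamiliar with the metric: cyclomatic complexity counts the independent paths through a function, so straight-line code scores 1 and every decision point adds one. One way to compute it for R functions is the CRAN package cyclocomp; whether that is what was used for the plot in the talk is my assumption. The function below is a toy example, not one of the IGNITE functions.

```r
library(cyclocomp)  # CRAN package for cyclomatic complexity of R code

# Toy function with a few decision points
clip_negatives <- function(x) {
  if (length(x) == 0) return(NULL)   # decision: empty-input guard
  for (i in seq_along(x)) {          # decision: loop
    if (x[i] < 0) x[i] <- 0          # decision: per-element branch
  }
  x
}

cyclocomp(clip_negatives)  # scores higher than the 1 of straight-line code
```

Running this over every function in a package gives exactly the kind of bar chart shown on the slide, with the branching-heavy core functions standing out.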
Of course, if you implement a new workflow, at first there's going to be some resistance to any change you introduce. You also need a minimum number of radiologists per day; that's a planning issue, but otherwise the best algorithm is not going to help you. The premise of our implementation was balanced clinical productivity, which also means that readers who are a bit slower are disproportionately affected: they're going to be assigned more examinations than faster readers. And on the technical side: documentation, and testing, testing, testing, which of course goes into ensuring continued support, since this is an in-house application.

Conclusion and closing remarks. This is the first radiology workflow manager written in R. We think it's the start of a new personalized radiology workflow paradigm. To our knowledge, it's also the first pure R application actually used in clinical care, but maybe I'll find out otherwise in the course of this event. And R helps us deliver the best possible care to our cancer patients. With that, I thank you all for your attention.

I'll read you the questions, so you don't have to. The first question: this is a fantastic manager; why did you choose R for this over perhaps more traditional languages like Java, Python, et cetera? This goes back to what kind of problem you're dealing with. It's an optimization problem, and you're dealing with data, with data frames and nested data frames. As I've shown, it's less than 2,500 lines of fairly simple R code. In our opinion, R is just the best tool for the job here. Surely the same thing could have been implemented in Java, Python, or C, but given how powerful R is, it was simply the ideal tool for the job.

The other question I can actually read myself: the package is not open source so far.
Any thoughts about making it open source? Yes, I've thought about it. Right now there are still a lot of Memorial Sloan Kettering specific things in there. It's a lot of work making it open source and applicable to everyone, but if there is any interest, I'd certainly consider it. I'm a big fan of open-source software.

Any challenges getting management support? Actually, this project started with a lot of tailwind, because R is already pretty strong in our department. So that was fortunately not an issue we had to deal with.

There was one other question in the chat: how many lines of testthat tests are in this code? I don't know, actually; to be honest, not as many as there should be.

Thank you, wonderful talk. We have a break now of about eight or nine minutes, and then we'll see you in the next session for our last block of talks for the day.