From around the globe, it's theCUBE, with digital coverage of DockerCon Live 2020. Brought to you by Docker and its ecosystem partners.

Welcome to this special CUBE coverage of DockerCon 2020. It's a virtual digital event co-produced by Docker and theCUBE. Thanks for joining us. We have a great segment here. Precision cancer medicine really is evolving, where personalization and data are going to be important to tailor treatments based upon the unique characteristics of the tumors. This has been a real talking point and focus area in the industry, and technology is here to help. We've got two great guests who are using technology, Docker, Docker containers, a variety of other things, to help the process go further along. We've got here Sabrina Yan, who's a bioinformatics research assistant, and Camille Took, who's a student and intern. You guys have done some compelling work. Thanks for joining this DockerCon virtual live. Thanks for coming on.

Thanks for having us.

So first, tell us about yourselves and what you guys are doing at the Children's Cancer Institute. That's where you're located. What's going on there?

Sure. So, at the Children's Cancer Institute, as it sounds, we do a lot of research specifically into children's cancers. Children are unique in the sense that a lot of the typical treatments we use for adults may or may not work, or will have adverse side effects. So we do all kinds of research, both wet lab and what we call dry lab, where we do research in silico, using computers and the pipelines we develop, to improve outcomes for children.

And what are some of the things that you guys have to deal with?
I was going to say the tech side, but there's also the workflow of the patients, survival rates, capacity, those constraints that you guys are dealing with. What are some of the things going on there that you have to deal with as you try to improve the outcome? What specific outcomes are you trying to work through?

Well, over the past decade, with all the work we've done, we've made a substantial impact on the survivability of several high-risk pediatric cancers. And we've got a program, which Sabrina will talk about in more depth, called the Zero Childhood Cancer Program, and essentially that aims to reduce deaths from childhood cancer to zero. So that, in other words, will be improving survivability to 100%, and hopefully no lives will be lost to cancer.

And what are you guys doing specifically? What's your job? What's your focus?

Yeah, so as part of our lab group, computational biology, we run a whole genome and RNA sequence processing pipeline. Given the sequencing information from a kid, so we sequence their healthy cells and we sequence their tumor cells, we analyze them together, and what we do is we find the mutations that are causing the cancer. That helps us determine what treatments or what clinical trials might be most effective for the kid. So specifically our lab works on that pipeline, where we run a whole bunch of bioinformatics tools, bioinformatics being basically just biology informatics. And we use the data generated by sequencing to extract those cancer-driving mutations that hopefully we can target in order to treat the kid.

You know, you hear about tech, you hear Facebook, personalization, recommendation engines, what to click on. You guys are really doing personalization around treatment, recommendations, these kinds of things come into it.
Can you share a little bit about what goes on there and tell us what's happening?

Well, as you mentioned when you first brought us into this, we're looking at the profile of the tumor itself, and that allows us to specialize the medication and the treatment for that patient. And essentially that lets us improve the efficiency and the effectiveness of the treatment, which in turn obviously has an impact on the survivability of the cancer.

What are some of the technical things? How did you guys get involved with Docker? Where does Docker fit into all of this?

Yeah, I'm sure Camille will have plenty to bring up on this as well. But yes, it's been quite a project to convert the pipeline that we have, which was built on a specific platform and is working great. But as with most tools and a lot of things that you develop when you're engineers, it's pretty easy for them to become platform specific. And then they're kind of stuck there, and you have to re-engineer the whole thing for it to work on another platform. And that's just such a pain to do. So the project Camille and myself were working on was actually taking each of the individual tools we use in the pipeline and dockerizing them individually, containing them with the dependencies they need, so that we can hook them up any way we want. That way we can configure the pipeline, not just run the same pipeline on every kid, but customize it based off of the data, even being able to change the pipeline to discover different things for different kids, to be able to do that easily, to be able to run it on different platforms. The fact that we have the choice not only means that we could save money, but if there's a cloud instance that will run an app faster, or if there's a platform that, you know, wants to collaborate with us and they say, oh, we have this awesome data, we'd love for you to analyze it, it's over here, and we're like, oh, well, our pipeline's over here. That's really great. Not really.
And so having portability is a big thing as well. And I'm sure Camille can go on about some of the pain points of having to dockerize all of the different apps. But, you know, even though there are some challenges associated with doing it, I think the payoff is massive.

Camille, dig into this, because this is one of those things where you've got a problem statement, you've got a real-world example, cancer patients, life or death, you've got some serious things going on here. You're a techie, you get in here. What's going on? You're like, okay, this is going to be easy. I just wrangle the data, I throw some compute at it, it's over, right? No? Just take us through the life you've been living.

Right, so as Sabrina mentioned before, first and foremost, we're on the scale of several hundred terabytes worth of data for every single patient. So obviously we can start to understand just how beneficial it is to move the pipeline to the data rather than the other way around. So much time would be saved, and money costs as well. In terms of actually dockerizing the programs that analyze the data, it was quite difficult, and I think Sabrina would agree with me on this point. The primary issue was that almost all of the apps we encountered within the pipeline were very, very heavily dependent on very specific versions of so many dependencies. They were just built upon so many other different apps, and they were very heavily fine-tuned. So dockerizing was quite difficult, because we had to preserve every single version of every single dependency in one instance just to ensure that it was working. And these apps get updated semi-regularly, so we had to ensure that our Docker images would survive those updates.

So what did it really take to dockerize your pipeline?

I mean, it was a whole project where myself, Camille, and a whole bunch of extra bioinformatics interns who joined us over the summer, which was fantastic as well.
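To make the dependency pinning described here concrete, a per-tool Dockerfile might look something like the minimal sketch below. This is purely illustrative, not one of the Institute's actual images: the tool name, base image, and version numbers are all invented for the example.

```dockerfile
# Hypothetical per-tool image: one bioinformatics tool, with every
# dependency version it was tuned against frozen inside the container.
FROM python:3.8.5-slim

# Pin the exact library versions the tool depends on, so platform
# updates can't silently change its behavior.
RUN pip install --no-cache-dir pysam==0.16.0.1 numpy==1.19.1

# Install the tool itself at a fixed version (name is illustrative).
RUN pip install --no-cache-dir some-variant-caller==2.4.1

# The container behaves like the tool's executable.
ENTRYPOINT ["some-variant-caller"]
```

Building this once gives an image that runs identically on any machine with Docker, regardless of which Python or R versions the host happens to have installed.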
And we basically had a whole team of us who were like, okay, here's another bioinformatics tool in the pipeline. You get to dockerize PURPLE, you get to dockerize SAGE, each tool individually. And then you spend days or weeks on it depending on the app. Some are easier than others, but particularly when it comes to bioinformatics tools, some of them are very memory hungry, some of them are very finicky, some of them are a lot more stable than others. And so you could spend one day dockerizing a tool and it's done in a handful of hours, or sometimes it could take up to a week and you're just getting that one tool done. And the idea behind the whole team working on it was, eventually you slog through this process and then you have a Dockerfile set up where anyone can run it on any system, and we know we have an identical setup, which was not true before. Because I remember when I started and I was trying to get the pipeline running on my own machine, a lot of things just didn't work. It's like, oh, you don't have the very specific version of R that this developer has, or, oh, that's not working because you don't have this specific file that actually has bug fixes in it just for us. But like, oh well.

So you had a lot of limitations before dockerizing, Docker containerizing it. Life was tough. What was it like before and after?

Well, I'll probably speak more to before. It was basically, yeah, days or weeks just trying to set up and install everything needed to run the whole pipeline. It took a long time, and even then a lot of things didn't work. Like, oh, you've got to set up this specific version of Python.
Oh, but you need these other three versions for different programs, or you need this version of R but then this new upgrade of the tool doesn't work with that version of R. All kinds of issues that you run into when these tools depend on entirely different things, and having to install like four different versions of Python or three different versions of R or different versions of Java on the one machine just to run it is a bit of a pain.

What a hassle.

It's a hassle, basically. It's a nightmare.

And now, after, you're golden. Probably Camille can speak to that. Yeah, so what's it like after?

It's ridiculously efficient. It's incredible. Like I mentioned before, as soon as we set in stone the versions of the dependencies, Docker keeps them naturally, and we can specify the versions within the Docker container. So we can absolutely guarantee that the application will run successfully and effectively every single time.

Share with me how complicated these pipelines are. It sounds like that's a key piece here for you guys, and you had all the hassles until you got dockerized up and things worked smoothly, got that. But tell us about the pipelines. What's so complicated about them?

Honestly, the biggest complication is all of the connections. It's not as simple as run A, run B, run C, and then you're done. That would be nice, but that's not how these things work. You have a network of programs where the output of one becomes the input for another, and you have to run this program before this one, before this one, but some of the outputs become inputs for multiple programs. And by the time you hook the whole thing up, it looks like a gigantic web of applications with all the connections. It almost looks like a massive mess when you look at it. But having each of the individual tools contained and working means that we can hook them all up.
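That "web of applications", where outputs fan out into inputs for multiple downstream tools, can be sketched as a small dependency-ordered runner. This is a hedged illustration only, not the Institute's pipeline code; the tool names and wiring in the docstring and tests are invented.

```python
def run_pipeline(tools, deps, run, done=None):
    """Run a web of tools in dependency order.

    tools: list of tool names to run.
    deps:  {tool: [upstream tools whose outputs it needs]};
           one tool's output may feed several downstream tools.
    run:   callable(tool, inputs_dict) -> output for that tool.
    done:  outputs saved from a previous run, so a pipeline that
           failed partway can resume without redoing finished steps.
    """
    done = dict(done or {})
    remaining = [t for t in tools if t not in done]
    while remaining:
        for tool in list(remaining):
            needs = deps.get(tool, [])
            # A tool is ready once every upstream output exists.
            if all(d in done for d in needs):
                done[tool] = run(tool, {d: done[d] for d in needs})
                remaining.remove(tool)
    return done
```

The `done` argument captures the point made next in the conversation: if one tool fails, you rerun only that tool and everything downstream of it, instead of restarting one monolithic program from scratch.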
And even though it looks complicated, it would be far more complicated if we had that entire pipeline in a single program. Having to code that whole thing as one piece would be an absolute nightmare. Whereas being able to have each of the tools as individual Docker containers means we just have to link the inputs and outputs, which is a task, but once you've done that, you know each of the individual tools will run. And if an individual tool fails for whatever reason, memory or other issues you run into, you can rerun that one individual tool, re-hook the outputs into whatever the next program is, and keep going, without having one massive program or file where it fails midway through and there's nothing you can do.

Yeah, and you get to unpack everything. So basically you do the work up front and you get a lot of goodness out of it. So this comes to the future of health. What are the key takeaways that you guys have from this process? And how does it apply to things that might be helpful to you down the road, or today, like deep learning? As you get more tools out there with machine learning and deep learning, we hope there's going to be some cool things coming out of this. What do you guys see here? Any insights?

Well, we have a section of the computational biology team that is looking into doing more predictive tasks, working out basically the risks of people developing cancer, the risks of kids developing cancer. And that's something you can do when you have all of this data, but that requires a lot of analysis as well. And so one of the benefits of, you know, being able to have these very movable pipelines and tools is that it makes it easier to run them on the cloud, makes it easier to share your processing with other research institutes or hospitals. Just making collaboration easier means that data sharing becomes a possibility.
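In practice, a pipeline is "movable" in this sense when each containerized tool is just an ordinary `docker run` invocation. Here is a hedged sketch of building such a command; the image name, mount paths, and the default memory cap are illustrative assumptions, not the Institute's configuration.

```python
def docker_run_argv(image, inputs_dir, outputs_dir, args=(), memory="8g"):
    """Build the argv list for running one pipeline tool in its container."""
    return [
        "docker", "run", "--rm",
        "--memory", memory,                 # cap memory-hungry tools
        "-v", f"{inputs_dir}:/data/in:ro",  # inputs mounted read-only
        "-v", f"{outputs_dir}:/data/out",   # tool writes results here
        image, *args,
    ]
```

Because the command is just data, pointing a tool at a different platform, a bigger cloud instance, or a larger memory cap only changes where and with what limits this argv is executed, not the tool itself.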
Whereas before, if you have three different organizations with their data in three different places, how do you share that when moving the data isn't really a feasible task? How can you analyze it in a way that's practical? And so one of the benefits of Docker is, with all of these advances coming out, you know, if there's some amazing predictor that comes out that uses some kind of regression or deep learning or whatever, and we wanted to add that, being able to dockerize a complex tool into a single dockerized app makes it less complicated to add it into the pipeline in the future if that's something we'd like to do.

Camille, any thoughts on your end on this?

Actually, Sabrina read my mind on the last point. I was just thinking about scalability, which definitely is a huge point, because the pipeline grows as the technology does. Any kind of new technology that we'd like to integrate into the pipeline, as of now, would be significantly easier with the use of Docker. You can just dockerize that technology and then implant it straight into the pipeline, minimal stress.

So productivity, agility, does that come home for you guys? Does that resonate?

Yeah, definitely.

And you've got the collaboration. So there's business benefits, the outcomes are there. Any proof points you could share on some results that you guys are seeing, fruit from the tree, if you will, from all that goodness?

Well, one of the things we've been working on is actually a collaboration with Australian BioCommons and CAVATICA. They've built a platform specifically for developing pipelines, which we've wanted to test out. And they have support for Docker containers built into their platform, which makes it very easy to push all our containers up to their platform, hook them up, and be able to collaborate with them, not only to try a new platform with our dockerized apps, but also to help them develop their platform, and to be able to share and access data that's been uploaded there by other people.
We wouldn't have been able to do that if we hadn't dockerized our apps. It just wouldn't have been possible. And now that we have, we've been able to collaborate with them in terms of improving their platform, but also to share and run our pipelines on other data, which is pretty good.

Awesome. Well, it's great to have you on theCUBE here at DockerCon 2020 from down under. Great internet connection, you guys have got great internet down there. People in the US would like that all the time, but we're remote, we're sheltering in place here. Stay safe, you guys. Final question: could you each share, in your own words, from a developer, from a tech standpoint, as you're in this core role, a super important role, where the outcomes are significant and have real impact. What has the technology, what has dockerization done for you guys, for your work environment, and for the business? Share in your own words what it means. A lot of other developers are watching. What's your opinion?

Yeah, I mean, the really practical point is we've massively increased the capacity of the pipeline. One thing that's been quite fantastic this year is we've got a lot of increased support for the Zero Childhood Cancer Program, which means going into the future, we'll actually be able to open up the program to every child in Australia that has cancer. We'll be able to add them to the program, whereas currently we're only able to enroll kids who are at a low survivability rate, about 30%, the lowest 30% of survivability, we're able to enroll on the program currently. But having a pipeline where we can just double the memory like that, double the amount of data.
And the fact that we can change the instances freely to double the capacity, triple the capacity, means that now that we have the support to be able to enroll potentially every kid in Australia, once we've upgraded the whole pipeline, we'll actually be able to cope with the number of children being enrolled. Whereas on the existing pipeline, we're currently at capacity. So doing the upgrade, in a really practical way, means that we're actually going to be able to triple the number of kids in Australia we can add onto the program, which wouldn't have been possible otherwise.

Removing the limitations and making it totally scalable. Camille, your thoughts, as developers are watching: you're in there, you're getting your hands dirty, you built it, it's showing some traction. What's your take? What's your view?

Well, I mean, first and foremost, like Sabrina said, it just feels fantastic knowing that what we're doing has a substantial and quantifiable impact on this subset of the population, and that we're literally saving lives with the work that we're doing. In terms of developing with this technology, it's such a breeze. I've had minimal contact with what it was like without Docker, but compared to the horror stories I've heard, it's a godsend. It's really improved the quality of developing.

Well, you guys have a great mission, and congratulations on the success. Real impact right there. You guys are doing great work and it must feel great. I'm happy for you, and it's great to connect with you guys and see you continue using technology to get the outcomes, not just using technology. Fantastic story. Thank you for sharing. I appreciate it.

Thank you for having us. Thank you.

Hi, I'm John Furrier. We're here for DockerCon 2020, DockerCon Virtual, DockerCon Digital. It's a digital event this year. Obviously we're all sheltering in place. We're in the Palo Alto studios for DockerCon 2020. I'm John Furrier.
Stay with us for more coverage digitally. Go to DockerCon.com for more. Check out all these different sessions. And of course stay with us for this feed. Thank you very much.