Hey, thank you very much for being here. My name is Shailen, and I'm an AI specialist at Intel. Internally, my title is Technical Consulting Engineer: I am the link between you, the end users, and the core developers who build the software you use. Today, I'm going to show you how we accelerate deep learning algorithms and software, and I'll use a real-world case study to show you what we've done. The case study I'm very passionate about, and have been involved in a lot this year, is about scanning for brain cancer in humans. So that's the title of today's talk: brain tumor segmentation using deep learning. I'm based in Germany, and I was educated in Germany. A brief agenda: I'm going to describe the problem and how we're trying to solve it with AI. Then I'll tell you which software tools and packages we used to solve it. And then we'll look at some performance numbers, so you know what to expect from such a real-world case study. Let's start with some statistics as motivation for why I, my team, and Intel are so passionate about and involved in this field. According to GLOBOCAN, the global cancer statistics project, approximately 18 million new cancer cases were recorded, and close to half that number were deaths. If someone in your family is dying from cancer, it's not nice. So what can we do? How can AI help? And those numbers are just for 2018; this year, 2019, we can expect similar numbers. It's really important to diagnose cancer as early as possible and find solutions so we can avoid deaths, right? So, an introduction to the brain tumor topic. There is a medical term, gliomas: these are the most commonly occurring type of brain tumor, and they are very dangerous.
If you have one, it can grow aggressively and you can die from it. About 90% of gliomas belong to a class of highly malignant tumors. And to date, multi-sequence MRI, or magnetic resonance imaging, is the de facto way to screen for and diagnose such gliomas. When you get an MRI, your head goes into the machine and it takes a volumetric 3D scan of your brain. The challenge for the doctors, to find the cancer, is that they have to slice, or segment, that 3D volume and analyze it slice by slice. This is a very time-consuming, expensive process, but a crucial one. So segmenting this 3D brain volume is very important. And how do you actually treat an affected area? One way is radiotherapy, which uses focused radiation to destroy the bad cells. Another way is surgery: you open the skull and remove the bad cells. OK, I see some reactions there, so sorry about that. But in order to do either of these two things, you have to analyze this 3D volume slice by slice, and that's why segmentation is important. Now, here is the medical challenge, and it is twofold. First, we have a lack of specialized doctors to do this work; I have a link down there to an article about the lack of physicians available for this kind of analysis. Second, the whole segmentation process is time-consuming and very expensive. But we believe that computers can help. If we can automate this process, we save time for the patients and the doctors, making the whole diagnosis faster, and we can also improve the segmentation quality. The second area where computers can help is that we are collecting so much data. I found figures from 2013: at that time, approximately 153 exabytes of data were collected in the health care sector alone.
And that number was predicted to grow to over 2,000 exabytes by the year 2020. We are now in 2019; I'd need to check the current numbers. But now you may ask, what is an exabyte? To give you a perspective, one exabyte is close to 250 million DVDs' worth of information. And this year, 2019, over 2,000 exabytes: that's a lot of data. So having high compute power to analyze all of this data is great; we're living in a great time. This is where AI can hopefully help, and that's the crux of today's talk. Let's look at the data set we used to train our deep neural network. The data set comes from the Brain Tumor Segmentation, or BraTS, challenge of 2018. It's an open-source data set provided by the University of Pennsylvania. The goal for our deep learning algorithm is to look at the 3D volumes and figure out whether a 3D pixel, or voxel, contains cancer or not. So each voxel is either healthy tissue or one of the tumor classes: cancer or no cancer. A voxel, just to visualize it for you, is a kind of 3D pixel. Let's zoom in on one of the examples of that brain over there. So this is it: cancer or no cancer. And we can color-label the voxels into different channels depending on the type of cancer we're looking at. I have a one-slide summary of the algorithm we implemented, and this is it. First, we have the input image from the machine, one slice of the MRI scan. To train our deep neural network, we needed labeled data: a combination of this MRI input and the middle image, which is what the radiologist has drawn by looking at the MRI input. The radiologist has marked the cancer areas, and we got tons of these input images plus the labeled ones from the doctors. A pair of these two is what we call labeled data. We use these pairs to train our neural network so that it can produce what we have here: the predictions, or inferred images.
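The quality of a predicted segmentation against the radiologist's labels is commonly scored with the Dice coefficient, the standard metric in the BraTS challenge. The talk doesn't show the metric explicitly, so this is an illustrative sketch in NumPy:

```python
import numpy as np

def dice_coefficient(pred, truth, smooth=1.0):
    """Overlap between a predicted binary mask and the ground-truth mask.

    Returns a value in (0, 1]: 1.0 means perfect agreement with the
    radiologist's labels; values near 0 mean almost no overlap.
    """
    pred = pred.astype(bool).ravel()
    truth = truth.astype(bool).ravel()
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + smooth) / (pred.sum() + truth.sum() + smooth)

# Toy 4x4 "slice": 1 marks tumor voxels, 0 marks healthy tissue.
truth = np.array([[0, 0, 0, 0],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 0]])
pred = np.array([[0, 0, 0, 0],
                 [0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 0]])
print(round(dice_coefficient(pred, truth), 3))  # -> 0.875
```

The `smooth` term keeps the score defined even when both masks are empty; the exact value used in the real training code may differ.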
And this is our goal: we want our deep neural network to start analyzing new patients coming in and telling us whether a person has a high degree of cancer, where the cancer areas are, and so on. Let's look at the algorithm used. The model in this research is a U-Net. U-Net is very, very popular in the medical sector, especially for medical imaging. The network looks like a U, which is why it's called U-Net, and it involves lots of convolutions. A group of researchers from the University of Freiburg in Germany published this architecture, and it's really great; it works quite well. It works like an autoencoder: one side encodes and the other side decodes, and at each stage going through the network we extract features. That's how, at the end of the day, we can detect cancer or no cancer. So basically, this neural network answers the question: to which class does a volumetric pixel, or voxel, belong, cancer or no cancer? Now, you may think all this deep learning and AI is complicated. Well, not really. The bird's-eye view of the whole pipeline looks like this, very simple. Think of these as black boxes. An input data set, the labeled data, comes in and goes into the neural network. The next step is to train that neural network with the input data set. Once we have done that, we have a trained model. Job done, easy peasy. Now we have this trained model, a new patient comes in, we do inferencing, and we get the result. All of this looks great, but what did we use to make it happen? A bunch of software tools. Among them, the Intel Distribution for Python, for best-in-class Python performance, of course. And for the neural network framework, we used TensorFlow.
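The encoder-decoder shape of the U, with skip connections across it, can be sketched as a heavily simplified Keras model. This is not the exact network from the talk; the depth, filter counts, and input size below are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def tiny_unet(input_shape=(128, 128, 1)):
    """A minimal 2D U-Net sketch: one encoder step, a bottleneck,
    one decoder step, and a skip connection across the 'U'."""
    inputs = tf.keras.Input(shape=input_shape)

    # Encoder side: convolutions extract features, pooling halves resolution.
    c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    p1 = layers.MaxPooling2D()(c1)

    # Bottleneck at the bottom of the 'U'.
    b = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)

    # Decoder side: upsample, then concatenate the skip connection from c1.
    u1 = layers.UpSampling2D()(b)
    u1 = layers.concatenate([u1, c1])
    c2 = layers.Conv2D(16, 3, activation="relu", padding="same")(u1)

    # One sigmoid channel per pixel: tumor vs. no tumor.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c2)
    return tf.keras.Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
```

The real model stacks several such encoder/decoder stages; the skip connections are what let the decoder recover fine spatial detail for per-voxel labels.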
And since TensorFlow can be painful sometimes, we leveraged Keras as a very nice layer on top, making deep learning even easier. And then Horovod. Horovod is a technology from Uber, the ride-hailing company, and it's very interesting. What Horovod does is distribute work: it splits a job into multiple small subsets and distributes them across multiple nodes, or machines, so that they work together. If you look at the logo: Horovod is actually the Russian word for a traditional circle dance, where each person holds the hand of the next in a circle. That's what the logo shows; the dots are the people holding hands. And that is the key message of distributed computing: one node talking to the next, and so on. With Horovod, we split our training process across multiple machines so that we could train our deep neural network faster. The stage after training is inferencing. How do we do inferencing fast? We used a tool called OpenVINO, a very nice tool that makes inferencing easy and fast thanks to the optimizations it has in place. Now, let's look at some numbers I got by going through this training process. You can imagine: we have this MRI input coming in from the MRI device. It contains high-quality images, a huge data set of large images with lots of detail. Training a neural network on that is very taxing; it's very intensive for a computer to do all this processing, so obviously training takes a long time. Some performance results: when I used stock TensorFlow, that is, the build from Google, on one machine it took me 76 hours to do the whole training, going through 30 epochs, 30 passes through that neural network. I was not very happy. With 76 hours, I thought I could do better.
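The Horovod idea described above, every worker training on its own shard of the data while updates are averaged across workers, can be simulated in plain Python. This sketch uses no Horovod or MPI at all; the worker count and the toy "gradient" values are purely illustrative:

```python
# Simulate Horovod-style data parallelism: shard the work, average the results.
num_workers = 4
samples = list(range(100))   # stand-in for 100 MRI slices

# Each worker gets every num_workers-th sample (rank-based sharding),
# which is how distributed samplers typically split a dataset.
shards = [samples[rank::num_workers] for rank in range(num_workers)]

# Every sample appears exactly once across all shards.
assert sorted(s for shard in shards for s in shard) == samples

# Each worker computes a local "gradient" on its own shard; an allreduce
# step then averages them so every worker applies the same update.
local_gradients = [sum(shard) / len(shard) for shard in shards]
averaged = sum(local_gradients) / num_workers
print(averaged)  # every worker would step with this single averaged value
```

In real Horovod the sharding and the allreduce happen over MPI across machines, and a job is launched with something like `horovodrun -np 8 python train.py`; the logic above is only the conceptual core.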
And of course, the next step was to use a better TensorFlow: the one Intel optimized. That is the second data point, the Intel-optimized TensorFlow. Just by changing the TensorFlow package, I dropped to 43 hours, almost 50% better time, a roughly 2x performance boost purely from software. But I was still not happy. Then I started looking at distributed training: how can I use multiple machines working together to train faster? So look at the last two data points. Four nodes, meaning four machines, with Horovod and eight workers: I dropped from the 43-44 hour ballpark to 7.5 hours. That was great. Increasing the number of workers to 16 was even better: five hours. And with five hours I was more or less happy, given the huge data set I had. So from 76 hours to 5, the key message is: use truly optimized software and distribute your work. If you have a cluster, make use of it; why stick to one machine when you can use many? And use better software, of course: just on one node, going from 76 to 43 hours was mind-blowing to me, without much work on my part, simply by leveraging better software. Now, plugging all of this into the big picture, this is how it looks. What I wanted to do was solve a medical problem, and this was the software stack involved. Now you may ask: what is this Intel-optimized TensorFlow? It is the same TensorFlow code that Google releases. What we do is take this code and plug in our performance library: the Intel Math Kernel Library for Deep Neural Networks (MKL-DNN). This library loves math, and my U-Net does a lot of math-intensive operations. Whenever the library sees math-heavy computations, it accelerates them, so that I could do my training faster, as you can see from the numbers I collected.
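The Intel-optimized build is typically installed as a drop-in replacement (`pip install intel-tensorflow`), and its MKL-DNN backend is commonly tuned through a few environment variables and TensorFlow threading settings. The exact values below are illustrative starting points, not measured-optimal settings for this workload:

```python
import os

# These must be set before TensorFlow is imported so the MKL-DNN
# backend and its OpenMP runtime pick them up.
os.environ["OMP_NUM_THREADS"] = "8"    # e.g. one thread per physical core
os.environ["KMP_BLOCKTIME"] = "1"      # ms a thread spins before sleeping
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"  # pin threads

# Inside TensorFlow, parallelism is split between ops and within an op:
# import tensorflow as tf
# tf.config.threading.set_intra_op_parallelism_threads(8)
# tf.config.threading.set_inter_op_parallelism_threads(2)
```

Which values help most depends on the core count and the model; Intel's performance guides recommend benchmarking a few combinations.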
And of course, I leveraged best-in-class Intel Xeon processors to do that. You saw I had four nodes, four machines, so four Xeon systems to train faster. Actually, when I said one node, one processor: no, each node was two sockets, two physical Xeon processors on one motherboard. So I had eight physical processors working together to get me close to five hours, down from 76. Now, you have seen only one slice being inferred. Imagine that for the whole 3D volume that comes in, every slice has to be analyzed. If a doctor had to analyze them one by one, it would be really expensive and time-consuming; the doctor might say, oh, that's too much, and that's just for one patient. An AI algorithm doing that for you is obviously much better and easier for everybody, for the patient and for the doctor. If you're curious how all of this fits together in 3D, this is how I got there: I used software called Mango to render this 3D volume. That's the original MRI brain, and you can see all the slices stacked together and the volume of the cancer. For the doctor who has to do radiotherapy, or even surgery to open the skull and go in there, they need to know exactly where the bad cells are; otherwise they may destroy good cells and the person could end up in a coma or something like that. So, breakthrough stuff. Now, if you are curious, the whole source code and the data set are all open source, even the AI software tools I showed you on that software stack slide. They are all open source; this is Intel's commitment to AI. We're going open source: free software, free tools. And if you want to have a look at my code as well, especially if you're in the medical field, I published all my work on my GitHub, and that's the link there. You have instructions on how to get the data set, how to get started, and how to play around. You can also reach out to me through my GitHub. And that's it.
Thank you very much for your attention. I'm open to questions. Could you use the microphone? Because it's being recorded, it's easier. Question: What is the difference between Horovod and Spark? Because you talked about the workers and nodes and all that kind of stuff. OK, so you mean compared to the Spark architecture, leveraging Hadoop and so on. Horovod is a pure MPI-based package. What it does is split my work into MPI processes and send them to the individual nodes, so there's no Spark involved. If you were using Spark, we have another solution called BigDL, which is a distributed deep learning library for Spark. With BigDL, you could do the same thing, splitting the work into multiple chunks over several Hadoop nodes. That's the main difference. On a plain MPI cluster, you obviously cannot use Spark because you don't have the software stack there, and in that case you would use Horovod. Cool, one more? Question: Have you tried other U-Net architectures, or just this one for the experiment? Very good question. This example here is the 2D U-Net model; we have also tried the 3D U-Net. If you go to my GitHub, which I will just bring up here, and I totally recommend that you curious folks do, you will also find my Horovod code in place. Where's my cursor? So this is the 2D version, leveraging the 2D U-Net, and I also have the 3D U-Net and the training with Horovod; all the code is there. It's really nice. So yes, I've tried both the 2D and 3D U-Net. Thanks. Any other questions? Curious about any Intel software tools or technologies? Ask away, and hopefully I can answer. Question: Who funds the cancer research? Is this just to demonstrate the capabilities? OK, the question is who funds this cancer research. This work is done by us alone; it comes from our own motivation to try to solve something. We have partners helping us, or even taking this code and using it.
But there's no external funding, if that's what you were referring to. So if there are no more questions, let's thank the speaker one more time.