What I'd like to talk about now is some of the applications in which we are using deep learning in health care. I'll talk about one use case we worked on, which is one of the state-of-the-art methods today for deep learning on MRI data, and I'll also talk about what other people are doing with deep learning and where I think this is headed.

So, let's talk about medicine. Suppose you have a problem and you go to a doctor to get some treatment. The first thing you'd do today is visit a general physician, and you'd be asked a few questions: What's your patient history? Have you had this before? What symptoms do you have? Do you have a cough? Can you cough for me? A lot of questions, which are probably written down by the doctor. Then the doctor might say, okay, it looks like something's wrong, I need to place a stethoscope, I need to listen to your lungs, I need your ECG. If something's wrong with your head, you might need an EEG. So a lot of signal data and a lot of symptoms are being recorded.

And then after that, the doctor might say, okay, your lungs don't sound right, can you go get me an X-ray? So you go get an X-ray done. There's a radiologist sitting there who goes through the X-ray and looks for what's wrong. If you have a cough, he might say, okay, there's something over here, maybe some pleural effusion, some sign that's bothering him. He writes it out in a report and gives it back to the doctor. You go back, and the doctor says, okay, this looks like TB, or maybe something else, so let's get a pathology test, let's get a biopsy done. So you go to the pathologist. You give a blood sample, or a tissue sample, a biopsy, some sort of pathology test. It goes to the lab, where it either runs through a blood-counting machine or, if it's a tissue sample, it becomes essentially a microscopic image that the pathologist looks at. He might add a few chemicals, see how the tissue lights up, and so on.

And then finally, if your doctor is smart enough and what you have is actually cancer, he might say, okay, let's do a genomic test, because maybe what you have is a hereditary disease, or it's a disease for which there is a targeted treatment. If it's some kind of cancer, there might be a particular targeted treatment that works on it. So finally, based on all of this, he'll say, okay, let's give you medicines and cure you, or, I need more data, let's put you under monitoring, you need to be admitted to the ICU, it's pretty serious, or maybe he'll send you for surgery.

Why am I saying all this? Because if you think about what all these individual doctors are doing, from a very high level what you see is this: you have clinical records coming in, which are essentially a lot of text, symptoms that describe the disease and the presentation the patient has come in with. You have BP, ECG, and other readings, which are signals, 1D signals, 2D signals. And you have a lot of radiology imaging coming in.
Radiology images, X-rays, MRIs, CTs, are 2D images, 3D images, 4D images, different kinds of images. Again, what are the doctors doing? They're reading these images and writing reports. You have pathology data, which is again a microscopic image, a pretty large image: you look at it, find what's wrong with it, and send back the inference. Genomic data is again text data: very long strings built from the four nucleotide letters, and the doctor there is looking at all of that genomic data to tell you what's wrong with your genome.

So at some level, each of these is individually a data problem, although combining all of them is a pretty complicated problem. What you can clearly understand is that if you have a lot of data, a lot of these insights can be derived and a lot of understanding can be built. Machine learning has been applied in medicine for a pretty long time now; it's not a new concept. There is always the problem of not having enough physicians. A lot of the diagnoses being made are not very objective. You don't really know what treatment is going to work on you if you have something that's life-threatening. And with the increase in data and in the literature coming out today, it's pretty tough for doctors to stay up to date. If you go talk to a doctor today and ask how much they understand genomics, or even whether they know what an SNP is, most of them probably won't, because the pace at which medical research is happening is simply too fast, and we are still struggling to have enough doctors to make the best specialist care available.

So deep learning, in this context, is essentially a new paradigm in machine learning, and just as machine learning has been used on most of these problems, almost everyone working in healthcare is now applying deep learning to them, with the idea that you get better results. Each of these is an individual problem with its own merits and its own challenges. I've mostly worked on radiology, so in the coming slides I'll talk about some of the challenges you'd see in radiology data and the usual techniques you can apply to actually get good results with it.

So when you talk about radiology, or imaging in general: your AlexNet, your ImageNet, are all trained on very nice datasets, 224 × 224 images. But in medical imaging, your digital pathology data can be 5 GB in size; you're talking about 10,000 × 10,000, 20,000 × 20,000 images here. Your high-resolution CTs can have 100, 200 slices, or much more than that. Your mammogram is not a typical 8-bit image; it's a 12-bit image. You have MRI sequences, which are different ways of acquiring MRI images: you can change acquisition parameters and visualize the tissues in different ways. Each MRI image is a 3D image, so if I have multiple MRI sequences for the diagnosis of a problem, you're essentially talking about a 4D image here.
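To make those sizes concrete, here is some rough back-of-the-envelope arithmetic. The exact resolutions and bit depths below are illustrative assumptions, not numbers from the talk:

```python
# Ballpark, uncompressed sizes for the modalities mentioned above.

pathology = 20_000 * 20_000 * 3          # RGB whole-slide image, 8 bits per channel
print(f"pathology slide: {pathology / 1e9:.1f} GB")   # ~1.2 GB for one plane;
# whole-slide files store a multi-resolution pyramid (and sometimes several
# focal planes), which is how they reach sizes like the 5 GB mentioned above

ct = 512 * 512 * 200 * 2                 # 200-slice CT at 16 bits per voxel
print(f"high-res CT:     {ct / 1e6:.0f} MB")

mri_4d = 4 * 200 * 200 * 200 * 2         # 4 MRI sequences of ~200^3 voxels, 16 bits
print(f"4D MRI study:    {mri_4d / 1e6:.0f} MB")
```

Compare that with a 224 × 224 × 3 ImageNet crop, which is on the order of 150 KB.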
And all this complexity causes problems: you have a large number of pixels, voxels rather, to deal with; you need to extract a correspondingly large number of features; and with so many dimensions you run into the curse of dimensionality and a huge computational load. This is the first kind of challenge you'll come across as soon as you start working with medical imaging.

The second problem is that there is very limited data, tiny compared to the size of each image. A few speakers back, I heard "only 8,000", "only 400". The datasets I've typically worked with are usually in double digits, and that itself is considered a large number; if you have 100 medical images, researchers are very happy. With ImageNet you're talking about, what, a million images? We are nowhere close to that when dealing with medical images. And you have huge variability. If you look at the variation among images, you'll have people wearing fancy necklaces like this in their X-rays, which could very easily be misclassified as something else. So you have a lot of variation and very, very limited data to play with, and deep learning is data-hungry, so it's pretty difficult to make it work on these images. And on top of that you have anatomical and demographic variations and so on.

Then what you'll typically see is that the abnormality you're looking for, the one you need machine learning for, is going to be very, very subtle. It's not going to be evident, because if it were evident, there would be no need for machine learning in the first place. This is an image of multiple sclerosis; I'll come to multiple sclerosis in more detail. Multiple sclerosis is a neurodegenerative disease, and it appears as pretty subtle lesions in your brain. The box you see here is multiple sclerosis. I'm not really sure how many of you are able to make it out, but this is what you're looking for when you see such a patient. It's a very tough problem. And as you can see here, I don't know how a radiologist is able to say that what he's seeing there is breast cancer. The bigger problem is that it's fine if you can see all the different variations and all the different abnormalities, but at times you're in a situation where this is all the data you have, and this is what normal looks like: anything that doesn't look like this is probably abnormal. Or you have ten cases and you say, anything that looks similar to these is also abnormal. That is the kind of data you're usually working with.

And then, finally, since you're in the medical domain, annotations are not easily available. If you want annotations for what you're working with, it's pretty difficult, because you need the time of a doctor, and getting the time of a doctor is a very difficult affair, as anyone here who has tried will know. And second, even if you do manage to get two doctors on it, you will see that the doctors will not agree. This is similar to the problem Aniraj was talking about in pathology: doctors will not agree.
If I give the same scan to two different doctors, or the same scan to the same doctor at different times, they are not going to mark the same areas. And there is no way of knowing the exact truth, because I can't go and biopsy every cell in this person's head; that is practically impossible. So there is no ground truth. You just take one doctor, or two doctors, take their markings as given, and find a way to deal with the variability.

So these are broadly the challenges you'll see if you start working in this area. They're common not just to radiology; a lot of them show up in pathology as well, and in almost any problem you work on in the healthcare sector you'll see some of these common traits.

So I'll talk about one problem, multiple sclerosis. Our team actually has the state of the art in multiple sclerosis segmentation today, and over the next few slides I'll walk you through how we dealt with each of these problems and how we managed to get good performance.

What is multiple sclerosis? It's a chronic disease that causes demyelination of your neurons, so essentially it breaks down communication in your nervous system. If you have multiple sclerosis, you might have poor motor skills, you might not be able to communicate properly. A terrible disease, and it does not have a cure. So what is done with multiple sclerosis patients is that an MRI is taken every six months or every year to monitor the progress of the disease in the brain.

How is that done? What you see here are four sequences of MRI images; MRI can be acquired in different sequences to visualize different tissue properties. This is what's called the T1 image, this the T2 image, this the PD image, and this the FLAIR image, and all of these images contain multiple sclerosis; I'm not sure how many of you can see it. These images come in sizes of broadly 181 × 200 or so, call it roughly 200 × 200 × 200, so each image has around 8 million voxels (a voxel is the 3D version of a pixel). And in this case we're dealing with four sequences of 3D images, so you can call it 4D data. The multiple sclerosis lesions are in these boxes over here: very, very tiny shadows, small bright spots, and maybe this sequence is the best one to visualize them on. You can correlate them across the other sequences to some level: you can see that on T1 a lesion is darker and on T2 it's brighter. But that doesn't hold for all lesions; this lesion here is bright in both. So there are some correlations you can draw, but in general it's a very difficult problem, in the sense that if I give the same scan to the same doctor twice, I'm not going to get the same markings. And this becomes a problem because, when I'm monitoring this patient over time, I'm asking him to come to the clinic every six months or every year and getting his MRI done, and the way you monitor how bad his disease is, is through a biomarker called lesion load. Lesion load is essentially the volume of the multiple sclerosis lesions that you can visualize on the MRI.
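Since lesion load is just a volume, it's straightforward to compute once you have a lesion segmentation. Here is a minimal sketch, assuming a binary lesion mask and the voxel spacing from the scan header (the function and variable names are mine, not from the talk):

```python
import numpy as np

def lesion_load_ml(lesion_mask: np.ndarray, spacing_mm=(1.0, 1.0, 1.0)) -> float:
    """Lesion load = total volume of segmented lesion voxels, in millilitres.

    lesion_mask: 3D binary array (1 = lesion voxel), e.g. a radiologist's marking.
    spacing_mm:  physical size of one voxel along each axis, from the MRI header.
    """
    voxel_volume_mm3 = float(np.prod(spacing_mm))
    return lesion_mask.sum() * voxel_volume_mm3 / 1000.0  # 1000 mm^3 = 1 ml
```

A change in this number between two visits is what's used to judge progression, which is exactly why the rater variability discussed next matters so much.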
So what happens here is that because there is intra-rater variability, if the patient comes six months later and there is a very small difference, and MS lesions are already so small, you're not really able to attribute it: is it the error of the human rater, has the disease progressed, or has it subsided? And this becomes a problem not just in the clinical scenario; it becomes a huge problem when you're doing drug research for multiple sclerosis.

So there is a challenge in exactly this context, put out by Johns Hopkins University as part of the IEEE International Symposium on Biomedical Imaging (ISBI), held in New York last year. You're given training data of 5 patients, and 5 patients is all you have, plus test data of 15 patients, with 3 to 5 time points for each patient. So for training you're looking at 5 × 5, about 25 scans, and for the test data, again 15 patients times their time points, around 60 scans that you have to evaluate. And you have annotations from 2 radiologists, and the radiologists don't agree; their annotations are quite varied. The Dice score, which is the metric you use to compare segmentations, essentially twice the intersection divided by the sum of the voxel counts of the two segmentations (I'll pin it down in a short sketch below), the Dice agreement between the two radiologists is around 0.55. A Dice of 1 means perfect agreement; 0.55 means one rater agrees with roughly half of what the other says.

So how do you deal with this problem? The first thing: it's a three-dimensional problem. The input images I have are no longer two-dimensional; they're three-dimensional. So, very logically, let's apply a 3D convolutional network. When we did this last year, and I think 3D networks are much more easily implementable now if you have Torch or TensorFlow, we actually had to modify the existing frameworks ourselves to make all of this run efficiently. I'm sure most of you by now are familiar with what a convolutional network is; just to summarize, you have some layers of convolution and some layers of subsampling. What your subsampling layers do is essentially take something like 4 pixels and make them 1 pixel, by taking the maximum or the average. Why this is important, I'll come to. You should note that subsampling brings down the size of your network considerably towards the end, and that's really required, because otherwise you'd have very large computational overheads. And even in the convolutions, a common practice is something called strided convolution. Usually when you convolve, you move the filter across every possible pixel; you don't skip any values. In a strided convolution, you skip a few values in between: you apply the filter in the first position, then, with a stride of 2, in the third position, then the fifth, and so on. This causes some loss of information, but it's been shown that it doesn't matter much and you still get a good output. So both the striding and the subsampling are essentially losing some information in between, but you accept that for the savings in computational cost.
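Going back to the Dice score for a moment, here is a minimal sketch of it for binary masks (numpy; the function and variable names are mine):

```python
import numpy as np

def dice_score(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|), computed over marked voxels.

    seg_a, seg_b: binary masks of the same shape, e.g. two radiologists'
    lesion markings of the same scan. Assumes at least one voxel is marked.
    Returns 1.0 for perfect overlap, 0.0 for no overlap.
    """
    a = seg_a.astype(bool)
    b = seg_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    return 2.0 * intersection / (a.sum() + b.sum())
```

An inter-rater Dice of 0.55 on this scale means the "ground truth" itself is only half-consistent, and that is the bar any automatic method gets compared against.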
So for training: if I want to get the segmentation, the class at any point, the most naive way to approach this problem is to take that point, take a patch around it — in this case a 3D patch — and feed it to a voxel-wise CNN. The voxel-wise CNN takes a 3D patch and tells me whether its centre is MS or not. This is the most naive approach you can think of, and accuracy-wise this implementation is fine, because if you think about it, you can look at the patch around a voxel and tell whether that centre voxel is MS or not.

The problem — there are a couple of problems here. One problem you'll see is image size. Typically, segmentation this way is doable when you're dealing with an image size of 224 × 224, which is around 50,000 pixels; that's okay. If I want to take a patch around every pixel and feed it through the network, it can be done in a reasonable amount of time. But if I want to do it on something like this MRI, I have around 6 million voxels in one image. So I have to run this voxel-wise CNN 6 million times to get the output for a single image, and if I have test data of 100 images, that's going to take a very, very long time. Implemented naively, it would probably take me a couple of hours to predict for one image. Forget training; just to predict for one image, a couple of hours.

So what do you do then? The first thing we had to tackle — and this is still not a very commonly implemented solution; the community is still catching up — is how you convert a patch-level CNN into a region-level CNN. Instead of predicting patch by patch by patch, can I feed an entire image and get the segmentation for the entire image? Instead of classification, I want segmentation of the whole image I'm feeding in. So can I feed a much larger image to the same voxel-level CNN and get the segmentation output?

For this, the first thing you have to do is use a 1 × 1 convolution. I essentially convert all my MLP layers — say there are 500 neurons and so on — into 1 × 1 convolutions. It's the same thing, because it's doing the same operation, but the advantage I get is that the network is no longer dependent on the input image size. Convolutions I can apply to an input image of any size, but with the MLP I'm hard-coding an expectation: I'm expecting an input of, say, 512 values. If I write that MLP as a 1 × 1 convolution, then my entire network can take any image size and will give a correspondingly sized output. Your typical network is trained so that it takes an input of 512 and gives you one value, but if my network is fully convolutional, I can feed a much larger image, and instead of getting one value I get a larger array, a larger matrix of values. This is what you typically call a fully convolutional approach. But the problem here is that you have strided convolutions, like I explained, and pooling operations.
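Here is a minimal sketch of that conversion in PyTorch. The layer sizes are toy values of mine (the talk's actual network comes later); the point is that a fully connected layer over 512 features per position is exactly a 1 × 1 × 1 convolution over a 512-channel feature map:

```python
import torch
import torch.nn as nn

# A classifier head that hard-codes its input: exactly 512 features.
fc = nn.Linear(512, 500)

# The same weights rewritten as a 1x1x1 convolution.
conv = nn.Conv3d(512, 500, kernel_size=1)
conv.weight.data = fc.weight.data.view(500, 512, 1, 1, 1).clone()
conv.bias.data = fc.bias.data.clone()

# On a single position the two are numerically identical...
x = torch.randn(1, 512, 1, 1, 1)
print(torch.allclose(fc(x.flatten(1)), conv(x).flatten(1), atol=1e-6))  # True

# ...but the convolutional version accepts any spatial size, returning
# one output per position instead of a single value.
region = torch.randn(1, 512, 9, 9, 9)
print(conv(region).shape)  # torch.Size([1, 500, 9, 9, 9])
```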
If your input image started at h × w, by the time you get your output it's probably h/32 × w/32. (This assumes all your convolution layers are padded, so they don't change the size of the output in any other way; even with just the pooling operations, you still end up with a very, very subsampled output, h/32 × w/32.) And there is no way I can just upsample this to match the segmentation; especially for something as minute as MS, you clearly cannot afford to lose that detail.

So there is this very simple implementation trick — I'll show a small code sketch of it below. I call it an implementation trick because what I'm training is still a voxel-level CNN, in the sense that it still trains patch by patch, but I'm able to take a much larger image and get a corresponding segmentation output. How I do this is very simple. If you look closely here: in your typical pooling, all these pink boxes you see are pooled down to one value. But when you want a segmentation output for the entire image, you'll see that you're losing the data that's in the blue box on top. So what I do is add sparsity to my convolutional kernels. First of all, I convert my typical pooling to overlapping pooling, so it no longer has a stride of 2; it's stride 1. Once I do that, I get an output that looks like y2 on top, and then I convolve using a sparse kernel. Why a sparse kernel? Because the actual output I want from the convolution is one where I only consider those red dots over there, and I can do that using a convolution filter that has zeros inserted in between: in the rows and columns in between, I just put zeros, and I get the same convolutional output. It's very simple, and as I do more and more pooling, I have to insert more rows and columns of zeros. By doing this, you're able to implement a CNN that takes a much larger image region as input and gives the segmentation for that region as output, while implementation-wise it's exactly the same as a patch-wise CNN.

So you add sparsity, and this results in a speedup of around 1,000×. Instead of having to run it 6 million times, I only have to run it a thousandth of that, around 6,000 times. It's much, much faster.

So how do you train? First, what network did we take? We trained a network that takes 3D patches of size 19 × 19 × 19 × 4, the four sequences stacked. You do a convolution with 60 filters of 4 × 4 × 4 and average pooling of 2 × 2 × 2, then again 80 filters and average pooling, then an MLP of 500 implemented as a 1 × 1 convolution, and then a softmax output to give you the probability values. A sketch of this network also follows below.
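The sparse-kernel trick above is essentially what is now called dilated (à trous) convolution. The talk's team modified their framework by hand; in current PyTorch, the `dilation` argument inserts exactly those rows and columns of zeros between the kernel taps, so a minimal sketch of one converted stage might look like this (toy sizes, and the dilation mapping is my reading of the trick):

```python
import torch
import torch.nn as nn

# One stage of the conversion: the stride-2 pooling becomes overlapping
# (stride-1) pooling, and the following convolution becomes sparse -- it
# skips every other position, which is what `dilation=2` implements.
overlapping_pool = nn.AvgPool3d(kernel_size=2, stride=1)
sparse_conv = nn.Conv3d(60, 80, kernel_size=3, dilation=2)

x = torch.randn(1, 60, 32, 32, 32)      # feature map for a whole region
y = sparse_conv(overlapping_pool(x))
print(y.shape)  # torch.Size([1, 80, 27, 27, 27]) -- a dense map, one output
                # per position, instead of one output per extracted patch
```

Each further pooling stage you convert doubles the dilation, matching the "insert more rows and columns of zeros" above.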
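And here is a rough sketch of the patch-wise network just described, as I understand it from the talk. The second convolution's kernel size and the activation functions aren't stated, so the 3 × 3 × 3 kernel and the ReLUs are my assumptions, chosen so that a 19³ patch reduces to a single output:

```python
import torch
import torch.nn as nn

# Patch-wise view: a 19x19x19 patch of 4 MRI sequences in, P(MS) for the
# centre voxel out. Lines marked "assumed" are my guesses, not from the talk.
model = nn.Sequential(
    nn.Conv3d(4, 60, kernel_size=4),    # 19^3 -> 16^3: 60 filters of 4x4x4
    nn.ReLU(),                          # assumed
    nn.AvgPool3d(2),                    # 16^3 -> 8^3: average pooling 2x2x2
    nn.Conv3d(60, 80, kernel_size=3),   # 8^3 -> 6^3: 80 filters (size assumed)
    nn.ReLU(),                          # assumed
    nn.AvgPool3d(2),                    # 6^3 -> 3^3
    nn.Conv3d(80, 500, kernel_size=3),  # the "MLP of 500" as a convolution
    nn.ReLU(),                          # assumed
    nn.Conv3d(500, 2, kernel_size=1),   # 1x1x1 convolution to 2 classes
    nn.Softmax(dim=1),                  # probability of MS vs. not-MS
)

patch = torch.randn(1, 4, 19, 19, 19)   # one 4-sequence patch
print(model(patch).shape)                # torch.Size([1, 2, 1, 1, 1])
```

Because every layer is a convolution or a pooling, this same network can be fed a larger region (with the sparse-kernel conversion above) and will emit a segmentation map instead of a single probability.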
And the problem is that if I try training this network by feeding in all the voxels, the data imbalance is so extreme — MS is around 0.05% of the entire data — that your network will simply not learn. So what you need to do, and what we did, is that we no longer feed it patch by patch; we feed regions of the image, and we pick regions of the image that actually have MS in them.
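A minimal sketch of that kind of lesion-biased region sampling (numpy; the region size, names, and clamping logic are my illustration of the idea, not the team's exact code):

```python
import numpy as np

def sample_lesion_region(volume, lesion_mask, region=32, rng=np.random):
    """Cut out a region^3 training subvolume guaranteed to contain MS voxels.

    volume:      4D array (sequences, D, H, W), e.g. T1/T2/PD/FLAIR stacked.
    lesion_mask: 3D binary MS annotation with spatial shape (D, H, W).
    Assumes the scan has at least one lesion voxel and dims >= region.
    """
    zs, ys, xs = np.nonzero(lesion_mask)      # coordinates of all lesion voxels
    i = rng.randint(len(zs))                  # pick one lesion voxel at random
    centre = np.array([zs[i], ys[i], xs[i]])

    # Clamp so the region stays fully inside the volume.
    lo = np.clip(centre - region // 2, 0, np.array(lesion_mask.shape) - region)
    z, y, x = lo
    return (volume[:, z:z + region, y:y + region, x:x + region],
            lesion_mask[z:z + region, y:y + region, x:x + region])
```

Training on regions sampled this way (optionally mixed with some all-normal regions as negatives) keeps the 0.05% positive class visible to the network instead of letting it drown in background.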