Hi, everyone. My name is Govan. I'm going to be in 12th grade next year, I'm from Houston, Texas, and I'm currently working in a learning algorithms lab at the University of Houston. This is my case study on using 3D convolutional neural networks with visual insights for classification of lung nodules and early detection of lung cancer. So the question is: how can deep learning methods be used to solve high-impact medical problems such as lung cancer detection? And more specifically, how can we use 3D convolutional neural networks in this particular application? Another problem that everyone here probably knows, since you're all ML practitioners, is that no matter how good your deep learning model is, if it's not interpretable to people in the domain, it's really hard for them to adopt it. ML models, and deep learning models specifically, have recently been shown to achieve very good accuracies, but if the domain expert can't trust the model, it doesn't mean as much, because the model won't gain adoption. So the idea is: how can we use techniques such as gradient-weighted class activation mapping, or Grad-CAM, to visualize the model's decision making, increase radiologists' trust, and improve adoption in the field? Some background on the problem. Lung cancer is a leading cause of cancer death among both men and women in the United States, accounting for more than 100,000 deaths a year in the U.S. alone, so it's a large-scale problem. The five-year survival rate is also only 17%. When we look at why the survival rate is so low, what we find is that early detection can really improve the chances of survival. However, many times by the time the lung nodule or the cancer is detected, it's already too late in the process for intervention and effective treatment. So the idea is that early detection of malignant lung nodules can significantly improve the chances of survival and prognosis. The problem is that detection of lung nodules is quite time-consuming and difficult due to the volume of data involved, and there can be intra- as well as inter-radiologist variance. What this means is that the CT scan data is huge, with millions of voxels, and the lung nodule is tiny compared to the size of that scan. So one radiologist might see a region and classify it as a nodule, while another radiologist might not even see the same thing; there's subjectivity, and hence inter-radiologist variance. And by the nature of the problem, it's hard to find the nodule in the first place; it's almost like trying to find a needle in a haystack. To approach this, I'll go through some more background first. Computed tomography, or CT, is widely used for lung cancer screening, and the goal is to detect as early as possible that the disease is there. Accurate detection is quite important to the diagnosis of lung cancer, but as I said before, a CT scan can have millions of voxels, and within those a lung nodule is quite small and hard to detect. So this is a significant challenge for radiologists today. What has happened recently is that automated methods have shown better, or at least comparable, accuracies to manual interpretation by radiologists, and in addition they can reduce subjectivity and inter-radiologist variance.
As I talked about before, one radiologist might classify a region as a nodule while another might classify it as something else, and that variance can lead to many problems. So the idea is that using automated methods can reduce subjectivity significantly and improve diagnosis. More recently, deep neural nets have shown superior performance on classification problems. One example is the 2D convolutional neural network. CT scan data is 3D, so it consists of many slices; what a 2D CNN does is look at a single slice of this data and classify it as having a nodule or not. Building on that approach, you can combine multiple 2D CNNs applied from multiple angles of the CT scan data, and that allows for higher accuracies. So 2D CNNs are one approach, but 3D CNNs, which use the full 3D nature of the data instead of individual 2D slices, should achieve higher accuracies specifically on the lung nodule detection task. You can think about this from the human perspective to get a better idea: when a radiologist looks at these 2D scans, they look at a series of 2D slices in order, which gives them a more 3D view of the CT scan and allows them to give a better diagnosis of whether or not there's a lung nodule there. The same idea applies to convolutional neural nets: if they can see the full 3D nature of the data, they should get a higher accuracy in detecting these lung nodules. So I'll go into a little bit of background on the different ML methods and then deep learning as well. This is an example of what a single neuron looks like in an artificial neural network; the full neural network will have many of these neurons. The idea behind a neural net is that it's modeled loosely on the human brain. A neuron takes many different inputs, applies a weight to each input, combines all of those weighted inputs together, and then puts the result through some activation function. This then propagates to further neurons, and the full network is built out of many single neurons like this. This kind of artificial neural network has shown a lot of promise on many datasets. However, when we get to image datasets, it's a completely different issue. Traditional neural nets look like this, with an input layer, hidden layers in between, and finally an output layer that makes a prediction about what we're looking for; in this case, is there a lung nodule or not, or any task like that. The problem is that they're not really suited for image processing, for a number of reasons. Number one, images have a lot of pixels, and as such there will be a lot of weights in the neural net, which means it's computationally very expensive; it's just not viable. In addition, since there are so many weights, overfitting can occur, and being ML practitioners, you probably know how big of a challenge overfitting can be, so it's obviously not ideal to have an architecture that's prone to overfitting. And one of the most important issues is that the spatial information in an image isn't accounted for by a plain artificial neural net.
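To make the single-neuron picture concrete before moving on, here is a minimal NumPy sketch of the weighted-sum-plus-activation computation just described; the input values, weights, and choice of sigmoid activation are purely illustrative.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs passed through an activation."""
    z = np.dot(weights, inputs) + bias       # combine the weighted inputs
    return 1.0 / (1.0 + np.exp(-z))          # sigmoid activation squashes the result to (0, 1)

x = np.array([0.5, -1.2, 3.0])               # example inputs
w = np.array([0.8, 0.1, -0.4])               # example weights
print(neuron(x, w, bias=0.2))
```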
So this is where the convolutional neural network comes in. A CNN consists of an input and an output like any other deep learning method, and in between it has layers called convolutional layers and max pooling layers. I'll go through these in the next few slides, but the general idea is that the input goes through a series of convolutional and max pooling layers, which then map to some fully connected layers at the end, which map to an output. That might not mean much yet, so let me go a little deeper. A convolutional layer takes in some input and consists of kernels, or filters; either word is fine. What happens is that the kernel convolves over the input: it starts at the left, then moves across position by position, and at each position the dot product is taken between the kernel and the corresponding patch of the input. This outputs a feature map like the one shown here. The idea is that these convolutional layers are able to detect certain features in the image. If you have an image of a face, the earlier convolutional layers can detect edges, the small edges on the face; the later convolutional layers can detect certain parts of the face, maybe the eyes or the nose; and finally the last few layers can detect the entire face. So that's the main layer that makes up a convolutional neural network, but another critical layer is the max pooling layer. Max pooling layers take the maximum value in a certain area and pool it into a single value; to give you an idea, these four values from the feature map here are max pooled into this single feature right here. The idea behind the max pooling layer is that you reduce the dimensions of the data, which allows for quicker computation. We talked about the overfitting problem earlier, where in a traditional neural net overfitting is a real problem because there are so many input features; when you max pool like this, the network generalizes better and won't overfit the data as much, which is really good because it allows for higher accuracies. And finally, taking the max value tends to preserve the strongest feature responses, which helps feature detection. So a full network consists of many convolutional layers interspersed with max pooling layers, which, as I said, reduce dimensions, allow for better computational speed, and reduce overfitting. Each CNN layer learns features of increasing complexity: the first layers learn edges, corners, things like that; the intermediate layers learn more complex parts of the object, in the face example maybe eyes and noses; and the last layers detect full objects such as faces. So these CNNs have convolutional layers, max pooling layers, and finally they output to some decision-making layer. And that's the general idea of how 2D CNNs work.
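As a concrete illustration, here is a minimal Keras sketch of that pattern: convolutional layers interspersed with max pooling, funneling into fully connected layers. The layer counts, filter counts, and input size are illustrative, not the exact model used in this study.

```python
from tensorflow.keras import layers, models

# A minimal 2D CNN: conv layers learn edge/part/object features,
# max pooling layers shrink the feature maps between them,
# and fully connected layers at the end make the nodule / non-nodule decision.
model_2d = models.Sequential([
    layers.Conv2D(32, kernel_size=3, activation="relu", input_shape=(32, 32, 1)),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),   # two classes: nodule vs. non-nodule
])
model_2d.summary()
```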
3D CNNs are essentially the same thing, except your input will be 3D data like this: instead of being a single slice, it'll be a volume. And then, and this is the key part, your kernels or filters will also be 3D. What this means is that instead of detecting 2D features such as edges and corners, it detects the same kinds of features but in a 3D fashion, which is really critical, especially in this lung nodule case, because all of the data is of a 3D nature. Yeah, so video is more like time-series data, so that's a little bit of a different approach, but in this case, yes, you can handle 3D data, and time-series data works as well with a small architecture modification. So the idea is that this 3D data is input, it goes through some 3D kernels, and then it's classified as either healthy or diseased; in this case, a lung nodule exists in that CT image or a lung nodule does not exist. In addition to this, like I said, CNNs, like other deep neural networks, have been black boxes, giving users no intuition as to how they're predicting. Okay, yeah, sure, feel free to ask questions at any time; I'd like this to be interactive so I can answer your questions. So this kernel right here; let me just go back a little bit. Normally we'd have a 2D input, just an x by y array, and the kernel is also 2D. The kernel convolves over the input, the dot product is taken, and this outputs some feature map like this. The idea with 3D is that everything is 3D instead of 2D. You can think of the input as many stacked arrays, so it has many arrays behind it as well, giving it a 3D nature. And this kernel, or filter, right here will also be 3D. By the way, you might hear me use these words interchangeably; kernel, filter, and weight all refer to the same thing here. So the kernel that convolves over the input data will also be 3D, which means it does a better job at detecting 3D features that might not otherwise be seen with a 2D input and a 2D kernel. The kernels are essentially what's updated over multiple iterations: in a traditional neural net you'd have certain weights that you update after every iteration, but in this case those weights are replaced by the kernels. The kernels change over multiple iterations; they're what's being optimized to reduce the loss and improve the network. So exactly, it's essentially just a weight matrix, but instead of traditional weights that are single numbers, it's an array that is convolved, and the same kernel is applied across the whole image while being updated during training. And yes, the kernel will also be 3D, so it will have depth as well. The idea is that by adding this depth, you're trying to mimic what a human sees: when a radiologist looks at a scan, they look at multiple slices together to get a 3D view of what's going on. In the same way, by making the kernel 3D and the input 3D, we're trying to fully exploit the 3D nature of the CT scan, so you're not losing any spatial information, and that's why you should get a higher accuracy with this 3D approach. And yes, the kernels move in all three directions, so it's still spatial information.
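To make the 3D-kernel idea concrete, here is a tiny TensorFlow sketch showing a single Conv3D layer applied to a volumetric input. The 32x32x32 crop size matches the crops discussed later, but the filter count and random input are purely illustrative.

```python
import tensorflow as tf

# One 3D convolution: the input is a volume (batch, depth, height, width, channels)
# and each learned kernel is itself 3D (3x3x3 here), so it slides along x, y and z.
volume = tf.random.normal([1, 32, 32, 32, 1])          # one 32x32x32 CT crop, 1 channel
conv3d = tf.keras.layers.Conv3D(filters=8, kernel_size=3, activation="relu")
feature_maps = conv3d(volume)

print(feature_maps.shape)    # (1, 30, 30, 30, 8): eight 3D feature maps
print(conv3d.kernel.shape)   # (3, 3, 3, 1, 8): the 3D kernels that get updated during training
```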
But if you look at a lung nodule, in reality it's going to be some roughly spherical thing, so when you look at it in a 2D fashion it just looks like a circle, right? And a circle is probably harder to detect than a sphere in 3D data; when you look for edges that are almost spherical, that's easier to detect than a circle in 2D data. The idea is that since the data itself is 3D, when you map it down to 2D you're naturally going to lose some spatial information, information that can be critical for detecting the nodule. By keeping the data 3D, you're trying to minimize whatever information you're losing, so you can get a better idea of what you're looking for and find it with higher accuracy. Yeah, so it's still the same edges, but the edges are 3D. When I look at you and see your face, I have depth perception, right? I can differentiate someone's head from a flat circle; that's the depth perception I get because I have a 3D view of someone. When I look at someone's face I see the edges, but I see them in a 3D way; I see that your nose has depth, your eyes have depth. In the same way, the idea is that this model can exploit the 3D nature of the lung nodule data and achieve higher accuracies. Yeah, so on max versus average pooling: different people have tried different approaches, and I've tried both of them. What I've found is that most of the time max pooling seems to do a better job of detecting features; at the same time, what I've heard is that average pooling can generalize better. So it's generally a trade-off. You can try both; these are things that are usually tuned over iterations. Usually you validate on a validation dataset and see what seems to be performing better: am I overfitting too much, do I need to switch between average pooling and max pooling? These are all parts of the network architecture that, over multiple iterations of validation, you examine to see how changing them improves your accuracy or changes what you're seeing. What I found for this project is that max pooling seemed to do better at detecting some of the features, so that's what I went with, and in most of the projects I've seen, max pooling seems to perform better. Yeah, so training time: if you go 3D, it will increase. I know GPU training usually doesn't have too much of a problem training fast. In this specific case, what I did was, number one, downscale the data to improve training speed, and also use lightweight models. If you use a model with 100-200 layers, you might have problems with training time, especially if you're training on a CPU with limited resources, which is what I was doing here. So I used a fairly lightweight model, which still had good accuracy on this specific problem, as well as downsizing the data. And if you do have problems with training time, one idea is simply to decrease the data size.
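To make those two practical points concrete, the pooling choice as a one-line architecture change and downscaling as a speed lever, here is a small sketch; the `use_max` flag, the placeholder volume, and the zoom factor are illustrative, not necessarily what the study used.

```python
import numpy as np
from scipy.ndimage import zoom
from tensorflow.keras import layers

# The max-vs-average pooling choice is a one-line swap that can be compared
# on the validation set (use_max is a hypothetical flag for illustration).
def pooling_block(use_max=True):
    return layers.MaxPooling3D(pool_size=2) if use_max else layers.AveragePooling3D(pool_size=2)

# Downscaling a volume by a factor of 2 per axis to cut training time on a CPU.
volume = np.random.rand(64, 64, 64)          # placeholder CT volume
small_volume = zoom(volume, zoom=0.5)        # -> shape (32, 32, 32)
```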
Yeah, so the architecture I used as my baseline, I'll go into that further; I think it's better if I move on, because this is all stuff I'm going to cover later in the slides. So, I think this is where I was. One of the big problems with deep neural nets and deep learning approaches is that while they have high accuracies, it's really hard for domain experts to accept what they're predicting, because they give no intuition. This is one of the general themes I've noticed across all of ML, and even in the past few days at this conference, a lot of people have been approaching this problem in different ways. So, as I said: superior results under test conditions, but real-life adoption is not as easy because of the lack of transparency in the models. The idea here is that maybe we can approach this problem using gradient-weighted class activation mapping, or Grad-CAM. How many in the audience have used Grad-CAM or know about it? Alright, so a few of you. I'll go further into what exactly Grad-CAM is, but the idea is that we can use this algorithm to provide visual explanations by highlighting discriminative regions for the model. This is a very powerful approach for making the model interpretable to people in the domain. So this study aims to build a 3D CNN with state-of-the-art accuracy, but also to provide visual insights into how exactly the model is making its decisions, which will allow for better trust and adoption in the field. In addition, as I was saying before, you want to validate your model and update it across the debugging and optimization process; if we can look at these visual insights, we can see where the model is failing and why, and address that by changing the architecture of the model. To my knowledge, this is the first study that demonstrates Grad-CAM techniques for visual explanations on lung nodule classification, so that's really the value added by this study. The objective here is to research and develop 3D CNNs to detect lung nodules in CT scan data with better accuracy and higher trust than existing models, and this is ultimately to aid early detection of lung cancer and improve chances of survival and prognosis. The research questions are derived from that same objective: can we show that a 3D CNN can do better than a 2D CNN at detecting lung nodules, and is it possible to derive visual explanations for its internal workings using Grad-CAM methods? We hypothesize that 3D CNNs, which exploit the full 3D nature of the data, will have better accuracy, and that the Grad-CAM algorithm can provide visual explanations for decisions on the lung nodule problem by highlighting discriminative regions. Exactly, with the same model type, and also whether we can improve upon that with some optimization; I'll cover that as well. So I'll go a little bit into the data that I used. We used the LUNA16 dataset, which has almost 900 thoracic CT scans, and this dataset is fairly clean from the start: scans with slice thickness greater than 3 millimeters were removed, and one of the reasons for this is that lung nodules are small to begin with, so if your slices are thicker than 3 millimeters you might miss the nodule altogether. That's some of the preprocessing that was already done. The images were annotated by four experienced radiologists. The idea here is that you want a gold standard; we talked about how this problem suffers from inter-radiologist variance.
The approach is that by having a panel of radiologists discuss each image and then label it, you get a gold standard for what is actually being seen in that image. Each radiologist marked a lesion as a nodule or a non-nodule, and the reference standard is that at least 3 out of 4 radiologists must identify it as a nodule greater than or equal to 3 millimeters for it to be labeled as a nodule in the data. So this is how the data looks: these are full CT scans that were then cropped around the locations identified as nodules or non-nodules. This here is a nodule, this is a nodule, and this over here is a nodule. Obviously I'm not a radiologist, and I don't know if any of you are, but it's hard to tell on some of these images what exactly is there, and that's where the problem lies. In this case, this looks like some amorphous blob in the picture, but we can't really tell if it's a nodule or not; it's labeled as a non-nodule, but as someone who's not a radiologist, it's really hard to tell. So there's the problem of false positives as well as the problem of not detecting the nodule at all, twin problems that we're aiming to address using the deep learning approach. This is the study design. First we gather and preprocess the data, then we split the data into training, validation, and testing sets. We design and implement a model, train that model, validate it on the validation dataset, and iterate through this loop: based on how the model is performing, we can change the architecture, put in max instead of average pooling, add more layers, or change the filter sizes and how many filters there are. A lot of this is the job of the data scientist: iterate through this and figure out how to make the model as good as possible, and then evaluate on the test dataset. The test dataset can only be used once, because you don't want to train or optimize your model for your test dataset; the idea is that we hold out a test set to be used only once at the very end. And finally, we visualize the model using Grad-CAM, for higher interpretability, with the goal of gaining adoption in the field. So first, splitting and preprocessing the data. From these roughly 900 CT scans, we cut out a thousand nodule and a thousand non-nodule volumes, and these were divided into three sets for training, validation, and testing. The data was completely randomized: 1,400 volumes were used for training and 600 for testing, and 10% of the training data, or 140 volumes, was used for validation.
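A minimal sketch of that split, using scikit-learn and the numbers from the talk; the arrays here are random placeholders standing in for the cropped 32x32x32 volumes and their labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 1,000 nodule + 1,000 non-nodule volumes and their binary labels.
X = np.random.rand(2000, 32, 32, 32, 1).astype("float32")
y = np.array([1] * 1000 + [0] * 1000)

# Randomized split: 1,400 volumes for training, 600 held out for the one-time test,
# and 10% of the training data (140 volumes) reserved for validation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=600, shuffle=True, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, shuffle=True, random_state=42)
```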
So I'll go a little bit into the architecture and implementation. I used the AlexNet architecture as the baseline, and I also optimized the 3D CNN over many iterations. AlexNet is probably the architecture that made CNNs popular in the first place: it did very well on some benchmark measures, and it's what caused the explosion in the whole field of image classification using CNNs. The architecture is fairly small, and the idea here is that you can reduce your computational load; it has about five convolutional layers followed by three fully connected layers. My 3D CNN is fairly similar. It has an extra layer, and what I did over many iterations is increase the number of filters and the filter sizes early on, and funnel them down more sharply toward the end of the network than AlexNet does. I believe that in this specific problem, that may have helped find some features that AlexNet might have missed. These are just the model summaries that were output; they're probably hard to read, so I'll continue. Now I'll go into the training process. The training process is basically this: some images X are input into the model, the model makes some predictions, and those predictions are compared with the actual labels Y, and some loss function is computed. This loss function tells you how far away you are from where you want to be, and by minimizing this loss the model is updated and gets better at the prediction task. The optimization of this loss function is where we update the model with new parameters; in this case, as we discussed before, the new parameters are the values inside the kernels, as opposed to the traditional weights in an artificial neural network. During the training process, a softmax activation function was applied before the loss was calculated, cross-entropy was used as the loss function to be optimized, and all of the models used the Adam optimizer with default parameters and a learning rate of 0.0001. All of these things can be swapped in and out: you can use different loss functions, you can use different optimizers. The idea is that we can change these things and try to get a better model. Once training is done, we want to evaluate the model with some key metrics. The ones widely used are precision, recall, and accuracy. Accuracy is straightforward: how many you got right out of the total. Recall is, out of all the actual positive cases, how many you were able to detect. And precision is, out of the ones you labeled as positive, how many are actually positive. These are all useful for different reasons. If you want to detect as many nodules as you can, you want a high recall, but you also don't want to sacrifice precision, because at that point you'll have a lot of false positives. Generally in the medical domain, people focus more on recall than precision, because the impact is higher: if you can detect someone's lung nodule early on, that's critical because it can save a life, whereas a false positive will still shock the patient and isn't good for them, but it's not on the same level of impact as missing the nodule altogether. The final 3D CNN was visualized using Grad-CAM to understand how the model is making its decisions. Grad-CAM is an algorithm that uses the penultimate layer, the convolutional layer right before the fully connected layers; it uses the activations from that convolutional layer, and it can utilize spatial information there that is completely lost in the later dense layers. This is a critical step for gaining radiologists' trust, because it can highlight discriminative regions and make the model really interpretable to domain experts. I'll go into a little bit of the software and hardware I used. I used the Keras and TensorFlow libraries in Python; these are pretty popular, and most of you have probably worked with them before.
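Using those libraries, a minimal sketch of the training and evaluation setup just described might look like the following. Here `model_3d` and the X/y arrays are the hypothetical objects from the earlier sketches, and the epoch and batch-size values are illustrative.

```python
import numpy as np
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Softmax output + cross-entropy loss + Adam with a learning rate of 0.0001, as described.
model_3d.compile(optimizer=Adam(learning_rate=0.0001),
                 loss="sparse_categorical_crossentropy",
                 metrics=["accuracy"])
model_3d.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, batch_size=16)

# Key metrics on the held-out test set (used only once, at the end).
y_pred = np.argmax(model_3d.predict(X_test), axis=1)
print("accuracy :", accuracy_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))     # fraction of true nodules that were found
print("precision:", precision_score(y_test, y_pred))  # fraction of flagged nodules that are real
```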
I also used a library called keras-vis, the Keras Visualization Toolkit, for my Grad-CAM algorithm, and I used sklearn.metrics to compute the metrics for all of the models' decisions. One thing is that I know a lot of people have to work with limited resources, and that's why I used an AlexNet-style model as opposed to something more complicated like DenseNet or ResNet. My hardware was just a CPU on a Mac. For the optimization function, I used the Adam optimizer, and that's just based on a literature review: most people had good results with Adam, and it had the best results compared to other optimizers, so that's what I ended up going with. So now I'll go over the results: model performance, key metrics, and the visual insights that were generated. This is the AUC. All the CNNs had pretty good AUCs, close to one, showing that they have good separability and can perform the nodule detection task well. The AlexNet 3D CNN, with an AUC of 0.95, performed better than the AlexNet 2D CNN at 0.94; I can highlight more differences in the next slides, but this is just AUC. My optimized 3D CNN performed the best, with an AUC of 0.97, which shows that these optimizations, made over iterations and validated, are effective at increasing the model's classification ability. This is a more detailed results slide with the key metrics. One thing I really want to highlight is this 0.94 recall value: it might only be 3% greater than the AlexNet 2D CNN, but the thing is, even a 0.01 increase in recall is like saving another patient's life out of 100. So this recall value is really critical, especially if you're maintaining about the same precision, because if you can increase recall without sacrificing too much precision, you're finding patients you wouldn't otherwise have found, and by detecting the lung nodule early you can approach the cancer differently and really make a difference in their lives. The accuracy of the AlexNet 3D CNN is better than the 2D CNN, and the optimized 3D CNN performed the best; it also has better recall and precision values compared to the 2D CNN. So this is probably the most critical part: the visual insights and the model's decision making. The images on the left are the input images fed into the network, and the images on the right are the Grad-CAM-generated maps. The idea is that if we can map this onto a full CT scan, we can show exactly what point on that huge CT scan is tipping off the network, giving the radiologist insight into why the model is predicting that there's a nodule here as opposed to there not being one. Some examples: this is pretty clearly the nodule in this case, and we can also see it detected here, and this one here. These are pretty clear images, right? You can see where the nodule is. But the real value comes when it's not so easy to tell and it's a small point on a huge CT scan. That's when this really comes into play, because if we can give genuine visual insight into how the model is making its decisions, it's a completely different ball game: now the radiologist can really trust this network, and that allows for higher adoption in the field.
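For reference, here is a generic Grad-CAM sketch written directly against TensorFlow. It follows the same idea, weighting the last convolutional layer's activations by the gradient of the class score, but it is a plain re-implementation for illustration, not the exact keras-vis call used in the study.

```python
import numpy as np
import tensorflow as tf

def grad_cam_3d(model, volume, class_index, conv_layer_name):
    """Return a normalized 3D heat map of the regions driving the given class score."""
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(volume[np.newaxis, ...])   # add a batch dimension
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)                # d(score) / d(activations)
    weights = tf.reduce_mean(grads, axis=(1, 2, 3))             # average the gradients per channel
    cam = tf.reduce_sum(conv_out * weights[:, None, None, None, :], axis=-1)
    cam = tf.nn.relu(cam)                                       # keep only positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()[0]

# The resulting low-resolution map can be upsampled to the crop size and overlaid
# on a slice of the input volume to highlight the discriminative region.
```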
So, some conclusions. 3D CNNs outperform 2D CNNs in this task, which shows the benefit of using 3D data with 3D kernels and how they can really improve our approach on lung nodule CT data. State-of-the-art accuracy numbers were achieved with the optimized 3D CNN, which indicates the effectiveness of building a model over multiple iterations and how that can yield high performance on the lung nodule detection task. And finally, the study, for the first time to my knowledge, has demonstrated the effectiveness of applying gradient-weighted class activation mapping and shows how it can provide good visual explanations as to why the model is predicting what it's predicting. This is quite critical because, as I've said many times, the general theme is that if you can really make the model interpretable, it can be truly useful to clinicians and radiologists in terms of trusting and adopting the model. And as practitioners, one thing we really want is not just a model with high accuracy, but something that makes a difference in the actual field; that's why this Grad-CAM analysis is so critical. Future work would be to review the class activation maps in detail to understand where the model is failing and where it's performing well, so we can further optimize the network to improve in the cases where it isn't doing well. And then also, how many of you know about capsule networks? Yeah, so capsule networks are a new architecture that's been developed. Traditional CNNs rely on max pooling, and the problem is that a lot of important pose information is lost in the process. Capsule networks instead use hierarchical pose relationships to preserve that information, which means they can train on much less data and achieve similar results. Capsule networks have shown state-of-the-art performance on data such as the MNIST dataset, but I've yet to see anyone apply them to the lung nodule task, so that would make a logical extension of the work I'm doing. And here are some select references on the different things I reviewed before approaching this problem. So, any questions? Yeah. Yeah, so the idea is that the dataset gives you where the lesions are, and these lesions are identified as either nodule or non-nodule. I just manually cropped 32 by 32 around each one to make it a more approachable problem. That's the idea for adoption: you want to at least know where the lesions are, and then you can crop there and identify whether each one is a nodule or not. And in terms of actual CT data, you can map these Grad-CAM explanations back onto the full image and provide that to the radiologist, to see whether or not each block has a nodule. Yeah, it was a 3D crop, 32 by 32 by 32. Yeah, thanks for this great session. One question I have is related to this: I think you've used binary classification here to say whether it's positive or negative. Does it make sense to include one more class for when the model is not sure about the result, say when the value is something like 0.6? That could be given to the expert as results it's not sure about. Yeah, so that also makes sense; it's more like a regression-type problem, I guess, is what you're saying.
But that's also a valid way to approach the problem. In this case we used binary labels, but what you're describing also makes sense, because what we do is get some output, apply a softmax to it, and then decide whether it's a nodule or not. Before making that hard decision, though, the softmax probability itself gives an idea of how sure the model is, so we could output that as a number from zero to one instead of a binary classification label. OK, thank you. And one more question: earlier you showed the images that you can provide, but I think a gentleman asked whether you can feed in a video as well, and you said the architecture would be different. I would assume that in the end you still need these images, right? So you'd just take the video and capture those images from it. Yeah, so that's the general idea. I haven't done too much work on time-series data, but I know CNNs can definitely be applied, and for data that comes in a series like that, there are also networks called recurrent neural networks, or RNNs, that are supposed to be really good. I haven't delved much into that area, but CNNs definitely have a role to play in time-series data like video as well. So for the healthcare data, is the data you're getting open source, or did you get it from hospitals? Yeah, so this data, the LUNA dataset, is open source; it was released as part of a grand challenge. But I've worked with other medical data as well, and generally you have to go through some certification before you can access it. What I've heard from a lot of people is that electronic health records are very good datasets to use, but there's a lot of approval you have to go through; the closer you get to hospital or government-type datasets, the more certification you need. But there are also a lot of good open-source datasets like this one that you can just find online. So 3D CNNs, can we mainly use them for healthcare, or can we use them for something like sports recognition, or human action recognition? What fields can we use 3D CNNs for? Yeah, so one of the reasons 3D CNNs, along with other deep learning approaches, are so powerful is that they can be used for almost anything. Anything that has an image, anything that has some kind of volume, you can apply 3D CNNs to it, so you can use them to detect almost anything; it's pretty much the same as our eyes, and that's why they're considered so powerful. OK, yeah. So what is it converted into, a voxel format? Can I say you're stacking vectors over each other? Yeah, I guess you can think about it like that. The idea is that, yes, it produces a 3D volume, and obviously you can't really show a 3D volume on a slide, so what I did is take one slice of it and display it next to a slice of the 3D volume. A lot of the data we're talking about in this project is 3D data, a volume with a z-axis as well, but to demonstrate it on a slide I basically just cut out one piece so we could show it. So I'm trying to understand how you get that format: do you have different vectors stacked in one layer?
Yeah, so normally you'd have 32 by 32 images, but in this case we just have 32 of those: we stack them all on top of one another, and that's your z-axis. So you'll have an array of 32 by 32 images; I don't know if that answers your question. As for the 3D model I worked with, I don't remember the exact numbers; it didn't cause too much of an increase in training time, which is good in this case because it's not too big a dataset and the models aren't too complex either. But I have experimented with this kind of thing on bigger datasets, like 80 by 80 or 320 by 320 images and the 3D versions of those, and I know it can cause quite a jump in computational time. One way to address this is to downscale the data. And even if you don't want to downscale the data for the final model, what you can do is downscale it, go through multiple architectures, and see what's working, just for quicker iterations; you get a quicker idea of what's working, and then you use that on the full dataset. That way you're not wasting time going through multiple architectures at the full data size, and you can make the most of the time you have. Another approach is just to get better hardware; GPU training is quite powerful, and TPU training as well. Yes, we did try a few deep learning models with 3D input; that was basically for kidney cancer detection, but we found that 3D CNN models were too time-consuming. Have you tried GPU training? Yes, it was on a GPU; we basically work on GPUs, so CPUs are out of the question. Yeah, you can't wait an entire day to get one epoch of output. So it was GPU training, but we still found it too time-consuming. So one approach might be just to use smaller data, get quicker iterations, and then scale up to the full data size and apply it. Yeah, your accuracy might decrease, but what I'm saying is that if you downscale like this, you can see which architectures are getting better accuracies relative to one another. Once you've figured out which architecture is working for you, say DenseNet is working better than ResNet, then you can scale back up to the full data size and try it on that. Since you're working with hospitals, are you trying to evaluate your model against real-world data before going further in your analysis? Yeah, so future work would be to talk to a hospital and see if they can implement this solution; I haven't really gone that far yet, but that's definitely something I'd want to do. Don't you think that by chopping a 32 by 32 box out of an entire CT image you're losing the global information about where the nodule is? Because radiologists look at where those nodules are, so when you make small boxes, the global information is lost. So generally I feel that nodules are pretty self-contained, so the spatial information you want to look at is within the nodule itself. There are a lot of things in CT scan data that are just not nodules, and what you want to do is look at each candidate and figure out whether it's a nodule or not; that's how you approach the false positive problem. You do want the full spatial information, but generally nodules are self-contained enough that if you can get the whole nodule in one crop of the data, you can pretty much figure out whether it's a nodule or not without much spatial information from other parts of the CT scan.
So in this case, I don't think it's too much of a problem. I guess it's something to be explored, whether using the full CT scan can allow for better accuracies, but in this case I don't think it caused too much of an issue. All right, thank you. Thank you.