Good evening everyone, welcome to this Journal Club meeting. Today's topic will be taken by Dr. Amai Kulkarni. He is a staff radiologist at the Juravinski Hospital and Cancer Centre at McMaster University in Canada. As you know, in a field like radiology there are constant updates: technological advancements, newer reporting systems, data analysis, and that kind of thing. And with our busy schedules as doctors, it is very difficult to keep up with this kind of technological advancement. But for exactly that reason we have the Journal Club meeting, which we hold every month. Today's topic will be combining the strengths of radiologists and AI for breast cancer screening. The publication is on the RSNA website; I've posted the link in the chat box if anyone wants to go through it during the lecture. So, over to you.

Yeah, sure. Thank you so much for the introduction, Abhishek. First of all, thank you to Indian Radiologists for giving me this opportunity to talk about this important topic. As Abhishek mentioned, journal clubs are a really good opportunity for brainstorming ideas, going over some of the published literature, and then trying to make an informed decision about what you feel about that paper. As we all know, artificial intelligence is the big buzzword no matter where you go, in whatever field of medicine or even outside medicine. It has become a really important aspect of our everyday lives, including the use of ChatGPT. So I'm going to jump into my presentation. Abhishek, would you just let me know if you can see the screen share? Perfect. So the talk is essentially about this journal article, but I thought I would take this opportunity to also briefly discuss what artificial intelligence is and what basic things we need to know when we are reading any AI-related paper or getting involved in any AI discussion with our colleagues. We will go over that, and then I'm going to use something called the CLAIM checklist; I think Abhishek has provided a link for that checklist in the chat. The CLAIM checklist is simply a checklist you can go through when you're reading any AI paper to make an informed decision about the quality of that AI research. The journal article itself is from the Lancet. It's an open access article, so you can download both articles. Let's start with some questions, if you don't mind. Can you please pull up the polling questions? If you don't have polling, and I don't think we have too many participants live on this portal right now, you can even use the chat for answering, it's up to you. So we can go over the first question: do you read any AI-related papers? Give me a second, it's just doing something on my screen. Okay, so question number one is: do you read any artificial intelligence papers as of now? You can just say yes, no, or not frequently; I've given you three choices. I can see that some people are answering the first question. The second question, which you can already see, is: can you evaluate a paper with the tool that, as you know, we are going to discuss today?
And then the third question is: what is more important when it comes to evaluating any of these AI papers? Do you want to know more about the data sources, is the description of the model more important, or is a detailed training process, how they trained the model, more important? Maybe I'll give you a few more seconds and then we'll move on. Alright, it doesn't matter, we can go ahead, but we are going to come back to these questions at the end again, just to briefly discuss what we learned. So thanks for getting the poll out; we can hide the poll now. Alright, let's get into the topic. Disclosures: I personally do not have any relevant financial disclosures for this presentation, but I have listed the disclosures from the authors who published this work. So, what is the problem? Why do we need all these AI-related papers or AI software, especially when it comes to the world of breast cancer screening? As in any screening program, there are true positives: the exams that you call back and that turn out to be breast cancers. Then there are false positives: the ones where you call something back on a screening mammogram and it does not turn out to be anything. True negatives are when you read a mammogram as BI-RADS 1 or 2, and after 24 or 36 months the patient came back for the repeat screen and nothing had developed on the mammogram, which means you did not miss a cancer. Then what are false negatives? False negatives are the mammograms which are read as BI-RADS 1 or 2 at the screening study but which actually harbour a small cancer that was missed. With any screening program there is always a risk that you will have some of these false negatives, and you will have a bunch of false positives. False positives make up the majority of the call-backs; the recall rate is usually between 5 and 12%, depending on whether it was a baseline screen and what the breast density category was. The cancer detection rate is somewhere around four per thousand for most screening programs. So we want to see if we can develop something which can detect more cancers among the false-negative group. For example, if you read a mammogram, gave it a BI-RADS 1 or 2, and said the study was negative, but there was actually a small cancer that was missed, we want tools to capture those cases, and this becomes especially important in category C or D density breasts, where detecting breast cancers can be a challenge. We also want to develop tools which will help us minimize the false positives, the cases we call back which do not end up being anything, because this is quite anxiety-provoking for some of the women who undergo breast cancer screening. The moment they get a call from the screening centre, "by the way, our radiologist wants you to come back to our centre", they have this flight of thoughts: is it a cancer, how big is it, will I survive for five years, ten years? It can be anxiety-provoking for quite a few patients, so we want to develop something which can also minimize some of these false positives. These are the current options that we have. As we all know, some mammography software comes with something called CAD, which is computer-aided detection.
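As a quick aside, the screening metrics just described are easy to compute once you have the four counts. Below is a small illustrative calculation in Python; the counts are hypothetical, chosen only to roughly match the recall rate and cancer detection rate mentioned above, and are not taken from the paper or any real program.

```python
# Illustrative only: hypothetical counts for a screening round of 10,000 women.
true_positive = 40        # recalled, cancer confirmed (~4 per 1,000)
false_positive = 760      # recalled, work-up negative
false_negative = 8        # read as BI-RADS 1/2, cancer surfaced within follow-up
true_negative = 9192      # read as BI-RADS 1/2, no cancer on follow-up

total = true_positive + false_positive + false_negative + true_negative

sensitivity = true_positive / (true_positive + false_negative)      # cancers caught
specificity = true_negative / (true_negative + false_positive)      # normals left alone
recall_rate = (true_positive + false_positive) / total              # proportion called back
cancer_detection_rate = true_positive / total * 1000                # per 1,000 screens

print(f"Sensitivity: {sensitivity:.1%}")
print(f"Specificity: {specificity:.1%}")
print(f"Recall rate: {recall_rate:.1%}")
print(f"Cancer detection rate: {cancer_detection_rate:.1f} per 1,000")
```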
So, all the previous-generation CAD systems would detect an abnormality on any of the screens and put squares and other kinds of markers on either calcifications, a mass, or an asymmetry. There is another version, which most of the industry leaders have been calling AI CAD (there are multiple different names for it). It does not only detect the abnormality, it also tries to give you a score from some classification, for example saying there is a 58% chance that a finding is malignant, or something like that. Some of the newer AI CAD techniques can also pick out the slice of the tomosynthesis stack where the abnormality is present. So we are already quite far along in using these CAD tools. The other option that we have is something called peer review. You read a bunch of screens and you put some of those screens up for double reading by one of your colleagues. This type of double reading or peer review is widely used in many parts of Europe. Even at our place we do a little bit of peer review, but it is not completely randomized; we usually limit it to things like, for example, a single-view asymmetry which we don't feel confident calling back. We put those cases up for double reading and then we make a consensus or conference opinion. We can either design conferences around doing those kinds of things or design a workflow where a random portion of all the screens goes for double reading. The third option is that you could potentially explore the role of artificial intelligence as your peer reviewer, and that is what we are going to discuss in this paper. But before we do that: what is artificial intelligence? Many people think it is just a big black box: something goes in, something comes out, and we have no idea what happens inside. For that we need to go back into history. The person who changed it for us is this individual. He designed an AI model involving convolutional neural networks with multiple layers built within it. It was called AlexNet, and it won the ImageNet Large Scale Visual Recognition Challenge in 2012, and that is when everything started to revolutionize the world of radiology for us. Does it always work? We never know. If we all remember the launch of the iPhone X in 2017, Craig was trying to show how Face ID works on the iPhone and it failed on the live stage. So we are still trying to get all the pieces together and get everything going in real life. It has come a long way, but there are still some challenges. For example, if an algorithm is asked to identify how many of these photos are dogs, it is going to have a hard time, because there are some dogs and some chocolate chip muffins in all these pictures, and some of the muffins actually look like a dog. The algorithms need to perform so well that they do the job of identifying things in a given stack of images as well as, or slightly better than, radiologists if we really want to use AI as a leveraging tool to minimize some of the burden of our work, or to help us detect things we might otherwise miss. The biggest challenge, as I said, in screening mammography is category C and category D density breasts. So this is a screening mammogram with MLO and CC views, and there was a locally advanced cancer on this mammogram which was initially missed at the time of the screening exam.
The patient came back with a palpable concern and there was a large mass in the right breast. Then we employed some AI tools. For example, this paper here (I've quoted it at the bottom of the screen) tried to detect some of those abnormalities on these screening mammograms, and all those red dots that you are seeing on these mammograms are what are called heat maps. The software developers, and people who know how to write code and extract data out of these algorithms, can generate these heat maps; they are just trying to understand how the algorithm is working to detect abnormalities. In this image, the patient actually had a small cancer and the AI algorithm picked it up. Whereas in the other case, the suture markers that had been placed, as well as a small metallic marker from a previous biopsy, were all being flagged by the algorithm as abnormal. So you need to really understand how it works to get a complete sense of how you can embrace this technology. So how can we come up with some solutions? There are different methods for image classification tasks. What is an image classification task? You throw a bunch of images at an algorithm and the algorithm gives you a binary outcome: for example, whether the screening mammogram is abnormal or normal, or, if you feed it a stack of CT images, whether there is appendicitis or no appendicitis, just to give an example. These kinds of classification tasks can be done by some of these models. I know this diagram is busy, but I have quoted the reference down below if you want to read more about how these classification tasks are designed. Most radiologists, including myself, are involved in AI research in this portion of artificial intelligence: we are trying to use predefined, engineered features and then seeing how we can enhance those features, or use those features, to detect or quantify certain abnormalities in images. For example, you will have retrospective data of, say, 25,000 resected lung carcinomas; you go in and do segmentation on each and every one of those CT images, and then try to extract texture features out of them using various software programs and computer language options available out there, such as Python. Then you see which texture features show a recurring theme, which features are associated with the abnormalities. And then you throw a bunch of other images at the algorithm you have designed and see if you can reproduce your findings. This step is where most radiologists get involved in doing research. And if you look at this portion here, which is labelled B: this is deep learning. This is the one where you use what we discussed, AlexNet, or those deep convolutional neural networks. This can be supervised, unsupervised, or semi-supervised. Supervised means that a radiologist or an expert keeps an eye on how the algorithm is detecting findings, so it gets refined over multiple cycles using different types of data sets, and then you get an outcome as yes or no.
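Stepping back to the hand-crafted feature step for a moment: below is a toy Python sketch of extracting a few grey-level co-occurrence (GLCM) texture features from a segmented region of interest. It is illustrative only; real radiomics pipelines (pyradiomics, for example) extract hundreds of features, and the "lesion patch" here is random data standing in for a real segmented image.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # scikit-image >= 0.19

def texture_features(roi: np.ndarray) -> dict:
    """roi: a 2D uint8 patch cropped around the segmented lesion."""
    glcm = graycomatrix(roi, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    # Average each property over the two directions sampled above.
    return {
        "contrast": graycoprops(glcm, "contrast").mean(),
        "homogeneity": graycoprops(glcm, "homogeneity").mean(),
        "energy": graycoprops(glcm, "energy").mean(),
        "correlation": graycoprops(glcm, "correlation").mean(),
    }

# Hypothetical usage: one feature vector per lesion, later fed to a classifier.
rng = np.random.default_rng(0)
fake_lesion_patch = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
print(texture_features(fake_lesion_patch))
```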
And in the deep learning models there are all these multiple layers of complexity in between, which are doing all the work for you; so, for example, you give it a chest CT and it will try to find out where the cancer is. So this is just the basics of how some of these algorithms are designed and how they work. As we discussed, there are these two kinds of approaches to classification tasks. This is a really good paper if you want to know more about deep learning, and I feel it is really important before we dive into today's journal club article, because we need to know what these algorithms are actually doing when we ask them to call a screen normal or abnormal. So there are different types of image classification tasks. Let's start with this example: this is an actual CT of the abdomen through the liver, in the portal venous phase. You can see the blood in the portal venous circulation, and oral contrast was administered. You can definitely see that the patient has liver lesions, and there is also normal liver, normal stomach, normal spleen, portions of the kidneys, bone, soft tissues, etc. One task you can do is something called object detection; we talked about that when we discussed CAD, right? This is simply detecting abnormalities and saying, okay, this is a metastasis, this is the aorta, this is the stomach, and this is the spleen, by putting different colour codes on them to give you a better idea. The CAD models that we used before AI CAD for mammography screening would put circles on asymmetries and masses and squares on calcifications, so that was a task of object detection. The next task is semantic segmentation. What is that? It tries to segment the area of suspected liver metastasis on its own, and then colour codes it (in this example as yellow, with the rest of the image blue), so it is just picking out the area with metastasis. It is a specific task: we are asking it to identify one specific type of abnormality on the image. The next one is instance segmentation: not only does it detect that these are metastases, it also segments each individual metastasis separately; for example, 1, 2, 3, 4, it has segmented those separately. So each instance of metastasis is segmented on this slice. So this is all good, right? We are really happy that we have algorithms, but then how well are we doing? This was a really famous trial (I have given the link down below) where they tried to see if they could train pigeons to detect abnormalities on mammograms as well as pathology slides, and they saw that the accuracy for calcified as well as non-calcified findings increased from 50% to 85% after training the pigeons. So do we really need AI? I'm going to leave it at that point and jump on to our discussion today. If anybody wants to access this publication, you can just pull up your phone and scan that QR code and you will get the complete paper. The title of the paper is "Combining the strengths of radiologists and AI for breast cancer screening", so you can see how we are trying to leverage AI to improve our outcomes in this paper. The most important thing to know here is that this is a retrospective analysis. Before we move on: if anybody has any questions at any point, just note them down or type them in the chat and we can discuss them at the end of the session.
I think there are also some participants on YouTube Live; they can type their questions there as well. So now let's go to the publication and see what the purpose of the study was. The main purpose is to integrate AI into screening pathways. Right now, as you know, most places simply have a radiologist who looks at a bunch of screening studies and decides whether they are normal or abnormal; the abnormal exams get called back. Some places, as we discussed earlier today, employ some form of double reading, either randomly assigning some cases to another reader or using conference or consensus rounds to discuss whether certain findings should be called back or not. The goal of this study was to create a two-part system (they go into that later) and then to assess the effect on sensitivity and specificity when a standalone AI system was used, or when it was used in the setting of a decision-referral type of approach, which we will discuss later. For the purposes of this study, one of the authors is a developer of this particular software. All the code and the algorithm of the software are shared on the main publication website in multiple appendices; you can read about that if you have more questions. Now, what is the problem, and what is the need for designing this kind of study? There are a lot of studies which talk about how AI has been comparable to radiologists in the detection and classification of abnormalities. But usually the data are insufficient when it comes to the accuracy of these algorithms in a screening population, because remember, a large portion of a screening population is going to have normal exams; only somewhere around four out of 1,000 mammograms are going to be abnormal with a cancer. So the large portion is going to be a normal population, whereas most research studies look at enriched data where there are more cancers. So this is the problem that needs to be addressed in the screening setting. Then, most programs still use AI only as an adjunct to mammography, so it is not technically alleviating any of the radiologists' burden for the task of double reading or anything else. The results of introducing AI in screening have often shown some reduction in the sensitivity of the screening program itself. People have used AI for triaging followed by incorporating a recommendation for supplemental screening. However, the problem with this type of study is that they did not consider the inter-reader missed cancers (remember I talked to you about false negatives), so they did not consider that. Just give me a second: I'm actually doing this presentation from our hospital and we have motion-detector sensors, so the light just went out; give me a second. So these are the problems, and these are the problems that this paper is trying to address. Wouldn't it be nice to imagine that something like this is going to happen? This is a nice cartoon which I bought online: there is a computer doing the task of your double reading or detecting a diagnosis, and the radiologist is doing something else, as shown here, just trying to relax, which is never the case. But then the computer is saying, "I'm being sued for a missed diagnosis."
So what do I do now? There are quite a few ethical concerns and issues that still need to be addressed before we completely embrace AI as a technology for helping our day-to-day practice. The solution proposed by this group is: why don't we see if we can combine the radiologist and AI together, and whether that can be one of the alternatives to consensus conference or double reading, so that at least the task of the second reader can be minimized and they can provide care to some other patients or do other work at the same time. How did they achieve that? The current pathway they have is: you do a mammogram, then reader one reads the mammogram, reader two reads the mammogram, and then a consensus conference decides whether it is BI-RADS 1 or 2 or BI-RADS 0. The second option is a standalone pathway, where reader one reads all the mammograms, the AI reads all the mammograms on its own, and then there is a consensus conference. And the third and most complex pathway you can see here is what they call the decision-referral pathway, in which the mammograms are read predominantly by the artificial intelligence algorithm. What happens is that all the mammograms the AI software feels confident about are triaged as normal. All the mammograms where it lacks confidence, or where it is confident there is a cancer, go down the decision-referral pathway, and this is where the radiologist steps in and figures out how to deal with those cases. So this is the setting in which it is used as a decision-referral pathway, and that is what the paper explores. Now, where did they get their data from? All of the mammograms which are part of this study are from the German national screening program, between January 1, 2007 and December 31, 2020. There were eight screening sites, and they obtained the necessary regulatory clearance as per EU regulations. They then divided the data into an internal test set and an external test set. This is really important, because you want your algorithm to be assessed on multiple different types of images from different locations or different settings, as that helps you understand whether the algorithm will work in the real world. So they created one part of the data as an internal set and the other as an external set. I'm just going to turn off my video here because the light keeps turning off, but I'll turn it back on later. So now you have two data sets which are going to be used for assessing this algorithm, and they gave a split of how many cancers and how many normal mammograms they have. This is the data pre-processing step, which is again really important before you test your data in any AI study. I will not go much into the details here, but essentially the bottom line is that, if you look at the data, this is the internal testing set, all six centres. All of this data was mixed and then subdivided into three sets, called the training set, the validation set, and the internal test set, and then you have the last one, where two centres were kept completely separate; that is called the external test set. Now this brings me to a really important concept: at what level of the study is the data disjoint? What does that mean? Any patient belonging to this group was never a part of that group, and vice versa.
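As an illustration of that idea, here is a minimal sketch of a patient-level disjoint split in Python. This is not the authors' procedure; the patient IDs, labels, and the scikit-learn GroupShuffleSplit call are purely illustrative of how such a split can be enforced, with the patient identifier as the grouping key so that every exam from a given patient lands in exactly one partition.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical data: one row per mammogram, grouped by patient ID (two exams per patient).
patient_id = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
labels     = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0])   # 1 = cancer
X = np.zeros((len(patient_id), 1))  # placeholder feature matrix; real pipelines use image data

splitter = GroupShuffleSplit(n_splits=1, test_size=0.33, random_state=0)
train_idx, test_idx = next(splitter.split(X, labels, groups=patient_id))

print("Training patients:", sorted(set(patient_id[train_idx])))
print("Test patients:    ", sorted(set(patient_id[test_idx])))
# The two sets of patient IDs never overlap: the data is disjoint at the patient level.
```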
Now why is that important? When you train your AI algorithm using the training set and validation set and then test it on the internal test set, and then send all the external-set images to the algorithm, there is a concern for bias if some of those patients were also part of the training data, because the algorithm would already know those patients and would detect those cases perfectly well. So you are trying to prevent the AI from seeing anything from the held-out data, and that is what is meant by data that is disjoint at the patient level: patients belonging to the separate groups were not mixed at all. It is an important concept and it shows you the robustness of the study. Now, results. This is the important thing to remember: what did they find? There are a bunch of ROC curves here, but essentially what you want to look at is this really, really small square here; that is the most important one. The red dot is the radiologist, the purple dot is the AI standalone system in this case, and the blue curve is the cancer detection, the mean or average cancer detection. The radiologist is actually doing a really good job of detecting cancers; when the AI is used as a standalone technique, not supporting the radiologist's work in any way but doing something on its own, the radiologist still did a really good job in this case. Now, this is the second one, where again the red dot is the radiologist and the other dot is the AI standalone, and this was tested on the external test set. Even there, the radiologist was doing a better job than a standalone AI. It just shows you that radiologists are still going to be really valuable when it comes to delivering cancer care. AI is going to be more useful in situations where you need that extra help, or where you need something that can help you get through your day and your volumes and add more value to the work you're already doing. Now, this is an important thing to understand: they also published their results by breast density, category A, category B, category C, category D. As you would expect, category D density breasts are relatively few among the entire population; most are category B and category C. Even in category C and D density breasts, the algorithm is actually doing a good job. Then this is just a split across the different BI-RADS categories (3, 4, 5) and malignancy with an invasive component, because this is how the data for all the cancers is obtained at the end. It shows you the different types of abnormalities that were detected by the algorithm as opposed to an individual radiologist. And just to explain the colour scheme of the diagram: the radiologists are in this pink kind of shade, green is the decision-referral model, and purple is standalone. You will see that the decision-referral model has actually done well compared to the individual readers, the radiologist alone or the AI alone. For instance, if you look at in situ disease, the standalone model has done a little better than the individual radiologist. Then if you look at intramammary lymph nodes, this is where the decision-referral pathway has done a better job than both of those.
And obviously, in the setting of global asymmetry, the radiologists superseded all the AI models. So the charts and diagrams are usually the most important things to go through in any paper, essentially. This brings me to the end of the first part of the presentation: we went over the purpose, the methods, how they did the entire study, and the results, what they found after doing the research. Now, the next most important thing that I want to discuss in the second half of this presentation is the CLAIM checklist. But before we go there, does anyone have any questions on this first part? No, it doesn't look like we have any questions so far. Okay, sounds good, so I'll just keep going to the next session then. So, the CLAIM checklist. What is CLAIM? CLAIM stands for Checklist for Artificial Intelligence in Medical Imaging; it was released by the Radiology: Artificial Intelligence journal. You can just scan this QR code to bring up the CLAIM checklist, and then we will go through the checklist together. So if you take a minute to bring up the checklist on your phones or whatever devices you are using, I'm going to bring up a Word document of the checklist that I want to show you. All right, do you see the Word document there? Okay. All right, so this is the CLAIM checklist, and essentially this is what you are supposed to do with it. Let me give you a bit of historical background about this checklist and how I became aware of it. We started something called an AI journal club in our program for all our residents and fellows. It is led by one of my colleagues who is an expert in the field of research as well as AI, and we decided that we wanted some kind of structure when we are doing these journal clubs. The most important thing anybody should remember is that there is going to be some industry support when it comes to publications as well as grants for these AI projects, and you have to be really careful about the quality of the AI research being done before you interpret the results and before you decide, okay, this is what I want to do in my practice setting. That is why this checklist was developed: you have to go through each one of those questions separately. There is an article available online for the CLAIM checklist which gives you an idea of what each of those question numbers means: for example, what you should look for in the study objectives, or in the scientific and clinical background. So it's a really self-explanatory process, and it actually gets pretty easy once you have done a few of these. The first thing to check is the title and abstract: the important question is, did they identify the AI methodology? For those who have brought up the paper on their laptops or devices, you can see that in the abstract the authors have made no reference to what the AI methodology was: was it deep neural networks, convolutional neural networks, did they use some kind of software? That is a big issue, because not all articles are going to be full text or free full text, so a reader would not even be able to decide whether to pay for access to the article. So ideally you want to have that in the abstract.
Then the next item is a structured summary of study design, methods, and results, and that was provided by the authors here, including how the population was divided into an internal test set and external test set, how the model was tested, and what results they got. So that is handled well in the title and abstract. Then we go to the introduction. For any paper, not just an AI paper, you need to know the scientific and clinical background, including the intended use of the software and the clinical goals. In this case they made it pretty clear that they want to use AI for the task of double reading or consensus reading, either as a standalone reader or as a decision-referral approach. So that was very clear and really nicely explained, and they laid out the objectives and hypotheses, so you can be pretty confident about that. Study design: this was a retrospective design, and you can even see that in the title of the paper. It is important to share that information with the readers, because otherwise you cannot make an informed decision about the applicability of the research. Study goal, such as model creation, exploratory study, feasibility study, non-inferiority trial: this one was an exploratory study, because they wanted to see if they could apply AI as a double reader. Data sources: they have described extensively in the study what data source was used, where they got all their mammograms from, what the nature of those studies was, and why they were done. So that was quite useful. Eligibility criteria: everything was fine in the study except for one issue, which was that they did not describe how the patients were selected. It says there were some 1.1 million mammograms included in the study between 2007 and 2020, but they did not say whether these were all consecutive mammograms or whether they were randomly selected. That is an important aspect, because if they were not clear in describing the way they selected the patients, there is a potential risk of introducing bias: selecting a few instances of cases could tip the findings in one direction rather than the other. So it is really important to understand how the data was selected. Pre-processing steps: they describe those very well, and we already went over them. Then data subsets: they decided and explained very well how the data subsets were created. The data elements were defined really well; we know exactly what they studied and how the patients were assigned to different categories. Now, de-identification methods. These were fully anonymized mammography studies, and patient data was not available to the AI model. Patient data was only used in the training and model validation step, when they created the model; when the data became the internal test set and external test set, it was all anonymized and stripped of any patient data, pathology results, or anything like that. So there was no way for the algorithm to know whether a given mammogram was abnormal or normal. Essentially, what you can imagine is that they tried to simulate a screening environment within the retrospective data. The other problem was how missing data were handled. This was not clarified in the paper.
So you have a little bit of a risk of bias there. For example, they said that normal mammograms were considered to be studies which were read as normal and then monitored for 24 consecutive months, and they checked whether any new cancer developed in those 24 months; but they did not say what they did with patients who did not have follow-up for the next 24 months. They only included patients who had 24 months of follow-up. This is basically a concerning point, because they did not describe how they handled the missing data. They also did not say, if a patient had a biopsy at one site but then moved to another site for their surgery, how that missing data was handled, because sometimes there can be upstaging or downstaging: for example, if you have just a diagnosis of DCIS on biopsy, which is non-invasive, it can get upgraded to invasive carcinoma at lumpectomy, depending on how the biopsy was done. So if the surgery was not done at your site, or the patient did not undergo any surgery, then what happened to them? Was there any way to know whether an invasive carcinoma was missed in those cases? These are really important points where you want the authors to discuss how the missing data was handled, and they did not describe that in some cases. Now, the ground truth when it came to designing the algorithm was histopathology. They even had normal histopathology diagnoses for patients who went into the testing of the algorithm. And remember, this is the training set and validation set, which is a step before the algorithm is exposed to the internal test set; this is just the development of the algorithm. At that stage it was histopathology which was the ground truth, and that was used as the reference standard. The next item, number 15, is the rationale for choosing the reference standard. This was not really explained, because they did not talk about the alternatives; they talked about their own algorithm and how it works as a second reader. Then, this next item is really important. Most research papers will have something where they say that, for example, three models were developed simultaneously with different types of inputs, those were tested on the data, and the most robust model was selected for further steps. In this case that was not described; all they said was that there was one model, and the weights of the model were changed. It could be that there were subsets of different steps or different models, but essentially you want more than one candidate model which can be tested, with the best-performing one selected. Then, the source of the ground truth annotations and the qualifications and preparation of the annotators: this was all done by radiologists. They looked at all the BI-RADS 4 or 5, BI-RADS 3, or any BI-RADS 1 or 2 lesions which went for biopsy, annotated them using small octagons or hexagons, and then that data was fed into the algorithm; that is how the algorithm was created. Then, measurement of inter-reader and intra-reader variability: this is an important step, along with what methods were used to mitigate variability and resolve discrepancies. In this case, they basically had two different anonymized readers who were independently sampling or segmenting the lesions and finding abnormalities, and the whole data set was re-sampled with replacement.
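Resampling with replacement is the classic bootstrap. Purely as an illustration (the numbers below are made up and are not from the paper), here is a minimal sketch of how bootstrapping can put a confidence interval around a sensitivity estimate.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-cancer outcomes: 1 = detected, 0 = missed (83% sensitivity on 100 cancers).
detected = np.array([1] * 83 + [0] * 17)

boot_sensitivities = []
for _ in range(2000):
    sample = rng.choice(detected, size=detected.size, replace=True)  # resample with replacement
    boot_sensitivities.append(sample.mean())

lower, upper = np.percentile(boot_sensitivities, [2.5, 97.5])
print(f"Sensitivity {detected.mean():.2f} (95% bootstrap CI {lower:.2f} to {upper:.2f})")
```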
That is a really important step, and this is how they made sure they accounted for inter-reader variability, and it is what addresses the issue of inter-reader missed cancers. Remember, one of the points we discussed on the problem slide was that some of the previous research did not consider these inter-reader missed cancers, and that is what they were trying to mitigate in this study, so it definitely adds value over the previous work. Intended sample size and how it was determined: essentially, they did not give any indication of how the sample size was determined. They just said that it was a retrospective data set from 2007 to 2020; they did not describe how they arrived at that time frame. You would assume that it was probably considered at some point, but it would have been nice to have it in the publication. How was the data assigned? We already discussed that they pooled six centres together and two centres together. Among the six centres, they randomly divided the population into three proportions, assigned to the training set, the validation set, and the internal test set, and the external test set was completely separate. That was really well done in this paper. We already discussed point number 21: in this case the data was disjoint at the patient level, a really important step for any AI algorithm, because you don't want the algorithm to know what already exists in your validation set; otherwise it is going to be biased, it will know that patient, it will know that image, and it will give you falsely reassuring results. In this paper they have given all the details of the models, both in the paper and in a separate appendix, which you can open to see exactly what software libraries, frameworks, and packages were used, if any. Then number 24, initialization of model parameters: this one used something called transfer learning, where you basically teach the algorithm with the help of radiologists and then transfer that knowledge so it can work in the real world. And then number 25, details of the training approach: this is also provided in the appendix. This is really, really important: if you're reading any paper where they don't provide these details, never trust that paper unless you can email the authors and ask for those steps. If the details of the training approach are not provided, you have to suspect that something is wrong and this is not a good-quality paper. Method of selecting the final model: again, as I said, multiple models were not used, so this is not really addressed for this study. Ensembling techniques: this is one of the techniques that some AI models use; it is not applicable to this study. The metrics of model performance were judged by sensitivity and specificity, when the AI was used standalone compared with the radiologist, or as a decision-referral model used in conjunction with the radiologist. The statistical measures are mentioned, and they have gone through the statistics really well: they used the area under the receiver operating characteristic curve, along with robustness and sensitivity analyses. That is quite good. Validation or testing on an external data set is really important, but some researchers really struggle here; it is really difficult to get external data.
So most of the time researchers end up splitting their own data into two different groups and calling the second group external data, but that is not really the case. What is the real-world problem when you don't have external validation or external data? For example, you will see that there are a lot of AI algorithms which are available commercially now. It's a separate discussion and I don't want to go into it right now, but when you or your practice is buying any new AI product, you have to use another type of checklist, called the ECLAIR checklist, just like éclairs. You want to really make sure you know how the algorithm was tested and how validation was obtained, and this is that step. So what does it mean? I'll give you an example. Let's imagine that there is an AI program which is designed for detecting abnormal mammograms in a subset of the population in, say, North America. That subset of the population may have slightly different proportions of category A, category B, category C, and category D density breasts. For example, if we look back at our own population, the proportion of category D density breasts is low; I think there were only around 55 of those, or something like that, on one of the charts there. But if you deploy this algorithm in an environment where the proportion of category C and category D density breasts is really high among your screening population, then your algorithm may not work exactly the way you want it to. How do you overcome this? Imagine that I design an algorithm. What I would do is reach out to different parts of the world and see if there is any possibility of collaboration, and then you either ask that collaborating site to send their data to you to test on your algorithm, or you export your algorithm to that site and ask them to test it on their data set. Why is this a problem? Because of research ethics board restrictions. So external validation is always a problem, and you need to go through multiple steps to get it done, but that is what increases the robustness of any AI algorithm, because you are actually testing your algorithm in the real world. So always remember the importance of this validation step. Now, results. We already looked at the results, and they gave a flow of participants or cases using a diagram to indicate how inclusion and exclusion were done. Demographic and clinical characteristics were described in each partition, and the performance metrics on all data partitions were reported pretty nicely. They compared the performance of the different types of models, standalone or decision referral. For the estimates of diagnostic accuracy and their precision, confidence intervals were given, so that was mentioned. Then, failure analysis of incorrectly classified cases: this was not mentioned. What does this mean? For example, if the algorithm classifies a mammogram as BI-RADS 0 and calls it back, but you know that this mammogram is actually benign or negative and the reader has already called it a BI-RADS 1 or 2, then how was this discrepancy handled? This is something which was not mentioned.
They also did not mention what happens to the false negatives called by the algorithm. When it was used in a decision-referral system, the AI was classifying mammograms as completely normal, or as not completely normal or completely abnormal. But among the first group, the ones it classified as completely normal, what about the false negatives? They did not say how that data was handled, and it is an important consideration. In the discussion they spoke about the study limitations, including potential bias, statistical uncertainty, and generalizability. They are completely aware that this is a retrospective study in which they tried to replicate the real-world experience of reading screening mammograms, but the proportion of normal versus abnormal mammograms, the proportion of different breast densities, and the proportion of different types of abnormalities, such as architectural distortion, mass, calcification, focal asymmetry, and global asymmetry, is going to change in the real world. So there is always a limitation when you actually deploy these tools in the real world, and you actually saw in the results that the radiologists were the ones calling most of the patients with global asymmetry back; the algorithm was not doing a good job at that. The implications for practice, including the intended use and clinical role: they provided a clear idea of how it can be used. The most important item in the "other" section is whether they gave a registration number and the name of the registry for the algorithm that was tested. This is really important and makes things a little more concrete: by the way, this is the registration number of our model, you can go and learn more about it and feel confident that whatever we designed is good-quality work recognized by some of the partners in the industry. Then, the full study protocol: they have actually released that. And the source of funding is also mentioned. So this is just a reference to the CLAIM checklist; there are 42 points, which we went through for this paper, and as we saw, most things were really nicely done, with a few exceptions. The study is quite good and robust. So I'm going to stop sharing this portion now, and we can open it up for any questions or comments.

Again, no questions in the chat box, but I do have a couple. How much better would AI software be in relation to CAD software?

Yeah, that's a really good question, about AI versus AI CAD. CAD is doing one classification task, right? It is just doing detection; it is just marking, this is abnormal, this is abnormal, this is abnormal. There are multiple studies out there which looked at the variability between a reporting radiologist and CAD software, and somehow CAD has never shown any added advantage to the screening program. So what counts as an added advantage to a screening program? Say you detect four out of 1,000 mammograms as breast cancer, as you're expected to. But what about the others, whose studies are read as normal but which harbour missed cancers? Any modality which is going to detect those missed cancers, above the ones you are already detecting, has something called an incremental cancer detection rate.
Those modalities are breast MRI, ultrasound, contrast-enhanced mammography, and molecular breast imaging. Remember that tomosynthesis may add a couple more cases once in a while, but it does not add a lot in terms of incremental cancer detection rate. So when we are deploying any of these AI or CAD tools, it is really important to see how many more cancers you are actually detecting compared to the radiologist, and that is where some of these software packages have really not shown a significant difference. Even with the AI software used in this paper, you saw that the radiologist was doing a good job at detecting cancers when the AI was used only as a standalone reader. That is why some of these tools can be a bit of a challenge when you want to really rely on them: they can be a good double reader, but they are not a good standalone screening tool. Whereas what the authors of this paper described was using the AI software in a decision-referral system, where it is going to help you detect probably more cancers and probably give you more confidence in saying that normal mammograms are normal, which reduces your false positives. Those are the two metrics which are really important when you use any AI software in clinical practice. Now, we have been using AI CAD in our clinics and it has actually worked pretty well, and as I said, in the tomosynthesis setting it actually flags the slice where the abnormality is present. So it is definitely a good tool as a second reader, that is what I would say, but there are quite a few papers out there which compared CAD with the radiologist to see how it fares, and it is definitely not detecting more cancers than the radiologist.

Right. Okay, and just a couple more. Of course Canada would be a lot further along in terms of how and where they use this software, but how far do you think we are from democratization of something like this?

That's a really good question. I think the biggest challenge is funding when it comes to developing any software; unless there is industry support, it just becomes really hard. If you look around you, you will see that so many oncologists and so many people in internal medicine are on multiple different clinical trials, testing out some new drug and that kind of thing, because there is a lot of industry funding for those kinds of research. The issue with research in radiology is that there is not a lot of strong industry support when it comes to developing the algorithms. That is why most of the work happens in isolated examples: at my hospital, for instance, one of my colleagues does a lot of this research; he is developing an algorithm, with the support of a research grant, to detect HCCs in the liver, all the arterially hyperenhancing observations, and he has even obtained external validation and those kinds of things. So it happens in bits and pieces, but strong industry support is somehow lacking.
The biggest challenge with radiology AI research is that you need a very large number of images, large datasets, for the algorithm to become really robust at detecting abnormalities, because, as I said, among 1,000 mammograms only about four are probably going to have a cancer. So you have to assemble data at such a scale that you have a realistic mix, an overwhelming proportion of normal mammograms and a very small proportion of cancers, to see how those algorithms do, and getting that data is a challenge. That is why it is still a little bit far from generalizability, but it is entering the mainstream. For example, some of the newer CT scanners, and even MR scanners, have some AI algorithms built in which are based on GANs, which are generative adversarial networks. Whenever the patient has a motion artifact, for example on a CT chest, the algorithm tries to correct for it, so it saves you a bunch of time, a repeat exam, or a repeat radiation dose. So we are already seeing some applications of AI in our day-to-day radiology work, but when it comes to the task of detecting and diagnosing disease, I think we are still a little bit away from applications in clinical practice on a day-to-day basis.

Right. Okay, and just one last one before I let you go. This is not entirely relevant to your article, but just in terms of AI in general: how different do you think our job will be, say, 10 or 15 years down the line? In the sense that, as young radiologists, do we need to be practising a little differently or concentrating on a broader skill set?

Yeah, that's a really good question, and we get that type of discussion all the time. We have medical students who come to shadow us in our department, and everybody is concerned about what is going to happen to radiology in the next 10 years, because this is what they want to do and it is going to take them five, six, seven years depending on what path they choose. Will the field be any different than it is right now? The only thing I would say is that a radiologist who knows AI is probably going to add value down the line. We are still far, far away from AI replacing radiologists' jobs, because that is how people usually portray the threat. Even now, with the model they trained for this paper, you saw that the radiologists were doing a good job. And there are some things where you never know how the AI was trained, and it may take you a lot of time to realize that something was missed or something incorrect was done. So I would say that, from a future perspective, it is probably good to know AI. And that is why these kinds of journal club activities can be a really good starting point, where you get these discussions going in a group of your colleagues. If you're in an academic setting, ask your residents to present some journal clubs, find some journal articles, and use a checklist to train yourselves on some of the concepts, like what is meant by validation. For example, say I'm the chair of a department and I want to buy a new CT scanner, and the industry representative says, hey, by the way, we have this new scanner, and if you pay us one lakh rupees extra, two lakh rupees extra, we will give you this AI software.
But as the chair of the department, I need to make an informed decision about whether I am paying two lakh rupees for something, and I need to know how that AI software was trained. So even as an end user, I think it is really important to embrace AI, or at least have some knowledge of AI. Maybe we will have different kinds of reading rooms in the future, where the patient is registered and, at the front desk, you have something like ChatGPT run through the entire clinical note, and it finds, okay, this patient has a history that points towards an increased risk of colorectal malignancy. So maybe when I'm reading the CT of the abdomen and pelvis, even if it was done to rule out acute appendicitis, I am running through the entire colon a little more carefully, because I know this patient is at increased risk of colorectal cancer, which was not disclosed in any of the notes or the requisition provided before. So it can become a really good adjunct to our practice. And even for patients, I think there are applications down the line. For example, if they are, say, in your clinic and you give them a BI-RADS 3 assessment, then right now, if they go online and type something into Google, the first thing they are going to get is the death rate from breast cancer, how many patients die from breast cancer, what locally advanced cancer is. That sends them into a downward spiral and they just get really anxious about it. But if you could have some software available in clinical areas where they could just type in their question, and something like a large language model such as ChatGPT gives them what they are supposed to know based on their situation, it is going to help them as well. So I think there is definitely a good application if it is used as an adjunct down the line, and I will not be surprised if in the next few years we actually start using some of these things in day-to-day practice.

Thank you. Thank you so much, and thank you for all your efforts. I think we'll see you again next month for the MRI teaching course.

Thank you for inviting me for this article, and thanks to the entire team of Indian Radiologists. Thank you.