Thanks, Mike. I appreciate that. Good morning, everybody. I had to struggle with a bit of a challenge to get online. We want to talk about observer variability in pathology and look into how artificial intelligence could eventually help to mitigate the effects and the impact of observer variability. As Mike mentioned, my name is Hamid Tizhoosh. I'm the director of Kimia Lab at the University of Waterloo, a member of the Waterloo AI Institute, and a faculty member affiliated with the Vector Institute. These are my disclosures: at the moment I'm doing some consulting for Huron Digital Pathology. The motivation, of course, is that we know almost 12 million people in the US experience some sort of diagnostic error every year. The numbers of fatalities vary widely; I have heard anywhere between 25,000 and 250,000 deaths per year because of some sort of error. 28% of diagnostic mistakes have life-threatening results or cause permanent disability. And just as an example, breast cancer misdiagnosis costs almost $4 billion a year. So that's a serious problem. It could be that in the specific cases where we are working the numbers appear small, but when you scale up and look at the population, you see the impact. Medical imaging is a large field with many different branches, from CT, ultrasound, and MRI to PET, OCT, and so on. Microscopy is our focus, and digital pathology in particular, as virtual microscopy. Millions and millions of images are being captured every year for different purposes, but mainly for diagnostic purposes. So if we look at misdiagnosis as one of the major problems we have, there are different types of error generally. There could be a scanning error, where we fail to fixate on specific areas when we are looking at images. And there could be a recognition error, where we fail to detect an abnormality.
So we go over it, we basically see it, but we do not recognize it. But most problems happen as decision-making errors: almost 50% of errors come from incorrect interpretation, reading a malignant lesion or tissue as benign, or a benign one as malignant. So this is a problem that the clinical community, the research community, even the computer science community have to spend a lot of time on. Here is an article from 2016, a survey of 260 anatomic pathologists and 81 laboratory medical directors. What was interesting for me: they were asked, have you been personally involved with a minor error? 71% of the anatomic pathologists said yes. Have you been personally involved with a serious error? Almost half of them, 47%, said yes. So that's quite substantial. And of course, with disclosure we don't have much transparency; we have not looked into procedures for how we disclose and manage errors as a general framework and platform. So it's an issue. For example, when internists and surgeons were asked, in your disclosures, have you used the word "error"? 71% of internists said yes; 14% of surgeons said yes. Or the first one: would you definitely disclose an error to a patient? 65% of internists said yes; 96% of surgeons said yes. So it depends on who you are and at what end of the clinical workflow you are active; the responses and the numbers may be different. If we go back to, let's say, oncology: these are markings of the prostate gland in an MRI image by seven oncologists, and it's unbelievable how much variability there is in the end. The prostate gland is considered something really easy, with its walnut shape. If we have this type of variability in, for example, radiation oncology, then of course we will not only miss part of the tumor, the red part here, we will also impact the green part, which is healthy tissue. So variability in oncology, and specifically in radiation oncology, is definitely a major concern.
And when we talk about the delineation of regions of interest in images, even a simple case like the prostate has almost 18% variability, the bladder 32%, the abdominal aorta 40%, and you get to scary numbers for pulmonary nodules, up to 54%. So whenever you really want to delineate something on images, things get distinctly variable and different from each other. Going back to pathology, there is a large number of reports that look at observer variability. Here, 20 sections were given to four observers, and people are reporting kappa values below 0.3, which are really not the numbers we want to see in terms of agreement, though for some of the cases there appeared to be some consensus in this report. That's a relatively old one, 1996. When you look again at, for example, bone marrow pathology, at differences in the subjective evaluations of three pathologists on the left, we see how variable they are; you don't even need to calculate any numbers to appreciate the differences. But also when you look at factors established by the WHO, you can still see the variability. So it's not really about what the criteria are; the variability will be there and will be embedded anyway. Another example looked at interobserver variability for squamous and non-squamous non-small cell lung cancer. The percentage of agreement in this study was anywhere from 67% to 90%, and based on the primary analysis of the data, the differentiation of non-squamous and squamous histology ranged from 77% to 94%, so kappa values of 0.48 to 0.88. You can see similar numbers for different organs and subtypes when we're talking about cancer. Another example here is for breast carcinoma, 143 whole slide images. And if I remember correctly, this one was specifically looking at virtual microscopy, digital pathology basically.
And six pathologists looked at them. This is one of the things that has been missing, because we know variability exists in conventional microscopy, and we have high concordance in diagnosis between microscopy and digital pathology, but not many studies have asked: if I go digital, will the variability increase or decrease? Here, among the six pathologists, we had a kappa of 0.497 for grading of breast carcinoma. When we talked about the grading: for grade two we had fair agreement, for grade three moderate agreement, and for grade one good agreement. Most likely the results can be understood intuitively: when features are really pronounced, finding agreement is much easier. What is scary to me, as a non-pathologist, as a computer scientist, is the intra-observer variability. You can intuitively understand that if experts sit down coming from different corners of anatomic pathology, with different experience and eventually different specialties, they may disagree with each other. But if I have just one pathologist, the intuition of ordinary citizens like me, non-clinicians, is that that colleague, that pathologist, should always do the same thing. Of course, we know that's not the case. Just to give you one example: across three separate reviews, the agreement of each individual with himself or herself was moderate, around 50%, with kappa values from 0.33 to 0.75. Which means if you give the same case to the same pathologist, you may get different results. We know that from the literature, but it's just to point out that it's not that different pathologists disagree with each other because they have different levels of knowledge; the variability is something deeper than that.
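The kappa values quoted here and throughout are Cohen's kappa, the chance-corrected agreement between two ratings. As a minimal sketch, with hypothetical grade labels rather than data from any of the studies mentioned, it can be computed as:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two label lists."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of cases where the two ratings match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two pathologists grading the same ten cases (hypothetical labels):
a = ["G1", "G2", "G2", "G3", "G1", "G2", "G3", "G3", "G2", "G1"]
b = ["G1", "G2", "G3", "G3", "G1", "G1", "G3", "G2", "G2", "G1"]
print(round(cohens_kappa(a, b), 3))  # 0.552
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so the 0.3 to 0.75 values reported in these studies sit in the fair-to-good bands.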
Well, variability is probably the source of almost all the problems we have. And the question is, what's going on in medical imaging? We definitely have misdiagnosis, and because of that, and depending on that, we also have inaccurate and improper treatment planning. So what is the reason for variability? That's a very difficult question. Some of the reasons could be: the information is imperfect; the imaging is imperfect; the anatomy is imperfect in the sense that there is no clear boundary. The complexity of diseases, how they manifest themselves, and the shape of tissue mean that in many, many cases you cannot draw yes-and-no lines. And of course human visual perception is inherently subjective. This list is by no means a complete list of the reasons for variability. It's a problem, and some of the papers even suggest that you just make your peace with it: there is variability, accept it, but account for it and be prepared to deal with it. I'm not sure I want to accept it. I think we can do something about it. So what are the consequences of variability? From variability we get error, and if there is a wrong diagnosis, there is wrong treatment planning, or non-treatment of the patient, or side effects for the patient. It could be prolonged treatment for the patient, reduced patient throughput for the clinic and hospital, financial burdens for the healthcare system, and of course there could be legal ramifications, depending on what type of healthcare system you're talking about. Everybody is talking about precision medicine, but most of the time people mean predicting what type of treatment protocol is likely to succeed for a specific patient, depending on various patient attributes and the treatment context.
There's no question that if you don't have the right diagnosis, the projected or estimated treatment won't be correct either. They go hand in hand; I cannot really separate diagnosis and treatment as a non-clinician. Okay, what can AI do? There has been a lot of buzz about AI. We have supervised techniques, we have unsupervised techniques, we have weakly supervised techniques. Supervised techniques can be algorithmic or topological; the topological ones are the so-called deep networks. Unsupervised AI is mainly clustering and search and matching. And weakly supervised learning is about interaction: being online and interacting with human experts in order to learn. For supervised learning we need a lot of labeled data, and when we talk about labeled data, somebody gives you a diagnosis or delineates the parts of the image that are of interest. Then of course you will have the variability in it. So for any AI solution that has been trained with labeled data: who labeled the data? Did you account for observer variability, yes or no? I don't see it in the literature. With weakly supervised learning you don't have labeled data, but you give some sort of feedback, reward and punishment, to the agent to do its job. And with unsupervised learning there is no teacher, no reward and punishment; the techniques operate on raw data. You just give the images and the reports, and the software tries to figure something out. Artificial intelligence is a big field, and machine learning is a subset of it. Artificial neural networks are a smaller part of machine learning. Support vector machines, as a classifier, are part of machine learning. Decision trees and expert systems have been around for quite some time; random forests are a relatively new development of decision trees. Natural language processing (NLP) has become a big part of AI, and sophisticated systems like BERT and BioBERT have emerged in the past two years.
And older techniques like fuzzy systems and metaheuristics like evolutionary optimization have also been around. What has been quite successful in the past four or five years is deep learning, which is a small part of artificial neural networks, and this is where we hear about really impressive results. Okay, so what is the ultimate solution for observer variability? That's a tough question to ask, and nothing I say is a solution we can use tomorrow; but based on everything we know, these are directions we should go, things we should look at as potential solutions. So how can AI remove observer variability? If I have an image and I classify it, would that solve my problem? When I classify the image, most of the time I get a yes or no: is it malignant or not? Or a grading: I know what it is, I just want to grade it. And with the classification comes some sort of confidence or likelihood: I classify this as a squamous cell carcinoma with 96% confidence. Does that help me get rid of observer variability? Well, if you get all pathologists to accept this output, then yes. But is that possible? That would imply full automation, which means we take the task from the pathologist, give it to AI, and whatever AI gives us, we accept. Then of course the variability is removed, because these techniques are quite consistent in the way they do things. But how likely is it that we accept full automation? From today's perspective, not very likely, because we cannot understand these decisions, so we won't accept them. And there is not much prospect of that changing in the short term. Classifiers are the most successful deep learning techniques, the most successful AI solutions there are, and they have a lot of value. I'm not dismissing their value.
I'm just asking: beside the value that they provide, can they get rid of observer variability? I would say no, from today's perspective. We can generate fake images, synthetic images. Can that remove the variability? I'm not sure, because they create additional information that we then have to analyze. What happens if you segment the image, if you find and delineate regions automatically using AI? That's great; that's quantification. It's fundamentally the same thing as classification, because you classify pixels. But then again, it would only help to remove the variability if everybody accepted the result. And guaranteed, if those segmentation and delineation techniques have been trained on labeled data coming from a few pathologists, the likelihood is really large that the result won't be accepted as a consensus. What about search, which is my favorite? You give me an image and I give you a set of similar images. Can that solve the problem of variability? If I am looking at an image and I have a large archive of histopathology images, that means I can search in that archive. I can send my query and ask, what is this? Some sort of smart algorithm can search in that archive and send me back similar cases. We could find similar cases, and with similar cases comes metadata: the reports, the outcomes, everything comes back, and then I can look at it. This is not something new, actually, because the pathology consultations that we use are basically image search. When pathologists consult each other, fundamentally you are doing an image search in your mind, in your head, in your brain; we don't know how we do it. When we do it explicitly, it becomes more visible. If I grab one of the WHO's tumor classification blue books and I go through the pages and try to find a case similar to what I have under the microscope or on the screen, fundamentally I am doing image search in my brain.
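Done with a computer, the kind of archive search just described is, at its core, nearest-neighbor retrieval over image feature vectors. A minimal sketch follows; the feature vectors, case IDs, and diagnoses are invented placeholders, and a real system would use features from a deep network and an approximate index rather than this brute-force scan:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def search(archive, query_features, top_k=3):
    """Rank archived cases by feature similarity to the query image."""
    ranked = sorted(archive,
                    key=lambda case: cosine(case["features"], query_features),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical archive: each case holds a feature vector plus the
# metadata that "comes back" with a match (report, diagnosis, outcome).
archive = [
    {"id": "case-017", "features": [0.90, 0.10, 0.20],
     "diagnosis": "squamous cell carcinoma"},
    {"id": "case-042", "features": [0.10, 0.80, 0.30],
     "diagnosis": "adenocarcinoma"},
    {"id": "case-108", "features": [0.85, 0.20, 0.10],
     "diagnosis": "squamous cell carcinoma"},
]
query = [0.88, 0.15, 0.15]
for case in search(archive, query, top_k=2):
    print(case["id"], case["diagnosis"])
```

Because each returned case carries its metadata, the reports and outcomes travel with the match, which is what makes the retrieval useful as a consultation.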
We are doing it, but we are not doing it with computers. We know the benefit when we consult each other and when we look at an atlas. But it is cumbersome, time-consuming, not efficient. Imagine we had the possibility to search: if we could send an entire biopsy sample and say, find me another patient similar to my patient. Or if it were possible to send a small part of the tissue, when I am looking at the detail at 40x and I am really interested in the minute distinctions I am seeing. Or if I could select a region of interest in a whole slide image and send that to the search engine. Would that help? Would that be possible? If I am looking at an image and, let's say, I have some doubt, and I send it to an image search engine and ask, have you had a case like this? This is a consultation. Then the search engine goes and finds other biopsy samples that are similar to my patient, and of course the data comes with it: the reports, outcomes, everything else, and if there is any molecular data, it all comes back. The pathologist who is asking is one person, and the other information is coming from other pathologists who have already looked at other cases, evidently diagnosed cases; they are in the system, and we are looking at them. That is what we call virtual peer review. You basically tap into the knowledge and wisdom of your colleagues and yourself, in the hospital, in the clinic. And what comes back is not just a yes or no; it comes with similar cases: I have seen a case like that, and it was papillary carcinoma. Which means what?
If I have a thorny whole slide image as a pathologist and I send it to a search engine and get back multiple cases with their reports, then basically we could build a computational consensus, and the more we retrieve, the more we find, the easier it becomes to find consensus. So then the magic is: give me access to a large archive. The more I have at my disposal, the more lung cases I have, the easier it should be to find cases that resolve ambiguous squamous versus non-squamous cases like the ones causing the variability I mentioned at the beginning. We cannot really put more value on top of search unless we talk about natural language processing and the reports and all the metadata that we have. Most of the reports we have at the moment are unstructured: the pathologist just sits down and writes the report, depending on the practices of different hospitals and clinics. But we are also moving toward structured reports, synoptic reports, and if synoptic reports emerge and are widely used, that makes the job for computers, and for search, a lot easier. Natural language processing can help to categorize reports and notes. Can it auto-generate reports? Yes, it can; at the moment it is very primitive, but down the road it can become quite sophisticated, again provided that we have access to a large number of good reports, which is not a given in the short term; there are a lot of obstacles to get there. And there is conversational AI, where the pathologist can just talk to an AI agent and ask questions to clarify something. When we look at pathology reports, from simple things to sophisticated things, there is a lot we can do with them: we can extract the significant keywords, recognize topics, highlight them, and summarize the report, such that when we bring back the results we can also highlight those keywords in the corresponding reports of the matched cases, so that comparison and decision-making become easier and more efficient for the pathologist. Many people have started to not
just look at whole slide images and their annotations for training some sort of AI technique, here a visual-dictionary approach, but, as you can see on the left, also to use diagnostic reports from past cases as training data. So you put in both images and reports, such that the AI has a better chance of distinguishing cases from each other, and then you go online and use the system. Of course there is no report for the new patient; for the new patient you just get the biopsy sample, the whole slide image, and then you do whatever you need to do, for example give reasons for the prediction, in this case for prostate. Something really interesting is clinical report generation. Here is one example: on the left you see that the actual report says the nuclei are severely pleomorphic, and the first sentence in the green text below it, which is generated by the computer, also says the nuclei are severely pleomorphic. Of course it is not always that easy; the research community is quite at the beginning, I would say, mainly not because of the technology but because of the lack of access to large amounts of data; getting one million reports is not easy. Clinical reports do exist in hospitals, but to my knowledge nobody has published a paper with a large number of clinical reports, evidently diagnosed cases. But we have initial investigations that show this is possible: we can auto-generate reports, and if I have access to matched cases, we can basically auto-generate a report for the unseen case, for a new patient. So which means what?
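The computational consensus mentioned earlier can be illustrated as a simple vote over the diagnoses attached to the top retrieved matches. This is a toy sketch with hypothetical cases, not a description of any deployed system:

```python
from collections import Counter

def computational_consensus(retrieved_cases):
    """Majority vote over the diagnoses of retrieved similar cases,
    returning the winning label and its support fraction."""
    votes = Counter(case["diagnosis"] for case in retrieved_cases)
    label, count = votes.most_common(1)[0]
    return label, count / len(retrieved_cases)

# Hypothetical top-3 matches returned by an image search engine:
matches = [
    {"id": "case-017", "diagnosis": "papillary carcinoma"},
    {"id": "case-042", "diagnosis": "papillary carcinoma"},
    {"id": "case-108", "diagnosis": "follicular carcinoma"},
]
label, support = computational_consensus(matches)
print(label, round(support, 2))  # papillary carcinoma 0.67
```

The more cases retrieved, the more stable such a vote becomes, which is why access to a large archive matters.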
Which means, if you give me the query and I go and find the top 3, or top 5, top 10, top 100, similar cases, and I bring back those reports, then using natural language processing we could also put together the green report here: not only say this is papillary carcinoma, but also provide more information through a repository of NLP techniques and provide editable synoptic reports to the pathologist. Again, the pathologist has to stay in the loop; the pathologist cannot go anywhere. The pathologist has to be there, look at the data, and say, yes, okay. It could also be upon request, so that we don't bias how the pathologist is working. And if systems like this get established, then we can talk about and work toward establishing guidelines for using them. But in the short term we have to look at some cases and see what the effect of using systems like this would be, specifically matching similarities and image search, eventually reducing observer variability. So there is a long way to go, but things are quite interesting and quite exciting at this stage. So I will stop here and see whether we have any questions.

My thanks there to Hamid for his talk. A couple of questions, Hamid, have come in on the right-hand side. First of all, Aston has asked: how are the rapid changes in terminology and diagnostic techniques incorporated, NIFTP vs PTC, gliomas in general?

So you mean, how did... you mean in general?

Hamid, if you can see them yourself: if you are in the plenary stage chat, you will see the questions coming in on the right-hand side.

Let me see... okay, the chat, okay.

The question there was from Aston Powers, so it is in the chat box: how are the rapid changes in terminology and diagnostic techniques incorporated, NIFTP vs PTC, gliomas in general?

Well, I have to say generally that at the moment nobody has made it to the clinic, so that's a tough question for me, because we still have to work, I would say, one or two years to
get something substantial into the clinics before we can answer this type of question. At the moment we have a lot on the research side; many, many activities and initiatives are going on to make sure we also address user acceptance and the regulatory side of things, the FDA, and for us in Canada, Health Canada, to bring the technology in and make it available, and then we can answer questions like that. At the moment you can only point to the research results and the validations, mainly on non-clinical data, research data. You can say that it will bring a lot of changes, but what type of changes, and how, I'm really hesitant to make any prediction in that regard. We know that we can change a lot; when I say "we" I mean the community at large: pathologists, computer scientists, AI specialists, policy makers, administrators. All together we can bring about a lot of changes, but we have not done it yet. There is no major diagnostic system deployed in pathology yet, to my knowledge, that can be used on a daily basis for making clinical decisions. So we have to wait for that. Everything I say comes from the research side of things, all of it encouraging, but again we don't have hard clinical data to back things up, which I'm not worried about, but we don't have it yet.

Sure. And I realize I missed a question that was asked earlier; it was from Stanley Cohen. He asked: are you defining benign error in terms of difference from a subsequent expert or consensus diagnosis, or based on actual patient outcome?

Well, ideally, outcome would actually be the good basis. But at the moment, when we talk about research and you have to rely on reports, you have to make sure that you have evidently diagnosed cases, which again is based on outcome. I do not know of any major test or major validation that has done that, so anything we have is heavily subject to variability; anything we have is a few pathologists who have spent a lot of their time and energy and
knowledge to enable a research study looking at the potential of AI for diagnosis, which is fantastic, but it doesn't solve our problem in the long term, because if we don't base things on outcome, we will still have the variability with us, even with AI. If the images and the reports are coming from a few, or in the worst case from one pathologist, AI will learn our biases. So ideally it should be based on outcome, which is the absolute gold standard: we did this, and that was the result, and then we can come back in the chain and make any adjustment that we need.

There was also a follow-up question; we have tackled some of this, but Thomas Wesley has asked: related to the question above, in the measurement of the inter- and intra-observer disagreements, has anyone ever measured the outcome impact, e.g. errors caught somewhere else in the workflow by other clinical indicators, etc., and how would you measure it?

Tough question. Not to my knowledge, but I may not be the right person to answer; I'm sure many colleagues on the pathology side have a much more in-depth overview of the reports on observer variability. Not to my knowledge. What is also interesting: when we look at inter-observer variability, this is purely for research, because in practice we do not have that luxury; we do not have the luxury of bringing three pathologists into a room to make a diagnosis. Intra-observer variability would be much more interesting, because this is the daily practice; this is the daily work of every pathologist, alone in his or her own room, struggling and answering the question alone. So intra-observer variability is much more pervasive and a much more realistic case to look at. I don't know; no, not to my knowledge.

Okay. Lots of people are mentioning that it was a tremendously insightful talk, but there was one last question, from Zev Leifer. He asked: with regard to verbal and written natural language translation
and interpretation in AI, considering the vast differences between responses from Siri, Alexa, and Google, and inter- and intra-variability, what are your thoughts on that?

Sorry, I didn't get the question.

It was: your thoughts on verbal and written natural language translation and interpretation in AI, considering the vast differences between responses from Siri, Alexa, and Google, and again inter- and intra-variability, if you have any thoughts on that.

So generally the AI community, and I consider myself part of the AI community, does claim that NLP is probably much more advanced than what we have for processing images, for computer vision, mainly because NLP didn't face the major computational challenges that working with images does. So we have put a lot more energy into NLP, and training NLP solutions has historically been a lot easier, so progress has been made. But look at auto-captioning of natural images: when you show the computer a photo that you captured and it auto-captions it as, let's say, a dog playing in the park, okay, that's fine. But that is not a sensitive case like the renal cell carcinoma versus normal kidney that I'm looking at. And there we know that we have huge variability in the terminology, especially because we have been working with unstructured data, because pathologists come from different backgrounds and schools, and the terminologies are different, and so on. So would AI be able to incorporate that pervasive variability in the language in its learning, such that everything we want to do with auto-captioning, auto-generation of reports, conversational AI, becomes possible? Theoretically, I like to think yes. Practically, I see a huge burden and challenge: just as with images, at the moment we do not have a large and diverse enough archive of documents and reports for the AI to learn the diversity from. Can AI learn the diversity? Yes, it can, and we know that we can counteract the bias. The bias comes when everything comes from a few samples, a
few pathologists, a few clinics and hospitals. And the initiatives are missing: initiatives where major hospitals, multiple hospitals, maybe supported by governmental agencies, get together and create a large enough and diverse enough dataset to enable that. As long as we don't have that, we won't be able to exploit AI's potential to counteract observer variability, both in text and in the final diagnosis for images.

Excellent. Well, Hamid, that is all the questions, and I think that was an absolutely fantastic talk to kick us off for the second day of the conference. So my thanks once again for coming on live and giving this talk.

Thank you.