Okay, I think there are quite a few of you now, so I'll get started. Again, thanks for coming. My name is Karthik. I'm an AI engineer based in Singapore, and my day-to-day job is to lead projects that cater to industry challenges in areas like finance, language processing, security, and healthcare, which is a key interest of mine. I did my Master of Engineering with the Biomedical Engineering department at NUS in Singapore, and my research was based around the use of AI for histopathology, a subfield of medical imaging. Histopathology is where we examine and segment tissue taken from procedures like cancer resections in order to understand the components of a disease. I didn't have much to show about my research until, I think, just two days ago, when it was finally published. If you're interested in the work I was doing, this was about three years of work; I'm putting the link in the chat. It wasn't the most pleasant of experiences, but it taught me a lot about the way AI works, how to use it properly, and what its applications in medicine are.

Coming from a background in histopathology, you'll find that a lot of my examples are heavily based around microscopic imaging. That doesn't mean these techniques can't be adapted to other forms of imaging, whether X-ray, CT, or MRI. If you're interested in those techniques, there are other Science Circle members working on them; I think Sumo previously presented on magnetic resonance imaging (MRI), which is used to study, for example, diseases of the brain and neurological function, and there are a lot of applications for AI in that sector.

The thing about me, as an AI engineer, is that I revisit the fundamentals a lot; everything I learn in the field brings me back to some fundamental topic. I typically try to structure these presentations so that we go from something simple to something more complex, which also helps me make sure my own grounding is well established. So let's look at a very simple example, figure out how AI is applied there, and then work our way toward the field of computer vision. This example is something I presented in a previous lecture on deep learning; if it feels like a repeat, I apologize, and I just hope it adds some value to what was presented before.

So let's talk about the concept of a neural network. A neural network is loosely based on the functionality of your brain, on what helps you make your decisions. These architectures are built to draw on the mathematical properties of data, to find relationships, and to build up a classification.
Here's the example: predicting the probability of a house being bought. We have a house, and we want to know the chances it gets purchased. There are a few factors we could consider; two of them might be the number of rooms (let's take price out of the equation; more rooms could make the house more attractive) and the distance to the bus interchange, since nobody really likes a long commute. These two factors could very well influence the probability that the house gets bought: its attractiveness.

A neural network correlates these variables by multiplying them by numbers called weights, and these weights represent the contribution of each variable to an outcome. So x1, the distance from the bus interchange, is multiplied by weight w1, and that is added to x2, the number of rooms, multiplied by weight w2. The weights are a guess at first, of course. That sum is added to a bias, which you can think of as a term used to shift and normalize the result, and the whole thing is used to estimate the probability that someone buys the house.

Of course, this doesn't mean anything until the network is trained. The network needs examples. Maybe we ran a survey: for this house the distance from the bus interchange is 200 metres, for another it's 100 metres; one has three rooms, another has four. Do you want to buy the house or not? If no, that's a 0; if yes, that's a 1. That's our ground truth, our annotated information.

The first time through, the network makes a guess about the weights: it randomizes them, adds the bias, and passes the sum into a function f called the activation function. The activation function turns the output of the network into a probability between 0 and 1: true or false, bought or not bought. That prediction gets compared against the ground truth. If, across these examples of x1 and x2, the network got the output wrong, it calculates an error. This error is also called a loss, and the loss is used to update the weights w1 and w2.
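To make this concrete, here is a minimal sketch in Python of the single neuron just described: a weighted sum plus a bias, passed through a sigmoid activation, with the weights updated from the error. The survey numbers are invented purely for illustration.

```python
import numpy as np

# Toy "survey" data (made-up numbers for illustration):
# x1 = distance to the bus interchange in metres, x2 = number of rooms.
X = np.array([[200.0, 3.0],
              [100.0, 4.0],
              [400.0, 2.0],
              [ 50.0, 5.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])   # ground truth: bought (1) or not (0)

# Scale the inputs so neither feature dominates the other.
X = (X - X.mean(axis=0)) / X.std(axis=0)

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # the initial weights are just a guess
b = 0.0                  # the bias term

def sigmoid(z):
    # The activation function: squashes the weighted sum into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(1000):
    p = sigmoid(X @ w + b)           # forward pass: f(x1*w1 + x2*w2 + b)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))  # cross-entropy
    grad = p - y                     # error signal at the output
    w -= lr * (X.T @ grad) / len(y)  # update the weights from the error
    b -= lr * grad.mean()

print(sigmoid(X @ w + b))  # predicted probabilities after training
```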
So, in that sense, what the network is actually doing is learning a set of features that contribute to a final outcome: it relates pieces of data to each other and uses those relationships to produce a prediction. And imagine that this segment, predicting whether the house gets bought, could be just a small part of a bigger problem: whether the house is attractive could in turn determine whether it appears in something like a magazine. That's how neural network architectures work: they apply multiplications to variables and come up with an outcome.

To continue with the architecture: the input layer is where the variables go, the output layer is where the prediction comes from, and the hidden layers are whatever we program in between. The thing about a neural network is that it has no semantic idea about the properties involved. It doesn't know what a house is, what distance from a bus interchange means, or what a number of rooms is. You, as a human, have an understanding of those variables, but to the computer everything is literally just mathematics and statistics. It makes guesses, updates them based on the error, and tries to work out how the components of the data are related to each other. The structure of the architecture itself, though, is something we define; computers don't have the capability of deciding that on their own, so there is still human intervention involved at this stage.

Now, in the field of computer vision, data isn't really one-dimensional. You're not going to have a single dimension or rows of data; what you have is a two-dimensional set (it could be 3D, but we'll keep it simple for now) made up of individual pixels. If we look really closely into an image, we can see these pixels and the values they contain. In an RGB image, each pixel has three values: a red, a green, and a blue value. We call these the red, green, and blue channels; that's one of the terms we use when describing a color image, which we break down into different channels. If you've used a printer before, you don't see RGB; you see cyan, magenta, yellow, and black (CMYK), so there are different color representations. In computer vision, though, it's mostly just red, green, and blue.
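As a quick sketch, this is what those channels look like in code; the filename `house.jpg` is just a placeholder for any RGB image you have on disk.

```python
import numpy as np
from PIL import Image  # pip install pillow

img = np.array(Image.open("house.jpg").convert("RGB"))  # placeholder filename
print(img.shape)       # (height, width, 3): one value per channel per pixel

red, green, blue = img[..., 0], img[..., 1], img[..., 2]
print(red.min(), red.max())   # each channel holds uint8 values in 0..255

# A simple grayscale conversion: the standard luminance-weighted average.
gray = (0.299 * red + 0.587 * green + 0.114 * blue).astype(np.uint8)
```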
There are also grayscale representations, which are very simple. The typical range of values in computer vision is 0 to 255. Why 0 to 255? Because at that level of precision, the human eye can't distinguish the differences between neighbouring values: if you look at a gradient bar from 0 to 255, it looks smooth rather than grainy. There are also uint16 and 32-bit representations, but those are a bit large for our computers to process, so we won't really talk about them.

So let's talk about diagnosing an image. If we just studied the individual colors, if we broke an image down into its red, green, and blue components, that already tells us some level of a story about the image: how many red pixels there are, how many green, and what the intensity of those pixels is in the context of the image itself. The power of that is we can use it to determine some property of the image; we can actually use it for image classification.

Take differentiating cancer from non-cancer. Say a surgeon takes a tissue sample while trying to remove a tumor from a patient's body, and that sample is passed on for microscopic evaluation. If it's not cancer, it looks a bit like the image on the top left: not many cells in the tissue. If it's cancer, it looks like the tissue just next to that, the one labelled "tumor core" because it's taken from the central part of the tumor. The concentration of cells is so much higher that the image looks a lot darker, so studying the grayscale representation alone effectively tells us this is a tumor. The problem is that healthy tissue can also look very cellular, by virtue of things like reactivity, meaning the way the brain responds to a disturbance: if you get an injury, there's a very high chance your cells will reproduce to make up for the loss of neurological function. Cells in parts of the cerebellum, which is also a part of your brain, look quite granular too, with quite a high concentration. So it's not enough to use colors alone to make that sort of classification.

What we need is morphology, meaning shape, and that's captured through a mathematical operation called a convolution. In a convolution, a filter, maybe three by three or five by five pixels in size, is passed over the image to produce a new representation of the image. I typically go through the formal definition with engineering students, but I don't think it really matters here; what really matters is the effect of these convolutions on the image.

So take an image of a zebra and an image of a horse, which we want to differentiate, converted to grayscale to make them easier to visualize. When I apply a filter called a Sobel edge-detection filter, the resulting images highlight the edges: in the zebra, the stripes lining the fur and the surface stand out far more strongly than anything in the horse, where those features are not so distinctly represented. That alone allows us to make the classification: we've identified some morphological, shape-based, or textural aspect of the image and then classified on it.
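Here is a small sketch of that operation: a convolution written directly in numpy with the standard 3x3 Sobel kernels. The random array stands in for a real grayscale image.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small filter over a grayscale image (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# The 3x3 Sobel kernels: one responds to vertical edges, one to horizontal.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

gray = np.random.rand(64, 64)   # stand-in for the grayscale zebra image
gx = convolve2d(gray, sobel_x)
gy = convolve2d(gray, sobel_y)
edges = np.hypot(gx, gy)        # edge magnitude: stripes light up strongly
```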
In computer vision, if we think about it, that use of a filter, a small feature map passed over the image to derive some feature about it, is almost the same as multiplying a weight by a variable to estimate an outcome, as we did with the house. And those filters, those sets of numbers in the three-by-three grid, are something the computer can guess as well, in order to pick up the textural, edge, and shape components of the image. That is how a convolutional neural network (CNN) works. When we talk about computer vision in the field of self-driving cars, for example, these are the base principles that allow those networks to detect objects in the field.

A CNN is essentially an expansion of those steps: it tries to identify the filters, it guesses them, it makes some sort of classification of the image, and then it goes back, updates the layers, and trains itself to extract better features in order to classify better. The power of this technique is that, as we stack these convolutions and update the weights based on the error, the low-level features translate into high-level features within the image. What do I mean by this? The lines, dots, and edges will eventually tell us about the existence of things like wheels, wings, a sail, a nose, an ear, an eye, a fin. They allow the network to identify salient features within the images that can be used to make a classification, and all of it comes from mathematical operations that extract those low-level features: the lines, the dots, and the edges. That's the power of the mathematics that goes on in computer vision systems.
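As a sketch of what that looks like in practice, here is a minimal CNN in PyTorch. The filter values start out random and are learned from the loss, exactly like the weights in the house example; the two-class output is an assumption, standing in for something like zebra versus horse.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # low-level: edges, dots
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), # higher-level: textures, parts
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):                    # x: (batch, 3, 64, 64)
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TinyCNN()
logits = model(torch.randn(1, 3, 64, 64))   # e.g. zebra (0) vs horse (1)
```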
So how do we improve on those systems, and how do we apply them when we have very little data to train on? Sometimes the problem is a dataset being too small to extract these features easily, and one of the biggest problems in medical imaging is that, because of the policies involved in obtaining datasets, it's very hard to get enough data to train a model to identify and extract these features.

In that case we apply certain techniques. We cheat the model into thinking images are different, using something called augmentation. Augmentation simply means using the same image, but in a different representation, to extract more features. We can adjust the orientation of the image, flip it along the horizontal or vertical axis, or adjust the contrast and color of the image. Say a cameraman is moving closer towards an object as he takes a photo: we can apply an artificial zoom as an augmentation so the model learns the features that are important for identifying this object, this parrot, at different scales. In the context of medical imaging, and microscopy especially, different labs that produce these microscopic images apply stains and chemical treatments differently; even the temperature of the room the lab operates in can differ from day to day or hour to hour, and all of that can change the contrast or color features of the image. To account for that, we can apply an artificial color augmentation, which forces the model not to learn from the color representation of the image alone, but to pick up the shape representations and textural features that allow it to make a diagnosis.
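A typical pipeline of such augmentations, sketched with torchvision's transforms; the exact parameter values are arbitrary choices for illustration.

```python
from torchvision import transforms

# One image in, a different-looking image out every time it is sampled.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),                    # mirror along one axis
    transforms.RandomVerticalFlip(),                      # ...or the other
    transforms.RandomRotation(degrees=15),                # small orientation changes
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),  # simulate moving closer
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.3, hue=0.05),     # stain/color variation
    transforms.ToTensor(),
])
# augmented = augment(pil_image)  # applied to a PIL image at training time
```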
Let's say we want to improve the model further and we still don't have enough images. One method for dealing with that, and one that's really popular in medical imaging, is transfer learning. Transfer learning is where we apply representations learned from another dataset of images. Think about looking at clouds in the sky: we like to say, oh, I see an elephant in the sky, a tiger, a cat. We take some representation we've already learned, something we have in memory, and use it to make an association with another object we see. Transfer learning essentially works the same way. A network learns on a multitude of images, anything from cats versus dogs versus cows to distinguishing classes of vehicles, and these public datasets are available in massive quantities on the internet. We can train models on those datasets, and we can then take the weights, the lines, the dots, the textures, all the features learned from those images, and apply them to other problems. This is especially useful in medical imaging: when we don't have a large dataset, we can reuse these learned features to diagnose medical images. It's a powerful and commonly used technique in AI.

And here's the thing: most of us don't have a large GPU at home, let alone multiple graphics processing units, or computers powerful enough to train these models on maybe 14 million images. But we can use models that were trained by companies like Google or Facebook (Meta, these days). The models trained on those large datasets, the models that have already learned those lines, dots, and textures, are also available on public websites and GitHub repositories. That's the powerful thing, and what I really love about the deep learning community: a lot of AI researchers publish their models publicly for anyone to use in their projects, and it doesn't really matter whether the project is commercial or non-commercial. The accessibility of AI is really fantastic, and that accessibility has allowed image analysis in the medical domain to really flourish. A lot of models produced for areas like gastric biopsies, brain pathology analysis, or MRI imaging were based on these models built by Google and Meta; maybe they were adapted in one way or another, but the principle is the same: learn first, then fine-tune on new data.
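A minimal fine-tuning sketch with torchvision, assuming a hypothetical two-class problem such as tumor versus healthy: freeze the pretrained backbone and train only a new classification head.

```python
import torch.nn as nn
from torchvision import models

# Start from weights learned on a large natural-image dataset (ImageNet)
# and fine-tune only a new classification head on a small medical dataset.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():
    param.requires_grad = False            # freeze the learned edges/textures

model.fc = nn.Linear(model.fc.in_features, 2)  # new head: e.g. tumor vs healthy
# Only model.fc is updated during training; everything else is transferred.
```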
A network doesn't have to be used for classification, of course; it can be used to derive a new representation of the data. The idea is that we feed in inputs, which in the one-dimensional case could again be things like the price of a house, the number of rooms, the distance from the train station, but the output is not really a classification: the output is the same data. The goal is to derive a smaller representation. For those of you who work in mathematics, you're probably familiar with the concept of PCA, principal component analysis, where we take data from a high-dimensional feature space down to a smaller-dimensional feature space and then try to reproduce the data, so that we get a compact representation that keeps the most important features of the dataset. If you don't follow that, it's okay, because what's really important is where this is useful. By learning a smaller representation of an image, we actually derive the most important features of the image, and we can then choose to decode that information to provide a new representation.

And how is that output representation defined? It's derived from our annotation. Let's say I have a few X-ray images: I manually draw a mask on some of them to identify the lungs within the image, then pass them to the model and say, for this particular X-ray image, I want this output, this representation. The model will learn to derive the important features from the image and decode those features to produce that representation. How does it learn that? From the error: the first few times it's making a guess; the next few times it's updating its weights, fine-tuning itself to fit your data, to make sure those representations are learned accurately.
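Here is a minimal convolutional autoencoder sketch in PyTorch, where the training target is the input itself. For segmentation, the same encoder-decoder shape would be trained with the hand-drawn mask as the target instead of the image.

```python
import torch
import torch.nn as nn

# The bottleneck is forced to keep only the most important features,
# much like PCA, but learned and non-linear.
class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
            nn.Conv2d(16, 4, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(4, 16, 2, stride=2), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(), # 32 -> 64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.rand(1, 1, 64, 64)                 # e.g. a grayscale X-ray patch
loss = nn.functional.mse_loss(model(x), x)   # error between output and input
```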
The usefulness of this across domains is that we can use those representations to understand some underlying diagnosis. During COVID-19, this sort of research on segmentation of the lungs was really popular, because it helped us understand whether a patient's lungs were expanded or contracted, whether there was some pneumonia-like property to the lungs, whether there was risk. In my own area of interest, brain tumor analysis, segmentation of the tumor for resection, removing the tumor accurately, is heavily discussed. Why is it so important in the case of the brain? If you don't delineate the boundaries of the tumor correctly before surgery, if the surgeon cuts too much tissue beyond the detected tumor, the patient is going to lose neurological function: they might go blind, lose some hearing, or lose some motor skills. That's the danger of not identifying the location of these tumors well enough. Segmentation allows us to identify the boundaries of the tumor with great precision, with the ability to pick up features mathematically that pathologists are not able to identify, because they can't examine the pixels closely enough. By doing that segmentation, people are able to better identify resection margins for tumors: the locations where they should cut to make sure the tumor doesn't come back after the patient has undergone surgery, while making sure healthy brain tissue isn't accidentally removed.

One other area of learning that's really interesting for these images is self-supervised learning. Self-supervised learning is a technique that's used, again, when there's not enough data to train these models. Think about an X-ray image, or the image of the eye on the left, which I believe is OCT, optical coherence tomography. For these images, X-rays particularly, there is a certain orientation that's correct: the patient stands with an upright posture, so any tilting of the image is in some sense wrong. What we can do is artificially augment the image by rotating it, and then tell the model to predict the rotation: how far is this image actually rotated from its original ground truth? When the model makes these predictions, we get errors, and those errors are used to update the weights. Notice that we haven't introduced any classification, any annotation from clinicians whatsoever. Just by rotating the image and telling the model, "you got this wrong, update your weights to fit the data," the model becomes capable of identifying and learning important parts of the image: an X-ray is supposed to have these features in this location, so to speak. These models can then be used for classification tasks later on, because they've already learned some features, some textures, some low-level representations that can be adapted to tasks in the future.
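A sketch of that rotation pretext task; note that the labels come for free from the augmentation itself, with no clinician involved.

```python
import torch

def make_rotation_batch(images):
    """Self-supervised batch: rotate each image by 0/90/180/270 degrees.
    The 'annotation' (which rotation was applied) costs nothing to produce."""
    rotated, labels = [], []
    for img in images:                        # img: (channels, H, W)
        k = torch.randint(0, 4, (1,)).item()  # number of 90-degree turns
        rotated.append(torch.rot90(img, k, dims=(1, 2)))
        labels.append(k)
    return torch.stack(rotated), torch.tensor(labels)

# Any image classifier with 4 output classes can be trained on this task;
# the features it learns can then transfer to real diagnostic tasks.
images = torch.rand(8, 1, 64, 64)             # e.g. unlabeled X-ray patches
x, y = make_rotation_batch(images)
# loss = torch.nn.functional.cross_entropy(model(x), y)
```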
Another really important use of AI is generative AI. ChatGPT is a form of generative AI; Stable Diffusion is a form of generative AI. There is an input, and the model is trying to generate a fake image. When this was done early on, the models used for it were called GANs: generative adversarial networks. Two networks actively communicating with each other and correcting each other is exactly how a GAN works. You have a generator, which takes some random data, noise, and applies a series of mathematical transformations to try to generate a fake image of something, in this case a picture of a dog. Then you have a network called the discriminator. The discriminator sees this fake image, and it also sees real-world images, and its objective is to determine whether an image is real or fake. These two networks are essentially playing a game with each other: one is trying to distinguish fake from real, and the other is trying to generate images that trick its opponent into thinking they're real. Both networks are updated based on whether they got the outcome right, on whether the generator was able to trick the discriminator, yes or no. By playing this game and improving from those outcomes, the models get stronger at their individual tasks. The most important thing is that the generator you end up with is capable of generating fake images, fake features, and that's the sort of idea that fed into the creation of models like Stable Diffusion.
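In code, the adversarial game reduces to two networks and two losses. This is only a sketch: the architectures are toy-sized, and the random tensor stands in for real training images.

```python
import torch
import torch.nn as nn

# Minimal GAN pieces for flattened 28x28 images.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())        # noise -> fake image
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())       # image -> P(real)

bce = nn.BCELoss()
z = torch.randn(16, 64)                  # random noise as the generator's input
fake = G(z)
real = torch.rand(16, 784) * 2 - 1       # stand-in for real images in [-1, 1]

# Discriminator: say "real" for real images, "fake" for generated ones.
d_loss = (bce(D(real), torch.ones(16, 1)) +
          bce(D(fake.detach()), torch.zeros(16, 1)))
# Generator: fool the discriminator into saying "real".
g_loss = bce(D(fake), torch.ones(16, 1))
# Each network is updated on its own loss, so they improve by competing.
```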
One really interesting use of this that can be applied to the medical imaging domain is style transfer. Style transfer essentially uses GANs to take a real image and apply a transformation to make it look like another image, in another style. Say you wanted to see what your real-world photographs would look like through the eyes of van Gogh: you would take a number of your own photos, along with scans of van Gogh's pieces that capture his distinct artistic style, and train a model to apply that style to your images. Some of the simplest examples: you have pictures of zebras and you want them to appear as horses instead, so you apply a style transfer to make them look like horses. The powerful representations these models learn can then be applied in the medical imaging domain.

Previously, you had to have matched images: to apply style transfer from zebra to horse, your training images would have to consist of zebras and horses in the same position and orientation, so the model could learn the representations by performing some sort of matching. But with newer models it's possible to do this with images of zebras and horses in any positions: three zebras in one image and one horse in another, two horses here and three zebras there, standing at different locations. The model is still capable of learning what makes a zebra a zebra and passing that style onto images of horses.

The area where this gets interesting is, again, microscopic imaging, my domain. Take the samples used during surgery, for instance to give the surgeon an idea of whether he's cutting tumor or healthy tissue. The samples are taken during surgery, brought to the lab, and stained, but to provide a very quick answer they are frozen, and when they're frozen, ice crystals start to form. These are not samples that go into the archive; they're used during surgery and thrown away. Samples that are stained for archival analysis, for pathologists to look at maybe the next day, use a specific method of paraffin embedding to preserve their quality. If you look at the four pictures at the top left: the one at the extreme left is frozen, and the one just next to it shows the quality of an unfrozen image. You can see the ice crystals have almost destroyed the frozen image, to such an extent that it's very hard for pathologists to make a diagnosis from it. So what if we could learn a style from the paraffin-embedded images, preserved with such quality that diagnosis is easy, and artificially transfer that quality to the frozen sections? That's the power of style transfer in the medical imaging domain.

This is one example of something I tried personally for my master's thesis. You can see that style transfer essentially allows us to perform augmentation of the images in a way we weren't able to before. The example on the left is Macenko normalization, the standard, typical method, and all it provides is a color transfer. On the right, if you zoom in, you can see that the sharpness of the individual nuclei, the individual cells, has actually improved to such an extent that it's easier to diagnose, easier to look at the image, easier to see its important features. And this was done with AI.

Another area where this can be used is new medical imaging, optical bioimaging rather. Right now the traditional method is to take a biopsy, cut a piece of tissue from the patient, apply a stain, and analyze it with the microscope. But clinicians today are exploring techniques using advanced laser systems to study the chemical makeup of the tissue; one example is stimulated Raman spectroscopy. That involves directing two laser beams with different timings at the tissue and measuring the light that comes back, which tells you about the chemical composition of the tissue: the lipids, the proteins. These images actually look a bit like the ones produced by staining, but not similar enough to be diagnosed; it isn't what a clinician who works in the pathology department is used to seeing. So what if we could make their jobs easier? That's where style transfer comes in: we learn features from the stained images, and then we pass that appearance of the staining onto the optical bioimaging output. That's another popular imaging domain, and it's where much of the future of medical imaging lies at the moment.

Another area is analyzing the genetic makeup of tissue, for which we use immunostaining: we apply a stain that identifies the genetic properties of the tissue. In this case it's Ki-67, a marker used in brain tumor analysis; where its expression is very high in a particular area, the stain shows up brown. These stains are actually quite expensive, a Ki-67 stain is not cheap, and because of the heterogeneity in tumors, you could have many genes in many places. So one thing people are studying is whether the shape properties of the cells alone correlate with some genetic property of the tissue. In that case you'd only have to use a hematoxylin and eosin stain, an H&E stain, which is quite cheap. So another area of style transfer being explored is that correlation: the correlation between genetic properties and the shape properties observed in the tissue.
This is one really advanced area of imaging, and it's where imaging and computer vision are going. The example I like to use for this is actually a bit localized; what do I mean by that? I basically use an example that's quite prevalent in Singapore. In Singapore, when kids are twelve, they go for their Primary School Leaving Examinations, and if they're taking English, the exam includes comprehension and essay writing, and another part is an oral examination. During the oral examination, they're given a picture like this and asked to simply talk about it to the examiner. Their eyes go across the image, they study the different things going on in it, and they explain: this is what's going on, I think this is why that's happening, I noticed this over here. They form those insights, and their ability to communicate is then assessed. One of the things people have been asking is whether computer vision models are capable of doing that, because this image is not something you can study as a whole: there are various activities, very different things, occurring in different parts of the image. Is a model able to identify those parts, in its individual patches, for example? And the other question is whether a model is capable of understanding the meaning of these things, or their relationships.

That brings us to natural language processing, a domain of AI that's heavily influencing computer vision; your ChatGPT makes use of this quite a lot. This idea was introduced quite recently, in the past five years or so: it's called a transformer, and it's based on a concept called attention. Attention tells us how different elements of a piece of data are correlated. That data could be a sequence of letters or words, a sentence for example. So "Hi, I am a short sentence" is a sentence, or rather a sequence of tokens: "Hi", "I", "am", "a", "short", "sentence", and so on, which then forms a numerical representation that's processed by the model. Could we understand how these tokens relate to the other components of the sentence, the other words? Look at the phrases in the bottom right-hand corner: "a plane banks", "a grassy bank", "the Bank of England". They use the same word, the same token, but they mean very different things in the contexts where they're used. "A plane banks" belongs to the context of aircraft, of airplanes, maybe of airports; "the Bank of England" is a financial context. So could we map these contexts, understand the use of the word in the sentence? By understanding that, it becomes very possible to do things like classification or prediction.
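At its core, attention is just a few matrix operations. Here is a minimal self-attention sketch in numpy, with random vectors standing in for learned token embeddings.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each token's output is a weighted mix
    of all tokens' values, weighted by how related the tokens are."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise relatedness
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax rows
    return weights @ V, weights

# 6 tokens ("Hi", "I", "am", "a", "short", "sentence") in 8 dimensions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
out, w = attention(tokens, tokens, tokens)   # self-attention
print(w[0])   # how strongly the first token attends to every other token
```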
For example, we could predict the next word in the sequence: what should follow after "Hi, I am a short sentence"? Maybe something like "Nice to meet you." And that's essentially what ChatGPT is doing. You take your input sentence, the stuff you put into ChatGPT, and it draws these relationships; it understands the context of the words through a transformer, through the concept of attention. Through attention it actually identifies the important words in your paragraph, it learns to ignore the unnecessary parts, and it provides a prediction as output. That prediction, to you, looks like a very nicely composed response to your question. It looks like an answer, but it's actually predicting the next sentence: what does the user want from this input?

The same idea can be applied to images. If we think about this feature space where relationships are drawn, it's very much possible to put in images and correlate them with words as well. A picture of an aeroplane could very well go into the top right-hand corner, next to the aviation words; a picture of a bank could go into the finance group. A transformer actually allows us to do this: it takes the image, breaks it down into individual patches, and processes them just as it would a sentence. It's treating an image like a sentence, like a story. The paper for this is actually called "An Image is Worth 16x16 Words"; it's a very influential paper that introduced the idea of treating images like sequences, like sentences. This vision transformer paper (the citation is below if you want to take a look later) projects those patches and learns the relationships between them.
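The patch step itself is simple; here is a sketch of cutting an image into 16x16 patches in PyTorch, which is what the paper's title refers to.

```python
import torch

# Cut a 224x224 image into 16x16 patches, flatten each patch,
# and treat the result as a sequence of tokens ("words").
img = torch.rand(3, 224, 224)                  # (channels, H, W)
p = 16
patches = img.unfold(1, p, p).unfold(2, p, p)  # (3, 14, 14, 16, 16)
patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, 3 * p * p)
print(patches.shape)                           # (196, 768): 196 "words" of length 768

# A learned linear layer then projects each 768-dim patch into the
# transformer's embedding space, just as word embeddings do for sentences.
```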
Of course, the way the model is trained again depends on some sort of output: was this an image of a bird, a ball, a car? By predicting the class, it's able to learn the relationships, the mathematical weights it should apply to the data in order to perform these associations.

The usefulness of this in the context of medical imaging has to do with attention. Attention allows the computer to look closely at some areas and to ignore areas of the image that aren't really relevant to the classification. Take the golden retriever example: the dog is actually a really small part of the image, which was taken from inside a car. If you look at that image closely, the car, the seats, all those features are things you don't want to pay attention to; all you care about is the dog. If we want to make that classification, the rest of the image could get in the way of the model reaching the right output. That's why we apply attention: it helps the model work out what's unimportant in the image and focus on the features that matter for classification. In the area of imaging, whenever you talk to a clinician, they will always tell you: there are certain parts of the image I ignore, I'm not really interested in, because they will confound my diagnosis, they will confuse me. Muscle tissue, white space, maybe an ink stain on the slide, maybe an artifact like a scratch introduced during surgery. We can actually train transformer models to ignore those parts of the image, just as we can train them to ignore parts of a sentence that mean nothing to its context.

But one level up, and this is where medical imaging is going in the future, are the outputs from different transformer models. Remember how I said we could put a picture of a bank into a feature space alongside words like finance, bank, interest, and correlate a picture of a plane with words like airport? That embedding, that shared feature space, allows us to do a lot with medical images. The previous style of annotation, where pathologists draw lines around an image to indicate where a tumor is, isn't really natural for clinicians. It's more natural for them to write about the diagnosis: what did I see in the images? "I saw cancer; the cells look very concentrated," as a very simple example. The written examples of the diagnoses are on the top left-hand corner of the images. That association between images and text allows us to train models that are capable of outputting these diagnoses.
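As a toy sketch of that shared image-text feature space: once both modalities are embedded in the same space, finding the matching report becomes a nearest-neighbour search. The random vectors here stand in for the outputs of trained image and text encoders.

```python
import numpy as np

# Random stand-ins for trained encoder outputs: one image embedding and
# three candidate report embeddings in the same 128-dimensional space.
rng = np.random.default_rng(0)
image_embedding = rng.normal(size=128)
report_embeddings = rng.normal(size=(3, 128))   # e.g. three candidate diagnoses

def cosine(a, b):
    # Cosine similarity: how aligned two embeddings are, regardless of length.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = [cosine(image_embedding, r) for r in report_embeddings]
print(int(np.argmax(scores)))   # index of the best-matching text report
```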
That means if I put an image in, the model is capable of giving back a text representation of the diagnosis: a sentence that explains what the diagnosis is. One example where this is being used is radiological imaging, and it's really remarkable: a model learns from text diagnoses, then looks at new images and provides a text output of a diagnosis. This is called image captioning, and it's a really interesting example of image captioning being used in medical imaging. The same kind of correlation allows us to take images from different modalities, MRI, CT, histopathology, put them together, and perform a more holistic diagnosis of diseases. And that developing understanding of diseases will definitely help us improve diagnosis for patients.

So, from that base understanding of where neural networks come from, the mathematical representation and manipulation of images and data in order to reach a diagnosis, and the learning of those multiplications, those variables, those weights that contribute to some feature representation of images and of text: those things are really powerful, and not just for performing diagnosis in the medical imaging domain. They also help us obtain a numerical representation and understanding of disease. What is the heterogeneity of the disease? How does this gene correlate with a patient's chance of getting cancer? How does this gene help us understand whether this drug will be useful for the patient in reducing the tumor's size? By making these correlations, I think medical imaging really has a chance to influence the development of drugs for disease, of treatments like radiotherapy, and of customized diagnosis for patients. I really think that AI, as a powerful mathematical and statistical method, is capable of helping us improve outcomes for patients in the medical imaging domain.

And that's really all there is to it; that's why I'm interested in it. I hope this presentation has given you some insight into what these techniques are, where they're useful, where they're going, and what the considerations involved are. Thank you very much.