Good morning everybody, and thanks very much for joining us. My name is Matt Fry from the UK Centre for Ecology & Hydrology, and I'm hosting this, the sixth webinar in our AI for Environmental Science series, which is supported by NERC's Constructing a Digital Environment programme. This is a NERC programme that has been running since 2019, with the aim of envisaging and developing approaches to creating the future digital environment: exploiting advancing technology and increasingly diverse datasets to improve our understanding and management of the environment. We had a very successful conference this week, the NERC Digital Gathering 2023, which some of you were able to make, and I believe some of the outputs and talks will be loaded onto YouTube in the coming weeks. Part of the programme has been a set of funded projects and a range of other activities, including the conference, but we have also run these webinar series, which have covered a diverse range of areas across the digital environment. This is the seventh webinar series that the programme has run. The format is that we invite a presentation from a leading expert in the field, and then we have a chance for Q&A at the end. You can see all the other fantastic talks online on the YouTube channel; I'm going to post the link, so please go and subscribe and look at the videos of the seminars we've had in the past. As I said, this webinar series focuses on AI in environmental science: the development, use and application of artificial intelligence techniques in environmental science. AI tools are enabling new analytical value to be delivered from existing sources of data, as well as providing powerful tools for gathering new data. The series has covered activities across lots of different areas.
I'm very excited to say that today's presentation is from Professor Ian Styles of Queen's University Belfast and the Alan Turing Institute, who is going to be talking about AI for biological imaging sensing. It's quite a timely talk, as there has been a call out from NERC this week titled "Automating image analysis for biodiversity monitoring", which is relevant to a couple of the talks we've had in the series so far; we'll post the link, so please do have a look at that. Ian is Professor of Data Science at Queen's University Belfast. He was previously at the University of Birmingham, where he was founding director of the Institute for Interdisciplinary Data Science and AI, the Turing University Lead, and PI and director of Baskerville, the EPSRC-funded Tier 2 high-performance computing facility. His principal research interest is imaging and image analysis, with a particular focus on applications in biology and medicine, and he works across a wide range of imaging modalities to develop new techniques for understanding, extracting and summarising the content of content-rich, high-dimensional image data. Ian is going to give a talk; please post questions in the Q&A section rather than the chat, and we'll field these at the end, when we'll have a chance to ask some questions. I'll just check we've got the recording running, and then I'll hand over to you, Ian. Thanks very much.

Thank you very much, Matt, and thank you to everyone for inviting me to be part of this seminar series. It's a pleasure to be speaking to a different audience from the one I usually talk to. As Matt said in his introduction, my interests are in imaging, particularly in biology and medicine, and I'm going to talk about some techniques we've been developing in that context.
And I can nod towards how I think those techniques might be relevant to the environmental sciences. There is one very explicit example I'll give where these techniques have already been applied in that domain, and then I'll speculate a little on how the other technique I'll talk about might be useful. Let me just get the right focus. A number of acknowledgments are due to the various people who've contributed to this work; there are, of course, a large number of PhD students and postdocs here, who do the actual work on these projects. I'll highlight particularly Jeremy Pike, who did most of the work on the first project I'll talk about, and Sam Tonks, who did the bulk of the work on the second. Of course, praise should go to them, and criticism should come to me. Matt already talked briefly about my bio, and I won't add much more, except to say that I've recently joined Queen's University Belfast from the University of Birmingham. So if any of you have come across me before, that move is fairly recent. I'm currently enjoying the slightly different environment of Northern Ireland as compared to the West Midlands, and if any of you are doing weather research, I can say with some certainty that we get all sorts of weather up here in Northern Ireland, all on the same day and often within the same hour. So it's an interesting new place to be, but that's not what we're here to talk about today. As I said, I've done most of my work in the biological and medical sciences, but there's an awful lot in common between those areas and the environmental sciences; some of the techniques that are used are very similar.
And if the techniques are similar, then quite often the data analytics will be similar too. Imaging is a really important tool in the biological sciences; it's really fundamental, and biological imaging labs expend a great deal of both resources and effort in improving and advancing their imaging technologies. The same, of course, is true in the environmental sciences. So what I've tried to do in this talk is to pick two examples of areas where I think there is significant potential for translating these techniques into the environmental sciences. Now, I'm not an expert in the environmental sciences, and I'm not completely up on the literature in the area, so it's quite possible that some of the things I talk about today are already being done, and in fact I will give one example of that. But there are some things we've done in the biological domain that I think might be of interest to some of you. As I said, I've worked on a whole host of problems in imaging in the biosciences, ranging all the way from things happening at very small scales over long time periods, such as the long-time distribution of molecules within cells, all the way up to distributions of proteins in whole organisms, with a whole range of stuff in between, some of which is more or less relevant to the types of problems you're interested in. So let's focus on two of these areas. First of all, I'm going to talk about some techniques we developed for an imaging technique called single-molecule localisation microscopy. I've chosen that problem because the work was directly inspired by a piece of work that was actually done on tropical cyclone prediction.
And I'll hint at that in a moment. The second piece of work I'm going to talk about is called high-throughput screening, as it says on the slide here. Really it's about multimodal data, and about what predictions you can make across data modalities: given a whole bunch of multimodal data, can you make predictions of one data type from the other data types? And how do you validate whether your predictions are correct or not? We do this in the bio domain, but I think there's plenty of scope for doing it in the remote sensing domain as well, and I'll say a few words about that towards the end. So I'll start by talking about a problem down at the molecular scale, which is to do with identifying the locations of proteins in cells. Despite first appearances, this is not a cosmological image of a galaxy; this is an image of proteins on the surface of a cell. Exactly what the protein is doesn't particularly matter, and there's some information on the bottom right about what the cell type is. The way this image is acquired is that we take the protein we're interested in, which in this case is something called alpha-tubulin, we attach a fluorescent marker to it, and we fiddle with the chemistry to make that fluorescent marker flash. It flashes stochastically, and you obtain a movie that looks something like this, in which you see lots of flashing points. If you look at this image for long enough, you can perhaps see some evidence of structures that look like they might be fibres, and perhaps a gap in the middle of that structure. And indeed there are structures that look like fibres in this, and there is at least a bit of a gap in the middle.
And so the image we show here is a reconstruction from those flashing fluorophores, which we then manage to localise: we take each time step, we identify where the flashes are occurring, and then we accumulate those over time. The reason we do it in that way is that the resolution of the microscope is far, far bigger than the size of the protein. So what we do is make the fluorophores flash so that they're only activated sparsely, and if they're sparse, we can assume they don't overlap, and therefore we can isolate them more effectively and more accurately. This type of technique gives you very high-resolution images of the structure of the protein you're interested in. The protein in this case is associated with a structure called the microtubule network, and that's what you see in the image on the left, and slightly closer up in the image on the right. Fundamentally, the data we have here is not comprised of these lines: this is not a direct image of these microtubules. This is an image built up from a set of points, and fundamentally what we have is the coordinates of all of those flashes. This image is a computational reconstruction of that, but that masks a lot of what's going on here. What we would really like to do, rather than build in a load of biases by reconstructing the image in this way, is to analyse that point-like data, those sets of coordinates, much more directly, and to understand what it's telling us about the structures we have. So the work I'm about to present was inspired directly by a paper that was published in 2020, but was released as a preprint a few years before that.
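As a rough illustration of the localisation step just described: assuming emitters are activated sparsely, each frame contains isolated bright blobs whose centres can be estimated with sub-pixel precision. Real SMLM pipelines fit a Gaussian point-spread function; the intensity-weighted centroid below is a minimal, stdlib-only stand-in (the function name and toy frame are illustrative, not from the talk).

```python
def localize_emitter(frame, threshold=0):
    """Estimate a single sparse emitter's position in one frame as the
    intensity-weighted centroid of its above-threshold pixels.  Real SMLM
    pipelines fit a Gaussian point-spread function; this is the simplest
    sub-pixel stand-in."""
    total = sy = sx = 0.0
    for y, row in enumerate(frame):
        for x, v in enumerate(row):
            if v > threshold:
                total += v
                sy += v * y
                sx += v * x
    return sy / total, sx / total

# One frame of a toy movie: a single blinking emitter near pixel (1, 1).
frame = [
    [0, 1, 0],
    [1, 4, 1],
    [0, 1, 2],
]
print(localize_emitter(frame))  # sub-pixel estimate, roughly (1.2, 1.2)
```

Accumulating one such coordinate per flash, over all frames of the movie, builds the point cloud that the reconstructed image is rendered from.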
They used a technique called persistent homology to quantify the diurnal cycle that you get in tropical cyclone data: there is some cyclical behaviour in the data that tells you that something repeats over roughly a 24-hour period. Cyclones are interesting for this because they comprise a circulating region with a hole in the middle, and this was rather similar to a problem we were interested in in biology. When I saw this paper, I thought that maybe there was something we could draw from it and apply in a slightly different area. That technique is persistent homology. Persistent homology draws heavily on the mathematical field of topology. Topology is related to geometry, but whereas geometry is concerned with things like distances and areas and angles, topology is much more explicitly concerned with shape, and in particular with classes of shapes that can be deformed continuously from one shape to another. And humans are really good at this. The three images at the bottom here are images of a ring-like structure that is quite common in biology. We can look at these three images and see that each contains a very distinct loop-like structure: there is a closed loop in each of the three images. They are very different shapes with very different dimensions, but we can see the loops; we're very good at this. Detecting those closed loops computationally is hard, particularly given that they're not actually closed loops, because they clearly have gaps in them, and remember they're made up of point-like data, so all we've got is sets of coordinates. What these structures have in common, of course, is that you can deform one into the other in a fairly continuous way.
And this is what topology is good at; this is why it was used in that tropical cyclone application. The fundamental problem here is detecting structures that are geometrically quite different but topologically very similar. The classic cartoon example people use here, and at around eleven o'clock in the morning when it's everybody's coffee time it's probably quite a good one, is that geometrically a coffee mug and a donut are very, very different: you wouldn't normally consider them to be the same shape. But topologically they are, because you can take a coffee mug and, as shown in the bottom right-hand corner here, deform it continuously into a donut without tearing any new holes or filling any holes in. You can imagine that if you had a piece of clay from which you were making your mug, having formed the handle and the bowl, you could take that piece of clay and continuously remould it until it looked like the donut at the bottom. This really simple idea of being able to continuously deform one shape into another makes the problem of identifying a diverse range of shapes that are fundamentally the same much more tractable, and in fact provides a very robust, flexible and extendable framework for it. The technique of persistent homology allows us to start from things that we know and are familiar with geometrically, and to extract the key topological properties of those structures. Here is the cartoon mathematical sketch of how you do this. You take your set of data points, which are the black dots here, and you draw areas around them, little circles, and you grow those circles. If two circles intersect, you draw a line between their points.
You then look at what happens to that graph, that network, as you vary the size of the circles. What you'll find is that things we would call real features, like the holes in the middle of those loops, persist over a large range of length scales, by which I mean the diameter of the circles. To show you what I mean, let's run through a cartoon example. We have some points, and we have a small circle drawn around each of them. Each of the lines on this chart at the bottom, called a barcode, represents one of those points. As we increase the radius of the circles, we get to a certain point here where two of the points have circles that have joined, and we draw a line between them; those two circles are replaced by a single component corresponding to that joined pair. That happens at a radius of about seven units, and so one of those components gets deleted at about seven units. We continue doing this, and we join a few more things up, and we find that they get replaced in the barcode by a single component that represents those four things joined together. So far we're not learning very much. But then we get to a point here where we have joined all of these points together, and we now have a very small region that is closed: there is a closed loop in the structure, and that tells us something potentially important about the cyclical nature of our data. Rather than just joining pairs of points together, we also join triples; we get to a point here where triples of points join up, and so on. I'm not going to show this explicitly, because I can't in 2D, but we would also join tetrahedra together, so quadruples of points. We continue doing this, and we form another big hole-like structure here, and finally everything gets closed off.
The point about this is that the process of forming this network-like structure makes the process of detecting these void regions, these holes, very easy indeed: there's some simple linear algebra one can do to isolate them. You can picture this graphically if you want; this is a much better illustration of how it works, and you can see this very robust hole-like structure on the right-hand side here that persists for a very long time as the network forms. So this is a technique, one that can be formalised extremely rigorously mathematically, that you can use to detect that type of structure in your data. Now, in 2D it may seem like overkill, but in fact this problem is computationally difficult even in 2D, and in higher dimensions it's even harder. In higher dimensions things become really problematic, because we can no longer visualise the data: doing this by eye is relatively easy in 2D, just about possible in 3D, and in higher dimensions virtually impossible. In 3D, of course, you don't just have holes as in a donut; you might have, say, void regions inside a football, where all of your data points lie on the surface of the football and the inside is empty. That can be very difficult to see, and again, in higher dimensions, almost impossible. This technique, however, is dimension-independent and can scale up to any number of dimensions you want; the algebra is completely general and independent of dimensionality. And here's another video showing this on a slightly more random collection of points; again you can see the hole-preserving structure at the bottom there. So, what did we do with this?
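The barcode construction just described can be sketched in a few lines for its dimension-0 part (connected components only, not the loops): grow a radius, merge components when their circles first touch, and record the radius at which each bar in the barcode dies. This is a stdlib-only illustration of the idea, not the software used in the work described here.

```python
from itertools import combinations
import math

def h0_barcode(points):
    """Dimension-0 persistence: every point is born at radius 0, and one
    component dies each time two clusters first touch.  Circles of radius
    r around two points intersect when their distance is <= 2r, so each
    merge kills a bar at half the pairwise distance."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # All candidate merge events, sorted by distance (single linkage).
    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d / 2)  # one bar ends at this radius
    return deaths                 # the final surviving bar never dies

# Three nearby points and one distant one: two short bars die quickly,
# the bar for the distant point persists until a much larger radius.
points = [(0, 0), (1, 0), (0, 1), (10, 10)]
print(h0_barcode(points))
```

Bars that persist over a large range of radii correspond to real structure; short bars are noise. The loop-detecting (dimension-1) part follows the same pattern but tracks cycles of edges rather than components.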
So we applied this to the problem of understanding the distributions of molecules on cellular structures. We first used it as a very robust clustering technique: you can take the technique and adapt it a little to do clustering, using a concept that we call persistence, which looks at the density of points relative to their local neighbourhood and identifies whether an increase in density is truly robust against its background. An example I've heard used for this, which I think is reasonably accurate, is mountains and molehills: a small molehill on the side of a mountain represents a region of increased density, but compared to the global landscape of the mountain, it's irrelevant. So this notion of persistence looks at how robust something is above its local background level, and even with very noisy data it allows you to do very robust clustering. The particular biological question we were looking at here is one of inhibiting a biological process with a drug. There is a particular receptor that can be inhibited by a drug, and that has an effect on the accumulation of other things, in particular the clustering of a protein called integrin. We were able to show, extremely robustly, using this idea of persistence-based clustering, that application of a drug inhibiting that receptor reduces the mean area of those clusters. We could not do this using any of the other established clustering techniques, like k-means or DBSCAN; they all fail on this problem, because the nature of the clustering is very subtle. This technique is able to pick it up extremely robustly.
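The "mountains versus molehills" idea can be made concrete in one dimension: each local maximum of a signal (say, a density estimate) is born at its height and dies at the level where its region merges into a taller peak, and its persistence, birth minus death, measures how robust it is against the local background. The sketch below reduces this to a 1-D signal; it illustrates the concept and is not the actual SMLM clustering code.

```python
def peak_persistence(values):
    """Persistence of local maxima in a 1-D signal: each peak is 'born'
    at its height and 'dies' where its region merges into a taller peak.
    Tall isolated peaks (mountains) persist; small bumps on their flanks
    (molehills) have low persistence and can be discarded as noise."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    comp = {}         # index -> parent in its component
    peak = {}         # component root -> birth height of its peak
    persistence = {}  # index of a peak -> its persistence

    def find(i):
        while comp[i] != i:
            comp[i] = comp[comp[i]]
            i = comp[i]
        return i

    for i in order:           # sweep a water level downwards
        comp[i] = i
        peak[i] = values[i]
        for j in (i - 1, i + 1):
            if j in comp:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # the component with the lower peak dies here
                    lo, hi = sorted((ri, rj), key=lambda r: peak[r])
                    persistence[lo] = peak[lo] - values[i]
                    comp[lo] = hi
    persistence[find(order[0])] = float("inf")  # global max never dies
    return persistence

# Peak at index 1 (height 3) persists forever; peak at index 3 (height 2)
# has persistence 1; valley and edge points have persistence 0.
print(peak_persistence([0, 3, 1, 2, 0]))
```

Thresholding on persistence then keeps only density peaks that are genuinely robust above their local background, which is the behaviour that k-means and DBSCAN lack on this kind of data.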
We also have a different approach that can verify this using a different measurement technique, and so we used that as an external validator of this approach; that notion of external validation is something I will come back to shortly, in the next section. We were also able to use this to detect something more subtle about the organisation of cell-surface receptors: whether they organise themselves into a kind of uniform, dense cluster, or whether there is structure within those clusters. This follows from what I said earlier: we're quite interested in whether there are hole-like regions in the organisation of the protein. If you look at these images at the top, these are images of something called the nuclear pore complex. They all form quite distinct ring-like structures, and that's a very well-known phenomenon; you can see it in the images. But doing large-scale quantification of this is extremely difficult. We can do it quite nicely by eye, but doing it computationally has proven to be extraordinarily hard, because the variation in the clusters, and the amount of added noise in the images, makes it really very difficult. The nuclear pore complex, by the way, is a hole into the nucleus of the cell; it's the way that things get in and out of the nucleus. We were able to classify, identify and isolate those structures, and indeed count them and profile them, extremely robustly with this approach. And on the right-hand side here we show a different biological problem with similar phenomena: we have structures that have holes that co-localise with other structures that don't. So this approach has proven to be very robust and very scalable for looking at large-scale screens of these.
I'm fairly sure there are problems in environmental science where you may be able to draw on this type of technique in a different context; for this problem, it has proven to be extremely robust and very scalable. And this work, it turns out, has now become quite a popular technique that's been drawn on by the rest of the imaging community working in this area. I'm going to move on now to the second topic, which is a very recent piece of work done in collaboration with the pharmaceutical company GlaxoSmithKline, my PhD student Sam Tonks, and my former colleague and continuing collaborator Alex, who is still at the University of Birmingham. The problem we were interested in here is something called high-throughput screening, which is ubiquitous in the pharmaceutical industry. It's used in the very early stages of drug development to screen large numbers of drug candidates on cell cultures, to see if a biological effect of a drug can be identified on a particular cell type; it's one of the very earliest stages of the drug development cycle. Roughly, the way it works is that they take a candidate drug compound, dose up a cell culture with that compound, and look to see what it does. The way they look to see what it does is to image it down a microscope, and in order to see the things they're interested in, they add a fluorescent label to the structures they think the drug is going to affect. What you end up with is images of a number of fluorescent labels attached to a range of different structures.
They look to see what effects the drugs have on those structures. In this study, we have a fluorescent label bound to the cytoplasm, which is the fluid that makes up the bulk of the interior of a biological cell. We have a fluorophore bound to structures within the nucleus of the cell. And we have a fluorophore that specifically looks for damage to DNA: it identifies damage to the structure of DNA and attaches itself to that. With those fluorophores we can image those three structures separately. We can also take what's called a brightfield microscopy image, where we basically shine light through the sample; it gets attenuated by different structures, and gives us some really gross-level information about what's in the image, but at least at first sight doesn't give us any of the specific detail that the fluorescence images contain. This is extremely costly: they do this millions of times across millions of sample plates. It's very time-consuming, and it's limited by a phenomenon called spectral saturation, which basically dictates which fluorophores you can use and still image them uniquely. So the question we asked here is: can we take a very simple image type, the brightfield image that we see on the right, which is a very low-contrast, low-cost and, at least superficially, low-content image, and ask whether there is hidden or correlated information within it that we can't see with the human eye? Can we somehow extract the information that is correlated with these fluorescence images from that low-contrast image? Now, this might seem like an impossible task. This image is really very poor; it doesn't show you very much at all, while these images show you quite a lot of detail.
But it turns out, as we will show in a moment, that there is a lot more information contained in that brightfield image than we can see with the naked eye. And it turns out that this prediction problem, going from the brightfield image to these fluorescence images, is in a sense possible. I say "in a sense" because there are some caveats that I'll touch on at the end. One can immediately see that there might be applications of this in, say, remote sensing. In particular, if you're looking at very high spectral resolution multispectral data, you might ask: do we need to acquire all of that data? Can we predict some of it from other modalities? Can we reduce the spectral resolution of the data we acquire, perhaps to increase the time resolution, or just to reduce the amount of data collected? So, the way we've approached this is to take a technique that's been quite popular for a while in computer vision, which is image-to-image translation. People have been playing with this for a number of years now, and it's allowed the computer vision community to do things like convert between images of zebras and horses, or take a photograph and generate a painting in the style of Monet, or recolour an image to take a summer scene and convert it into a winter scene. And if any of you have played with tools like DALL·E, or Midjourney, or indeed Stable Diffusion, which are some of the more popular tools at the moment, you'll know that doing this type of thing is really popular and can generate some really impressive images with the techniques people use nowadays. But of course, it's one thing to generate a pretty picture, and another thing for those pictures to be genuinely accurate.
If you start looking at the details of some of these, you can look at, for example, this picture of a zebra down here, and you can see it just doesn't quite look right: there's something not right about the striping pattern around the face, and it's kept the horse's mane, which zebras don't really have to the same extent. That might not matter for a toy application like that, but for biological data, and indeed for environmental sensing data, it really does matter, because you want to do science with the synthetic images you generate. The pharmaceutical industry has been very excited about the potential of this type of technology for replacing multiplexed imaging, and so what we set out to do was to understand whether we could do that image-to-image translation task, predicting fluorescence images from brightfield, accurately and reliably enough to do biology with the synthetic images. We used an off-the-shelf image-to-image translation system, based on a pair of generative adversarial networks. We might do it differently now, and I'll maybe hint at how at the end. But basically it's trained on paired data: we have a large dataset of paired images, with brightfield images and the corresponding images of our fluorescent stains, and we train a generative adversarial network to generate the fluorescence images conditioned on the brightfield image. So we have the benefit here of a very large supervised training set: 1.5 million images. We didn't use all of them; we used about 300,000, because we found we were getting diminishing returns beyond that, and 300,000 was more than enough to do what we wanted.
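Stripped of the adversarial part, the supervised backbone of this kind of paired image-to-image translation is a pixel-wise reconstruction objective fitted on (input, target) pairs. The toy below reduces that idea to a closed-form linear fit on synthetic paired pixel values, purely to make "paired supervision" concrete; the names and data here are illustrative and are not from the GSK pipeline, where the generator is a deep network trained alongside a discriminator.

```python
def fit_linear_translator(pairs):
    """Fit y = w*x + b by ordinary least squares on paired (input, target)
    pixel values -- a toy stand-in for the supervised, pixel-wise half of a
    paired image-to-image translation objective, which in practice is
    combined with an adversarial loss."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - w * sx) / n
    return w, b

# Toy 'paired data': brightfield intensity x, fluorescence intensity 2x + 1.
pairs = [(x, 2 * x + 1) for x in range(10)]
w, b = fit_linear_translator(pairs)
print(w, b)  # recovers the underlying mapping: 2.0 1.0
```

The real model replaces this linear map with a convolutional generator, but the role of the paired training set is the same: it defines the target the reconstruction loss is measured against.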
Having trained a neural network to do this, we were able to generate arbitrarily large numbers of synthetic images. In the two rows here, the top are the ground-truth images, and the bottom are synthetic images generated from the brightfield image corresponding to the top row. You can look at these, and visually they certainly look plausible; if we didn't have the ground-truth images, we would perceive each of these synthetic images to be very believable. Indeed, they were good enough to fool expert biologists, who were unable to detect them in a blind test: we put a panel of these to some experts and asked them to say which one was real and which was synthetic, and they were not able to do so reliably at all. So that's one test. The other tests we ran included a number of what we would call internal validation measures: the structural similarity index and the peak signal-to-noise ratio, used to assess the faithfulness and quality of the generated images. The structural similarity index compares an image to its ground truth, and the peak signal-to-noise ratio looks at the overall quality of the image. In all cases, those internal validation measures, which look solely at the images, indicated that the synthetic images were very good indeed, which was encouraging. But we were still not quite confident about this: the statistical measures showed no significant differences between the real and synthetic images, and the biologists couldn't tell them apart when they didn't know which was which.
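For reference, the two internal validation measures just mentioned can be written down compactly. PSNR is a log-scaled mean-squared error against the ground truth; SSIM compares luminance, contrast and structure rather than raw pixel error. The sketch below operates on flat lists of pixel values and uses a single global window for SSIM, whereas the standard metric averages the same formula over local windows:

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio between a reference and a test image
    (given as flat lists of pixels): higher means closer to the reference."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=255.0):
    """Single-window SSIM (the standard metric averages this over local
    windows): compares mean luminance, contrast and structure of the two
    images rather than pixel-wise error alone."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2  # stabilisers
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

img = [10.0, 20.0, 30.0, 40.0]
print(psnr(img, img), ssim_global(img, img))  # perfect match: inf, 1.0
```

The point made in the talk is precisely that scoring well on both of these image-level measures is necessary but not sufficient for the synthetic images to be biologically reliable.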
If you actually look at them, though, there are some subtle differences that we thought could be biologically important. If you look, for example, at the region indicated by the arrow here, you'll notice that in the ground truth, the cell towards the upper region of the cytoplasm image shows a quite different distribution of the fluorophore compared to the synthetic image. And if you look on the right-hand side at the DNA-damage channel, in the cell highlighted with the arrow you can see a very different distribution of the fluorescent dye. So we thought that, although these images were visually plausible and statistically plausible, they might not be as biologically realistic as we had hoped. We began to put a little flesh on this by trying to understand whether the image-to-image translation technique itself believes its own predictions. The network does have a stochastic component, so we generated large numbers of synthetic images and looked at the variation across them: we computed the variance of the predictions on a pixel-by-pixel basis. And we found that, although the images are visually and statistically plausible, some image channels, notably the DNA-damage channel, have quite significant variability in their predictions. In other words, there are many visually plausible images that the network could generate, all with excellent statistical properties and an excellent visual appearance, but they are highly variable. And that caused us a little more suspicion about whether these things are biologically useful.
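The "does the network believe its own predictions" check can be sketched as follows; `stochastic_prediction` below is a hypothetical stand-in for repeated forward passes of the trained generator with fresh noise:

```python
import numpy as np

rng = np.random.default_rng(42)

def stochastic_prediction(shape=(64, 64)):
    """Hypothetical stand-in for one stochastic forward pass of the
    generator: a fixed 'signal' plus noise representing model variability."""
    base = np.linspace(0, 1, shape[0] * shape[1]).reshape(shape)
    return base + rng.normal(0, 0.1, shape)

# Draw many synthetic images and compute per-pixel variance across them.
samples = np.stack([stochastic_prediction() for _ in range(200)])
pixel_variance = samples.var(axis=0)  # shape (64, 64)

# A channel the model is "confident" about has low variance everywhere;
# high-variance pixels flag regions where many plausible outputs exist.
print(pixel_variance.mean())
```

Applied to a real generator, a per-channel map like `pixel_variance` is what exposed the DNA-damage channel as far less certain than its visual plausibility suggested.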
To really flesh out what was going on, we took the standard image-analysis pipeline that GSK use in assessing their images. What it does is use classical image-processing techniques to compute a number of scores representing properties they're interested in: some of them count the number of nuclei, some compute statistical properties like the mean intensity, and some compute much more sophisticated measures of texture on the images. In particular, in the images of nuclei, most of the image features computed on the synthetic images are indistinguishable from the same features calculated on the ground-truth images. So that's good. But one or two things are not the same: there are two features that really do vary significantly. And when we consider different channels, when we start considering the DNA-damage channel, the features calculated on the ground truth and those calculated on the synthetic images begin to become really quite wildly different, even though the images are visually plausible. So we concluded from this that although some of the images, particularly the images of the nucleus and, to perhaps a lesser extent, the cytoplasm, are pretty good and at least moderately reliable, the DNA-damage channel is far less reliable, even though it's visually and statistically plausible. We then took this a step further and looked at a measure of whether the synthetic images were able to show a drug effect. So we took images of experimental control samples that didn't have a drug applied to them.
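A toy version of that kind of classical feature pipeline, assuming SciPy is available, with a synthetic blob image standing in for a real nuclei stain (the actual GSK pipeline computes many more features, including sophisticated texture measures):

```python
import numpy as np
from scipy import ndimage

# Toy stand-in for a fluorescence nuclei image: dark background with
# a few bright Gaussian blobs at known centres.
img = np.zeros((100, 100))
for cy, cx in [(20, 20), (50, 70), (80, 40)]:
    yy, xx = np.ogrid[:100, :100]
    img += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / 30.0)

# Classical pipeline: threshold, label connected components, then compute
# simple per-image features (nucleus count, intensity statistics).
mask = img > 0.5
labels, n_nuclei = ndimage.label(mask)
features = {
    "n_nuclei": n_nuclei,
    "mean_intensity": float(img.mean()),
    "std_intensity": float(img.std()),  # crude contrast/texture proxy
}
print(features)
```

Comparing a feature vector like this between ground-truth and synthetic images, channel by channel, is the external check that caught the differences the visual and statistical tests missed.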
And we compared them, using something called an RZ score, with images that had had a drug applied. We looked for differences between those images; higher values of the RZ score indicate a consistent difference between the two sets of images. So we took some of the image features with the highest scores, the most significant features, and computed the RZ score to assess whether the synthetic images were able to show a consistent difference between the drug-treated samples and the experimental controls. What we see is that, given the ground-truth images, all of the image features, to greater or lesser degrees, consistently indicate that there is a drug effect: they all have quite large values. When we consider the virtual nuclei images, they're slightly less effective at identifying the drug effect than the ground-truth images, but they're still plausible. The virtual nuclei and cytoplasm images taken together are also able to reflect that the drug has an effect. But when we introduce the DNA-damage channel, any notion that there is a drug effect, any consistent difference between the experimental control and the drug-dosed sample, disappears according to this measure. So the ultimate effect is that the poor biological quality of that synthetic DNA-damage image has destroyed our ability to detect a drug effect in those synthetic images. What's our conclusion from this? These image-to-image translation methods can produce visually and statistically realistic virtual screening images, but what we've demonstrated here is that visual realism, and indeed statistical realism, is not the same as biological relevance.
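The talk doesn't define the RZ score precisely; a common robust form of the Z′-factor used in high-content screening, built from medians and median absolute deviations, behaves the way described (large when control and treated populations are cleanly separated, non-positive when they overlap). A sketch under that assumption:

```python
import numpy as np

def mad(x):
    """Median absolute deviation: a robust estimate of spread."""
    return np.median(np.abs(x - np.median(x)))

def robust_z_prime(control, treated):
    """Robust Z'-factor over a single feature. Values near 1 mean the two
    populations are cleanly separated; values <= 0 mean they overlap
    (no detectable effect). This exact form is an assumption; the talk
    does not give the RZ definition."""
    separation = abs(np.median(treated) - np.median(control))
    return 1.0 - 3.0 * (mad(treated) + mad(control)) / separation

rng = np.random.default_rng(1)
control = rng.normal(10.0, 1.0, 500)        # feature values, no drug
strong_effect = rng.normal(25.0, 1.0, 500)  # clear drug effect
weak_effect = rng.normal(11.0, 1.0, 500)    # effect buried in noise

print(robust_z_prime(control, strong_effect))  # well above zero
print(robust_z_prime(control, weak_effect))    # negative: no detectable effect
```

The DNA-damage result in the talk corresponds to the second case: a feature whose synthetic values are so variable that the score collapses even when a real effect exists.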
If you rely on those synthetic images alone, any downstream analysis you do, such as determining whether there's a drug effect or not, may well fail if you haven't done sufficient validation on your data. If you're thinking of doing this in remote sensing, my strong advice is: do not rely on visual realism or statistical measures of image quality. Those types of internal validation are not sufficient to tell you whether the important content of your synthetic images is actually present, and you need to think about whether there is an independent external validation you can use. In our case we had the features generated from an independent pipeline, which we could use to assess the true quality and the true content of the image data. So although we think there is significant scope for these methods, there is still a lot of work to be done on genuinely evaluating them properly. Now, one question you might ask is whether this is an artifact of the particular technique we used to translate between the images. We've started doing some explorations as to whether newer methods, in particular diffusion models, can do any better in this domain. They're able to generate some really impressive, really high-quality images. The images in this case are of brain tissue: these are the sorts of brain-atlas-type data people use, slices of brain tissue in which nerve cells are delineated. And we're getting some quite impressive results from these diffusion models, which do the predictions in a slightly different way.
But so far they're failing for the types of data we've been working with, and we need to do quite a bit more work on understanding why. Part of the problem is that this method is much slower than the GAN-based method we used previously. But this is an avenue we're actively pursuing. So with that, I want to wrap up the talk with a brief summary of what we've covered. We've talked about two techniques today that I think could have some application in environmental science, and particularly environmental imaging. The first technique was identifying structure, and in particular void regions, in point-like spatial measurements. We did this for the distribution of molecules within a cell; you could equally, I guess, look at things like schools of fish in the environmental sciences, but I'm sure there are many more applications. There's at least one known application in weather and climate, in understanding the behaviour of a tropical cyclone. The second thing we talked about is the multimodal data-translation problem, which I think could have some fairly nice applications in remote sensing. But, as we've found in the biological case, it needs to be done very carefully: it's very easy to get excited about very realistic images, but you do need to think very hard about whether they are genuinely trustworthy and whether they really do contain the content you need, even if all of the statistics and all of the visual analysis you do of them looks convincing.
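Detecting void regions, as in the first technique summarised above, needs 1- and 2-dimensional persistent homology and is best done with a TDA library such as GUDHI or Ripser; but the underlying filtration idea can be shown self-contained in dimension 0, where persistence simply tracks when connected components of a point cloud merge as the scale grows:

```python
import numpy as np
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistent homology under the Vietoris-Rips
    filtration: every point is born at scale 0, and components merge
    (die) as the scale grows. Long-lived components indicate
    well-separated clusters. (Voids need H1/H2 and a TDA library.)"""
    n = len(points)
    parent = list(range(n))

    def find(i):  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sort all pairwise edges by length and merge components (Kruskal-style).
    edges = sorted(
        (np.linalg.norm(points[i] - points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)  # one component dies at this scale
    return deaths  # n-1 merge scales; the last component lives forever

rng = np.random.default_rng(7)
# Two well-separated clusters: expect one very long-lived component.
cloud = np.vstack([rng.normal(0, 0.1, (20, 2)),
                   rng.normal((5, 5), 0.1, (20, 2))])
deaths = h0_persistence(cloud)
print(max(deaths))  # the inter-cluster merge scale
```

The single long-lived feature here is the 0-dimensional analogue of the long-lived loops and voids the talk uses to characterise molecular distributions.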
An independent external validation is essential, but I do have some confidence that when we can get those types of pipelines in place, and particularly given some very recent advances we've seen in AI, with really exciting techniques coming out all the time at the moment, within the next few years we'll be able to do some really quite exciting things with these techniques. I'd just like to sum up by saying that there are many more problems in the biosciences I could talk about that have a lot in common with techniques in the environmental sciences, for instance some of the multispectral work we do in the biosciences compared to some of the multispectral remote sensing work. I think there's a lot of scope for the two communities to talk much more frequently than they perhaps do. I have almost outstayed my welcome, so I'm going to stop talking now, but I do have time for some questions if anybody would like to ask any, and I'll stop sharing. That's fascinating, thanks, it's really very interesting. You've not outstayed your welcome at all; I'm sure people would be happy to stay and ask some questions, so, as I said, please post those in the Q&A and we can field them on to Ian. That's great; I can already see loads of applications in environmental science, so it was really interesting to understand those methods. Even from a quick search while you were talking, thinking about LiDAR data where there are point clouds, it looks like people are already using persistent homology there, and I'm sure there are lots of interesting aspects. You talked about holes in those data sets, but can it be used for other types of features and structures?
So persistent homology is principally concerned with continuous deformation, so it really is best suited to identifying whether you've got regions of space, or regions of your data space, that are not occupied by things. Now, as we did, and I didn't get into this particularly in what I talked about earlier, we combine this with geometric measures. It's very useful for bulk-level characterisation of structures, but it doesn't tell you directly how dense your points are, how far apart they are, or anything like that. It does tell you whether a structure belongs to a wider class of other structures, and that can be really useful, both for high-throughput analysis and for very high-dimensional data where you can't detect that type of information any other way. One of the keynote talks at the digital conference this week was about using point-cloud LiDAR data on trees and identifying individual tree structures and branch structures, so there are obviously lots of applications there. Does it look like you could draw on this to do some analysis on those types of structures? Yes, I have seen that done. There's a question in the chat from Tom Wilding, who says: very interesting talk, thanks. For the drug-testing example, when trained on 300,000 images, is it possible that the machine is better than the fluorescent dye, because the machine will integrate over all images to predict where the fluorescence would have been located in a single image? That's a really good question. In this case, we don't think so, simply because the quality of the input bright-field images is really very poor indeed, and we don't think there are any grounds to believe the machine would be doing much better here. Now, I do concede that if you had a much better-quality source of input data, the answer might differ.
If you had a really high-quality source of input data, and perhaps the modality you were trying to predict was of low quality, then you might get a very different answer and you might want to put more faith in your prediction. So if the tables were turned, and we had an extremely clear, high-resolution bright-field image and very poor-quality fluorescence images, then that may well be true. That's interesting; it's not something I'd thought about, so thank you for that. One of the things you mentioned was the potential application to hyperspectral imagery from satellites or drones, which is increasingly common, and whether people could look at trying to generate that type of data from cheaper methods if they've got enough data. I think there's good cause to believe you might be able to generate some of the spectral channels from other spectral channels, which might allow you to acquire at a much lower resolution. Another question on that: potentially you could have a model that could just generate likely-looking images that could fool a biologist from nothing at all, without an initial bright-field image. Absolutely, yes. So is there a way of producing measures of the intrinsic information in those initial images, relating to the actual final application, a way to jump that step and see whether an image is going to be of any use? I don't know; I don't think we've got that. One thing here is that you always need to have a sort of biological grounding in this. Here we've always started with an image, but actually I think nowadays you could start with other sources of data; you could start with a bunch of experimental parameters.
And generate synthetic images from those, much as you'll have seen with the techniques that can generate images from a piece of text, from a caption. You could envisage entering a description of an experiment and getting a bunch of images out. I have no idea how accurate those would be, but I can envisage it, and people are thinking about this; we're thinking about the equivalent of large language models for this type of problem. How do we train an enormous model that could encapsulate everything we know about biological cells? That's looking slightly more feasible than it perhaps once was. And you could be thinking about the same in terms of environmental images; I'm sure people are. That's fantastic, thanks very much. Just a final one, then: you mentioned you've been working with companies on those workflows. What's the impact of the output of that method: is it just about time saving, or something more? Well, I wouldn't like to say that we've dampened their enthusiasm, but, trying not to be too critical here, people high up in large organisations like that are always looking for ways of saving money. I think what work like this shows is that these techniques are promising, but we can't get rid of our labs just yet. Digital biology is, in that sense, coming; I'm pretty confident it is going to happen to a greater or lesser extent. Exactly how much it will replace wet-lab work, I think, is still to be worked out. Great, that's fantastic, thanks very much. And I think that's probably all we've got time for today.
Thanks, everyone, for staying on, and thank you, Ian, for your presentation and the discussion. Just to remind everyone, we've recorded the session and will have it available on the website and on the YouTube channel soon; I'll post the link to that. The next webinar is going to be on Friday the 4th of August, again at 11 o'clock, with Tom Anderson of the British Antarctic Survey, talking about tackling diverse environmental prediction tasks with neural processes. So make a note of that, find the link on the website, and register for that one. And again, thanks very much, Ian. Thank you for your attention, and thanks for inviting me.