Good morning, everybody. Thanks very much to those who have come along this morning. Welcome to the Constructing a Digital Environment programme's seventh webinar series; this one's on AI for the environment. I'm Matt Fry from the UK Centre for Ecology & Hydrology, hosting this, which is the fourth webinar in this series on AI in environmental science, supported by NERC and the Constructing a Digital Environment programme. The programme's aim is to develop a digitally enabled environment benefiting researchers, policymakers, businesses, communities and individuals alike. It has been running for a number of years with the aim of envisaging and developing approaches to creating the future digital environment, exploiting advancing technology and increasingly diverse datasets to improve our understanding and management of the environment. It does this through funding a number of projects and a range of other activities, and also through building the community in the area of the digital environment, running events, including a successful conference last year. At which point I should remind people that there's another conference coming up this year from the digital environment programme: the NERC Digital Gathering 2023. It's open for registrations and submission of abstracts, and the details should be in a link in the chat soon. It's taking place on the 10th to the 11th of July at the British Antarctic Survey offices in Cambridge, and it's free to attend, so book your place now. In this series we're considering the role and the opportunities, as well as some of the pitfalls, of the use of AI in environmental science. The format of the webinars is to invite presentations from leading experts in this field, across different science domains and using different methods, followed by a chance for Q&A. Could I invite you to look at the links in the chat, follow the digital environment Twitter feed, and also subscribe to the YouTube channel if you haven't done so yet; all the talks are on there, and there's a fantastic range of really interesting talks, well worth looking into. This series, as I said, focuses on the development, use and application of artificial intelligence techniques in environmental science. AI tools are enabling new analytical value to be delivered from existing sources of data, as well as providing powerful tools for generating new data, and this webinar series covers activities across this area. So I'm very excited to say that today's presentation is from Tom August, also of the UK Centre for Ecology & Hydrology, who always gives really interesting talks, and he's going to be talking about AI for 'on the ground' biodiversity monitoring. Tom's a computational ecologist at the UK Centre for Ecology & Hydrology whose research focuses on the use of technology to better our understanding of biodiversity, through innovative data collection methods and engaging communications. Tom comes from a background in field and applied ecology and now focuses on applying AI methods to improve the quality and quantity of data that we collect about biodiversity, from remote monitoring stations and mobile applications through to natural language processing and virtual reality. Tom's shown that these novel methods can be used to improve our biodiversity research.
In this webinar, Tom's going to discuss recent advances in technology and AI for monitoring biodiversity, including computer vision and AI tools for processing audio.

So I'm going to be talking about AI for 'on the ground' biodiversity monitoring, and I've put 'on the ground' in quotes because I'm not going to be talking about anything satellite-based or drone-based; I'm talking about monitoring with feet on the ground. And when I talk about biodiversity monitoring, I'm really talking about species-level observations: what species has been observed, where it's been observed, and when it's been observed. I'm going to present a number of different case studies around this topic. I also want to say upfront that this involves collaboration with many others; a lot of the things I'll be talking about were developed by other people, either at UKCEH or at our collaborators.

Okay. So, I feel it wouldn't be an AI presentation without some sort of generative art, so I'll get that out of the way on the first slide. This is Midjourney's interpretation of an android undertaking a butterfly transect. A butterfly transect is a very classic, well-established way of monitoring butterflies, and here we see an android walking through a lovely meadow surrounded by an improbably large number of butterflies. But I think this image captures the two key elements of biodiversity monitoring, which are the observation process and the identification process. The observation here is the visual acquisition of information, and identification is then the reasoning from that image to a species identity; but it could equally be acoustic: listening, and then identifying sounds. Observe and identify: kind of a bit like 'to protect and to serve', the LAPD's mantra. Yes, the android is the future.

So I'm going to delve into this concept of observation and identification using AI through three different case studies. The first is about AI-driven biodiversity monitoring: can we build systems which do both of these things, both observe and identify, driven by AI? The second has more human involvement: can we improve that identification step by developing AI that is human-inspired in its model design, so it takes elements of what humans do when they identify species, and we code that into the models to see if it improves their ability? And lastly, I want to talk about AI-assisted monitoring. This is where we're not trying to replace the humans in the process, not trying to put this android in the butterfly meadow, but asking how we supply AI tools which will assist a human who is undertaking these sorts of assessments.

'I am a data set. Maybe I am more than a data set. I am a data set dreaming.'

To give a bit more colour to the presentation, I'm going to drop in some random bits of poetry. I've been involved in a collaboration with Thomas Sharp, as part of the CDE-funded DECIDE project, on a work called 'The Dataset's Dream'. A few of these passages will pop up as we go through, and we'll use them as starting points to tackle some of the topics. As part of the Digital Gathering that Matt mentioned, we'll be running a short workshop on artist-scientist collaboration, so if you want to hear more about 'The Dataset's Dream' and our experiences, then you can catch up there. And biodiversity monitoring has been going on for hundreds of years.
A lot of the biodiversity monitoring we do at UKCEH is citizen science, or community-led monitoring: volunteers, members of the public, going out and recording the nature they see, natural history. And through these hundreds of years of natural history in the UK, we have large amounts of data. The large volumes start after about the 1970s, when we begin to get fairly decent amounts of data, from which we can then extrapolate trends in biodiversity over time. These data, collected by members of the public across a really wide spectrum of society, underpin a lot of the reporting that we do in the UK on our biodiversity, and globally they underpin a lot of the global statistics on trends in biodiversity.

'Across the UK landmass humans observe butterflies and moths. They log their observations; their data comprises location, date and species name. These are the fundamentals of reality. Space, time and imagination. Location and date, coordinates of space-time; species name, a fictive layer created by minds.'

'The Dataset's Dream' is a poem which is supposedly the dream of a data set: a data set containing butterfly and moth observations has become conscious, and it's having its first dream. Here it's talking about natural historians going out and logging their observations, which is what I'm going to talk more about. I do quite like this refrain at the end, where it takes apart the biological record: a species, a location and a time. Location and time are fundamental physical properties, but the species name is something created by humans, and that's not something I'd really thought about before. We do have a lot of problems with species names: different people giving species different names, or spelling them in different ways. So it adds a nice insight into why that's particularly problematic: it's something that is human-generated. (I'm going to try and mute myself when I cough, just not to blow your ears out.)

This is data coming in to iRecord. iRecord is one of the citizen science platforms that we look after at UKCEH, and you can see here a very large number of observations, over a million records; this figure is a few years old now, so we have more records than this. The spatial coverage you can see is pretty good across the United Kingdom, though places which are sparsely populated tend to get fewer records. You also see a bias in the taxonomic coverage: this blue line goes up very strongly here for the invertebrates, because the principal groups we focus on are the invertebrates, mainly because things like the mammals and the birds are quite well studied and supported by other charities and NGOs. These spatial biases are something we think a lot about, and there are also temporal biases, which we think a lot about as well when we come to analyse these data.

So I want to move on to the first case study. We have this biological recording process that's principally led by volunteers; how can we use AI to move to a system which could be entirely independent of human intervention? The case study here is around moths, and this is the classic pose of a 'moth-er'. This is a moth wall setup: hopefully you can see my cursor; this light at the top is a UV light, it's very bright, and moths are particularly attracted to UV. It's hung on a wall.
So when moths come to the UV light, they land on the wall. And this moth-er over here has got a torch spotlighting a moth, giving nice illumination so that he can capture a photograph on his phone, which he's then submitting to iNaturalist. So the key features of a moth wall are the UV light, the wall, the spotlight and a camera. If you put those into a system, you end up with something like this, designed by a team at Aarhus University in Denmark and published in 2021. We can see here the UV light that sits on top of a small wall, and in front of it a camera in a box, with a ring light around the camera illuminating the board: the same four components as the moth-er and his moth wall, except now it's in a box and it can sit out on its own.

We first used these, bought from the Danish team, back in 2021, and we used them to monitor land for Network Rail. Network Rail were doing a pilot into monitoring biodiversity on their lineside, and they didn't want to send people out onto the lineside for safety reasons, so they were interested in what technology they could use for monitoring their biodiversity. We actually used a whole range of different technologies, but this one showed particular promise: it attracted quite a lot of moths, the images were pretty good, and moths are quite distinctive in how they look, so it seemed like it would lend itself fairly well to computer vision.

So, working in partnership with that team from Denmark, we developed a UKCEH second iteration. We have a workshop here at UKCEH, and a group of engineers who took that first model and redesigned it. These are small tweaks: it uses a different, more robust light here; the camera housing is waterproof (it's actually been run under half a metre of water and still works, and it's been put in a 60-degree oven for a day and all still worked). That's quite different to how an ecologist would do it; we'd have a lot of gaffer tape and old ice cream tubs involved. So it's a proper bit of hardware, and it's robust and scalable; we're now looking to see if we can outsource production, given the demand around it. This is the front of the camera unit, looking back at the board: you see the camera in the middle, and these lights around the outside which illuminate the board to get good pictures of the moths. Obviously we don't just get moths; we get other animals too, which are attracted to lights at night, but for the moment we're focusing on the moths. At some point we need to come back and look at those other insects as well.

If you open up the box, this is what it looks like. The components are this Raspberry Pi, top centre, and below that the SSD, which stores the images. The Raspberry Pi controls the camera; the camera can take a picture when it detects motion, or on an interval, whichever you prefer. The images are all stored locally on that hard drive, and the whole system is turned on and off using this timer relay at the bottom; the rest of the components just make all of that work. We also have a project just starting which is looking at moving to edge processing, so we'll get the AI model onto the Raspberry Pi and process the images in real time. That would allow us to send back little text packets daily of the results from a camera trap, because these systems are designed to work autonomously and remotely.
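As a rough illustration of that capture logic, here is a minimal Python sketch of an interval-or-motion capture loop; it assumes an OpenCV-readable camera, and the threshold and paths are made-up placeholders, not the actual trap firmware.

```python
# Minimal sketch of a motion-or-interval capture loop (illustrative only, not
# the actual trap firmware). Assumes an OpenCV-readable camera and a local SSD
# mounted at /mnt/ssd; MOTION_THRESHOLD is a made-up tuning value.
import time
from datetime import datetime

import cv2

MOTION_THRESHOLD = 5000   # changed pixels needed to count as "motion" (tune per site)
INTERVAL_SECONDS = 600    # fallback: also save a frame every 10 minutes

def frame_changed(prev_gray, gray):
    """Crude frame differencing: count pixels that changed noticeably."""
    diff = cv2.absdiff(prev_gray, gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    return cv2.countNonZero(mask) > MOTION_THRESHOLD

def run(camera_index=0, out_dir="/mnt/ssd"):
    cap = cv2.VideoCapture(camera_index)
    prev_gray, last_saved = None, 0.0
    while True:
        ok, frame = cap.read()
        if not ok:
            time.sleep(1)
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (21, 21), 0)
        now = time.time()
        triggered = prev_gray is not None and frame_changed(prev_gray, gray)
        if triggered or now - last_saved > INTERVAL_SECONDS:
            stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            cv2.imwrite(f"{out_dir}/{stamp}.jpg", frame)  # timestamped filename
            last_saved = now
        prev_gray = gray

if __name__ == "__main__":
    run()
```

The timestamped filenames matter: the within-night analyses described later rely on every image carrying a capture time.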
They can be deployed for months at a time without people needing to go and intervene, because they run off solar and battery; as long as you've got good sun, these things can run for a long, long period of time. As for the SSD: you can change the sensitivity to motion, or the interval, which obviously changes how much data you collect, but these SSDs can last many months.

Here's a video of it running in action out in Panama. You can see the moths flying around, attracted in by that UV light, which acts kind of like a lighthouse, really bringing things in from a distance; then, when they get close, they see the illuminated white board and land on it. Obviously quite a lot of moths set off the motion detection in front of the camera, so out in Panama, where we had it set to take a picture whenever it detects motion, you end up with a video of the entire night. That's probably not the wisest way to collect images in a very biodiverse region; but if you're somewhere with very little activity, the Arctic tundra say, it might be quite a sensible thing to do. In the tropics it might be more sensible to use the interval approach.

'I dream another figment. An image migrating and multiplying through the internet. It is of a boy releasing a butterfly and asking: is this a pigeon? Narrator: it is not a pigeon. It is a butterfly. The boy is an android lacking a sufficiently structured database.'

I love this last line: 'the boy is an android lacking a sufficiently structured database'. If anyone has developed AI tools for classification, this is often the main problem: getting that sufficiently structured database of labelled images that you can use to train your AI. Here the boy, the android, doesn't have enough data to differentiate a pigeon from a butterfly. (This section of the poem is actually inspired by a meme, which some people might have seen; it comes from a comic book, I think.) And that, obviously, is the next step with our moth system: we've collected our images from the system, and we now need an AI tool to classify them.

So here's the AI workflow for this moth system. Starting top left, we have an image that we collect from the trap, of moths on the board. We actually use three models in series. The first step is to detect, to localise, the moths within the image and cut them out, cookie-cutter style. They then go to a classifier which tries to differentiate moths from non-moths. This first step, localisation, works very well, and this second step, the binary classifier, works very well too. We leave the non-moths, we take the moths, and those moths go into a classifier which tries to assign each moth to a species; that's where we end up at the top right.

The species classifier is built using a different workflow: it's not trained using images from our traps, it's trained using images from GBIF. GBIF is a large global infrastructure that stores biological records data, with a lot of photographs coming in from citizen science apps. They don't really look like our images, which have a white background; they look a little bit different, often close-up, high-quality camera shots with the species on vegetation. And we train our classifier on that. This species-level classifier is performing at around 80% when challenged on held-out GBIF data.
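To show the shape of that three-stage cascade, here's a minimal sketch; the model files and class list are hypothetical placeholders, and the production pipeline built with our partners differs in detail.

```python
# Sketch of the detect -> filter -> classify cascade (the .pt file names and
# species list are hypothetical placeholders; the production pipeline built
# with Mila differs in detail).
import torch
from PIL import Image
from torchvision import transforms

to_tensor = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

detector = torch.jit.load("moth_detector.pt").eval()      # stage 1: localisation
moth_filter = torch.jit.load("moth_vs_not.pt").eval()     # stage 2: moth / non-moth
species_model = torch.jit.load("species.pt").eval()       # stage 3: species ID (GBIF-trained)
SPECIES_NAMES = open("species_list.txt").read().splitlines()

@torch.no_grad()
def classify_trap_image(path, moth_conf=0.5, species_conf=0.6):
    image = Image.open(path).convert("RGB")
    # Stage 1: find candidate insects and "cookie-cut" them out of the frame.
    boxes = detector(to_tensor(image).unsqueeze(0))  # assumed to return [[x1, y1, x2, y2], ...]
    results = []
    for x1, y1, x2, y2 in boxes.tolist():
        crop = to_tensor(image.crop((x1, y1, x2, y2))).unsqueeze(0)
        # Stage 2: discard the beetles, caddisflies and other non-moths.
        if torch.softmax(moth_filter(crop), dim=1)[0, 1] < moth_conf:
            continue
        # Stage 3: species-level scores; low confidence falls back to plain "moth".
        probs = torch.softmax(species_model(crop), dim=1)[0]
        conf, idx = probs.max(dim=0)
        label = SPECIES_NAMES[int(idx)] if conf >= species_conf else "moth (unidentified)"
        results.append({"box": (x1, y1, x2, y2), "label": label, "score": float(conf)})
    return results
```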
We know it's not performing so well on our trap images, as you would expect, because when you transfer a model from one kind of image environment to another, you always see a reduction in performance. But we expect that, as we provide labelled data from the machines, that should step back up and maybe even improve beyond those sorts of figures. It's also worth saying that performance is very varied across species: some species of moth are very distinctive, and so the model can be very confident; and there's quite a long tail of moths where we have few images and the species are hard to tell apart, and there the model performs more poorly. Also worth saying that this work is being done by our partners at Mila, in Montreal, in Canada.

This is what the raw images look like coming off the trap. You can see the localisation here; we also do tracking, to track individuals through the night, which is useful for the classification because you can provide multiple cutouts of the same individual to aid classification. And this is the classification step: we have some non-moths; something that's just classified as 'moth', the one with a green box around it, where the species-level classification is not confident, so it's returned simply as a moth; and these others, in blue, where it's been given a species identity and a probability.

Next steps for this project are to think about how we open up all these tools for anyone to use. That bit of hardware I talked about: we're going to be publishing that in an open journal, the design is open, and UKCEH is currently manufacturing them in small numbers for researchers who want to use them. We're also trying to develop an online platform where anyone can upload the images from their moth traps and have them classified, choosing the model they want to use, meaning that you no longer need that large GPU compute locally. One thing we're very conscious of is that a lot of these developments are happening in Europe and North America, but a lot of the biodiversity monitoring we need to do is in the tropics, where people may not have access to that sort of compute. So we're trying to build a platform which will allow anyone around the world to use this technology and to get insights. This would also allow experts to come in and validate and verify the data being collected, and contribute to an openly available data set of labelled moth images.

Why are we doing all this? When I presented it to start with, I was talking about the interest in putting things out where people couldn't go, the Network Rail context, because of safety. It could also be because it's just impractical, putting these out in very remote locations where it's difficult to get surveyors in and out. But even beyond that, it has benefits even in places which are relatively accessible, because of the frequency at which you're sampling: you can gain new insights. So we have species phenology: phenology is the change in some biological signal over time. We typically look at things like the abundance of moths through a year, so you'd have two peaks here, as there are two generations of this moth through the year, for example; but this could now also be through a single night, because all of our records are time-stamped. So that opens up new avenues of research.
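Because every detection carries a timestamp, within-night activity patterns fall out almost for free. A toy sketch, assuming a CSV of tracked detections with 'timestamp' and 'species' columns:

```python
# Toy sketch: within-night activity from time-stamped detections (illustrative;
# assumes a CSV of tracked individuals with "timestamp" and "species" columns).
import pandas as pd

detections = pd.read_csv("detections.csv", parse_dates=["timestamp"])
detections["hour"] = detections["timestamp"].dt.hour

# Counts of each species per hour of the night: a phenology curve you cannot
# get from a traditional trap that is only emptied in the morning.
activity = (
    detections.groupby(["species", "hour"])
    .size()
    .unstack(fill_value=0)
)
print(activity)
```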
You can also look at species frequency. Typically, when you go out and do a moth trap survey, you get a measure of abundance; with this trap, you really get more of a measure of activity than abundance. But you can look at this relative frequency, this relative activity, and it's going to be quite robust, because you're sampling over a longer period of time, more frequently. Species abundance, or activity, can be quite impacted by weather and things like that, so getting that greater coverage is going to give greater robustness to these kinds of species checklists and relative frequencies. Also, because it's a standardised approach, the same system, and you can use the same AI model for classification, you can be quite sure the data supports robust analysis across time; whereas it wouldn't be uncommon for moth surveys to be conducted by different people in different years, with variation introduced by the individual expert doing the study. So it gives robust data for inter-annual changes, and also within-year, because you can sample up to every night if you want to, so you can look at quite fine temporal variation. That might be in relation to some sort of management taking place, whether that's spraying in an agricultural setting, or logging, or more seasonal patterns.

There are also other things, such as biomass estimation. In these images we can measure the length of the body of a moth, and from experiments weighing these insects and measuring their lengths, we can use formulae to work out the biomass of each individual; and the biomass of insects is the underpinning of the food chain in many systems. So we can get to biomass estimation. We have also seen some species interactions: in Panama we had praying mantises on the board, eating things that were coming in, so there's some species-interaction information in there as well. And undoubtedly there are other things we haven't thought about.

The next case study I want to talk about is improving AI by reflecting on how humans observe and identify species. In this case I'm going to focus on ladybirds. Now, if you show this to an image classifier, the AI is essentially using the values in all these pixels, and how they're arranged, to find patterns which then correlate to species; and in a sense that's how the human brain works as well when we're trying to identify something from an image. But a human naturalist would also ask questions like: what leaf is this insect sitting on? Where has this observation come in from? What time of year is it? And in fact, in our volunteer-led recording schemes, when data comes in, we have automatic flags for these sorts of things. Has it come in from outside the species' known range? Has it come in from a time of year when we wouldn't expect the adult form of the ladybird to be flying? These are important attributes for understanding whether an observation is correct or not.

So when an observation, an image, comes in, we can actually get a lot of this data, because images come in with a date stamp, and when they come into our recording platforms, they come in with a location as well. And from there, we can reach out to other open data sets and get these other data: the known flight period of the species; what the weather was like in the previous week; what the weather was like in the previous few months;
the habitat at that location; et cetera, et cetera. So we can get all the secondary metadata which a human naturalist would use as part of their process of reaching an identification. The question then is: if we build this into an AI classifier, does it perform better than the standard approach, where an image goes into a deep learning algorithm and the predictions come out at the end?

We can use a similar machine learning approach where we combine all these secondary metadata together, put them through a few layers, and get probabilities, or scores, for each of the species. We can then multiply that probability with the probability that came out of the image classifier. In this way, the metadata acts as a prior on the image classifier. The image classifier might think it's species A, but if it's really the wrong time of year, then this probability on the left-hand side will be low, and that's going to down-weight the probability from the image classifier; by multiplying those together, we get our predictions. The final approach is to combine the layers partway through the network: the two streams start off independently, then they get combined, and then we have some more layers, so it's all embedded in one deep learning model. The model is able to learn across these factors; it might be that certain characteristics are only visually present in the image at certain times of year, so it can learn those sorts of things, and that produces predictions as well.

When we look at these side by side, you see that the image-only model has the lowest score; when we combine them by multiplying, in that prior format, it's somewhat intermediate; and when we combine them in one model, that's the most performant. So this is really good. It agrees with our expectations, and it's why we do it as humans: we would expect this additional information, which we know is relevant and assists in classification, to improve performance, and that's indeed what we see.

Since we published this back in 2019, though, my view has changed a little. Whilst the single combined model is the most performant, what's also important, when we think about the uses of AI, is interpretability and trust. When we present the result to the user, I'm of the growing opinion that we actually want to be able to disentangle this: we want to be able to show them that the image looks right, but that it's probably the wrong time of year, or that it seems to be in the wrong habitat. When we present these AI results we want that kind of granularity, so the user can use that information to update their classification, or to assess where the AI is misunderstanding a species. So I think this is clever and performs well, but I now wonder whether it's always the right approach when we think about deploying things for users.
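To make that 'metadata as a prior' step concrete, here's a toy sketch with made-up numbers; the published models are more involved, and the best-performing variant fuses features inside one network rather than multiplying outputs.

```python
# Toy sketch of the "metadata as a prior" combination (made-up numbers).
import torch

species = ["species A", "species B", "species C"]

image_probs = torch.tensor([0.60, 0.30, 0.10])  # image classifier's softmax output
meta_probs = torch.tensor([0.05, 0.55, 0.40])   # scores from the date/location/habitat stream

# The element-wise product acts as a prior: a species that looks right but is
# "wrong time of year" gets down-weighted. Renormalise so the scores sum to 1.
combined = image_probs * meta_probs
combined = combined / combined.sum()

for name, p_img, p_comb in zip(species, image_probs.tolist(), combined.tolist()):
    print(f"{name}: image-only {p_img:.2f} -> combined {p_comb:.2f}")
```

Here the visually favoured species is down-weighted once the metadata says it's implausible, which is exactly the behaviour described above.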
And that brings me on to my final case study, which is around how we make these tools useful to people: how do we get these AI tools into people's hands so they can use them to help in that monitoring?

'I know there are others who are careful and diligent and watchful and kind too, but they do not bring in the data. How do I move towards their absence?'

In this little passage from the poem, I think the data set is sharing a lot of our anxieties: there are lots of people out all around the countryside every day observing wildlife, but they're not all natural historians, and they're not all recording their data and sharing it. Many of them may well be recording it and tweeting about it, but it's not making its way through into our data sets. So here the data set is acknowledging that and asking: how do we move towards their absence, the absence of data? How do we begin to collect the data that these people are observing?

And that's what we're trying to do with E-Surveyor. E-Surveyor is a mobile app that's free to use, available in the app stores, and it incorporates AI to help people monitor the environment; it's particularly targeted at farmers. The motivation behind the app is threefold. First, we want to help to educate farmers. Farmers in the UK are increasingly becoming custodians of the countryside: they're increasingly receiving payment for things beyond growing food, such as planting wildflower strips, or establishing woodland with carbon credits, et cetera. So one of the things we want to do in the app is to help raise awareness of the flora and fauna being supported by these interventions. Second, we want to provide information that's actionable, the 'so what' question: farmers should be able to use the information in there to do something to help their land management. And third, we want it to provide evidence: farmers are increasingly being asked to do these sorts of interventions, and increasingly asked to evidence what they've done, so it's a tool for doing that. I keep putting an emphasis on farmers, but it's worth saying, and hopefully you'll see this as we go through, that it's actually much more widely applicable to any kind of land manager monitoring all sorts of different environments.

You can see the mobile app on the right-hand side. One of its core components is that you can take photographs of flowers, and, using an AI image classifier developed by the Pl@ntNet team in Montpellier in France, it will give some suggestions. And just to flag two things here that I think are really important for image classifiers, when you're creating interfaces for users. Number one, you should always represent the uncertainty of the model: you should never just say 'it's this species' without any statistic; it's really important to cascade the uncertainty, to be honest and transparent about that. And number two, you want to make it easily falsifiable: you want to provide other alternative suggestions, and you want to provide data the user can use to prove or disprove the classification. In this case, that's images of the plant that's been observed; actually, what it does here is find images of that species that look most like the image the user has taken, so if this were an image of a leaf rather than the flower, you'd see images of leaves rather than flowers down here. I think that's really crucial, and you can imagine how this could work with acoustics and other things too. It's really important to think about the user interface and how we can be really transparent, and I think this all builds into those discussions around trustworthy AI as well.
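In sketch form, those two interface principles, cascading uncertainty and making the answer falsifiable, might look like returning ranked alternatives rather than a single verdict. The helper for retrieving look-alike reference photos is hypothetical; this is not the actual E-Surveyor code.

```python
# Sketch: never return a single unqualified answer; give ranked alternatives
# with scores, plus reference photos the user can check against. Hypothetical
# helper and names -- this is not the actual E-Surveyor / Pl@ntNet code.
import torch

def lookup_similar(class_index: int) -> list[str]:
    # Hypothetical stand-in: the real app retrieves photos of the candidate
    # species that most resemble the user's own photo (leaf for leaf, etc.).
    return [f"https://example.org/species/{class_index}/lookalike.jpg"]

def suggest(logits: torch.Tensor, names: list[str], k: int = 3) -> list[dict]:
    probs = torch.softmax(logits, dim=-1)          # scores sum to one across species
    scores, indices = probs.topk(k)                # top-k alternatives, not one verdict
    return [
        {
            "species": names[i],
            "score": float(s),                     # cascade the model's uncertainty to the user
            "reference_images": lookup_similar(i), # evidence the user can falsify against
        }
        for s, i in zip(scores.tolist(), indices.tolist())
    ]
```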
We also show, once you've taken a picture of a single flower or a whole suite of flowers, information on the insects that those flowers support. We have data from past projects we've done, and from the published literature, on which insect species are associated with which flowers, and we can present this to the farmer, showing the benefit these flowers are providing. As I said, you can take a picture of a single flower, or you can take pictures of all the flowers you see in your wildflower margin, for example. So here's a bunch of flowers observed in a wildflower margin; again, the AI has been used to help reach classifications, and we get this report about all the different flowers: the number of insects each supports, the total number of insect species being supported by that habitat, and, here, the number of plants that have grown and been observed from the seed mix that was put in. We have a database of the seed mixes that farmers use, and the farmer can select which one they've used. So this is actionable information: they can look at this and see which of their plants grew and which didn't, and act on that in the future. Just one point on this species number: obviously we're not observing the insects directly, we're inferring the presence of species from the plant communities that are there and the known interactions of insects with those plants, so the true number of insects present will be lower than this. We're also looking, in the future, to add in geographic information: we know the ranges of each of these 268 species, so we could subset this to just the location where the observation was made. That's a change to come in the future.

Another thing farmers are particularly interested in is the beneficial insects. Among the insects supported by the plants they've seen in a wildflower margin, for example, are insects that eat the pests of their crops. So whilst, as an ecologist, I'm principally interested in the wider biodiversity being supported by these interventions, the farmers are particularly interested in these beneficial insects that are going to help their crops. This screen on the right-hand side is basically showing the number of beneficial insects in that habitat, and which crops they're benefiting. UKCEH has done quite a lot of work in the past looking at how these beneficials support crop yields, and we found that if you take the least productive parts of fields out of production and put in these kinds of seed mixes, you actually don't decrease the yield of the field. That's firstly because the bit you took out of production wasn't particularly productive in the first place, but also because the insects you've brought in increase the yield in the rest of the field, either through pollination or through predation of pests.

We also have a third mode currently in the app, which allows you to do a transect. You put down a square on the ground and identify all the plants in it; then you move on a few metres, put it down again, identify all the plants, and repeat some number of times. This is an established protocol for monitoring habitat quality.
Currently, of the farmers who are using the app, not many are using this approach; they're mainly using the approach I showed before, where you just take a bunch of pictures of the flowers in a habitat. But if you do do this, you can then compare across time, compare across fields, compare within farming clusters, and compare to benchmarks as well. So it gives more robust data, with which you can do these kinds of rigorous comparisons. One of the reasons we developed this was to provide that evidence. All the images collected are georeferenced; we get the GPS. So you'd expect the data to look like these four clusters of points going up the side of the field here, collecting images around each quadrat. We can also identify pictures which weren't taken at that location. So if we're worried about people cheating, or gaming the system, particularly where there are financial payments involved, that's going to be quite critical; as is the ability to go and review the images. All the data collected through the app is put on a server, and, if the farmer chooses, it can be shared with others, who can then come in and double-check their images, et cetera.

So that was my final example, and I'll close with some thoughts. The first: this is another Midjourney image here; I asked it for an image of Charles Darwin lecturing to some androids. We should be thinking now about what the role of the human is as we move to this more AI-centred way of monitoring biodiversity. Clearly, we initially have a big role in generating the data required and doing the validation, but we also have a role in using that AI, and I think it's worth thinking about how that transition is made as well. Often, when I give these presentations to the biological recording community, there's a lot of fear about how this might come in and upset the current balance, the current way of recording; people get a lot of joy out of going out and monitoring biodiversity, and we don't want to stop that. So we need to think quite a bit about how this is going to change biodiversity monitoring.

In parallel, we need to think about our responsibility, as the developers of these tools and technologies, for how they impact the planet. Obviously we're talking about biodiversity monitoring and environmental monitoring, so it should all be in service of the planet; but running these models, and training these models, uses a lot of energy, and if we're updating our model every month, that's generating a significant amount of CO2 from the data centres that run these things. We need to keep that in mind. We also need to think about the potential negative impacts that this sort of technology might have on users: volunteers might, for example, feel marginalised as AI increasingly comes in and is used, or taxonomists might feel they're being used simply to train their replacement, the AI. So we need to think carefully about the people impacted by the technology.

And then finally, an emerging pattern is this transition away from model-centric design. When AI was first being developed for computer vision and things like this, back before I was involved, people thought a lot about the design of the model.
That was really important, and we got to convolutional neural nets, and they perform really well; we saw big improvements in accuracy. I think now we're in a data-centric phase, where changes to the model don't actually alter the results too much; what really makes a difference is the training data. If you look at ChatGPT and its various iterations, it's the big increase in the amount of data, and the quality of that data, that has led to its improvements, not so much changes to the model design. And I think in the future we'll move to a kind of interface-centric design: the models are good, the data is good, so the question becomes how people interface with these models, how they're used on mobile apps, how they're deployed autonomously on servers, scraping data off social media or whatever. And that's really interesting; I think it starts bringing us into a societal space, the interface of AI and society, where we probably need to be working more with social scientists and people like that, thinking about how this all plays out. For most applications, I don't think we're quite there yet.

So, just finally, to thank everyone who's involved. There's a really big team of people involved with AMI, the automated moth trap, so thanks to all of them; 'The Dataset's Dream' was the poem, and there's also some visual art that goes along with it, which you can check out at that link, and again we'll be talking more about that at the Digital Gathering. Thanks to them, and to the team behind the 'thinking like a naturalist' paper led by Chris Terry. And I'll finish there. Thank you very much.

Thanks very much, Tom, that's excellent, very exciting stuff. Just to remind people to post questions in the chat; we've got a couple there already, but please do, lots of people will have questions. This one is from Laura: 'Really interesting talk, thank you.' I'll read through it, but you can probably see it yourself, just to check I get the facts right. 'In the case study of using AI to validate the classification of moth photos, you said a probability of it being a specific species is calculated using known ranges and times of species occurrence. I was wondering whether this approach would risk dismissing correctly identified species that are invasive, or present due to shifting ranges under climate change. Does this risk suppressing the picking-up of trends, such as shifting species ranges?'

Yeah, it's a really valid point. In that case it's probabilistic, so the closer you get towards the edge of the range, and maybe beyond the established range, the more the probability comes down. So you're right that if a new species is observed, to take that climate change example of a northward range shift, just beyond its northern range limit, it will be down-weighted. That is something we need to think about. It also impacts how often you should retrain the model: you probably want to retrain it, or at least the half of the model on the metadata side, relatively frequently to keep up with that. And if a new species comes in from America, for example, into the UK, then it could be unknown, or its known range could be completely elsewhere, so it would have effectively zero probability, and that would then matter.
So yeah, that's really interesting to think about. There's the wider problem as well of out-of-sample observations, and it's the same with the images. In Panama, approximately a fifth of the species we were observing were not known to science, let alone had images available. So we need classifiers that are capable of saying when something isn't one of the things they know; perhaps some sort of unsupervised method, establishing the presence of things they don't know and clustering them into morphotypes. So yes, it is a problem, and we need to think about it, especially around invasive non-native species, which is also a really big, really hot topic.

Next, this question from another Tom: 'Can you comment further on quantifying uncertainty from CNN image analysis outputs? What does the percentage mean?' Yes, and hands up, I'm not a deep learning expert. But the values sum to one: every class that the CNN is classifying, in our case some tens of ladybird species, is given a score, and those scores sum to one. You could view that as a probability, but it's not really a probability. There might be two species which look really similar, for example, and if the image looks like those, they'll be given roughly equal weighting. I think a lot of people do interpret it as a probability, and you probably won't go too far wrong if you do, but I know the computer science experts have told me it's not a probability.

A related one, then, from John Cooper: 'In deep learning, when the classification of an image is combined with species-specific attributes, can you easily identify which attributes might have shown the classification is not to be trusted?' Yeah, this comes back to my point about how wrapping it all in one model makes it harder to disentangle; actually, I think it's better to show things in that disentangled form, independently. What we did do was randomise the inputs of each of those environmental variables independently of one another; basically, that was a way of assessing the sensitivity of the predictions to each of the input variables. Through that, we could show that the habitat specialists, ladybirds which only occur in one type of habitat, were particularly sensitive to changes in that habitat measure, and species which only come out late in the year were particularly sensitive to changes in the date-of-year parameter. So we were able to validate the approach through the fact that the model was learning the known habitat and time-of-year characteristics of those species.

'Do you know of any methods on the image side that could help identify which part of the image is giving the information? Could you say whether it's the leaves or the flower or something?' Yeah, you can do that: you can create these kinds of heat maps. My explanation there was about the metadata side, but if you're interested in the image side, you can create heat maps which show you the areas of activation, so which parts of the image were key to reaching the determination, and that's actually quite an important sense check to do in the process.
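One common recipe for those activation heat maps is Grad-CAM; a compact sketch, using a torchvision ResNet as a stand-in for the actual classifier:

```python
# Compact Grad-CAM-style sketch, using a torchvision ResNet as a stand-in for
# the real classifier. It highlights which image regions drove the prediction:
# the sense check that catches "it's recognising the leaf, not the ladybird".
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()  # stand-in; swap in your own classifier
feats, grads = {}, {}

def save_activations(module, inputs, output):
    feats["a"] = output                                # feature maps from the last conv block
    output.register_hook(lambda g: grads.update(a=g))  # and their gradients on backward

model.layer4.register_forward_hook(save_activations)

def grad_cam(image_batch: torch.Tensor) -> torch.Tensor:
    """image_batch: (1, 3, H, W) normalised tensor -> (H, W) heat map in [0, 1]."""
    logits = model(image_batch)
    logits[0, logits.argmax()].backward()                 # gradient of the winning class
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)   # global-average-pool the gradients
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image_batch.shape[2:], mode="bilinear", align_corners=False)
    return (cam / cam.max().clamp(min=1e-8))[0, 0]

# Overlay the returned map on the photo to see whether the activation sits on
# the insect or on the host plant it happens to be resting on.
```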
So if you've got pictures of ladybirds and the activation hotspot is on the leaf it's sitting on, that means it's predicting based on the host plant rather than the species, which may well be useful information for predicting a ladybird, but probably not what you're looking for. So yeah, that's important.

'And could it do that on the fly, from the app, so you could feed it back to users and get them to take a better photo?' That's a good question. I'm not sure how resource-intensive it is to produce those.

Next, a question from Nicola: 'Do you see a way forward for using technology like this in hyper-diverse regions like the tropics, where many species might not be classifiable, or only a few experts can classify them? Could it be combined with, for example, DNA barcoding?' Yeah, great question. We actually have a two-year project just starting on exactly this, where we're going to be putting systems out in various regions of the tropics. We acknowledge that this works great in the UK because we have loads of training data; but in the tropics, as I said, we took one out to Panama, and a significant portion of species are not known, and of those that are known, very few have image libraries. So we need to look at unsupervised methods. I think we also need to appreciate that this can't be solved by AI on its own: we need to fund local, in-country partners, who already have a lot of local knowledge and expertise, to go out and collect specimens, take photographs, and build up those training data sets, and also to inform the modelling, for example on which species can't be told apart visually. We often just bung in species names, because as ecologists we think everything has to be resolved to a species name; but when an AI is looking at things that can only be identified by, say, dissecting their genitalia, which is the case with some of these moths, there's just no point in having them as separate classes. So we'd use that local knowledge to understand which classes need to be aggregated. And eDNA could form a part of that: when people are going out and building these collections to start to build the models, it would be sensible to use DNA barcoding as part of that process, to quality-assure their identifications, but also because some of these things probably won't have been described before.

'Thank you. A slightly related question: how much of a problem is it when you train the model on pictures that aren't taken in the exact same context as the one you want to use them in? Pictures from different backgrounds, or different quality, or different geographic areas, those sorts of things. Does it matter much?' Yeah, so it is a problem; it's actually more of a problem than I intuitively thought it would be. Obviously we're training, or our partners in Canada are training, on these GBIF images, which come from smartphones or digitised collections, and then we're translating that to be used on the trap, and we do see a drop-off in performance. So it is important, as soon as you can, to start adding in data from your own setting to support the training.

And related to that, do you think you could get to the point where you have
kind of sensor-agnostic models? Because often I can see the situation where you've got a different type of camera, a different sort of setup, and then you have to build another data set and label it and everything. Yes, so we're trying to build these to be agnostic: if someone else builds a great camera system, that's fantastic, let's use it. Things like white balance, focus, sharpness, these sorts of things will have an impact; that's why we're increasingly thinking it's important to have a metadata standard, so that anyone collecting these images attaches metadata which we can use, perhaps, to select the right kind of model. I imagine in a decade's time we'll probably have hierarchical models: something that can identify moth or not-moth, with the moths going on to a model that can identify the family, and then on to further models. I think we're going to see these kinds of hierarchical models more often, and maybe also models to detect the white balance and correct for it, that sort of thing. And then, I've just remembered the other thing I wanted to say, about training on images from out of range: if we train on images from one area and then predict in another, so not just different data sources but different geographic ranges, that can also be a problem, because things like moths can have a different visual appearance depending on where they are in their range. So that introduces an additional challenge in translating across space.

Another question: 'Thanks for an interesting talk. Do you ever see a point where the AI classifier has been trained so well that it's no longer validated by classifications determined by humans, but instead switches to validating human-made classifications?' Yeah. For something like a Coke can, you can develop a classifier that is almost perfect, because a Coke can always looks the same, assuming intact Coke cans: the design is specified. And there are some moths and butterflies and other insects that are really visually distinctive, like the peacock butterfly in the UK, for example. I can imagine getting to a point where we think it's probably no longer worthwhile trying to validate those, especially for very common species, and we might just push those records through. The benefit I see there is that it allows us to move expert effort away from the common, easy and straightforward species to the species that are more difficult; it lets us direct that expertise to where it's really needed. So yes, I think that could happen, but it will probably always be limited to a subset of species that are easy to classify.

Tom asks: 'Did you use image augmentation to generate your training data, which might help with white balance and focus?' Yeah, we did all the regular, standard protocols for augmentation. I'm also interested in things like generation as well. We've got colleagues, the same partners over in Denmark, who are interested in taking images of insects on complex backgrounds, so a camera looking at some flowers, for example, and the bees on them; that's much harder than the white background that we have. And they've looked at generating training data by masking out the bees in one image and then pasting them onto other images, other backgrounds which have no bees on them.
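That cut-and-paste idea, in minimal sketch form; the file names are placeholders, and the Danish group's actual method is more sophisticated:

```python
# Minimal sketch of copy-paste augmentation: lift a masked insect out of one
# photo and composite it onto an insect-free background. File names are
# placeholders, and the insect image is assumed smaller than the background.
import random

from PIL import Image

def paste_insect(background_path: str, insect_path: str, mask_path: str) -> Image.Image:
    background = Image.open(background_path).convert("RGB")
    insect = Image.open(insect_path).convert("RGB")
    mask = Image.open(mask_path).convert("L")  # white where the insect is, black elsewhere

    # Random placement stops the network learning a fixed position.
    x = random.randint(0, background.width - insect.width)
    y = random.randint(0, background.height - insect.height)
    background.paste(insect, (x, y), mask)     # the mask confines the paste to the insect outline
    return background

# Each composite inherits its label from the pasted insect, multiplying scarce
# labelled examples across many backgrounds.
```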
I think that's quite an interesting avenue: thinking about how you can simulate training data. Obviously there are also some biases that could be introduced, et cetera, but it's an interesting avenue to pursue.

'So, the farmer tool, E-Surveyor, is that right? Does that also take into account contextual information as well as the flower photos, for example weather, location, et cetera?' At the moment it's just using the image; it's not doing anything fancy, but we're hopefully going to build that in in time, using the kinds of species distribution models we have. The model it's using is actually global, and it does subset to the UK flora. But there's an interesting point here, about that user-centric design: when someone downloads the app, what's the first plant they're going to take a picture of? It's probably going to be something in their garden, or in the office, which is not a UK species. So if we subset to the UK list, the first image they take is going to be identified wrongly, and then they're not going to use the app, because they'll think it's no good. There's an interesting conundrum there. So what we now have is: if the model is very confident in a species that's not in the UK, it does pass that through to the user, but flags it as not being a UK species. These are the kinds of interesting things you have to think about.

'That's interesting. On the moth traps: does just the use of the light affect the moths? If you use it every night in the same location, do you think that would actually affect the moths themselves, and what do you do about that?' Yeah, so one of the key things about this moth trap is that it's non-lethal. One of the problems is that if you run the trap every night, the moths are attracted in to the light; daytime comes, they don't fly; night comes, the light's on again, and they stay there. So it acts as a kind of sink. It's also going to affect your results, because you're recording moths that were there from the previous night, and you'd accumulate species over time, which matters if you're looking at temporal changes. So we now go for one night on, one night off, as a minimum, or more nights off in between. Another thing we do is turn off the light a couple of hours before sunrise, because we found that if you run it all the way to sunrise, the moths stay there, the sun comes up, and the moths don't move; and then birds learn that there's this buffet. We've got trail camera footage of birds just repeatedly visiting in the morning and clearing up what's on the board, and again, that's not really living up to a non-lethal ambition. So yeah, we're learning as we go, but there are ways we can counteract these kinds of negative effects.

'How do you get farmers to know about E-Surveyor, and to remember to use it?' I'm really glad someone brought this up. One thing I've come to appreciate is that when you're budgeting for an app like this, probably half of your budget should go to engagement and outreach and that sort of stuff, because 'build it and they will come' is not a thing for technology. At the moment we didn't have that budget, so we have farmer collaborators and clusters, and some commercial people, who are using it.
And we're slowly building that community, because we're constantly trying to improve and update the app, and we don't want it to be flooded with people while it's not perfect; we want to be able to make improvements as we go. So we're slowly, organically, building that audience: we go to farming events, and we've done press releases and things like that, to try and get farmers on board.

Thanks. I've got one last one, on the poem, which is great: was that AI-generated? No, no, it was written by Thomas Sharp, who is up in York, and there was a link on my final slide. People can go back and pick it up, or if you just Google 'The Dataset's Dream' you'll find it, and you can read the full poem. The audio is actually 18 minutes long, so it's quite a long piece, and there are real gems in there. And just on that: working with artists definitely took me out of my comfort zone, and some of it's pretty wacky, but it does give a completely new view of your science, from completely outside its usual perspective, which is quite refreshing, and I've always found there was something to learn every time I engaged with it.

'Do you think it's actually making people really think about the data, and go and look at it more, or are they just enjoying it for what it is?' I think predominantly the latter; people are probably enjoying it for what it is. But we did do interviews with people afterwards, because it was an installation with some visual art, and people walked around these sculptures while listening to the poem. We interviewed people once they'd done that, and they did say it gave them a newfound understanding of biological recording and its importance, and got them thinking about this data that's collected: how is it used, who should be collecting it, who should be using it, what does it all mean? But the point of this was predominantly art. It wasn't public communication, which I've done a fair amount of before; it was handing over our story to artists, who then generated art, so the principal impact was that engagement and enjoyment aspect.

Thanks very much. I think that's probably all we've got time for. Thanks very much, Tom, for a fantastic talk.