Hi, I'm Richard and I'm a PhD student at Imperial College London working with biodiversity indicators and their data. My work focuses on using these data to understand biodiversity change across space and time, and also on how to improve these datasets using automated and semi-automated approaches. That is the part I'll be talking to you about today. As I'm sure you're all aware, a key challenge associated with evidence synthesis, including the collation of ecological data, is the discovery and identification of information sources. The scientific literature contains a vast amount of data, but finding the relevant articles is currently a largely manual and time-consuming process, which creates a bottleneck in the evidence synthesis workflow. As the literature continues to grow rapidly, manually screening articles will become ever more challenging. I'm therefore going to be talking about how automated text classification based on the titles and abstracts of papers can assist researchers in the initial article screening stage by quickly and accurately prioritizing thousands of articles, which helps to ensure that compiled evidence remains up to date and representative.

As you can see in this diagram, there are a number of stages associated with the development and use of automated classifiers. Training and validation are undoubtedly important, but today I want to focus on real-world application and the potential for iterative improvement. I'll be discussing this in the context of two global biodiversity databases: the Living Planet Database, which collates information on vertebrate population trends, and the PREDICTS database, which monitors biodiversity responses across land-use gradients.

During the training and testing of our classifiers, we find that logistic regression models perform best, and that these display very high accuracy on our labelled datasets. This is great, but search engines are already capable of filtering and ranking articles given a set of keywords. We therefore wanted to assess the real-world performance of our classifiers, comparing them against current standard protocols to get a better idea of their actual benefits. To do this, we conducted literature searches associated with our ecological databases and compared the rate at which relevant articles were found when following either the rankings of our classifiers or those of the search engines themselves. As this figure and our results show, our classifiers outperformed the search engines: most of the points, each relating to a particular search, show that you find more relevant articles per article read when using our classifiers than you do with the search engines.

These results are encouraging, but they're also based on classifiers trained using around 500 or more relevant articles. That isn't a massive dataset by big-data standards, but it is a pretty substantial library of texts, and if you're just starting out with data collation you're unlikely to have that many. You therefore want to be sure that text classifiers are still useful even when fewer training data are available. We consequently evaluated, separately, how the size of the training set and the incorporation of newly classified texts affected model performance. Crucially, whilst larger training sets undoubtedly improve model performance, even classifiers trained on relatively small numbers of texts still display very high accuracy.
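To make the approach more concrete, here is a minimal sketch of the kind of title-and-abstract relevance classifier described above, assuming a scikit-learn pipeline of TF-IDF features and logistic regression; the example texts, labels, and preprocessing choices are illustrative placeholders rather than our actual training data or pipeline.

```python
# Minimal sketch: rank candidate articles by predicted relevance using
# TF-IDF features of title + abstract and a logistic regression classifier.
# All texts and labels here are placeholders, not records from our databases.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each training text is a title concatenated with its abstract;
# labels are 1 for "relevant to the database" and 0 for "irrelevant".
train_texts = [
    "Long-term decline of a wading bird population in coastal wetlands ...",
    "Crystal structure of a bacterial membrane transport protein ...",
]
train_labels = [1, 0]

classifier = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(max_iter=1000),
)
classifier.fit(train_texts, train_labels)

# Score unscreened search results and read the highest-ranked articles first.
candidate_texts = [
    "Trends in amphibian abundance across protected areas ...",
    "A new algorithm for protein structure prediction ...",
]
relevance = classifier.predict_proba(candidate_texts)[:, 1]
ranking = sorted(zip(relevance, candidate_texts), reverse=True)
```

In practice the training set would contain hundreds of labelled titles and abstracts, and the ranked list is what a human reviewer would then screen from the top down.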
Furthermore, as you can see, incorporating newly classified texts does boost performance, and in particular, adding new negative texts turns out to be the optimal setting. This is likely due to our screening protocol, whereby the new negative texts that we manually classified had been predicted to be highly relevant by either the search engine or our own classifiers. As such, these texts are likely to be in some way highly similar to truly relevant texts, and so incorporating them into the model fitting better allows us to determine the boundary between relevant and irrelevant texts. In combination, these results therefore suggest that it should be possible to quickly generate useful classifiers from models trained on only a modest set of initial texts, and then to improve their performance by iteratively updating the training data.

Finally, I just wanted to briefly outline some potential directions for further work in this area. One of these is to facilitate the wider adoption of such approaches: we're in the process of creating a user-friendly Shiny application which wraps around the Python classifier code. Not only could you train, apply, and save custom classifiers here, but you could also add additional functionality to further aid human reviewers. One example of this could be model modification, which might be able to mitigate some biases that may be present in the underlying training data and could subsequently undermine analyses conducted on the collated evidence. For example, in our classifiers, terms are associated with weights, and as you can see in this example, the term "decline" is highly positively weighted, despite this being a classifier trained to find vertebrate population trends, where all trends are of interest. A researcher may therefore want to mask out the "decline" term to mitigate its influence, and so hopefully reduce any biases in the subsequent data collation (there's a short sketch of how this masking could look in code at the end). I think this is a pretty interesting area of research, and clearly further work is needed before we can fully rely on automated workflows for evidence and data collation.

So that's everything from me. I just wanted to thank you for listening to this talk, and to thank my collaborators for the work that we've been able to produce; you can see the paper and our GitHub repository here. And I'd be happy to try and answer any questions you may have.
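As a brief footnote on the term-masking idea, here is a minimal sketch of how a term's influence could be removed from a fitted classifier, assuming the scikit-learn pipeline from the earlier sketch; the `mask_term` helper and the pipeline step names are illustrative, not part of our released code.

```python
# Sketch: zero out the logistic regression weight of a chosen term
# (e.g. "decline") so that it no longer influences the relevance scores.
# Assumes `classifier` is the fitted make_pipeline(TfidfVectorizer(...),
# LogisticRegression(...)) object from the earlier sketch.

def mask_term(pipeline, term):
    """Set the coefficient of `term` to zero, if the term is in the vocabulary."""
    vectorizer = pipeline.named_steps["tfidfvectorizer"]
    model = pipeline.named_steps["logisticregression"]
    index = vectorizer.vocabulary_.get(term)
    if index is not None:
        model.coef_[0, index] = 0.0
    return pipeline

# For example, a reviewer worried about bias towards declining populations
# could call mask_term(classifier, "decline") before re-ranking the articles.
```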