Okay, all right. So, yeah, welcome to ESMARConf 2023 and the panel discussion on considerations around information retrieval, including text analysis. This session is being live streamed to YouTube; automatic subtitles should be available shortly after the event, and we will work hard to get these manually verified as soon as possible. If you have any questions for our presenters, you can ask them via the @ESHackathon Twitter account by commenting on the tweet about this session. If you registered for the conference, you can also comment and chat with other participants on our dedicated Slack channel. We will endeavour to answer all questions as soon as possible. We would also like to draw your attention to our code of conduct, available on the ESMARConf website at esmarconf.org.

And, yeah, so today we want to discuss the present and future of information retrieval and have a look at tools that might be useful for information retrieval in evidence synthesis. Information retrieval in the context of evidence synthesis covers all considerations and tasks after the research question has been defined and, I'd say, before synthesizing the identified evidence. This may include the quick and dirty initial searches for refining a research question and thus, for example, specifying a schema like PICO more precisely. But usually the main focus, I would say, is the systematic and reproducible searches which are the basis for a comprehensive evidence synthesis. Furthermore, there's the choice of sources, approaches and platforms for searching, which are important tasks during the information retrieval phase of evidence synthesis, and this includes, of course, creating a search strategy for each designated source. On all of these topics we want to exchange ideas and knowledge about this essential step in evidence synthesis today, and maybe think about what has changed in the last few years, what tools exist for information retrieval nowadays and what maybe should exist in the future.

And so, with great pleasure, I will now introduce our panelists, who have so kindly followed our invitation to join the discussion today for the next hour. Today I'm joined by Alison Bethel, Michael Gusenbauer, Hannah O'Keefe and Guido Zuccon. Alison Bethel is from the evidence synthesis team at the University of Exeter. Alison, would you like to say some words about your current work?

Hi, thanks, Claudia. Thanks for the invitation to join the panel. Yeah, I've been working for the evidence synthesis team for 12 years now. As part of the evidence synthesis team, we do systematic reviews, evidence and gap maps, rapid reviews, the whole suite, realist reviews. I've probably been an author on 50 plus systematic reviews in my 12 years, and obviously been involved and given advice on quite a few others. So I spend my days doing a lot of searching on databases, and pulling my hair out on a lot of the databases as well. So, hi, thanks.

Great. It's great to have you and all your experience on board for today's discussion. Our next panelist is Michael Gusenbauer. He's from the Johannes Kepler University in Linz. We're very curious to hear a little bit about your work, Michael.

Yeah, thanks for the invitation. I got interested in evidence synthesis through my field, innovation management. We are not only interested in evidence, but also broadly in all types of literature.
I got interested during my PhD: I was doing a review on the tricky topic of offshoring, which has different meanings and so on, and I got interested in where to actually search. I didn't find the answers in my field, and I got more and more curious. That led me to, just last week, launching a new website called searchsmart.org, where I make the databases we search with, or don't search with but should know about, comparable. So maybe we can discuss later that I found a lot of new search innovations are hardly ever used, or only within niches that do not reach a broader audience, and I try to address that with the new website. So, thank you for the invitation.

Thank you for joining us. And this definitely sounds like something to discuss in a few minutes. But before we go there, I'd love to give the word to Hannah O'Keefe. She's from the Innovation Observatory at Newcastle University. Hannah, tell us about your work.

Hi, thank you, Claudia. Thank you very much for the invitation to be here today. So, I'm an information specialist, but I also work within our data science team, looking at the development of new tools and at how we use the existing tools that are already out there. I have a big interest in how we go from the very technical side of development, through all those stages, right through to the user experience, and how we make these tools accessible for a very wide audience, some of whom have good experience with technologies and some of whom have never used any of these sorts of tools before. So that's where my interests lie. Thank you.

Thank you, Hannah. So, our last but not least panelist is Guido Zuccon. He is from the School of Electrical Engineering and Computer Science at the University of Queensland. Thank you so much for joining us; I know it's already late for you. Guido, tell us what you do in Queensland.

Hi, everyone. Thank you for the invitation. So, I'm an AI, artificial intelligence, researcher and my main topic of investigation is search engines: creating methods to improve how search engines work and also how we as searchers can formulate very effective queries. A lot of the work we do is applied in the evidence synthesis area. We have been working on how we can improve the creation of Boolean queries and make these Boolean queries more effective, we have been working on how we can achieve good screening prioritization and how we can automate the screening process, and even on how we can systematically evaluate the impact these different AI tools have on the compilation of the systematic review, the synthesis of the evidence and the final outcome of the review. Thank you again for having me here. You are muted, Claudia.

Thank you. Yeah. So, I am the moderator today. My name is Claudia Kapp. I work at IQWiG, the German health technology assessment agency, as an information specialist. So, on the one hand I'm also responsible for conducting searches, but on the other hand I also lead our internal project to automate our information retrieval workflow, for example currently developing our own Shiny app. So I'm also more and more getting insights into the development side of things and not just the user part, which is very interesting indeed. But the first question I would like to pose to you is actually more from the user perspective, before we maybe go into what tools we need to develop. And I'm quite new to this field of information retrieval.
I used to work in psychological research before. So, Alison, I would be really curious to hear what has changed, what practices have evolved in information retrieval since you first started.

Thanks. How far back do we go here? My first job 30 years ago was actually in information retrieval, but I don't think you want to go back that far. In terms of when I started at the Exeter Medical School, so that's 12 years ago, I actually don't think practices have changed an awful lot. The databases we use haven't changed that much, unfortunately. There are some tools available online that help us, but I think fundamentally what we do hasn't really changed. The information we have to wade through to get the relevant stuff: there's even more of it. But in the actual practice of searching, I don't think a lot's changed. I think some things have changed in the acceptance of it, perhaps. You have to report a lot more on the search now, and that's more accepted. When we started, we still collected all of that, it just wasn't published. The Cochrane reviews have always done that, so all the searches are on there, but that's taken a while to filter through to the rest of the systematic reviews. Other things that have changed are the types of reviews, the different types of evidence synthesis, from realist reviews to rapid reviews. That does involve different ways of searching, but you're still essentially using the same tools; we're still using the same platforms to access databases. I'd like to say it has changed more than it has, because 10 years ago we published a paper on how difficult we thought the interfaces were to use, and, just searching this morning, they still haven't got much better. Unfortunately. I'm not going to name names.

All right. Well, so what you're saying is that maybe the approach hasn't changed that much, actually. But on the other hand, maybe the needs have changed.

Yeah, definitely, that's a good point. It would be good if things would change, but the pace of change is so slow. And the question was about practice. We still develop a search, run the search and then screen the results. You might have a screening tool that helps you, so the ones that are more likely to be included come to the top. But as a searcher, you're still having to do that initial groundwork. There might be tools that say, okay, these search terms or these MeSH terms are going to be helpful, but you still have to put the search together. It's not just A plus B plus C, there you go. There's quite a lot of development and art that goes into creating the search strategy, and I don't think that has changed that much.

Hannah, you just recently published a paper where you had a look at those tools that try to help with analysing which terms to conduct a search with. Maybe you want to share some thoughts on what Alison just said.
Yeah, so I actually agree with Alison on this one, that we can use these tools and they do help to a certain extent. But there is still a lot of manual input that's needed. We still need to compile those searches, even if we look through and find the most frequently used terms, or terms that are very important but not used very often within some sample texts. We still need to try and identify which of those are going to give us the best results when we're searching, because we've got to get this balance of precision, of sensitivity versus specificity. And the only way to do that at the moment is this manual input. So I think there is a need for change in the way that we do things, but I do agree it's slow. And to be able to replace a lot of the human input is going to take a long time, to get the tools to the right level so that we can use them accurately.

So, Guido, I think that your team has done some work on trying to actually automate exactly this process. What are your insights on this endeavour of trying to reduce the manual input?

Yes, so the work we have done has been to start from your research question, so the topic of your review, and attempt to automatically formulate a Boolean query; or start from an initial Boolean query and improve it to be more comprehensive; or even start from a seed of studies that you might have for the development of the search strategy and automatically build the search strategy from those. What we found is that there is quite a lot of promise in these automatic methods, and as the technology becomes more powerful, with the latest large generative language models for example, we see that the effectiveness of these methods is going up. But we are still in need of information specialists to carefully edit these queries and really carefully review them, especially in terms of possible biases the queries can have or possible concept drift these queries might create. We certainly have noticed that AI is quite effective in reducing the noise that you get from your queries, not as much though in terms of improving the recall; or better, there is always that tradeoff between precision and recall. One of the biggest challenges we have had, and I would like to hear from Hannah because she kind of touched upon this in her answer, is that we are able to do evaluation from a retrospective perspective: we know a review has been done, we attempt to build the query automatically, we know that there will be some unjudged documents that we retrieve that were not retrieved originally, and we try to deal with that. But what we find difficult is, when we present a query that we automatically built to an information specialist, getting them to gauge whether that is a good query or a bad query. You say, and I think Alison also mentioned, that it's a bit of an art to formulate these queries from a human perspective, and I wonder how you go about judging whether the query you have formed is a good query. That's one line of research we are interested in pursuing: we are looking into whether we can come up with metrics that can help the information specialist judge the quality of their query, beyond looking at the convenient sample of seed studies and saying, do I retrieve my seed studies, for example. So I don't know, Alison and Hannah, back to you I guess.
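To make the seed-study check Guido describes concrete, here is a minimal sketch in Python (the identifiers and counts are hypothetical) of comparing a draft search's results against a set of known relevant studies, reporting seed recall together with the overall screening burden:

```python
# Minimal sketch of the "do I retrieve my seed studies?" check discussed above.
# The identifiers and numbers below are made up for illustration only.

def evaluate_candidate_search(retrieved_ids: set[str], seed_ids: set[str]) -> dict:
    """Compare the records retrieved by a draft search against known seed studies."""
    found = retrieved_ids & seed_ids
    missed = seed_ids - retrieved_ids
    return {
        "hits": len(retrieved_ids),                  # screening burden
        "seed_recall": len(found) / len(seed_ids),   # share of seed studies retrieved
        "missed_seeds": sorted(missed),              # candidates for new search terms
    }

# Hypothetical example: 3 of 4 seed records are retrieved by a ~10,500-record search.
retrieved = {f"pmid:{i}" for i in range(10_500)} | {"pmid:A1", "pmid:A2", "pmid:A3"}
seeds = {"pmid:A1", "pmid:A2", "pmid:A3", "pmid:A4"}
print(evaluate_candidate_search(retrieved, seeds))
```

As Alison cautions just below, a search tuned only to re-find its seed studies can be biased toward what you already have, so seed recall is best read alongside the total number of hits and other checks.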
So I think this is really difficult. A lot of us will use our sample set of studies and make sure they have been retrieved. We may have a look at a little bit of backwards and forwards citation chaining to see if those sorts of papers are also being retrieved. In terms of metrics, I'm not really sure what would be most beneficial; I don't know if Alison's got any input on that. I think a lot of it comes from general experience.

Yeah, unfortunately Hannah, I think you're right, it is experience, and the further we can get that experience in at the beginning, when you start doing this sort of stuff, the better. But yeah, I agree with Hannah. Some of the things I do now: I might run the search that I've developed in Medline and then give it to the reviewers to screen, and once they've screened Medline, then I might develop the searches in the other databases, because I know that most of my results are going to come out of Medline. So there are things like that, sending samples of search results like you've already said, and just looking through them. But often, and this is going to sound really bad, often it will come down to the number of hits. So if you're getting ten and a half thousand from one database, and you know the systematic review has nine months and there are two or three people working part-time, you know they can't get through that sort of screening. So you have to be pragmatic and say, okay, I need to get this smaller. And I know, like Hannah and Guido said, you use a list of key papers, sentinel papers, whatever you want to call them. But I get a bit worried about that, because I try not to develop my search around them, and I don't mind if I miss a few of them in one database, because I think, well, if I develop the search just to find those papers, I've already got them, and bias might be introduced there. And if we can make that easier, I think it would help information specialists, because otherwise it is just up to the individual, or to peer review if there is any, to say okay, that's what I'm doing. But you always have that question: is it going to be okay?

I think one thing on that is that you could develop a search strategy and think it might be all right, but you could pass it to another 20 information specialists and they would always have something to add or something to change. And I don't think even between us we could all come up with the perfect search strategy. So it's finding a balance, definitely.

Yeah, we used to run a course for information specialists and we'd get them to do a task where we gave them a research question. Before they came to the course, they had to develop a search in PubMed, and then we would collate it all together and see the number of hits and whether they got the included articles. And yeah, a huge range. We said don't spend more than 30 minutes, so it was quite time restricted, but like you said, Hannah, all different.

Maybe I can add something to that. We're talking about information specialists, and you are the experts there. But maybe the goal of AI and those new tools is not to improve the information specialists' quality, but rather to help fields like management, where the standards are different. I would say, we just worked on a review of literature review practices in management, and we found that the practices are not really up to the standards of medicine.
And there are a lot of people now publishing reviews, meta-analyses and those sorts of things, and the quality is not always, yeah, what you would like as an information specialist. And I think to get those studies up to speed and give them a good start, maybe at the low end, to improve the lower quality studies to an acceptable level where a lot of weaknesses can be addressed: that might already be something really valuable about the AI systems.

So basically you're suggesting that AI could serve almost as a standardization and level the field a bit in terms of raising the quality of most of the reviews. And then we maybe have very specialized, high quality reviews that go beyond what AI can do as a baseline, let's say.

That's a very good point. And related to what you say: I noticed, for the work we have done, we needed access to data, systematic review data; we had to mine queries. And we found it to be very difficult to extract search queries from published systematic reviews. We had to come up with a very laborious process of data cleaning to extract these. We automated as much as possible, but still there were errors there. And what was interesting to us was that even queries reported in high quality publications, like within the Cochrane reviews, even those queries contained obvious errors, whether the query contained typos or the query would not even compile when you ran it in PubMed. So I think an important step in the field would be to improve and standardize how all the metadata related to the search is reported and shared with the community. And as we move more and more towards attempting to use AI, having clean, high quality metadata is very important for powering these AI algorithms.

Yeah, that's a great point, especially with your experience trying to extract this information. I think there's an effort at the moment to actually create some kind of search archive for search strategies and to try and find a standardized template for reporting search strategies, which I think is a very interesting approach. And then again, I think information retrieval is not only the search strategy; there's also more expertise around it. One field is also the question of where to search, actually: what sources should we use and apply? And I think, Michael, you mentioned that you've done a lot there. And also, on trying to standardize search strategies, one of the biggest challenges is that each database has a different approach to how you find its data, right?

Yes. So I tried to, and I published a few papers on that, on methods of identifying or defining common denominators across the various databases, search engines and so on. And yeah, it was quite difficult to identify the common denominator, but I don't want to go into too much detail about how that worked. What I found out, and that's what I mentioned at the beginning, is that we have a lot of innovation actually happening within the last years in new databases like lens.org, dimensions.ai and others. But the problem, as I see it: I have a working paper out there where I compared the usage of those systems, and it's still the same systems being used that were used 10, 15 years ago. It's Google Scholar, PubMed and ScienceDirect, which are the top three. I couldn't collect data on ProQuest and the other paywalled systems.
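Picking up Guido's earlier point about published queries that contain typos or will not even run: some of those problems can be caught mechanically before a strategy is reported. Below is a minimal, hypothetical sketch of such a sanity check for a PubMed-style query string; it only looks at balanced parentheses and quotes, a small (non-exhaustive) list of field tags, and dangling operators, not full PubMed syntax:

```python
import re

# Very small subset of PubMed field tags; an assumption for illustration, not exhaustive.
KNOWN_TAGS = {"tiab", "ti", "ab", "mh", "majr", "pt", "la", "dp", "sb"}

def basic_query_checks(query: str) -> list[str]:
    """Return a list of obvious problems in a PubMed-style Boolean query."""
    problems = []
    if query.count("(") != query.count(")"):
        problems.append("unbalanced parentheses")
    if query.count('"') % 2 != 0:
        problems.append("unbalanced quotation marks")
    for tag in re.findall(r"\[([^\]]+)\]", query):
        if tag.strip().lower() not in KNOWN_TAGS:
            problems.append(f"unknown field tag: [{tag}]")
    if re.search(r"\b(AND|OR|NOT)\s*$", query.strip()):
        problems.append("query ends with a dangling Boolean operator")
    return problems

print(basic_query_checks('("heart failure"[tiab] OR cardiomyopath*[tiab]) AND (exercise[mh] OR'))
# -> ['unbalanced parentheses', 'query ends with a dangling Boolean operator']
```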
But I think it's interesting, because there are so many systems that say, well, I'm the Google Scholar killer and we can do it so much better. But still those systems cannot get out of their niche, and I don't entirely have the answer to what the problem is there: whether Google Scholar is just so good, or whether people just use the same ways of searching over time. And I think maybe AI might be a game changer now, because what people like to search with privately, they mostly also use in their research work. And maybe evidence synthesis is a very elaborate use of searching: we have much higher standards of comprehensiveness, reproducibility, transparency. That is not the case when you just search or look up an article. But most researchers stay with the same systems, and they like the systems that all users like. And that's why we still discuss, now in 2023, why Google Scholar is not the perfect system and should never be used as the only system in a systematic review. And still people do it, because they like the tools they already know. And I think now, with the new approaches, we might change our general searching. So privately, I was on vacation (yes, as a researcher you are sometimes on vacation) and I tried out Sydney, the Microsoft AI system, and I was actually very positively surprised. People will use that kind of tool, also in research. And I think what we have to do now is to define use cases or tasks where this is okay and tasks where this is a problem. And as we are talking now about evidence synthesis, we need to match those requirements to the capabilities of the new systems that will be used. Now we are still ahead: the usage has not changed too much yet, even though ChatGPT has increased tremendously in usage; it has not really affected Google Scholar or Google usage yet. But that will change, I would say, over the next months and years. And I think we need to stay ahead of the curve and ask, okay, what are our requirements, and develop guidance based on the new capabilities that are out there. And testing the systems, like the website I launched last week, searchsmart.org, can be one approach: how can we test the new types of systems that are out there, to make them fit for purpose, the purposes we have? I think that's very critical.

Yes, definitely. And before we go on to discussing how the search might change, I would come back to one point you mentioned, which is that researchers and people in science are mainly using Google Scholar, PubMed and ScienceDirect, and all these other new platforms like lens.org and Dimensions are only used by a niche. So I would like to hear from the other speakers what you think about that. Do you agree, or is that your experience as well?

So for me, when I'm searching, I tend to stick to the more traditional databases. I know people who are using these sorts of platforms and Google Scholar and things. And I think one point that you raised was the issue of paywalls. You tend to find that that means the sources being searched vary tremendously across different institutions. And it also poses a problem when it comes to us actually automating things and using tools: there's the equity of access to the different tools, especially those that charge to use them. And a lot of this really boils down to the funding that's available to maintain the tools over time and update them.
And especially when we're coming back to using these new sources and trying to incorporate things like Google Scholar into how the tools work. So it really is a circle that feeds into itself. I don't know how anybody else feels about that.

Yeah, I have used Dimensions and Google Scholar. I often use Google Scholar in my searching for systematic reviews. But at the end of my systematic review I do a search summary table and I can see where the evidence has come from across all the different databases, you know, that it was in five out of the seven I searched. And Google Scholar has never, in maybe 15 of these tables, given me a unique reference; it's always been picked up elsewhere. So as a searcher I start thinking, okay, it's not giving me any more than I get from the paid databases that I've got access to. But you're right, Hannah, it's all about equity. It's fantastic that things like those are out there, but we're in quite a privileged position in that we can search other databases, and search them perhaps in a more comprehensive way as well.

My experience has been observing people in the health and medical area doing these reviews. And there, I would say all of those that I've seen would use Ovid, Medline and so on, and they would author Boolean queries. So you have an exact match that you can explain and justify. Technologies such as Google Scholar instead do not allow you to enter Boolean queries, or if you do, through the advanced search functionality, it's quite a loose Boolean language that they use. And also ranking is a very strong feature in Google Scholar that is not really there in things like PubMed. So I think there is a difference between when information specialists use the more formal, let's say, tooling around formulating Boolean queries, and when instead they use technology that they understand maybe a bit less, or feel less in control of, like Google Scholar. And I think there are benefits on both sides, right? We have been trying to argue with our friends the information specialists about the importance of going beyond Boolean queries and looking at ranking functionalities. But at the end of the day, it boils down to what users are familiar with and confident with. And I think a big limitation we have, as AI scientists, at the moment is making sure that the tools based on AI can have some form of explainability. So the problem for information specialists, in my opinion, around using Google Scholar as a replacement for PubMed, for example, is the ability to explain that the query did retrieve everything they wanted and they didn't miss anything, right? Which they can achieve with a Boolean query, but they aren't sure about with a semantic search engine like Scholar.

One other thing I'd like to say about that is, a thing like Lens or Google Scholar is so massive, whereas in PubMed you're just looking at medical research, so you think, okay, at least I've put some boundaries up there for what I'm looking for. But then when you go into something that's so massive, it's hard to know, like you said, am I actually retrieving everything I need? Because it's so massive.

The reality on the other side is also what you said before, right? That sometimes you cannot just go over everything you have retrieved: you know you have a budget and you somewhat need to stick to that budget, right?
So you are controlling that through your Boolean query. There are other ways of controlling that, for example through ranking; for example, not having one gigantic Boolean query, but having a set of query variations, issuing these independently and then somehow merging and prioritizing the results. So there are alternative ways which might actually lead to some benefits, right? The issue, in my opinion, the big showstopper, is having a methodology that is intuitive, that can be explained, and that you can benchmark and put boundaries of confidence around its effectiveness. You can do that with your Boolean query; it becomes harder with these more best-match methods or with ranking.

Yeah, so this is actually an interesting thought, I think, about these new technologies. You mentioned ranking, and the ranking in PubMed is also really interesting: you're still able to extract all your search results, but you can also have a look at the best match ranking. And what I was wondering is whether the information retrieval part and actually selecting the relevant evidence will become more and more unified, or merge more and more into one step. Like, Alison, you mentioned your first search in Medline and then doing another search for the other databases, which is also sort of embracing the idea of joining the information retrieval part, where you just get everything that might be relevant, with screening and ranking what actually is relevant, more into one step. What are your thoughts on this?

You'd have to have the full text to be able to do that; that's quite a big leap, I think, from screening through at a very bibliographic level to, yes, that's exactly what I want at full text. But I'm sure others will have a much better opinion of that than me.

Yeah, in my opinion, that's a promising area, at least from an AI perspective, because when you search, we have data about the collection, we might have data about how important the terms you have put in are, or how important terms we suggest might be, how discriminative they are. But what most techniques and tools don't do at the moment is exploit the online signal that you are providing every time you do the screening, right? So you have techniques at the moment for screening where you are providing feedback, you say, this document is included, this one is excluded, and that attempts to automate the inclusion and exclusion assessment of the remaining documents. But that feedback doesn't feed back into your search, right? It doesn't say, oh, this reference, you marked it as relevant; now that I know that, does it imply something for the query, would the query change somewhat, because now I have learned more. So we don't do that at the moment. You, as an information specialist, somewhat do it across databases, right? You are one of the few I'm aware of that does that, which is great by the way. But yeah, I think there is much more we can do with that signal.

And I'd just say, I'd love to see that, because that loopback is something I've mentioned a few times: you can use machine learning, but unless it comes back into the search, we're actually not learning anything about how we're searching the database.
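The loopback Guido and Alison are describing can be approximated quite simply once some screening decisions exist: score candidate terms by how much more often they appear in included than in excluded records and hand the top ones back to the searcher. A minimal sketch, assuming title or abstract text is available and using a naive word tokeniser and an arbitrary smoothing constant:

```python
import math
import re
from collections import Counter

def suggest_terms(included, excluded, top_n=10):
    """Rank words by smoothed log-odds of appearing in included vs excluded records."""
    def doc_freq(texts):
        df = Counter()
        for text in texts:
            df.update(set(re.findall(r"[a-z]{3,}", text.lower())))
        return df

    inc_df, exc_df = doc_freq(included), doc_freq(excluded)
    scores = {}
    for term in inc_df:
        p_inc = (inc_df[term] + 0.5) / (len(included) + 1)   # smoothed document proportions
        p_exc = (exc_df.get(term, 0) + 0.5) / (len(excluded) + 1)
        scores[term] = math.log(p_inc / p_exc)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical screening decisions: record titles marked as include vs exclude.
included = [
    "randomised trial of exercise training in adults with heart failure",
    "exercise based cardiac rehabilitation improves quality of life in heart failure",
]
excluded = [
    "drug therapy for hypertension in older adults",
    "surgical outcomes after valve replacement",
]
for term, score in suggest_terms(included, excluded, top_n=5):
    print(f"{term:20s} {score:+.2f}")
```

The highest-scoring terms are candidates for the searcher to consider adding to the query; as the panel notes, that decision still sits with the information specialist.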
I would love to see something like that. It's on my wish list.

But shouldn't scoping fulfil that task? You should first scope your query anyway, so it feeds back into the loop: you screen the first search results, look into the language that is used within the studies, and that language, in turn, is included in the keyword search. And I think in that step, the scoping step, lies the great advantage or the great opportunity of the new AI-based search tools, because they can sit alongside the traditional approaches like expert inquiry and screening through existing reviews and other things. We could have AI as an assistant for getting us up to speed. But then again, as you say, Alison, we have that step of the human information specialist looking at the query that comes out, feeding it into the actual search and making sure that the query and the information retrieval are actually up to the standard of systematic reviews. And maybe we could even have the AI tools as an addition: we have keyword matching, we have citation searching, so snowballing methods; maybe we can even have AI search on top, like, I don't know, I'm not the expert here, but having a corpus of already positively identified studies and then doing an additional round of AI retrieval based on vectors and not on keyword matching. And as far as I understood, the Cochrane Handbook, for example, even mentions Google being adequate as a supplementary system. So I think we need to separate two things. First, we have the principal databases we search with and the methodology that is rigorous, transparent, reproducible and so on. But then we could even do more, even though it's not fully transparent and reproducible, as you say, Guido, and we could still benefit from that. Like Alison says, you don't find any unique hits from Google Scholar anymore, but maybe we can still get some more hits if we search Google Scholar in addition, even though it's not perfectly suitable, or use Google or those new tools in addition, like Elicit and the other AI tools that now pop up.

Yeah. Perhaps we need to search in a different way: you've got your database searches with the terms, keywords, the controlled vocabulary, whatever you're using in them, but you could maybe use that in Google Scholar and just search in a completely different way. I think that would be really good. And one thing I'd love to be able to do is to evaluate the search at the end, once everything's completed. We already do a kind of evaluation of the search methods, so database searching, supplementary search methods: we see where it all comes from. But we're now starting to evaluate the search at the very end as well, to see, okay, this search in this database was this number of lines long, but actually this one term in the title picked up all the relevant stuff. If we could do that at the end, that would help make our searching evidence-based rather than experience-based, like we were talking about earlier, that a lot of it comes from your experience of doing this stuff. If we could complete the loop by evaluating at the end what we've done, hopefully tools would be able to do that for us, rather than me taking a day and a half to do it.
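Alison's end-of-search evaluation, seeing which search line actually accounted for the included studies, can also be scripted once the included records and the per-line result sets have been exported. A rough sketch with made-up line labels and record identifiers:

```python
# Sketch of a retrospective "which search line earned its keep?" analysis.
# Line labels, identifiers and counts are hypothetical.

def line_yield(per_line_results: dict[str, set[str]], included_ids: set[str]) -> None:
    """For each search line, report hits, included studies found, and unique contributions."""
    for label, ids in per_line_results.items():
        found = ids & included_ids
        others = set().union(*(v for k, v in per_line_results.items() if k != label))
        unique = found - others
        print(f"{label}: {len(ids)} hits, {len(found)} included, {len(unique)} uniquely included")

per_line_results = {
    "#1 heart failure[tiab]": {"a", "b", "c", "d"},
    "#2 cardiomyopath*[tiab]": {"c", "d", "e"},
    "#3 exercise[mh]": {"b", "f"},
}
included_ids = {"b", "c", "f"}
line_yield(per_line_results, included_ids)
```

Lines that retrieve many records but no unique included studies are candidates for trimming next time, which is exactly the kind of evidence-based feedback on searching discussed above.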
Alison, for one item on your wish list then: we have a tool called QueryVis that allows you to load your Boolean query, and to load your assessment of the retrieved documents, your labels, included or excluded. It builds a tree of the Boolean query with all its logic, and it maps how many relevant documents, the ratio of included to excluded, each of your terms has retrieved. Then you can start playing with the nodes in the tree, which are the terms and the Boolean logic operators, to see how modifying your query would have changed those results. So you can use this visualization to play around a bit with your strategy and think, was there a better strategy, or could I have cut the noise out?

So does that have all of the results, or does it just have the included results?

It can do both. You load a file with your labels and it will map out the ratio between included and excluded.

That sounds very cool. I'm not sure I'll be able to use it, but please send me the link.

I wonder then, with that tool, whether there is a place for it after the title and abstract screening stage, to be able to assess how the searches worked and make adaptations at that point and go back. So it creates a bit of a loop around there; do you get to the point where you're happy with it? I think that could be interesting to explore.

Definitely. So I was wondering, that tool that you mentioned, where and how is it available? As we've been talking about accessibility and time, I guess this might be something coded in Python, or can you just use it without that?

Yeah, so we have the tool, it's online, if not fantastically hosted and managed. It's on a university site at the moment where anyone can use it and play with it. We have also integrated it with our partners at Bond University within their Systematic Review Accelerator, so it's one of the tools that their system allows you to use. And our collaborators at Bond, like Justin Clark, have gone around and shown how you can use it to better understand your search, and even use it for training systematic reviewers. So yeah, I will make sure I post a link to the tool for you to play around with, and possibly put it in the YouTube comments of the live stream.

Great, thanks for that. So I think we're already almost out of time, but I would like to make one last loop back to what Michael mentioned previously, how these AI tools might actually change the whole way we do searches. And I was wondering what you had in mind, or what all of you have in mind. I mean, have you tried to do a quick systematic review with ChatGPT or any of the new tools? And what were your findings? There are so many at the moment, like Elicit and such. So yeah, what are your thoughts? Yeah, go ahead.

I guess I can chip in a bit on this. So we have used ChatGPT in a few ways. One way is to attempt to create a Boolean query for a systematic review. There we looked at engineering different prompts, different inputs to ChatGPT, and we even attempted to use ChatGPT in its conversational setting, where we don't just give one input and expect an output, but build up the input through several turns of conversation. And what we have found with that work is that the technology is quite promising, but it's not yet there.
We can build queries that are surprisingly effective, but still below what an information specialist can build. However, the promise of current use is that you can get away from the blank page syndrome: you can start immediately from a Boolean query that is quite decent, and then an information specialist can take that and evolve it into something good. So that's one promise. It seems quite good in terms of precision as opposed to recall, so if you are attempting to do rapid reviews, it seems to be one tool that might be quite helpful in that respect. So this is one of the pieces of work we have done. Another piece of work attempted to extract from ChatGPT answers to specific medical questions and get evidence for them: so, ChatGPT, can you give me the actual studies, the references to the studies? And that has actually not been very good. There are plenty of hallucinations in these language models. I think what is dangerous is that what it produces looks decent, and so you might believe in it. Some of the references or links it produces actually do exist, but then the claims made about those references are not in those sources. So we found ChatGPT to be quite good for some specific types of task, but a bit sloppy for more complex tasks, especially around the synthesis of knowledge. However, we are in the early days of these new generative models, and there are improvements coming out every day, really. I think we are understanding these models more and more, so I wouldn't be surprised if soon we manage to do some of these tasks much better than we are doing at the moment.

In innovation management we have the concept of disruption, which means that when a new technology comes in, it typically creeps in from the low end of the market, and in this case that is AI coming into evidence synthesis in the form of lower quality evidence synthesis. And I think there is currently value there, in that it might introduce a new type of evidence synthesis which is low cost and very fast, and will replace a lot of reviews that are currently higher cost but low quality. So I think it will replace those and put even more pressure on being of higher quality. I think that's a good effect of AI: at the moment we have about 60,000 systematic reviews and meta-analyses, by my count from a couple of years ago. So it's a lot of reviews, a lot of meta-analyses, and there are some comments circulating that the quality of most of those reviews is subpar. And I think AI can get the low end, at the moment, to a better level, while needing to consider all those problems with hallucination and everything. But it's difficult to foresee the future; you know it better than me, Guido. I think there will be a lot of new developments now, triggered after the launch of ChatGPT. And I think we should, in evidence synthesis, embrace it as something that can facilitate our tasks. But when we want to have the highest quality of evidence synthesis, and that should be our goal, then we need to define the tasks that are critical or that can be supported in our daily work. And I think there's a lot of promise there. And maybe one addition, just a side note: I found out that what those AI tools can also do is that you can prime them. I tried it out once: I primed them with, okay, look up what the Cochrane Handbook is and give me some advice on how to kick off my evidence synthesis.
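As an illustration of the two ideas just mentioned, asking a generative model to draft a Boolean query and "priming" it with guidance first, here is a minimal sketch using the OpenAI Python client. The model name, the guidance text and the research question are assumptions for illustration only, and as the panel stresses, whatever comes back is a starting point that an information specialist still has to edit, test against seed studies and document:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical "priming" with guidance, in the spirit of Michael's Cochrane Handbook example.
guidance = (
    "You assist with systematic review searches. Aim for sensitivity, combine synonyms "
    "with OR, combine concepts with AND, and use PubMed field tags such as [tiab] and [mh]."
)
question = (
    "Draft a PubMed Boolean query for: exercise-based cardiac rehabilitation "
    "in adults with chronic heart failure."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical choice of model
    messages=[
        {"role": "system", "content": guidance},  # the priming step
        {"role": "user", "content": question},
    ],
)
draft_query = response.choices[0].message.content
print(draft_query)  # a draft only: review for bias, concept drift and syntax before use
```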
And I think that even increases the value of good guidance, because that can be fed into how the AI will respond to the inputs we give it. So if we create good guidance for those systems, I think the quality of the output we receive becomes even better.

Yes, definitely. A reminder to everyone that I love this discussion, but we have to find an end. So maybe some last remarks from all of you would be lovely. Guido, you already started.

I wanted to say that it's a very exciting time at the moment in AI, also looking at it through the evidence synthesis lens. But I believe we still don't have a very profound understanding of how these methods work, and of making sure that people are using them in the right way. Michael mentioned prompting, these inputs that you provide to the models, and we have started to show that the quality of the prompting clearly affects these models and their effectiveness. We have started to show how providing evidence in a specific way in these prompts completely changes what the model suggests in terms of the actionable decisions you could then take from these models. So I think there is plenty more that we need to do before we actively, maybe blindly, rely on AI technology. But certainly there's a lot of promise in there, and it's really exciting. Michael said he went on holiday recently; I cannot take leave, because every day there is something new and exciting that comes out.

Okay, thank you. So, Hannah, Alison, do you have any last thoughts on the future of information retrieval?

So I think I agree with everybody here today. I think there is a lot of promise for automation, and the tools we have available already are helping us to go some way towards this. For anybody who is interested in tools, the Systematic Review Toolbox has a huge list of the tools that are available so far, and I think it's definitely something that is worth exploring and expanding upon now and in the future.

Thank you, Hannah. Alison?

Yeah, I don't have any more to add. I'm quite excited to see what's going to happen. But I won't be testing these things out at the moment, I don't think. I think I might leave it to the likes of Michael and Guido and maybe Hannah to do the first tranche of "is it going to work and how is it going to work" before I go, okay, I'll try it. Good luck with all that.

Yeah, so, thank you all for joining today. I'd really love to have this panel again in a few years and see what's been happening, because I'm sure there's a lot going on. This was a great discussion and we got some great insights. So thank you all, thank you very much. And for any questions that may come from the audience, I'll be having a look at Twitter and Slack, and I'd love to hear if there are any. So thank you.