to all the wonderful csv,conf organizers for making this possible in the current situation, and to all of you for joining us this morning, afternoon or evening, depending on where you are. I just want to briefly introduce myself. My name is Emmy, I'm the Innovation Community Manager at eLife, and we also have with us today Daniel, who is a data scientist working with eLife. Today we're hoping to introduce PeerScout, which is a project that leverages data, technology and community engagement to improve a process called peer review. We're hoping to share some of our latest progress, but more importantly, perhaps, to share a bit of what we've learned in the process, and also to get your feedback and input on how we could proceed.

So, to tell you a bit about eLife: we are a nonprofit organization, started by scientists, for scientists. This is our mission, up on our office wall: to help scientists accelerate discovery by operating a platform for research communication that encourages and promotes the most responsible behaviors in science. A lot of the work that we do is to innovate and experiment with technology, and to work with the research community to achieve this mission. We are also known for publishing a fully online, open-access journal in the life sciences, and that's what I see as our playground for these technological innovations and experiments.

So what is a scientific journal? For those who are new to this topic, what does a journal do? The way we see it, a journal helps researchers communicate the output of their work to other researchers and to the public. This is a very brief, simplified diagram of what the science publishing process currently looks like. Researchers conduct research and then write up their experimental hypotheses, results and conclusions into a manuscript. They submit this manuscript to a journal. The journal helps carry out a process called peer review, where it recruits other scientists to review that manuscript. The reviewers may decide to reject the manuscript, or they will offer comments to the manuscript's authors, who will then revise the manuscript by considering the reviewers' comments. Once everyone is happy, the manuscript gets published in the journal, which communicates it to the wider research community and the public.

So that hopefully all sounded pretty straightforward, but peer review is actually a bit more complex in reality. Most of the time, research articles are about an extremely specific question and field in science, and they often use a specific set of techniques, so the people who review a manuscript have to be experts in that field in order to assess it effectively. This is more or less what happens in the first phase of peer review at eLife. Let's say we've received a new manuscript on a computational model for modeling the coronavirus pandemic. Our editorial team will look at this and assign it to a senior editor who has expertise in, say, virology. The senior editor then has to pick a reviewing editor with the relevant expertise to help advise on whether or not to review the manuscript. If they decide to review it, the reviewing editor nominates some further referees, maybe one specialized in viruses, another in modeling and one in pandemics, to review that paper.
So you can see that a basic day-to-day challenge in publishing is: how do you find the most suitable candidates to handle and review a paper through this process? At eLife at the moment, we have over 60 senior editors and 500 reviewing editors. And we want this all to happen relatively quickly; we have something of a promise that our review times are often under two months in total, so this is all operating under a time constraint. And finally, because all our editors and referees are practicing scientists, they're all busy; they may be handling ten other papers. So we have to be able to build a system that lets us find people who have the right availability, expertise and time. You can see how this becomes a problem without a technical solution to help. Editors usually end up asking people they know, and as you can probably imagine, this leads to biases among editors and reviewers in terms of geography and career stage.

So why is this a problem? There's a study by Murray and colleagues that looked at over 23,000 submissions to eLife between 2012 and 2017 and found that the gatekeepers, the editors and peer reviewers, tend to favor manuscripts from authors of the same gender and from the same country. This means that if, let's say, our editor and reviewer pools are mostly composed of men from the US, the manuscripts we review and publish are also more likely to be written by men in the US. At eLife, we want to make peer review more diverse and fair. We want to encourage this by making it easy for editors to pick from a more diverse pool of people, rather than from their own memories or networks. And the second thing is that we want to introduce more underrepresented communities of researchers into this reviewing system, for example early career researchers, so that we can make sure the work we publish is representative of, and comes from, all parts of the research community. So this is our motivation for building PeerScout. I'll now hand over to Daniel, who will explain the concept and execution in a bit more detail.

Right, I'm now going to talk you through the tool that we call PeerScout. We already mentioned the different stages that are part of our review process, so let's focus on the last of those stages. Imagine a manuscript was submitted and went through the first stages, and you, or rather we, are the reviewing editor tasked with assigning external reviewers. We are looking for a domain expert who would be able to review the paper. Assuming we don't already know somebody, and that's also something we don't want to rely on, you could try to find a person who is an author or a reviewer of a similar paper. And this is essentially what we're trying to automate here. Using topic modeling, we calculate a similarity score and use that to rank the potential reviewers.

So let's take a tour of the tool. In this version of PeerScout, there's a network chart on the left and more detailed results on the right. If your eyesight is good enough, you might even see the numbers in the circles. Those numbers represent the similarity score between the profiles and related manuscripts. The different types of reviewer profiles are also color coded, with one color representing early career reviewers and another for regular ones. Looking more closely on the right, we can see other useful stats. Generally, we would prefer more responsive reviewers; like Emmy mentioned, we want to have quick reviews.
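To make that matching step a bit more concrete, here is a minimal, illustrative sketch of ranking candidate reviewers by topic-model similarity, assuming a Python setup with scikit-learn. The reviewer names and abstracts are invented for the example, and this is only a sketch of the general technique, not the actual PeerScout implementation.

```python
# Sketch: rank candidate reviewers by topic similarity to a new manuscript.
# Illustrative only; profiles and texts below are made up for the example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# One text per candidate reviewer, e.g. built from papers they authored or reviewed.
reviewer_abstracts = {
    "reviewer_a": "agent based model of epidemic spread and intervention timing",
    "reviewer_b": "structural biology of coronavirus spike protein binding",
    "reviewer_c": "bayesian inference for transmission dynamics of influenza",
}
new_manuscript = "computational model of coronavirus pandemic transmission dynamics"

corpus = list(reviewer_abstracts.values()) + [new_manuscript]

# Represent every document as a distribution over latent topics.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(doc_term)

# Similarity between the manuscript's topic mix and each reviewer's topic mix.
manuscript_topics = doc_topics[-1:]
scores = cosine_similarity(doc_topics[:-1], manuscript_topics).ravel()

# Rank the candidates by similarity score, highest first.
ranking = sorted(zip(reviewer_abstracts, scores), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name}: similarity {score:.2f}")
```

In practice the profiles would be much richer and the model trained on a large corpus, but the principle is the same: the ranking shown in the tool comes from comparing topic representations, not from anyone's personal network.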
And at the same time, we are aiming not to overburden reviewers. That also highlights that expertise is not the only factor to consider. In the list, every other result is an early career reviewer, and that is to encourage their use, so editors have no excuse that they are not aware of them.

So what did our editors think? Let's just say we are not done yet. Up until now, we worked with the editorial team, and we then encouraged editors to use the tool and give us feedback, while at the same time comparing it with another product called Reviewer Connect, which also offers reviewer recommendations. The feedback wasn't as glamorous as we had hoped. Apart from needing to improve the recommendations, the feedback pointed to users wanting more information that would help them better understand why a person was recommended. Our conclusion was that we were trying too much at once. We gave the editors a new tool to use, and they may not even trust it yet; at the same time, we were trying to change behavior by encouraging them to use early career reviewers.

For our next iteration, we need to focus on three points. First, we need a more objective measure of the recommendations, so we can better evaluate different algorithms and choose the one that works best. Secondly, we need to find a way to better show editors the information they need to judge whether a recommendation is appropriate, especially for people they are not familiar with. And finally, we still want to encourage editors to use early career reviewers, but we might need to find a better way of doing that.

For our second iteration, the workflow looks very similar to the first. The main difference is that we are explicitly extracting keywords and key phrases, which helps to make the algorithm more explainable. Alongside the scores calculated for the ranking, we can also identify the most relevant keywords, and by showing those to the user, we are hoping to better explain the recommendations and allow them to judge whether the recommendations are useful.

Okay, on to the evaluation; that was our first point. This time, our initial focus is on the senior editor recommendations rather than reviewers; we would be looking to expand that in the future. We are evaluating the algorithm based on actual assignments, that is, how well the recommended editors matched the actually assigned editor. There's no guarantee, though, that the assigned editor was indeed the best match; like I mentioned, there may be bias or there may be other reasons, but it's the best proxy that we have. The chart shows four measures, all of them ranging between zero and one, where higher is better. Starting from the right, the recall at three measure tells us whether the actually assigned editor appeared within the first three recommendations. Recall at two is the same but considers the first two recommendations only, and recall at one only looks at the very first recommendation. The NDCG measure provides a way of judging the recommendations as a single score; that's particularly useful when comparing algorithms. As a point of reference, the chart also includes a random model. Fortunately, the new algorithm is doing better than that. This now provides us with an objective measure of the recommendations. There's clearly still some room for improvement, but it's a start. Beyond the summary, we can also identify the recommendations most in need of improvement and think of ways to achieve that.
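For reference, here is a minimal sketch of the evaluation measures described above: recall at k, and NDCG with a single relevant item (the editor who was actually assigned). The editor names and recommendation lists are made up for illustration, and this is an assumed formulation of the metrics, not the actual evaluation code.

```python
# Sketch: evaluate ranked editor recommendations against the actually assigned editor.
# Data below is hypothetical; only the metric definitions matter here.
import math

def recall_at_k(recommended, assigned, k):
    """1.0 if the assigned editor appears in the top-k recommendations, else 0.0."""
    return 1.0 if assigned in recommended[:k] else 0.0

def ndcg(recommended, assigned):
    """NDCG with a single relevant item: 1 / log2(rank + 1) if the assigned editor
    is found, 0 otherwise. The ideal DCG for one relevant item is 1."""
    if assigned not in recommended:
        return 0.0
    rank = recommended.index(assigned) + 1  # 1-based rank
    return 1.0 / math.log2(rank + 1)

# Each case: (ranked recommendations, editor who was actually assigned).
cases = [
    (["editor_b", "editor_a", "editor_c"], "editor_a"),
    (["editor_d", "editor_e", "editor_a"], "editor_a"),
    (["editor_f", "editor_g", "editor_h"], "editor_a"),
]

for k in (1, 2, 3):
    avg = sum(recall_at_k(rec, gold, k) for rec, gold in cases) / len(cases)
    print(f"recall@{k}: {avg:.2f}")

mean_ndcg = sum(ndcg(rec, gold) for rec, gold in cases) / len(cases)
print(f"mean NDCG: {mean_ndcg:.2f}")
```

A random model evaluated the same way gives the baseline mentioned in the talk, so any algorithm can be compared against it on a like-for-like basis.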
And this is what the new prototype looks like. Once the user selects a manuscript, editors get ranked accordingly. Here we omitted the score itself; instead, it now shows the most relevant matching keywords on the right. In this example, synapses is the most relevant keyword for the first editor, whereas a different keyword is shown for the second. The user can then still decide which one seems more relevant to the submitted manuscript, but hopefully they won't need a separate web search to do that. So with that, back to Emmy.

Thanks, Daniel. So, just a few things. That was a very quick tour of what we've done and tried so far, and I hope it gave you a good picture of the tool and the algorithm, but I want to focus a bit more on what we've learned in this process. The first thing is that we can build the most sophisticated machine learning models and user interfaces, but if we can't get users to trust and use them, then there's very little point. And to build this trust, we really need to spend time understanding users' needs and behaviors. For this, we're super grateful for the support of our editorial community in agreeing to embark on this mission together and driving this project forward. Most of the work of communicating with our editors and getting their feedback is done by Maria, whose photo is here on the slide and who is our journal development editor, together with members of the eLife product team. This frequent communication, user testing and feedback allowed us to get very quick, continuous feedback from our users so that we could learn, improve and iterate very quickly. We also learned from that version one that Daniel showed you that it is probably a good idea to isolate and design tests for the individual aspects the tool is trying to change, so separating the algorithmic performance from the drive to change editor behavior. This separation allows us to better focus and prioritize our tool development.

Moving on, we see many possibilities both in improving performance and in increasing the number of use cases for the tool. We're looking at concept extraction as a possible way to improve the algorithm's performance. We're also hoping to build user, editor and referee profiles using other data sources, for example the papers a researcher has authored. This would allow us to build profiles not only for our existing editors and reviewers on the system, but theoretically for any researcher who has ever authored a paper. And finally, we want to see if we can turn this around and use PeerScout to generate ranked lists of papers to recommend to editors and reviewers to choose to handle and review. So, hopefully with continuous feedback and support from our community and all of you, we'll be able to continue to pursue more diversity and fairness in this peer review process.

Oops, I just want to say a quick thank you for being here, and specifically to Maria and the editorial team, eLife Tech and Product, all of our editors and the early career reviewer community for allowing us to drive this forward together. And if you want to stay updated on some of our very exciting upcoming initiatives, please visit one of the four links that we have on the right there. Thanks very much for joining us, and we'd be happy to take questions if there's time. Thank you.

Yeah, thank you. I think we have time for probably one question.
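As a side note on the prototype just described, which shows matching keywords instead of a raw score, here is one minimal way such matching keywords could be derived: as the TF-IDF-weighted overlap between a manuscript and an editor's profile. The texts are invented, and this is only an assumption about how such an explanation might be produced, not the PeerScout code.

```python
# Sketch: surface "matching keywords" as an explanation for a recommendation,
# using terms that carry weight in both the manuscript and the editor profile.
# Illustrative only; texts are invented.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

manuscript = "synaptic plasticity in hippocampal neurons measured by calcium imaging"
editor_profile = "synaptic transmission and plasticity in cortical and hippocampal circuits"

vectorizer = TfidfVectorizer(stop_words="english")
vectors = vectorizer.fit_transform([manuscript, editor_profile]).toarray()
terms = np.array(vectorizer.get_feature_names_out())

# A term "matches" if it has non-zero weight in both documents;
# rank matches by the product of their weights.
combined = vectors[0] * vectors[1]
matching = terms[combined > 0]
order = np.argsort(combined[combined > 0])[::-1]
print("matching keywords:", list(matching[order]))
```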
And I was looking at the chat, and I know many people are wondering: all journals, especially journals at scale, have to deal with this kind of issue around reviewers, and there are other initiatives in the community, like Publons, that have been trying to tackle it. Can you go into some of the ways that what you're trying to do is different from others, like, you know, tackling bias? And maybe can you talk a little bit more about whether you know of other initiatives out there that are cross-journal or similar, whether there's something that differentiates you, or, in your mind, a trend in the reviewer matching world? Daniel, do you want to take this one? I think you're muted.

Okay, can you hear me now? Sorry. Yeah, one consideration is whether it's open source. I believe the one from Publons is not, but maybe I'm wrong, so that's one thing. And also, maybe, like the last point that Emmy raised, that we can turn it around so that editors actually choose a paper that's appropriate for them; that's especially useful when we want to scale. And we can maybe add some positive bias, which may be more difficult to do with an existing tool, especially if it's closed source. And maybe the other thing is that it's not just expertise, but also how many papers they handle and other stats that may not be so easy to integrate into another tool. But if there's another open source tool, then we'd definitely be interested.

Right. Yeah, I know it's a really hard problem; I used to work at PLOS, and it's an almost impossible issue to tackle, especially when you try to bring in these ideas of bias, so I really congratulate you on trying to tackle this and experiment with it. Okay, I think we're all out of time. Thank you very much, we're going to move on. So thanks, Emmy, and thanks, Daniel; we'll move on to Krista.