Hi, my name is Dr. Marc Lajeunesse, I'm at the University of South Florida, and today I'm going to describe some of the challenges and lessons learned from trying to use undergraduates to screen many, many studies for systematic reviews and meta-analyses.

I don't know about you, but I get really excited every semester when I stand in front of a giant crowd of students, thinking and scheming about all the possibilities for harnessing all that energy, all those eyes, into coding, screening, and classifying studies. Every semester I start that way, and I really want to get students involved in learning how to do research synthesis. I think it's an important learning outcome, given how central research synthesis is to all sorts of decisions at many levels, and I want to expose students early on to how important it is to synthesize study outcomes. I feel like I could take all that energy and convert it into high-quality synthesis. But every semester I fall flat: I am unable to get consistently high-quality screening outcomes from my undergraduates. So here are the three lessons I've learned from about eight semesters of trying to get undergraduates involved in screening a lot of studies.

Let me begin by describing the population of studies involved in these research synthesis projects. Typically, at the beginning of a semester we do a little bit of scoping, settle on a narrow topic, and then start screening studies. The experience level of the students varies quite a bit: they could be first-year students or seniors nearing the end of their degree, so there's real variability in experience and confidence with science. I try to address some of that, and I'll come back to it later, because it is one of the big hiccups: students often lack the experience and confidence to make high-quality screening decisions. Class sizes also differ quite a bit, from 40 or so students to almost 200, and these two group sizes impose different implementation challenges, which I'll also touch on. Finally, I haven't been very ambitious in the number of studies screened with each class, mostly because, in my prior experience, the results are highly inconsistent. If you have students process thousands and thousands of studies, you're left having to sort all of that out afterwards, which makes the whole process very inefficient, especially when the decisions are not of high quality. And no matter how many bouts of screening you bake into the project, there will always be a lot of variability in the outcomes. So I've kept the number of studies low enough that I can follow and assess the screening decisions.

Here's the first lesson, and it's about how you should approach assessing the quality of students' screening decisions. One common way is a dual-screening design: two reviewers screen the same set of studies, and you estimate a Kappa statistic to evaluate their consistency and agreement or disagreement. That whole system just falls apart when you have many students.
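To make that concrete: Cohen's Kappa for two reviewers is (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance. Here's a minimal sketch in base R; the decision vectors are made up purely for illustration:

```r
# Cohen's Kappa for two reviewers making include/exclude decisions.
# Hypothetical screening decisions on the same 10 studies:
reviewer1 <- c("include", "exclude", "include", "include", "exclude",
               "exclude", "include", "exclude", "include", "include")
reviewer2 <- c("include", "exclude", "exclude", "include", "exclude",
               "include", "include", "exclude", "include", "exclude")

cohens_kappa <- function(r1, r2) {
  tab <- table(r1, r2)
  n   <- sum(tab)
  p_o <- sum(diag(tab)) / n                      # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement
  (p_o - p_e) / (1 - p_e)
}

cohens_kappa(reviewer1, reviewer2)
```

With a whole classroom, though, you need a Kappa for every pair of students, and a single erratic screener drags down the statistic for every pair they appear in, which is exactly where this design breaks down.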
You could do the paired design, and I have, but what happens is you may have one, two, or many students who are highly inconsistent, yielding very poor Kappa statistics, and then you're left wondering: what do I do with this collection of studies that has inconsistent decisions on it? Do you drop the whole cluster and re-evaluate it?

So now I use a totally different strategy, where I'm not dropping clusters of studies with poor screening decisions; instead, I estimate agreement at the individual study level. Each study gets screened independently by five or ten students, and I estimate the degree of agreement or disagreement for each individual study, not for a group, which could result in large numbers of studies being dropped. By focusing on the individual study, I'm more nimble in deciding what to reintroduce in a second screening bout, and there's always a second screening bout, and always a third.

That brings me to the next lesson learned, and I'm shaking my head here, because this is really the headache: students need to be on the same page in order to make consistent screening decisions. But here's a funny thing I've observed: there's always very good agreement on what to include. Students read the titles and abstracts, see that a study hits the two or three inclusion criteria we have (three checkboxes they can tick off), and boom, they include it. The real challenge is the studies we should exclude; those cause the most disagreement among students, for who knows what reason. The studies we should include always show nice agreement; the studies we should exclude always show high disagreement about whether to drop them.

Here's some rather messy data from two bouts of screening about 250 studies. Each point is a different study, and each axis shows the screening outcomes of one of the two bouts; basically, we screened the same 250 studies twice, distributed randomly among the students. You can fit a line through that, and it shows there is some reliability and consistency in the screening decisions.
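Here's a minimal sketch of that study-level approach in base R. The data frame and its column names (study_id, bout, decision) are hypothetical, and the data are randomly generated; the idea is just to compute, for each study within each bout, the proportion of students voting "include", flag the ambiguous studies for re-screening, and compare the two bouts:

```r
# Hypothetical long-format screening data: one row per student decision,
# 250 studies x 10 students x 2 screening bouts.
set.seed(1)
screens <- data.frame(
  study_id = rep(1:250, each = 10, times = 2),
  bout     = rep(1:2, each = 250 * 10),
  decision = sample(c("include", "exclude"), 250 * 10 * 2,
                    replace = TRUE, prob = c(0.3, 0.7))
)

# Per-study proportion of "include" votes, separately for each bout.
prop_include <- aggregate(decision ~ study_id + bout, data = screens,
                          FUN = function(d) mean(d == "include"))

bout1 <- prop_include[prop_include$bout == 1, ]
bout2 <- prop_include[prop_include$bout == 2, ]

# A study is "ambiguous" when votes are split rather than near 0 or 1;
# flag those for the next bout instead of dropping a whole cluster.
# (The 0.2 / 0.8 cutoffs are arbitrary placeholders.)
ambiguous <- bout1$study_id[bout1$decision > 0.2 & bout1$decision < 0.8]

# Fit a line through per-study outcomes across the two bouts,
# as in the scatterplot described above.
fit <- lm(bout2$decision ~ bout1$decision)
summary(fit)
```

The point of the design is that decisions live at the level of the individual study, so only the genuinely ambiguous studies go back into the next screening bout.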
Studies that should be included tend to have high agreement, but for studies that should not be included, there's again a lot of disagreement. So even though we repeated the screening of the entire collection multiple times, there's still inadequate repeatability in those decisions, and the agreement is inconsistent depending on whether a study should be included or excluded. That's a whole dimension I did not anticipate: students have a lot of trouble deciding what to exclude, as opposed to what to include.

All right, the final lesson learned, and this is the one I've experimented with most by far: the tools will always break. There is no tool out there that makes this process efficient across so many students. My criterion for a tool is that it should be available to anyone for free, and you'd think that would be fairly straightforward, but students vary considerably in what they have access to. A lot of my students seem to have only their phone, so they're making screening decisions on their phone, and that really limits what they can use to complete the tasks.

Here's a quick survey of what I've used in the past. I've used form-fillable HTML pages that populate a Google spreadsheet. The challenge with that approach is that students use different browsers, and not all browsers are friendly to form-fillable objects in HTML. That creates a lot of headaches: many students start freaking out near the deadline because they can't submit their work.

Same problem with form-fillable PDFs. I might create a batch of form-fillable PDFs in R and distribute them to the students to populate with their screening decisions, but when I get the PDFs back, they've used all kinds of different applications to open and save their results. Not everyone uses Adobe Acrobat Reader; in fact, when you open a PDF in a browser now, it isn't even Adobe opening it, which means the saved outcomes come back in very different formats. PDFs are among the most challenging file types to mine and extract data from, partly because of inconsistencies in how content is saved within the file.

What I've converged on now, and it circumvents all those challenges, is to just use the main web platform students already use for their online lectures and exams. So now I disguise screening exercises as Canvas exams. That's not a simple endeavor in itself, but it saves me headaches in getting the exercise into students' hands and getting it back in a consistent form: even if their decisions aren't of high quality, what I get back from the students creates no headaches of inconsistent formatting.
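If you want to automate building those screening "exams", Canvas exposes a REST API, so the quiz can be generated programmatically rather than by hand. Here's a rough sketch using the httr package; the base URL, token, course ID, and study details are all placeholders, and the Rails-style encoding of the answer array is an assumption worth checking against your institution's Canvas API documentation:

```r
library(httr)

# Placeholders: swap in your institution's Canvas URL, an API token
# generated from your account settings, and a real course ID.
base_url  <- "https://canvas.institution.edu"
token     <- Sys.getenv("CANVAS_TOKEN")
course_id <- 12345
auth      <- add_headers(Authorization = paste("Bearer", token))

# Create an ungraded survey-style quiz to hold the screening items.
quiz <- POST(
  paste0(base_url, "/api/v1/courses/", course_id, "/quizzes"),
  auth,
  body = list(`quiz[title]`     = "Screening bout 1",
              `quiz[quiz_type]` = "survey"),
  encode = "form"
)
quiz_id <- content(quiz)$id

# Add one multiple-choice "question" per study: the question text holds
# the title and abstract, and the answers are the screening decisions.
# (The answers-array form encoding below is an assumption.)
POST(
  paste0(base_url, "/api/v1/courses/", course_id,
         "/quizzes/", quiz_id, "/questions"),
  auth,
  body = list(
    `question[question_name]` = "Study 1",
    `question[question_text]` = "<p><b>Title:</b> ... <b>Abstract:</b> ...</p>",
    `question[question_type]` = "multiple_choice_question",
    `question[answers][0][answer_text]` = "include",
    `question[answers][1][answer_text]` = "exclude"
  ),
  encode = "form"
)
```

The responses then come back through Canvas's standard quiz reports in one consistent format, which is the whole point of routing screening through the platform students already use.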
Oh man. Okay, so I touched on three things, and on all three there are many possibilities to grow and improve. One: devising the appropriate design for making sure screening outcomes are reliable. The dual-screening design works if you have a small team, but with a big team it just doesn't make sense, especially if not everyone is on the same page about the inclusion and exclusion criteria. Two: there's an inconsistency between decisions on what to include and what to exclude, and that's been a huge blind spot in my ability to achieve consistent outcomes, because I'm simply unable to train everyone properly; there's always a cohort of students who are not making high-quality decisions. And finally, the tools. Tools are probably the easiest way to make great inroads in improving the efficiency of this whole process, but there's nothing out there that makes it happen easily. I would love to discuss with you how to make that happen; I have many ideas, I just don't have the energy to push through a new thing.

So there we go: those are the lessons learned. There are actually more, but these are the three interesting ones I've been able to pull together after eight semesters of trying to get students to make some magic with their screening decisions. I'm always optimistic that one day we'll be able to process a lot of information in a high-quality way; we're not quite there yet. I hope these lessons help push your projects forward, and if you have insight on these things, I would love to hear about your experience and what you've learned. All right, I'll end it with that, and thanks for listening.