Hi, everyone, thank you. We're here today to talk about a project that we've been doing at Glasgow University. Unfortunately, Jeremy Singer, who is our lead academic, can't be with us today, so we're going to do our best to bring out our inner Jeremy. So, we're going to talk about adaptive comparative judgment. The idea is that it's a way of ranking a set of submissions: you repeatedly compare two pieces of work at a time and simply decide which of the two is better, and out of those pairwise judgments you build up a ranking of the whole set, from the best down to the worst. You don't need to think when you're marking about "this is an A", "this is a First", "this is a 2:1", "this is a 60, this is a 61". You just think better, worse; better, worse. When you get to the end of the process, then you can decide where you are going to put your grade boundaries, whether you're marking to a curve or whether you're marking to rigid standards. So it obviates all of these different ways of marking, and it frees you up just to make judgments, academic judgments, aesthetic judgments, about what's better and what's worse. And what all of this means is that the marking becomes much more straightforward.
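The "rank first, set grade boundaries afterwards" idea can be sketched in code. This is a hypothetical illustration, not the Glasgow tool's implementation: the function name, the grade scheme, and the exact boundary rule are all assumptions made for the sake of the example.

```python
# Sketch: exemplar submissions with known grades are ranked alongside the
# real work, and each real submission takes the grade of the band it lands
# in.  Hypothetical code, not the actual tool.

GRADES = ["A", "B", "C", "D", "E", "F"]  # F = below the E exemplar (assumed)

def assign_grades(ranked_best_first, exemplar_grade):
    """`ranked_best_first`: submission ids, best first, exemplars included.
    `exemplar_grade`: exemplar id -> the grade it is the minimum example of.
    Anything ranked at or above the A exemplar gets an A, and so on down.
    """
    grades = {}
    band = 0  # start in the top band
    for item in ranked_best_first:
        if item in exemplar_grade:
            # Passing the minimum-A exemplar drops later items into the
            # B band, and so on for each boundary exemplar we pass.
            band = max(band, GRADES.index(exemplar_grade[item]) + 1)
        else:
            grades[item] = GRADES[band]
    return grades
```

For example, with the ranking `["s1", "exA", "s2", "exB", "s3"]` and exemplars `{"exA": "A", "exB": "B"}`, `s1` gets an A, `s2` a B, and `s3` a C.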
Another benefit is that it allows you to judge using a single implicit criterion, again just what's better or what's worse, rather than trying to use complex explicit sets of ILOs; it's much easier. I always used to think it could only be used for questions that had subjective, different answers, but actually you can use it for questions that have a single correct answer as well. All of this is stage-setting; it will become a lot clearer once Neil starts talking, I hope.

Okay, so, distinctive benefits before Neil starts. It scales: you can use it for tiny sets of submissions, like 20, or up to 10,000, and the potential for use in MOOCs I think is fantastic; indeed, Jeremy's been using it in his MOOC. It has a compelling naturalness: it is intuitive and plausible. It can be used with one marker or with sets of markers, so you can get your intra- and inter-rater reliability. It can be used for peer review, which is how we've been using it: you crowdsource, you get the students to do all of the judgments, and then as an academic you come in and award the marks, so it saves a lot of time. It can be used to mark things that are very, very different from each other, because you're just judging better or worse rather than against a set of ILOs. And you can also put in exemplars: if you want to see where your grade boundaries are, you put in an exemplar for the A, B, C, D and E, and anything that was at the A exemplar or above would get the A, and so forth. And I think I'm handing over to you now.

Okay, so we've used the software at Glasgow. We're using it in the Haskell MOOC, and we've used it to judge our conference submissions, because when conference submissions come in they're very, very different, but we were able to rank them by best fit to the conference and then decide which ones we were going to put in each of the streams. There was a major experiment done on this adaptive comparative judgment, which Pollitt reported in 2012. What Pollitt found was that the expert markers, who were initially highly skeptical of using this process, by the end of it judged that it was a better way of marking, and a faster way of marking. Faster is always good: as academics, if we can do our marking faster, that's always great, but we also want to know that we're keeping our academic standards, and it found that it kept the academic standards; indeed, I think it outperforms regular ways of marking.

This is our implementation. Pollitt used his own implementation, or one from Cambridge; we've done our own very lightweight implementation. It's a simple LTI application, so it doesn't have to deal with user management; that's all done by the LMS. We use it with Moodle, or I have used it with Moodle a bit in experiments; for the study we're reporting on here, we used it with FutureLearn, which allows you to launch separate LTI tools. Our tool lets the submissions be text, a PDF, a YouTube URL, a picture, things like that. In this instance it was source code: the students just paste in the text, and then a standard source-code formatter pretty-prints it, nicely coloured to make it easier to read. The software also allows staff to put in a set of things for students to review, to make it a review-only exercise, so that they're learning to judge from the student's viewpoint. It's like other tools: Moodle Workshop, which many of you will be familiar with, and Aropä, a similar tool we use at Glasgow University, which I think a few other universities use too. It's got a phase for submission and then a phase for review; we are thinking a bit about blurring that more in future.

So, the process, the algorithm. It's a round-based sort: in each round, the people doing the marking or the grading just look at two things at a time and decide which is better, and each artifact, each piece of work, will be judged at least once in a round and probably not much more than once, depending on random things about
who turns up and when people are looking at things. These rounds are put into three slightly different phases, which I'll talk more about in a minute, but they have slightly different scoring algorithms to improve the quality of the sorting. The algorithm is a bit different from Pollitt's. Pollitt's algorithm is problematic because in his paper there's a typo and you can't quite work out what the algorithm is, so I developed my own, based more on his description than his maths, and used a simulation to refine this algorithm. I'll be showing you some outputs from that simulation, so this is sort of helping you think about it, and also understand the next few slides, which show output from my simulation.

You start off with your artifacts, your pieces of student work, here six examples. They're just in a random order, numbered so you can see what they are, but this would be the order in which the students had submitted. To make things easier to view, in my simulation I colour-coded them: darker means it should go more to the left in the sorting, lighter should go more to the right. In the simulation these colours are effectively assigned to be the perfect score, and then a random number was added to or subtracted from that, so that it's got the slight error of reality. In each round, each pair of things is compared, and the one that should be more to the right is picked and given a point. After the first sort, all the ones that got one point are on the right and all the ones that got zero points are on the left. Then we get on to the next round, and again the one judged better gets an extra point, and now some have two points and go further right, some have one point and sit in the middle, and some have zero points. This sort of sorting over a long, long period would get it right, but that's not quite enough, and so there is a scoring algorithm put in there: the first two or three rounds are just done that simple way, but later on it starts weighting the comparisons depending on how far apart in the sort the two pieces are. In the second phase it's weighted to allow things to move quite fast, because if randomly a few really good or really bad bits of work had been put together at the very beginning, in the first random order, one of them could have ended up very out of place; and then later it goes to a much more refined algorithm, sorting things that are close together.

So here you're seeing the first four rounds of a simulated run, following one particular item, number 43, highlighted in orange as it sorts; the red highlights are the ones it has been compared against, and its score at each level relates to where those were in the previous sort level. After a few more rounds, now using the more refined algorithm, you can see the shading goes very smoothly almost from dark to light across, showing that this is actually a quite effective sorting mechanism. After about 18 rounds of sorting it's near enough perfect. And this scales: it would work however many artifacts were in there, because each is compared with a sample and their positions are used. Here is the same thing, well, just the middle third of a larger run; all you can see of course is the colour, but as you can see it's quite smooth towards the bottom, from being quite random towards the top. That's with 600; I've experimented with up to a thousand on the little server I was experimenting on, and at that point it's beginning to run a bit slow.

You can try this out: if you have your mobile device handy, you can log into our demo site. This is my first ever trial, which I just keep running to show off; I've put up some pictures for people to sort. It did show up some interesting things. These are pictures of wildlife: flowers, insects, birds. They're not very good pictures, just ones I found on my camera, but it was noticeable that with this type of artifact, which has different categories, some people would rank a robin very highly even though it's not a great
picture, but it's a nice bird, while other people would put the spider right down even though it's quite a good picture. So there are at least two aspects to the way people judge, and this might mean you need a bit more guidance; but that's these pictures, and maybe with academic work it's going to be a bit better.

So, our case study: Functional Programming in Haskell. This is an interesting course in that it's run as the first half of an honours module but also as a MOOC, so about a thousand people in this particular run were using it from around the world on FutureLearn, alongside a class of about 80 honours students at the University of Glasgow. As a programming language, Haskell was developed at Glasgow, so that's a good thing for us, but from our students' viewpoint it's a slight paradigm shift in their way of programming. It's a quite new language to them: previously they've programmed in very conventional languages like Python and Java, and then they do this Haskell, which is a functional programming language in quite a different style. In honours they were given a problem specification to implement, something which could be done in around a page of code, and some guidelines on how to judge, so some criteria, you know: look at readability, look at whether it actually solves the problem. Then on the marking page, this grading page, they looked at their peers' solutions to compare, so in this instance we were using it as a peer tool. Finally, at the end, they see their own ranking, just as what quartile it fell into; we don't want to say to people "you are worst", but this way it would say "you are in the bottom group", and I'm sure they will all have known this by having seen the others. They also got given a sample solution. So this is the question, a spellbook generator, which is the sort of thing Haskell is a good language for, and they were given these instructions about how to write good-quality code, and then here's a sample solution; as you can see, it's a quite concise language.

And these are student comments, so I'll hand back to Sarah. We did some evaluation, of course we did, we like to evaluate. We sent out an online survey to all of the students, including the many in the MOOC and our honours students, and we asked them a range of questions about what they thought about the ACJ software, because they're computing students and we wanted that sort of evaluation from them, and also what they thought about the process. These were students who were probably fairly new to doing peer review anyway, and it was peer review using ACJ. We asked a range of questions, and what we got was a lot of students telling us that they liked the way that they got to see lots and lots of different solutions, because typically in peer review you'd see two or three, maybe, but with this they could see a lot, and the ones that were really invested could just do more; we weren't limiting them. It can be quite addictive: when we first used it for the conference, there was me and Kathy Boval, and we just got really, really addicted to doing the reviewing, so we did masses, because we just loved seeing the photos and the abstracts and all of that; it was great. Again, here's one: they said they thought doing this helped them to think differently, because they're having to think how to evaluate their peers' code. This is exactly what we wanted: we wanted to show students early on what their position was in the class, without having a leaderboard, and they could see for themselves how well or how poorly they were doing; instant feedback, early feedback, really important. And again, I love this one: "I'd like to thank the course educators". Well, I think that's thanks to Jeremy, not to us, but we'll take it. And again, I thought this was really interesting, that as time went on they could see that they
got quicker at doing it. So yes, the process speeds up. One more thing: how could we use it? Some interesting stats: we can get some interesting reports out of it, if we wanted to write them. We could set the software up to say who is the most deviant marker, because, you know, sometimes you have a marker who is a bit problematic: everybody else maybe is judging something to be good and they're judging it to be really bad. Well, you might want to look at this marker and take them out, or you might want to look at this marker and ask, what is this marker seeing that everybody else is missing? So you can get that sort of information out of it. You can also see which submission was the most divisive, the Marmite submission, the one that some markers thought was brilliant and some markers didn't. Again, is it the case that somebody is missing something? Because I know myself, sometimes I've marked something and I think it's maybe round about a C, and then somebody else comes in and says, actually, do you know what, there's something really novel, really interesting here, I would give that an A. These are the conversations that we have as markers, and these are conversations that we can see in the software. And again, we can see how converged the judgments are: is it the case that everybody thinks that one is the best and that one is the worst, or is there a bit of a controversy? So it's not the case that we're just given a ranking and we have to accept it; we can interrogate the data.

So, where next? Well, the software we have is still in development; it's still a pilot tool. We've been piloting it successfully for, how many years now, three or four years, in small things; it's living software. Neil is the developer, and Neil's a fantastic developer to work with, because he understands what academics want and he will work with academics to get a bit of software that works for them. So if there's a restriction in the software, very often Neil will work to get around that restriction. You know, obviously there are restrictions, you can't do magic, he can't give you a unicorn, but he has developed this bit of software in line with academics, working with Jeremy, who is a fantastic academic, very engaged in his teaching and indeed very engaging when he teaches. It's really, really useful, because you get a bit of software that is suited to academics but is technologically robust as well. And I think at this stage, if you think it would be useful in your teaching, Neil is putting out a call for collaboration. The software of course is open source, it's on GitHub, anybody can go and pick it up, but what we'd like is, if people are picking it up, that they'd work with us, because we're doing further research on this. We're still using it, we're trying to extend our pilots, and what we would really like is to work across the academic community, to work with you, to do a proper robust study. So scholarship, research, or people who just think, "yes, I want this in my teaching, I don't want to do any scholarship, no research, please just let me use it": all of these are fine. And I think that's it.

Fantastic. There is a roving mic, so if you'd like to raise your hand and ask a question, please do; we have had a few questions here as well. One of them was around providing feedback to students, and the question was: if a submission is ranked at or near the bottom of the rankings, how does the process provide feedback so that the student knows what needs improving, or why they got a low score? So we'll start with that, but if you have a question in the room we'll come to you next.

Yeah, so there is no traditional feedback given, no "you should have done this" or "this bit was poor"; I've been a student recently enough to remember that wasn't very helpful. What there is, is what David Nicol would consider to be internalising of feedback: the students are seeing a range of work, and there's very good learning potential there. It probably needs to be
studied more, but I think that is potentially a much more useful form of feedback.

That's an interesting solution. If you do have a question, can you just let us know who you are and where you're from?

Okay, thank you. Steve Rowett from UCL. It's a bit like a binary sort algorithm, I guess, and by the end of that a student is comparing two things that are probably very similar: they're near the bottom end, they're near the top end, they're in the middle. Does that make it very difficult? Because it's quite hard comparing two things that are similar.

Right, it probably does make the comparison harder at the end. Ideally we'd get students going through from the beginning to the end, but you can't unless you've somehow got them tied to a schedule, and since we're fairly open about when they log in and judge, we can't do that. So that's something worth looking at, thinking about how to make it better from that viewpoint. When doing the conference judging, which I took part in, yes, things do get slightly harder at the end, but because you've been through it all, you get quick, you get good at it. If you're just jumping in at the end, yes, I can see there's an issue, but I think at that point you'd really just say, well, we've got the sort, no more judging to be done.

And if it was a peer exercise, we've talked about starting it again, for students to go through the process again, so we could have multiple sets and then rank all of those against each other. Wow, what an evaluation nightmare. One other thing: if a student is seeing two things very close together, the next pair they see will also be close together, but they won't be close to those first two; it's quite different in this algorithm.

Thank you. I think we've got time for two more questions, so we'll take one online, and then if there are any more in the room, just hold your hand up please and we'll come to you for the last question. So, there were a couple of questions around algorithms, and there was also a question around ease of use and whether it's openly licensed, so maybe let's focus on ease of use and open licensing.

It's incredibly easy to use. I have been using Moodle Workshop for many, many years; I understand its affordances, and it's not the easiest to set up, because there are many settings in it. This is a lot easier: from the point of view of a member of staff, setting up is very easy, and from the point of view of a student it's really, really easy; they get two things on screen and they just either push left or push right, that's it. In terms of licensing, it is under an open-source licence; there are a few bits of other open source in there, which are under various different licences, but they're all quite liberal open-source ones.

Great, thank you. So, time for our last question: is there anyone in the room, or have you all posted them online? Otherwise I think we'll finish with the questions around the algorithm. We have a couple: one is how sensitive is the algorithm, one is whether the algorithm is just rules, and the last one is whether it is ethical to use an algorithm without checking the results, or are you checking the results? Could you expand a little on that, if we maybe round up on that?

Is that all right? The algorithm, it's just rules, yes; that's what the algorithm is. It's set, and it says: given this comparison, we will award this number, and use that in a sort. As for checking, well, the simulation shows it works; the simulation shows very convincingly that it works with a lot of noise in it, which is interesting. That's where this algorithm is different from a standard binary sort: it deals with noise, it deals with some of the judgments being different from others.

Fantastic, thank you very much. And we also want to give a big shout out: if you have been watching this online, I think there are some colleagues
of Sarah and Neil who might be watching this on the live stream, so if you have been joining us online, a very warm welcome to you as well. If you could just put your hands together for our presenters. Thank you.
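The round-based sorting described in the talk can be sketched in miniature. This is a reconstruction from the description above, not the production algorithm: the two-phase split, the Gaussian noise standing in for imperfect human judgment, and all the names and parameters are assumptions.

```python
import random

def simulate_acj(true_quality, start_order, rounds=18, noise=1.0, seed=0,
                 rough_rounds=3):
    """Simulate rounds of noisy pairwise comparative judgment.

    Early rounds (the first `rough_rounds`): neighbours in the current
    order are compared, the judged-better one earns a point, and the list
    is re-sorted by cumulative points, which quickly moves badly-misplaced
    work.  Later rounds: a refined phase that compares close neighbours
    and swaps them when the later one is judged better, like a noisy
    odd-even transposition sort.  Each judgment sees the true quality plus
    Gaussian noise, a stand-in for imperfect human judgment.
    """
    rng = random.Random(seed)

    def judge(x):
        return true_quality[x] + rng.gauss(0.0, noise)

    order = list(start_order)
    points = {x: 0 for x in order}
    for r in range(rounds):
        if r < rough_rounds:
            for a, b in zip(order[0::2], order[1::2]):
                points[a if judge(a) >= judge(b) else b] += 1
            order.sort(key=lambda x: points[x])  # stable: ties keep order
        else:
            # Alternate the pairing offset so every neighbour pair is seen.
            for i in range(r % 2, len(order) - 1, 2):
                if judge(order[i]) > judge(order[i + 1]):
                    order[i], order[i + 1] = order[i + 1], order[i]
    return order  # worst first, best last
```

With `noise=0.0` the judgments are perfect and a reversed list of six items is fully sorted well within 18 rounds; with noise comparable to the quality gaps the final order is only approximately right, which is the "works with a lot of noise" behaviour mentioned in the Q&A.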