Good morning, everyone, and a warm welcome. It is my pleasure to introduce this morning's keynote speaker, Professor Dylan Wiliam, Deputy Director of the Institute of Education, University of London, and one of the world's leading authorities on educational assessment. In a varied career, he has taught in city schools, directed a large-scale testing programme, trained teachers, served in a number of roles in university administration, including being Dean of the School of Education, and pursued several research projects focusing on supporting teachers to develop their use of assessment in supporting learning. From 2003 to 2006, he was Senior Research Director at the Educational Testing Service in Princeton, New Jersey, in the United States. So you won't be surprised to learn that Dylan is a leading expert in the field of assessment and is often consulted by the media and government for views on assessment and education in general. Dylan has many publications to his name; perhaps the one most often cited for its influence on classroom assessment practice is Inside the Black Box: Raising Standards Through Classroom Assessment, which he co-authored with Professor Paul Black, and which followed a major review of the research evidence on formative assessment. As you will hear this morning, his concern is with some of the ways in which technology can change how learners are assessed and how teachers can be supported in facilitating that process. Professor Wiliam continues to influence policy and practice by pushing the boundaries of our understanding of how best to design assessment in order to improve learning rather than just measure it. Please join me in welcoming Professor Wiliam to ALT-C 2007.

The emphasis in my talk today is going to be very much on the learning rather than the technology, and I hope to convince you why that's a good idea. I'm going to start off with some premises about learning and about teaching. I want to point out that what we really need in classrooms are what Lee Shulman calls pedagogies of engagement and pedagogies of contingency. I then want to turn, in closing the talk, to the role of technology. I'm going to suggest that the role of technology is in supporting rather than replacing teachers, and that the really important and exciting development for technology in supporting learning is something called classroom aggregation technologies.

First of all, why do we need to raise achievement? Because it actually matters. People with longer education live longer, have more money, and are healthier. Society benefits through lower criminal justice costs and lower healthcare costs, and the economy grows faster the more educated the population is. It's as simple as that. So where's the answer? Well, this morning the Conservatives announced small high schools as their platform priority. I used to live in New Jersey, and the state capital, Trenton, was part of this small high schools buzz. They took 3,000-student high schools and divided them up into six 500-student high schools in the same building, and then wondered why nothing changed. Chicago tried the same thing, and the small high schools initiative didn't work there either: it improved attitudes, and teachers got on with the kids better, but the kids didn't learn any more.
Governments like things they can do easily: they can change curricula, they can replace textbooks, they can go for charter schools or vouchers or academies or trusts, and of course technology. Technology has been about to revolutionise classrooms for about 30 years and, as Heinz Wolff once said, the future is further away than you think. The latest example is interactive whiteboards. Charles Clarke, when he was Secretary of State, decided that every school in London should have one, and we were fortunate enough to have the resources to evaluate that properly. What we found was not just an absence of evidence of impact on student achievement; what we found was evidence of no impact on student achievement. There were as many schools where putting in whiteboards had made things worse as there were schools where it had made things better.

The problem is that we were looking in the wrong place for the answer. We have had three generations of school effectiveness research. The first generation looked at raw results: some schools get good results, some schools get bad results, so that must mean some schools are better than other schools. The conclusion was that schools make a difference. Then people said, hang on a minute, the schools with the advantaged kids are getting all the good results; they're the ones in the posh areas. So people said, right, okay, we need to control for social class. And what did they find? Social class actually accounts for most of the variation. So the conclusion of the second generation was that schools don't make a difference; it's all to do with poverty. And then people said, hang on, why don't we actually look at what the school contributes? Let's look at how much the kids knew when they started at that school and how much they knew when they finished: the value-added approach. What that has shown is that it actually doesn't matter very much which school you go to, but it matters very much which teachers you get in that school. The variability at teacher level is about four times the variability at school level. If you get one of the best teachers, you will learn in six months what an average teacher will take a year to teach you. If you get one of the worst teachers, that same learning will take you two years. There's a fourfold difference in the speed of learning created by the most and the least effective teachers. And it's not class size, it's not between-class grouping, it's not within-class grouping: it's the quality of the teacher.

So we have, in classic economic terms, a labour force issue, and there are two solutions. One: we sack all the teachers we've got and start again, like Ronald Reagan tried with the air traffic controllers. Nice idea. Unfortunately, there isn't a reserve of better teachers out there who are merely deterred by burdensome certification requirements. And new teachers are actually pretty bad: you don't learn to teach at all well until you're six or seven years into the profession, and recent data from Australia shows that the amount of value added by teachers carries on increasing for about 20 years. Basically, almost all teachers are almost useless when they start, and halfway decent by the time they finish. There's nothing harder than teaching, and you're hardly ever successful. Show me a teacher who's satisfied with what they're doing, and I will show you a teacher with low expectations. A teacher's constant experience is failure; but by learning from those failures, by attending to them, you can actually get better.
So we think that the only way to improve learning at any kind of scale is to improve the effectiveness of the teachers we've already got: what my colleague Marnie Thompson calls the love-the-one-you're-with strategy. How do we do it, and what is the role of technology in that?

Well, in the past I've talked about quality control and quality assurance. Basically, quality control is the bolt-on thing where you inspect what comes off the end of the production line: if the products are good, you let them go, and if they're bad, you send them through the production process again. So you inspect quality at the end, and everybody thinks this is very bad. Quality assurance is good, because you build quality into the process itself, so there's no need for inspection at the end. Except that it's not so simple. For some processes quality assurance is more efficient than quality control, e.g. automobile manufacture: that's why Toyota is the most efficient auto manufacturer in the world; they built quality into their production. But nobody has managed to do that with silicon chips, as far as I understand. The most efficient way to make silicon chips is actually to make a lot of them, test them, and throw away the ones that are useless. So the crucial trade-offs in whether you go for quality control or quality assurance are to do with testability, complexity, and predictability. And the question is: where does learning fit?

Now, Brenda Denvir, for her PhD thesis back in 1986, produced an extraordinarily detailed map of young children's acquisition of number. Each of the little blobs on this slide is a skill. What she showed, for example, was that there were almost no kids who could do subtraction without being able to count backwards by one; so counting backwards by one is a prerequisite skill for subtraction. And she mapped this very, very accurately. Then she looked at programmes designed to help children learn. Look at this map here: these blobs are the skills, and these arrows are the dependencies. A kid knows this, knows this, knows this, but doesn't know this; so this is obviously a target for teaching. The teacher then designed a programme of teaching specifically to address this skill, for this child, one-to-one. What happened? The child learned those other things up there instead. That's what got learned. Anybody who's been in a classroom for more than a nanosecond knows this: we cannot predict what it is that children will learn as a result of our teaching. So we cannot have quality assurance in learning; we have to have quality control. We have to keep on checking what it is that kids have actually learned, because we cannot predict it.

Nor can we script perfect teaching. There was a craze in America a few years ago for scripted teaching, where they would give teachers scripts, designed by experts, about how to teach really well. They were literally scripted, down to things like "now walk around the classroom". And the point is that they're useless, because classrooms are effectively chaotic places. Even well-behaved classrooms are chaotic places, in the sense that the difference between one course of action and another course of action is so small that the system is really only described well by chaos theory.
So you cannot prejudge the complexity of the situations which teachers will face. What gets learned? Well, here's another slide, showing two items from the Third International Mathematics and Science Study. They're both about comparing fractions: the first item, 88% get right; the second item, 46% get right. And it's not that the numbers are bigger in the second question. It's that a lot of the kids had a naive strategy: the bigger the bottom, the smaller the fraction. That gets them the right answer on the first question but not on the second. Which fraction is the smallest? Look for the biggest bottom, 6, choose A: correct. Which fraction is the largest? Look for the smallest bottom, 4, choose B: incorrect. So what gets learned is actually very, very difficult to predict.

This next slide shows how slow learning is. We tested some kids over a five-year period, asking them a mental arithmetic task (they were allowed to make some notes): what is 860 plus 570? At the start, only a small proportion of the kids could do it; by the end, about 90% could. And I think most of you will be surprised by how flat that line is: every year, only about 15% more of the kids get it. Fifteen percent. So in a class of 36 kids, that's roughly one kid getting it every couple of months. The CSMS project found that typically, in any teaching sequence, one third of the kids knew the content at the beginning, one third still didn't know it at the end, so only one third actually learned the content, and half of those had forgotten it six weeks later. Perhaps more surprisingly, some did better on a delayed post-test than on an immediate post-test: they didn't know it at the end of the teaching, but they did know it six weeks later.

The important points are these: what gets learned as a result of a particular sequence of instructional activities is impossible to predict, but student errors are not random. Those are two of the most important insights of twentieth-century psychology. Most of our pedagogy, though, is designed around the idea that student errors are random. When kids don't get stuff, what do teachers do? They do it again, slower and louder. That comes from a model based on association, and it's actually quite respectable psychology: the idea is that learning is a process of forming links between stimuli and responses, so learning is strengthening these chains of stimulus and response, and if kids haven't learned something, the problem must be that the links aren't strong enough, so you reinforce, you rehearse. Repetition is the right thing to do if that's how the learning in question works, but it isn't, for most of the kinds of learning we're interested in. It's probably a good model for learning your times tables, but it's not a good model for science learning and maths learning. The conclusion from this is that teaching is interesting because learners are so different, but only possible because they're so similar; and that's why teaching is a liminal, threshold process, on the boundary between control and chaos.
You cannot respect the individuality of every single child; but you also don't have to. The difficulty of teaching is coping with that complexity and reducing it to something manageable. It's also why all that research on learning styles is completely fruitless. There's loads of stuff on learning styles, students filling in VAK inventories; it's all a waste of time, partly because it's impossible to actually cater to every individual's needs, and partly because it's not even a good idea. Can I ask you all to fold your arms? Now do it the other way. Learning in your preferred learning style is like folding your arms the way you like doing it: it's comfortable, it's natural, it feels easy. Learning outside your preferred learning style is like folding your arms the other way: it feels really weird. But what's interesting is that you then actually start to think about what is involved in folding your arms. Doing it the way you don't find comfortable gives you more insight into what is involved than doing it the way you like. So what's really important is that kids get a balance of being inside and outside their preferred learning styles, and you don't need to know which kids prefer which style at which time. You just need, as a teacher, to vary your teaching style.

Now I want to spend a little time talking about learning environments. Learning power is a concept that Guy Claxton has put forward, and the key idea here, the big trap, is that teachers do not create learning. Teachers do not create learning, and yet most teachers behave as if they do. Learners create learning. Teachers create the conditions under which learning can take place. Our schools don't function like that, which is why somebody once joked that schools are places where kids go to watch teachers work. Certainly, with the intensification around test results, it's the teachers I see doing the work. If the teachers are going home more tired than the kids at the end of the day, the wrong people are doing the work. The crucial features of well-engineered learning environments (and I think that's an important way to think about this: the creation of an effective learning environment is an engineering process) are that they create student engagement and that they are well-regulated. I'm going to say a bit more about each of those.

Why engagement? Well, it turns out that intelligence is partly inherited. So what? You may remember media coverage suggesting that wasn't true, but every single psychologist who knows the data knows that there is an inherited component, and it isn't zero. But intelligence is also partly environmental. It's like physical height: tall parents do have taller kids, but the height the kids eventually reach depends on a whole range of other factors, such as nutrition. We've always known that environment creates intelligence. What we haven't understood until recently is that intelligence creates environments. It turns out that intelligence becomes a better predictor of people's performance in their jobs the older they get. That's completely counterintuitive: you'd expect intelligence to become less and less important as people get older. It becomes more important because people choose for themselves the cognitive niches that match their preferred level of functioning; and kids do the same in classrooms.
In every classroom, there are kids who are trying to answer every single question the teacher asks, and those kids are getting smarter: their IQs are going up. Neil Mercer of the Open University has shown that when kids engage in meaningful dialogue in science lessons, their scores on Raven's Progressive Matrices, which is a purely spatial IQ test, go up. There are other kids in the same classroom who are trying to avoid being asked a question. Those kids are forgoing the opportunity to get smarter. So any teacher who allows kids to choose whether or not to participate in classroom discussion is actually exacerbating the achievement gap. That's why we need pedagogies of engagement, where we create learning environments in which there is high cognitive demand, which are inclusive of all students, and where participation is not optional.

A good example of that is the work of the Hungarian-American psychologist Mihaly Csikszentmihalyi, who invented the concept of flow. The interesting thing about his work is that he completely turned around our thinking about motivation. Most psychologists at that point treated motivation as an input: some kids have it, some kids don't; kids who have it do well, kids who don't do badly. But he said: actually, motivation is an outcome. When you give kids challenging stuff to do, at just the level of challenge they can cope with, they will be motivated, and they will get into this state of flow; whereas if the challenge is low, they become bored. He documented lots of cases of mountain climbers, ballet dancers, and chess players who talk about getting into flow. Those of you who've been involved in computer programming know it too: it's that feeling of "I'll be with you in five minutes, dear", and three hours later you still think it's only five minutes.

So pedagogies of engagement are important; but why pedagogies of contingency? Well, as I said earlier, it's because learning is unpredictable. We've done a good job of getting assessments to evaluate institutions and to describe individuals, but we haven't done a good job of using assessment to actually support learning. And that's why formative assessment is so important: because we can't predict the learning, we have to monitor the quality of the learning constantly, while it's taking place. Now, you may think that's just my opinion, but the research says it's actually the most effective improvement you can make to teaching. Beginning with Gary Natriello in 1987, over the last 20 years there have been a series of reviews which, between them, synthesise around 4,000 research studies, and they find consistent, substantial effects. I want to focus on Jeffrey Nyquist's work, which is not very well known, perhaps because it focuses on higher education. He looked at different kinds of feedback: knowledge of results; knowledge of results plus knowledge of the correct results, telling students what they got wrong and what the correct answers were; giving them an explanation of some kind; giving them specific actions to take for reducing the gap between where they are and where they need to be; and, most sophisticated, what he calls strong formative assessment, where you actually give them an activity to do to close the gap. He found about 180 studies: 31 on weak feedback (knowledge of results only), with an average effect size of 0.14 standard deviations; 48 studies on feedback, with an average effect size of 0.36; and so on up the scale.
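By way of background (this gloss is mine, not from the talk): an effect size in this literature is the standardised mean difference between the treated group and the control group,

$$ d = \frac{\bar{x}_{\text{feedback}} - \bar{x}_{\text{control}}}{s_{\text{pooled}}} $$

so an effect size of 0.36 means the average student receiving that kind of feedback scored about 0.36 standard deviations above the control-group mean.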
And the important thing about this table is that the more faithfully the principles of effective formative assessment are instantiated, the bigger the effects. What's interesting, from the point of view of learning technology, is that most learning technologies have got stuck in the feedback-only mode, and that's why their effectiveness is disappointing. You get something like twice the effect when you actually engage students in activities designed to close the gap.

Just how effective is this? Well, one of the things I think we don't do very well in education is cost-benefit analysis. We say "this has a significant impact on student learning". Fine: how big an effect, and how much did it cost? Class size reduction: reducing class size by 30% gives you about a 20% increase in the speed of learning, but it costs around 20,000 pounds per classroom per year. If you increase teacher content knowledge by one standard deviation, you get about a 5% increase in the rate of learning (much smaller than you might think), and nobody knows how much that costs, because nobody has yet managed to do it at scale. If you get teachers doing formative assessment in their classrooms, you get about a 75% increase in the speed of learning, and it costs about 2,000 pounds per classroom. In other words, pound for pound, formative assessment buys something like thirty to forty times as much extra learning as class size reduction. So that's why I'm advocating formative assessment. I don't start from the technology; for me, the driver is formative assessment, and the question is whether technology can help, rather than "technology is the answer: now, what was the question?" So the search, for me, has been around what kinds of roles technology can play in helping teachers do effective formative assessment, because that's where we're going to get the really big impacts on student achievement.

Think of it as three generations of pedagogy. The first generation is traditional pedagogy, the kind of chalk and talk, where you have negligible contingency: I just say stuff to you and hope you get some of it, or I polish my presentation and maybe get it a bit better, but I have no feedback at all. The second generation is all-student response systems: as I go, I collect information on everybody, and the degree of contingency depends entirely on the teacher's skill. What I'm going to argue is that the role of technology in improving learning is primarily in what I call third generation pedagogies, where we have automated aggregation technologies which take the responses of different students, do some smart things with them, and give the teacher advice about sensible next steps. The really brilliant teachers are doing this already, but most teachers can't. So the challenge of third generation pedagogy is to have the contingency (what you do when you know that the teaching didn't work quite the way you intended) supported by technology.

There is a paradigm evolving in America called evidence-centred design. Basically, you design assessments starting from what it is you want them to do. That doesn't sound very radical, but for assessment it is. And Almond, Steinberg, and Mislevy have set out what they call a four-process architecture for assessment: the selection of tasks, the presentation of tasks, the identification of evidence arising from the student's performance, and a way of accumulating that evidence. I'm going to say a bit about each of those in turn.
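To make the four processes concrete, here is a minimal sketch in Python. The class and function names are my own illustrative inventions, not from Almond, Steinberg, and Mislevy or from any ETS system:

```python
# A minimal sketch of a four-process assessment architecture: activity
# selection, presentation, evidence identification, evidence accumulation.
# All names here are illustrative assumptions, not a real system's API.
from dataclasses import dataclass, field

@dataclass
class Task:
    prompt: str
    key: set[str]  # the correct combination of options

@dataclass
class StudentModel:
    # running record of observables per skill; a real system is far richer
    evidence: dict[str, list[bool]] = field(default_factory=dict)

def select_task(tasks: list[Task], model: StudentModel) -> Task:
    # Activity selection: here, simply the next task; an adaptive system
    # would choose based on the current student model.
    return tasks[0]

def present(task: Task) -> set[str]:
    # Presentation: deliver the task and capture the work product.
    # Simulated here; in a classroom this is the response system.
    print(task.prompt)
    return {"B", "D"}

def identify_evidence(task: Task, response: set[str]) -> bool:
    # Evidence identification: reduce the work product to an observable.
    return response == task.key

def accumulate(model: StudentModel, skill: str, observed: bool) -> None:
    # Evidence accumulation: fold the observable into the student model.
    model.evidence.setdefault(skill, []).append(observed)

model = StudentModel()
task = select_task([Task("In which triangles is a^2 + b^2 = c^2?", {"B", "D"})], model)
accumulate(model, "pythagoras", identify_evidence(task, present(task)))
print(model)  # StudentModel(evidence={'pythagoras': [True]})
```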
So, first, this is about questioning. Take this question: look at the following sequence; which is the best rule to describe it? Well, the correct answer is all of them, because depending on what n is, any of these can be right. I don't learn anything about your thinking just by knowing which one you chose. I have to follow up with "Well, why did you choose that?" to get at your reasons. Compare that with this item: in which of these right-angled triangles is A squared plus B squared equal to C squared? The correct answers are B and D. Now, if I'd given you lettered cards with A, B, C, D, E, F on them and you had to hold up the correct answers, that would be an all-student response system. There'd be no way for you to hide, because I'd say, "You haven't given me a choice yet." And the point is this: B and D are the correct answers, so if you hold up B and D, you're correct, and if you hold up anything else, you're wrong. But you might be wrong in an interesting way. You might hold up just B, or you might say all of them. What I'm saying is that by constructing this question in a smart way, just from knowing what you chose, I get very, very good information. Now, if I'm teaching, I can do a quick check. If everybody gives me B and D, I move on. If everybody gets it wrong, I do it again, slower and louder, like my teachers did. But if half of you get it right and half get it wrong, I can say: you thought it was B, you thought it was B and D; why? And away you go with a good discussion. The design of the question allows you to make those very strong inferences. And your chances of getting this right by guessing are incredibly small, because there are six options and the solution space is two to the power of six: your chance of getting the right combination by guesswork is one in 64, not one in four or five, as with a typical multiple-choice question. So: the right tasks. Now, what's interesting is that you can't use this item with the clickers that are proliferating across higher education, because the clickers, or most of those systems, only allow one correct answer, and so you've got this problem of kids getting it right by guessing. Well-designed questions with multiple correct answers give you a very, very small solution set compared to the whole space, and give you a really strong warrant.
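To make the guessing arithmetic and the quick-check routine concrete, here is a sketch. The 80% threshold and the simulated class are my own illustrative choices; only the key of {B, D} and the move-on/reteach/discuss rule come from the talk:

```python
from fractions import Fraction

# Six options, each independently selected or not: 2**6 = 64 possible
# response sets, so blind guessing hits the key {B, D} once in 64 tries,
# versus one in four or five for a single-answer multiple-choice item.
OPTIONS = ["A", "B", "C", "D", "E", "F"]
KEY = {"B", "D"}
print(Fraction(1, 2 ** len(OPTIONS)))  # 1/64

def next_step(responses: list[set[str]], threshold: float = 0.8) -> str:
    """Whole-class contingency rule, as described in the talk: nearly all
    right -> move on; nearly all wrong -> reteach; a split -> discussion."""
    share_right = sum(1 for r in responses if r == KEY) / len(responses)
    if share_right >= threshold:
        return "move on"
    if share_right <= 1 - threshold:
        return "reteach the idea a different way"
    return "pair up students with different answers and discuss"

# Example: 14 of 30 students give the key; the rest give diagnosable errors.
responses = [{"B", "D"}] * 14 + [{"B"}] * 10 + [set("ABCDEF")] * 6
print(next_step(responses))  # pair up students with different answers and discuss
```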
An example from science: ice cubes are added to a glass of water; what happens to the level of the water as the ice cubes melt? All the answers are defensible. A could be true, because of evaporation. B, the level stays the same, is what you'd say if you had a good physics teacher: floating ice displaces its own weight of water. C is a good answer if the ice cubes aren't floating freely but are piled up, like Scotch on the rocks. And D, you cannot tell, is actually the correct answer, because I didn't tell you what temperature the water was. It turns out, in fact, that the physics teacher's answer isn't quite right either, because as the ice melts it cools the water down, and water contracts anomalously between 0 and 4 degrees Celsius, so the level changes slightly. This is a great question to have a good discussion about; but there's no point asking it unless you've got the time to hear people's answers and have an argument.

Compare it with this one: a ball sitting on a table is not moving. Why is it not moving? Because no force is pushing or pulling on the ball: that's the common misconception. Gravity is pulling down but the table is in the way: you can't see anything wrong with that, can you? Gravity pulling down, table in the way. The table pushes up with the same force that gravity pulls down: that's obviously what science teachers are looking for. The ball is held on the table by gravity: hmm, that looks pretty good too. There's a force inside the ball keeping it from rolling off the table: that's not correct, obviously, but it's a common misconception, and it comes from children thinking about inertia as a force rather than as a property of matter. The interesting thing is that this is a great question for checking students' understanding of physics precisely because B and D are correct statements, but they're not physics. If all the class give the answer C, then you know they've got the point you were trying to get over, about the opposition of forces in equilibrium. Mark Wilson and Karen Draney at the University of California, Berkeley have banks of items like these, which are incredibly powerful: kids almost never get them right for the wrong reason. So if you get the right answer from the kids, you know you can move on quickly. But these items are incredibly hard to come up with.

I quite like this question: what can we do to preserve the ozone layer? Reduce the amount of carbon dioxide we put out there; protect the rain forests; and so on. "Properly dispose of old air conditioners and fridges" doesn't look like a serious option; it looks as if the item writer ran out of ideas for the last option and thought, I'll put something really stupid in there to see if they're awake. Unfortunately, it's the correct answer. It's the only correct answer, because the others are all about the greenhouse effect, not the depletion of the ozone layer, which was caused by the release of chlorofluorocarbons, which came from the disposal of old air conditioners and fridges. So good questions can take a while to come up with. In English, there are literature questions of the form "was this character good or bad?": great questions, but you need time to discuss them. Whereas "which word in this sentence is the verb?" makes a very good all-student response question. This one is even more sophisticated: which of these is the best thesis statement? This is relevant to a particular genre of writing, persuasive writing, in the USA. C is actually the best thesis statement. They're all credible thesis statements, but some are not as good as C; and the important thing is that the teacher knows that a kid who chooses C and rejects D and E (which are also theses, but not theses within the genre of persuasive writing) really gets it. The fact that the plausible distractors are so good is what makes it a powerful item. But these items are very hard to come up with.

We call these hinge questions: questions based on an important concept that is critical for students to understand. And when we use them with teachers, mediated through teacher intelligence, we say to them: you must be able to collect and interpret the responses from all students in 30 seconds. So you can't get kids to explain their answers at this point. Teachers say to me, "Oh, I give every child the chance to explain their answer." But they never do, because by the time you've heard from the 23rd child, the rest of the class has lost the will to live. So what are we saying, if we're going to have technology helping learning?
The first thing, which technology cannot help us with, and which the clickers can't help us with, is questions that are worth asking. That's a skill and a craft we're only beginning to get to grips with. And you can use these kinds of questions for very low-order things too. For example, instead of giving a test on figurative language, you just give kids cards with A through H on them and read out items: "He was a bull in a china shop", which is that? And you hope the kids say it's a metaphor, because the word "like" is missing. The point is you can run through these very quickly, and of course some of them have two correct answers, like "the sweetly smiling sunshine", which is both personification and alliteration.

So good questions can help teachers make instructional decisions in real time, and these kinds of adjustments to instruction at a whole-class level are, the research shows, what makes the biggest difference, creating both pedagogies of engagement and pedagogies of contingency. Because when you require a response from every single kid, there is nowhere to hide in the classroom: everybody has to be engaged, and the teacher is constantly adjusting their teaching. Maybe 5% of teachers can do this currently.

Now, what I'm suggesting is that we need to move towards more sophisticated methods of evidence identification. Currently, the great teachers do this with dry-erase boards: everybody holds up an answer. Give me a fraction between 1/6 and 1/7, and a kid writes 1 over 6.5. Interesting answer; it shows me their thinking. But we need to explore the use of technology to capture that information, so we can begin to do something smarter with it. We've got classroom clickers; we've got keyboards, wired and wireless; and we've got tablet-style notepads. There's also the low-tech version: a classroom with a set of ABCD cards on a string attached to each chair, so the teacher can just say, okay, reach for your cards, give me an answer; it's always there. The notepad is smart because it knows where on the page it is. So you can, for example, if you care about such things, hand out a map of Britain and say, put a cross where Manchester is; the kids do it at their desks on paper, and the teacher can see where all the crosses are. That's the beginning of classroom aggregation technology, because the teacher can begin to aggregate the information from different students. So that's evidence identification. Keyboard systems are being used in lots of classrooms in America; and for the classroom clickers, as I said, the next generation will presumably have a facility for multiple correct responses. Then there's a software package called Discourse. This is a very interesting example: the kid has a screen with a question on it and has to type a response, and the teacher has a screen where they can see the kids' responses as they're being written. So you can, in effect, listen in on child number 13 and notice that they haven't written anything for a while.
But the other thing is that you can then project one child's response to the whole class, either anonymously or with attribution, and use it as a focal point for discussion. And with multiple-choice questions, you can have the scoring done automatically by matching against the key. So Discourse is a good example of current aggregation technology, but it doesn't allow evidence synthesis except for multiple-choice questions.

Now, some of the really exciting stuff is happening at the place I used to work, ETS, where there is evidence identification software for non-multiple-choice answers. There's a package called e-rater, which does automated essay scoring, and it now scores essays more accurately than humans. A scary thought, but actually it's not so scary once you realise how bad humans are at marking this stuff. In many high-stakes exams now, there is one human marker and one automated marker, and where they differ, the script goes to another human for a third marking. What's interesting is how it does this. What most human markers pay attention to are very broad surface-level features: grammar, usage, mechanics such as spelling, style, and organisation (you know, does the last paragraph begin "in conclusion", or "finally", or "to sum up"). It's remarkable, but a package that looks at just those kinds of things captures almost all of what human markers look at. It doesn't look at meaning at all. At the other end, there's a product called c-rater, which is a paraphrase analyser. It takes a short-answer question, like "what are the important principles in photosynthesis?", looks at what kids have written, tries paraphrases of their answers, and sees whether they match the list of right answers it has been given. These are being used in high-stakes examinations, for example in Indiana; and in the Indiana assessments they also use a package called m-rater, which marks graphs and equations in an automated way. The problem is that all these technologies are really good for summative assessment, but not good for formative assessment, because they only help you get to a unidimensional answer.

So here is a chart showing what I think of as the current situation. One dimension is how structured the evidence is; the other is the degree of teacher mediation necessary for the aggregation. Multiple-choice technology works with highly structured evidence: you need no clever algorithm to handle an A, a B, a C, or a D, and we've managed to do a lot of work with highly structured evidence. The ABCD cards are highly structured and teacher-mediated: the teacher looks at all the cards. And because the evidence is highly structured, clickers already do that aggregation pretty well. The big goal is to get something happening in the top right-hand corner, because what we need is automated analysis and synthesis of unstructured information. The reason that's so important for formative purposes is that currently we are only really good at accumulating evidence for unidimensional student models. What we do is use all the evidence to give the students a rank order: she's good, he's not so good, the rest somewhere in between.
These unidimensional student models are useful for summative purposes, but they're almost useless formatively, because all you know is that this kid is better than that kid. It's true, but it's not helpful. If we're going to get serious about formative assessment using technology, we have to develop multidimensional student models. And this is where evidence-centred design becomes very useful, because it uses Bayesian inference networks. What you do is build a proficiency model, which describes what proficient performance looks like; a task model, which describes how the task you're setting relates to the dimensions of proficiency; and evidence models, which describe how you get from the outputs the students produce to evidence. Then you use Bayesian inference to update a student model. So the current cutting edge in this area is trying to build representations of student knowledge, trying to specify what it looks like to be expert in a domain, trying to develop tasks that elicit that evidence, and working out how, in real time, you might use that evidence of student achievement to update those models.
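Here is a toy version of that Bayesian updating, with a single proficiency, three invented levels, and made-up probabilities, purely to show the mechanics; real evidence-centred design models are networks over many proficiency dimensions:

```python
# Toy Bayesian update of a one-dimensional student model. All levels
# and probabilities below are invented for illustration only.
PRIOR = {"novice": 0.4, "developing": 0.4, "secure": 0.2}

# Evidence model: chance of answering the hinge question correctly at
# each proficiency level (the novice rate is the 1-in-64 guess floor).
P_CORRECT = {"novice": 1 / 64, "developing": 0.5, "secure": 0.95}

def update(prior: dict[str, float], correct: bool) -> dict[str, float]:
    # Bayes' rule: posterior is proportional to prior times likelihood.
    likelihood = {
        level: P_CORRECT[level] if correct else 1 - P_CORRECT[level]
        for level in prior
    }
    unnormalised = {level: prior[level] * likelihood[level] for level in prior}
    total = sum(unnormalised.values())
    return {level: p / total for level, p in unnormalised.items()}

print(update(PRIOR, correct=True))
# {'novice': 0.016..., 'developing': 0.505..., 'secure': 0.479...}
# A correct answer shifts belief sharply away from "novice", because
# guessing the right combination of options is so unlikely.
```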
Once you can do that, the hardware and the software, rather than the teacher's bandwidth, can be used to process the information, so that at some point the system can say either "the whole class needs to do this" or "divide the class into the following subgroups". I can imagine software which simply prints out, at the end of a lesson, a seating plan for the next lesson: these four kids need to work together, these five kids need to work together. And it's all based not on a multiple-choice test but on constructed responses from the children, collected automatically, interpreted automatically, and suitably visualised for the teacher. That's the hope; that's the vision of the possibility that I think technology holds out for really supporting learning. It is teacher-mediated, teacher-supported, classroom aggregation technology.

So, to summarise: I've argued that raising achievement is important; that to do so we have to change what happens in classrooms; and that we have to work with, rather than replace, teachers. Specifically, the research evidence shows that you get more improvement in student learning when you change teachers' pedagogy than when you change their subject matter knowledge. I've argued for the importance of pedagogies of engagement and pedagogies of contingency, and I think the role of technology is in helping us move from single-student response systems to all-student response systems, where we collect information from all students in real time and update a student model constantly. But it's not yet working automatically, because I don't think the technology is there. A footnote: some of you may have come across a product called the Cognitive Tutor for algebra, developed at Carnegie Mellon. It's the only piece of educational technology that has been shown to make a real difference to student achievement across a wide range of settings: you get effect sizes of 0.4 to 0.7 standard deviations. But it took 20 years to develop, and it's used for two of the four or five hours per week that kids get on algebra in grade nine in America. Twenty years to get good results for two hours a week, for one year of school. So while that kind of stuff may have a future in the long term, I think that if we're serious about using technology to support learning in the short to medium term, the focus is going to be on classroom aggregation technologies. Thank you very much.

We started a bit late, so I'm hoping we're going to have ten minutes for questions. We've got some roving mics, so if people have any questions they'd like to put to Dylan, please raise your hand so we can see you. We've got somebody down here at the front. While we're waiting for the microphone, let me take one of the questions from our remote participants. Is that okay, Dylan? We have Hannah, from a university in Edinburgh. She asks: are there differences between motivation to learn and motivation to perform? In other words, should we be assessing performance, or learning, or both?

There's quite a bit of literature on this. People like Carol Dweck have looked at performance orientation versus mastery orientation, and there's no doubt that performance orientation is harmful in the long term. So what we need to do is focus students on learning rather than on getting good grades, which is why grades have been so deleterious to student learning. So there is a big difference, and we need to get students to understand that the really important thing is learning, not getting a particular score or grade.

Thank you. Thank you for the presentation, Professor Wiliam. My name is Kanish Bedi, and I am from the University of Birmingham. We are a purely online institution, and we run courses which are accessible to students worldwide. In our assessment systems, we try to incorporate authenticity as far as possible, because our programmes are management programmes, in which there can be more than one correct answer, and answers that are equally correct. We are at a loss to understand exactly how to use technology, because we cannot use objective-type questions; at the same time, we want to retain the authentic approach of our case studies. So what is your suggestion? In what ways can technology be used in this kind of scenario?

First of all, on the problem with multiple choice: the evidence is that it is very difficult, but not impossible, to assess higher-order thinking with multiple-choice questions. The problem with an authentic, case-study approach to assessment is that with a typical student you tend to assess only a small number of cases. So although the marker reliability, if you have the same work marked again by somebody else, may be very high, the reliability across tasks for a given student is actually quite poor, because what you are really measuring is whether they got lucky this time: was that a case study they had revised for, as opposed to one they had forgotten three months ago? The reason multiple-choice tests come into their own is when you need to make sure that you are checking lots of knowledge in different places. And I would say that any system purely predicated on one kind of assessment is bound to be less valid, in terms of the inferences it can support, than an assessment that has some bits where knowledge actually is important.
So, for example, if I were doing an accountancy assessment in America, I would want to know that the people I was certifying knew what the Sarbanes-Oxley Act said, and I would do that with multiple-choice questions; but I wouldn't rely only on multiple-choice questions. It's the diversity of assessment methods that allows you to support different kinds of inferences, and that's what makes the assessment more valid.

Thank you. Any more questions from the audience? We have one down here and one up at the top. While we're waiting, we have a question from Steve at the TLP group. He asks if you could explain a bit more clearly what classroom aggregation technologies are.

That's a fair bit of feedback that I didn't explain it very well, so let me try again. Currently, the best piece of classroom aggregation technology we have is the human brain inside the head of the best teachers. What it does is synthesise all the information the teacher is getting from different students and turn it into a course of action for that teacher. So classroom aggregation is about getting information from all the students, and collecting, aggregating, and synthesising it in a smart way to support action. For those wondering what that looks like: "Global warming, fact or fiction? Thumbs up if you think it's fact, thumbs down if you think it's fiction." That would be a classroom aggregation technology, because I'd be asking the whole group to think about the question and then looking at where we stand. It's anything that allows you to collect information from the class systematically. How do most teachers decide whether they can move on? They ask a question they haven't planned in advance, they pick on one kid who already has a hand up, that kid gives the right answer, and they say, "Good, well done," and on we go. Classroom aggregation technologies are the antidote to that: being more systematic about collecting more information from more of the students before you make a decision.

Thank you. Thank you very much. I think one might say that maybe learning happens despite, not because of, the teaching. I suppose the easiest way to summarise that is to say that the best predictor of a student's grade in a subject is the grade they've got in another subject. If good teachers were making a big difference, you'd expect to see kids getting As in one subject despite getting Ds in another, but we don't tend to see that. So are teachers really making a big contribution, or are students actually learning despite the teaching?

Well, that's why the third generation of school effectiveness studies looks at value-added, because intelligence, IQ, does exactly what you said: the kids who are above average in English are, by and large, above average in chemistry and physics and maths as well. So yes, there's that general factor, and we can't change it. But in terms of the difference between what those kids knew when they were 11 and what they knew at 16 or 18 or 21, it turns out that teachers make a huge difference. You're putting the lens onto a small aspect of what makes the difference, but within that, the best teacher will get a kid a C rather than a D, and the worst teacher will get them an E rather than a D.
So it's a small difference in grades for that kid, but, as I said, it's a fourfold difference in the speed of learning: one year's learning is worth roughly one grade at GCSE.

Okay, thank you. I think we have a question up there at the top. Dylan, some of our remote participants are having difficulty hearing you; they're asking you to speak a bit louder.

We've had a heavy emphasis on personalisation in government policy here, personalisation of learning, and it runs right through current strategy. What do you make of it?

It depends what you mean by personalisation, and I know there's no consensus about what personalisation means within the DfES and its successors. If by personalisation you mean individualisation, then it's a daft idea, because it's not possible, and it's not even smart. But if by personalisation you mean creating learning environments which different students can come into in different ways, and in which the teaching is adaptive, then I'm totally in favour of it. I would argue that formative assessment is all about personalisation in that sense: it's about being more responsive to students, while avoiding the trap of individualisation. Individualisation is impossible, because it would require one-to-one teaching, which we are not prepared to pay for; and secondly, I'm not convinced it's even a good idea, because we have lots of evidence that children and students often learn better from each other than they do from teachers. So for me, personalisation is about opening up teaching, making it more responsive and more engaging, but it's not about individualisation.

Thank you. I think we've got time for two more questions: one down here, and one up there; the mic is on its way down. While we're waiting, a question from George Roberts: how might these ideas transfer to non-classroom-based adult, community, and workplace learning, with a high degree of learner self-direction?

I think it would be very difficult, because the kind of aggregation I'm talking about becomes much less useful when you've got different people learning different things. So I would say that sharing case studies, which such communities already do very well, is probably optimal in those settings; in that sense they're lucky to be working in a domain where the problems I'm trying to solve aren't really problems. But self-directed learning is also very inefficient, because everybody's unique path may or may not be interesting to other people; there's an interesting trade-off around efficiency there. On the whole, though, I'd say what I've presented today is not really relevant to those settings.

Okay. Tatiana: thanks again for a really interesting presentation, and for bringing the focus back from technology to learning. I'll continue my question in that spirit. You've concluded that aggregation technology is an answer for formative assessment in the classroom. But I want to ask about the readiness of the underlying theory for interpreting students' answers, especially for complex, unstructured responses that require a lot of work to make sense of.

That is definitely a problem. Again, what we're trying to do is reproduce what experts already do, and at the moment only experts can make sense of this stuff.
One of the things we are doing in one project, which is currently undergoing a randomised controlled trial in the United States, is to give teachers really high-powered questions focused on the misconceptions students are likely to have, to tell them what those misconceptions are, and to tell them what the different answers mean. It's a way of giving teachers just-in-time subject knowledge. But I think we need to do a lot more work on mapping the kinds of responses students might make, and the kinds of responses you might make to those responses. What is really interesting is that in Japan they have a word for this: for the teacher's knowledge of the kinds of difficulties that students have with particular material, and what to do about them. I think it's telling that they have a word for it, where we just say it's a problem.

Okay, thank you very much, Dylan. That's been a very inspirational session, and I'm sure it will prompt a lot of good discussion, particularly around the area of personalisation. And thank you to Maggie for moderating so well. So, I ask you all to thank Dylan.