Good morning, and welcome to the fourth meeting of the Education and Skills Committee in 2019. I remind everyone to turn their mobile phones to silent so they don't disrupt the meeting. We have received apologies today from Tavish Scott and from Ross Greer. Our first agenda item is our inquiry into Scottish national standardised assessments, and we have two panels of witnesses today. First, I welcome Professor Andy Hargreaves, research professor at Boston College and visiting professor at the University of Ottawa. Can I open, Professor Hargreaves, by asking you to briefly outline your international experience as it relates to the inquiry? Thank you.

Is it Madam Convener? Is that how I address you?

However you like; convener is fine.

First of all, thank you for inviting me to present evidence to this very important committee at this crucial time in Scottish education, in thinking about how best to forge a way forward on an assessment strategy that will benefit all students in Scottish education. I began adult life as a teacher, then became a researcher. I worked in universities in England, and then in 1987 moved to Canada, where I set up the International Centre for Educational Change in Toronto. For the last 15 years I've worked at Boston College, which is not in Boston and not a college: it's 100 metres outside Boston and is a university. It's famous, actually, for the international maths and science studies, which are administered from there, though that's not something I'm directly connected with. I've just moved back to Canada with my family, where I'm a citizen as well as a UK citizen, and am connected with the University of Ottawa. My international experience is that I've done research in a number of countries on educational reform and change, both systemically and in terms of its impact on teachers and the teaching profession.
This is across a range of countries, but not too many, including Singapore, the United States, the UK and Canada, and that's probably about it. I've also done advisory work with governments, sometimes on an occasional basis and sometimes on a more sustained basis. For several years, I was one of six advisers to Premier Kathleen Wynne, who was Premier of Ontario, a province of 13 million people, until May, when her government was defeated in an election. I've also been proud to be one of 10 international advisers to the Scottish Government over the last few years. I've also been engaged with OECD reviews of different countries: the one of Scotland you probably know well, for which there was a team of four of us; just prior to that, one of Wales, which is dealing with similar kinds of issues to Scotland; and some time before that, one of Finland and its leadership strategies. I'm not really known as a measurement specialist, so if you ask me anything about technical items or design or validity and reliability testing, my answers will be extremely disappointing. But what I do see, as I do work on change in schools, school systems and societies, is that assessment comes across the radar a lot in terms of its connection to everything else. So I'm really very concerned with that, and I think what I can best help you with is how assessment is, in benign and less benign ways, interconnected with other parts of the improvement agenda.

Thank you very much, Professor Hargreaves. I'm going to ask Liz Smith to begin.

Good morning, Professor Hargreaves, and thank you very much for providing us with all that international experience, in which we're extremely interested. You've obviously outlined your experience in the different countries and mentioned the OECD, from which we were given a set of six criteria that we need to adhere to if we're going to make attainment effective.
Just in light of your final comment there, would you be able to start this morning's evidence session by providing us with some examples from your international experience where you feel that schools have managed to improve their outcomes for young people? And if you can relate that to the standardised assessments that they've used. I know you can't go into the technical details, as you said, but just where you feel Scotland could learn some lessons from that international experience would be very helpful to the committee.

First, the important thing is to learn and not to copy. Whether with a teacher or a country, I always advise: never look at one model and copy it. No single model will have everything that you want, but if you have a number of models that you look at, then you are empowered to learn what it is from them that is most relevant to you and to your country. One that many people go to is Finland. I'm a great fan of Finland; if I had to live anywhere else, that's where it would be. It's one of the happiest countries on earth. It's a nation that values learning immensely. It has very low achievement gaps. Statistically, there is almost an incidental relationship between family background and achievement in Finland, so it's of very high interest, I think, for everybody internationally, because it performs well on overall performance and also on equity. The assessment system in Finland, until secondary school leaving, is one that is based on samples in any system-wide sense, rather than a census. Many people, including from time to time myself, are extremely interested in the idea of a sample as a way of preventing people from teaching to the test or from gaming the system.
So I think the benefit of Finland that we can learn from is that most assessment is directed to improving learning and is done, chosen and developed within schools, with some collaboration within and across municipalities, the equivalent of our local authorities. The difficulty with transposing the Finnish model to other places is something that we considered in the last few months in Ontario, when the six advisers conducted an assessment review for the province. We seriously considered the arguments of a sample versus a census. Finland is not very diverse as a country at this moment, although it may become so increasingly over time. If you are diverse as a country and you have wider inequities, which we do in Scotland, and in fact Scotland is not unusual in that sense, then you do need to be able to identify which populations are in greatest need. For instance, in Ontario, in Canada, the most persuasive argument that I heard as a fellow adviser about the need for a census rather than a sample was from one of my Caribbean Canadian colleagues, who felt that there was a neglect, and there is, in Ontario, of historically black Canadians. These are not recent immigrants and refugees, who get a lot of attention, but black Canadians who go back sometimes to slavery and the underground railroad and are one of the most vulnerable groups in terms of disadvantage. He felt that having data that would enable you to identify exactly when and where those groups were being overlooked was essential to equity. I'm beginning with an example of something that's really promising and uses a sample, but I've been persuaded that in some cases, where there is great inequity and increasing diversity, some kind of census can be more beneficial. If you look at other countries that use large-scale standardised assessments, which Finland doesn't, first of all you have to disconnect the words "large scale" from "standardised".
Something I've seen in your previous documents that I've been looking through is that, of course, many teachers everywhere use standardised assessments. They're just not large scale, so teachers may use a different one in this school than they do in that school or in another school. There are very good standardised assessments, reliability and validity tested, in literacy, in mathematics and so on. The issue is large-scale standardised assessments: can they bring about improvement that is authentic? I can give you many examples where they bring about improvement that is not authentic. The improvements that have been documented numerically in the United States and in England have been soundly denounced by the statistical societies of both countries as being statistically impossible without in some ways faking or fabricating the results, or the practices that lead to the results. That is true even in Ontario, which is mid-stakes rather than high-stakes, and so perhaps one of the best examples that we might consider. When I say it's not so high-stakes, I mean that high-stakes assessments give you the power to intervene, to punish, to remove head teachers from the school, to close the school and open it as another kind of school. Ontario doesn't use those sanctions and provides a lot of support, but it is mid-stakes, and those are the ones that we probably have to pay attention to here, where knowledge of the results and the patterns can lead some school district directors, sometimes with pressure from the central government, to exert undue pressure on their schools to raise their results over relatively short periods of time, and this creates all the negative impacts we know of large-scale assessments. So even in Ontario, mid-stakes rather than high-stakes produces some negative consequences. Influenced by Scotland, we spent some time on the review trying to figure out ways to maintain a large-scale assessment without those negative impacts.

That's extremely helpful.
One of the dilemmas that, it is fair to say, has been flagged up to us in the previous two committee sessions that we've had on this issue of attainment is the fact that the tests that might be used to foster better learning for the individual child might be slightly different from those used if you're trying to spot where there are problems within the education system. Just from what you've said about your international experience, it seems that you're making a similar point there. We've got to grapple with the fact that not only do we want to raise attainment for the youngsters involved, but we also have to be able to use the testing in schools to identify schools or local authorities that need greater support. Can you comment on that dilemma, because I think that it's a very real one in Scottish education?

Sure. I think that it's the biggest dilemma. Some people think that the dilemma is learning versus accountability. Where accountability is closely connected to things like parental choice of school, publication of the results and so on, that is a big dilemma. For professionals, however, the dilemma is between, on the one hand, supporting the teacher with information that would help them to help their students more effectively and, on the other, the needs of people who can't know all their students but are responsible for them: a headteacher of a large school, or a new headteacher who wants some knowledge of where the school is so that she or he can help lead the school ahead, or a director of a local authority, who needs and wants some kind of system-wide data so that they can see where everybody is and be able to intervene and support as needed if people are falling behind. So the biggest dilemma is actually not with accountability, but with the need for the system to know where it is and not be thrashing around in the dark, especially if it's a larger system.
In Ontario, what we recommended, though it hasn't been implemented because there was a change of government, though it was accepted by the previous government and by another of the main parties, so it was accepted by two of the three parties, was to create a kind of firewall between the standardised assessments and the individual diagnostic assessments within the school. Just like here, we do not have total confidence that that will work; we're on the front edge and we're in somewhat uncertain territory. I think that where the world is moving, and you're at the head of this, is that, I'd say, five years ago, systems around the world were in denial that large-scale standardised assessments had negative consequences for students, for their learning and wellbeing, and also for the teaching profession responsible for them. I think that denial is disappearing very quickly everywhere. We're all starting to own the problem and say: how can we have large-scale information and also good diagnostic and formative support for teachers? The Ontario answer was to try to create a firewall and to say that the large-scale assessment agency should do this: it collects the results, and about ten months later everybody gets to see them and knows where the system is, but they're useless to the teacher in terms of giving feedback to their children. At the same time, it will provide lots of support with other kinds of instruments and processes to help teachers with assessment for learning. I think that the solution being tried here is different, which is to say: how do we use large-scale assessments to inform teachers' professional judgment? Local authorities will have knowledge of their schools, but local authorities will not be able to compare each other on the basis of the test results. It will be on the basis of teachers' professional judgment, part of which is informed by the test results. You are on the front edge here for the world.
It's good that you're watching the world, but the world is really watching you. Figuring out how to make this a success over the next three years, given the possibility that it may not be, and being a learning government as much as an improving government, is the key challenge.

On the question of purpose, I'm not sure whether you think that it's necessary for the one test to do the two things that you've identified. Would another solution be to have a standardised test that informs the nature of the system and diagnostic testing that supports the child, which wouldn't have to be standardised? Is it the standardised bit that matters? We've had some discussion in the committee on purpose, and you're probably aware that the OECD review in 2011 really took the view that there should be one clear purpose, and that it's complicated if there's more than one purpose. We now have a situation in which the Scottish Government says both that it is a national survey and that it's a diagnostic test. Do you think that that confuses the issue?

I think that there's a general principle that many people accept, but not all, that data collected for one purpose should not be used for another. But the statement there is that data collected for one purpose should not be used for another; it does not say that data should not be collected for two purposes.

I think that the point is that there's a lack of clarity about what the purpose is. The OECD suggests that there should be one purpose. We can argue why there has been a change, but it shifted from being just about having an understanding of what's happening across the system to almost a justification that it's also a benefit to the child. Do you think that that has an impact on the way in which the test itself might be structured?

Everything that you say is fair. The OECD was really saying that the prime purpose of assessment, if you like, was for assessment to support learning.
There are really four message systems in schooling: pedagogy, curriculum, assessment, and a fourth, which is a broad area of care and support for the child and their development. Assessment is one of those. Having a deliberate strategy that can develop teachers' expertise in assessment to support their students' learning should always be the prime directive. There is also a need, at the same time, to align those assessments with the curriculum for excellence and, of course, with the national improvement framework. These are like a Venn diagram: they're both important, but they're sometimes somewhat in tension with each other. We have to be very careful about which is the moon and which is the sun, so that one part of the Venn diagram does not start to eclipse the other and the curriculum for excellence does not recede into the background as the national improvement framework takes over. As advisers to the Scottish Government, we're always urging them to remain vigilant to keep the focus on both. The OECD suggested, proposed, recommended that there should be alignment with the curriculum for excellence and with progress in the curriculum for excellence. We do also need knowledge of whether progress is being made. At the same time, the OECD proposed and, as advisers, we've continued to recommend at every point, which has been accepted if you follow the public domain media on our recommendations, that this moves through teacher judgment. It is not the large-scale assessments that have a direct impact on all kinds of other decisions; they are mediated through teachers' professional judgments. The theory of change going on here, in a way, is that if there's any aggregation at any point, which there is, it's an aggregation that tries to create consistency in teachers' professional judgments. The judgment is really important, but there's a clear understanding, as we all know, that all individual judgments are flawed. We're all subject to unconscious bias.
We all tend to prefer the people who remind us of ourselves. Getting consistency of judgment means that wherever I am as a student, in any class, I will get a reasonably equal and professional response from the teachers who deal with me. The theory of change is this crucial thing, which is different from Ontario: a kind of buffer of teachers' professional judgment between the large-scale assessments that the kids take on the screen and what it is that teachers do in the classroom with their children. That's the theory of change, and it is a challenge to make it work. It's different from one assessment being developed for one purpose and used for another; it is more complicated than that.

That's precisely what's happening currently: one has become the other, in order perhaps to persuade people that it's a good idea. I wonder whether there's an issue, and perhaps there's some international evidence, about something that's emerged: consistency in actually doing the test. We're told by the advisers to the Scottish Government that teachers can do the test at any point during the year. For example, in primary one, that means that they can do it at any stage between the ages of 4.5 and 6. Is that valid? Or would you take the view, as some of our panel members did last week, that in order for the findings to be informative and valid at a national level, there has to be some consistency both in the stage in the year when the tests are taken and in the circumstances in which they're taken? We're hearing some anecdotal evidence about some teachers prepping the kids for it and others not, all these kinds of things. Perhaps they're factors that don't matter, but I wonder whether you have a view on the validity of something when it's not consistently applied.

So, again, as this or any other assessment system unfolds, it will contain risks.
Knowing what those risks are, and you've just covered one of the most serious ones, I think, is really important. Any and every system of collecting data about a child and of aggregating that data is imperfect. I remember, at the age of 7, the first test I ever took; you may remember the first test you ever took. I was called up to the head teacher's desk to do a reading test. I was at the equivalent of P1, actually. I remember the last word I could pronounce and the first and only word I couldn't, when the test stopped. The last word I could pronounce was pneumonia, and I had to give the meaning of it, which frankly wasn't bad for 7. The first word I couldn't pronounce, and I still can't pronounce it now, was phthisis, P-H-T-H-I-S-I-S. Why on earth a test would list two words about pulmonary wasting diseases in successive order for a 7-year-old is beyond me. All I felt was that this test was very important, but I didn't know for sure until 10 years ago, when the governors of my former school sent me class lists from when I was in the school, that the test was being used to decide who went into the A-stream and who went into the B-stream. I also had the class lists of the same children at 11, which were almost identical, because we know from the evidence of the time that only about 2% of children transferred. The lists also showed which secondary schools they went to: 70% of the A-stream went to grammar schools, and 0% of the B-stream did; they went to vocational secondary modern schools. All of that was decided at 7, and we know that those tests were flawed and that the 11-plus was flawed. But then we also found that, when the 11-plus was abolished or replaced with teachers' judgment, the results of the selection were actually more social-class biased under teachers' and head teachers' judgment than they were under an objective test.
So the first thing I want to reaffirm is that if you're looking for a nirvana of the perfectly consistent way of making judgments or the perfectly consistent way of doing tests, you'll be disappointed. They'll all be imperfect, to different degrees and in different ways. The thing to avoid, on the one hand, is treating teachers' judgment as individual autonomous judgment. What we need in the teaching profession, which we've argued for here, is collective autonomy, not individual autonomy. That means we may have more autonomy from the bureaucracy, but we have less autonomy from each other. By looking at the ways we make judgments together, by moderating them, we will create some consistency over time. These data can help teachers do that, but the data will always be imperfect, depending on whether you're sick on the day, whether you're tired, whether you do it at the end of the week or the end of the day rather than the beginning of the day, et cetera. The biggest risk for me of the kind that you've outlined with the test is not just that it may happen accidentally but that it may happen systemically. That risk is that, if there is undue pressure from the Scottish Government or undue pressure from local authorities to drive results up in a short period of time, to demonstrate success within a period of taking on leadership or before an election, that pressure will and does lead teachers to do strange things that are utterly predictable. For instance, if I were cynically advising a school now, I'd say that if you want to show improvement in your results over three years, first, introduce the test without any preparation or professional development, so that in the first year you'll do badly and you'll have an artificial low for your baseline. Once you've had a bit of professional development, everybody will do the test better, so you'll have the appearance of an improvement over time.
Secondly, in the first year or second year that you do this, test all the children early in the year, so you're testing them when they're younger. A couple of years later, test them all at the end of the year, when they've had a bit more practice and preparation and learned a bit more, and then you'll get better results over time. All across the world, where tests are truly high stakes and punitive consequences can follow, these kinds of practices go on. You cannot really alter that much technically, although it is a good thing to allow differences in the time when the test is taken, because of things like student anxiety and readiness, obviating the sense of a dramatic event, and so on. But the way that you deal with these imperfections is really by creating a culture of assessment and a culture of improvement, where everybody is genuinely focused on improvement, including accepting those moments when you were unsuccessful and need to identify a different way of moving forward.

My last question comes from somebody who sat in a class of 45, where we were literally tested every week and seated at desks from first to 45th, so you knew if you were the 45th person in the class, not only because the marks were given out but because of where we physically sat in the classroom. So I know the challenges of what are apparently objective tests, but I am also aware of what a teacher brings into a classroom in terms of assumptions. I suppose my question to you is: is there a danger in objective testing that what we're doing is reinforcing that? For example, if a test is trying to assess capacity and language in literacy and numeracy, is there a danger that you're reinforcing what children bring into the classroom, in simple terms of the words that they know? It's not that they can't read or that they're not numerate, but they have less of a richness in the language that they hear at home or in their community, and we're then saying that that says something about their literacy.
I think that, in some of the questions that I've seen in the test, there is a question of whether somebody has sat and told you what a word means, as opposed to your capacity to be able to decode and say what that word is. That in itself matters precisely because of what you've said: in the past, people were conscious of their bias and were trying to deal with it. If you have a theoretically objective test that is actually doing the same thing, do you not recognise, or would you accept, that that can do a lot of harm?

I would say absolutely. There's a lot of evidence to support what you're saying: in a high-stakes or even a mid-stakes scenario, when there's a test in, say, primary three, and that's the first test people get, kids start rehearsing the words in kindergarten. The words they're rehearsing from the first moment they enter school are all geared to preparation for the test, not so much because of the existence of the test but because of the stakes that are attached to it, in terms of a school's improvement record, the pressure that is placed on it and the interventions that can be made. There is no way to resolve that within a census test other than lowering the stakes from high to mid and, in fact, to low stakes, so that you don't have a culture of fear or anxiety, or a feeling that you always have to demonstrate improvements or there will be unwanted consequences, and to build a culture within the teaching profession, amongst the head teachers and also in the RICs, the regional improvement collaboratives, where all leaders clearly understand that the purpose here is to learn and find ways to keep moving forward, and never to create cultures of fear or anxiety that will lead people to contrive the results.

It's a different point I'm making; I wonder whether you can reflect on this.
It is not about what is taught or what is practised, but what a child brings into the classroom. A child can be very competent, very able and know how to read for their age and stage, but there are certain words that they will not know because they have not come across that vocabulary. I wonder whether there is a danger in what some of the testing does. You talked earlier about diversity in Canada and so on. Is what you are reflecting as a competent reader somebody who has had access to a particular experience that has given them the vocabulary to understand and respond to that question? How do you take that bias out of a test, and have you looked at the testing regime in Scotland to see whether you think that some of what is in there is bias, as opposed to an expectation in terms of skills?

I actually took the P1 test yesterday and apparently I did quite well, although I did not find all the questions easy, so at least I have some direct experience, as an adult at least, of what this looks like. All tests, particularly where words are involved, are prone to cultural bias, so in Ontario we found questions that involved things like appetizers on a menu, which for children in poverty is just something totally outside their experience. If you have the test, the way you deal with this is in two ways, actually three. One is to continue to review and modify, so never feel that you have got the test and that it is not subject to review and improvement. Secondly, in terms of accommodations, you may want to offer accommodations not just for children with legally identified needs that bring mandatory statutory supports with them but for all children who struggle with some aspect of their learning. As you know, in Finland, by the time you finish school, 50 per cent of you will have been identified as having a special need. It is not a medical condition; it is just a way in which you struggle with your learning.
Those are two of the things. The third is the genuine importance of having an array of assessment measures, data and information, of which this is simply one. The primacy all the time must be teachers' judgment and, if it does not stay there or starts to deviate from there, then you are facing a problem. It is a serious possibility that this great experiment will have failed, but all I would say to you is that I would ask you to own the problem that two things are needed. One is knowledge to support the child wherever they are, and two is knowledge of how to support the system, so that you know the system, just as you are responsible for Scotland's people and the head of a local authority is responsible for all the children in that authority. I would ask you to own the problem that you do need these two things, that they are a dilemma, and to seek the best way forward to resolve that and not favour one over the other or deny the nature of the dilemma.

I thought that that last comment got to the heart of what the committee is struggling with, and I am not entirely clear what your judgment of it is. The question that I think we are asking ourselves is about the SNSAs, these particular tests, for example the one that you said you did yesterday. Can they provide the teacher with the capacity to improve the learning strategies that they pursue with an individual child in order to improve their learning and, at the same time, provide system-wide information about what the system is doing? Can that test provide both of those things with validity?

So, just to remember: the test should be considered to be one thing out of all the data that a group of teachers has. I do not like to think of an individual teacher, because all professions are collective, not individual; if you cannot share your expertise, you should not be in the profession, in any profession. This is part of the data, and it should not prevail over all the other data that informs your judgment.
You may use other kinds of reading assessments if you are searching for other sorts of reading skills that are not covered by the test. The test, for example, as I see it, is largely about comprehension and reflects a worldwide movement to understand what it is that you are seeing in a narrative. It does not test, as far as I can see, creation of ideas or generation of your own sentence constructions and so on. To test that, you would need other kinds of tests or knowledge, including your knowledge of the child. So it will give you some information about some things that are important for you, for the parents and for Scottish education, but by no means all of it.

That is a very powerful argument, but my question really was whether it can provide that data at the individual diagnostic level, to be used in the classroom, and at the same time provide system-wide information at school, local authority and, in particular, national level. If the answer is that it is only one part of the data that we have, would you agree with some of our previous witnesses that it would have made sense, for example, to have kept the SSLN survey data, perhaps alongside this, to enrich the data available at a system level?

At the individual teacher level, have you taken the test? How did you do? You will have seen the individual report cards that go back. It would, frankly, need someone like a reading specialist or an early childhood specialist, rather than me, to be able to say what worth or value that would have for a classroom teacher. Some of the EIS's feedback, which you will probably have seen in your testimony, is saying that, in the first year at least, teachers do get value from this kind of feedback and it does help them identify some of the ways in which they can support their children. Of course, not all teachers feed their views in through the EIS.
They also come in other ways, but it is one indication that some people think that it is contributing the kind of feedback that is useful for their own students. Is that also the other half of the question that you are asking? On those skills that are identified by the test, it will give you information, fed into teachers' judgments, about how a system overall is moving or not moving over time, and how subgroups within that system are moving or not moving over time. In terms of the test itself, and I am repeating everything you know, there is national knowledge of the test, but not in ways where the nation can intervene in a particular school or with a particular teacher because of their performance on the test. It comes back over and over again to how you deal with teacher judgment. I may be anticipating a question to come. Professional development, in that sense, should not be seen only or mainly as training courses in how to do the test. That is part of what professional development is, but the research on professional development in the UK and the US shows that the best professional development is ongoing, it is embedded, it is seen as directly related to the learning, and it is collaborative. If the leaders of your schools and your local authorities are continuously bringing together their teachers to see what is happening to the judgments based on all the data that they are receiving, that is what will create the consistency between the individual feedback and the national-level trends.

You have made very clear the importance that you attach to teacher judgment. In fact, you said just a moment ago that primacy must be given to teachers' judgment. You have obviously, Professor Hargreaves, reviewed a lot of the evidence that the committee has received on this.
It is fair to say that, if we look at the evidence that we have received from teachers, as individuals and collectively through the EIS, there is a very significant judgment there that says that those tests do not provide useful information in the classroom for learning and teaching strategies. Should that not be an alarm bell for the committee?

I think that it should be a warning, and it should be a prompt for the Scottish Government to consider, and to work with ACER on, what is contained in the tests. If the breakdown of the skills is not seen as valuable or useful, then it is necessary for teachers collectively to be able to say what skills and competencies should be represented. It is not a reason to do away with the tests, but it is a reason to say what kind of test will be most valid for the skills that are important for CFE.

When the designers of the test gave evidence to the committee, I asked them how teachers had been involved from the start in the early design of the test, and they could not indicate any involvement from teachers at all. Do you think that that is a mistake? Sorry, could you just repeat the last part? The designers of the test gave us evidence, and I asked about input from teachers in the initial design of the tests, and there was none. Do you think that is a mistake? With most tests, they do involve teacher participation in the design. The danger then is that, because you think that there has been participation, that is it forever: now the test has been validity and reliability tested, and you can move it anywhere, at any time, to any country in any place, and it will last in perpetuity. Teachers need to feel continuously involved with all the assessments that inform their judgments. It is important to have participation as a one-time thing at the beginning, but it is also important to have that continuous loop of feedback. I am very conscious of time and we have a number of members still to come in, so I am going to bring in Rona Mackay very quickly.
On the same line of questioning, I wondered whether you thought that those tests are compatible with play-based learning. You will be aware that there is a body of opinion that thinks that they are not, and that thinks that our children are being tested too much. I wonder if I could have your thoughts on that.

First of all, you need a clear philosophy and stance on what you want your early childhood education to look like, up to and including P1. There are raging debates at the moment, and there are people sitting behind me who will have more knowledge of and even stronger views on this than I do. Play is an extremely important part of childhood. We know that the evidence is emerging now very clearly: young children are spending too much time on screens and not enough time engaged in other things; they are spending too much time indoors and not enough time outdoors. At the same time, privileged parents will read to their children from a very young age, and those children will have a mastery of a large vocabulary and range of words from a very young age, and other children won't. That is a fairly strong predictor of all kinds of indicators of later success, including rates of imprisonment and employment, or what you think of here as positive destinations, for example. In an equal society like Finland, where there is more subscription to public libraries than in any other nation in the world, you can afford to have a philosophy of early childhood that is predominantly about free play.
In a society that is unequal, where there are huge disparities in access to language at home, for example, it is important to consider, on the grounds of equity, some areas of play that are more structured. They will still be forms of play, but, as I have seen in Ontario, for example, they will provide ways of engaging with numbers or number sense that are still very playful and very enjoyable, but structured to try to progress children who have less behind them when they come to school, so that they have as much chance as all other children.

I totally understand what you are saying, but I am talking about this in relation to these tests. Are the tests compatible with the play-based learning that we are promoting? Can the two co-exist happily? Do you think that tests at a very early stage are necessary, and are they providing value? Going back to Iain Gray's point, what actual value can we get from tests at such an early age?

The test is a test of literacy; or rather, it is not a test of literacy as a whole, it is a test of comprehension, primarily, of reading. If developing reading to a certain degree is important within your curriculum, then the test will have some value. Is the experience of a test itself incompatible with a play-based environment? The test doesn't have a lot of bells and whistles. It probably could, but at the moment, apparently, the reason for that is the broadband bandwidth in some of your schools. If you had more bandwidth, you could have fancier tests that are even more playful and enjoyable. Even my own grandchildren, and possibly people's own children here, will sometimes learn maths and other things by doing games on computers as well as by physically playing with objects.
What I would say is that, although I am broadly not in favour of a lot of technology in early childhood, a bit of familiarisation with technology where possible in the classroom, so that when children take the test it is not the first time that they have faced this, would make it seem less like an extraneous event and more like a continuous part of classroom learning.

Just to clarify, do you see these current tests as high, medium or low stakes? It is meant to be low stakes. It is at risk of becoming medium stakes. It is not at all high stakes.

The 2011 OECD review advised that policy makers can reduce distortion and strategic behaviour by increasing teacher involvement and buy-in from an early stage. The SSLN arguably didn't do that; it was a tool for government and it didn't empower teachers. Historically, ownership of data in schools seems to have sat in Scotland with head teachers and deputy heads as well. Do you have any examples internationally of training teachers to engage with assessment data in a meaningful way? I appreciate that you have alluded already this morning to building a culture of improvement, perhaps through the regional collaboratives. Are there perhaps any other examples that we might be able to learn from to empower teachers?

You have asked two questions, but they are related. The first one is on training for assessment for learning. Of all the message systems that I have described, which are curriculum, pedagogy, assessment and care for young people, assessment is typically given the least priority amongst the four. In Ontario, we are facing exactly the same question that you have been facing here. One of our recommendations was for more attention to be given to continuous learning about assessment, and assessment for learning, within the classroom context. I would say that within Ontario there has been some success, because it has come over a period of time with stability of government, and you can get stability of government in three ways.
One is not to have a democracy: Singapore doesn't have a democracy as we would understand it, and so has complete stability of government. You can get it by one party being in control for a long time, which happened for 12 years in Ontario. Or you can get it by cross-party agreement and consensus that, in a way, education is above political infighting, which is pretty much what you have in Finland. In that sense, I would urge you, not to be like Singapore, but perhaps to be a little more like Finland. Now I have forgotten the first part of your question. The usefulness, I suppose, of the SSLN comparatively for teachers. Yes, learning formative assessment. What Ontario has, I think, is that over 12 years it has quite successfully built a very strong culture of collaborative inquiry, in which teachers together will routinely enquire into problems of practice within their school, and they will consider all kinds of data, including test data, as part of that inquiry. If I could give a very clear example: we have been working with one seventh of all the school districts, on and off, for ten years. Ten years ago, when the stakes were higher in assessment, when the focus was almost solely on literacy and numeracy and there were consequences if your results did not progress, schools would identify what they called marker students. Marker students were students whose scores were just below the acceptable point of proficiency; here you would probably say the level of progression that you are supposed to be on in CFE. To get the school up to a good score, school heads would have charts on their walls; we took photos of them. Proficiency was number three, and here was your percentage of students on number three, and here was your percentage of students on 2.9, 2.8 and 2.7, and teachers would put disproportionate attention into the 2.7s, 8s and 9s.
And when teachers said, what about the ones and the twos, they were directly advised: forget about the ones and the twos; concentrate on the 2.7s, 8s and 9s. That was ten years ago. Now Ontario has broader goals that are much more like CFE, still with literacy and numeracy, but also with wellbeing, and with equity now defined as inclusion, so that you have to be able to see yourself in the curriculum. Teachers are now addressing the broad range of their children's learning, including literacy and numeracy. And now they focus on what they call mystery students, or students of wonder. A student of wonder is a student whose teachers together in the school, because they work collaboratively, wonder why that student is struggling with a particular aspect of their learning. So the school will bring together the teacher who teaches them now, the teachers who used to teach them, the special education support teacher, the language specialist, a school counsellor and a speech therapist; they will bring together 12 or 13 people to look at this student of wonder and how to advance their learning, with all the data that they can bring in, which will include things like photographs of the student's work, taken on an iPhone and then collected so that everybody can see them. So there is numerical data, there is test score data, there are diagnostic tests, and there is also all the other information that teachers use to inform their judgments over time. The ministry has a very good website that collects lots of materials and instruments that you can use, but the main thing is that the province now has a very good way of what we call mobilising knowledge: moving the knowledge around within schools and also between them. And the districts, at least for several years, worked very well together in terms of taking collective responsibility for each other's success and not only for their own success. So the collaboration that you saw at the school level was also replicated, to some degree, at the district level.
That is the first part of your question and, I think, almost the second part as well. Very helpful, thank you. I would like to ask a second question, with regard to equity. In a previous evidence session we heard from Professor Sue Ellis from, I think, Glasgow University, who spoke about what had happened prior to the standardised assessments being introduced, with groups of children, for example, being removed from class. She argued that that was quite unfair and unequal because it created an unlevel playing field; it singled out children and it wasn't fair. Do you think that there is an opportunity, then, with the SNSA to level the playing field and to stop some of that from happening?

The issue of exclusions is always controversial. One of the regrettable things that happens in Ontario education is that you could be a refugee from Syria, speaking almost no English, and you arrive on the day of the test or the week of the test. The school has to decide whether to enter you for the test, which is humiliating, because you sit there for over an hour trying to make sense of a language in front of you that you don't know. Or the school excludes you and you score a zero, so the school gets a zero. Of course, the more refugees, or students with post-traumatic stress, you have in your school, the more zeros you are at risk of. It is an impossible dilemma for teachers when you have a test that is mid-stakes, on one occasion, with a kind of dramatic significance attached to it. How you can get around this is by making the test less dramatic, by incorporating it in, by making it feel like part of the curriculum. I know that we are mainly talking about the large-scale standardised assessment, but if you have other kinds of assessment going on as well, children learn that assessment is part of learning. They do peer assessments, they do self-assessments; they understand that they don't do learning and then have a thing called assessment.
Rather, assessment is part of their learning all the time, as this is as well. If the test itself is modified so that it can be spoken as well as read if necessary, which in part the existing P1 test is, but not wholly, and if you have, which is a resource question, the supports available to accommodate people with learning differences so that they can access the test and express what they know in different ways, then you do get greater inclusion.

One of the interesting things in what you had to say, amongst many others, was partly what you were saying about high, medium and low stakes, but it was also fairly clear, I think, that the relevance of assessments depends partly on how closely they relate to the curriculum within which they sit. I was keen to hear a bit more about how you felt the assessments fitted in with, or were helpful specifically to, the curriculum that we have in Scotland. I haven't seen the other assessments; I have seen only the P1 assessment, but I know that that is where all the activity and the interest is at the moment. Just to be clear, Professor Hargreaves, there is an issue about P1, and the Government is dealing with that, but in terms of our inquiry the committee is interested in the whole of testing, at all levels throughout the curriculum. The short answer is that I have seen only the P1 assessment, because, as I said, that is the part of it that just happens to be on the radar. The P1 literacy assessment is basically an assessment of reading comprehension. It should be consistent with the literacy strategy; it is part of what is assessed. If curriculum for excellence is about many other things as well as acquisition of literacy, the test simply needs to be not inconsistent with those other things. It needs not to interfere with them.
Other ways of judging, for instance, the emotional and social development of children should also be a very important part of how teachers assess how the kids are progressing.

Related to that, and it relates to some of your work in Ontario, is how we prepare teachers. You have said in Ontario that the Ministry of Education there should implement professional learning and development for educators at all levels of the education system, in concert with the roll-out of the new assessments. How would you envisage transferring that advice to Scotland? What would be the analogous advice that you might offer?

First of all, as your own review recommendations have pointed out, I think that it is Heriot-Watt that has had what seems to be a reasonably well-regarded training programme, if you like, for assessment 101, which is just how you manage the basics of it and understand it, have digital competence of your own, develop digital competence amongst the children, and know the significance of making judgments at different times about when you assess. That is professional development as we typically understand it. As important, and perhaps even more important once you have started moving, is professional development for teacher leaders, for middle-level leaders in schools, for school heads and deputy heads, and for local authority staff, to essentially create a culture of assessment for learning, and indeed assessment as learning, so that when I go into any school anywhere, for instance, I can see ways in which children are continuously reflecting on what it is that they do, setting goals for themselves, and making judgments about each other's work as well as their own, and teachers are helping them to do that.
So teachers understand, and children understand, that not only is assessment part of learning but assessment is actually a form of learning; it doesn't happen automatically but has to be something that you pay conscious attention to. If you develop that effectively throughout a system, then whenever any instrument or device comes in, you collaboratively figure out together what your priority is. Your priority is learning, and your shared judgments about learning: not only your individual judgments, but getting some consistency in those shared judgments. Then this thing comes in, whatever it is, and a strong collaborative culture can take these things, whatever they are, and integrate them into its own understanding of learning, teaching and assessment as it runs throughout the school.

I am interested in what creating that culture might take. You mentioned, half in jest, or probably only a quarter in jest, earlier on, the importance of consensus, of political consensus, when it comes to some of these issues. Is there more that we could be doing to try to create that consensus, whether within the world of politics or outside it?

I would hope that, in one respect at least, the Scottish Government could be different from Westminster, which, on the issue that we cannot name, is more able to articulate what it doesn't agree on than what it does agree on. If you can attain that cross-party agreement at the centre, then everything pivots, not on the technicalities of the test; everything pivots here, for you as world leaders, on how to know where your country is going and how to help all your teachers help your children learn. It pivots on this thing of teacher judgment. Wales, after devolution, if you have read what we found in the OECD report, made it almost its first act to abolish the standardised tests.
It was one of the ways of saying, we are going to do something differently here than we have been doing under Westminster, so they abolished the standardised tests. And they replaced them with teacher judgments, which were somewhat moderated, but not in a very disciplined way. The result was chaos and inflation, because nobody wants to say that they are doing less well this year than they were last year. So the improvement just went up and up and up all the time, until it couldn't go any further. Wales was very clear about what it wanted to get rid of, to have no standardised tests, but much less clear about how it would create any kind of consistency around teacher judgment. If you can find ways at least to support that quest, even though you might differ about the best way to do it, that will be the secret of moving Scotland forward.

OK. I now bring in Mr Mundell. Finally from me, I have just a couple of questions. I am going back, first of all, to the comments you made around your own experience of testing. I just wonder whether, in itself, placing pupils in rank order, or deciding at a very early stage where they sit relative to their peers, inevitably leads to a different type of bias from the teacher in terms of the strategies that they use and how they teach in the classroom. Taking your example, if you teach pupils in sets or according to their ability, does that not just reinforce the existing differences, rather than focusing on making sure that everyone is getting to where they can go?

If I can check back with you, I think that what you are drawing our attention to is that there may no longer be an 11-plus examination, and we may no longer put kids into streams at age seven, but we do put them into groups, and we put them into sets in secondary schools, and sometimes into streams.
And the data on this is very clear: the higher-performing countries group and select by ability later, and the lower-performing countries select earlier; the countries with higher equity select later, and the countries with lower equity select sooner.

Is there then a danger in introducing a diagnostic test at age four to six that starts focusing on individual interventions before the sort of pupils my colleague Johann Lamont was talking about, who maybe have the ability but not the knowledge, have a chance to catch up and adjust to being in that more formal classroom setting?

Again, that is a risk, but it is not a risk that is inherent to the test itself. I am sure that you could go to schools now where, if you spend enough time in the classroom, you will see four or five reading groups, and they will have names of birds or planets or all kinds of things. Those reading groups are clearly, you know, fast, quite fast, in the middle, a bit slow and very slow. They work at different levels, and the kids can usually pick up fairly quickly which group is which. One of the purposes of any diagnostic test, because teachers can't respond to individuals all the time (sometimes it's a whole class, occasionally it's an individual, usually teachers work with smaller groups), is to group children for the most effective ways of instructing them. There is a very good area of research on this, on cooperative learning. Sometimes you will group children by the same ability, and sometimes you will group them deliberately by different abilities; I don't mean randomly, but you will have somebody who is a bit further ahead and somebody who is a bit further behind, and they will work with each other at different levels.
Is there a greater risk in testing too early and segregating people based on their ability in that early years phase, where people have not yet been given a chance for things to even out or balance out a little in that formal education setting? Is there a bigger risk in diagnostic testing being used to decide how to teach?

I am starting to sound like a broken record, but it all depends on the culture of the school, because all teachers assess early. Part of your judgment is: does this child need a bit of a push, or do you need to hold back? Is a fight going to break out, or should you let them work their way through it? All of these are judgments; they are all assessments: assessments of what you know about these children in particular, and of what you know about children in general from the evidence of your experience. So we are always making early assessments. We might assess that a child has difficulty forming relationships with other children and that we need to do something about it. I suppose that we need to watch and wait a little, but not watch and wait too long before we intervene, and the same will be true in terms of language, for example. So whether your assessment is informal or formal, it is important to make those judgments; you can't possibly teach effectively without coming to those judgments and assessments about children from the very beginning.

My question really is, is the risk of the teacher's judgment being imperfect at that early stage greater than the risk of the test producing a false positive or a false negative?
I think that, for some children who maybe haven't had the same experience at home, and certainly having looked at the tests myself and seen examples of them done with children in schools, there are pupils for whom these tests will inevitably produce a result that doesn't give an accurate indication of their ability, for the reasons that you outlined. But, at an early stage, is the risk of poor teacher judgment greater than the risk associated with testing?

When a doctor looks at a brain scan, the brain scan doesn't speak to the doctor automatically; it has to be interpreted by the doctor, individually and then perhaps collectively. I haven't had a brain scan, but I fell on the Appalachian Trail six months ago and broke my ankle in two places. I have a plate down the right-hand side of my leg, which had 42 staples in it. Part of that plate is having difficulty healing. When I went back to see the surgeon, I saw the resident, who was more junior and didn't really seem certain as to what it was, but was giving advice nonetheless. So I asked the question: have you ever seen this before? And he said no. I said, well, perhaps we could have someone in who has. I was reminding him that he works in a collective profession, not an individual profession. So the next person came in, who is the orthopaedic surgeon who actually sawed through my ankle. By the way, they are all men, because in orthopaedics they think it is basically like being in the basement with tools and plugs and everything else. He looked at it and I said, have you seen this before? And he said, not quite like this, but it could be this or it could be that; can I take a photograph of it and send it to dermatology? So off it goes to dermatology. So what do they have now? They have the original x-rays, and they have a photo of the ankle, because we have iPhones. And now we have consulted three people, and also me, because I am treated seriously as a patient.
I make sure that I put my occupation at the bottom of every email before we connect, so that I am taken seriously; if I was a plumber, they probably wouldn't. And through the mix of those, we come to some kind of judgement together about how to proceed. We are still not exactly sure; we are still trying to figure out what is best. So all judgement is imperfect, including a photograph or an x-ray or whatever it might be. It depends on our collective ability to interpret it. If you have a culture where you teach people that the x-ray is gospel, that it will tell you what to do and the data will drive you, then you are in serious trouble. If you are in a culture of leadership where you say, we drive the data, the data do not drive us, and it is how we make sense of the data, including being critical of it, that matters, then you have a chance of progressing.

Thank you. Just one final, slightly different question. You talk about improving teacher judgement, among other things. Are standardised assessments where you would start, or do you think that there are other things that can be done to help: maybe teaching people about the issues that come with bias, or helping them in their training to enhance their ability to spot and identify different literacy problems? Or do you think that the assessment is the best way to encourage that collaborative culture and to help people understand where others are?

Exactly as you have said, there are many ways, whatever our field, of improving our judgments, and that includes referring to our collective knowledge and also referring to outside knowledge that is somewhat independent of what we have amongst us. Part of the history of what we are looking at now is very important; it is public knowledge, but I am speaking as an adviser now. The position that Scottish education was in initially was to have a high-stakes standardised test. As advisers, we give advice, and the nature of advice, whether you like it or not, is that you can ignore it.
The advice that we offered was that a high-stakes, large-scale standardised test would have all kinds of negative impacts on teaching and learning. But your Government feels that large-scale information is needed in an unequal society, to be able to guide it about where best to provide support and intervention. So what there is now is what is meant to be a lower-stakes assessment that is one of the things that informs teacher judgment, and the main way in which we will figure out how the system is moving is really by the aggregated data on those teacher judgments. That is the art and the science of how we are trying to get beyond, on the one hand, a high-stakes, large-scale standardised test with utterly predictable and pervasive negative consequences and, on the other, no standardised testing at all, which leaves us unsure and unclear about the consistency of teacher judgment across schools and local authorities. That is the dilemma, and that is the puzzle. I would hope, as an adviser and as somebody who has come to love Scotland (I courted my wife in Scotland), that you can help us to help you to figure out the best way to do this.

Do you think that, by going for a compromise between the two approaches, you can actually end up losing the benefits of both? Or is that something that, as advisers, you have considered? We don't see it as a compromise; we see it as a sort of third way that is between and beyond the two alternatives that the world has been dealing with previously.

Professor Hargreaves, thank you very much for your attendance at the committee this morning. We really appreciate you taking the time to come along. I am going to suspend for five minutes; if you could be back just before half past for the panel to change over. Welcome back.
Can we now move to our second panel in our Scottish National Standardised Assessments inquiry, and can I welcome Sue Palmer, chair of Upstart Scotland, and Jackie Brock, chief executive of Children in Scotland. I thank you both for coming along today. I am going to ask Ms Gilruth to open the questioning.

I would like to start with a question about the previous Scottish survey of literacy and numeracy. Jackie Brock, in the Children in Scotland submission you say: "We believe evidence from the SSLN and national qualifications provided enough evidence to highlight and track attainment and the attainment gap at a national level." I wonder, then, whether the panel recognises the limitations of the SSLN at a local and at a school level in tracking pupil progress and in informing teachers.

Thank you. I am interested that you are talking about the limitations; I wonder if I could start with some of the potential. I know that you have heard about this in previous sessions. For example, in the first year of the SSLN's reporting nationally on numeracy, what the first year showed us was that, in the early years of primary, children's ability to add, subtract and do basic multiplication and division was such that, nationally, we were doing really well. Teachers were doing really well in teaching basic numeracy to children. What was appalling was that, in P4 and beyond, children were not able to apply that knowledge to more sophisticated concepts, and fractions was the evidence of that. We were able then to understand teachers' development needs in numeracy: they were very good at the basics, but the transfer to applying those in more sophisticated ways for children needed more attention. This was across the piece, across Scotland. It wasn't that one pocket of the country was doing brilliantly and another poorly; it was across the piece. That evidence around P4 also helped to unpack what was then going wrong above P4 and later.
That has all kinds of implications for your aspirations, our aspirations, for STEM, doing well in maths and so on. What was then put in place by Government, and in fact it had probably been anticipated as well, was a huge range of professional development that every teacher could apply for to work on their numeracy. You mentioned with Professor Hargreaves that lack of ownership, and I agree with you, but the opportunity for teachers to use evidence about how relevant that national SSLN evidence was to their teaching in the classroom was, in my view, an opportunity lost. The following year, in terms of literacy, no one was surprised, but it showed us again that teachers were doing really, really well in getting children up to scratch, from whatever their background, in relation to basic concepts. However, in applying literacy beyond the basic comprehension that you've heard about already with P1, in terms of a love of reading and being able to talk about what they were learning and articulate it, there was a huge gap again across the country, but particularly for boys. My point, and I think a shared dilemma, is that we've dismissed that evidence; we've not followed through on what that evidence told us at national level in ways that could have improved and sustained performance. That's really important for us, isn't it: the follow-through on the wealth of assessment data and information that we have in Scotland, at individual level but then at school, local authority and Government level, is lacking. I would just say two things about that. No local authority chose to enhance the sample of the SSLN. What does that say? Equally, an opportunity has been lost at national level for you as a committee. If you'd had consistent tracking of the SSLN or of national information, but let's say the SSLN as the equivalent,
you could have had an annual report based on evidence of improvement, and you could have been homing in on where we need to go in Scotland to improve our education, based on real data that addresses both the individual needs of children and, critically, how we need to improve. Scotland has lost that. I think that one of the great strengths of the SSLN was that it didn't cover P1, but I fear that the sorts of results that Jackie was talking about there may have their roots in the early years. I'm here mainly because I'm very much opposed to standardised testing of children at the age of five, other than developmental testing of their general development. What I'm talking about here is specifically testing literacy and numeracy at that age. If you focus very hard on those at a very early age, as Oliver Mundell pointed out in his last question, many children don't do very well, and then they spend the rest of their lives playing catch-up. If, as in some countries, and in fact the whole of mainland Europe, you leave the specific teaching of literacy and numeracy skills until children are six or seven, you've got an opportunity to create the level playing field that we were talking about by focusing in that early stage on elements such as speaking and listening, which are hugely important and foundational throughout education, not just for literacy; children's self-regulation, the capacity to control their behaviour and settle in a classroom; social and communication skills, which are similarly important; learning to focus your attention and control the focus of attention; and learning to deal with complex information. All these sorts of skills are foundational. If we concentrate on those in the early level, which straddles both nursery and P1, rather than homing in too soon on specific literacy and numeracy skills, maybe we would create a better foundation, and then you wouldn't get so much of a fall-off at P4.
Because if children are sort of shaky, if you're building your educational system on a shaky foundation because you're too busy doing the three Rs when there are other, more important things that you should be doing, you might look good in the short term, but it won't have good long-term implications. So that is what I think is the strength of not assessing at that particular stage in children's lives. OK. I'd like to go back to some of the points that Andy Hargreaves made in the previous evidence session, because he was keen to highlight assessment is for learning methodologies, which most Scottish teachers will be pretty OK with. He spoke about collaboration, about a shared understanding and about a culture in schools that means that assessment is embedded in learning and teaching, so it's not what he would argue is high stakes. In fact, he argued that SNSAs are not at all high stakes. I note in your submission that you say that the SNSA is recognised by the public and the media as a key factor and a high-stakes policy. Why is Professor Hargreaves wrong? I don't think that Professor Hargreaves is wrong at all. I said that it was a politically high-stakes policy, which will affect public perceptions of it, and that will affect what goes on in schools, because if you're feeling under pressure to improve results, you're more likely to get the unintended consequences and behaviours that are very often described in relation to testing. Sorry, your first point was... It was with regard to being high stakes and talking about assessment is for learning. Professor Hargreaves talked about Ontario. In Ontario, they do test at P1. They have a developmental test at the equivalent of P1, with children of five, five going on six. It's a developmental instrument; it's called the early development instrument. It's used across Canada, and it's an assessment that the kindergarten teacher does.
She's looking at a checklist of social competence, physical health and wellbeing, emotional maturity, language and cognitive development, and communication skills and general knowledge. Through that, teachers are getting a great deal of information about the sorts of developmental factors that are really important at this age. That could very well enhance professional knowledge, if that's what you're wanting, and help create a background for professional judgment. If instead you focus just on literacy and numeracy, that becomes salient, and literacy and numeracy skills will tend to dominate what people are doing in the classroom and will have the inevitable effect of the grouping that was mentioned earlier, and so on. So you can have some sort of testing, as long as it is developmental; I think that's what Professor Paterson mentioned last week, that the Netherlands has a test at P1. Yes, it's a developmental test. Germany does really very good developmental tests at five, because they're going to help inform how the teachers work. But it's not saying, "We're doing the three Rs"; it's looking at development. It depends what you look at, what you actually begin to value and discuss and base your professional judgment on. Could I also point out that Professor Paterson said that we've based the sensor, and I'm sorry, I call it "the sensor" because that's what teachers call it, as I can never remember how to pronounce all the letters, on the curriculum. We haven't: we've based the sensors on the benchmarks, and the benchmarks for P1 are extrapolated from the experiences and outcomes. That extrapolation, I would say, is really quite distorting. There are 54 of them for literacy. Only 22 out of 54 relate to speaking and listening, which I would say is nowhere near enough; speaking and listening is by far the big thing.
The other 32 relate to specific literacy skills, and I would disagree with Andy Hargreaves, because I've been given the demonstration of the P1 sensor too, and I'd say that it covers a lot more than comprehension. It covers things like phonological awareness, word building, letter recognition, word recognition and so on. I would say that, of the 54 benchmarks, it covers about 10. It seems to me to be completely distorting what the curriculum is. Even the existence of the benchmarks, without the test, will distort teachers' impressions of what the experiences and outcomes are. If you look at the actual original ones, they use words like explore, play, discover, choose and develop. Those are the major verbs. Once you drill down and turn that into specific tasks, you're getting away from the holistic, developmental approach to the early level, which is what Curriculum for Excellence is about, and moving much more to a really drilled-down, skills-based one. If teachers look at the benchmarks rather than the Es and Os, which I suspect they will, that's going to affect the achievement of Curriculum for Excellence levels assessments as well. I'd just like to consider, if we're not looking at Ontario, what's happening in Fife, where my constituency is. The Durham CEM assessments are going to be brought back in Fife, arguably due to the politicisation of the SNSAs, which you alluded to at the start of your answer there. That is going to cost the local authority up to £100,000, but because more than half of primary 1 pupils have not sat the baseline PIPS assessment, we can't just shift back, so instead the Durham assessments are going to be used alongside the SNSAs, potentially doubling the assessment load on pupils. As a former teacher, I'm frankly appalled at that, so I'd like to ask whether both Children in Scotland and Upstart were against the previous Durham assessments. I'm against specific skills-based assessment of literacy and numeracy skills.
I am not against developmental assessments and checklists, which look more holistically at children's development and can very much inform any sort of intervention that might be needed for specific children. Once you start testing literacy and numeracy skills, that becomes what needs to get done in the classroom, so I'm absolutely opposed to the other sorts of specific assessments as well. We're not opposed to any diagnostic, formative assessments at any age throughout Scotland's education. We are opposed to standardised assessment when it's used to measure and shape individual children's performance and individual teaching strategies in Scotland, for all the reasons that were set out by Andy Hargreaves in previous evidence and, critically, I think, Professor Hayward's point about backwash: the pressures on politicians, local authorities, individual teachers and children in relation to how high stakes these become. I think that there is something in the semantics around this. If it gets into the press, and if freedom of information requests are used to measure individual schools, and therefore individual teachers, and used to shape performance, then we've got a huge problem in relation to how we consider Scotland's education. Critically, they will shape behaviours, and Children in Scotland is not satisfied, our members are not satisfied, with the assurances that have been made by the current Scottish Government. Those assurances have changed, and the Government has shifted the approach that's now being taken to the SNSA, and that's very welcome. But unfortunately I think that the die has been cast in relation to how these are going to be used, possibly not by the Scottish Government at the moment. But once there's more pressure, say the latest PISA results showing a problem, more pressure will be applied on the local system and on the Scottish Government to reveal more about what we know, and I think that there's a very real danger about the information that will be formed and judged and used from the SNSA.
It will be out of the Government's hands, in my opinion. OK. So, to go back: I appreciate what you were saying, Ms Palmer, with regard to being against the Durham assessments, but if we follow what Fife has done, which is to get rid of the SNSAs and to go back to that system, we go back to a system under which children could be removed in groups from class, and I made to Andy Hargreaves the point that Professor Sue Ellis had previously raised, which was about equity and about singling out individuals and removing them from class. Surely there's an opportunity with the SNSAs to stop that kind of behaviour from happening and to create a more level playing field for all children? I don't see how, because the point about the early level is that it's a stage in children's development when there's massive variation in what they're able to do in things like literacy and numeracy. It's been pointed out that that can be to do with previous experience and the richness of experience that they've had in their home and family background and so on, but it's also to do with individual genetic predisposition. Some children actually click with learning how to read later than others, to put it simply. So what you've got to do at that early level, and I adore Curriculum for Excellence, because I think that the early level especially was trying to nudge the Scottish system away from going in heavy on the three Rs as early as P1, is have a developmentally appropriate stage, much more like the sort of thing that you'd see in northern Europe. Unfortunately, it's never really taken off, because we're stuck in a sort of cultural habit of starting the three Rs early. What horrifies me is that we had actually begun to move. We were beginning to see some schools starting to move towards play-based, developmentally appropriate pedagogy in P1.
But the introduction of the sensor will just stop that in its tracks, because it puts the focus firmly back on getting on with literacy and numeracy skills, cracking on with it now. Are you saying that the P1 assessments will stop play-based learning from happening? I would say that they are inconsistent with it. I'm not saying that you can't be playful in your learning and put some elements of play-based learning into a classroom where you have groups working on literacy and numeracy skills, because those groupings will have to happen; I've seen them in every school that I go into. If you're trying to address literacy and numeracy skills this early, yes, you can get a sort of hodgepodge like that. But if you are trying to provide a genuinely developmentally appropriate stage, testing will skew it away from being relationship centred and play based. What's your evidence base for that? I visit schools regularly in my capacity as an MSP, and I was in a classroom not that long ago. It's certainly not my experience that that's what happens in our schools, so what's your evidence base for that assertion? Simply that every school has reading groups. They don't have any play-based learning? No, I didn't say that they don't have any play-based learning. I said that you can have some play-based learning and you can have reading groups, but the very fact that you've got reading groups indicates that it is not early childhood education that is based on development and on supporting every child at their own individual developmental level. That is the ethos of a kindergarten. That is the ethos that you see in kindergartens in Finland and in Germany. They're not saying, "Well, we've got a standard of what we want in literacy, so everybody's got to work to that standard." They're saying, "No, we support the child at the stage that they're at, and we create a supportive environment, a literacy-rich environment." We pay particular attention to things like speaking and listening.
We are looking at how well children are learning to focus attention. All those other things are going on as well. Indeed, in the Scandinavian countries, there's a great deal of emphasis on self-directed outdoor play, which, as was mentioned earlier, is disappearing from children's lives. When we started Upstart, it was before the tests began that we got talking about it, and it was nothing to do with literacy and numeracy. We were interested in reinstating play in children's lives and having a ring-fenced period when that became very, very important. It's not that you can't have playful activities or games; you can. You can turn those into lessons in word recognition or sound-symbol recognition, but that is aiming at a standard, rather than a genuinely play-based environment in which children are gently supported at whatever level they themselves are. I'm interested in what you say about not being opposed to assessments, depending on their nature and whether they're for developmental purposes. If guidelines went out to schools and local authorities to say that those tests were not to be used for streaming children or as a benchmark for their future learning, would you be content with that, if monitoring was done to ensure that that wasn't happening? I'm interested to know what evidence you have that it is happening, but would that allay your fears, or is it the entire nature of the tests that you don't like? I'm not sure that it would allay my fears, because I'm not sure how easy it would be for teachers to do that. As we've said, once you've got tests, the sorts of things that are on the test become salient, and that affects the way that you teach.
If you're trying to teach P1, it would actually be very, very difficult to cover the sorts of specific skills that are in the test without grouping, because you've got 25 children in a classroom, and these things take a lot of sitting down and helping children to understand; particularly with the less able groups, you've really got to keep on and on repeating it. It's very time consuming, and therefore grouping helps a lot. So, if this is what we are aiming to do, concentrating on specific literacy and numeracy skills at the early level, I don't see how teachers can avoid using groups. I don't have a teaching background, but would it not be possible to have that information there, with it noted that the results of that test didn't actually stream or group children, and to leave it to a later level to see how much they've progressed? It's not so much the results of the test; it's actually the existence of the test that affects what happens in an early level classroom. But you're in favour of developmental tests, so would those not do the same thing? Oh no. With developmental tests, what you're interested in is overall holistic development. In fact, the EDI that I described earlier, which is used in Ontario, across Canada and across Australia, has been piloted in East Lothian and has been validated for Scotland on that basis. I don't think that it ever actually reached parliamentary level; I think that it stopped at civil service level, and that was roughly around the time that the idea of introducing standardised tests of literacy and numeracy came in. So, just to come back to my original question, if guidelines were put in place to ensure that these weren't used for purposes that you don't believe they should be, surely that would be better?
I've said that I don't think that guidelines can actually work in these circumstances, but Jackie... Thank you. You've obviously talked a lot in previous evidence sessions about the purpose of assessment and the range of assessments. In terms of guidelines, I suppose that we need to be mindful of the amount of guidance that is already out there on how teachers should practise, so we've got to be very thoughtful about that. What I would suggest, as you're obviously exercised greatly about the purpose of assessment, is thinking it through again. I think that we need to look back at Scotland's really strong legacy of thinking about assessment is for learning, and the points that Professor Hargreaves made about culture. We've had a remarkable cross-party agreement, and political agreement nationally and locally, around what we want from assessment. I won't repeat it at length, but the 2005 assessment is for learning guidelines stressed the importance of teacher judgment, supported by a range of assessment tools that would be decided locally, and, critically, the importance of moderation around that teacher judgment, because we all recognise the understandable predisposition to bias, and we all understand that, professionally, teachers want to be able to check out with their peers and get support on how they can support the progress and improvement of their pupils. Of course they do. That was 2005, and a huge amount followed; probably some of you benefited from the professional training that went on in Scotland. There were fantastic developments there, and all of those principles were reinforced later, in 2011, under Building the Curriculum, with a really strong, reinforced emphasis on moderation. In terms of thinking about the purpose of assessment and the guidelines and what we're actually doing with the information, I think that it's really interesting, but frankly disappointing, that we're not hearing about thriving moderation going on in Scotland.
Where is the moderation and the discussion at school level, and the thinking about what we're hearing about assessments in our school, what successes we have and how we're building on that improvement? What about moderation at the thematic level? We hear about a lot of amazing work being done at school cluster level around STEM, because, in order to improve, teachers in STEM know that they need to check out and work on standards, and that's happening at cluster level. But I think that there is a failure of confidence in the system, at local authority and national level, that this is actually good enough, and that, I think, is the genesis of the SNSA. Therefore I wonder, sorry, to go back to your question: within all this, we've had a really settled political, national and professional understanding of the purpose of assessment. We also have a really legitimate, important and powerful requirement in our education system that we must remove inequality. And, for some reason, we've somehow decided that we don't believe in all of that, the valuing of teacher judgment, how we strengthen teacher judgment and moderation, how we strengthen assessment and how we build on our learning strategies, and that SNSAs, standardised national assessments, are the way forward for removing inequity. Now, I think that we heard some powerful arguments about why that might be, but it does seem that we're now lurching to a new way of looking at assessment that is standardised, and I think that internationally you've got a huge range of evidence to suggest that that won't work in a high-stakes environment.
And I worry about guidelines just around the use of tests when you've already heard that they can't be standardised in terms of the timing of the tests, and that the information won't be known in a standardised way at national level, or even between authorities. So what are the guidelines for, then? How are they going to be used? How will teachers be trained, and their development supported, so that guidelines all of a sudden reveal clarity about how they can use this information to improve their teaching strategies? On the guidelines, I feel that there's an opportunity, through this committee's inquiry, to go back to some basics around assessment and then think really carefully about what standardised assessments could offer, if anything, as opposed to the measures that we've been using for some time. Just on that point, I think that that's a very interesting and very powerful argument that you've just put. Could I just ask about the international evidence? The big thing that troubles local authorities, troubles many politicians and certainly troubles many parents is when a school is seen to be requiring more support and not doing as well as it could be, or when a particular local authority is maybe not performing very well against what it's been able to achieve in the past. What kind of data do we need to have in order to help those schools and local authorities do better so that we can raise attainment? If we look at a lot of the international measurements, Scotland has not been doing as well as it might have been, and that's a worry. Therefore, trying to use some of that data to improve things is what we are driving at, and I'd just be interested in your views on that.
I hope that everything in our evidence and in appearing today makes clear that all of Children in Scotland's members, and we ourselves, absolutely want to improve on what is a good performance at the moment but must get better, and there are areas of some decline. We do have qualifications data and we do have PISA, and those are really important. For example, on the fractions argument that I gave earlier and our performance around mathematics and STEM, we can use some of the qualifications data to see where we're going wrong, and potentially to unpick that further down the chain, to say that we're not getting some basic concepts of applying basic numeracy to mathematical concepts. So we're not using the information that we already have. We also have benchmarking within Scotland: a huge amount of money has gone into supporting schools to cluster with schools with similar socioeconomic characteristics, so you can look at why certain schools are performing better or worse than others with similar characteristics, and therefore at how you can learn from those that are doing well. On primary schools, again, there's been this kind of, frankly, myth, in my view, that there is nothing. Yes, until now there has potentially been nothing that will help you compare in order to home in on poorly performing schools, but there is plenty at local authority level, where there is standardised assessment that 31 of the 32 authorities bought into. So, I'm sorry, but I find it impossible to find it credible that any local authority director of education does not know how comparatively well or badly their schools are doing, and therefore where they need to home in on supporting those schools, at year level as well, to do better.
I think that a real issue that has not been touched on sufficiently, if I can say so, around the evidence is what we in Scotland do with the evidence in order to tackle poor performance and to improve outcomes for children. Again, that is a legitimate concern of Government, and it is why it has, initially at least, claimed to have introduced the SNSA: it wanted a tool to look at how to improve performance, and that's legitimate. I would disagree about the means, but there is plenty of information. What we do have to be concerned about is the apparently inconsistent way in which we are improving performance across Scotland. First of all, on the question of purpose, there has clearly been a shift in what the purpose of the assessment is. It started off as getting information across Scotland and then became a diagnostic thing. Which of those purposes would be the better, and can SNSA testing fulfil either of them? Well, I just wrote down "purpose" when you were speaking, because I think that the issue for us with the P1 test is that we've got the wrong purpose. The purpose of assessment at the early level should be children's holistic development. The purpose of the SNSA testing is standards: assessing children against specific standards in literacy and numeracy. The two things are at odds with each other, because assessing development is a holistic process. It takes in things like social competence, physical health and wellbeing, emotional maturity, language and cognitive development, and communication skills and general knowledge. It is not about specific literacy and numeracy skills. For me, as far as the early level is concerned, we've just got the wrong instrument. It's just not appropriate. What would you say to the person who says that you can't change what you don't know? I would hope that we would use that developmental information to help us improve, because we would know things about children's development.
There are issues like the background factors that you have mentioned; we can know about those and know that we need to provide a literacy-rich environment, with plenty of stories and lots of opportunities for songs and rhymes and that sort of thing. But there are also things like speech and language difficulties. If you spot those, you can pick them up and try to help with them. With issues with phonological awareness, children not actually hearing rhyme perhaps, you would want to look at audiometric testing. Maybe some children will need other physical check-ups, such as visual check-ups. This is the sort of thing that they do in Germany as a regular thing when children are five: a physical and cognitive assessment that helps to ensure that you are putting the right sort of support in place at the individual level for each child, if necessary. Again, to play devil's advocate, a characterisation that I have seen in my professional life is what I would call the dismissive shrug: "Well, they come from such-and-such a place; we can't expect any better." The argument, in a sense, is that, in order to address inequality, we need rigour, and these standardised assessments offer rigour that wasn't there before. How do you address that question for families, schools and teachers who are anxious that young people are already disadvantaged when they come in the door? If we don't have rigour around understanding through assessment, are those children being treated as seriously as children in another school? Are they getting the same opportunities? Is there the same kind of rigour around their learning, rather than lower levels of expectation? That is some of the characterisation around this debate; how do you respond to it? I think that that is probably one of the most compelling arguments: that the choice is between rigour, treating every child with respect and therefore testing and understanding their ability, and something that's indefinable; it's nice, it's warm, but we may be disadvantaging these children.
How do we respond to that? I respond by saying that, if we were doing genuine developmental testing, which at the moment we're not, we would be being very rigorous in terms of the appropriate sort of rigour for that age group. That age group, in the vast majority of the world, including the whole of mainland Europe, wouldn't even be in school, let alone being tested on the three Rs. It's just that we have this cultural attachment, because of a very, very early school starting age, which we've had for 150 years, so we've assumed that you go in at P1 and crack on with literacy and numeracy. Some children will be fine with literacy and numeracy at P1, and yes, you support and encourage them. Some children won't have the foggiest, and they will need a different sort of support and encouragement, and hopefully a very rich environment in which to make progress, so that you have a much more level playing field when you do begin specific instruction in skills. It is not in any way not rigorous to look at children's development rather than saying, "Let's just get on with aiming at standards." The point at which standards kick in is the significant one. Indeed, if you look at the international evidence on when countries do standardised assessment, the first national standardised assessment in most countries is not before the age of about 10. Singapore, where they don't start school until six but have been testing at six, has just abandoned that and isn't going to do any testing until after the age of eight, because they have realised that it is actually changing the ethos of early years education in a way that is not productive for the children. There are lots of different sorts of rigour, and if you talk to people who are specialists in early childhood education, they are very, very rigorous indeed, but it just doesn't look the same as sitting down and doing the three Rs.
I think that Children in Scotland is opposed to standardised testing at every level, so you can perhaps see the argument for the early years, but I wonder what the argument is later on. In terms of our response on standardised assessment, we were certainly responding in the context of the way in which it had initially been proposed in the national improvement framework, which was about looking at ways in which we can judge the performance of schools and local authorities, and at how that information would be used in relation to systems that were performing poorly or badly. The reason why we were concerned about that is that the evidence on the distorting behaviours that come about as a result of such high-stakes testing has been well documented. I stress that we understand the purpose of assessment. We understand the need to look at ways in which local systems, local authorities and schools work together to moderate performance and to make sure that that is a robust approach. It is not just about sitting around having a coffee and looking at the results; it is a challenging approach to demonstrating, at cluster level or, as I said, at subject level or whatever, that there is improvement. I think that there is a problem with that robust approach that teachers are finding difficult, whether at head level or subject specialist level; maybe there is not sufficiently robust professional development going on. It was in that sense that we were concerned that we might revert simply to using information that, it is acknowledged, covers only about a tenth of the curriculum. In one of the evidence sessions, it was said that about a tenth of the curriculum in relation to literacy and numeracy is covered by these tests. If that is the case, there is the potential, I would suggest, to distort all other efforts around literacy and numeracy.
May I, Deputy Convener, say a little bit about the purpose of assessment, which you talked about? Very briefly, I feel that it is really important to also bring in what children and young people have said. I want to quote from some work that we did for the General Teaching Council for Scotland, in which we worked with 591 children and young people aged from 5 to 18. When you are reflecting on the purposes of education, it is really encouraging to reflect that the Scottish guidance on assessment, the Assessment is for Learning programme and the Building the Curriculum 5 guidance, reflects very much what children and young people say they want. Very briefly, of course, key to helping them to develop and learn is positive relationships. Specifically, children and young people want to be able to focus on the questions that help them: what did I do well, what did I not do so well on, and what are the next steps for my work? Children and young people want positive short-term learning goals. They want achievements that they can reflect on and discuss regularly, one to one or in groups. They do not want assessments that are essentially memory tests; they do not feel that those are helpful to their learning, development and progress. Here are two direct quotes, if I may, Convener: "If I make a mistake, they explain what I did wrong and they help me to understand for next time", and, "They help me focus on what I do best and make us learn more about what we don't know." I know, Deputy Convener, that you have been talking about the needs of children with additional support needs. Of course, there is potentially greater variability across the whole range of children and the whole range of needs that may be additional, and in the extent to which some assessments can be modified, adapted and tailored to the individual needs of children: children with additional support needs and care-experienced children.
I know that you have a significant interest there: children with particular health and mental health conditions, for example. Those assessments need tailoring, so they need a combination of teacher judgment, of course, backed up by tests and assessments that can be modified and shaped to ensure that the teacher is getting it right in relation to how they can support the child's learning and, really critically, their progress on to the next levels. I would make a plea that some of those findings, which we can make fully available, that voice, if you like, of children and young people, which echoes national guidance, be reflected when you are reflecting on and making recommendations about the purposes of assessment, too. Thank you very much for that. I was going to ask one last question, but it has gone out of my head. Perhaps I can come back in when I remember it. Jackie, you have talked about it being a high-stakes test, if I understood you correctly; I may not have picked up your point correctly. Thirty-one of the 32 authorities were using the Durham tests, the CAT tests and the like. Why were those not high stakes? I do not know whether you are a parent, but my children have just gone through. Did you ever know that those assessments were happening? No, I did not, but I do know now. The genie is out of the bag. Is that not an interesting expression? I understand the bureaucratic definitions of high stakes, mid stakes and low stakes, but when the genie is out of the bag and parents have information that can help them to ask whether their child is here, there or wherever, when the local press can, and when councillors, ministers and your committee can, it has become high stakes, has it not? I think that Professor Hargreaves talked about this. If we are really clear about purposes, and clear in translating those purposes into the daily experience for children so that we can report to children and their parents and, in time, to the media, we can help to mitigate the impact of those high stakes.
I do not think that the discussion around the SNSAs has been helpful so far, because the genie is out of the bag, or even the bottle. What hopefully you can do in your committee is try to dampen down some of the concerns about how, if standardised assessments do go ahead, they will genuinely be used authentically and will actually help teacher judgment. I think that there is a long way to go before that feels credible. We need an honest conversation about how teacher judgments are being used to think about the progress of individual children, but also about how our schools are performing and how our local authorities and the Government are performing in terms of investing where we need to. That could maybe lead to a healthier conversation, but I worry that, if we are only going to focus on the results of the SNSAs, we will have lost a huge opportunity for us all to understand the importance of improving performance. Having worked in the Scottish Government and seen the maelstrom of panic and concern that arises from the annual publication of data, which, frankly, the media and politicians all collude in and which potentially distorts really good work that is being done in schools, I feel that we need to be very cautious about the impact of high-stakes testing and assessment and about how we then use those results nationally. Sorry; yes, in terms of P1, that genie-out-of-the-bottle thing is particularly significant, because the ratcheting up of parental anxiety impacts on the children. Within a year of the announcement that we would be testing P1 children, the workbooks had already appeared in the bookshops: "Help your child with primary one literacy" and "Help your child with primary one numeracy". As soon as they get wind of what is on the actual tablet-based test, I dare say that there will be apps.
That is making it very high stakes in terms of what is happening in P1, which is why something like a developmental checklist that is done by a teacher is a much, much less distorting thing than something that is linked to testing throughout the school system and is very, very specific to particular literacy and numeracy skills. We have heard lots of evidence about how helpful the testing that was done previously was, and some local authorities have decided to continue their own testing; East Renfrewshire and Fife have reverted to using the original one. Have we poisoned the watering hole in terms of how that testing will be perceived going forward? I think that the controversy has raised the whole question, and I did a piece for Sceptical Scot recently saying that I hope that this will start a national conversation about what is relevant at that early level, and about whether we should be thinking in terms of getting on with the three Rs or should be looking at a different sort of approach, because it could be that we have revealed that the water is poisoned. Sorry; John, you had a quick supplementary question. My brief amnesia has resolved itself. As you know, there has been a lot of argument around this debate, and some of it has been quite heated. The argument that was made for the SNSAs that gave me most pause was when it was said, both at Government level and in the wider political debate, that if you had a child with special educational needs, you would want to know; this is a means by which we do know that, and surely we put those young people at risk if we do not have rigorous assessment. You can understand how compelling that argument is to anybody who is thinking that perhaps this is not the best use of a teacher's time, or whatever. What is your response to that?
It is also a very serious statement to say that these tests ensure that we identify early those young people who have additional support needs and that we can therefore meet those needs. I think that, in many cases, what we are doing is creating some of the additional support needs by focusing very much on specific skills at a very early age. I worked for a long time with dyslexic children, and it was very clear when they came to me that, for many of them, it was an auditory issue or a visual issue at the beginning, or something like that. However, because they were being asked to do sound-symbol recognition that they could not do, there was an emotional overlay, which then grew. Then they felt the stigma of being in a remedial group; I mean, we do not talk about remedial groups now, but of being in a special group doing some special work. Sue Ellis talked about the walk of shame at the first meeting of this committee. The children develop more problems as a result of having been asked to perform those tasks when they were not developmentally ready to do so. It creates the needs. What we need is a developmental checklist, a developmental assessment, in order to inform policy and funding to particular areas in terms of the needs that they have, but also so that teachers, by becoming familiar with the sorts of things that are being covered, improve their judgment of the children and are able to look out, when they see a child that they are a bit worried about, for who to refer them to for the best diagnostic tests. That is the way that it works in Finland, and Finland has far fewer special educational needs because a lot of them are picked up through teacher judgment and proper diagnostic tests on the individual child, with support packages put together, so that, by the time the children actually start school, the problems are being sorted out rather than an emotional overlay building up on top of everything.
In Upstart's submission to the committee, reference is made to NAPLAN, the Australian system. The submission says that similarly labelled low-stakes tests were introduced, but the information was then used in a high-stakes way, and it is now acknowledged that that kind of testing had unintended consequences. I wonder whether you could enlarge on that slightly. I also ask Jackie whether that was the fear that she was describing when she talked about information becoming available through freedom of information requests or otherwise. I think that the genie-out-of-the-bottle argument is very much at the back of that. Once you do national standardised testing, it is public knowledge, and it is of great interest to the public. Parents become anxious, teachers become anxious to make sure that their classes get through, and schools are worried about their results. The NAPLAN tests do not begin until year 3 but, interestingly, as I said, the early development instrument is being used in Australia as well as in Canada, and its results correlated rather well with the year 3 results on NAPLAN. A developmental check is as good at predicting what will happen at year 3 as anything else. For me, Mr Gray, it comes back to the fractions argument, if I can return to that, and maybe a couple of others. It is absolutely right that the public, the media and the Parliament are engaged in a debate about how we need to improve teaching and learning in order to improve the outcomes for our children. That can only lead to a deeper conversation. The SSLN meant that, rather than blaming and wagging our fingers at individual schools, teachers or children from a particular part of the country, we were actually saying that we had a systemic challenge and that here was how we were going to address it.
There is a whole range of things that families and others can do to help us. We could have, in a way that was not high stakes, really deepened our understanding of how we improve, and moved the conversation on, but the way in which the SSLN findings were reported suggested that our teachers are rubbish and our children are pretty rubbish, too, because they cannot do sums, when it was a systemic issue around the application of basic skills. I have no problem with that, and I think that we would all benefit, would we not, from a really well-informed, high-stakes discussion about how we are going to improve Scotland's education. What I really want to resist are the well-documented impacts on individual schools, individual neighbourhoods and individual types of children with particular needs of the ways in which we have seen league tables, or some fancy way of presenting the information, when children appear not to be performing well. As we all know, the SNSAs, or the Durham assessments for that matter, are very narrow tools. I am not necessarily saying that they are the wrong tools, but basing high-stakes judgments on very narrow tools in isolation can only lead to distorting factors and, I would suggest, to very poor consequences for our children's prospects. I am looking to see whether any other members wish to come in. I think that that concludes our session this morning. I thank you, Sue Palmer and Jackie Brock, for coming and giving evidence. I will suspend briefly to let you leave, and then we will go into private session.