Welcome to the third meeting of the Education and Skills Committee in 2019. I remind everyone to turn their mobile phones and other devices to silent, to prevent them from interfering with the broadcasting. Today, we have received apologies from Dr Alasdair Allan; Gil Paterson is attending as a substitute. Welcome, Mr Paterson. The first item of business is a decision on whether to take consideration of the committee's work programme in private at the next meeting. Are members content to do so? Thank you. Agenda item 2 is the Scottish national standardised assessments inquiry; this is the second week of the committee's inquiry. I welcome to the committee this morning Dr Keir Bloomer, convener of the education committee of the Royal Society of Edinburgh; Professor Louise Hayward, professor of educational assessment and innovation, school of education, college of social sciences, University of Glasgow; and Professor Lindsay Paterson, professor of education policy, school of social and political science, University of Edinburgh. I thank you all for coming along to participate this morning. Can I open by asking you to briefly set out your own perspectives on the SNSAs and the move away from the SSLN? Dr Bloomer, could I start with you?

Good morning, everyone. The position of the Royal Society of Edinburgh is that it has no objection in principle to the idea of standardised testing. It is concerned about the fact that we have, if anything, too little information and data about how the education system in Scotland operates, particularly in the whole of the primary phase and the early part of the secondary phase.
So the idea of gaining more information through the introduction of these assessments is one that, broadly speaking, the Royal Society of Edinburgh would welcome, which is not necessarily to say that it welcomes every aspect of what has subsequently taken place, but the principle, as far as we are concerned, is perfectly all right. Our major concern with what has happened, I think, is that the purpose of the assessments has become less certain as time has passed. We were fairly clear at the outset that the main purpose was monitoring the performance of the system, and that, as I say, is something that we would welcome. Since then, the emphasis has been placed on the diagnostic capacity of the tests and, therefore, their ability to help teachers and help individual pupils. I know that you wanted us to be brief at the outset, so I will not go into the detail of this at the moment, but we are much less persuaded that the tests work effectively in that role. So there are concerns about the way in which they are being used at the present moment, but we think that the tests have the capacity to supply information that is of value and that has not been available hitherto. You also asked about the SSLN, and we are puzzled about its abandonment. We think that it is unfortunate that there has been no continuity in the kind of information that has been made available in the past. We had a previous kind of assessment, the SSA, which ran for, I think, five years; after a short interval came the SSLN, which ran for six years before it, too, was abandoned; and now we have a third system. We think that the idea of a sample survey, which is what the SSLN was, is not incompatible with universal assessment of the kind that the new SNSAs provide. We do not see what the rationale for abandoning it was. It would be perfectly possible to run both systems in parallel.
I think that it is very important to say at the outset that both the SNSA and the survey are simply different ways of collecting evidence, and that both of them sit within the national improvement framework. When looking at any part of the system, it is important to see it in the context of the whole. Tests and surveys are simply different ways of collecting evidence. The really important thing is the purpose, and different ways of collecting evidence relate most effectively to different purposes. Once there is clarity about the purpose, the second-order question is: now that we know what it is that we want to find out, what are the best ways of finding that out? I think that it is an encouragement to come back to what really are the central purposes. The idea of having information from tests that supports teachers' professional judgment is an entirely appropriate approach. The issue is that we have to decide what matters. If what matters is curriculum for excellence, then our assessment system should reflect all that matters in curriculum for excellence, and we have to find ways of gauging how much and how well children are learning in relation to all of those processes. On the move away from the survey, I think that surveys can provide very helpful information if the purpose is to give feedback at a national level on how the system is progressing. Survey evidence is a very good way of doing that, because it can provide evidence at that national level without having some of the unintended consequences that other ways of collecting evidence can have, such as narrowing the curriculum or encouraging teachers to teach to particular parts of it. The central focus is purpose, and then deciding how best to collect the information.

I cannot see the name plates, because they are sitting below the level of the wood.
I think that collecting data that is neutral and reliable is always better than not having it. The new standardised assessments, as we are supposed to call them, whatever their faults may be, are more reliable, more neutral, more objective and more independent of bias than anything we have had in recent decades in Scottish education. The reason that I say that is that all of us who teach, and I include here university teachers as much as any other sector of schooling, are unavoidably subject to bias. Sometimes that is unconscious bias. We know, for example, that if students' essays are not marked anonymously, there will be bias against, for example, women: women are more accurately assessed when they are assessed anonymously. That is an illustration of the kind of bias that all teachers inevitably have. The bias of school teachers, which is no less than, but no greater than, that of university teachers, I should emphasise, was evident from the previous survey, the Scottish Survey of Achievement, where it was shown systematically, year after year, that when teachers assessed children, they tended towards optimism, sometimes really very great optimism, compared with the results of the objective tests conducted in the survey. That, for me, is the principal attraction of the new Scottish national standardised assessments: they provide neutral, objective information that guards against bias. We know from the history of examinations that guarding against bias in that way has been one of the major means by which equality of opportunity has been improved, for example for women in Scotland, for Catholics and, more recently, for ethnic minorities.

Secondly, as far as the abandonment of the survey is concerned, I completely agree with what has been said. The two could have run in parallel. The great advantage of a survey is that it can gather a much wider range of information and much deeper kinds of information.
Incidentally, I agree that the survey as designed (that is, the Scottish Survey of Literacy and Numeracy) was not adequate for some of those purposes. The older one, the Scottish Survey of Achievement, was actually better. One of the ways in which it was better relates not only to the fact that surveys can provide a national picture, as has already been said, but to the cabinet secretary's legitimate complaint that the SSLN could not tell you where things were happening, where things were getting better or getting worse. That was a feature of the design of the SSLN; it was not a feature of the design of the SSA, whose design allowed you, anonymously, to say that a particular school was doing better, and that it was doing better perhaps for reasons to do with, say, homework practice, or discipline, or school uniform, or something. In other words, it is possible to design a survey that gives you a national picture and also gives you not only a local council-level picture but a school-level picture, so both could be done.

I am going to move to questions from the committee members, starting with Mr Scott.

Maybe I could just start with this purpose point. Professor Hayward, in your submission (it is a really helpful submission and I am grateful for it), you say that there are three main purposes and that those three main purposes interact in any national assessment system: any action taken in one area will have an impact on the others. You said earlier that there was not clarity at the moment on the purpose of the standardised assessments. Do you think that there needs to be, and what should it be?

I would argue now that there is greater clarity, in that the national assessments are there to provide one part of the information profile on an individual child. Lindsay pointed to the advantage of the items in a survey in terms of reliability. The danger, of course, is that that compromises validity.
The standardised assessments are only able to give information on certain aspects of the curriculum. Although they give information on important aspects of the curriculum, it is not all of the curriculum. For example, you might get information on punctuation or on spelling, but writing is more than that. The way that we get information on what matters is to do what is central to Scottish policy in this area, which is to depend on teachers' professional judgment, because the teachers who work day to day with the young people are the ones who collect the information, and the system should support teachers in order that we can build and enhance the dependability of that professional judgment.

There is more than one purpose, though. It is not just about supporting teacher judgment; it is about the whole school system and understanding what it is doing or how it is performing. There are at least two purposes. Is that fair?

There are three main purposes. It is about recognising the fact that no one part of the process will be able to address all of those purposes. That is why we have a national improvement framework that draws evidence from a range of sources, linked to a range of different purposes.

Does the panel believe that having three purposes is appropriate?

I think that it could lead to confusion in the absence of any other source of information of the kinds that we have already referred to in the case of surveys. It would be possible to design what would, unfortunately, be a very cumbersome system in which all the SNSA results were supplemented by the full range of information that you might collect in a well-designed survey. However, that would impose such burdens on teachers and schools as to make it unmanageable. To take a specific example, it is already known, from the returns that schools give in the schools census, what the home language of each child is.
That is quite a difficult thing for the school to establish as it is. If, in addition, schools had to establish, for example, the education level of the parents, perhaps the occupations that the parents work in, and even matters to do with the size of the family and the living arrangements (a single-parent family or whatever), that is not what schools are for. It would be a ridiculous amount of burden. That is why sample surveys can give you deeper information. Although, in principle, you could design an SNSA-type instrument that would cover all the purposes, I do not think that you can in practice.

That was not what I was saying. Indeed. No, I quite understand that. In your view, those who are promoting standardised assessments need to be very clear about what the purpose is. Do you think that that has been established? Dr Bloomer, you suggested in your opening remarks that it had changed.

The emphasis has clearly changed. It was on national monitoring at the outset, and it is now more on the diagnostic capability. One has to recognise that the diagnostic value of the tests is limited, although there are some strengths. They can monitor the same pupils over time, for example, which is not something that we were able to do through the sample surveys, because the same pupils did not figure in successive runs of the survey. We now have what have been described as long scales, which stretch through from primary one to secondary three, and it is possible to monitor how an individual pupil has progressed up the scale. I think that that is valuable information, and researchers will be able to make something of it in the future. On the other hand, the assessment looks at a restricted area of the curriculum once every three years. As far as the individual is concerned, that is a minimal amount of information to be available at any given time.
Although the information that is available in the print-out has more value than it has been given credit for, and is available, or is to be available, to parents as well as to teachers, it is still restricted. It is a standard description of what performing at, for example, band six means, and not very much more than that. There is another issue that some teachers have raised with me as a source of difficulty. One of the features of the assessment is that it is adaptive: as the child goes through it, depending on how he or she is getting on, they will be fed more difficult or less difficult questions. In order to work out, as a teacher, what the outcome of the assessment is saying about the child, you really need to be able to follow their path through the questions. That is not too easy to do in the feedback, and there is also the issue that you can clearly get to the same banding by going through different paths. Different interpretations might attach to that. There are some complications in the nature of the feedback. A teacher would need to be aware of the strengths and weaknesses of the assessment in order to get what is of value out of it.

The other question that I was going to ask Professor Hayward is about the international comparative work that you have done, which has been supplied to the committee and is really helpful. On P1 testing, I cannot find (and correct me if I am wrong) any other country in your international comparisons that does P1 testing. Am I missing something about how other education systems around the world look at what is happening or assess how children aged four and five are doing?

I am not sure that the question was asked, so I think that we may not have the evidence. Fair enough. There certainly are countries that use tests. Even at that young age, at the earliest stage of school?
At that young age, but they would be few in number, and they would tend to be countries with a strong tradition of testing throughout their system. Certainly, I think that, again, it comes back to why purpose is important. It is really important to know what young people are able to do, what they know, what they understand and how they feel about learning. Those are all really important aspects. Gathering information about young people as they come into the system is really important. How best to do that is a matter for debate.

Professor Paterson, do you want to come in on that?

The Netherlands does test from the beginning. I get this from the OECD report on testing in the Netherlands in 2014. Testing goes right through from year one to year eight, and year one is aged five-ish, so it is much the same as here. The tests in years one and two cover elementary mathematical things, as we would call them: ordering, language, and orientation in space and time. So it is possible to do it. I remind you that the Netherlands is a very high-performing country in the PISA tests, for example. The arguments about play-based learning, which we may come on to later, are never confined just to age five. They are usually thought of as relating to the whole period up to about age seven, from about age three to age seven. If you include that range, then there are many countries that start testing at the equivalent of either our P2 or P3, depending on whether they start school at age five or age six. For example, Denmark does, and Chile does the same. It is true that it is more common not to start until about age eight, but nevertheless there are perfectly respectable countries that we like to emulate, and that in many respects are doing better than us, that start from an early age.

Can I ask one final question on purpose, Professor Paterson?
Ultimately, if testing were to remain the same (and there has been some argument for continuity, given how much we chop and change and have done in Scotland), how long would it take Scotland to work out what was genuinely happening in our schools? Given the point that you made in your opening remarks about the whole school experience, how long would it take us to know?

I would give the same answer to any question about any educational reform of any kind whatsoever: at least a decade.

Thank you, convener. I have some specific questions of clarification for each of the panel members; I hope that that is okay. My first is for Dr Bloomer. The first part of this, I think, is a follow-up to one of Mr Scott's questions, or your answer to him, Dr Bloomer. You said that you felt that the purpose of the tests was not clear when they were first proposed. The emphasis seemed to be on getting a national picture but, latterly, the emphasis has been much more on diagnostic use of the SNSAs, by which I presume you mean teachers using that information to plan the individual learning and teaching strategy for each pupil. In your answer to Mr Scott, you rather implied, I think, that you, or the RSE, feel that those tests are not particularly effective for that purpose. Is that fair?

Not entirely. I think that they have strengths and weaknesses. I said, for example, that the ability to track the same pupil over time is a strength, and it is not a strength that we have had in the past. However, I think that the amount of feedback in relation to the individual from any one test is quite limited, and that, obviously, is a weakness. There is also a need for teachers to become skilled users of the information that is available and, of course, there has been some degree of professional development made available with that purpose in mind.
However, my overall conclusion is that it is not a form of assessment that yields a wide range of valuable information. It is not without value, but it is limited.

With regard to the other purpose of getting a national picture, the core objective of the Government is to close the attainment gap. You made an interesting point, which I wondered whether you wanted to elaborate on: the SSLN previously was applied across all schools in Scotland, but the SNSA does not take place in the independent sector. I wondered whether you wanted to elaborate on the impact of that on the measurement of the attainment gap, if that is the case.

That seems to me to be a matter that is relevant to the attainment gap. I do not suppose that there is any reason in principle why the new national assessments should not take place in independent schools as well; whether the Government could, or would wish to, oblige independent schools to make use of them is obviously another matter, but it is a dimension that was present in the SSLN and is absent now. The SSLN had information about family background, and it surveyed teacher views as well. There was a richness to the information from the SSLN, although I accept the point that Lindsay made earlier that the SSA, its predecessor, was probably a better instrument still than the SSLN. We have lost quite a lot of that contextual information and, of course, it is very valuable in relation to trying to narrow the attainment gap.

We have also removed from the data a cohort that, in general terms, would be likely to be at the more privileged end of the spectrum. Is that correct? Absolutely, yes.

A lot of the impetus for this change, as the cabinet secretary in particular describes it, comes from the OECD report and some of the things that it said about the availability of data in the Scottish education system.
In the University of Glasgow paper, the university says that that is based on a misinterpretation of the recommendations of the OECD report, specifically the shift away from the sample approach. I just wondered whether you could enlarge a little on that.

I think that no one voice ever influences a shift in policy direction.

That is very much the core evidence that is presented to us, though. It is the only evidence, I think, that has been presented to us and to Parliament as an evidential, research-based reason for making this change, so in this case it is.

I included the quote from the OECD report in the report that I submitted. It is clear that the OECD is saying that this does not mean that, by necessity, one particular path must be followed. It was therefore open to us to have a broader debate around the issues. It comes back to purpose: what is it that people want to know, and what use are they going to make of the evidence? "Data" makes it sound very hard and impersonal. There is an advantage in having a degree of objectivity, but, on the other hand, the central purpose in all of this has to be improving children's life experiences. It is about the way in which we collect evidence: who is going to use the evidence, and for what purpose? You do not grow flowers by weighing them. You grow flowers by creating the circumstances in which they develop; you feed them, you look after them and you help them to grow. Closing the attainment gap is a kind of shorthand, I guess, for improving the life chances of all young people in Scotland. Within that, we have to ask ourselves very serious questions about how best we are going to do that. The focus therefore has to be on the action that is taken in relation to the evidence that we have, rather than having all our attention on the evidence. The second issue is that, in my submission, I listed all the areas where evidence is collected. There is a sense in which our system has to operate at all levels.
There is certain information that national policy makers need in order to think about future policy development and the action that they will take to enhance the direction of policy. Do they need all of the information right the way through the system? Or is it the teacher in the classroom who needs evidence about every individual child, and perhaps the head teacher in the school who needs evidence about the dependability of the professional judgment of every teacher in that school? The local authority needs information, too, so it is a layered model. All of those different layers have to work in order for the system to operate effectively, because otherwise we move into a world in which we collect so much information that we cannot make use of it.

That is why, in your evidence, you say that a view emerged that the OECD had recommended the introduction of standardised assessment, and that that is a misinterpretation of the recommendation, which was much broader, in the terms that you have just described. Is that fair?

That is fair, but I think that the OECD also argued that we should look at the range of sources of evidence that we had available and then relate those back to the purposes that we intended them to serve.

Professor Paterson, I wanted to ask you not so much about your evidence but about some previous comments that you have made with regard to the introduction of the SNSAs. Back in 2017, when the policy was first being described, you said that the very local approaches to the SNSAs cannot give a valid national picture and that, therefore, the whole exercise is a waste of time, which are quite strong words. More recently, at this time last year, you said that, for the first time in almost 60 years, Scotland has no reliable method of monitoring the performance of schools in literacy and numeracy, a situation that you described as woefully inadequate.
Those are quite strong words, perhaps stronger than the evidence that you have given this morning, and I just wondered whether you still hold those views to be correct.

If we take the second of the two things first, that is the situation so far as evidence is concerned. What I was referring to in that quotation comes from a context in which I was discussing the demise of almost all surveys of school students, school leavers and other groups. The only one that we have left is PISA, and it is inadequate for most purposes: it is at age 15 only, and so on. I say 60 years, but we could even go back and say 80 years, because Scotland pioneered the use of good-quality surveys to understand the progress of people through education systems. From that came a whole series of things: the Scottish school leavers survey, the various surveys of primary school children, the SSLN, the SSA, the assessment of achievement programme and various others. All of those things have gone. They are no longer there. The kinds of information that we had, for example, 20 years ago, when the Parliament was established, we simply do not have now. We cannot monitor it. It is impossible at present, for example, to know reliably whether we are actually closing the attainment gap. We cannot know that, because we do not collect valid data. SIMD, the area-based measure, is simply not valid as a measure of social inequality. I hold strongly to that, and I suppose that I feel strongly about it because my job is to do research, so perhaps you can discount my strength of feeling, because I lack opportunities to do research. On the first thing, which is about the use of the SNSAs, I think that the question is still very much open.
I have been somewhat reassured by the approach taken by ACER, the contractor delivering the assessments: by the detail and rigour of its approach as submitted to this committee and as set out in its first annual report, and also by information from freedom of information requests that Reform Scotland kindly helped me to obtain. I do think that they are trying to produce standard, reliable information that can be interpreted in the same way across the whole of Scotland, but there are still major worries. One really major worry is not knowing when the child is tested. For a child in primary one, the difference between being tested, say, when they have just arrived in primary one in September and being tested just before they leave in June is clearly about one sixth of the child's development up to that point. That is an enormous amount of child development at such a young age. We could allow for that statistically, in appropriately technical ways, if we knew when they were tested, but the information about when they are tested, as far as I understand it, is not to be collected. Maybe I am wrong there, and I hope that I am wrong; that information would have to be there at least to be able to carry out the kinds of standardisation of the test results that would be needed in order to make sense of them at a national level. There are other ways in which we do not necessarily know the circumstances under which the testing takes place. Some schools are doing it all at the same time, almost like an exam, as the EIS has pointed out. Others are doing it much more informally; I have heard of many schools, through teachers and parents, where it is essentially just integrated into the classroom environment. A scientific study that was aware of that kind of variation would want to collect information about the context and the conditions under which the testing was taking place.
It can be done, and it can be standardised, so my original comment may turn out to be wrong, but at the moment I would still be somewhat pessimistic: we have no reliable method of monitoring the performance of schools nationally.

What about the other purpose that we have talked about this morning, the diagnostic purpose and individual learning strategies? How do you feel about the strength of the SNSAs for that?

I agree that there are problems, which have been identified already, but I think that one valuable way in which the assessments could contribute is through what we might call calibrating teacher judgments. I referred earlier to the unavoidable bias that we all have as teachers, but one of the ways in which we can try to improve on that, that is, to correct for our bias, is to keep looking at objective data and comparing our judgments with the results of that objective data. That can lead us to improve our judgments. That is what other professionals do all the time, and it is something that I think we should do as teachers. In that sense, measures that measure only part of what a child can do are actually quite valuable. Of course, you find that good secondary schools do that all the time: every year, when the SQA exam results come in, they sit down and look at the results, compare them with the forecasts that they made for each individual student taking those exams, and try to improve their forecasts and, in turn, their teaching. That is the way in which I hope that the tests will be used, but at the moment it is not clear that they are going to be integrated into programmes of teacher development in the thorough way that would be required to achieve that.

Professor Hayward, in your very useful submission, you reference the three main purposes of assessment that you mentioned, one of which is holding people to account. Can you say a little about how the SNSAs do that?
If you will forgive a daft-laddie question from me, who is it that the SNSAs are holding to account? I think that what I intended to say was that any system serves a range of purposes. Just now in Scotland, for the national-level question, which I guess is about whether the system is performing as well as we would like it to, the evidence that is available is through the national improvement framework. The issue with putting too much emphasis on something like the SNSA is, to go back to something that I said earlier, that what it looks at is very narrow in comparison with curriculum for excellence, with its four purposes, the vision of what it is to be an educated Scot. We want people to be successful learners and to be able to contribute in all of those areas. The SNSA will give us reliable information on a very small part of two areas of our broad curriculum. To say that, from two very small areas of the curriculum, we can then generalise to the education system as a whole would lead us to ask questions. It is about being very clear about the purpose. If we want to ask questions about how much and how well young people are developing, we have to do that across the curriculum. The only way that we can do that is by basing those reflections on the evidence that we get from teachers' dependable judgment and, over time, we have to work to make sure that that judgment becomes more and more dependable. However, there are ways other than testing in which that is done in Scotland. For example, bodies such as Education Scotland run professional moderation activities in which, just as Lindsay was describing with the SQA, teachers come together, look at examples of pupils' work, share their understandings of it, look at it against the national benchmarks and develop an understanding that will inform their professional judgment, so that we can build professional judgment that is more consistent across every school in the country.
There is no system that is perfect. We look to develop approaches that will give us sufficiently dependable information to allow good-quality action to be taken in order to support young people's learning. Thank you. That was useful. I suppose that what I am getting at is the concern that the teaching unions, and a number of individual teachers, have raised that SNSA data might be used to judge their performance. Is that an appropriate use of it? Should it be used as evidence of a teacher's performance by a headteacher or a local authority, for example, given that class-level data can be aggregated? No, that is the short answer. It goes back to this: assessment is very simple. There are two world views that you can have of assessment. One world view says that assessment is about ways of gathering evidence to inform learning; the focus is on learning and improving learning. The other world view says that assessment is about judgment and categorisation. Those two world views sit uneasily together. In the real world, they mesh to a certain extent, but the focus ultimately has to be on learning. If it is on judgment, we get into all kinds of perverse behaviours: if teachers believe that they are going to be judged by evidence coming from one area, say one test, they will naturally teach to that test and they will spend more time on those parts of the curriculum. The standardised assessment gives teachers one important source of evidence that they can use to inform what action they take in order to support children's learning, but it covers only a small number of areas. We would not want tests or standardised assessments that covered all the areas of the curriculum, because then we would do nothing else. The focus has to be on learning, not on assessment. A brief supplementary to Ross Greer's line of questioning. I was quite taken, Professor Hayward, with your comments on assessment.
Learning has to be the principal concern of what we are doing here. I would be interested to hear the panel's views on what happened previously under the SSLN and whether learning was the principal concern of that survey. I agree completely that the Scottish Survey of Achievement was a better survey than the SSLN. By way of background, I was a teacher previously and had children removed from my classes for the sample, as was the case previously, and that data was never shared with me as a classroom teacher. I think that there is a disconnect more generally in the teaching profession, because in the past data was held in the hands of head teachers and deputy heads and was not used to empower the profession. I see a bit of a disconnect between what happened previously and what we are seeking to achieve with the SNSAs. From my experience of the SSLN, it did not empower the teaching profession, and I would be interested to hear the panel's views on that. The items in those tests were designed by teachers, so teachers across the country were part of the construction of those tests. Courses were run that were designed to help people to use the information that came from the SSLN, but I think that you have put your finger absolutely on the crucial issue, which is that it was partial: some people had access and others did not. That simply is not good enough. I used to tease a bit that if, rather than the Scottish Survey of Achievement, we had called the SSA 'Save Scotland from Accountability', it would have attracted a great deal more interest. The issue is absolutely crucial. We have to be clear about the purpose. Information from surveys can feed back in to provide very helpful information for classroom teachers, but if it does not do that, we are missing a significant opportunity.
One other thing that was really interesting about the SSA was that, in addition to the national survey, local authorities had the opportunity to ask for a boosted sample within a particular local authority, which would then give them information at a local authority level. Technically, there would be nothing to stop, for example, a head teacher taking items from a survey to use within a school, or a teacher using them within a classroom to get that same kind of information. I notice that, with the SNSA, the norming studies give the opportunity to develop a survey approach, which again could take some of the advantages that I have described and build them into our system, if that is our purpose. The reason why no teacher would have been given the results of the individual children assessed in the survey is exactly the same reason why all survey responses are confidential, by the normal ethical requirements of any survey. If I were to do a survey of people and then give to anyone else the responses that they gave, I would be severely disciplined and ultimately could be sacked by the university. That is an absolutely fundamental principle of surveys. Only the survey contractor, sworn to confidentiality, and the individual respondent know what the individual respondent has replied to that survey. The reason why local authorities could get access to their level of information in the SSA, as they can, for example, with the Scottish household survey, is that the level of aggregation there, the number of people involved in the sample at the local authority level, is such that there is no risk of any individual's identity being compromised. I doubt whether that could be done at the level of the school, and it certainly could not be done at the level of the classroom. That might be argued as an advantage of the SNSA, because the whole contractual situation there is different.
It is intended that the teacher knows what the results of the individual child's tests are; that is the whole design, so nobody is in any doubt about that. However, a survey should not and cannot do that kind of thing. More positively, however, you might then ask how a survey would be useful to teachers. There are two ways. Louise has already mentioned one, which is the overall national report; it was useful to teachers in the same way that it was useful to government, politicians and other people as well. There is another specific way, which was a good thing about the SSLN that developed after the SSA: the people running the SSLN would pick out the test items that children were not doing very well in and use them as the basis of professional development sessions for teachers. That was an extremely good practice. They would find, for example, that children were not very good at telling the time, and they would then use the kinds of mistakes that children made in telling-the-time questions to advise teachers on how better to teach time. That was a great idea, and it shows how a survey can be used, of course, totally anonymously, because that was aggregated across the whole of the country. It was not about the children in any one person's classroom; it was about the whole of Scotland. A survey can be used in that way, but a survey cannot address the kinds of questions that individual testing of individual children can. That is not the purpose of a survey. Dr Bloomer, did you want to come in? It would be wrong to assume that a survey, or come to that a system of universal assessment, that says something about how the system as a whole is performing has nothing to do with learning. Learning in the system will improve if we know more about how we are doing and whether we are progressing or moving backwards.
Although the connection is less direct than in the case of the feedback given to the teacher about an individual's performance, survey information of that kind is still a valuable contribution towards improvement. There is a kind of orthodoxy in Scottish education at the present moment that nothing influences the quality of provision other than the quality of teaching. That is not true. There are lots of other factors, such as the curriculum and the nature of educational policy, that influence the way in which the system is performing and therefore what the experience of the individual actually is. We therefore require that kind of information. The sample surveys that we used to have fulfilled a very important function. It is not clear that we any longer have that kind of information available, at any rate in the depth that we had before. In a couple of years' time, all of you will be vocally expressing views about whether or not the attainment gap has narrowed. It is probably possible to predict each individual's views on that matter, but you will be resting what you say on remarkably thin evidence. Thank you. My question leads on from that point. I also want to go back, Dr Bloomer, to some of your previous comments about the adaptive nature of the tests and some of the feedback that you have received. When you look at the adaptive element, some of the accessibility features that have been built in, the variable time that an individual can spend completing the test, the different testing circumstances and the different timescales for carrying out the tests, with all those variables built in, do you think that we can regard them as being standardised at all? Clearly, they are not fully standardised. Lindsay has talked about the issue of timing. I think that it is relatively common for schools to have a set pattern of timing.
For example, one school that I visited recently had last year carried out almost all of its testing in May and had come to the conclusion, which I think is perfectly reasonable, that a primary school will get little value out of testing primary seven pupils in May, because they do not have the opportunity to make any use of the feedback that they get. So they had come to the conclusion that they would carry out the tests of all primary seven pupils in November. You can see the reason for that, but if that is a common phenomenon, and I think that it is relatively common, it sits ill with the idea that every pupil is being tested at the point where they are judged by the teacher to be ready. There are a whole lot of circumstances like that, which mean that the circumstances of testing for the individual are likely to vary quite widely across Scotland. Yes, that clearly has an effect on the overall outcomes and on whether you can fairly compare what is happening in one place with another. Not that you have the opportunity to make that comparison anyway, but even if you had, I think that those kinds of variations would make it less than valid. Thank you for that. I also wondered whether you felt that it was odd not to have road tested some of the testing models with teachers. We heard last week that teachers were only consulted in passing in the design of the tests, particularly at primary one level. If the tests are designed to assist with teacher judgment, would you not have expected teachers to be asked about the tests before they were implemented? An initial point, if I may, on teacher judgment. One effect of the tests is that they may assist teachers in relating their own judgment to national expectations and standards, and that is, in itself, quite helpful. The tests were the subject of some road testing beforehand. Whether that was done to an adequate extent or not, I would not really like to offer you a view.
There is always a tension in policy making and implementation between taking your time to get it right and getting on with the job. I think that, if anything, the tendency in recent years has been to accelerate timescales, which means that less is done to try to perfect the instrument before you start. To be fair, though, that is not a criticism that I have heard much canvassed by teachers. I wonder whether I could come to the points that you made about teacher judgment. Other people might not see it that way, but I consider myself to be an optimist. I think that, particularly at the early stages of education, teachers should be optimistic in looking at a child's ability, because we know that there is less variance in ability than there is in attainment. Do you think that looking at those narrow aspects and looking solely at current attainment is enough, or do you think that part of the teacher judgment that a standardised assessment would maybe not pick up at that stage is about looking at what the child is capable of? That is a really interesting question. I think that that is the distinction between potential and the point that somebody has reached. Ultimately, I do not think that it is possible to distinguish between so-called formative judgment and summative or final judgment. As Louise has already said, those two things always happen together. In order to know what is best to help a child with at age five, you need to know in a summative way, in a final way, what they already know, and that is a judgment. You cannot get away from a judgment as a precursor to helping the child to progress. Judgment is not a bad thing; it is actually absolutely intrinsic to good teaching, it seems to me. A teacher then has to be optimistic that they can take the child forward, but in order to be optimistic, they need to have reliable evidence.
There is no point in being optimistic on the basis of fallible evidence, of evidence that is wishful thinking or something like that, because that does not help at all. That goes back to the common accusation that children ultimately suffer if they are praised for trivial things. According, for example, to Professor Carol Dweck of Stanford University, with her ideas of a growth mindset, you should only be praised for effort, because it is effort that improves what you are going to do. You should not be praised for doing trivial things that do not actually require effort at the stage that you are at as a child, and that will vary according to age. That would all suggest to me that being optimistic is a necessary part of being an effective teacher, but being optimistic also requires that one is realistic about the limitations of one's own judgment. To be optimistic, you have to be able to listen to judgment that is independent of you as a teacher, and it is only on that basis that you can reliably act. Otherwise, you are potentially living in an illusion about what the child can do, and therefore about what you can help them to do. Just following on from that, and going back to the point made earlier about the variability of the tests, do you think that they show the teacher enough information to compare with their own judgment? In particular, I have heard from teachers in my constituency who worry about whether a child can listen to something rather than read it, or who think that, in terms of an individual child's motivation, a child who was shown two picture cards would perhaps be more engaged with the test than one sat at a computer, where they are not necessarily very focused. Do you think that those are valid points when it comes to the design of those particular tests? They are indeed valid points.
Those are the kinds of things that the improvement framework of the whole testing regime has built into it. My understanding is that it was always expected that the tests would try to learn from experience, especially in the first few years, and try to build in improvement. That is happening this year; it is already documented, for example, in the ACER submission. If you take the point that you mentioned about some children doing better in reading than in listening, or vice versa, the fact that, apart from in primary one, the tests assess both listening and reading, as well as writing, means that a teacher could choose to give greater attention to one aspect of the test or the other, depending on their feeling for what the child will respond best to. That is a good example of how the tests, even though, as has been said, they inevitably assess only certain aspects of the curriculum, are already sufficiently rich to allow the kind of distinction that you mentioned to be drawn. In the presentation that we got on the testing, we were told that a primary one child could interrogate the test: where they have to choose a word, they can press a button and hear it. However, in the assessments, the information given to the teacher makes no distinction between a child who has pressed the button and heard the word and a child who has decoded it and read it themselves. What is the value in a test that does not make that distinction? It does not tell you what level the child is operating at. That is again a very telling question, and in principle I would agree that you would want to know that as a teacher.
Of course, it is an empirical question whether it matters. You could look at the results; you would need to have the information about whether it was the written form or the oral form that the child was responding to, and then you would have to see whether either of the two was a better assessment of the child's overall ability in language. With that information, you could take that decision, and it might turn out that it made no difference, or it might turn out that it made an enormous amount of difference. That is a matter not of the existence of tests, but of the design of the tests, and that would be an improvement to the design that seems to me, in principle, to be desirable. Of course, to make it valid and reliable as an improvement, there would have to be a lot of replication of items. You would have to give some children only oral tests and some only written tests so that you could compare their performance. That kind of thing would have to be done as a deliberate experimental add-on to the annual testing, to improve the quality of the whole testing regime. The point is that the test does not show whether the child had to press the button or not. It could, but it does not show it. From the information that you are getting about two children, both children can read the word, but actually one child needs to hear it and the other child does not, which is pretty important. I agree. I wonder whether that means that there is a danger that what looks like standardised testing that gives full information is not actually that. I am told by people who are very committed teachers, not teachers who resist and repel all boarders but teachers who really want to do their best, that it takes 50 hours of teacher time for a primary one class to do the testing, and that the information that they get is not particularly valuable.
Would that be a concern to you, if that is the view of people with experience of those tests? My sense is that something that is seeking to be objective cannot be taken out of the context in which it is operating, and I wonder whether you agree with that. I will take the specific point. I agree with you entirely that, if it can be shown to be important whether a child responds to a written version or a heard version, the test should allow the teacher to distinguish between those two, and the reporting should allow that to happen. I agree. Do you not think that it might seem self-evidently important to know? There might be a child who, for example, can decode the word and presses the button to reassure themselves, and another child who cannot decode it but knows that pressing the button will help them. Those are two different skill sets that are being assessed, and surely it is self-evidently important to distinguish them. We might be teachers who are able to do that anyway without a standardised test, so we might be digging ourselves into something that is not that important. What struck me about it was that something that presented itself to the teacher as rigorous was, in my view, not actually particularly rigorous, because it was conflating two groups of children together, or it was giving less information than you might be able to identify within a classroom, working with a child. Nothing is self-evident. Any claim that a certain thing is or is not the case needs to be tested by evidence. It could be that, if we tested it, that is, if we set up an experiment in which we compared children's responses in the different ways, oral and written, we would find that the distinction was so important that the two types of thing had to be reported separately, exactly as you say; but it could be that one predicts the other so reliably that you do not have to have the two separate versions.
That is an empirical question that requires evidence to answer, but if it turns out that the two things are sufficiently independent, then indeed the evidence would say that they need to be reported separately, as you say. Would you share my concern that there appears to be no evidence that the question has even been asked about whether those two different approaches actually matter? That would be a very constructive… If the standardised assessment is supposed to be based on evidence, we would presume that we could identify evidence showing whether it makes any difference whether you ask the question this way or that way, with or without access to being able to hear the word. That is presumably the kind of very constructive recommendation that the committee might make. The point of trying to have these debates, both the present debate at this committee and the new inquiry from the Government on P1 testing, is to come up with constructive ways of improving the quality of the design of the system and the reporting of the system. It does not damn the system; it says that this is a way in which it might improve, and you might recommend that it should be done in order to see whether it can improve. Should the committee call into question some of the assertions that are made about the benefits of the testing, given that that basic work was not done before the tests were put in place? That is the past. To improve for the future, to move forward from where we are, getting evidence relating to these really important points, and I agree entirely that the points you are making are very important, would allow improvement of the system to happen. It does not damn the system. It does not in itself say that we should not have the system. It says that this might be a very reasonable way in which we could collect evidence to see whether we could improve the system in the way that you recommend.
Just before that, Dr Bloomer, you spoke earlier about the information for teachers on the pathways that a child would take. Is that the kind of example of the pathways, or am I missing something? Is there something else that would be involved in a different pathway for a child to achieve a particular level? The instance that Johann Lamont referred to could be an example of where children would be taking different pathways. The notion of the pathways is that the test responds to what it gets back from the young person and, to put it crudely, makes things easier or more difficult. It also has built into it the facility to listen to the question, as opposed to reading it. Those are examples of the pathway in action. The particular example, I would have thought, depends very much on what the question is designed to test. It is quite conceivable to have a question that is concerned with comprehension, where it is not terribly important whether the child got the question orally or by reading. Self-evidently, however, if the question is trying to assess the individual's ability to read, the question of whether they read it or had it read to them is critically important. I cannot imagine that a failing as obvious as that is built into the system, however. The question was, which of those three words sounds like pie? Is there a difference between reading it and hearing it? If that is the example, then I am obliged to agree with you. I would say that there are multiple similar examples in the questions. There are choices between looking at pictures or reading words, and obviously seeing something and reading a word are different skills. I want to go back to Professor Paterson's point about the evidence base. For those tests, do you need bespoke evidence and trials, or can you look to other educational research? There is plenty of existing educational research out there on how learning happens, where people look at different skills and different techniques for reading.
Again, someone who has a wide vocabulary and can see a whole word and identify it is different from someone who is able to decode and read new words. There is plenty of evidence out there on how those different skills work. Can that be used to inform how tests are designed? Yes, it can. Professor Sue Ellis described some of the ways in which that can happen. She probably knows that body of research better than anybody in Scotland. I absolutely agree, and I very much support Johann Lamont's point there. What would be required is well-designed research into how this operates, but in fact that well-designed research has probably already happened, if not in Scotland then certainly in other places that have similar kinds of culture and education system, such as England. You would not want to reinvent the wheel. Scotland is terribly bad at learning from elsewhere, and we should certainly learn from research elsewhere to inform the kind of questions that Johann Lamont has been asking. I know that you want to look to the future and not go back, but in terms of introducing new things and taking forward new policies, do you not think that those questions should be asked before any new educational policy is introduced: what is the evidence base, and does what we are doing match up with the educational evidence? I completely agree. Without wanting to go back, for the future, yes, we should be looking at that far more. That is not just because an academic will always ask people to pay attention to academic research; it is not only academic research. It is also, for example, what you might call the accumulated wisdom of the professionals in the system, often very well articulated by bodies such as the GTC or the EIS or other professional bodies as appropriate. That kind of thing should be much more part of what you might call the policy formation cycle.
That, after all, was one of the aspirations 20 years ago when the standing orders of this place were constructed, and it would be nice if it were done by more than just the necessary consultative memorandum attached to the beginning of bills. Can I ask a very direct question as to whether you believe, in the light of what you have said to the committee this morning, that greater standardisation would be helpful, specifically in asking schools to undertake the tests at a specific point in the year, or whether you think that we should be slightly more open-ended about it? There are two types of answer to that question. If I answer as a researcher, or as if I were working in the Government statistical service or something like that, I would answer that there has to be as much standardisation as possible and, falling short of that, there has to be collection of sufficient information to allow an estimate of the effects of not standardising, for example, the precise date at which children are tested. That is the researcher's or the civil servant's answer. Of course, I completely recognise that the political answer cannot be that, and it goes back to the point that Keir made earlier, which is that the purpose of the tests has shifted. In so far as the purpose is now much more firmly placed on the diagnostic value of the tests, it would be impossible, in the circumstances that have come about in the past two years, to require that the tests take place at a standard time of the year. That is a real dilemma. I think that researchers who tried to insist on standardisation, flying in the face of the political reality that it is impossible to have a standardised week in May or whenever, would just be failing to pay attention to how things happen in the real world. My compromise would indeed be the caveat that I made to the first point.
We cannot hold the tests in a single week in May or a single week in November, but what we can do is collect information that will allow us to take account of the possible effects of maturation on, for example, the difference between the autumn and the spring of primary one. That point about the dilemma is, I think, a very important one. If you are a parent, I think that you are interested in two things. First, how is my child getting on at school and what progress is he or she making? Secondly, how well is the school doing? It seems to me that, at the moment, we have relatively good information available on how well a child is doing, and the new tests are designed to provide greater information to teachers on that basis. I think that we are more or less agreed on that. However, the new tests are also designed to provide greater information to local authorities and the Scottish Government to assess how well schools are doing and therefore to be able to pinpoint areas of concern. In other words, if there are schools and/or local authorities where educational standards year on year are not as good as perhaps they should be, it is that aspect for which we need to find the relevant data. If we do not find that, it is very difficult to help those local authorities or schools that are slightly underperforming to improve. Could you be very specific about what additional data you think we need, or how we might better interpret the existing data, to find out where there are weaknesses in the system and therefore to help schools that are underperforming? There is a very real danger of overgeneralising, in the sense of saying that an instrument that is designed to collect information on very specific aspects of reading and writing, for example, and numeracy can be generalised to the quality of a school. That is one issue; there are three issues.
The second issue is that what we describe with the phrase standardised assessment is one way of collecting information, and it can be an important and helpful source of evidence to inform a broader judgment. The tension is always to ensure that it is not used in a way that has unintended consequences for other activities. If, for example, schools see that the test is taken in one particular week in the year and all the children are geared up to it, an atmosphere develops around it whereby it starts to attract stakes that nobody wants it to have. We have anecdotal evidence in our system just now that, in certain circumstances, that kind of thing has been happening. All of this takes place within a context. We want to make sure that the consequences that follow from the use of any assessment are positive consequences. The third issue is that this assessment is not the only part of the education system. It is the responsibility of education authorities to make sure that the quality of education within schools is of an appropriate standard. We have lots of other sources of information that come together to give a picture of the performance in a particular school. It is about recognising that we have multiple sources of evidence in the system and that, when we ask key questions, we draw on a range of sources of evidence in order to give us a dependable answer. In that case, do you believe that there is work to be done with the school inspection process to enhance that qualitative judgment? I think that the school inspection system is one part of our national improvement framework; it is one way of gathering evidence on what happens within schools. Local authorities have their own quality assurance processes. We have a national self-evaluation system that is moderated by critical friends.
We have a great deal of evidence in the system, and if we focus only on one tiny element of it, we risk ending up with a less dependable judgment than we might have had if we paid attention to the range of sources of evidence that we have available to us. That is a very interesting point that you raise. I am just trying to get at the situation in which there are variable standards across local authorities, and particularly within local authorities, where some schools might have improved on their performance over time—one of the most important trends being to measure a school against itself. How do we get to a satisfactory measure for a director of education in a local authority, or for a Scottish Government minister, if there are concerns about the flatlining of performance in a particular local authority area? How do we drill down on those results and, as you say, use the national improvement framework to try to help them to improve what they are doing? Going back to the conversation that we had earlier about the interrelationship of research, policy and practice within the system, the truth is often that, when issues like that arise, we do not know why. We have to ask further questions. What is going on in a particular establishment that is leading to this particular situation? It is a trigger to seek further evidence leading to action. It is that kind of seeing it at a whole-system level and thinking about what evidence we need to collect that will give us the best quality information that is likely to lead to improvement. Interestingly, the research evidence suggests that, in most circumstances, the differences between schools are largely explained in terms of socioeconomic circumstances. The most significant differences lie within schools. Can I finish my question to Dr Bloomer? 
The Royal Society of Edinburgh, at the time when it produced its report about the curriculum for excellence and how to measure it, pointed up quite a few gaps in the information and the available research that we can use to draw conclusions about that. It also pointed to some international evidence from which Scotland could learn—picking up the point that Professor Paterson made that we are not very good at learning from international comparisons. Are we talking about additional information that we need in Scotland to improve our own efforts to close the attainment gap, or is it a matter of interpreting the data that we already have? The Royal Society of Edinburgh believes that Scottish education is relatively data poor and that we need more information than we have at the present moment, particularly at stages below the senior phase in secondary education. I think that we would all, at this end of the table, hope that the work that your committee is engaged in at the moment will make some kind of contribution towards improving the information gathering that is going on in the Scottish education system. Some parts of that, no doubt, are beyond the remit that you have taken on for yourselves. We are now involved in only one international survey. In my view, it was a mistake to abandon the other two, and I hope that, at some point, that will be reversed. We need more information about how we compare with other countries. Although PISA is an excellent survey, it operates at age 15. It tells us nothing about what is happening at those stages of the education system about which we are already most ignorant. I suspect that that is not the kind of issue that you are immediately concerned with, but you are concerned with the assessment regime and, therefore, I think that, by implication, you are concerned with the question whether we would benefit from having something like SSA or SSLN reinstated. 
I do not know that I am entitled to speak for my colleagues, but I rather think that the three of us think that that would be a good thing to do. Whether that happens or not, I am sure that we all think that it is important to be clear about what information the national standardised assessments are supposed to be generating. It is possible, of course, to use a single assessment to generate information of more than one kind, although you have to be careful about whether one purpose is compromising the other if that is what you do. It may not be necessary to say that it serves only one purpose, but it is, I think, necessary to be clear about what the hierarchy of purposes is. Either it is an assessment designed to monitor the performance of the system, in which case what it generates by way of diagnostic information is secondary, or, alternatively, it is a tool that is about assisting teachers to aid individual young people and also to refocus their teaching so as to benefit from what they learn about how their whole class is getting on, in which case its performance as a source of evidence about the system as a whole is secondary. We need to know which it is and then to act accordingly. If it is primarily about generating information about the system, it needs to be able to fulfil that purpose, which I think would point in the direction of greater standardisation of approaches. If it is about diagnostic purposes, that is less important. It is a question of clarity about objectives, first of all, and the rest follows from that. Can I come in again on the question about individual schools, because I think that Keir has put very graphically the distinction between assessing the system as a whole nationally, and possibly also at local authority level, and the other purposes that this can be put to. 
You asked what a local authority director of education could do with knowledge about individual schools on that basis. There is a workable model that, unfortunately, England has now moved away from, which operated until, I think, about four years ago, called contextual value added. There were two components to it. One was that, in looking at a school, you would be looking at what it adds to children's learning. It is not about the number of qualifications at the end of secondary school—it is not the number of highers that the school on average gets, but the progress that the school has enabled children to make towards those highers. That is the basis, for example, of some of the contextual admissions decisions that universities are making. That is one thing. It is about progress at primary as well as at secondary; it is about the progress that children make. The contextual bit of that method in England was taking account also of the social circumstances that the children were living in. We sometimes think of, as it were, parental social class or parental education as a kind of background variable that you allow for once, but it is not. If your parents can help you because their own education is advanced, that continues to be a help right through. The child who has well-educated parents is likely to make more progress between, say, P1 and P4 than the child whose parents are not so well educated. That is the reason for the contextual bit. After a lot of argument between about the mid-1990s and about the middle part of the last decade, there was a system that by and large commanded quite a lot of consensus in England, which was put in place—I cannot remember exactly when, but it was sometime in the last decade—and then ran until a few years ago. It certainly ran right through the period of the coalition Government, for example, although some of the policy decisions had been taken under the previous Labour Government. That worked quite well. 
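The contextual value added approach described above can be sketched as a toy calculation. This is a minimal illustration with invented data and invented coefficients, not the English CVA model itself: fit a regression of later scores on prior attainment and a deprivation indicator, then treat each school's mean residual as its "value added".

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: 200 pupils spread across 4 hypothetical schools.
n = 200
school = rng.integers(0, 4, size=n)            # school membership
prior = rng.normal(100, 15, size=n)            # prior-attainment score
deprived = rng.integers(0, 2, size=n)          # 1 = deprived household
true_effect = np.array([2.0, 0.0, -2.0, 0.0])  # hidden school effects

# Later score depends on prior attainment, deprivation and the school.
score = (20 + 0.8 * prior - 4.0 * deprived
         + true_effect[school] + rng.normal(0, 5, size=n))

# Contextual value added (sketch): regress score on prior attainment
# and deprivation, then average each school's residuals.
X = np.column_stack([np.ones(n), prior, deprived])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
residual = score - X @ coef
cva = np.array([residual[school == s].mean() for s in range(4)])

for s, v in enumerate(cva):
    print(f"school {s}: value added {v:+.2f}")
```

The point of the contextual adjustment is visible here: a school full of deprived pupils is not penalised for its intake, because deprivation is already accounted for before the school-level residual is taken.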
It was not perfect, but it did allow school-level information to be generated while at the same time taking account of the complexities of children's learning, in terms of their progress and of the family and other circumstances that they face. There might be some possibility of using the SNSAs in that way. I should finish by saying that school-level information is bound to find its way into the public domain whether we want it to or not, because of freedom of information, so it would be far better to prepare for that by addressing the kind of questions that you have raised. Thank you. Thank you. Mr Scott. I very much agree with that last point, but that is a different argument altogether. Can I just ask, on Liz Smith's line of questioning, about the achievement of curriculum for excellence levels return, which the Government said in its submission to us is a replacement for the SSLN? A, do you think that it is a replacement? B, we are not quite sure that it works at all, because it is indeed still badged, I think, as experimental after three years. What is its role? What do you think it is there for? I do not think that it is an adequate substitute for the SSLN, for two major reasons. One is that the assessment of levels—the assessment of where children have reached—rests on teacher judgments, and we have talked about the unreliability of those already. Secondly, it is not an adequate substitute for a completely different reason, which is the measurement of social circumstances. Actually, the SSLN suffered from that too; we need much better measures of social circumstances. The committee has addressed this point before, and it comes up over and over again. We know, for example, that two thirds of children living in poverty are not in the 20 per cent most deprived neighbourhoods. Your constituency apparently has no deprived neighbourhoods, but that does not mean that it has no deprived families. 
There are other ways in which the annual December report is inadequate, but those are the two major ones. That supports the contention that we should revisit the SSLN, but with some enhancements and some careful, creative thought about how it should properly work. We have good models of that. The Growing Up in Scotland survey, which is an absolutely excellent survey that traces children through, contains really good, sensitive measures. I am not saying that we should replicate that every year—it would be too expensive. However, the experience that ScotCen and Paul Bradshaw, who is the director of the survey, have built up over the last 15 years or so, since the survey was established, would be really useful in helping to strengthen the evidence that you are talking about. On the point that you have all made, all of us in politics would be basing our arguments on some pretty thin evidence on closing the attainment gap if we stay where we are. Would an enhanced SSLN potentially help politicians of all political persuasions with what is genuinely a difficult issue? There is some purpose in it in that sense, too; it is designed to serve that purpose. The bigger question is why we took it away, but you have answered that already. Okay, thank you. I would like to go back to the purpose and to some comments that Dr Bloomer made a few moments ago. If I was explaining standardised assessments to a constituent and I said that they were there to monitor the performance of the system, I think that they would be surprised and confused, because they think that they are there to monitor the performance of their child. Do you think that there has been something lost in translation when we have been getting this over to the public? I am not suggesting that this is in any way your responsibility. They are confused—the general perception is that these are diagnostic tests, essentially. 
The emphasis has changed and, as a result, parents have been persuaded that the primary purpose is diagnostic. That was certainly not the advertised primary purpose at the outset. Okay. The other point comes from the National Parent Forum submission, which I think puts it really succinctly and well. Yes, I agree. Right. I also think that policy should be susceptible to development in the light of evidence. So I would argue—I do not know whether we would all agree—that the shift to lower the stakes of the tests and to have them as part of the repertoire that a teacher can draw on is a positive move. So it leaves a gap? Yes—a gap in how the evidence should be used, which is what I have been saying. Sorry, yes. That is why we are talking about how that gap might be addressed in a way that would not have the kinds of potential unintended consequences that the policy, had it stayed as it was, could have brought with it. You believe that it could do with some clarification, from the public's point of view, about the purpose of these tests? Yes. I think that, once parents start getting report cards that incorporate the results of the tests, the misunderstanding will go away. In fact, it would then be very difficult for the Government or MSPs to go back on that. Once parents start getting the kind of scale that has already been published, I think, on the Education Scotland website—it is in the ACER submission—parents are going to think, why did we not get this kind of detailed information before? The problems that then face teachers might be quite different: how to explain the sorts of things that Louise has been talking about, which is that the child's progress is more than just the result of the test. Sure, thank you. If I could just ask Professor Hayward, you said that the system is moderated at each layer, from local authority level down. 
Do you think that's actually working in practice or is it too early to tell at the moment? I think that, like any complex system, there are parts of our system that work very well, and there are parts of our system that work less well. I think that learning from evidence is as important at the level of the system as it is at the level of the child. So it's making sure that we have good quality evidence that will allow us to reflect on the really helpful question that you raise, and then it will allow us to realign policy. In going back to the question that came up earlier, I think that there is the idea of research to inform. There is research that, along with other sources of dependable evidence, professional evidence from teachers and classrooms, evidence from school inspectors, a whole series of evidence that should inform any new development, but it's also research to align. Once we've got the vision of what it is that we want to achieve, then actually keeping an eye on what's happening as it's developing, so that we make sure that we stay consistent to the ideas of the vision, because our history, in common with the history of every country that I've worked with internationally, is that very often countries start out with very clear and very coherent visions of what they want to achieve, but over time the divergence happens. Because we don't actually go into the system to try to better understand why those gaps are beginning to emerge, it continues to develop until we get to a point where a new innovation has to come in. I think that it's changing that model to say that we have a vision of what we want to achieve and using research evidence as we develop in order to make sure that we remain consistent with that vision, and we feed the evidence from that back into developments both in terms of practice, but also perhaps in terms of policy. Thank you, that's helpful. 
While I agree with Lindsay that parents will be a bit clearer once they begin to receive test feedback in school reports, I am not sure that they will necessarily all be well equipped to interpret what they are told. What they will be told, in relation to each test—for example the reading test—is which of 12 bands their particular child is considered to sit in, and they will be offered a paragraph of three or four lines, a completely standard pre-written paragraph that tells them something about the band. Each of these descriptors starts with the words "learners in this band are typically able to", and it says something like "read a wide range of straightforward texts" or whatever. What this is saying is that, typically, a child who falls in band six, for example, is able to do this but not perhaps that. Whether the individual child of that parent fits the typical stereotype of the band descriptor is, of course, another matter. As we have already discussed, you can get to being assessed as band six by answering a different set of questions from somebody else who also ends up being considered to be band six. Obviously, a different mix of skills might have emerged in the answers that you give. It adds information to the parent's understanding, but there are limitations to the nature of the information that it adds. Yes, different children will get different questions, but if the design of the tests has been done adequately scientifically, the questions will be addressing the same underlying skills. Most people are aware that, if they go to their doctor and a blood pressure test shows something unusual, the doctor will almost certainly not—or should not—rely just on that one assessment, because the person has probably gone in in some apprehension. Maybe they have travelled by ScotRail, so they have been late and all that kind of stuff. The doctor will repeat the test. So people are aware. 
We all know about the essential randomness of things, and what has not been conveyed—this is a real failing, it seems to me, of the public discourse around this—is that all assessment is subject to random error. There have been some detailed studies of that in England, and that degree of random error has very much diminished compared with 20 years ago, when the national curriculum assessments in England were first introduced, but there is still an inevitable amount of random error, and we still have some way to go. That is the purpose of the so-called reliability measures in the new standardised assessments. They are pretty high, but they are not perfect, and a degree of misclassification is going to go on. That is not because anybody is doing the test badly or because the teachers are failing to understand it or anything like that; it is just that it is intrinsic to the nature of measurement that it has an element of error in it. There needs to be some education programme around that, and that is difficult. It involves acknowledging that there are mistakes—not deliberate biases, but random mistakes. I think that the challenge of educating parents on what to do with those results is therefore going to be very, very great indeed, and I do not see any programme from any agency at the moment intending to educate parents about that, sadly. I would like to return to what Lindsay Paterson alluded to at the start about bias and objectivity, and you have spoken about it again there. You said that no teacher is objective, and certainly, when I was teaching, we used to be able to identify where certain pupils came from; a certain primary school in the city actually used to inflate grades, so we knew that that happened in the system. 
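The point above about random error and misclassification can be shown with a small simulation. This is a hypothetical sketch: the reliability figure and the five-band cut are invented for illustration, not the SNSA's published values. Even a highly reliable test puts a noticeable share of pupils into the wrong band.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented figures: 100,000 pupils with a "true" ability on a standard
# scale, and an observed test score that adds random error. A
# reliability of 0.9 corresponds to error variance equal to 1/9 of the
# true-score variance.
n = 100_000
true = rng.normal(0, 1, size=n)
reliability = 0.9
error_sd = np.sqrt((1 - reliability) / reliability)
observed = true + rng.normal(0, error_sd, size=n)

# Cut the scale into five "bands" at the quintiles of the true scores
# and count how often the observed band disagrees with the true one.
cuts = np.quantile(true, [0.2, 0.4, 0.6, 0.8])
band_true = np.digitize(true, cuts)
band_obs = np.digitize(observed, cuts)
misclassified = (band_true != band_obs).mean()
print(f"placed in the wrong band: {misclassified:.0%}")
```

The disagreement is concentrated near the band boundaries, which is exactly the "repeat the blood pressure test" point: a single borderline result should trigger a second look, not a firm label.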
Professor Sue Ellis, in a previous evidence session, made the point that the SNSAs could challenge unethical and perhaps biased approaches to assessment itself, whereby children are removed from class, for example, in groups. I wonder whether the panel would agree with the assertion that the SNSAs could potentially stop that kind of thing from happening. If they help to induce a mindset among everybody involved that, if you are going to get properly reliable evidence, you have to adhere to standardised conditions—in the same way that any scientist or any doctor would want to do to get reliable evidence—that would be a really good thing. You cannot, as it were, fix the results by fixing the conditions under which the results are obtained. Secondly, Professor Hayward, you gave an example earlier on, with regard to moderation and quality assurance, of the approach that Education Scotland used to promote of teachers working collaboratively to get a better understanding of standards. If the SNSAs offer the same opportunities for teachers to work collaboratively to get a better understanding of CFE levels—Lindsay Paterson talked about the accumulated wisdom of the profession—could there be an opportunity to improve that as a result of the SNSAs? I suppose that it is back to a point that I made earlier: the SNSAs give you information about very limited areas. For example, one assumes that the purpose of being a teacher in the classroom, as you were, is to help children to become better readers, in that sort of context. The SNSAs will give you information on aspects of that, but as a teacher you know that motivation—whether a child believes that they can read and whether they see reading as being important—are all crucial factors in whether or not a child will make progress in reading. It is about living with that complexity and focusing on—I would argue that parents also want to know: what can I do to help my child next? What is my child moving on to? 
What are the most important things that they should focus on? The SNSAs can play a role within that broader picture, but it is the quality of the teacher, their understanding of the curriculum and their ability to generate tasks and experiences for young people that will allow them to develop as positively as they can, and then their ability to discern progress and focus on what happens next in learning. It is a complex picture, and we have to learn to live with that complexity and support it if we are really concerned to improve the life chances of every child in Scotland. Some of the questions that I had about what Professor Paterson called neutral and reliable have already been discussed, but would it be fair to say that a test that can be applied at any point between a child being four and a half and six, with a lot of support and practice—as we were advised at the demonstrations—or without any practice, will distort the information that the classroom teacher is getting? If the purpose is, as it now appears to be, to give the teacher diagnostic information about how to help the child to make further progress, I would say that the risk is not too great, because the teacher has already taken into account the fact that he or she has chosen to test that child at age six—perhaps in the summer term of P1, if the child's age worked out that way—rather than earlier. That would not be a problem. Where there is a problem, as I said in answer to Liz Smith's question, is in trying to aggregate the results to make interpretations about the system as a whole, a local authority or the school. I would say that, if that is happening to an extent that we do not know about, it comes close to invalidating the results when they are aggregated to those levels. I suppose that the other thing that I am interested in is where, within the system, this process lies in terms of importance. I will give an example. 
When I was still a classroom teacher with an S1 English class, there would be, maybe in October, a parents' night, and you gave an initial idea of how the kids were doing: progress, behaviour, homework and effort. I would want to give all of them As because, basically, they come in really enthusiastic and really keen; they are in a new school and they are doing their very best. I was told by the head teacher that I could give only 20 per cent of them As because, after all, by the time they got to higher, only 20 per cent of them would be able to cope. In fact, by giving a child an A and recognising what they have been trying to do, you are keeping them engaged in school. It is an entirely valid thing for a professional to do to say, "I want to keep these wee people enthusiastic. I am not going to tell them now that, by the way, they are not going to get a higher." Do you accept that that is a valid part of assessment—that perhaps objective testing allows the teacher to weigh what they want and aspire to for the child, and what the child wants for themselves, against the testing? First of all, do you think that that is valid? Secondly, if it could be established—we talk about not teaching to the test—that, in schools, support staff have been taken away from children with additional support needs in order to manage the process, which would have a disproportionate impact in schools with disproportionately high numbers of children with additional support needs, should that affect—does that matter? Is that a judgment on the effectiveness of the policy of a standardised assessment test? I am being told anecdotally what is happening in a primary school with a lot of children with additional support needs: the support staff have been taken away to run the system. Is that not another form of distortion, the same as teaching to the test? That is a serious failure in so many respects. 
I do not want to labour the point, but the answer to that would be that it completely contradicts the idea that the purpose of the test is to inform the teacher's judgment, because the teacher, as it were, cannot subcontract their judgment. They have to hone their judgment on the test that they, as a teacher, administer. That is not a consequence of the test; it is a consequence of the school management and the local authority management. Is it a consequence of the compulsory nature of the test in a school that does not have the resources to do anything other than manage it in that way? It might be a consequence of the ways in which the tests are implemented by Government, as well as by the school and the local authority, but it is not a consequence of testing as such. It is a consequence of the context of the testing. If we go back to the point that you made about your headteacher with his normal distribution—maybe it was her normal distribution—in mind, I suspect that that was nonsense and that it should never happen. Clearly, we should not ever constrain people by completely non-evidence-based standards. If you wanted to give As to everyone for the purposes of encouraging them, that is fine, but it does not produce a judgment. It is a form of exhortation. It is what team coaches do at the beginning of a football match or something like that. It has nothing to do with actual performance. After the match, the team coach would want to say, "you did well, you did not do well, and you did not try hard enough." That would be the point, because that would be based on evidence. If the whole system of national assessments encourages greater respect for evidence in making assessments and judgments across the system of Scottish education as a whole, that would be a good thing, because people would no longer get mixed up between exhortation and assessment. 
Would it be valid, in assessing the benefit of standardised assessments, to ask schools what the consequence has been for the routine processes that they are going through? I am troubled by the fact that we were told in a demonstration that a child could basically be tutored and given any number of chances to practise a test before they do it, which must distort what is happening in the classroom in terms of time. That was part of the practice sessions that the children would have; I do not think that it is part of the actual assessment itself. If you have not got a standardised test, you do not have to practise a test before you do it, self-evidently. Some schools might make the judgment to do it in the way that the old survey was done: you go in, you do it, you come back. It has not really got anything to do immediately with the individual impact on you as a learner. Teaching to the test is only a bad thing if the test is bad—if the test is not a valid assessment of the content of the curriculum. If there is going to be a lot of teaching to the test, which there will be, we had better make sure that the tests are valid—that is, that they are assessing what is in the curriculum. For example, in primary 1, we are expecting children to tell the time from analogue devices, not just digital ones. If that is a reasonable thing to have in the curriculum, it is a reasonable thing to ask them to perform. It is not an unreasonable task at all to ask them to look at an analogue clock. It might be unreasonable at primary 1 to ask them to look at Roman numerals on an analogue clock, but that is not the point; it is about interpreting the position of the hands. In other words, the teaching-to-the-test mantra is overused. Sometimes teaching to the test can actually be a good discipline that forces people to think. After all, we expect people doing higher mathematics to have been taught to the test to the extent that they learn how to perform mathematical operations. 
Indeed, if we go back again to primary 1, it is true that, of course, the tests, as has been frequently said this morning, test only certain aspects of attainment. However, those aspects are fundamentally important before any other progress can be made. Unless a child can actually do the elementary operations of arithmetic, they will never make progress in any other aspect, not only of maths but of science and many aspects of social science as well. Checking that the child can add, subtract, multiply and divide, and can do that mentally as well as on paper, although apparently very narrow, is the basis for the child flourishing in later life. Teaching to the test is not necessarily a bad thing—it depends on what the test is doing. I think that Johann has raised some very interesting issues about the relationship between—again, if the focus is learning—if, as an English teacher, what you wanted to do was to encourage someone to learn, a system that asks you to put a label on that learning is not necessarily the most helpful way to do that. What can that child do now? What is your understanding, as an English teacher, of the progression of the learning journey from the time the child walks into the school until the time they are likely to leave? How does the child that you are working with relate to that learning journey, and how might you support them to make progress on it? That is absolutely crucial. It is interesting that, in Norway, for example, it is illegal—it is written into law—to put a letter or a number against a child's name before the age of 12. In that context, it is a recognition that using letters or numbers that are shorthand symbols for professionals, and shorthand symbols intended to communicate with people externally, can have a negative effect on the self-esteem and the confidence of the very young people that you want to support most effectively. 
There is a confusion sometimes in people's minds between criterion referencing—having a criterion and then looking at the child's progress and development in relation to that criterion—and norm referencing, where you are looking at the 20 per cent who can, or you are looking at the... I would make a plea not just for better understanding of standardised testing but, as a society, for better understanding of assessment and its potential to enhance learning, and also of the real challenges that it can set up for us in trying to achieve a society where every child makes really good progress. Louise's point about norm and criterion referencing is an interesting one. If you want a well-rounded and comprehensive picture of how a young person is developing intellectually, ask the teacher; that has always been true and, I think, remains true. Very few classroom teachers, particularly if they are operating in primary and have more or less the whole week with the child, would have any difficulty in giving you, off the top of their heads, some kind of norm referencing of all of the children in their class, whether for reading or for arithmetic or whatever. How that relates to how children elsewhere in the country are performing is an entirely different matter. So, if you want a criterion-referenced assessment, probably do not go to the class teacher. The information that you would get from the standardised assessment will be more helpful to you, at any rate in relation to the limited part of the curriculum that it covers. In recent years, we have become much more interested in how teachers' judgment correlates with some more objective notion of expectations and standards, hence the emphasis that has been placed on the moderation that was talked about earlier on. The new assessments provide teachers with a tool that will help them to do some of that, and that is quite a valuable contribution. 
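The distinction between criterion and norm referencing discussed above can be shown in a few lines. This is a toy sketch: the pupil names, scores and the pass threshold are all invented for illustration.

```python
# Invented reading scores for one hypothetical class.
scores = {"Anna": 72, "Ben": 58, "Cara": 90, "Dev": 58, "Ewa": 41}

# Criterion referencing: compare each child with a fixed standard.
THRESHOLD = 60  # assumed criterion, e.g. "can read straightforward texts"
criterion = {name: s >= THRESHOLD for name, s in scores.items()}

# Norm referencing: compare each child with the rest of the group,
# here as the percentage of classmates they scored at or above.
def percentile(name: str) -> float:
    others = [v for k, v in scores.items() if k != name]
    return 100 * sum(scores[name] >= v for v in others) / len(others)

norm = {name: percentile(name) for name in scores}

print(criterion)  # who meets the fixed standard
print(norm)       # where each child sits relative to the class
```

The two answers can disagree in instructive ways: a whole class could meet the criterion while still being spread across the norm-referenced ranking, which is why a teacher's within-class ranking says little about national standards.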
To return to the issues that have been raised around the comparability of the data, and the point that Johann Lamont raised, you will have some children taking these tests in primary 1 at four and a half and some at six years old, and there is a significant difference between them. Just to clarify, Professor Paterson, did I pick you up correctly earlier when you said that, if that variability is not recognised, it would invalidate the aggregate group-level data at that stage? As a simple headline, it would invalidate it. It is too big a variation at that age. It is an age-dependent thing: obviously, I have students whose ages vary by more than that when they sit their final honours exams, and we do not apply an age adjustment, so clearly it varies. However, at that very young age, it would be safe to say that you could not draw valid inferences on the basis of the test result alone, with no measure at all of progress. Incidentally, that is an argument for testing at later stages in primary and for having baseline testing in primary 1, because that does allow you to measure progress, and at least then you can take account of it. Anyway, I am introducing too many caveats. My answer to your question is yes, it would invalidate it. From your experience, do you believe that there is a sufficient level of data literacy within local authorities and within schools to recognise and compensate for that? There is not. One of the reasons there is not—I mean, it is demonstrable that local authorities do not have that statistical expertise. Sadly, it has to be said that the vast majority of Scottish teachers do not have it either. 
Remember that you can do a primary teaching degree with what is now the equivalent of national 5 lifeskills mathematics—sorry, it is now called applications of mathematics—which, for some of us of a certain age, used to be called arithmetic O grade, and which was similarly a standard grade pass, in other words a C, at national 5. That is not enough to understand the complexities of statistical sampling and of measures of reliability. What is more, you might think that they would get this in their teacher education programmes but, as was pointed out to you last year, when some student teachers gave evidence here, they get no more mathematics in their undergraduate programmes than they brought with them from school. They get courses on the teaching of maths, but they do not get any more maths. The typical primary school graduate is emerging as a new teacher with no more than applications of mathematics at national 5. That is not nearly enough. That is the basis on which I say that there is not enough expertise to allow the evidence to be interpreted in schools. Is that an opinion that the other panellists would share? I think that there may be variation across different teacher education institutions. No—the evidence that was produced for this committee in that same session, which, as I said, I think was last year but might have been the session before, included a paper from the Scottish Government that looked at the amount of time in a typical four-year programme that is devoted to certain activities, one of which was mathematics. There was variation, but none of it amounted to more than a few hours per week. It was not nearly enough; they did not even get to the level of higher mathematics. There is variation, of course, from one student teacher to another, because they come in with varying levels of expertise in mathematics. 
However, as we place increasing importance on teachers interpreting evidence, that has implications for initial teacher education that are, so far, largely unconsidered. One thing that might be said is that Finland is a place that is often, rightly, admired. One of the questions that is often asked is why Finland does so well when it does not have national testing until the end of primary school and so on. One answer that has often been given, I think rightly, is that it has to do with the quality of teacher education. If you look into the detail of what that means, about 15 per cent of primary school teachers—if I am right in citing that figure—have enough of a mathematics component in their teaching degree to have a mathematics qualification; they could satisfy our requirements to teach maths in secondary school. On average, that would mean that every primary school had at least one person qualified to the equivalent of a mathematics honours degree. That does not mean that every teacher has to be able to do this, but you would want every school to have somebody who could interpret the evidence and share that interpretation with their colleagues. The same is true of other areas of specialism in Finland: foreign languages, Finnish as a language and so on. The only thing that I would add to what Lindsay has said concerns assessment literacy, which is assessment in its broadest sense. It is not only about interpreting statistical evidence; it is the broad picture of how assessment relates to curriculum and pedagogy, and the skills that are needed. It is about making sure, in our system and in all its layers, from the classroom teacher upwards, that the right support is in place. I do not know what kinds of induction programmes you have when you come to work in the Parliament, but it is about the extent to which people are supported to carry out the roles that society is asking them to carry out, and about making sure that that support is there right the way through our system. 
Moving a layer up from schools to look at local authority level, there is a challenge for teachers there, and that level of data literacy is one of many skills that would be desirable in a teacher. At local authority level, you have the opportunity to create posts and to recruit people with the specific skills for those areas, but there has been some evidence that local authorities no longer have the quality improvement staff who have that level of understanding. Is that something that you have picked up? The introduction of SNSAs, and the need for local authority staff with that level of data literacy, has come at the same time as local authorities have lost the staff who would previously have had the relevant skills. Local authorities have a declining capacity to offer support to schools and, so long as local authorities remain an important tier of organisation within the system, that is decidedly unfortunate. Building capacity is fundamentally what we are talking about: building capacity in the system. Again, that probably varies from authority to authority, depending on the size of the authority, but the other issue is about seeing those skills and competences as part of what it is to be a professional teacher. It is not just about initial teacher education; it is about making sure that, throughout a teacher's professional career, there are opportunities for them to develop, hone and enhance their skills in those areas. That has been a very long session this morning. Dr Bloomer, Professor Hayward and Professor Paterson, thank you very much for your attendance at committee today and for your submissions, which, as you will have heard, have been highly valued by committee members. Our next session on Scottish national standardised assessments will take place on 30 January. I am going to suspend for five minutes, but I remind members that we are coming back into public session. 
We move to agenda item 3, which is petition PE1694, on free instrumental music services, which was referred to us by the Public Petitions Committee in the course of our inquiry into instrumental music tuition. I would like to put on record my sincere thanks to all who signed the petition; thanks also to the music education partnership group, which raised the issue with the committee during our evidence sessions, and to all those who gave evidence to our inquiry, including some very powerful contributions from the young people who were involved in our deliberations. Our inquiry report was published yesterday, and we hope to hold a committee debate on the report in the chamber, involving members from across the Parliament, in the near future. We will also consider the Government and COSLA's response to our recommendations at a future committee meeting. In addition, we will consider how ongoing research aligns with the committee's findings. The paper suggests closing the petition at this stage; alternatively, we could leave it open until we have received the correspondence regarding the report from the Scottish Government and COSLA. On that basis, I am looking for guidance from the committee as to whether the preference is to hold the petition open or to close it today. I just wonder whether it would not be courteous to keep it open at this time, until such time as both the debate has happened and the Government and other bodies have responded to the committee's report. On my way home yesterday, a couple of people actually mentioned the committee report, which, it has to be said, does not happen every day, so that suggests that it struck some kind of chord with people. Given that it is entirely relevant to the work that we have just done, I think that it might be courteous to act in that way. Is anyone otherwise minded? No? Are we content to leave the petition open for deliberation at a future meeting? Thank you. 
Our final agenda item this morning is for the committee to consider the further response from COSLA to our inquiry report into the attainment and achievement of school-aged children experiencing poverty. The committee considered the response from the Government and Education Scotland, and an initial response from COSLA, last year, but the substantive response from COSLA covers issues that the committee acknowledged would take longer to analyse and therefore to respond to. One observation to make for committee members is that the Government responded to the committee's recommendation that all local authorities should be surveyed on charges being made for core education and how those charges contribute to the cost of the school day. The Government's response said that it would pursue those charges with COSLA, but the recent submission from COSLA does not mention the committee's recommendations in this area. It is suggested that the committee write to the Scottish Government, copying in COSLA, seeking clarification as to which organisation is taking forward the work stemming from the recommendation on the cost of the school day, and seek details of what specific work is planned. Are there any other observations from committee members on the response from COSLA? Thank you. That concludes our public session for this week. We now move into private session.