Good afternoon. My name is Brian Jacob. I'm a faculty member here at the Ford School and the director of the Center for Local, State, and Urban Policy. We host, as many of you know, a periodic public lecture series on education policy issues. Today, we are proud to have David Figlio, who is the Orrington Lunt Professor of Education and Social Policy and of Economics at Northwestern University. He's also many other things, including the founding editor of the journal Education Finance and Policy, and he has done phenomenal work in a variety of areas: school choice, school accountability, school finance, and many, many other things. Today, he's going to talk to us about accountability and unintended consequences. As most of you know, the standard format is that David will talk for about an hour, and we'll have about 20 minutes for Q&A at the end. Lastly, before I turn it over to David, I'd like to make sure I thank Bonnie Roberts and Tom Cook, who have organized the logistics and AV equipment, and to recognize the Ford School for cosponsoring and the Charles and Susan Gessner Fund, which has helped support some of the education policy work here at the Ford School. So with that, I'll turn it over to David. Thank you.
Thanks. So it's a real pleasure to be here. It's actually been quite a long time since I've been at the University of Michigan. This beautiful building wasn't here the last time, for one thing, and I've been really enjoying my afternoon here so far and looking forward to lots of stimulating conversations. My task for today's talk is to go through a lot of the different policy design questions that people are thinking about, or should be thinking about, with regard to setting up school accountability systems. This is a particularly opportune time to do this. No Child Left Behind, which, as presumably everybody knows, is the federal school accountability system, has been due for reauthorization for four years now.
And we haven't yet seen a reauthorization of No Child Left Behind, but on the other hand, this winter the federal Department of Education is going to be allowing states to come up with what are effectively their own alternatives to No Child Left Behind through the so-called waiver process. In the waiver process, in exchange for accepting certain types of strictures from the federal government with regard to things like teacher compensation or teacher evaluation, the states will have more freedom to design a No Child Left Behind-compliant accountability system that suits their goals. So I think that a talk describing some of the issues that states ought to be thinking about, and that the United States government should be thinking about as we're eventually, maybe, going to have a reauthorization of NCLB, is relatively opportune. As an economist, one of the things I'm always interested in is thinking about incentives. Instead of calling this talk intended and unintended consequences of school accountability, we could very easily call it People Respond to Incentives, because that's essentially what I would argue is one of the major lessons here, as with most public policies. What I tell policy grad students, for example, is that the purpose of public policy is to change prices. By prices, I mean, basically, that if people were going to do something anyway, we wouldn't need the public policy to get them to do it. So we need to change the incentives in order to do that. But of course, a lot of public policies assume that we can reach in and change one little thing, without recognizing that the people who are going to fight so hard against doing things they weren't going to want to do anyway might react in different ways as well. So that's one of the things that I want everybody to keep in mind as we're talking about this.
I'll mention that I'm going to be short on specifics in this discussion over the next 45 or 50 minutes or so. But I invite anybody who wants me to go long on details for any of the things I talk about to ask in the Q&A, and I'm happy to go just about anywhere. I also want to applaud whoever came up with that piece of clip art, because it's really awesome. OK, so this is more my technological style. Yeah, that's right. That's right. So, Jeff, you were responsible for the clip art, right? That was really fabulous. All right, so as we start to think about why we have school accountability, it's useful to spend a couple of minutes talking about the history of the school accountability movement and how we got to the place we are today. It's important to know that the school accountability movement, although now, I would argue, it's entirely a standalone policy, emerged out of a very different movement: the student accountability movement of the early 1980s. The student accountability movement was anchored on both sides of the Atlantic, in the United Kingdom and the United States. Both countries had been shellacked relative to our peers in several, at the time, very prominent international test score comparisons. And people were very worried about the United States and the United Kingdom really losing their competitive edge in the world. At that time, they'd fallen to the middle of the pack among OECD countries, for example, or sometimes below the middle of the pack. And the worry at that time was, oh, we're just not challenging students enough. What we really need to do is come up with a set of standards and then hold students accountable for those standards.
And so what ended up occurring in that student accountability movement is that states around the country (and this was also true in the United Kingdom, or really in England and Wales at that point) began introducing a set of exit exams for graduating from high school, for example. Now, residents of New York know that exit exams were not a new thing. The Regents exams have been going on since the late 19th century, but in the rest of the country, we saw a spate of new exit exams around the 1980 to 1985 period. I was a resident of the state of Maryland. Maryland was an industry leader in exit exams for students. I had to take exams in what they called functional mathematics, functional writing, functional reading, functional science, and functional citizenship, and I had to pass each of those. I successfully managed four out of the five the first time around, but I didn't make it past the functional writing test the first time. You may imagine that that's a signal about my own writing, and it very well may be, but actually, these exams tend to have a pretty large stochastic component, and we'll come back to that later on. In my case, I learned that I had failed the Maryland functional writing test, and therefore had to spend my senior year in remedial writing, the same day that I learned that I was my school district's nominee for a statewide writing prize. (These are county-level school districts, too.) I was the runner-up. So instead of taking AP English in my senior year, I took remedial writing, which actually was great for my social life, because I just went to the library the whole time, and as an academic, that's the pinnacle of our social life. So I was planning for that. So one of those two signals, or both of them, were wrong, and I don't know which one is true.
So Maryland, Massachusetts, Michigan, and other states that don't begin with M all started to introduce these exams in the mid-80s. And sometime in the early 90s, people started to realize that if we're going to be testing kids in more and more subjects in more and more years to evaluate the kids, maybe we could use these tests to evaluate the schools and educators as well. And so what we started to see was a morphing of this notion. We're going to set a certain standard and hold students accountable for reaching it in order to meet some criterion for graduation became we're going to start holding schools accountable for helping kids get to a particular standard as well. There are a number of different rationales for why the school accountability movement really started to take off in the 90s. A lot of people who are not familiar with this literature think about school accountability as starting with No Child Left Behind. But of course, by the time we got around to No Child Left Behind, the majority of states had already had their own school accountability systems, sometimes for half a decade or more. In a handful of large cities, accountability systems predated No Child Left Behind by more than a decade; Brian has famously studied an early introduction of school accountability in the city of Chicago, for example. So there are some reasons why we have these accountability systems, No Child Left Behind and others. One rationale is the notion of performance measurement in the public sector in general. This was very popular in both the Reagan-era US and the Thatcher-era UK in particular. Both of those movements, on opposite sides of the Atlantic, were extremely interested in the notion that public sector enterprises and nonprofit enterprises should be held to the same types of standards as private sector enterprises.
And so one argument for having school accountability was to say, okay, let's benchmark these procedures against the private sector. Let's measure some type of outcome and hold public sector enterprises accountable for attaining that outcome, just like the profit-based outcomes you see in the private sector. A second rationale is the principal-agent problem. I promise, because I know we're a mixed crowd, to keep jargon to a minimum. But for those of you who don't know what the principal-agent problem is in economics, the idea is that if you're entrusting somebody to act on your behalf, to make decisions on your behalf, it may well be that the person making those decisions has more knowledge and understanding about what's actually going on in the decision-making process than the person who's trusting them to do it. The term principal-agent problem came out of the notion of a fiduciary relationship in business. But in this case, you could think of the principals as society, the families who are entrusting schools to educate their kids on society's behalf. The teachers and school principals, who are the agents here, are the people who know more about what's going on inside the school. So the notion may be that an accountability system provides some additional information that shines a light on things that may have occurred behind closed doors before. And then another possibility: one thing we started learning in the standards-based reform movement is that if we hold students to a high standard, students tend to live up to higher standards, except for those who drop out. But on average, the students who succeeded in making it through high school, those like me who weren't deterred by early failure on the test, ended up doing better as a result.
And so the notion would be that if holding students to high standards works, perhaps holding educators to high standards may be a way to start closing these gaps that were emerging between the US or the UK and the rest of the world. School accountability has spread well beyond the borders of the United States and the United Kingdom at this point. I just got back a few weeks ago from a conference in Brazil. Brazil is just the latest in a set of countries all around the world that have introduced very aggressive school accountability systems. It is now the only major education reform that has been implemented on every human-inhabited continent on earth. So school accountability is with us around the world at this point. And so it's important to think about how we should be designing an accountability system. One thing we've observed, too, is that if you think about the rhetoric surrounding No Child Left Behind over the last decade, it's gone from many people think it's a great idea, to most people hate it but it's not going away, to now: we can't possibly end No Child Left Behind, we have to mend No Child Left Behind. So the idea is that we're not going to be in a situation in which we don't have school accountability. I don't think we are. The only question is what accountability is going to look like, as opposed to whether it's going to exist. So why should we care about school accountability? There are a lot of things that the government does that may not seem entirely consequential. Well, accountability turns out to be very consequential to all different types of stakeholders in many different dimensions. There's evidence from all around the world that measured school quality is capitalized into housing prices. As soon as we start to test students and assess schools on the basis of those tests, this leads to immediate and lasting changes in housing prices.
So this is one reason why people, even people who don't have kids in school, need to start worrying about the nature and the design of school accountability: their assets are going to be affected. I mean, not that anybody has assets in the housing market anymore, but still, in the event that we want to sell, this may make a difference, right? I mean, the relative lack of assets. State accountability marks, as I mentioned, are very consequential, and it's not just measures of school quality that matter; how the state measures the quality seems to matter, too. A paper of mine published six or seven years ago, for example, looked at the introduction of the state accountability system in Florida, which took a bunch of publicly available data and mashed it together. Essentially, it took a set of continuous variables and turned it into a discrete variable. And what we find is that a school that just barely missed being given a grade of A ended up having some very, very unlucky homeowners in its school zone relative to a school that just barely made the grade of A. It turned out that in the short run, at least, there was, depending on model specification, anywhere from a 9 to a 15% premium to getting a grade of A versus B, holding constant all the already available data that were the components of that accountability system. So it's clear that people pay a lot of attention to this. This result is now being replicated over and over again all around the world. Accountability influences where people live and whether they choose public or private schools. Those of you who are gluttons for punishment in two different ways can go listen to me talk at 8:30 tomorrow morning. So if you want to get up early and have a bagel, you can hear me present a paper on that topic, for example. Direct stakeholder support is dependent on accountability grades.
What I mean by this is something a lot of people don't realize, though anybody who has school-aged kids certainly does: in most states in the United States, the number three supporter of school revenues, after local government and state government, is not the federal government but rather private voluntary contributions by us to our schools. It could be through PTA contributions. It could be through other voluntary contributions. In the state of Florida, for example (you're hearing a lot about Florida because that's been my poster child for the last decade), 6% of operating revenues come from the federal government and 9% come from voluntary contributions. That's still smaller than the 55% coming from the state and the 30-something, whatever the rest is, 36% coming from localities, but it's a pretty big number. So what do we know? What we do know is that people withhold resources if they think the school is doing poorly, and they give more generously if they think the school is doing well, and they pay incredible attention to accountability marks. This is especially the case for schools that serve disadvantaged populations, low-income or racial or ethnic minorities in particular, arguably places where money may be even more of an issue. Teacher and principal, or headmaster, mobility is affected by school accountability as well. There is an increasing number of studies that say that good teachers tend to flee schools that are getting bad accountability marks. Good principals tend to want transfers. These schools have a difficult time recruiting new teachers. They're more likely to replace good teachers with rookie teachers, who are of unknown quality, et cetera. So all of these things suggest that a lot of people care about what these accountability marks are.
And this is why fixing No Child Left Behind, or Michigan's version of No Child Left Behind, is so important: it matters to a lot of people, and it affects education in profound ways. There are, in my opinion, five fundamental types of policy decisions that people have to grapple with regarding accountability systems. They range from the relatively interesting to the relatively uninteresting. I kind of think that the first four of these are more interesting, and hopefully I'll run out the clock before talking about the time considered for rating schools, because even though that's really important in some dimensions, it's also kind of mundane for an academic audience. So I'll just tell you about these very briefly, and then I'll spend the rest of this talk going one by one through them and telling you what the research says, what we know, and what we don't know. Scope and domains of accountability. What do I mean by this? Who's getting tested? In which grades, in which subjects, for example. That's going to be one area that's really important for us to know about, because the decisions that we make on this (I say the royal we, in the sense of, well, if I were education czar, the decisions I would make) have strong implications for what gets done. How do we measure school performance? Some people argue that measuring school performance is a straightforward thing to do, but those are the people who haven't tried to do it. There are dozens of different approaches, none of which is satisfactory, one of which has to be chosen. So how do we do it, and what are the consequences? I'm going to focus largely in my talk today on one very salient decision, which really cuts to the chase of what might be in play with revisions to No Child Left Behind.
Right now, No Child Left Behind and the state accountability systems that mirror No Child Left Behind are based on getting kids over a proficiency threshold. The alternative might be to ask: do we assess schools on the basis of their progress, as opposed to proficiency standards? I'll get into what that means for schools and for accountability design in a few minutes. Exclusions are another decision. Whom do we include and whom do we exclude from this accountability system? Do we count test scores for everybody, or do we only focus on the stable kids? Do we include or exclude students with disabilities, and if so, which students with disabilities? How about limited English proficiency students, et cetera? These are very important decisions, and there's evidence on the various things that will happen depending on whether we include or exclude certain sets of students. Then there's subgroup identification. One thing that is both an Achilles heel and a great strength of No Child Left Behind is this idea that there are 39 ways to fail and only one way to succeed. What I mean by that is that in order for a school to make what's called adequate yearly progress, that is, to pass under No Child Left Behind, the school has to be successful in meeting proficiency standards for every subgroup in every test, or rather, every subject. So for example, a school like my kid's high school right now would have made adequate yearly progress except that they were one student away in math performance for limited English proficiency students. Now, the fact that schools can tell you that they would have made it but for being one student away tells you just how profoundly this is going to affect school policies and practices. So that's going to be an important question. And then there's the time considered for rating schools, right?
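A minimal Python sketch of the "every subgroup, every subject" rule described above. The targets, subgroup names, and proficiency rates are hypothetical, and `makes_ayp` is an illustrative helper, not an official formula; actual adequate yearly progress determinations involve other provisions that are not modeled here.

```python
def makes_ayp(proficiency, targets):
    """Return (passed, failures).

    A school passes only if every (subgroup, subject) cell meets the
    target for its subject; a single missed cell fails the whole school.
    """
    failures = [
        (subgroup, subject)
        for (subgroup, subject), rate in proficiency.items()
        if rate < targets[subject]
    ]
    return (not failures, failures)


# Hypothetical proficiency targets by subject.
targets = {"math": 0.60, "reading": 0.65}

# Hypothetical school: strong overall, one subgroup just short in math.
school = {
    ("all students", "math"): 0.82,
    ("all students", "reading"): 0.85,
    ("limited English proficiency", "math"): 0.58,
    ("limited English proficiency", "reading"): 0.70,
}

passed, failures = makes_ayp(school, targets)
print("made AYP:", passed)
print("cells that missed the target:", failures)
```

Because the rule is a conjunction over all cells, adding subgroups or subjects can only add ways to fail, which is the sense of "39 ways to fail and only one way to succeed."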
Do you base school accountability just on what happened last year, or do you use a multi-year moving average, for example? Well, think about the people who are worried about measurement error. Remember I was talking a little bit about my own personal failings as a high school student in writing. If you average a bunch of stochastic errors over a small group of people, you could end up seeing big false positives or false negatives, for example. Average over a few years, and that becomes less and less of a problem as the law of large numbers starts to kick in. So some people would say, wouldn't it be great if we just did a three- or four-year moving average? But then the problem is: suppose we actually see genuine rapid improvement. Are schools that are genuinely rapidly improving going to be penalized because it will take a few years for their improvement to show up in accountability standards? Or the other way around: suppose you had a school that did really well this year. Maybe they might rest on their laurels a little bit if they know that this year's results are going to help pave the way for them to be less innovative in the future. There are more, but I think that these are the five primary decisions that people are going to have to make. Okay, so let's think about the scope and domains of accountability. As I mentioned, a lot of my research was done in the state of Florida. That's for multiple reasons. For one thing, Florida has been a major education policy innovator, in both Democratic and Republican administrations, for the better part of four decades. So Florida is always early on in trying different things. Florida was one of the earlier states with regard to school accountability.
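As a rough illustration of the moving-average trade-off described above, here is a minimal Python sketch. It assumes, as a simplification, that each test result is an independent pass/fail draw around a school's true proficiency rate; the school size, rates, and function names are hypothetical.

```python
import math


def sampling_se(p, n_students, n_years=1):
    """Standard error of an observed proficiency rate, assuming each of
    n_students * n_years results is an independent pass/fail draw with
    true pass probability p (a simplifying assumption)."""
    return math.sqrt(p * (1 - p) / (n_students * n_years))


def trailing_average(rates, window):
    """Mean of the most recent `window` yearly rates."""
    recent = rates[-window:]
    return sum(recent) / len(recent)


# Hypothetical small school: 25 tested students, true rate of 60%.
one_year = sampling_se(0.60, 25)
three_year = sampling_se(0.60, 25, n_years=3)
print(f"1-year SE: {one_year:.3f}  3-year SE: {three_year:.3f}")

# Hypothetical rapidly improving school (true rates by year).
improving = [0.50, 0.60, 0.70, 0.80]
print(f"latest year: {improving[-1]:.2f}  "
      f"3-year average: {trailing_average(improving, 3):.2f}")
```

Under these assumptions, pooling years shrinks the noise in a small school's rating, but the same averaging makes the rating lag behind a school whose true performance is rising quickly.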
Florida is the first in the nation, for better or for worse, in stripping tenure from new teachers and replacing tenure protections with something based on measures of so-called teacher value added, and in many other ways as well. So Florida is a very interesting place to study education policy in general, which is one reason to gravitate towards it. Many of my anecdotes and stories today are going to be about Florida, because that's the state I've studied the most aggressively. The other reason why Florida was so attractive for me is that Florida has unambiguously the best state education data system in the country, and has had it for quite a while. Michigan is a place where Brian and Sue have been doing a lot of work to do lots of neat things with the state's education data system. A lot of the elements of Michigan's education data system were modeled on Florida's. Michigan is one of a handful of states that actually didn't squander... I guess I'm mic'd up. Well, I said it. Michigan's one of a handful of states that didn't squander the $750 million that the federal government invested in state longitudinal data systems. So you guys are doing a great job in marshaling those data, and of course they'll only get better and better. Oh, I know, you didn't get that money. But you got the spillover benefit from it, right? Because had it not been for the between $26 and $35 million that Michigan received to build and refine its state longitudinal data system, you would not have had the data to analyze when you got your own pot of money. So in that regard, there's a little bit of a collateral benefit. Okay, so in Florida, above the commissioner of education's desk is a banner that says what gets measured gets done.
Now actually, historically there was a comma after measured, but, as I mentioned, with my finely tuned writing skills I pointed out that that was not the best comma. Actually, I'm very proud to say this is one place in which I know that my work has influenced something in the education department, because the next time I was in the commissioner's office they'd replaced that banner with a comma-free banner. But anyway, the point of this is to say that people are extremely responsive to measurement, right? The new thing coming down the line, of course, is accountability for higher education. There are not one, not two, but three different National Research Council panels working right now to try to figure out the best way to measure your performance. And we know already that there's a very prominent publication that measures our collective performance in ways that end up manipulating university resources. Of course, I'm talking about the Associated Press football poll, but we're also talking about US News and World Report, rather. And we do know that even though every college president denies that US News and World Report is dictating their behaviors, we all know that it is. The same is true in schools. There's a lot of very strong evidence that what's getting measured, for example the specific topics that are on these exams, are the things that are being focused upon in schools. There's a fair amount of evidence, for example, of what's called narrowing of the curriculum to tested subjects and, importantly, to tested topics within a subject. And this is relevant because then, of course, the quality of the measurement becomes really important. Some people would argue that if the tests are really good and they're measuring the skills we really want students to learn, then so-called teaching to the test is a desirable feature.
But if the tests are weak and they focus on superficial skills as opposed to deep inquiry-based or other types of critical thinking skills, then this could be a negative consequence. I mentioned a couple of NRC panels. I was on an NRC panel a few years ago giving advice to states on implementing the science component of No Child Left Behind. States are required to test students in science in at least three grades, but they're not required to use that for No Child Left Behind accountability purposes. And there was a heated discussion on that panel: do we want to include science in accountability systems? Should we recommend that to states? It really came down to the following major conflict. There were people who were very, very much in favor, because they were worried that science was getting short shrift under accountability systems that don't include science. And there's a lot of evidence suggesting that that may be true. But on the other hand, people were really worried because the types of science that can actually be assessed easily are exactly the things that they didn't want to be assessed. So they were thinking, we'd rather have less science in the classroom, but have it be the science we want, than have more science in the classroom, but have it be the science we don't want. So for example, on a typical standardized science exam, students are tested on the distance between the Earth and Jupiter, but nothing about, for example, how we know that Jupiter is as far away as it is. Admittedly, I don't know how far away Jupiter is. I would have failed that functional test, too. But I have at least some ideas (don't test me on this in the Q&A) of how we might have figured this out. Okay. So there's an interesting argument here, right?
Do we want to focus on testing in a broad variety of subjects, and will that benefit or harm the teaching of those subjects? That's one thing that we want to be thinking about. I don't think there's really any right or wrong answer, except that, if I were to advocate based on my reading of the literature, I would say that if we're going to test, let's invest a lot of money and resources in the testing, because if we are going to test, we know that schools are going to do what is on the test. And so we want to maximize the likelihood that they're going to do the things we think are valuable. There's another interesting question, which is: should we include non-test measures? Some people argue that we should include things like socio-behavioral outcomes in accountability systems. Others have argued we should include graduation probabilities. This would certainly cause people to focus more on those types of outcomes, but let's think about graduation rates for a moment. By the way, that's what one of these NRC panels is almost surely going to focus on. I can't say, because it's an embargoed document that I had nothing to do with. But one of the NRC panels may or may not make recommendations about evaluating higher education institutions on the basis of student credit hours completed and graduation rates. But of course, one way to maximize graduation rates would be to minimize failure. That's something that's entirely within our control, right? I mean measured failure, that is, as opposed to actual failure. And you could imagine the same type of thing might go on in high school. If high schools are being given money or being punished on the basis of how many kids graduate, you may observe the same types of things going on in high schools as well.
When we think about measuring school performance, the real fundamental choice that I mentioned is between status measures and growth measures. Status measures are basically what No Child Left Behind looks like right now. A slightly simplistic way of thinking about it is that schools are evaluated on the basis of what percentage of students meet some proficiency threshold. Eventually, in 2014, it would be 100%, right? But for now, nobody's at the 100% proficiency threshold just yet. Schools are considered good if their measured fraction proficient is above some target, and bad otherwise. Growth measures focus instead on the growth from year to year in students' performance. The idea here might be that you have a student who's not at proficiency but is approaching proficiency. Should the school get credit for bringing a kid closer to proficiency? Or, alternatively, should schools only get credit for kids who are proficient? We can tell normative and positive stories for both, so I'm going to focus a little bit on both. First of all, the status measure. Why is it that most states and the federal government focus on a status measure? Well, one reason, which isn't on there but is related, is that it's easy, right? Everybody can calculate percentages. It's a lot harder to figure out how to measure growth. So in that regard, it was easy to implement. You didn't have to follow kids longitudinally. There are lots of reasons why this would be the case. It also has the benefit of being transparent. Because everybody can calculate percentages, it's very easy to go and tell parents, journalists, et cetera, that 72% of kids at Booker T. Washington Elementary School passed this exam. That's an easy thing. It's very transparent. It's very straightforward.
It also has this advantage, and this is partially a vestige of the standards-based reform movement too, which is this notion that we want kids to be competent. We want kids to be proficient. And so let's focus on this target, the proficiency or competency target, and say what really matters is getting kids over that hurdle. If a kid is doing well relative to last year, but they're still very far away, should we give schools a pass for that? I mean, that's one of the arguments behind this. We want to have proficiency for all. Some advantages for the growth measure. Well, there's a fairness issue, which I think is pretty salient, right? So one thing we observe is that these status measures of proficiency rates are, no matter how you slice it, extremely highly correlated with measures of the socioeconomic status of the students who attend that school. So for reasons that may or may not have anything to do with the school, or anything within the school's control, it's very easy for mediocre suburban schools serving college-educated families to look good, even if they don't do anything. And it may be relatively hard for certain schools serving very disadvantaged populations to look good, even if they're behaving astoundingly well. So in some regards, this fairness issue is this idea of saying, let's hold schools accountable for the things that may be within their control. So that's an argument in favor of growth. But the problem with this, right, could be this: suppose that a school serving low-income, disadvantaged families ends up having a 5% proficiency rate, even if the kids are approaching proficiency. A lot of people would find that somewhat objectionable, because it looks like it may be letting low-performing schools, or rather schools that have low proficiency levels, off the hook. Now there are some really important reasons why we should be thinking about status versus growth, right?
Systems that employ status and growth models are gonna generate fundamentally different incentives because they lead to very different rankings of schools, for exactly the reasons I was just talking about. So you can have a growth-model-based measure which has absolutely no relationship with socioeconomic status, and a status measure that is very related to socioeconomic status. And these two different types of systems end up sending different signals about which students deserve more attention. So one very famous and well-documented set of findings, and we've seen this in a dozen or more different cases now, is the so-called bubble kids phenomenon: the idea that students who are right around the margin of proficiency are the students who tend to get the most attention in status-model-based systems of school accountability. Now in the case of growth models, who are those marginal kids? It's hard for schools to know, okay, this is a set of students who we expect, right, maybe we're gonna get a lot of growth out of them because they had something happen last year and now they're poised for growth, or we're measuring trajectories, or something like that. It's difficult to imagine. I mean, I've talked to hundreds of school principals over my career. I've never met a single school principal who I think is smart enough to predict which kid is going to get growth in which year, or which kid should get the specific marginal bump right now, right? It's not because principals aren't smart, it's just because I think that's fundamentally beyond any of our skill sets to really know this. And so what this may end up doing, then, is that the status-based models of school accountability might lead to more targeting of resources, and the growth model may lead to less specific targeting of resources. Now you might wonder, what's good?
Is it good to target resources or not good to target resources? A lot of people read this bubble kids phenomenon as an unambiguously bad thing. I think it's much more nuanced than that, right? Because on the one hand, if we really believe these proficiency targets, and that's a big if, but if they're substantive and meaningful, maybe we actually really do want to focus extra attention on kids who are on the cusp of really becoming proficient. But you know, it's an interesting question. And I don't know the right answer to that, but it's something people need to think about. One thing we need to worry about is measurement error. I mentioned it twice already. I'll mention it a third time now. Measurement error is a big deal always, even in status models. It's a geometrically bigger deal in growth models. Why? Because now you're effectively subtracting one signal measured with error from another signal measured with error, and so the measurement error problem is multiplicative. So all the worries we might have about whether this is really reflecting something real are even more pronounced when we're looking at growth models. And of course, none of these approaches capture school efficiency. They just capture measures of school performance, one measure of school performance, but not the whole efficiency side. When we think about subgroups, and I'm gonna come back to some things about growth versus status in about five minutes, there are some clear trade-offs with regard to treatment of some subgroups. So for example, do we want to include or exclude students with disabilities or mobile students from an accountability system? Let's think about the mobile students for a moment, because actually a former student of mine has a very explosive new paper on the topic of mobile students.
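On the measurement error point about growth models above, one rough way to see why the worry gets worse uses textbook classical test theory. Assuming the errors in the two years are independent, the error variance of a gain score is the sum of the two years' error variances. Assuming further that both years' tests have the same score variance and the same reliability, the reliability of the gain falls quickly as the year-to-year correlation in scores rises. The sketch below only evaluates those standard formulas; the function names and the numbers (error variance 25, reliability 0.9, correlation 0.8) are illustrative assumptions, not figures from the talk.

```python
def gain_error_variance(err_var_prev, err_var_now):
    """Error variance of (this year - last year), assuming independent errors."""
    return err_var_prev + err_var_now

def gain_reliability(test_reliability, corr_between_years):
    """Reliability of a gain score under classical test theory.

    Assumes equal score variance and equal reliability in both years.
    """
    return (test_reliability - corr_between_years) / (1 - corr_between_years)

# Illustrative numbers only (assumptions, not estimates from any real test).
single_year_error_var = 25.0
gain_error_var = gain_error_variance(single_year_error_var, single_year_error_var)
reliability_of_gain = gain_reliability(0.9, 0.8)

print(f"Error variance: one year = {single_year_error_var}, gain = {gain_error_var}")
print(f"Reliability: one year = 0.9, gain = {reliability_of_gain:.2f}")
```

With these assumed inputs, a test that is quite reliable in any single year yields a gain score whose reliability is only about one half, which is one way of formalizing why noise is a larger concern for growth measures than for status measures.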
So in most accountability systems, what happens is students have to have spent the whole year or most of the year in a school in order for the school to get credit for that student. And it makes sense, right? If the test is happening in March, do you really want to hold a school accountable for a kid who arrived at the end of February? How much time does the school really have to focus on that given kid? So most accountability systems now say, okay, we're gonna base accountability only on the students who were present in your school starting in October or September or maybe the past June, and not the kids who just show up. Well now, there's a benefit to that from a fairness perspective, but on the other hand, there's another issue, because now suppose the school does know that 80% of the kids are gonna count and 20% of the kids are not gonna count. Schools may have the incentive to focus more energy on the kids who are gonna count. Now, how might this happen, right? The kids who aren't gonna count might get put in a classroom with a less good teacher, for example, or they may end up being singled out less for special remedial subgrouping or something of that nature. Both of those things seem to happen a lot. My former student, Umut Özek, has been looking recently at a state that has a very strict cutoff. He's done this regression discontinuity model in which he looked at kids who, wow, okay, you told me an hour. All right, so we'll split the difference. We'll do eight minutes. Okay. So the kids who arrive a day too late seem to have dramatically lower performance that year than the kids who arrived the week before, for example. Now, there could be some reasons why you might believe that these kids might be different. Umut tried looking at everything he could think of, and he couldn't find any case in which they were different on anything except just when they happened to show up.
And given this, in this particular state, it's Florida, but I mean, I didn't want to talk a lot about Florida. In this particular state, it turns out that, I forgot what I was going to say. I'm having a Rick Perry moment. I'm sorry, bear with me. I really am an expert though, trust me. Energy. I mean, in the case of the state of Florida, what ended up happening is every year the critical threshold changed by a week or two. So one year it might have been October 15th, another year it might have been September 29th, that type of thing. And so it's not just kids who are moving in week four versus week five of the school year, because some years that was relevant and other years that was inframarginal. So these are examples. We see the same types of things happening with regard to students with disabilities, for example. One thing we observe with students with disabilities, in study after study after study, and I think Brian's might have been the first study to document this, but there have been a number of others since then, is that if kids with disabilities aren't counted in the accountability system, schools have an incentive to reclassify slow-learning kids as disabled. Then, interestingly enough, in cases in which disabled kids are in the system, you end up seeing different types of incentives emerging. And in cases like No Child Left Behind, where students with disabilities actually are a listed subgroup, we see something very different occurring. We're actually observing a phenomenon over the last few years in which schools are finding ringers, identifying ringers as disabled kids. So we've seen this massive uptick in speech pathologies being identified in the states that treat speech pathologies as disabilities. And so these are examples of ways in which schools might go.
Now maybe these really were speech pathologies, but we all have a speech pathology in one dimension or another. So it's all a question of threshold again. And so what we observe here is that whether we include or exclude students on the basis of a classification is gonna provide schools with lots of incentives to selectively reclassify students. I wanna instead spend my last five minutes, I'm gonna skip the time period, and talk about a few important reasons why accountability might not increase student achievement. And I view there as really being four major reasons why. One could be something we've already talked a little bit about: improving measured rather than generalizable achievement. So for example, if you can teach to the tests, and really the tests are completely unrelated to anything that we actually care about, then it could very well be that schools could improve just the things that they care about as part of the system, but not really improve generalized achievement. Every study I've seen that looks at both high-stakes and low-stakes tests tends to show that schools do better on high-stakes tests than on low-stakes tests under accountability. Some people view this as evidence that accountability doesn't work, but I think the weight of the evidence is that schools also do better on low-stakes tests in general, maybe not earth-shatteringly better, but better nonetheless. But this is one possible reason why accountability might not really work in the way we want it to. Another is, again, realignment of teaching toward tested material. I talked a little bit about that already. A third is strategic behavior to affect measured test performance. And a fourth is failure or inability to respond to incentives.
And this fourth one I think is a particularly reasonable one, because again, if you think about this whole notion of these principal-agent problems, in the principal-agent problem we're assuming that the agents actually know how to use their effort more effectively, but that's really a big assumption, right? Because then we're immediately assuming that schools are populated by educators who are intentionally sacrificing kids because they're too lazy. I mean, I don't think that's true. So in some regards, it may be that people just don't have the resources. Maybe you need extra resources to actually be able to respond to incentives, or maybe it's a "you would if you could, but you can't, so you won't" type of reaction. Sounds like playground language, but it's the whole notion that maybe people can't respond to incentives, or they don't know how to. I wanna spend my last little bit of time talking about that third thing, the strategic behavior, because I think I've touched upon some of those other issues at various points in the talk. So there are a number of different documented cases of strategic behavior. Of course, the thing that seems to be getting a lot of the ink these days is outright teacher cheating. This has been a fact of life since time immemorial, or at least since we started having high-stakes tests for people. But of course it's not just the outright teacher cheating. And of course, outright teacher cheating makes headlines in part because it is so blatant and so black and white, right? It's hard to tell a positive story. Teaching to the test, we can tell positive educational stories for that, right? It could be the case that schools are being manipulative. It could be the case that the test is providing signals about what we care about as a society and schools are aligning their instruction to things we care about. Strategic reclassification of low-performing students, right?
It could be the case that this is schools just being strategic and reclassifying low-performing kids as disabled kids. On the other hand, it could be the case that the accountability system was revealing that there are certain kids who really needed extra supports, and that maybe before the testing accountability regime, these kids were being ignored. But then we get into a few other things that maybe are between outright cheating and these things, where there are gray areas that are getting a little less gray. One is strategic deployment of discipline. So there are at least six studies I know about now where there is evidence that schools tend to place bets, apparently, on which kids are gonna do well versus poorly on the exam. And the way it works, right, is that kids get into trouble right before an exam, and the kids the schools think are gonna do well get easy punishments so they are in school for the exam, and the kids who the schools expect to do poorly end up getting very strict punishments so that they're less likely to take the exam. It's getting harder and harder to think about that as being educationally beneficial. Feeding to the test is another one. Those of you who are athletes may know that one of the things that athletes might do before a strenuous athletic event is to carbo-load. It turns out that cognitive scientists have also found the best way to get a short-term brain boost is to carbo-load as well. To have low-fat, empty calories, high glucose. And that will help to carry you for two to three hours. So students in the room, this is a really good thing for you to consider doing, or maybe I should have done that and I might have remembered what I was gonna say in my Perry moment, for that matter. Okay, so schools know this too. And so schools tend to engage in behaviors in which they're juicing kids' meals, literally as well as figuratively, and it seems to work.
So this is one piece of evidence. Now all of these examples are things that play into, I promise I'm gonna stop in a second. All of these examples are things that it's important to think about from the point of view of the design of an accountability system. Because if we think about all these things, strategic reclassification of kids, redeployment of discipline, feeding to the test, cheating for that matter, all of these incentives are much stronger when we're just trying to get kids over a threshold than they are if we're measuring growth. And the reason that's true is think about the feeding to the test example for a moment. Suppose that we're going to go and give kids 100 extra calories this year. Well now that's the thing that's gonna, that artificially inflated test score is going to be the thing that's being subtracted from a new test score next year. So we'd have to do even more next year, maybe 200 calories, 300 calories next year, 400 calories a year after that, and so on and so on. Likewise with the strategic, say, bubbling in of kids' tests. Maybe teachers might have to start thinking, okay, well now this year, do we really want to go and get all the answers correct? Of course what Brian's found right is that actually many of the teachers who want to do this may not necessarily advantage the kids because they might invariably get the answers wrong, but that's another story. The point that I want you to have, and take away from this, is that if we're really concerned about these types of strategic behaviors, this is the best argument I can think of for wanting to measure school accountability on the basis of growth as opposed to levels because it's next to impossible to do this. And that strategic deployment of discipline is a case in point. 
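The feeding-to-the-test argument above can be sketched in a few lines of Python. The setup is an assumption for illustration: true achievement is held flat, the school adds an artificial boost to each year's measured score, and the growth measure is the year-over-year change in the measured score. None of the numbers come from the talk.

```python
def measured_gains(true_scores, boosts):
    """Year-over-year change in observed (true + artificial boost) scores."""
    observed = [t + b for t, b in zip(true_scores, boosts)]
    return [observed[i] - observed[i - 1] for i in range(1, len(observed))]

# Hypothetical five-year series with no real learning (all values are assumptions).
true_scores = [250, 250, 250, 250, 250]

# The same boost every year shows up as growth only in the first boosted year.
constant_boost = [0, 10, 10, 10, 10]
# To keep showing the same measured growth, the boost has to keep escalating.
escalating_boost = [0, 10, 20, 30, 40]

gains_constant = measured_gains(true_scores, constant_boost)
gains_escalating = measured_gains(true_scores, escalating_boost)

print("Gains with constant boost:  ", gains_constant)
print("Gains with escalating boost:", gains_escalating)
```

Under a status measure, the constant boost would lift the measured score by the same amount every year, so it keeps paying off. Under a growth measure, last year's inflated score becomes this year's baseline, so the manipulation has to grow every year to keep registering.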
So there are two places where I've been involved with this research. Well, in one case it was in the state of Florida, and the other case is another thesis that's being written, that I've been supervising, in another unknown state, but not Florida. Both of these states had level-based accountability systems and moved to value-added-based accountability systems. The strategic deployment of discipline went to zero, in both of those states. So that's some piece of evidence that growth or value-added measures might make a difference. All right, so maybe what I'll do is I'll kind of wrap up. Oh yeah, I do have something about evidence on student achievement. This is actually, to me, not all that important, except I spent all this time talking about design issues, so I guess I should tell you what I think we know. I think that the weight of the evidence is that school accountability seems to have worked in terms of improving student performance, not just on high-stakes but also on low-stakes tests. At the same time, it's not a silver bullet. However, there's I think very strong evidence to believe that it's not just because of those strategic behaviors, the low-hanging fruit, teaching to the test. There's some evidence of substantive responses as well in a number of different settings. Several people in this room have contributed to that literature. So what kinds of substantive responses are we seeing? I'm not saying these are a recipe for success, but they are at least pieces of evidence that schools are doing things other than just playing dice, right? I mean, they're redeploying resources within classrooms. They're enhancing remedial education. They tend to be experimenting with changes in the nature of the school year, school day, school week, and changes in grouping within classes. These are potentially meaningful and substantive responses. And it affects teacher labor markets too.
So basically the main lesson I want you to take away from this: accountability affects behaviors, right? People respond to incentives in both intended as well as unintended ways, and the design of an accountability system is going to affect which types of behaviors are realized. So Michigan, like every other state, is currently planning on submitting a waiver application. So any Michigan policy makers who are watching this, pay attention to incentives, because people are gonna respond. And I'll stop and take questions now, thanks.
Yeah, yeah. When it comes to the principal-agent problem, I guess the way I understand it is really sort of the idea that you have parents who are the principals and you have teachers and the faculty who are the agents. But really, where do the students fit in? I mean, they're neither principal nor agent. Have you seen any sort of effective measures which would get students involved in improving the accountability for their own education without the schools having to bear down?
So that's a good question. Certainly, again, the student accountability, student standards movement suggests that students do respond to incentives. So here, let me tell you one thing students don't seem to respond to. Students don't respond to long-term incentives. They respond to short-term incentives. But they do seem to be pretty heavily responsive to grades. So for example, a teacher who imposes high grading standards tends to get more work out of kids than teachers who have low grading standards. One thing that we've noticed in these grading standards studies is that high-standards teachers are not necessarily the most popular teachers. In fact, they're often less popular. We certainly see that in faculty ratings at universities as well. However, faculty members and teachers who do expect more from students tend to elicit a better response. Of course, they tend to elicit a heterogeneous response. So some students give up.
And that's a problem when we're starting to think about this. So one way, I think, to help to enlist students in this would be for teachers to go and enhance their standards. If teachers started expecting more from students, that could end up leading students to react. Now, the one thing that we worry about here is that the response to standards tends to be strongest in elementary and middle grades and not as much in high school grades. And so I don't think anybody's really been able to solve how to motivate 10th graders who really don't care. But it does seem like standards can motivate 5th graders and 7th graders to at least some degree. So that would be one little piece of evidence. I saw one hand up over here.
Yeah, I wanted to sort of get back to the last question, which it seems to me is a different version of this: if I think about what I loosely know about the school research literature, it's pointing to a different thing as a solution to the problem. In some sense, there's consistent evidence that teachers matter, for example. That's number one. Number two, even if his point isn't 100% right, you might get some positive response to standards. But you've listed problem after problem after problem after problem. So in some ways, standing on the outside of this, it looks like we're putting a huge amount of resources into something that has all these potholes in it. This may not really be the way to go.
I don't disagree, by the way. I mean, where I've come down on this is more that accountability is with us. It's in the water. And so the big question will be, how can we design the best accountability system we possibly can? And in my opinion, the best accountability system we possibly can have would be this: since we know that people are gonna be teaching to a test, if we're gonna have tests that people are gonna teach to, we wanna make sure that test reflects things we care about.
We know that people are going to respond strategically. So if we think that that strategic response is bad, then let's create an accountability system that is going to have fewer incentives for that type of thing. I would much rather see us be in a world in which we don't hold schools accountable in these brute-force ways and instead really try to work on thinking about identification and development of high-quality teachers, because I agree completely with you. The one thing we do observe is that the one thing that really seems to matter is these really good teachers. The problem is that nothing we've found seems to be able to correlate at all with who's gonna be a good teacher. I mean, I teach, you know, at places like the University of Michigan or Northwestern University, and we'd like to think that we select really good students and then add value to them. And then, you know what, when they go into the classroom, they do maybe a little bit better than the students from Southern Illinois University at Edwardsville, but only trivially better. They can't explain more than, I don't know, 2% of the variation or so, these measures of raw intelligence or where you went to college. And then most professional development that we've seen, and here's another one of Jeff's, doesn't work, right? And so then we get into this interesting kind of question, which is, all right, so what do we do, right? How do we get to a place in which high-quality teachers are identified, retained, and compensated without having a system like this? Again, if I were education czar, I would have huge amounts of principal compensation tied to overall performance and then give principals the opportunity to dole out merit pay to teachers on the basis of their perceptions of who's doing a good job and who's not, but I don't see that coming down the pike either.
So I'm kind of depressed, actually.
So this is like a cheap, fast alternative.
Yes, somewhere in the teens.
We're spending a lot of teacher time and student time and we're messing up a lot of things because this is just the world we're in. That's kind of what you're saying.
Well, yes and no, partially. The big question is, is the world we're in the 17th best alternative? So I guess the real question is, what's the counterfactual? If we could roll back the clock and not have entered into the accountability era, would we have been in the 25th best alternative or would we have been in the 12th best alternative? And I'm not really sure, you know, and this is the thing I don't really know, because for one thing, we can't measure it. One thing which I do think has happened is school accountability systems have done a lot of good and have done a lot of harm. And I think that we have the capacity to tilt the balance more towards good and away from harm, but I share your feeling. I mean, all the attention we spend on monitoring and evaluating is time taken away from developing. If we could just figure out good ways to professionally develop teachers and retain the good teachers and not retain the bad teachers, that would be a good thing. Now let me tell you one potential solution, where we could go. One possible solution would be to focus really hard on breaking down the barriers to entry into the teaching profession. So right now, one reason why we're in the mess that we're in as far as the American teaching force is because people have to decide relatively early whether or not they're going to invest a huge amount of money in, I won't say skills, in credentials that have no market value outside of public school classroom teaching. I mean, these are completely non-transferable skills or credentials.
Well, if this is the case, right, this means that for somebody who is interested in giving teaching a shot, who doesn't know if they'll be good or bad, but they like kids and they think they might be good but they don't know, why should that person who's not sure they want to commit to teaching ever do that? In some regards, if we could make teaching cheap to enter, then I think it might be easier for us to muster up the energy to make it easier for us to get bad teachers to leave teaching, because yeah, they tried, they didn't do so well, that would be fine. But right now, when people have invested all these years and all that money in this credential, I think that's a major part of the problem. So again, if I could have that world or this world, there's no question I would go for that world. Make teaching easy to enter and easier to leave.
Yeah, I know, exactly, yeah. I think many of us care very little about test scores, fundamentally, but care about whether young people develop the capacity to have, say, a minimum of two or three really good relationships in their life, to have a decent income, not to spend much time incarcerated, these sorts of things. As you've mentioned, the accountability movement has been around quite a while. It's no longer okay, I don't think, to say, well, there hasn't been enough time to evaluate this. Is there any relationship between the movement towards what we call accountability and the things we really care about, as opposed to the short-term things which on the surface have almost no relationship to what we care about?
Well, that's a great question. I do think we do care about test scores. I mean, I don't care about test scores either, including those of my children. I mean, frankly, what I care about are exactly these things. Are they going to be well-adjusted members of society? Are they going to have good interpersonal skills?
Are they going to have certain skills that are going to help them to get, maintain, and succeed in a chosen profession? Are they going to have good executive functioning skills, good organization and management skills, good critical thinking skills? Those are the things I really care about, and I think you care about too. Now, I do contest the assertion that test scores don't really correlate with those things, because actually one reason why we care a lot about test scores, I think, is because they have some meaningful and substantive correlation. And in fact, we even observe this type of thing. One place where we observe it is this new area of research that's coming out that's relating, for example, these measures of teacher value added, which we didn't even get into today, to students' labor market outcomes. And so we are finding that teachers who raise kids' test scores more seem also to raise their permanent income, or at least their income in early to mid-career. So I view that as yet more evidence that these things are a proxy for things we do care about, but I do think it raises a really interesting question. So could we come up with a new evaluation scheme that directly incents teachers and other educators for the things that you're mentioning? One possibility could actually be, and I'm only being partially facetious here, to give teachers a stake in the future income of their students. So one possibility might be that we take a small fraction of all the money we spend on education right now and put it into a pool. And so every teacher who interacts with a student who is successful, financially successful, for example, might get a tiny little bit of money proportionate to that person's success. And then if we were to look at other, I know, I'm being somewhat... The number of hours in the hall to do that.
Well, but not only that though, Charlie, there's more than that, because we don't wanna necessarily say, look, I don't wanna be having our teachers train people to be investment bankers only. Right, I mean, so if we were also to look at things like getting some points if they managed to survive to 25 without going to prison, for example. Because we know, right, if somebody survives to 25 without going to prison, they're almost surely not gonna go to prison later. And those types of, so maybe the artwork could help with that one, right?
But as far as I got to art, and those sorts of things, we might conclude, and I think the argument would be, that we need to inform our local communities that what really matters is perseverance and practice. In fact, it's not all natural ability that made the difference, that you went to school and graduated from school and give a better talk. It wasn't that you were born able to do that. And we do this in orchestra, we do it on the football field, all these places. And sure, we do it in reading and math, the perseverance and hard work shows up in the math. But it's partly math, and it's partly personal characteristics of success, generally. And if a community knows that, one can encourage them and say, in the case of Michigan, the MEAP scores are what they are, and it's a politically driven system. We have to, we are doing it. But pay attention to what kind of young people are coming out, and here's what we think we should be looking at. And I realize that we've had, perhaps, modest influence, but rather than say, well, everybody's doing accountability, so that's what it is, and just accept it, Ann Arbor, we might want to say, Ann Arbor, let's say the MEAPs are currently a disputed measure, and let's think about other ways to judge your schools.
Well, you know what, okay. With that, I'm gonna have to. Okay, that's, there we go, a bad pie note. All right, thank you, David, for a.