Well, it's wonderful to be here, and I'm going to try to earn my grade back here, Brian. We had a cancellation last spring. I felt horrible about it, because we really did think we would be finished at that point. What Brian didn't realize until he asked me two minutes ago is that we actually had the first national release of the findings at Vanderbilt yesterday at 1 p.m. The research team kept hearing Brian say, "Matt, why do we have to get it done? Why do we need the report this way?" I hope we can edit this part of the video so it doesn't sell me out, but it was time to get it done. We figured it out, really because I couldn't cancel on Brian twice. So I hope that will at least get me a quarter point back.

The real agenda for today, at least from my perspective, is to talk about the design and implementation, and some of the threats to validity in the experiment we are conducting; to go over the findings, which are quite interesting, maybe contrary to what some advocates thought and what many expected in some respects; and then to see how this fits within the general policy arena. Is anybody else getting a little bit of an echo? I'll move over here.

In terms of design and implementation, we have to take a step back to the 2004-2005 time frame, because of the way this experiment was set up. We had negotiated a three-year implementation period, and part of the negotiation for the project to launch was that we wouldn't talk about results or do any analyses until full implementation was completed. Full implementation to us meant that the final check went through payroll services and we were done. No matter how many people asked us, we really didn't look at any outcome analyses, except for calculating the teacher effect estimates in order to distribute bonuses. Those final payments went out in November.
We then launched into the last nine months or so of trying to figure out the story within the data. Back in the 2004-05 time frame, the focus of pay for performance, particularly for its advocates, went like this: we now have this great thing called value added, and all we need to do in order to improve teaching (I'm simplifying quite a bit here) is to offer a significant financial incentive, which we've never done. Most pay programs implemented prior to 2005 had a maximum bonus of about three thousand dollars, and even three thousand was in many respects on the high end. And we need to tie the performance standard, the performance measure, to something objective; we can't just have professional development or somebody come into the classroom and give you a check mark. Value added, in a sense, brought that to the table. At the same time, there were reports and surveys of Fortune 1,000 companies showing that, from 1987 to the present, there had been a six-fold increase in the number of employees whose performance bonus was based on an individual-level component. So these things were informing the project. I think it's important to say that when we designed this project, it wasn't a program we would implement in practice; this isn't the optimal solution. We were trying to answer a policy question that has been debated in many regards since the 1860s. We can go back and see superintendents saying pay for performance is the way to go, all through the 1920s.
In the 1920s there were various sorts of pay-for-performance programs. In 1921 the single salary schedule came in, which compensates teachers based on years of experience and degree held. By the 1950s, over 95% of all school districts in the United States were operating on the single salary schedule. If we look at the 2002 Schools and Staffing Survey, 97% of teachers were paid off the single salary schedule, and in fact if we weight that by the teacher estimates it goes up to about 99.5%.

It's important to remember, too, that there are different types of pay-for-performance programs, or different theories driving them. We're testing what people call the behavioral or motivational aspect: many advocates of pay for performance say a financial incentive will cause teachers to work harder. I personally think teachers work very hard, but it may encourage teachers to try something different that they hadn't tried in the past. The other piece is the selection or compositional effect: those who are most likely to be rewarded under the metered activity are more likely to come into the profession, and more likely to stay in it. And for those who do not receive an award, it sends a strong signal either to improve practice or potentially to find other opportunities. We, of course, test the motivational aspect.

The project was called POINT, the Project on Incentives in Teaching. It took place in Metro Nashville Public Schools, which has about 72,000 students and 125 to 130 schools. We were based in middle schools, grades five through eight. Of the individuals who were eligible, 296 teachers signed up, about 70 percent of those eligible. We had at least one control and one treatment teacher in 38 of the 40 middle schools in the district, and that was just in the first year, as I'll show later.
We had quite a bit of attrition. The project ran for a three-year period, 2006-07 to 2008-09, and teachers could earn a bonus of up to $15,000 per year; the average teacher salary in the district is roughly around $42,000. We randomized once, at the beginning of the project.

I want to show this because one fear of mine, with the results coming out, is that there are two sentences to the story and it's possible the media only get the first, which I'll tell you in a bit. The second is that pay reform can come in many, many different structures; we can conceptualize an incentive program in many different ways. This is a taxonomy we developed for a project we did for the OECD. You can see the incentive structure, comparing a rank-order tournament to a fixed-price contract; the unit of accountability, where we could think of a hybrid system rewarding individuals and teams; and the standards and thresholds: is it going to be linear, or are we talking about some step function? All of these could play a role in whether there is an effect or not.

I should also say that this was not just Vanderbilt and RAND and a team of researchers moving forward and saying we want to do this, with a superintendent saying, all right, go ahead. It took roughly two and a half years of negotiations. That included getting buy-in and support not only from the superintendent, the school board, the local public education foundation, the business community, and the mayor, but also from a key partner throughout all of this, the Metropolitan Nashville Education Association, the teachers' association at both the local and state level. Reporters would call and say, how did you get them on board? Are you really being serious about this?
They were key players. It really came down to the executive director of the Tennessee Education Association making a statement during one meeting: "We just want to know. It's been too long. We just want to know whether this is an effective policy to increase student test scores."

Other things were laid out throughout the negotiation process. Teachers couldn't compete against each other, so we had a fixed-price contract; obviously this gives us a pretty big financial exposure. Awards would be made to individuals, not to teams, and we actually see something quite contrary to this in practice in other places. But Metro was very specific: they did not want a vote of seventy percent of teachers saying yes to force the other thirty percent to participate, so they let us implement at the individual teacher level. We would evaluate performance based on a gain score, their growth (excuse me, their progress) over time, and crossing a performance threshold earned a bonus. The thresholds were a difficult thing to determine. With a fixed-price contract, we didn't want them so low that if everyone got above the bar we would essentially go bankrupt; we also didn't want them so high that they would almost demotivate those who had no chance at all of reaching them. And maximum bonuses were large, and hopefully that's pretty clear.

To develop the standard, the performance threshold a teacher had to meet in order to earn a bonus, we took three years of prior data and calculated a very simple, transparent value-added measure.
I'll explain that in just a bit. We looked at the point estimate at the 80th percentile, which was about 3.7, at the 85th percentile, and then at the 95th percentile, and that's how we arrived at the five, ten, and fifteen thousand dollar awards. This was also important to the district and to many involved because it was fixed for the full three-year period, so the bar did not move over time. At the beginning of the following school year, every teacher received a teacher bonus report that clearly explained and outlined how they performed relative to that benchmark, along with various documents we provided.

In terms of the bonus formula itself, one thing we encountered was this: when we submitted the grant proposal in 2005, and heard back in the spring of 2006, we were told that just about every middle school teacher who teaches math teaches only math. We received the data in July of 2006 and had to have this implemented by September of 2006.
There wasn't much time. We found that the great majority actually did not teach only math, and so the way we were conceptualizing the performance measure for calculating bonuses could leave a lot of room for opportunistic behavior and gaming. If I teach 50% math and 50% English language arts, am I just going to teach one hundred percent math because I can now earn $15,000? So we created a bonus formula that would adjust downward if a teacher's students in a non-math subject did not progress at the district's average level for that school year. Basically, if a teacher hit the 85th percentile of performance and earned the $10,000 award, the indicator terms in the formula for the math award would equal one. But suppose this teacher had a hundred students, 75% of them enrolled in math and 25% enrolled in English, and the English students didn't make the district average; then we adjusted so that the teacher received 75% of the $10,000. The benchmark was district-average progress. That adjustment actually took place quite a bit, although there weren't drastic drops.
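The proration just described can be sketched in a few lines. This is my own illustrative reconstruction built from the worked example in the talk, not the project's actual formula; the function name and inputs are assumptions.

```python
def adjusted_bonus(base_bonus, share_math, nonmath_met_average):
    """Prorate a math-based bonus for a teacher who also teaches
    non-math subjects.

    base_bonus          -- dollar award earned on the math measure
    share_math          -- fraction of the teacher's students enrolled in math
    nonmath_met_average -- True if the non-math students' progress met the
                           district average for that school year
    """
    if nonmath_met_average:
        return base_bonus  # full award, no adjustment needed
    # Otherwise only the math share of students counts toward the award.
    return base_bonus * share_math

# The talk's example: a $10,000 award, 75% of students in math, and the
# 25% in English falling short of the district average -> $7,500.
print(adjusted_bonus(10_000, 0.75, False))  # 7500.0
```

The design point is that the adjustment removes the payoff from shifting all instructional time into the rewarded subject.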
As I mentioned, we used a simple, transparent value-added measure, which I'll walk through right now. As I'm sure everyone here knows, Tennessee has TVAAS; it has existed since about 1995, and TVAAS reports come out every year. The only individuals who could access them, at least as of the time we implemented this experiment, were the superintendent, the principal of the school, and a designee of the school board. TVAAS data comes back from the state of Tennessee in individually sealed, signed envelopes. So we're making good use of our data here: when you open up a TVAAS envelope you get a single piece of paper. It has somewhat of a table on it with a handful of numbers, and most of the information is a disclaimer at the bottom, a paragraph saying what these numbers shouldn't be used for. For the most part, I don't think teachers ever saw these numbers. In fact, at one point during the experiment we had a teacher call who was near tears (we had interesting phone calls throughout the experiment), and she said, "I always thought I was one of the highest performing teachers in the district." She had around a negative 27 on our very simple measure of value added, which put her at the bottom, the second percentile of performance. She had been a teacher in the district for 15 years. TVAAS had always been there, but it was this translation of the information that seemed to put something in motion: she reached out to us to ask what she could do to become better. Obviously, as researchers we didn't want to interject in training, but we pointed her to the math mentors and back to the district so she could get the necessary help.

In terms of our measure, it's as simple as this. Please note these are fictitious names, especially if anybody here is from the IRB. (See, people like academic jokes. This is good.) Jay Smith is actually a person who used to work with the Center.
So we made him our example. Jay Smith scored a 250 on the math TCAP in 2006; say he was a fifth grader at the time. We would then have to wait until summer, and the state would calculate for us, for every single fifth grader who scored a 250 in 2006, what they scored on average in 2007: that's the 270. We say the expected gain is 20 points, and in this particular case Jay Smith did very well. He scored a 285, which is 15 points above expectation, 15 points of value added. That's what you get in the final calculation. (Wrong button.) That's what we get there. We do that for every single student enrolled in a teacher's classroom, continuously enrolled from the 20th day of school until the time of spring TCAP testing. We chose that definition because it aligned with the state's NCLB policy, and we didn't want to design a system that went against the structures already in place.

This breaks down the bonus awards by year. We start off in 2006-07 with about 143 treatment teachers, and the first thing you'll notice is that we end up with 84 in 2008-09; we had a significant amount of attrition, and I'll show you in just a moment where those individuals tended to go. At the same time, it's pretty clear that each year between 40 and 44 teachers earned a bonus, so some may think something could be happening there. On average, the bonus was right around nine to eleven thousand dollars.
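The Jay Smith calculation above can be sketched as follows. This is a toy reconstruction under the talk's description (a student's expected score is the statewide average current-year score among students with the same prior score); the data, function names, and two-student "statewide" sample are illustrative assumptions.

```python
from collections import defaultdict

def expected_scores(records):
    """records: (prior_score, current_score) pairs for students statewide.
    Returns the average current-year score for each prior score."""
    totals = defaultdict(lambda: [0.0, 0])
    for prior, current in records:
        totals[prior][0] += current
        totals[prior][1] += 1
    return {prior: s / n for prior, (s, n) in totals.items()}

def value_added(students, expectations):
    """Average of (actual - expected) over a teacher's continuously
    enrolled students."""
    gains = [current - expectations[prior] for prior, current in students]
    return sum(gains) / len(gains)

# Toy statewide data: fifth graders who scored 250 in 2006 averaged 270
# in 2007, so Jay Smith's 285 is 15 points of value added.
state = [(250, 268), (250, 272)]
exp = expected_scores(state)
print(value_added([(250, 285)], exp))  # 15.0
```

Averaging these student-level gains over everyone continuously enrolled from the 20th day of school gives the teacher-level measure described above.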
We paid out $1.27 million over the three-year period. We had spent weeks and weeks before we implemented this project on projections, and there was no way it was supposed to go over one million. Going into year three, in the spring, we had already talked to most of the major foundations across the US (we were doing transatlantic flights at that point as well), and they weren't interested in supporting teacher bonuses; they wanted to fund the research or the program being implemented, but not the bonuses themselves. We were very fortunate that the individual who put up the money for the bonuses had also promised to be our financial backstop, and he covered the rest; I am very, very indebted to him.

One piece throughout the project that has been incredibly helpful, and I'm sure many in here know this, is that district data systems are not built around performance management. Nor are district data systems warehouses in which all the information from HR, from federal programs, from special programs, whatever it may be, is kept in one place; nor is it kept in the same type of information management system. We were fortunate to hire somebody who had been Metro Nashville Public Schools' director of data quality and assessment, and who was actually a graduate of Vanderbilt, to work 75 percent time for the Center. His sole purpose, as he would say, was to charge forward and help us get access to the information we needed, to track things and get the data as accurate as possible. The best part is that his last name is Pepper, and he is a doctor.

Our response rates on the surveys we gave in the fall and spring were between 92 and 100 percent, which was wonderful. In year one we only administered a survey in the spring, simply because of timing, the amount of effort involved, and where we were in the implementation of the project. But then we also began to include non-participants. Because of IRB reasons, we weren't able to get approval in time to begin surveying the 30% who never signed up, but through working with the IRB and making sure we followed the appropriate rules, we were able to start surveying them in year two. We also conducted a series of interviews, both at the end of year one and this past spring, just to get teachers' perceptions. There's a tremendous amount of survey data that comes from instruments we used in our own past research studies, as well as from the RAND Corporation, which was a key partner here; Brian actually added some things as well, and it's been great. We also collected principal perceptions of teacher effectiveness, and we asked math mentors in the district to fill out logs about how often they came in contact with teachers; that wasn't only control and treatment, it was teachers across the district. We had teachers take the LMT, so we had a measure of teachers' math content knowledge, and on and on. The district was wonderful to open all this data up to us, because it was critically important as we tried to answer this question.

One other piece: when the private foundation reached out to us in the spring, they asked, how do I know you have the right linkages between students and teachers? This was foundational for us to be able to calculate a value-added measure of teacher effectiveness. And we said, well, we have monthly course snapshots; we know where every single student is on this date in the month and when a change occurred in the transactional history. And he asked, how do you know it's accurate? Well, it's data; it's in the system. So finally I gave up and said, how about we do an audit?
And he said, that's what I was getting at. So every single year we sent the treatment teachers (only the treatment teachers, because there was a different incentive for the control teachers and there couldn't necessarily be a comparable comparison) a list of their roster: every single student enrolled in their classroom, listed by period, including who was in their classroom at some point during the year but would not be counted for bonus purposes. What's alarming is that of the 143 rosters we created in the first year, 55 came back with appeals. The appeals process was then turned over to the district to resolve; oftentimes it had to go down to the site level to figure out whether a particular student transfer was real or not. A lot of the time it was intra-school transfers, or pullouts, or some type of special services, but we changed 153 students solely based on that. The other thing that came up, in about seven instances in the project, is that whole classes were matched to teachers who didn't teach those classes. There's a fantastic quote from the Dallas Morning News about a teacher who didn't receive a $10,000 bonus because of his poor French scores; the quote goes on to say that he was very discouraged, because he doesn't speak French, nor does he teach French, but he didn't get his bonus because of it. With these data systems, we simply have to be very careful about the weight we put on them. As states, particularly under Race to the Top, move into high-stakes personnel decisions, whether rewarding or firing, and the consequence can be firing, the first thing that's going to be challenged is the quality of the data and the information on which that decision was made. The good news is that the data systems became a little bit better, and we see the appeals go down over time.
We randomized the 296 teachers into treatment and control. We stratified the sample into 10 groups based on a school effect estimate, and then we looked at clusters of teachers. The reason we used clusters is that we saw variation in the scaling of scores in fifth and sixth grade versus seventh and eighth, special ed, and advanced algebra. There was another piece in here that turned out not to be relevant in context, based on how things fell out, but we were very concerned about cheating or opportunistic behavior. Since we had at least one treatment and one control teacher in nearly every middle school, there was an opportunity for the principal and a teacher to collude on some strategic matching of the best students in order to maximize that teacher's chance of getting a bonus.

This came up during the recruitment process, when we had three teachers run up to sign. During recruitment we put trained research assistants in a school for the entire day; we had an FAQ document, and teachers could sign up. Everyone asked questions except three people. They ran into the room and signed up, and my colleague Dale Ballou, who has been an incredible partner on this project, said, "You don't have any questions?" If you know Dale, there was just a great look on his face. They said, "We teach third-year ELL students. In year one they don't do well on the high-stakes test. In year two they don't do well on the high-stakes test. But by year three, when they've picked up the language and gotten the mastery, and they're used to the test: huge gains. We're taking home $15,000." It didn't work out that way, but it's what we learned; anyway, it was a good story.

When we talk about threats to validity, there are really three things we're mainly concerned about. The first is randomization failures, particularly with small groups.
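The stratified assignment described above can be sketched as below. The stratum construction and the fifty-fifty split within each stratum are my assumptions for illustration, not the project's actual procedure.

```python
import random

def stratified_assign(teachers, stratum_of, seed=7):
    """Randomly assign teachers to treatment or control within each
    stratum (e.g., groups formed from a school effect estimate).

    teachers   -- list of teacher ids
    stratum_of -- dict mapping teacher id -> stratum label
    """
    rng = random.Random(seed)
    by_stratum = {}
    for t in teachers:
        by_stratum.setdefault(stratum_of[t], []).append(t)
    assignment = {}
    for group in by_stratum.values():
        rng.shuffle(group)          # random order within the stratum
        half = len(group) // 2
        for t in group[:half]:
            assignment[t] = "treatment"
        for t in group[half:]:
            assignment[t] = "control"
    return assignment
```

Stratifying first guarantees that treatment and control teachers are spread across schools of different estimated effectiveness, rather than leaving that balance to chance.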
The groups may end up not being equivalent; we see this at the grade level, though we don't see it overall. Purposive assignment is the other piece here, which I just spoke to a little bit, and then teacher attrition. The good news is that overall the treatment and control groups were balanced across a large number of student and teacher characteristics. But as soon as we go to a lower level, particularly the grade level, we start to see some imbalance. In those instances, when we estimated the models at the grade level, we put a tremendous number of covariates into the model. The sensitivity tests here, and I won't outline all the tests we did, told the same story; it was actually quite nice to see the same story come up over and over. There may have been one or two exceptions, but by chance we might expect a little bit of that.

We asked at the beginning of the experiment that principals run their schools the way they always did. We implemented a few weeks into the first school year, so assignments had already taken place; this was a huge concern for years two and three, when teachers returned, knew their principal, and knew who would be making those assignments. We also asked that the teachers who participated in the project not tell anyone whether they were in the control or treatment group. Part of the concern of many people involved was that it could break down collaboration or cause resentment, so we asked that they not share this information, and they actually signed something saying they would not. In terms of purposive assignment, we followed this over time every year. I'm not going to go through the numbers, but really we're looking at the proportion of students who switch out of a teacher's classroom: are we pushing certain kids out of, or pulling certain kids
into a class? Are teachers dropping students with unexpectedly low beginning-of-year performance, or dropping students with a downward trajectory? In part we're assuming that a teacher would really go out and try to figure out the likelihood that a particular student performs credibly well in year three (year three being a generic number here), where we would expect regression to the mean. The teachers do, in a sense, have some access to this kind of information through the TVAAS data system, which produces individual projections for every student enrolled in a teacher's class, so it's plausible this could be happening. But during this period of time, login rates to that system were very, very low, and we were able to track who logged in.

So, as you see here, we have control and treatment status by school year for the three years of the experiment. In the 2006-07 school year we lose two and three teachers; if I already announced this once, I apologize, but only one teacher left because they asked to be removed. All other attrition occurred because they changed grades and were no longer eligible under the criteria we had set out. That's quite interesting, because the popular media will say teachers do not like pay for performance, they don't want to do any of this; I think it's important to remember that these things can be done and teachers will remain supportive. Reporters have asked me for the last day now, "Were you surprised by the findings?" And I said no, I'm surprised that we even made it this far. We were literally on the edge of our seats well into year three wondering whether this would last.

Part of the larger attrition in year two is that one of the clauses we had was that a teacher had to instruct at least ten math students who were expected to take the spring high-stakes assessment. We didn't have those data and those linkages at the beginning of the project, because we were still in our data negotiations, and so there were quite a few teachers who, once the students took the test in the spring and we had the data, we could see had never qualified. To be fair, we let them continue in the experiment; they were simply not bonus eligible that year. A number dropped out because of this; I think it was 14, but we can find the exact number.

This breaks down control, treatment, and reason for attrition; the way we categorized it was change in assignment versus other eligibility criteria. There are two places where we see somewhat of a difference in the end: more control group teachers left the district altogether, and more treatment teachers left teaching but stayed in the district, typically moving to an assistant principalship or a principalship, sometimes to a coordinator or coach type of position. "Out of grade" means you were no longer in grades five, six, seven, or eight. There's one school in Metro Nashville, I believe, that actually has fifth grade in it but is an elementary school, and a teacher changed to it. We mistakenly did not know this and tried to drop him, and he said no, no (he was very deliberate here): it's fifth grade, and that's middle school. So we honored his request.

In terms of evidence of balance, there are really three ways we're looking at this: do we see any differences between the treatment and control groups at the beginning, do we see differences between the treatment and control groups over time, and do we see differences prior to
POINT being implemented? I know there's quite a long list here, and this is only a set of examples of the types of variables we were able to collect on teachers; I'll get to some of the student ones next. Where we do see differences: from year one to year two we ended up with a slightly greater proportion of female teachers in the treatment group, which actually goes against some of the survey literature saying that males would be more likely and more inclined to participate in an incentive program. We also ended up with slightly more black teachers remaining in years two and three, and the proportion of female teachers goes up a little higher again. ELL was the only place where we had slight imbalance in year one right after the randomization, and then year hired, I guess it is, by year three. But overall, for the amount of attrition we went through, losing nearly half our sample, the balance held up pretty well. We also threw a number of control variables into our models, obviously because of the attrition, and ran a number of sensitivity tests; these are some of the variables we'll look at. It's important to note, too, that for things like TCAP scores, these would be the scores from the year prior to implementation: if we used a student's score during the year of implementation, that essentially could be endogenous with treatment.
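One common way to run the balance checks described here is a standardized mean difference on each baseline covariate. This is a generic sketch, not the project's analysis code; the example values are made up.

```python
from statistics import mean, stdev

def standardized_diff(treatment_vals, control_vals):
    """Difference in group means divided by the pooled standard deviation.

    Values near zero suggest the groups are balanced on this baseline
    covariate; large values flag imbalance worth controlling for."""
    pooled_sd = ((stdev(treatment_vals) ** 2 + stdev(control_vals) ** 2) / 2) ** 0.5
    return (mean(treatment_vals) - mean(control_vals)) / pooled_sd

# Identical prior-score distributions give a standardized difference of zero.
print(standardized_diff([48.0, 50.0, 52.0], [48.0, 50.0, 52.0]))  # 0.0
```

Running this over every baseline covariate, overall and then within each grade, mirrors the overall-versus-grade-level comparisons discussed in the talk.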
I mentioned that an issue with the randomization is small n's. We randomized, and on average we're looking at the average treatment effect, but if we start thinking about doing analyses by grade, are we still balanced at the grade level? You'll see that when we use achievement prior to POINT, we have slight imbalance here, and as you can see the size of the coefficient is quite similar. The incoming score of kids in the classroom would be the normalized math test score for students in the year coming in, whether they're in treatment or control; for seventh graders, that's going to be their prior score from a couple of years back. These are levels, yes, and I'll show a few charts on that.

The other piece, and now I'm going to earn back my final quarter point, is that I want to make an observation of thanks to Brian. Brian and I actually first met over this project. I think he was in Chattanooga at the time, in the great state of Tennessee. He got my email, I called him, and I just said, look, I need somebody who knows how to do this. You're the only one who's ever done it; you created this in your Chicago study, and it's going to be a big ding on us if we don't have this covered in our proposal. Brian was generous, as he always is, and signed up, and did a lot of the analysis of suspicious answer-string patterns, as well as looking at different forms of cheating or gaming. We didn't see anything that would alert us to an issue. The other thing we looked at every year, of course, was sorting of students, and again, nothing came up. The only time something did come up:
We actually had a teacher call and say, "Two treatment teachers in my school are cheating." We had a very systematic protocol for who could talk to treatment teachers: they had a designated line to call, it would come to two of us, we would discuss it as a team, and we had a very systematic, formal response back. And what she said was, "They're planning their curriculum together." Last time I checked, that was called team teaching. So we followed up, in a nice way.

In terms of dependent variables, we have TCAP math scores transformed to rank-based scores, typical things you would see. We did sensitivity tests with the benchmark gain scores that we showed you for calculating the value-added scores, and there are a number of other things we looked at as well. We also added in reading, ELA, science, and social studies. Another piece in this that was incredibly enlightening is how courses were labeled. We were fortunate to audit the hundreds and hundreds of different course codes and course descriptions before we did our first year of calculations, and we were able to set up a protocol for what to consider an academic subject, because with the coding they used, we couldn't simply equate something like ballet with reading.

Treatment status we look at in really three ways, and I'll present these results: the simple average treatment effect, with one being treatment and zero control; treatment status interacted with student grade, looking at whether there are differences by grade; and treatment status interacted with year, which we only did pooled across all years.

In terms of findings, we did not find a statistically significant average treatment effect overall. There was essentially no difference between the performance of students in control group classrooms and the performance of students in treatment group classrooms. In response to a question: yes, the prior score would be from the year before.
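The three treatment-status contrasts just described (overall, by grade, and by year) can be sketched with a plain difference in means standing in for the full covariate-adjusted models the project actually estimated; the data and names here are illustrative.

```python
from statistics import mean

def treatment_effect(scores, treated):
    """Average score of treated students minus average score of controls.
    treated is a parallel list of 1 (treatment) / 0 (control) flags."""
    t = [s for s, d in zip(scores, treated) if d == 1]
    c = [s for s, d in zip(scores, treated) if d == 0]
    return mean(t) - mean(c)

def effect_by_group(scores, treated, groups):
    """Separate treatment effects within each group (e.g., grade or year),
    mirroring the treatment-by-grade and treatment-by-year contrasts."""
    out = {}
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        out[g] = treatment_effect([scores[i] for i in idx],
                                  [treated[i] for i in idx])
    return out

# Overall contrast: treated mean 52, control mean 50 -> effect of 2.0.
print(treatment_effect([51.0, 53.0, 49.0, 51.0], [1, 1, 0, 0]))  # 2.0
```

In the real analysis the same contrasts were estimated in regressions with the baseline covariates discussed earlier, but the logic of the three comparisons is the same.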
and I can shoot but if you're in seventh grade it goes back a couple years Yeah, we also found that there was no significant differences for students in grades six through eight When we estimate separate effects by grade We did find a significant positive effect in fifth grade in years two and three of the experiment However, when the student when students Metriculated into sixth grade. We no longer saw this if that saw this difference And the effect is quite large. It's a half to two-thirds of a year's worth of academic growth But given that it does not persist We feel as though we can't place too much weight on it The one caveat here is that and I said at the beginning we're still waiting for the data to come back From the state they changed our testing system. It's delayed We will go right in and look at the same exact thing to see whether the sixth grade The students in fifth grade last year their their effect persists In terms of teacher attitudes, this is probably some of the bigger takeaways. I think I'm trying on my points back Significant gain between for both treatment and control groups or Were you a reviewer It's a critical point here is actually the gains in the school district went up quite a bit during this period of time I cannot tell you why they went up. I could I you know, we probably bring up 20 different reasons one potentially Important thing to note is that in year two the district was facing state takeover There's a lot of newspaper coverage about the state takeover would it get defaulted to mayoral control What was going to happen? 
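The scoring and model-specification steps described above can be sketched with simulated data. This is a minimal illustration, not the study's actual procedure: all names and the data are invented, and it uses simple treatment/control mean differences where the study estimated value-added models.

```python
# Hypothetical sketch (simulated data): transform raw scores to
# rank-based z-scores, then estimate a pooled average treatment
# effect and grade-specific effects (treatment x grade).
import random
from statistics import NormalDist, mean

random.seed(0)

def rank_based_z(scores):
    """Map raw scores to rank-based z-scores: rank each score,
    convert the rank to a mid-percentile, apply the inverse
    normal CDF (a van der Waerden-style transform)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return z

# Simulated students: grade 5-8, treatment dummy (1 = treatment),
# raw test score with a small treatment effect in grade 5 only.
students = [
    {"grade": g, "treat": t, "score": random.gauss(50 + 2 * t * (g == 5), 10)}
    for g in (5, 6, 7, 8)
    for t in (0, 1)
    for _ in range(200)
]
z = rank_based_z([s["score"] for s in students])
for s, zi in zip(students, z):
    s["z"] = zi

def ate(rows):
    """Average treatment effect: mean z-score gap, treatment minus control."""
    return (mean(r["z"] for r in rows if r["treat"] == 1)
            - mean(r["z"] for r in rows if r["treat"] == 0))

overall = ate(students)                        # pooled treatment effect
by_grade = {g: ate([s for s in students if s["grade"] == g])
            for g in (5, 6, 7, 8)}             # treatment x grade
```

In the actual analysis these contrasts would enter a value-added regression with prior-year scores and covariates; the sketch only shows why the rank-based transform and the interaction terms are there.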
There was no superintendent. You could potentially make the argument that the incentive effect from state takeover was greater than what we may have seen from the incentive effect of this monetary award. We also looked into this because it was a pretty large gain, and again, we're normed off the state. We see the gain in third and fourth grade as well, so it's not just in the treatment grades.

We see very few differences between treatment and control groups in terms of teachers' perceptions of different instructional practices, their attitudes, or how they're approaching their jobs and positions. We do see that teachers in the treatment group were somewhat more likely to report a positive outlook towards their school environment and their principals than the control group teachers were. For the most part, though, there are really no differences that we see over time. For lack of a better example: a teacher might endorse something like "I contact parents more regularly," with a significant difference between treatment and control, but then the next question, "How often do you contact parents?", would show no difference at all. There was no common story, really no common theme, as we tried to tease out what all this information was saying. Now, those weren't the exact questions, but you get the idea.

Teachers in all years were generally supportive of the project. We had no complaints about bonus calculations, and we had no challenges that the measure was unfair. And as one may expect, novice teachers were much more likely to think the program was all right compared to veteran teachers.

The big aha moment, right here: comparing teachers who did and did not win bonuses, those who won bonuses were only slightly more favorable than those who didn't. This is a good audience.
They laugh at my really bad jokes. Teachers who won a bonus also revealed increases in positive perceptions of the project across years.

That's good, that's good. That's five bonus points, Brian.