Welcome, everyone. I'm Brian Jacob, a professor here at the Ford School and the co-director of the Education Policy Initiative. It's my pleasure to welcome all of you here today. The Education Policy Initiative is a program of coordinated activities designed to apply rigorous research methodology to inform education policy issues, as well as to disseminate best practices in education reform to state, local, and national policymakers. We train students and others to conduct cutting-edge research and facilitate discussions about this across the university community. So today, we are pleased to present Dr. Melissa Clark, a senior researcher at Mathematica Policy Research. Melissa specializes in the design and implementation of rigorous impact evaluations with a focus on education policy. She's here today to discuss her research regarding the effectiveness of secondary math teachers from Teach for America in high-poverty US schools. She's presenting evidence from the first large-scale experimental study of secondary math teachers in Teach for America. And I am looking forward to the talk and then Q&A. I think what we're going to do here is first ask everyone to turn off any electronic equipment, just so there's no ringing and beeping during the talk. I think Melissa is going to talk for about an hour or so, and then we'll open up for Q&A. But I think you can tell folks whether you're open to questions during the talk, as well. And finally, before we get started, I want to recognize Charles and Susan Gessner for their generous support of this and other similar events. So thank you very much. Without further ado, Dr. Clark.
Thanks, Brian. Thank you for having me. And you're welcome to interrupt with questions at any point. So I am presenting evidence on the effectiveness of secondary math teachers from Teach for America.
And this is a study that's co-authored with two colleagues at Mathematica Policy Research. Mathematica is a company that does a lot of big evaluations in education and other areas for the federal government and lots of other clients, too. So the paper I'm presenting today is part of this larger evaluation that we conducted for the Department of Education's Institute of Education Sciences. We completed the study this past fall. And the paper looked at the effectiveness of secondary math teachers from two highly selective routes to teacher certification. So Teach for America and another program that's similar to Teach for America called the Teaching Fellows Program. And the paper I'm presenting today is going to focus just on the Teach for America findings for a variety of reasons. The first reason is that when we started to write the findings up to submit to a journal, we thought that the attrition rates in the Teaching Fellows sample were high enough that those results weren't quite as compelling as the results for Teach for America. And the second reason is that people just don't seem to be that interested in teaching fellows. So a lot of people know what Teach for America is and have strong opinions on it. It's a very controversial program. But I probably, when the study was first released in September, I talked to reporters about the study. I presented at conferences and forums. And I think in that entire time, I didn't get a single question about the Teaching Fellows findings. And people were much more interested in Teach for America. So I'm going to focus on Teach for America today. But if any of you are interested in the Teaching Fellows findings, I'm happy to talk about those at the end as well. But the backdrop for this study and the reason the Department of Education decided to fund it was the fact that high poverty schools really have trouble attracting qualified teachers. And this is a particular problem in secondary science and math. 
And so in response to this problem, most states have adopted what are called alternative routes to teacher certification, which are intended to lower the barrier to entering the teaching profession and increase the supply of teachers. So these programs typically allow individuals to become teachers before they have completed all their certification requirements. They typically require less coursework than a traditional university-based teacher certification program and less or often no student teaching. And then the vast majority of these alt-cert programs are not very selective, which is to say they're neither more nor less selective than the typical traditional certification program, which also usually is not very selective in the types of people it admits. So Teach for America is an alt-cert program, but it's somewhat unique among alt-cert programs in that it is highly selective in the candidates it admits. And before we started this study, we looked at the landscape of alt-cert programs to see what other highly selective programs might be out there. And Teach for America and the Teaching Fellows programs were really the only two large, highly selective alt-cert programs, and I'll tell you more about what I mean by highly selective in a minute. We identified a handful of smaller programs that looked like Teach for America and Teaching Fellows in terms of their selectivity, but they were tiny. So together, these five programs probably provided 20 secondary math teachers in the year that we were looking at. So we decided to focus the evaluation just on Teach for America and Teaching Fellows, and then the work I'm presenting today is just on Teach for America. So probably many of you are familiar with Teach for America, but it's an organization whose stated mission is to reduce educational inequities by supplying qualified teachers to high-poverty schools.
And to do this, they invest heavily in recruiting and selection, so they try to recruit high-achieving college graduates and professionals to teach for two years in high-poverty schools. They just require this two-year commitment, although the teachers can choose to remain longer, and many of them do. They then provide a short, intensive summer training before the teachers enter the classroom, and then they provide ongoing training and support to their teachers throughout this two-year commitment. Teach for America is a growing source of teachers to high-poverty schools. So in the 2011 school year, they provided 9,000 teachers to 43 of their regions across the country. And they also got a $50 million grant from the US Department of Education, an Investing in Innovation grant, to scale up their program over five years. So they're expanding by 80% over five years, and so they're aiming to place a total of 13,500 first and second-year teachers this coming fall as a product of that scale-up. So despite the fact that Teach for America does supply a lot of teachers to high-poverty schools that have trouble hiring teachers, the program's very controversial. So one common criticism is that Teach for America and teachers from other alt-cert programs are underprepared for teaching, relative to teachers from traditional certification programs. The argument is, of course, they can't be as well prepared as somebody who's done several years of coursework and a period of student teaching would be. And then a second criticism of TFA in particular is that because it requires just this two-year commitment, most of the teachers leave after two years before they have a chance to gain valuable experience that might improve their teaching.
So the argument goes, if a principal is choosing between hiring a TFA teacher who's just going to be there for two years and leave, or a teacher from some other program who'd stay for 20 years, they're better off hiring this teacher who will accumulate this valuable experience over time. But it's an empirical question as to whether these criticisms are actually valid. And so in part to understand Teach for America's effectiveness in this study, we're looking at the effectiveness of secondary math teachers from Teach for America relative to other teachers in the same high poverty schools. And we're defining effectiveness based on students' math test scores at the end of the school year. We're using a random assignment design. So within each school, we randomly assigned students signed up for a particular math course to be taught by a TFA teacher or a teacher from some other program. We have a large multi-state sample. So we have 45 schools in 11 districts and eight states. And the reason we focused on middle and high school math is because it's a hard to staff subject for high poverty schools. About 20% of TFA teachers teach secondary math. So it's a priority subject area for TFA too. Initially, back when we were doing a feasibility study for this broader evaluation for the Department of Ed, we asked if it would be feasible to include science teachers in this evaluation as well. We looked into that and we determined that it really wouldn't be feasible in part because there wasn't at least at the time a good science assessment that we could have used to assess. And most states, of course, aren't assessing kids in science every year in middle school or high school. So for that reason, we decided to focus just on secondary math. There is some high quality evidence in the literature so far showing generally that Teach for America is effective, particularly in math. So there's only been one other experimental study to date, also conducted by Mathematica back in 2004. 
And that study randomly assigned students to TFA teachers or non-TFA teachers in elementary schools and found that the TFA teachers improved students' scores in math and had no effect on their scores in reading. So the students did just about the same as their peers in reading. So our study is taking that same design pretty much but shifting the focus to this hard to staff area, secondary math. And then a few non-experimental studies have focused on the secondary level. So a couple studies in New York City have found positive effects of Teach for America teachers on middle school math scores. Those are using longitudinal student data and controlling for prior achievement. And then a study in North Carolina didn't have longitudinal data but instead used cross-subject data on students' performance and found controlling for students' performance in other subjects, students performed better in a subject taught by a TFA teacher. And these effects were particularly pronounced in math and science. So the real contributions of our study, it's applying this experimental design to the secondary level, which hasn't been done before. And it's using this broad multi-state sample whereas some of the previous evidence has just been focused on a single state. In our main findings, we find that Teach for America teachers are more effective than non-TFA teachers in the same schools. They increase student math scores by about 0.07 standard deviations, which I'll provide some context for that later, but we believe it's a non-trivial increase in student math scores. Sort of looking directly at this criticism, the Teach for America teachers are only staying for two years. They're going to be less effective than more experienced teachers. We compared TFA teachers just in their first two years of teaching to the more experienced comparison teachers with more than five years of experience. 
And even with that comparison, these novice TFA teachers were outperforming the experienced teachers from other programs. So I think the main takeaway from the study is that Teach for America can increase both the quantity and quality of teachers in high-poverty schools in secondary math. So I'll provide some background information on the study, and then I'll present our causal estimates of the effect of Teach for America teachers. And I'll present some non-experimental analyses we did trying to account for why these TFA teachers might be more effective than the non-TFA teachers. So I'll start with a brief overview of Teach for America. So its goal is to enroll people with characteristics that it believes are associated with effective teaching. And so at the time of the study, they had seven core competencies they used to gauge how effective a teacher would be in the classroom. And interestingly, I was up in my temporary office upstairs, and I saw on the file cabinet there was a magnet with a list of the Ford School's seven core competencies. And they were strikingly similar to Teach for America's. So it was like achievement, critical thinking, respect for the Ford School's mission. So Teach for America has a similar set of competencies that they think are predictive of future effectiveness. So they have a very rigorous admissions process. So people submit an application online, then they have a telephone interview, and those who progress beyond that stage go to this full-day in-person interview. And Teach for America has a mathematical model they use to screen out candidates at each stage.
So they don't provide a lot of information about this model because they don't want candidates to be trying to game the model, but presumably they're using data from previous cohorts of Teach for America teachers and regressing some measure of each teacher's effectiveness in the classroom on these core competencies and other factors, and then using the coefficients from this model to predict how effective the incoming applicants will be when they start teaching. So they offer admission to about 12% of all the people who apply. So very selective compared to most teacher certification programs. After they've selected people, the main training is this five-week summer institute where the candidates take coursework. They do some practice teaching in the local school district's summer school, and they do some self-directed assignments. And then Teach for America helps the corps members, which is what they call their teachers, find teaching positions in high-poverty schools in one of these 43 regions. After the corps members start teaching, they get additional training. They have the opportunity to observe other teachers' teaching and to be observed by mentors. They have one-on-one meetings with their mentors. And they also take coursework in local alternative certification programs. And this is not a Teach for America thing. It varies from state to state, but all states require alternatively certified teachers to complete some coursework to gain their certification. So the Teach for America teachers follow these same rules in each state. So they're taking some amount of courses in, typically, a university-based program within their district. The School of Education here serves that function in Detroit. So the study used an experimental design. So we conducted the study in two cohorts of teachers and students. So in 2009 and 2010. And I actually think the experimental design is sort of one of the most interesting parts of the study, at least.
For me, I'm not aware of any other random assignment studies that were conducted at the secondary school level that randomly assigned students within schools. So going into this, we weren't even sure it would be feasible. And we actually went out and talked to lots of schools with TFA teachers. And what we determined was that it really would be impossible, from the school's perspective, to randomly assign students across sections taught at different times because scheduling at the high school and middle school level is so complex. So instead, we realized we could only do the study if there were two courses or more taught during the same time period, one taught by a TFA teacher and one taught by a non-TFA teacher. So basically, we defined these classroom matches, these sets of courses taught during the same period, across which we could randomly assign students. And so the treatment group in our study is the set of students who were assigned to the Teach for America teachers. So, yeah.
In terms of what the schools would be like, I would assume they'd have to be the largest high schools.
Right. So I'll show you in a minute sort of what the schools look like and how they compare to the typical schools in which TFA places its teachers. But yes, they were larger schools. And the math courses were the more common courses. So for instance, we didn't have any calculus or pre-calculus because in these high-poverty schools, there typically weren't concurrent sections of those higher-level courses. So in terms of the courses, we included sixth, seventh, and eighth grade math and then Algebra 1, Algebra 2, and geometry. So we included both teachers from traditional routes and teachers from these other, less selective alternative routes in the comparison group. The logic for this was we wanted the counterfactual to be, well, what's the type of teacher these schools would have hired had they not been able to hire a Teach for America teacher?
And so we thought the best estimate of that counterfactual was the mix of other teachers teaching in that school. So in these schools in our study, about 60% of the comparison teachers were from these traditional programs. And 41% were from these alt-cert programs. In these findings, there are no Teaching Fellows teachers. And those other tiny, highly selective alt-cert programs we did not include in the study at all.
So would alternative include, say, emergency certification, or where are these other 41% from?
They are just sort of state-approved alt-cert programs. Very few states have something that they call emergency certification anymore. So they're teachers who are going through these other less selective alt-cert programs, of which there are very many actually. There are a lot of these programs. And in the Teach for America sample, we included both those who Teach for America considers current corps members, who are those in their first or second year of teaching, and those who may have stuck around longer and are still in the schools. That said, 86% of our Teach for America teachers were still in their first or second year of teaching, reflecting the fact that many teachers do leave after that two-year commitment is over. We didn't place any restrictions on the experience of either the treatment group or the control group teachers, in part because, again, we wanted a snapshot of sort of, if I'm a principal, I want to know, I could hire a Teach for America teacher who's probably going to leave teaching after two or three years, or I could hire a teacher from some other program who's going to stay over the longer term and gain more experience. So we thought just taking a cross-section of the teachers in the school in a particular year would incorporate the fact that non-TFA teachers probably will gain more experience over time.
And so we're sort of seeing the low teaching experience levels of the Teach for America teachers as part of the Teach for America treatment. That's part of the package you get when you're choosing to hire a teacher from Teach for America. Yep.
Yeah, so there was very little attrition within the study year. We had two years in the study, but each was sort of self-contained. So we weren't following teachers across years. Just a handful of teachers left teaching during that period. And we decided to leave classrooms in the sample if the teacher left, thinking that higher turnover might be part of the treatment, too. Like if Teach for America teachers are more likely to leave mid-year, their class is probably going to suffer by having some long-term sub for a long time. And so we didn't want to just toss them out of the sample and potentially bias the impact. So they are still in the sample. There wasn't a ton of teacher turnover, though.
Sure, yeah. I mean, I think the Department of Education actually, a few years ago, had done a large random assignment study of teachers from these less selective alt-cert programs. So they sort of saw this study as an opportunity to learn more about this other side of the coin, the highly selective alt-cert programs, which is why we weren't lumping them all together in one group. That said, there's still a lot of interest in how the TFA teachers compare just to the traditionally certified teachers. So we definitely do subgroup analyses just looking at those comparisons separately. So we recruited schools from across the country to be in the study. So basically we started by contacting districts with a large number of Teach for America teachers. And then we would call the schools or visit them to see if they might have these eligible classroom matches or randomization blocks with a TFA teacher and a non-TFA teacher teaching a math class during the same period.
So we ended up with a sample of eight states, 11 districts, close to 6,000 students. On the data collection, our main outcome measure was student math test scores at the end of the study year. So for middle school students, who were about 70% of our sample, we used math scores from the state assessments. And then for high school students, since states aren't typically assessing these students in math every year, we instead administered a test of our own, which is a computer adaptive test from the Northwest Evaluation Association. And we tested the kids in general math, Algebra 1, geometry, and Algebra 2. And this was actually a really interesting test. I was talking to Sue about it earlier. But when we were deciding what test to use for the evaluation, we got to try a sample NWEA assessment. And we had a limited number of laptops, so I was paired with another econ PhD. And we did this Algebra 1 or Algebra 2 assessment together. And it adapts to your ability level. So it quickly became very challenging, even for two people who had a pretty strong math background. So I think the idea is it's really efficient in that in a limited amount of time, we had 30 minutes to assess the kids, we were able to presumably get a pretty close read on their actual math ability. We also collected baseline scores from state assessments. And then we got some information on teacher characteristics from a survey we administered to the teachers. And also, there was a lot of interest in how teachers' math content knowledge might affect their effectiveness as math teachers. So we got teachers' math scores from the Praxis II math assessments. So there was some student attrition after randomization, meaning that students in our sample were missing their end-of-year math test scores and so could not be included in the analysis. There were various reasons for this attrition. One was parental non-consent.
So not all districts, but some districts required us to get parents to sign a form to allow us to collect test score data for their students. And as you can imagine, when you send home a form in the child's backpack, a lot of parents never get around to returning that signed form. So we had about a 94% consent rate, which we felt was actually pretty strong. And then we were able to obtain test scores for about 84% of the consenters. And if we weren't able to obtain data, it was either because the student left the district and so we weren't able to get their state test scores, or they were absent from school on the day of either the state test or the study test and didn't show up for the makeup sessions either. So in total, we got scores for 79% of the set of students that we randomly assigned. Encouragingly, there was little difference in response rates for the treatment and control groups. So 79.5% versus 78.5%, so hopefully little room for differential attrition to bias the findings, but I'll show you some estimates that attempt to dig into that a little bit further in a minute. These similar attrition rates also reflect very similar rates of mobility among both the treatment and control group students. So you can see here, about 77% of both groups of students stayed in the classroom to which they were originally assigned. There was relatively little occurrence of students crossing over to a teacher of the other type, so a student assigned to the TFA teacher moving to the non-TFA teacher's class. There wasn't much of that, which was good from our perspective.
There were more cases of students transferring out of the study classes entirely, so say they were in Algebra 1 and they were moved to a remedial math class or something after the school year started. And then a sizable chunk of students in both groups left the study school entirely, which I think is common; mobility is typically high in these really high-poverty schools that we were including in the sample. You would expect the treatment and control group students to look pretty similar after random assignment. But encouragingly, even when we just look at the students for whom we have outcome test score data, so the students who are actually included in our analysis, we see really no meaningful differences in the baseline characteristics of the treatment and control groups, which gives us some confidence that any differential attrition between treatment and control groups probably isn't leading to major bias in our results. You can see that across all these baseline characteristics we look at, only one, the percentage of students who are old for their grade level, differs, and it's marginally significant at the 10% level, with no other significant differences. So this comes to Brian's question, I think, about the characteristics of the study schools. So first let's just focus on these first two columns, which are the schools in our study compared with all secondary schools with Teach for America teachers nationwide. And we didn't randomly select the schools to be in the study; we had to recruit them, go around the country and try to find schools that were eligible and willing to be in our study. Encouragingly, the study schools look a lot like other secondary schools with Teach for America teachers nationwide. So a very high percentage of students are African-American and Hispanic, and a very high percentage of students in the schools are receiving free or reduced-price lunch.
The places where you can see the differences are where you might expect, given that we had to recruit these larger schools that had multiple sections of the same course at the same time. So the schools in our sample had a larger enrollment per grade than the typical TFA school nationwide. And we also had no charter schools in our study compared to about 23% among TFA schools nationwide. And in part, this was because the charter schools were smaller and they often had less conventional configurations. So even if they were big, they'd still have just a single math teacher teaching all the math courses or something. So they weren't able to accommodate the experimental design. And then quickly, as you probably would expect, you can see that both the study schools and all schools with TFA teachers nationwide are much less advantaged than the typical secondary school nationwide. So a higher percent minority and a much higher percentage of students receiving free or reduced-price lunch than the typical secondary school nationwide. And then looking at the characteristics of the Teach for America and comparison teachers in the sample, they looked very, very different, which you might expect. You can see that the Teach for America teachers were somewhat less likely to be female than the comparison teachers. They were much more likely to be white. So 90% of the Teach for America teachers were white compared with just 30% of the comparison teachers. And I think Teach for America probably wasn't particularly happy to see this number here. They really prioritize diversity. That's a big push for them. And so I think among their teaching corps as a whole, the percentage that are white is something more like 70%. But we were focused on secondary math teachers. So that might be why our sample is more white than their statistics for their full teaching corps.
The Teach for America teachers were much more likely to have attended a selective college or university. They also had much stronger math content knowledge as measured by one of these two Praxis math tests. So we used the Praxis math content test for the high school teachers and the middle school math test for the middle school teachers. And in both cases, the Teach for America teachers were scoring about a full standard deviation above the non-Teach for America teachers. So much stronger scores on the Praxis.
Did you administer this or did all these folks take it?
So some states in the sample required teachers to have taken the Praxis. In that case, we would obtain the teacher's permission to collect the scores from ETS. And in states that didn't require the Praxis, we administered it and paid the teachers to attend the session.
So do TFA teachers, I imagine that the...
No, states either require the Praxis or they don't. And if they do require the Praxis, it's required of all teachers, whether traditional or alt-cert. And then some states just don't require it. So we wanted to be sure we weren't comparing a teacher who took a high-stakes Praxis test that they had to pass to get certification with a teacher who took a low-stakes test where they're just doing it for our study and they don't really care. But that was never the case. Within a state, it was always high-stakes or always low-stakes.
Yeah, actually, I guess that's a good point. I was thinking like, oh, I don't have age on this slide. The Teach for America teachers were definitely younger. We didn't collect data on when the teachers entered teaching. Well, I guess we could calculate that. We know how long they've been teaching and we know their current age. So yeah, we could look at that, but we didn't. So it could differ. I mean, I think especially for the Teaching Fellows, that program does try to attract mid-career switchers.
But I honestly don't know what to expect among the comparison teachers. Certainly the TFA teachers, most of them are coming right after college. So as you can see, the Teach for America teachers have much less teaching experience, an average of two years compared to 10 years among the comparison teachers. I was actually a bit surprised that the experience level was so high among the comparison teachers. And then another thing, which I'll come back to later because it turns out to be somewhat important in our non-experimental analyses: the Teach for America teachers took many more hours of teacher coursework during the school year. And this was, as you might expect, because most of them were in their first or second year of teaching and still completing the state-required coursework to gain their certification. So they were taking more coursework than the comparison teachers at the time of the study. So since we have this experimental design, we can use a fairly standard, simple impact estimation model. So we're regressing students' end-of-year math scores on block fixed effects, so a fixed effect for the classroom match in which we randomly assigned the student, a dummy variable for whether the student was assigned to a Teach for America teacher, some student baseline covariates, and an error term. And our main models were intent-to-treat estimates, meaning they represent the effect on a student's test scores of being assigned to a Teach for America teacher, whether or not that student actually stayed in that classroom and got a full year of exposure to the TFA teacher. We think it's a policy-relevant estimate in that it reflects the potential for a Teach for America teacher to affect a student's test scores, given that not all students are going to stay with that teacher.
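As a rough sketch, the intent-to-treat model described above can be written as follows. The notation is assumed here for illustration and is not taken from the talk: i indexes students, b indexes classroom matches (randomization blocks), Y is the end-of-year math score, T is the assignment dummy, and X is the vector of baseline covariates.

```latex
% Intent-to-treat model (symbols are illustrative assumptions)
Y_{ib} = \alpha_b + \beta \, T_{ib} + X_{ib}'\gamma + \varepsilon_{ib}
% \alpha_b        : fixed effect for classroom match b
% T_{ib}          : 1 if student i was assigned to a TFA teacher, 0 otherwise
% X_{ib}          : student baseline covariates
% \varepsilon_{ib}: error term
% \beta           : the intent-to-treat effect of assignment to a TFA teacher
```

Because assignment is random within each block, the coefficient on T can be read as the causal effect of being assigned to a TFA teacher; the covariates mainly improve precision.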
But we also present local average treatment effects that try to estimate the effects of being with the TFA teacher for a full year for the students who did stay with their assigned teacher. So in that model, we're estimating math scores as a function of block fixed effects and this measure of the duration that the student had with the TFA teacher. So if they were with the teacher for the full year, this variable D is equal to one, they had a full year of exposure. And then D could be endogenous: the kids who are doing really poorly with the TFA teachers might be the ones who leave that classroom. So to get around that problem, we're using treatment status as an instrument for duration of exposure to the Teach for America teacher, which is a common approach in evaluation. And so, let's see. So I guess one complication with this duration measure, well, first we just have sort of a crude measure of how much time the student spent with the TFA teacher. So we collected data at the beginning of the school year when we did random assignment, and then at three different times during the study school year, we would obtain current class rosters from the study classes. So we have these three snapshots of enrollment to see whether the student was with the TFA teacher for a third of the year, two thirds of the year, or the full year. But for students who left the study classroom, we don't know if they went to some other math class that was also taught by a Teach for America teacher or a classroom taught by some other type of teacher. So our approach was to develop bounds on the impact estimate. So what impact estimate would we have if we assumed that all the students who left went to a teacher of the exact same type? So all the students in the TFA classes went to a TFA teacher and same for comparison students in the control group. 
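The instrumental-variables setup described here can be sketched as a pair of equations. Again, this is a sketch under assumed notation (the symbols are mine, not the study's), and it leaves out any covariates because the talk mentions only block fixed effects for this model.

```latex
% First stage: duration of exposure D (share of the year with a TFA teacher)
% is predicted by random assignment T, within classroom-match block b.
D_{ib} = \pi_b + \delta \, T_{ib} + u_{ib}

% Second stage: math scores as a function of block fixed effects and
% the predicted duration of exposure.
y_{ib} = \alpha_b + \lambda \, \widehat{D}_{ib} + e_{ib}

% With a single instrument, the IV estimate equals the ratio of the
% effect of assignment on scores (the reduced form, \beta) to the
% effect of assignment on exposure (the first stage, \delta).
\hat{\lambda} = \hat{\beta} / \hat{\delta}
```

Because assignment raises exposure by less than a full year on average, \hat{\delta} is below one, so the full-year effect \hat{\lambda} comes out larger than the intent-to-treat effect. The two bounding assumptions about where leavers went change how D is coded, which is what moves \hat{\delta} and therefore \hat{\lambda}.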
And then for the other bound, we assume all students went to a teacher of the opposite type when they left the classroom. So that allows us to say that the actual effect of being taught by a TFA teacher is somewhere within those two bounds. So here are our main estimates. So we found that the students of the TFA teachers scored higher by 0.07 standard deviations relative to the control group. And if we look at these local average treatment effects, the upper and lower bounds on that estimate of being taught by a TFA teacher for a full year, it's somewhere between 0.08 and 0.09 standard deviations. So it's a pretty tight bound if you have a full year of exposure to a Teach for America teacher. So of course an important question is, how do you interpret this effect size? Some people might initially think it's a small effect, and we actually think it's pretty large. So at the elementary school level, students have pretty large gains in achievement from year to year. So a 0.07 gain wouldn't be that large at the elementary school level. But at the middle and high school levels, students typically experience much lower year-to-year growth in their test scores. So if we compare this 0.07 gain in scores to the typical achievement growth at the secondary level, it comes out to about 26% of a year of instruction, or about two and a half months of additional math instruction. So when you think about it in those terms, this is a pretty large gain from being taught by a TFA teacher. That said, they're still scoring well below the average in their state. So they're starting out really low. They're doing better after being taught by a Teach for America teacher, but they're still scoring 0.53 standard deviations below the mean in their state. So it's not solving the problem entirely of low achievement in these high poverty schools. 
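The conversion from an effect size to months of instruction is simple arithmetic, and a short sketch may help make it concrete. The 0.07 and 26% figures come from the talk; the 9.5-month school year is purely my assumption for illustration (the talk does not state the length used), and the implied annual growth is backed out from the two quoted numbers rather than taken from the study.

```python
# Figures quoted in the talk.
itt_effect_sd = 0.07   # impact in standard deviation units
share_of_year = 0.26   # impact as a share of typical annual growth

# Implied typical annual growth at the secondary level (derived, not quoted).
implied_annual_growth_sd = itt_effect_sd / share_of_year

# ASSUMPTION: a school year of roughly 9.5 months of instruction.
ASSUMED_SCHOOL_YEAR_MONTHS = 9.5
extra_months = share_of_year * ASSUMED_SCHOOL_YEAR_MONTHS

print(round(implied_annual_growth_sd, 2))  # 0.27
print(round(extra_months, 1))              # 2.5
```

The same 0.07 would be a much smaller share of a year at the elementary level, where typical annual growth is larger, which is the point being made about interpreting the effect.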
We also, as you were asking earlier, did this comparison separately for Teach for America teachers versus teachers from traditional and alt cert routes. So this first bar on the left is the main estimate I already showed you, showing that students of Teach for America teachers outperform the students of comparison teachers by 0.07 standard deviations. And then you can see that compared to traditional teachers, the TFA teachers boosted student achievement by 0.06 standard deviations. And then compared to the teachers from alt cert programs, they boosted student achievement by 0.09 standard deviations. So in both cases, the Teach for America teachers are outperforming the comparison teachers in these schools. We also did this comparison just comparing the novice TFA teachers in their first two years of teaching to the more experienced comparison teachers who had been teaching more than five years, to sort of directly address this criticism that Teach for America teachers don't stay around long enough to gain valuable experience. And we found, so, as you might expect, the Teach for America teachers outperformed the novice comparison teachers by 0.08 standard deviations, but they also, maybe somewhat surprisingly, outperformed even these more experienced comparison group teachers. To some extent, yeah. I mean, I guess you could say, well, maybe you can't compare them directly because, strictly speaking, we didn't randomly assign students between more and less experienced comparison teachers, so we can't say that in some rigorous way. But yeah, it certainly suggests that the experienced teachers weren't that much more effective than the novice ones. When I get to the non-experimental analysis, I'll talk a little bit more about that, but this next breakdown maybe starts to get at that question too. 
I mean, I think a lot of the literature shows that the real gain to experience is between the first and second year. So we also broke this down, looking at first year teachers compared with these teachers with more than five years of experience and second year teachers. And what you can see is the first year TFA teachers are doing about as well as these really experienced comparison teachers. And then the second year teachers are really outperforming the experienced comparison teachers. So, you know, it's the second year teachers who are driving this positive impact, but you might even find it impressive that these brand new TFA teachers in their first year are still doing just as well as these teachers who've been teaching five years or more than five years. We also looked separately at the middle and high school level and found positive effects at both grade levels. So we did this little analysis to try to sort of, there wasn't, as I told you earlier, there wasn't a whole lot of differential attrition. So, you know, like we had outcome data for 79.5% of the treatment group sample and 78.5% of the control group sample. So just this one percentage point difference in missing outcome data. But still, you know, you might worry that that could bias our impact estimates. So, to sort of put a bound on how much that could bias our impact estimates, we use this approach proposed by David Lee, which, let's see. So we focus, we're really just concerned about this group in the middle. This group of students who we think has missing data only if they're in the control group and if they're in the treatment group, we do have data for them. So our concern is, you know, maybe we're just missing data in the control group for like the really lowest achieving students and that's gonna bias our results. Or maybe we're missing data for the highest achieving students in the control group and we would have had their data in the treatment group. 
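The trimming idea behind this kind of attrition bound can be sketched in a few lines. This is a minimal illustration, not the study's implementation: it ignores the block fixed effects and covariates, uses plain differences in means, and the function names are hypothetical. The only figures taken from the talk are the 79.5% and 78.5% response rates.

```python
import math
from statistics import mean

def trim_fraction(resp_rate_treat, resp_rate_ctrl):
    """Share of observed treatment-group scores to trim so that both
    groups have the same effective response rate."""
    if resp_rate_treat < resp_rate_ctrl:
        raise ValueError("this sketch assumes the treatment group has the higher response rate")
    return (resp_rate_treat - resp_rate_ctrl) / resp_rate_treat

def trimmed_bounds(y_treat, y_ctrl, p):
    """Return (lower, upper) bounds on the difference in means after
    dropping a fraction p of treatment scores from the top or the bottom."""
    y = sorted(float(v) for v in y_treat)
    k = math.floor(p * len(y))            # number of scores to drop
    ctrl_mean = mean([float(v) for v in y_ctrl])
    lower = mean(y[: len(y) - k]) - ctrl_mean   # drop the k highest scores
    upper = mean(y[k:]) - ctrl_mean             # drop the k lowest scores
    return lower, upper

# Response rates quoted in the talk: 79.5% (treatment) vs 78.5% (control).
p = trim_fraction(0.795, 0.785)   # a bit over 1% of treatment scores

# Toy data, purely hypothetical, to show the mechanics with p = 0.25.
print(trimmed_bounds([1, 2, 3, 4, 5, 6, 7, 8], [2, 4], 0.25))  # (0.5, 2.5)
```

Because the gap in response rates is only one percentage point, very few scores get trimmed, which is why bounds computed this way tend to stay close to the main estimate.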
I guess I have to be a little bit of a devil's advocate about the test that's being used to test the students and also the preparation that's given to the students by the teachers before they take the test. If the teachers are closer to this whole experience of testing, they might be more familiar with this kind of test, they might be preparing the students in another way or giving them the basic thinking about it before they actually even come to the test, because your attitude about how you're going to go into a test affects a lot. Yeah, so I mean, it's possible they're increasing test scores without having any real effect on what students have learned. I mean, I think one thing that gives me some comfort is that we saw these same effects at both the middle and the high school level. At the middle school level we're using state tests, they're high stakes, and it's quite possible that the teachers are really focused on getting high scores on the state test. But for the high school students, these are computer adaptive tests we're administering. The teachers have never seen them before, and they don't look like the state test in that they're on a computer. And so I think the fact that we find positive impacts at the high school level as well as the middle school level makes me think it's less just a story of the teachers being really good at teaching to the test, but it's possible. I'd like to speak on it as a TFA alum. I do understand that my first year at TFA we spoke about, we had on-the-job training so we had accountability outside of our job, in other words, being at risk, and so I think that it should be at least thought about how that could impact these results, like other teachers who didn't have that sort of on-the-job training that is not in the environment of my job making on-the-job and on-the-job by the end of this year, and that pressure and that fear could impact how we do it. You mean that TFA teachers would have that pressure from TFA? 
Or that the teachers don't have that accountability, that non-oppressive accountability. And so TFA is more, as a TFA alum, at least in my personal experience, more encouraged on-the-job training. We had to meet with our TFA cohorts at least every month as a way to develop us more as teachers and also to develop us as people who teach kids for a test. And I'm not saying that other teachers didn't necessarily have that same sort of experience or that same framework, but that made me think that it could account for some different things. Yeah, I mean I think that's definitely probably part of, I mean we see that as part of the TFA treatment. So that's part of the package: when you hire a TFA teacher you're getting all the support that comes along with it and the accountability, and so yeah, I think that's part of the package. You know, that said, I should mention we didn't share individual teachers' test results with TFA, so there wouldn't have been pressure on the teachers for their students to score really well on these tests that we were administering. But yeah, I think TFA would probably claim that their training and support's important too, yeah. You had Z scores, I think, for all the students statewide, but the high schoolers didn't take the same test statewide? Yeah, I should have put that on a slide. Because students were taking tests in different states, and at the high school level they were taking the NWEA, we had to standardize the scores in some way. So we converted them to Z scores: in the states, we divided by the standard deviation in that state, and for the NWEA we used the national mean and standard deviation. So it's a slightly different standardization, but it still puts everything on a common scale, sort of the best we could do. 
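The standardization described here is the usual z-score conversion, which can be sketched as follows. The function name and all the numbers in the example are hypothetical; the only thing taken from the talk is the choice of reference population (the student's own state for state tests, the national norms for the NWEA).

```python
def to_z_score(raw_score, reference_mean, reference_sd):
    """Express a raw test score in standard deviation units relative to
    a reference population (a state, or the national norming sample)."""
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    return (raw_score - reference_mean) / reference_sd

# Hypothetical middle school example: state test, state mean and SD.
print(to_z_score(raw_score=230, reference_mean=240, reference_sd=20))  # -0.5

# Hypothetical high school example: NWEA test, national mean and SD.
print(to_z_score(raw_score=225, reference_mean=230, reference_sd=10))  # -0.5
```

Two students with different raw scores on different tests can land on the same z-score, which is what lets impacts from different states and tests be pooled on one scale.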
So basically with this attrition bias model, I was saying that we're worried about this middle bar of students who are missing data in the control group and not the treatment group. So we basically run two models: one which assumes that it's the highest achieving students for whom we're missing data, and we drop them and re-estimate the impacts, and one which assumes that it's the lowest achieving students who are missing data in the control group, and we drop their scores and re-estimate the impacts. And so we get these bounds on the TFA teacher impact ranging from 0.05 to 0.11 standard deviations, so still a positive effect of meaningful magnitude regardless of which way the attrition might have operated. So let me show you the experimental results looking at, or sorry, the non-experimental results trying to account for why we saw these positive impacts among the TFA teachers. And we thought this was potentially a useful analysis because principals, when they're hiring a teacher, are basically just able to see the types of characteristics that teachers might list on a resume. So are any of these factors useful in predicting how effective a teacher will be in the classroom? So we looked at things like academic ability, math content knowledge, instructional training, and on-the-job experience. And the analytic approach was pretty straightforward: we took our original impact model and we just tossed in a bunch of teacher characteristics we thought might explain some of the impact. So the first question is, was this particular teacher characteristic associated with the teacher's effectiveness in the classroom? And then, do TFA teachers show more of that particular characteristic than comparison teachers? 
Because if a particular characteristic is associated with a teacher's effectiveness but the TFA teachers have less of that characteristic, then obviously that characteristic does not explain the fact that the TFA teachers had positive impacts. So we found that most of the characteristics we looked at were not significantly related to teachers' effectiveness. So we looked at whether a teacher graduated from a selective college, the number of college-level math courses the teacher took, whether the teacher had used college-level math in a non-teaching job. That was sort of more important among the teaching fellows because they are going after these mid-career changers. This is just within the TFA? This is not between the TFA and the traditional teachers? No, so this is all the teachers in our sample. So TFA and traditional and alt cert teachers. So we're just tossing in these. So since we have block fixed effects in the model, we're looking within each classroom match at how the difference in a teacher's experience is related to the difference in the teacher's effectiveness. You surveyed them about their courses in college? Yeah, it took us a while to figure out how to ask that question, but we had a list, how many courses did you take in Calculus I and so on, a list of all possible math courses they might have taken in college. I have a related question. So how precise are some of your estimates here? I'm using an opposite system again. I'm a little worried, like I can imagine among the non-TFA sample, in many blocks, well, in many schools, there might be no comparison teacher from a selective institution. Right. And then you're kind of estimating some of the effects off a very small sample. Well. 
So how much should we conclude that this just doesn't explain it in your sample, or is there other literature out there on secondary school teachers that does show college selectivity matters? Well, it's interesting you say that. I mean, our read of the literature was that typically it hasn't shown college selectivity to be related to teacher effectiveness. What, for secondary school? You know, it's been a while since I looked at it, but that was sort of our takeaway. We weren't surprised to see that it wasn't related to teacher effectiveness. I mean, you're right that presumably a lot of the TFA teachers have a one for selectivity and a lot of the non-TFA teachers have a zero. So you're right. We don't have a lot of variation, but there is some variation within TFA teachers and within non-TFA teachers. But you're right, that's a potential limitation of this analysis. So we looked at their Praxis scores, the number of hours of math pedagogy instruction they'd had, the number of days of student teaching they had completed. And none of these things were significantly related to their effectiveness. Did they go in the right direction? Trying to remember what the exact estimates are. Not necessarily, I mean they were just sort of small and not significant. The math content knowledge scores, they were sort of on the margin of significance, at least for the high school math content knowledge test, not the middle school math. It was sort of on the margin of significance in the right direction, meaning those with higher scores were more effective. So I think for communicating this and understanding it, we need to know how much variation there is in the effect size and the prior on how much variation there is. You know the average is 0.07, but I don't know what's gonna be big and what's gonna be small. 
If everyone's tightly clustered around 0.07, there's nothing to explain. Right, in fact Jeff was on our technical working group for this study and he was the one who recommended that we include this nice graph showing the distribution of impacts, which we did. And we did only do this analysis because we saw this nice distribution of impacts, but you're right, it would be nice to have that in the presentation too. We have quarterly problems. Yes. So of all the things we looked at, only two characteristics were significantly related to teachers' effectiveness. One was the amount of coursework a teacher took during the school year, and that negatively impacted a teacher's effectiveness, which you may or may not find surprising. I mean, I think the story could be that for teachers who are required to take this coursework, it's competing for the time and energy that they would otherwise be spending on their classes, and that's lowering their effectiveness. And in fact, a very similar finding: so I mentioned Mathematica conducted another study for the Department of Ed looking at teachers from less selective alt cert programs, and that study actually found the same pattern, that the teachers who are taking a lot of coursework during the school year are generally less effective than teachers who aren't taking a lot of coursework. So this could not be driven by, so basically the TFA people are all taking the same amount. Well, it varies by state, but yes, yeah. Well, and also some of them are past their two, they take more in their first year than second year. On the traditional side, I assume the main people taking lots of coursework are gonna be the least experienced people, right? So you've got two things going on, which is their age as well as the amount of coursework. 
Yeah, so we're controlling for all these things in the model, and we're controlling for them simultaneously, but that's not to say they're not correlated in a way that's messing up these estimates. At any rate, I would say these results certainly aren't as rigorous as the experimental results, but I think they're at least food for thought. The second thing we found comes back to this question of teaching experience. So we found that teachers with two years of experience were more effective than teachers with one year of experience, and after that, there wasn't too much of a gain to additional years of experience, which is sort of consistent with much of the other literature on teacher experience. Sort of this big gain right in the beginning. So the bottom line is that none of the credentials we looked at can explain the TFA impact. So we found that teaching experience increases the teacher's effectiveness, but TFA teachers were less likely to have two years or more of experience, so that can't explain the fact that the Teach for America teachers were more effective. Similarly, we found that coursework negatively affects a teacher's performance, but the TFA teachers were taking more coursework. So again, this would predict the opposite, that the TFA teachers would be less effective, and we found that TFA teachers were more effective. So bottom line, we really can't explain in our analysis why the TFA teachers were more effective. So in conclusion, we did find that Teach for America teachers were more effective than other math teachers in the same schools, and their relative inexperience was apparently outweighed by these other attributes that we were not able to measure. And we can, I think, say that these attributes that make the TFA teachers more effective aren't things that are easily observable on a resume, like years of experience and Praxis scores. So clearly more research is needed to identify these attributes. 
I mean, you could hypothesize that TFA's doing something right, and maybe it's part of this intensive screening process that they have, where they have this full-day interview and they're gathering all this information and rating students on these different competencies. Maybe that's really getting at something that predicts a teacher's effectiveness in a way that just looking at a teacher's resume or a quick 30-minute interview really can't predict. Or it could also be the training and support TFA's providing, or some combination of these factors. So I think an important question, so we've shown TFA teachers are more effective in secondary math, can we expand this? Can TFA maintain these positive effects if the program is scaled up to reach more high poverty schools? And in part, that's the goal of this second study I mentioned. So Teach for America got this $50 million grant from the Department of Ed to scale up its program so we could learn about, you know, can Teach for America and other programs like KIPP and other programs that have shown evidence of effectiveness, can they maintain their effectiveness as they scale up? And so, as part of that grant, Teach for America has hired Mathematica to do an independent evaluation looking at their effects in elementary schools as they scale up. So that's a study we're working on now, and we should have findings from that in the next year. So that's it. So thank you. The attributes, in terms of, let's say, Praxis or selective schools, I guess you did selective like binary, did you also do Praxis? 
I was wondering if the effect of, you know, like maybe if you actually think about selective schools, that might include like 200, there's a range in there. And with the Praxis scores, like I know that there's, you know, distinction marks where if you get that, you really know your math, whereas if you're just saying pass or didn't pass... I was wondering how you tried to measure those attributes. Yeah, I mean we did do sort of a variety of different specification tests. In terms of the Praxis scores, I don't know that we did anything other than, we might have looked at whether they scored above and below the median versus just these continuous scores in the main analysis I showed you. What was the other thing you asked about? Basically you kind of like looked at these. We tried some other specifications. It probably wasn't exhaustive. But yeah, we had fairly crude, well, you know, within our measure of selective colleges, there were three different categories from the Barron's rankings of college selectivity. And I don't think we broke that, we looked at highly selective and selective, but we didn't break it down any further than that. Yeah. When you broke down the first and second year teachers, that was across all teacher groups, or was that just what the teacher meant? Or was it the big effect in the second year? Yeah, so that's all TFA teachers in our sample compared with whatever teachers they were matched with, you know, and that, so. Yeah, better to see. Do you have subgroups for opposed to second year? Because I imagine that would be a way of getting a nice, lightening effect. Being in the core of the extra supports. Yeah, so it was too tiny to, there really, and maybe, I don't know, five or not enough teachers to do that analysis. Jeff? Remind me, how should we think about spillovers here, right? 
So one could imagine an alternative design in which random assignment took place at the school level instead of at the classroom level. And I think there are issues with that. But one advantage of that design is that you don't run the risk that the two teachers who are randomly assigned in the school are somehow having influence on each other: that the non-TFA teacher is motivated to excel beyond their usual norm as a result of the competition with the TFA teacher, or that there are knowledge spillovers from the TFA teacher, fresh from undergrad, to the even more experienced, but further from undergrad, non-TFA teacher, whatever. Is there any kind of descriptive analysis that went along with that? Well, no. But what kind of descriptive analysis could one do, do you think? Well, as part of the teacher interviews, and I guess it's a little late to suggest this, but in the teacher survey, you could say, how much do you interact with your colleagues, specifically how much do you interact with that colleague? About stuff. I mean, our social science evaluations are never double-blind, right? It's not like clinical trials, it's not blind at all. So everybody knows what treatment they're getting and treating people, not treating people and getting a new bubble of life. Uh-huh, yeah. I mean, in the scale-up study I mentioned, we did ask some questions about how much time the teachers spend in various activities, including giving advice to other teachers and receiving advice from other teachers, but we didn't collect that information here. So, yeah, that's, I don't know. Yeah, I think that's a great question and it's important to keep in mind. This is just looking at secondary math. I don't think we can say anything about other subjects here, and the previous literature has shown strong effects in math and science and no effects of Teach for America teachers in reading, so we really don't know. 
But yeah, I mean, it could well be that whatever these qualities that Teach for America's, you know, high-achieving individuals from selective colleges, that's a set of people who are well-suited to teach math and maybe less well-suited to teach kids to read. Yeah, I mean, I think my understanding is that Teach for America tries to keep that relatively standardized, although maybe a ROTFA teacher. I've also got a staff member. Yeah, okay, so even within regions, you think it varies? Yeah, so I don't know. I think we have some information on that. We did not analyze it, but maybe it sounds like it would be worth looking at. Can I imagine you can do that? Sure. Yeah. Exactly. It certainly could be. I mean, I think you guys have, I think people have made the point that it is an interesting thing to look at. So, and possibly we could go back and look at it in this study to not share the level of detail we have on that support, but we have something so we could look at that. Yep. I'm not sure if this would be reflected in the Teachers of Test scores, but I was wondering if you looked at the actual coursework that the teachers had taken whether or not that was through a mathematical department or if that was through a school of education? I don't know. I think that's a really good point because obviously like a college level calculus class in the math department might look different than one in an ed school. And no, we didn't. We asked if there was some broad category secondary math, secondary math course, but other than that, we didn't know if it was something that I think it would have been probably useful to look at. We didn't. I think we could. I think that would be a useful thing to do, so. Yeah. Do you have any data on how relatively how specialized TFA teachers might be relative to non-TFA teachers? I mean, I know just from like general high school courses, usually teachers have more than one subject, so I mean, it's impossible. 
TFA was tracked until like you were only going to teach algebra versus algebra and geometry. Hmm. We did not, well, let's see. We could certainly look within our sample at whether a given teacher had more or less variation in the subjects they were teaching. We haven't looked at that yet. I think that's a good idea. And then also in this other scale-up study, TFA's given us data on their entire teaching corps, including their assignments, so we could certainly look at it there too, and even to inform this study, because we have information on secondary teachers in that data. So yeah, that's a good point. Yeah. So one of the counterfactuals here is some traditionally trained teacher who's gonna stay? Uh-huh. Do you have a sense of what the sort of comparative attrition is between these two groups? Is there a literature on that? So we certainly from our sample don't know that, other than we just know sort of average experience levels, which can lead us to, I think, reasonably conclude that attrition is higher among the TFA teachers, but not much beyond that. I don't know, I'm sure somebody might know of sort of standard turnover rates among non-TFA teachers in high-poverty schools, but I don't know a good source of that information. I mean, there's a study of Teach for America teachers that suggests that, what was it? I'm gonna get the number wrong, but some sort of sizable portion remain into a third year actually, and then after that they tend to go off into other things. I guess Jeff? 
I don't have a number, but I think there is a fair amount of attrition from high-poverty schools, and it's also important to think about Dan's question about the experience level among the comparison teachers. And so the set of inexperienced non-TFA teachers in high-poverty schools are sort of less selected than the set of experienced non-TFA teachers in high-poverty schools, because the non-TFA teachers, when they can, go to lower-poverty schools. Some of the non-TFA teachers, right, yeah, so, yeah. The ones who stay are the ones we don't know. That's right. For kinds of selection. But to some extent, if I'm a principal of a high-poverty school, that's the comparison I care about, yeah. Yeah, no, I agree with that. It's the comparison you care about, though when you were breaking it down into the different categories of what explains it, that's where you get messed up. Meaning, you know, experience in those high-poverty schools reflects two very different types of things. Selection in the teachers, right, versus how long they've been around. Right, yeah, that's a good point. That might not extrapolate to other settings. But yeah, absolutely, that's true. Just to follow up on that, I mean, it seems to me, do you have any way to get a purchase on how long that teacher's been in that school? Because you have that sort of dance of the lemons effect, right? Where you pass around the lower quality teachers. Yeah, I mean, presumably the lemons would be here. I mean, Teach for America is targeting the schools that really struggle to get teachers. But yeah, we did collect that information. So we didn't look at it in this non-experimental analysis, but we could. Is there a cost-benefit analysis now? I mean, maybe a light, back-of-the-envelope one. Yeah, we thought a little bit about doing one, but it would require a lot of additional data collection. 
So we actually wrote up a little memo for the Department of Ed suggesting they might want to fund such a thing in the future, but. What would be, I mean, you know the standard scales for both the FAS? Well, they received the same, right. So the cost-benefit analysis would take into effect into account the fact that TFA teachers are cheaper for districts because, I mean, they're earning the same salaries as other teachers with the same years of experience. So if you hire a TFA teacher, you only have to pay this. I think about folding in the attrition to imagine, you took all the teachers to school and you just hired two-year TFA people, they all pay after two, even though they're expected. And then you hire new TFA teachers at the same low salary. Yes, and then. So the TFA teachers are cheaper to the district in that sense, because they're not. But you're almost gonna never get somebody with more than, you know, maybe 10% of them ever get more than three years of experience or five years of experience. And that's, I guess, the worst is having somebody who goes, this is way better. They have a high, a TFA teacher? Don't you get that from this? I mean, it's .07, I mean, it's .07. They're not even gonna be expected how long these guys are gonna be with the average teacher's salary is gonna be the same number. Sure. So if you hire a string of TFA teachers, you're gonna be paying a lot less than if you hire a single teacher who stays. But that said, so that's the cost from the district's perspective, from society's perspective, teach for America costs a lot. So I think I saw something like, if you divide their annual operating budget by the number of new core members they place, it's like $30,000 a teacher or something. So it's, so maybe that's not the right way to think about it. I don't know, I mean. That's a very important number, I think. It is, yeah. 
So, but I think there's, you know, more nuances than just, I don't know, I think a cost-benefit analysis would be more nuanced than just that simple calculation. But it's definitely cheaper to the district, but maybe to society more costly. Super. Over there. Yeah, sorry. I was thinking of the distribution of these two populations. In one sense, 0.07, when you express it in terms of the teachers being worth an extra quarter of a year or half a year, like that, if you think about it in terms of student outcomes, that's really powerful. If you think about it in terms of what we also compare it to, because then years later, the population of TFA and non-TFA, it's only a percent kind of alert to a difference. So, by some, as you can see, they're the same. And I was wondering if you had any way of comparing this to other, not necessarily education, but how does it stack up to other ways of screening applicants or something like that? Because you kind of want to say like, oh, it's effective, why isn't it more effective? If they can truly figure out who is going to be a good teacher, why is there such a small difference? You see what I'm saying? Do you know what I'm talking about? So, no, I mean, I think the challenge, I mean, if we did a cost-benefit analysis, this would also be the issue. We'd need to sort of find some other intervention of which we know the cost and we know sort of what effect size it brought about. And there's STAR at the elementary school level, but at the secondary level, there's... This is not that small. I mean, this is per year. So, like, the STAR experiment was like two tenths of a standard deviation for being in a small class for four years or three years. The charter school effects that we're seeing in, say, Boston on a per-year basis are similar to this. So you think, if a kid spends four years or seven years, the question is, is it additive? We don't know from this, but if it were, that's half a standard deviation. 
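As a rough check on the arithmetic in this exchange, here is a minimal Python sketch. It assumes yearly effects simply add up, which the speakers note this study cannot establish, and it uses only the figures quoted above (0.07 standard deviations per year for TFA, about 0.2 for STAR over three or four years). The function names are illustrative, not from the study.

```python
# Back-of-the-envelope arithmetic for the effect sizes quoted in the discussion.
# ASSUMPTION: effects accumulate linearly across years (not shown by the study).

TFA_PER_YEAR_SD = 0.07   # TFA effect per year, in standard deviations (quoted above)
STAR_TOTAL_SD = 0.20     # STAR small-class effect over 3-4 years (quoted above)


def cumulative_effect(per_year_sd: float, years: int) -> float:
    """Total effect if the per-year effect were purely additive."""
    return per_year_sd * years


def per_year_effect(total_sd: float, years: int) -> float:
    """Average per-year effect implied by a multi-year total."""
    return total_sd / years


if __name__ == "__main__":
    for years in (4, 7):
        total = cumulative_effect(TFA_PER_YEAR_SD, years)
        print(f"TFA, {years} years, if additive: {total:.2f} SD")
    for years in (3, 4):
        yearly = per_year_effect(STAR_TOTAL_SD, years)
        print(f"STAR, spread over {years} years: {yearly:.3f} SD per year")
```

Under that additivity assumption, seven years at 0.07 comes to 0.49, which is the "half a standard deviation" mentioned, and the STAR figure works out to roughly 0.05 to 0.07 per year.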
So this is pretty big compared to some other interventions? It is, at least in terms of the effect size per year, it compares to things that we tend to think are big. At the secondary level? I mean, there's just not as, okay, yeah. I feel like there's not as much. Okay, yeah, uh-huh, yeah. I will just, I guess, one of the criticisms, I mean, there are a lot of people who don't like TFA, and so we did get some criticisms of the study from those people. But one anti-TFA blogger claimed that we were blowing this 0.07 effect size out of proportion. And he had this wonderful graphic, let me find it. So he said the impact was extremely small, and we were overstating its importance, and he had this graphic that put the effect size on a scale of zero to five standard deviations. So it's all relative. We have a lot of other things that go in five standard deviations. Yeah. So it's all relative. What blog is this? What blog? I'll tell you what I'm doing. Yeah, it's not a blog, it's public. I should be ashamed of this. I don't want to publicize it, but yeah. We'll send it around the email list. But yeah, we thought it was meaningful. And in fact, when we first did this feasibility study for the Department of Ed, they were sort of, all their big random assignment studies typically target a 0.15 effect size, so you're supposed to power the sample to detect an effect size of 0.15. And we argued strenuously that if they're looking at the secondary level, they need to target a smaller effect size, because it would be unrealistic to expect anything to have an effect of 0.15 at the secondary level. So they were very receptive to that argument, to their credit, and allowed us to recruit a sample that was powered to detect the smaller impact of 0.10. So, well, thank you so much. This was great. Thanks. Thank you.
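To illustrate why moving the target effect size from 0.15 to 0.10 matters for recruitment, here is a minimal Python sketch of a textbook sample-size calculation. It is not the study's actual power analysis: it assumes a simple two-arm comparison of individual students with equal group sizes, no clustering and no covariate adjustment, a two-sided 5% significance level, and 80% power. All of those values, and the function name, are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def n_per_arm(mde_sd: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate students per arm needed to detect an effect of `mde_sd`
    standard deviations in a simple two-arm design (normal approximation).

    ASSUMPTIONS: equal arms, individual-level randomization, no clustering,
    no covariates. Real education evaluations usually need adjustments.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return 2 * (z_alpha + z_power) ** 2 / mde_sd ** 2


if __name__ == "__main__":
    for mde in (0.15, 0.10):
        print(f"target effect {mde:.2f} SD: about {math.ceil(n_per_arm(mde))} per arm")
    ratio = n_per_arm(0.10) / n_per_arm(0.15)
    print(f"sample needed at 0.10 relative to 0.15: {ratio:.2f}x")
```

Because the required sample scales with one over the square of the target effect, powering for 0.10 instead of 0.15 takes about 2.25 times as many students under these simplified assumptions.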