Welcome everyone. I'm Brian Jacob, a professor here at the Ford School and the co-director of the Education Policy Initiative. It's my pleasure to welcome all of you here today. The Education Policy Initiative is a program of coordinated activities designed to apply rigorous research methodology to inform education policy issues, as well as to disseminate best practices in education reform to state, local, and national policymakers. We train students and others to conduct cutting-edge research and facilitate discussions about this work across the university community. So today we are pleased to present Dr. Melissa Clark, a senior researcher at Mathematica Policy Research. Melissa specializes in the design and implementation of rigorous impact evaluations, with a focus on education policy. She's here today to discuss her research regarding the effectiveness of secondary math teachers from Teach for America in high-poverty U.S. schools, presenting evidence from the first large-scale experimental study of secondary math teachers in Teach for America. I am looking forward to the talk and then Q&A. First, I'll ask everyone to turn off any electronic equipment, just so there's no ringing and beeping during the talk. Melissa is going to talk for about an hour or so and then we'll open up for Q&A, but you can tell folks whether you're open to questions during the talk as well. And finally, before we get started, I want to recognize the funding we've received from Charles and Susan Gessner for their generous support of this and other similar events. So thank you very much. Without further ado, Dr. Clark. Thanks, Brian. Thank you for having me, and you're welcome to interrupt with questions at any point. So I am presenting evidence on the effectiveness of secondary math teachers from Teach for America, and this is a study that's co-authored with two colleagues at Mathematica Policy Research. Mathematica is a company that does a lot of big evaluations in education and other areas for the federal government and lots of other clients, too. The paper I'm presenting today is part of a larger evaluation that we conducted for the Department of Education's Institute of Education Sciences. We completed the study this past fall, and the paper looked at the effectiveness of secondary math teachers from two highly selective routes to teacher certification: Teach for America, and another, similar program called the Teaching Fellows Program. The paper I'm presenting today is going to focus just on the Teach for America findings, for a couple of reasons. The first is that when we started to write the findings up to submit to a journal, we thought that the attrition rates in the Teaching Fellows sample were high enough that those results weren't quite as compelling as the results for Teach for America. And the second is that people just don't seem to be that interested in Teaching Fellows. A lot of people know what Teach for America is and have strong opinions on it; it's a very controversial program. From when the study was first released in September, I talked to reporters about the study and presented at conferences and forums, and I think in that entire time I didn't get a single question about the Teaching Fellows findings; people were much more interested in Teach for America.
So I'm going to focus on Teach for America today, but if any of you are interested in the Teaching Fellows findings, I'm happy to talk about those at the end as well. The backdrop for this study, and the reason the Department of Education decided to fund it, is the fact that high poverty schools really have trouble attracting qualified teachers, and this is a particular problem in secondary science and math. In response to this problem, most states have adopted what are called alternative routes to teacher certification, which are intended to lower the barriers to entering the teaching profession and increase the supply of teachers. These programs typically allow individuals to become teachers before they've completed all the requirements for certification. They typically require less coursework than a traditional university-based teacher certification program, and less, or often no, student teaching. The vast majority of these alt cert programs are not very selective, which is to say they're neither more nor less selective than the typical traditional certification program, which also usually is not very selective in the types of people it admits. Teach for America is an alt cert program, but it's somewhat unique among alt cert programs in that it is highly selective in the candidates it admits. Before we started this study, we looked at the landscape of alt cert programs to see what other highly selective programs might be out there, and Teach for America and the Teaching Fellows programs were really the only two large, highly selective alt cert programs; I'll tell you more about what I mean by highly selective in a minute. We identified a handful of smaller programs that looked like Teach for America and Teaching Fellows in terms of their selectivity, but they were tiny: together, these five programs probably provided 20 secondary math teachers in the year that we were looking at. So we decided to focus the evaluation just on Teach for America and Teaching Fellows, and the work I'm presenting today is just on Teach for America. Probably many of you are familiar with Teach for America, but it's an organization whose stated mission is to reduce educational inequities by supplying qualified teachers to high poverty schools. To do this, they invest heavily in recruiting and selection: they try to recruit high achieving college graduates and professionals to teach for two years in high poverty schools. They require just this two-year commitment, although the teachers can choose to remain longer, and many of them do. They then provide a short, intensive summer training before the teachers enter the classroom, and ongoing training and support to their teachers throughout the two-year commitment. Teach for America is a growing source of teachers for high poverty schools: in the 2011 school year, they provided 9,000 teachers across their 43 regions around the country. They also got a $50 million Investing in Innovation grant from the U.S. Department of Education to scale up their program over five years, expanding by 80%, and they're aiming to place a total of 13,500 first and second year teachers this coming fall as a product of that scale-up. So despite the fact that Teach for America does supply a lot of teachers to high poverty schools that have trouble hiring teachers, the program is very controversial.
One common criticism is that Teach for America teachers, and teachers from other alt cert programs, are underprepared for teaching relative to teachers from traditional certification programs. The argument is, of course, they can't be as well prepared as somebody who's done several years of coursework and a period of student teaching. A second criticism, of TFA in particular, is that because it requires just this two-year commitment, most of the teachers leave after two years, before they have a chance to gain valuable experience that might improve their teaching. So the argument goes: if a principal is choosing between hiring a TFA teacher who's just going to be there for two years and leave, or a teacher from some other program who'd stay for 20 years, they're better off hiring the teacher who will accumulate this valuable experience over time. But it's an empirical question whether these criticisms are actually valid. And so, in part to assess Teach for America's effectiveness, in this study we're looking at the effectiveness of secondary math teachers from Teach for America relative to other teachers in the same high poverty schools, and we're defining effectiveness based on students' math test scores at the end of the school year. We're using a random assignment design: within each school, we randomly assigned students who signed up for a particular math course to be taught either by a TFA teacher or by a teacher from some other program. We have a large multi-state sample: 45 schools in 11 districts in eight states. The reason we focused on middle and high school math is that it's a hard-to-staff subject for high poverty schools. About 20% of TFA teachers teach secondary math, so it's a priority subject area for TFA too. Initially, back when we were doing a feasibility study for this broader evaluation, the Department of Ed asked us if it would be feasible to include science teachers in this evaluation as well. We looked into that and determined that it really wouldn't be feasible, in part because there wasn't, at least at the time, a good science assessment that we could have used, and most states, of course, aren't assessing kids in science every year in middle school or high school. So for that reason, we decided to focus just on secondary math. There is some high quality evidence in the literature so far showing generally that Teach for America is effective, particularly in math. There's only been one other experimental study to date, also conducted by Mathematica, back in 2004. That study randomly assigned students to TFA teachers or non-TFA teachers in elementary schools and found that the TFA teachers improved students' scores in math and had no effect on their scores in reading; the students did just about the same as their peers in reading. So our study is taking pretty much that same design but shifting the focus to this hard-to-staff area, secondary math. A few non-experimental studies have focused on the secondary level. A couple of studies in New York City have found positive effects of Teach for America teachers on middle school math scores; those used longitudinal student data and controlled for prior achievement. And then a study in North Carolina didn't have longitudinal data but instead used cross-subject data on students' performance and found that, controlling for students' performance in other subjects, students performed better in a subject taught by a TFA teacher.
These effects were particularly pronounced in math and science. So the real contributions of our study are applying this experimental design at the secondary level, which hasn't been done before, and using this broad multi-state sample, whereas some of the previous evidence has been focused on a single state. In our main findings, we find that Teach for America teachers are more effective than non-TFA teachers in the same schools. They increase student math scores by about 0.07 standard deviations; I'll provide some context for that later, but we believe it's a non-trivial increase in student math scores. Looking directly at the criticism that the Teach for America teachers are only staying for two years, and so are going to be less effective than more experienced teachers, we compared TFA teachers in just their first two years of teaching to comparison teachers with more than five years of experience, and even with that comparison, these novice TFA teachers were outperforming the experienced teachers from other programs. So I think the main takeaway from the study is that Teach for America can increase both the quantity and quality of teachers in high-poverty schools in secondary math. I'll provide some background information on the study, then I'll present our causal estimates of the effect of Teach for America teachers, and then I'll present some non-experimental analyses we did trying to account for why these TFA teachers might be more effective than the non-TFA teachers. So I'll start with a brief overview of Teach for America. Its goal is to enroll people with characteristics that it believes are associated with effective teaching. At the time of the study, they had seven core competencies they used to gauge how effective a teacher would be in the classroom. And interestingly, I was up in my temporary office upstairs and I saw on the file cabinet a magnet with a list of the Ford School's seven core competencies, and they were strikingly similar to Teach for America's; it was like achievement, critical thinking, respect for the Ford School's mission. So Teach for America has a similar set of competencies that they think are predictive of future effectiveness. They have a very rigorous admissions process: people submit an application online, then they have a telephone interview, and those who progress beyond that stage go to a full-day in-person interview. And Teach for America has a mathematical model they use to screen out candidates at each stage. They don't provide a lot of information about this model, because they don't want candidates to try to game it, but presumably they're using data from previous cohorts of Teach for America teachers, regressing some measure of each teacher's effectiveness in the classroom on ratings of these core competencies and other factors, and then using the coefficients from this model to predict how effective incoming applicants will be when they start teaching. So they offer admission to about 12 percent of all the people who apply, very selective compared to most teacher certification programs. After they've selected people, the main training is this five-week summer institute where the candidates take coursework.
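To make that screening idea concrete, here is a minimal sketch of the kind of predictive admissions model described above, fit on one cohort and used to score the next. Everything here, the data, the variable names, and the linear form, is made up for illustration; TFA does not disclose its actual model.

```python
import numpy as np

# Hypothetical data: rows are past corps members, columns are admissions ratings
# on the seven core competencies (achievement, critical thinking, and so on).
rng = np.random.default_rng(0)
competencies = rng.normal(size=(500, 7))
# Some measure of each past teacher's classroom effectiveness (e.g., value-added).
effectiveness = competencies @ rng.normal(size=7) + rng.normal(size=500)

# "Screening" regression: effectiveness on competency ratings, plus an intercept.
X = np.column_stack([np.ones(len(competencies)), competencies])
coefs, *_ = np.linalg.lstsq(X, effectiveness, rcond=None)

# Score a new applicant pool with the estimated coefficients; admit the top slice.
applicants = rng.normal(size=(2000, 7))
predicted = np.column_stack([np.ones(len(applicants)), applicants]) @ coefs
cutoff = np.quantile(predicted, 0.88)  # keep roughly the top 12 percent
print(f"admitted {(predicted >= cutoff).mean():.0%} of applicants")
```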
They do some practice teaching in the local school district's summer school, and they do some self-directed assignments. Then Teach for America helps the corps members, which is what they call their teachers, find teaching positions in high poverty schools in one of these 43 regions. After the corps members start teaching, they get additional training: they have the opportunity to observe other teachers teaching and to be observed by mentors, they have one-on-one meetings with their mentors, and they also take coursework in local alternative certification programs. That last piece is not a Teach for America thing; it varies from state to state, but all states require alternatively certified teachers to complete some coursework to gain their certification, so the Teach for America teachers follow these same rules in each state, taking some amount of courses, typically in a university-based program within their district. So the study used an experimental design, conducted with two cohorts of teachers and students, in 2009 and 2010. I actually think the experimental design is one of the most interesting parts of the study, at least for me. I'm not aware of any other random assignment studies conducted at the secondary school level that randomly assigned students within schools, so going into this we weren't even sure it would be feasible. We went out and talked to lots of schools with TFA teachers, and what we determined was that it really would be impossible, from the school's perspective, to randomly assign students across sections, because scheduling at the high school and middle school level is so complex. So instead, we realized we could only do the study if there were two or more courses taught during the same time period, one taught by a TFA teacher and one taught by a non-TFA teacher. Basically, we defined these classroom matches as sets of courses taught during the same period across which we could randomly assign students, and the treatment group in our study is the set of students who were assigned to the Teach for America teachers. Yes, right, I'll show you in a minute what the schools look like and how they compare to the typical schools in which TFA places its teachers, but yes, they were larger schools, and the math courses were the more common courses; for instance, we didn't have any calculus or pre-calculus, because in these high poverty schools there typically weren't concurrent sections of those higher-level courses. In terms of the courses, we included sixth, seventh, and eighth grade math, and then algebra one, algebra two, and geometry. We included both teachers from traditional routes and teachers from other, less selective alternative routes in the comparison group. The logic for this was that we wanted the counterfactual to be: what type of teacher would these schools have hired had they not been able to hire a Teach for America teacher? And we thought the best estimate of that counterfactual was the mix of other teachers teaching in that school. So in the schools in our study, about 60% of the comparison sample were from these traditional programs and about 40% were from these alt cert programs. In these findings there are no Teaching Fellows teachers, and all those other tiny, highly selective alt cert programs we did not include in the study at all; the comparison alt cert teachers are just from state-approved, less selective alt cert programs. Very few states have something that they call emergency certification anymore, so these are teachers who are going through these other, less selective alternative certification programs, of which there are actually very many.
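As a concrete picture of the design itself, here is a minimal sketch of within-block random assignment under the setup just described. The block name, student list, and 50/50 split are all hypothetical; the study's actual procedure (consent, exact assignment shares, and so on) was more involved.

```python
import random

def assign_within_blocks(blocks, treat_share=0.5, seed=42):
    """For each classroom match ("block"), randomly split its students between
    the TFA teacher's section and the comparison teacher's section."""
    rng = random.Random(seed)
    assignments = {}
    for block_id, students in blocks.items():
        shuffled = students[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * treat_share)
        for s in shuffled[:cut]:
            assignments[s] = (block_id, "TFA")
        for s in shuffled[cut:]:
            assignments[s] = (block_id, "comparison")
    return assignments

# Example: two concurrent Algebra I sections in one school form one block.
blocks = {"school1_algebra1_period3": ["ana", "ben", "cara", "dev", "eli", "fay"]}
print(assign_within_blocks(blocks))
```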
And in the Teach for America sample, we included both those whom Teach for America considers current corps members, those in their first or second year of teaching, and those who may have stuck around longer and are still in the schools. That said, 86% of our Teach for America teachers were still in their first or second year of teaching, reflecting the fact that many teachers do leave after that two-year commitment is over. We didn't place any restrictions on the experience of either the treatment group or the control group teachers, in part because, again, we wanted a snapshot: if I'm a principal, I want to know that I could hire a Teach for America teacher who's probably going to leave teaching after two or three years, or I could hire a teacher from some other program who's going to stay over the longer term and gain more experience. So we thought just taking a cross section of the teachers in the school in a particular year would incorporate the fact that non-TFA teachers probably will gain more experience over time. We're seeing the low teaching experience levels of the Teach for America teachers as part of the Teach for America treatment; that's part of the package you get when you're choosing to hire a teacher from Teach for America. Yep, yeah, so there was very little attrition within the study year. We had two years in the study, but each was self-contained, so we weren't following teachers across years. Just a handful of teachers left teaching during that period, and we decided to leave classrooms in the sample if the teacher left, thinking that higher turnover might be part of the treatment too: if Teach for America teachers are more likely to leave mid-year, their class is probably going to suffer by having some long-term sub for a long time, and we didn't want to just toss them out of the sample and potentially bias the impact. So they are still in the sample; there wasn't a ton of teacher turnover, though. Sure, yeah. The Department of Education actually a few years ago had done a large random assignment study of teachers from these less selective alt cert programs, so they saw this study as an opportunity to learn more about the other side of the coin, the highly selective alt cert programs, which is why we weren't lumping them all together in one group. That said, there's still a lot of interest in how the TFA teachers compare just to the traditionally certified teachers, so we do subgroup analyses looking at those comparisons separately. We recruited schools from across the country to be in the study. Basically, we started by contacting districts with a large number of Teach for America teachers, and then we would call the schools or visit them to see if they might have these eligible classroom matches, or randomization blocks, with a TFA teacher and a non-TFA teacher teaching a math class during the same period. We ended up with a sample of eight states, eleven districts, and close to six thousand students. For the data collection, our main outcome measure was student math test scores at the end of the study year. For middle school students, who were about seventy percent of our sample, we used math scores from the state assessments.
Then for high school students, since states aren't typically assessing those students in math every year, we instead administered a test of our own, a computer adaptive test from the Northwest Evaluation Association (NWEA), and we tested the kids in general math, algebra one, geometry, and algebra two. This was actually a really interesting test; I was talking to Sue about it earlier. When we were deciding what test to use for the evaluation, we got to try a sample NWEA assessment. We had a limited number of laptops, so I was paired with another econ PhD, and we did this algebra one or algebra two assessment together. It adapts to your ability level, so it quickly got very hard; it became very challenging even for two people with a pretty strong math background. So I think the idea is that it's really efficient: in the limited amount of time we had, 30 minutes to assess the kids, we were presumably able to get a pretty close read on their actual math ability. We also collected baseline scores from state assessments, and we got some information on teacher characteristics from a survey we administered to the teachers. There was also a lot of interest in how teachers' math content knowledge might affect their effectiveness as math teachers, so we got teachers' math scores from the Praxis II math assessments. There was some student attrition after randomization, meaning that students in our sample were missing their end-of-year math test scores and so could not be included in the analysis. There were various reasons for this attrition. One was parental non-consent: not all districts, but some districts, required us to get parents to sign a form to allow us to collect test score data for their students, and as you can imagine, when you send home a form in the child's backpack, a lot of parents never get around to returning that signed form. Still, we had about a 94% consent rate, which we thought was pretty strong. Then among the consenters, we were able to obtain the test scores for about 84%. If we weren't able to obtain data, it was either because the student left the district, so we weren't able to get their state test scores, or they were absent from school on the day of either the state test or the study test and didn't show up for the makeup sessions either. So in total, we got scores for 79% of the set of students that we randomly assigned. Fortunately, or encouragingly, there was little difference in response rates for the treatment and control groups, 79.5% versus 78.5%, so hopefully little room for that differential attrition to bias the findings, but I'll show you some estimates in a minute that attempt to dig into that a little bit further. These similar attrition rates also reflect very similar rates of mobility for both the treatment and control group students. You can see here that about 77% of both groups of students stayed in the classroom to which they were originally assigned. There was relatively little occurrence of students crossing over to a teacher of the other type, a student assigned to the TFA teacher moving to the non-TFA teacher's class; there wasn't much of that, which was good from our perspective. There were more cases of students transferring out of the study classes entirely, so, you know, they were in algebra one and they were moved to a remedial math class or something after the school year started.
And then a sizable chunk of students in both groups left the study school entirely, which is, I think, common; mobility is typically high in these really high poverty schools that we were including in the sample. You would expect the treatment and control group students to look pretty similar after random assignment, but encouragingly, even when we just look at the students for whom we have outcome test score data, the students who are actually included in our analysis, we see really no meaningful differences in the baseline characteristics of the treatment and control groups. That gives us some confidence that any differential attrition between treatment and control groups probably isn't leading to major bias in our results. Across all the baseline characteristics we look at, one, the percentage of students who are old for their grade level, is marginally significant at the 10% level, and there are no other significant differences. So this comes to Brian's question, I think, about the characteristics of the study schools. Let's first just focus on these first two columns, which are the schools in our study compared with all secondary schools with Teach for America teachers nationwide. Even though we didn't randomly select schools to be in the study (we had to recruit them, going around the country to find schools that were eligible and willing to be in our study), encouragingly the study schools look a lot like other secondary schools with Teach for America teachers nationwide: a very high percentage of students are African American and Hispanic, and a very high percentage of students in the schools are receiving free or reduced price lunch. The places where you can see differences are where you might expect, given that we had to recruit these larger schools that had multiple sections of the same course at the same time. The schools in our sample had a larger enrollment per grade than the typical TFA school nationwide, and we also had no charter schools in our study, compared to about 23% among TFA schools nationwide. In part this was because the charter schools were smaller, and they often had less conventional configurations, so even if they were big, they might still have just a single math teacher teaching all the math courses or something; they just weren't able to accommodate the experimental design. Then quickly, as you probably would expect, you can see that both the study schools and all schools with TFA teachers nationwide are much less advantaged than the typical secondary school nationwide: higher percent minority and a much higher percentage of students receiving free or reduced price lunch. Then, looking at the characteristics of the Teach for America and comparison teachers in the sample, they looked very, very different, which you might expect. You can see that the Teach for America teachers were somewhat less likely to be female than the comparison teachers, and they were much more likely to be white: 90% of the Teach for America teachers were white, compared with just 30% of the comparison teachers. I think Teach for America probably wasn't particularly happy to see this number; they really prioritize diversity, that's a big push for them. I think among their teaching corps as a whole, the percentage that is white is something more like 70%, but we were focused on secondary math teachers, so that might be why our sample is more white than the statistics for their full teaching corps.
The Teach for America teachers were much more likely to have attended a selective college or university. They also had much stronger math content knowledge as measured by one of two Praxis math tests; we used the Praxis math content test for the high school teachers and the middle school math test for the middle school teachers, and in both cases the Teach for America teachers scored about a full standard deviation above the non-Teach for America teachers, so much stronger scores on the Praxis. As for how we did this: some states in the sample required teachers to have taken the Praxis, and in that case we would obtain the teacher's permission to collect the scores from ETS; in states that didn't require the Praxis, we administered it and paid the teachers to attend the session. No state requires the Praxis of some teachers and not others; if they do require the Praxis, it's required of all teachers, whether traditional or alt cert, and some states just don't require it at all. We wanted to be sure we weren't comparing a teacher who took a high-stakes Praxis test that they had to pass to get certification against a teacher who took a low-stakes test just for our study that they don't really care about, but that was never the case; within a state, it was always high stakes or always low stakes. Yeah, actually, I guess that's a good point. I was thinking, oh, I don't have age on the slide. The Teach for America teachers were definitely younger. I can't really say; we didn't collect data on when the teachers entered teaching, although I guess we could calculate that, since we know how long they've been teaching and we know their current age. We didn't look at that, though, so it could differ. I think especially for the Teaching Fellows, that program does try to attract mid-career switchers, but I honestly don't know what to expect among the comparison teachers; certainly the TFA teachers, most of them, are coming right after college. As you can see, the Teach for America teachers have much less teaching experience, an average of two years compared to ten years among the comparison teachers. I was actually a bit surprised that the experience level was so high among the comparison teachers. And then another thing, which we'll come back to later because it turns out to be somewhat important in our non-experimental analysis: the Teach for America teachers took many more hours of teacher coursework during the school year. As you might expect, most of them were in their first or second year of teaching and still completing the state-required coursework to gain their certification, so they were taking more coursework than the comparison teachers at the time of the study. Since we have this experimental design, we can use a fairly standard, simple impact estimation model. We're regressing students' end-of-year math scores on block fixed effects, so a fixed effect for the classroom match in which we randomly assigned the student, a dummy variable for whether the student was assigned to a Teach for America teacher, student baseline covariates, and an error term. Our main models were intent-to-treat estimates, meaning they represent the effect on a student's test scores of being assigned to a Teach for America teacher, whether or not that student actually stayed in that classroom and got a full year of exposure to the TFA teacher.
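Written out, the intent-to-treat model just described looks something like this (the notation here is mine, not necessarily the paper's):

```latex
Y_{ib} = \alpha_b + \beta \, TFA_{ib} + X_{ib}'\gamma + \varepsilon_{ib}
```

where \(Y_{ib}\) is the end-of-year math score of student \(i\) in randomization block \(b\), \(\alpha_b\) are the block (classroom match) fixed effects, \(TFA_{ib}\) is the assignment dummy, \(X_{ib}\) are the baseline covariates, and \(\beta\) is the intent-to-treat effect.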
In some sense, we think this is a policy-relevant estimate, in that it reflects the potential for a Teach for America teacher to affect a student's test scores given that not all students are going to stay with that teacher. But we also present local average treatment effects that try to estimate the effect of being with the TFA teacher for a full year, for the students who did stay with their assigned teacher. In that model, we're estimating math scores as a function of block fixed effects and a measure of the duration the student had with the TFA teacher; this variable D is equal to one if they were with the teacher for the full year, a full year of exposure. But D could be endogenous: the kids who are doing really poorly with the TFA teachers might be the ones who leave that classroom. So to get around that problem, we're using treatment status as an instrument for duration of exposure to the Teach for America teacher, which is a common approach in evaluation. One complication is that we just have a crude measure of how much time the students spent with the TFA teacher. We collected data at the beginning of the school year, when we did random assignment, and then at three different times during the study school year we obtained current class rosters from the study classes. So we have these three snapshots of enrollment to see whether the student was with the TFA teacher for a third of the year, two-thirds of the year, or the full year. But for students who left the study classroom, we don't know if they went to some other math class that was also taught by a Teach for America teacher or to a classroom taught by some other type of teacher. So our approach was to develop bounds on the impact estimate: what impact estimate would we have if we assumed that all the students who left went to a teacher of the exact same type, so all the students leaving TFA classes went to a TFA teacher, and the same for students in the control group; and then the other bound assumes all students went to a teacher of the opposite type when they left the classroom. That lets us say that the actual effect of being taught by a TFA teacher is somewhere within those two bounds. So here are the main estimates. We found that the students of the TFA teachers scored higher by 0.07 standard deviations relative to the control group, and if we look at the local average treatment effects, the upper and lower bounds on the estimate of being taught by a TFA teacher for a full year are somewhere between 0.08 and 0.09 standard deviations. So a pretty tight bound, if you have a full year of exposure to a Teach for America teacher.
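As an illustration of the instrumental-variables step behind those local average treatment effects, here is a minimal two-stage least squares sketch on made-up data. The real analysis also includes block fixed effects and baseline covariates, and deals with the crossover bounds just described; none of that is shown here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
z = rng.integers(0, 2, n).astype(float)  # random assignment to the TFA section
# Exposure D: assigned students mostly stay the full year, some leave after a
# third or two-thirds; for simplicity, control students get no TFA exposure.
d = np.where(z == 1, rng.choice([1/3, 2/3, 1.0], n, p=[0.1, 0.1, 0.8]), 0.0)
y = 0.09 * d + rng.normal(size=n)  # scores; 0.09 is a made-up "true" effect

def ols(X, y):
    """Least-squares coefficients of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# First stage: regress exposure on the instrument (assignment); keep fitted values.
X1 = np.column_stack([np.ones(n), z])
d_hat = X1 @ ols(X1, d)

# Second stage: regress scores on predicted exposure; the slope is the LATE.
X2 = np.column_stack([np.ones(n), d_hat])
print("2SLS estimate of a full year of TFA exposure:", ols(X2, y)[1])
```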
Of course, an important question is how to interpret this effect size. Some people might initially think it's a small effect, and we actually think it's pretty large. At the elementary school level, students have pretty large gains in achievement from year to year, so a 0.07 gain wouldn't be that large there, but at the middle and high school levels, students typically experience much lower year-to-year growth in their test scores. So if we compare this 0.07 gain in scores to the typical achievement growth at the secondary level, it comes out to about 26 percent of a year of instruction, or about two and a half months of additional math instruction. When you think about it in those terms, this is a pretty large gain from being taught by a TFA teacher. That said, these students are still scoring well below the average in their state. They're starting out really low; they're doing better after being taught by a Teach for America teacher, but they're still scoring 0.53 standard deviations below the mean in their state. So it's not entirely solving the problem of low achievement in these high poverty schools. We also, as you were asking earlier, did this comparison separately for Teach for America teachers versus teachers from traditional and alt cert routes. The first bar on the left is the main estimate I already showed you, showing that Teach for America students outperform the students of comparison teachers by 0.07 standard deviations. You can see that compared to traditional teachers, the TFA teachers boosted student achievement by 0.06 standard deviations, and compared to the teachers from alt cert programs, they boosted student achievement by 0.09 standard deviations. So in both cases, the Teach for America teachers are outperforming the comparison teachers in these schools. We also did this comparison just comparing the novice TFA teachers, in their first two years of teaching, to the more experienced comparison teachers who had been teaching more than five years, to directly address this criticism that the Teach for America teachers don't stay around long enough to gain valuable experience. As you might expect, the Teach for America teachers outperformed the novice comparison teachers, by 0.08 standard deviations, but they also, maybe somewhat surprisingly, outperformed even these more experienced comparison group teachers. Uh-huh, to some extent, yeah. I guess you could say you can't compare them directly, because we didn't randomly assign students between more and less experienced comparison teachers, so strictly speaking we can't say that in a rigorous way, but it certainly suggests that the experienced teachers weren't that much more effective. When I get to the non-experimental analysis, I'll talk a little bit more about that. To break this down, and this maybe starts to get at that question too: I think a lot of the literature shows that the real gain to experience is between the first and second year. So we also broke this down, looking at first-year TFA teachers compared with these teachers with more than five years of experience, and then second-year TFA teachers. What you can see is that the first-year TFA teachers are doing about as well as these really experienced comparison teachers, and the second-year TFA teachers are really outperforming the experienced comparison teachers. So it's the second-year teachers who are driving this positive impact, but you might even find it impressive that these brand new TFA teachers in their first year are still doing just as well as these teachers who have been teaching more than five years.
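To spell out the effect-size arithmetic from a moment ago: the reported 26 percent figure implies a typical year of secondary math growth of roughly 0.27 standard deviations (that denominator is my back-of-the-envelope inference from the talk's numbers, not a figure from the paper):

```latex
\frac{0.07\ \text{SD}}{\approx 0.27\ \text{SD per year}} \approx 0.26\ \text{years of instruction},
\qquad 0.26 \times 9\ \text{months} \approx 2.4\ \text{months}.
```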
We also looked separately at the middle and high school levels and found positive effects at both. Then we did a little analysis to address attrition. As I told you earlier, there wasn't a whole lot of differential attrition; we had outcome data for 79.5 percent of the treatment group sample and 78.5 percent of the control group sample, so just this one percentage point difference in missing outcome data. But still, you might worry that could bias our impact estimates. To put a bound on how much it could, we used an approach proposed by David Lee. We're really just concerned about this group in the middle, the group of students who we think have missing data only if they're in the control group, and for whom we would have data if they were in the treatment group. Our concern is that maybe we're just missing data in the control group for the really lowest achieving students, and that's going to bias our results, or maybe we're missing data for the highest achieving students in the control group, students whose data we would have had in the treatment group. I guess I have to be a little bit of a devil's advocate about the test that's being used, the test that's given to the students. TFA teachers are closer to this whole experience of testing; other teachers might be unfamiliar with this kind of test, and the TFA teachers might be preparing the students in another way, or giving them ways of thinking about it before they actually even come to the test, because your attitude about how you're going to test affects a lot. Yeah, so it's possible they are increasing test scores without having any real effect on what students have learned. One thing that gives me some comfort is that we saw these same effects at both the middle and the high school level. At the middle school level, we're using state tests, which are high stakes, and it's quite possible that the teachers are really focused on getting high scores on the state test. But for the high school students, there are these computer adaptive tests we're administering that the teachers have never seen before; they don't look like the state test, they're on a computer. So the fact that we found positive impacts at the high school level as well as the middle school level makes me think it's less just a story of the teachers being really good at teaching to the test. But, you know, it's possible. I'd like to speak on it as a TFA alum. In my first year of TFA, we had on-the-job training, so we had accountability outside of our job being at risk. So I think it should at least be thought about how that could impact these results; other teachers didn't have that sort of on-the-job training sitting outside the typical environment of the job. And that pressure and that fear could impact how these... You mean the TFA teachers would have that pressure from TFA? Or that the other teachers don't have that accountability, that non-oppressive accountability. TFA, at least in my personal experience as an alum, encouraged more on-the-job training; we had to meet with our TFA cohorts at least every month in a way to develop us more as teachers, and also to develop us as people who teach the kids, not just for a test.
And I'd say that other teachers didn't necessarily have that same sort of experience or that same framework, and maybe that accounts for some of the difference. Yeah, I mean, I think that's definitely probably part of it. We see that as part of the TFA treatment: that's part of the package, so when you hire a TFA teacher, you're getting all the support that comes along with it, and the accountability. That said, I should mention we didn't share individual teachers' test results with TFA, so there wouldn't have been pressure on the teachers to score really well on these particular tests that we were administering. But yeah, I think TFA would probably claim that their training and support is important too. Yeah. You had Z scores, I think, for all the students, statewide? But the high school students didn't take the same test statewide. Yeah, I should have put that on a slide. Because students were taking tests in different states, and at the high school level they were taking the NWEA test, we had to standardize the scores in some way. So we converted them to Z scores: for the state tests, we divided by the standard deviation in that state, and for the NWEA we used the national mean and standard deviation for the NWEA. So it's a slightly different standardization, but it still puts everything on a common scale; sort of the best we could do. So basically, with this attrition bias model, as I was saying, we're worried about this middle bar of students who are missing data in the control group and not the treatment group. We basically run two models: one which assumes that it's the highest achieving students for whom we're missing data, and we drop them and re-estimate the impacts; and one which assumes that it's the lowest achieving students who are missing data in the control group, and we drop their scores and re-estimate the impacts. And so we get these bounds on the TFA teacher impact, ranging from 0.05 to 0.11 standard deviations. So still a positive effect of meaningful magnitude, regardless of which way the attrition might have operated. So let me show you the non-experimental results, trying to account for why we saw these positive impacts among the TFA teachers. We thought this was a potentially useful analysis because principals, when they're hiring a teacher, are basically just able to see the types of characteristics that teachers might list on a resume. So are any of these factors useful in predicting how effective a teacher will be in the classroom? We looked at things like academic ability, math content knowledge, instructional training, and on-the-job experience. The analytic approach was pretty straightforward: we took our original impact model and we just tossed in a bunch of teacher characteristics we thought might explain some of the impact. So the first question is, was a particular teacher characteristic associated with the teacher's effectiveness in the classroom? And then, do TFA teachers show more of that particular characteristic than comparison teachers? Because if a particular characteristic is associated with a teacher's effectiveness but the TFA teachers have less of it, then obviously that characteristic does not explain the fact that the TFA teachers had positive impacts.
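In equation form, that augmented model is something like the following (again, the notation is mine): keep the block fixed effects and baseline covariates from the experimental model, and add a vector of teacher characteristics.

```latex
Y_{ib} = \alpha_b + \beta \, TFA_{ib} + W_{t(i,b)}'\delta + X_{ib}'\gamma + \varepsilon_{ib}
```

Here \(W_{t(i,b)}\) collects the characteristics of the teacher that student \(i\) was assigned to (selective college, Praxis score, hours of coursework, years of experience, and so on). A characteristic can help explain the TFA impact only if its element of \(\delta\) shows it is related to effectiveness and TFA teachers have more of it.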
So we found that most of the characteristics we looked at were not significantly related to teachers' effectiveness. We looked at whether a teacher graduated from a selective college, the number of college-level math courses the teacher took, and whether the teacher had used college-level math in a non-teaching job; that last one was more important among the Teaching Fellows, because they are going after these mid-career changers. Is this just within the TFA sample, or between the TFA and the traditional teachers? No, this is all the teachers in our sample, so TFA and traditional and alt cert teachers. And since we have block fixed effects in the model, we're looking within each classroom match at how the difference in the teachers' characteristics is related to the difference in the teachers' effectiveness. Did you survey them about their courses in college? Yeah, it took us a while to figure out how to ask that question, but we had a list of all the possible math courses they might have taken in college, calculus one and so on, and asked how many they took. I have a related question: how precise are some of your estimates here? I'm a little worried; I can imagine that among the non-TFA sample, in many blocks, or many schools, there might be no comparison teacher who went to a selective institution, and then you're estimating some of these effects off a very small sample. So how much should we conclude from the fact that this doesn't explain it in your sample? Has other literature actually found an effect of college selectivity for secondary school teachers? Well, it's interesting you say that; our read of the literature was that typically it hasn't shown college selectivity to be related to teacher effectiveness. For secondary school? Yeah, it's been a while since I looked at it, but that was our takeaway, so we weren't surprised to see that it wasn't related here. But you're right that presumably a lot of the TFA teachers have a one for selectivity and a lot of the non-TFA teachers have a zero, so we don't have a lot of variation. There is some variation within TFA teachers and within non-TFA teachers, but you're right, that's a potential limitation of this analysis. We also looked at their Praxis scores, the number of hours of math pedagogy instruction they'd had, and the number of days of student teaching they had completed, and none of these things was significantly related to their effectiveness. Did they go in the right direction? I'm trying to remember what the exact estimates are. Not necessarily; they were just small and not significant. Actually, the math content knowledge scores, at least for the high school math content test, not the middle school math test, were on the margin of significance and in the right direction, meaning those with higher scores were more effective. So I think for communicating this and understanding it, we need to know how much variation there is in the effect size, and a prior on how much variation there is. The average is 0.07, but I don't know what's going to be big and what's going to be small.
If everyone's tightly clustered around 0.07, there's nothing to explain. Right. In fact, Jeff was on our technical working group for this study, and he was the one who recommended that we include this nice graph showing the distribution of impacts, which we did. And we did only do this analysis because we had a distribution of impacts. But you're right, it would be nice to have that in the presentation too. So of all the things we looked at, only two characteristics were significantly related to teachers' effectiveness. One was the amount of coursework a teacher took during the school year, and that was negatively related to a teacher's effectiveness, which you may or may not find surprising. I think what's going on here is probably that for teachers who are required to take this coursework, it's competing for time and energy that they would otherwise be spending on their classes, and that's lowering their effectiveness. And in fact, there's a very similar finding elsewhere: as I mentioned, Mathematica conducted another study for the Department of Ed looking at teachers from less selective alt cert programs, and that study found the same pattern, that teachers who are taking a lot of coursework during the school year are generally less effective than teachers who aren't. So could this be driven by... basically, the TFA people are all taking the same amount? Well, it varies by state, but yes, and also some of them take more in their first year than in their second year. On the traditional side, I assume the main people taking lots of coursework are going to be the least experienced people, so you've got two things going on, their age as well as the amount of coursework. Yeah. So we're controlling for all these things in the model, but we're controlling for them simultaneously, and that's not to say they're not correlated in a way that's messing up these estimates. At any rate, I would say these results certainly aren't as rigorous as the experimental results, but I think they're consistent. The second thing we found comes back to this question of teaching experience. We found that teachers with two years of experience were more effective than teachers with one year of experience, and after that there wasn't too much of a gain to additional years of experience, which is consistent with much of the other literature on teacher experience: sort of this big gain right at the beginning. So the bottom line is that none of the credentials we looked at can explain the TFA impact. We found that teaching experience increases a teacher's effectiveness, but TFA teachers were less likely to have two or more years of experience, so that can't explain the fact that the Teach for America teachers were more effective. Similarly, we found that coursework negatively affects a teacher's performance, but the TFA teachers were taking more coursework, so again, this would predict the opposite. Bottom line, we really can't explain in our analysis why the TFA teachers were more effective. So in conclusion, we did find that Teach for America teachers were more effective than other math teachers in the same schools, and their relative inexperience was apparently outweighed by other attributes that we were not able to measure.
And I think we can say that these attributes that make the TFA teachers more effective aren't things that are easily observable on a resume, like years of experience or Praxis scores. So clearly more research is needed to identify these attributes. You could hypothesize that TFA is doing something right, and maybe it's part of this intensive screening process, where they have this full-day interview and they're gathering all this information and rating candidates on these different competencies, really getting at something that predicts teacher effectiveness in a way that just looking at a teacher's resume or a quick 30-minute interview can't. Or it could be the training and support TFA is providing, or some combination of these factors. So I think an important question, given that we've shown TFA teachers are more effective in secondary math, is: can this expand? Can TFA maintain these positive effects if the program is scaled up to reach more high poverty schools? In part, that's the goal of this second study I mentioned. Teach for America got this $50 million grant from the Department of Education to scale up its program, so we can learn whether Teach for America, and other programs like KIPP that have shown evidence of effectiveness, can maintain their effectiveness as they scale up. As part of that grant, Teach for America has hired Mathematica to do an independent evaluation looking at their effects in elementary schools as they scale up. So that's a study we're working on now, and we should have findings from it in the next year. So that's it. Thank you. On the attributes, in terms of, say, the Praxis or selectivity of colleges: I guess you did selectivity as binary; did you also do that with the Praxis? If you think about selective colleges, that might include, like, 200 schools, so there's a range in there, and in terms of Praxis scores, I know there are distinction marks where, if you get one, you really know your math, whereas you're just saying pass or didn't pass. I was wondering how you tried to measure those attributes. I mean, we did do a variety of different specification tests. In terms of the Praxis, we might have looked at whether they scored above or below the median, versus just the continuous scores in the main analysis I showed you. What was the other thing you asked about? We tried some other specifications; it probably wasn't exhaustive. And we had a fairly crude measure of selective colleges: there were three different categories from the Barron's rankings of college selectivity, and we looked at highly selective and selective but didn't break it down any further than that. Yeah. When you broke down the first and second year teachers, was that across all teacher groups, or was that just the TFA teachers? And was the big effect in the second year? Yeah, so that's all TFA teachers in our sample compared with whatever teachers they were matched with. Do you have subgroups that would be a way of getting at [inaudible]? Yeah, so that group was too tiny; there really weren't, I don't know, maybe five, not enough teachers to do that analysis. Jeff?
Remind me how we should actually think about spillovers here. One could imagine an alternative design which randomized assignment at the school level instead, given the risk that the two teachers who are randomly assigned within the school are somehow having influence on each other: the non-TFA teacher is motivated to excel beyond their usual norm as a result of the competition with the TFA teacher, or there are knowledge spillovers from the TFA teacher fresh from undergrad to the more experienced but further-from-undergrad non-TFA teacher, or whatever. Is there any kind of descriptive analysis that went along with that? Well, what descriptive analysis could one do, do you think? Well, as part of the teacher interviews, and I guess it's a bit late to suggest this, in the teacher survey you could ask how much do you interact with your colleagues, and specifically how much do you interact with that matched colleague. I mean, you know, our social science evaluations are never double-blind; it's not like clinical trials. Uh-huh. Yeah. In this second study, the scale-up study I mentioned, we did ask some questions about how much time the teachers spend on various activities, including giving advice to other teachers and receiving advice from other teachers, but we didn't collect that information here. So, yeah, I think that's a great question, and it's important to keep in mind. Also, this is just looking at secondary math; I don't think we can say anything about other subjects here. The previous literature has shown strong effects in math and science and no effects of Teach for America teachers in reading. So we really don't know, but it could well be that people who are well suited to teach math are maybe less well suited to teach kids to read. That's a good point. I think my understanding is that Teach for America tries to keep that relatively standardized, although maybe... were you a TFA teacher? Yeah. So, okay, so even within regions you think it varies. Yeah, so I don't know. I think we have some information on that; we did not analyze it, but I can imagine. It certainly could be. I think people have made the point that it is an interesting thing to look at, so possibly we could go back and look at it in this study too. I'm not sure about the level of detail we have on that support, but we have something, so we could look at that. Yeah. I'm not sure if this would be reflected in the actual coursework that the teachers had taken, and whether that was through a math department or through a school of education. I think that's a really good point, because obviously a college-level calculus class in the math department might look different than one in an ed school, and, no, we didn't capture that; we asked with some broad categories, secondary math, you know. That would probably have been useful to look at; we didn't, but I think we could, and I think that would be a useful thing to do. Yeah. Do you have any data on how specialized TFA teachers might be relative to non-TFA teachers? I mean, I know just from general high school courses, usually teachers have more than one subject, so I think it's possible TFA teachers were attracted to, say, Algebra versus Algebra and Geometry. We did not. Well, let's see.
We could certainly look within our sample at whether a given teacher had more or less variation in the subjects they were teaching. We haven't looked at that yet; I think that's a good idea. And in this other scale-up study, TFA has given us data on their entire teaching corps, including their assignments. So we could certainly look at it there too, and even use it to inform this study, because we have information on secondary teachers in that data. That's a good point.

Part of the counterfactual here is some traditionally trained teacher who's going to stay. Do you have a sense of the comparative attrition between these two groups?

From our sample we don't know that, other than average experience levels, which lead us, I think, reasonably to conclude that attrition is higher among the TFA teachers, but not much beyond that. Somebody might know standard turnover rates among non-TFA teachers in high-poverty schools, but I don't know a good source for that information.

There's a study of high-poverty teachers that suggested, and I'm going to get the number wrong, that some sizable portion remain into a third year, and after that they tend to go off into other things. Jeff?

I don't have a number, but I think there is a fair amount of attrition from high-poverty schools. It's also important to think about the earlier question about the experience of the comparison teachers. The set of inexperienced non-TFA teachers in high-poverty schools is less selected than the set of experienced non-TFA teachers in high-poverty schools, because some of the non-TFA teachers move up to lower-poverty schools when they can. The ones who stay are the ones who either can't get another job or who... so there are all kinds of selection. But to some extent, if I'm a principal, that's the comparison I care about.

No, I agree with that. It's the comparison you care about, though when you break it down into the different categories of what explains the effect, that's where you can get misled. Experience in those high-poverty schools reflects two very different things: selection among the teachers versus how long they've been around. Right, that's a good point; you can't extrapolate to other settings.

Absolutely. It seems to me, do you have any way to get purchase on how long a teacher has been in that school? Because you have that lemon-dance effect, right, where the lower-quality teachers get passed around.

Presumably the lemons would be here; Teach for America is targeting the schools that really struggle to get teachers. But we did collect that information, so while we didn't look at it in this non-experimental analysis, we could.

Is there a cost-benefit analysis? I think even a rough calculation would be very interesting.

We thought a little bit about doing one, but it would require a lot of additional data collection. We actually wrote up a little memo for the Department of Ed suggesting they might want to fund such a thing in the future.

I mean, do you know the salary scales for both the TFA and non-TFA teachers?
Well, they're paid on the same scale, right. So the cost-benefit analysis would take into account the fact that TFA teachers are cheaper for districts: they earn the same salaries as other teachers with the same years of experience, and since TFA teachers are novices, if you hire one you only have to pay the bottom of the scale.

And I think you'd fold in the attrition: imagine a school that just hires two-year TFA people, who leave after two years, and then you hire new TFA teachers at the same low salary.

Right, so the TFA teachers are cheaper to the district in that sense, because they're not moving up the scale. You're almost never going to get somebody with more experience; maybe 10% of them ever get more than three or five years of experience. That's right. And you'd weight that against how long these teachers are expected to stay versus the average teacher salary; don't you get that from this? It's .07 per year either way. So if you hire a string of TFA teachers, you're going to be paying a lot less than for the average teacher who stays.

But that said, that's the cost from the district's perspective. From society's perspective, Teach for America costs a lot. I think I saw that if you divide their annual operating budget by the number of new corps members they place, it's something like $30,000 a teacher. So maybe that's not the right way to think about it, I don't know.

That's a very important number, I think.

It is. A real cost-benefit analysis would be more nuanced than that simple calculation, but it would definitely be cheaper to the district and maybe more costly to society. Over there, yeah.

I'm thinking of the distributions of these two populations. In one sense, .07, when you express it in terms of student outcomes, you said Teach for America gives something like an extra quarter of a school year of learning, and that's really powerful. But it matters what we compare it to, because when you compare the populations of TFA and non-TFA teachers, it's only a percentile or two of difference, so by some measures you could say they're the same. I was wondering if you had any way to compare this to other ways of screening applicants, not necessarily in education. Because you want to say it's effective, but then why isn't it more effective, if they can truly figure out who will be good? Do you see what I'm asking?

Yeah. I think the challenge, and if we did a cost-benefit analysis this would also be the issue, is that we'd need to find some other intervention for which we know both the cost and the effect size it brought about. There's a fair amount at the elementary school level, but at the secondary level there's much less.

And this is not that small. I mean, this is per year. The STAR experiment was about two tenths of a standard deviation for being in a small class for three or four years. The charter school effects that we're seeing in, say, Boston are similar to this on a per-year basis. So if a kid spends four years or seven years with TFA teachers, is it additive? We don't know from this, but if it is, seven years at .07 is about half a standard deviation. So this is pretty big compared to some other interventions.

It is.
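To put numbers on the district-versus-society cost exchange above, here is a minimal illustrative sketch. Every dollar figure except the roughly $30,000-per-corps-member estimate cited in the talk is a hypothetical placeholder; the only structural facts taken from the discussion are that TFA teachers are paid at the bottom of the experience schedule and replaced every two years.

```python
# Back-of-the-envelope cost comparison. All salary-schedule numbers are
# hypothetical placeholders, not figures from the study.

def salary(years_experience: int) -> float:
    """Hypothetical step-based salary schedule."""
    base = 40_000   # assumed starting salary
    step = 1_500    # assumed annual step increase
    return base + step * min(years_experience, 20)  # schedule tops out at step 20

HORIZON = 10  # years over which the district compares costs

# District cost of a string of two-year TFA teachers: each is paid at
# steps 0 and 1, then replaced by a new novice.
tfa_cost = sum(salary(year % 2) for year in range(HORIZON))

# District cost of one traditionally trained teacher who stays and moves up.
stayer_cost = sum(salary(year) for year in range(HORIZON))

# Society-level cost adds TFA's ~$30,000 per new corps member (the figure
# cited in the talk), incurred once per two-year placement.
placements = HORIZON // 2
society_cost = tfa_cost + 30_000 * placements

print(f"District pays TFA string: ${tfa_cost:,.0f}")    # $407,500
print(f"District pays stayer:     ${stayer_cost:,.0f}")  # $467,500
print(f"Society-level TFA cost:   ${society_cost:,.0f}")  # $557,500
```

Under these placeholder numbers the district saves about $60,000 over a decade by hiring the TFA string, while society pays about $90,000 more once TFA's own costs are counted, which is the qualitative conclusion reached in the discussion.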
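And to make the effect-size discussion concrete, here is a tiny sketch of the arithmetic. The 0.25 SD annual-gain figure used for the months-of-learning conversion is an assumption supplied for illustration, not a number from the talk; the actual conversion depends on grade level and test.

```python
# Putting the .07 SD per-year effect in context.

EFFECT_PER_YEAR = 0.07   # estimated TFA effect, in student SDs per year
ANNUAL_GAIN = 0.25       # assumed typical annual secondary math gain (illustrative)
SCHOOL_MONTHS = 9        # months in a school year

extra_months = EFFECT_PER_YEAR / ANNUAL_GAIN * SCHOOL_MONTHS
print(f"Extra learning: about {extra_months:.1f} months per year")  # ~2.5 months

# If the effect were additive across years (the talk flags this as unknown):
for years in (4, 7):
    print(f"{years} years of exposure: {years * EFFECT_PER_YEAR:.2f} SD")
# 7 * 0.07 = 0.49 SD, the "half a standard deviation" figure mentioned,
# versus roughly 0.2 SD for three to four years of STAR small classes.
```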
I mean, the effect size per year compares to things that we tend to think are meaningful; there's just not as much evidence at the secondary level.

I'll add one thing. There are a lot of people who don't like TFA, and we did get some criticisms of the study from those people. One anti-TFA blogger claimed that the impact was extremely small and that we were blowing this .07 effect size out of proportion, and he had this wonderful graphic, let me find it, that put the effect size on a scale of 0 to 5 standard deviations. Relative to that, of course, everything looks small. So it's all relative.

What blog is this?

I'll tell you afterwards; I don't want to publicize it. We'll send it around on the email list. But yes, we thought the effect was meaningful. In fact, when we first did the feasibility study for the Department of Ed, all their big random-assignment studies typically targeted a 0.15 effect size, meaning you're supposed to power the sample to detect an effect of 0.15. We argued strenuously that if they're looking at the secondary level, they need to target a smaller effect size, because it would be unrealistic to expect anything to have an effect of 0.15 at the secondary level. To their credit, they were very receptive to that argument and allowed us to recruit a sample that was powered to detect a smaller impact of 0.10.

Well, thank you so much, and thanks for the great comments. Thank you.
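As a closing note on the power argument above, here is a minimal sketch of the sample-size arithmetic behind targeting a 0.15 versus a 0.10 minimum detectable effect. This treats the design as a simple individually randomized two-arm comparison at 80% power and a 5% significance level; the actual study randomized students to classrooms within matched pairs and adjusted for covariates, so its real power calculation would differ.

```python
# Sample size needed to detect a given minimum detectable effect (MDE),
# assuming a simple two-arm t-test with equal group sizes.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for mde in (0.15, 0.10):
    n_per_arm = analysis.solve_power(effect_size=mde, alpha=0.05, power=0.80)
    print(f"MDE {mde:.2f}: about {n_per_arm:,.0f} students per arm")
# MDE 0.15: ~700 per arm; MDE 0.10: ~1,570 per arm.
# Required n scales with 1/MDE^2, so shrinking the target from 0.15 to 0.10
# multiplies the sample by (0.15/0.10)^2 = 2.25, which is why targeting the
# smaller, more realistic secondary-level effect meant recruiting a much
# larger sample.
```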