So thank you all for joining us today. I'm Sue Dynarski, and along with my colleague, Brian Jacob, at the Education Policy Initiative at the Ford School, we're happy to co-host this event with the School of Education. We thank the Gessners, Charles H. and Susan Gessner, for their generous support of this event, and the staff who did all the work to make this happen so we can enjoy it. So this is a very exciting time to be talking about early education. The White House has been advocating for preschool expansion ever since the State of the Union in 2013, when the president announced his Universal Preschool Initiative. And since then, there's been a burst of energy around early learning at both the state and the federal level. Even here at the University of Michigan, we've been paying attention to this issue quite a lot lately. Just a few weeks ago, we had Greg Duncan speaking at the Institute for Social Research. And our colleagues at the Ford School hosted a discussion of child care policy in Ontario and Michigan earlier this week. So with so much interest in these topics, it's a good time to get the design of these systems right. And that's what these distinguished researchers are going to be talking about. So let me tell you right now who they are. So to my right here and starting us off is going to be Harvard Bloom. Howard Bloom, who used to be at Harvard. Yeah, he was. At the Kennedy School. Howard Bloom is chief social scientist at MDRC, a research institute based in New York. And Howard leads the development of experimental methods for evaluating program impacts, including a current reanalysis of the National Head Start Impact Study. To his right, following on, will be Christina Weiland. She's an assistant professor right here at the School of Ed. And she focuses on the effects of early childhood interventions and public policies on children's development, especially kids from low-income families.
Next, we've got Daphna Bassok, who's visiting from warmer climes, Virginia. She's an assistant professor in education policy in the Curry School of Education. And she was an undergraduate here at the University of Michigan. And she is less loyal to Michigan than she is to Zingerman's, where she worked all the way through her undergrad career. She told me her research includes work on the effects of pre-kindergarten on educational outcomes, the early childhood teacher labor force, and trends in kindergarten becoming more academically demanding. And finally, we've got Hirokazu Yoshikawa, the Courtney Sale Ross Professor of Globalization and Education at NYU Steinhardt. And he's a community and developmental psychologist who studies the effects of public policies and programs on children's development. So let me tell you about the format. So the speakers will speak. There's going to be a Q&A moderated by Brian Jacob. You've got note cards to write questions down on. And we will collect those periodically. And Brian will read from them and ask the questions. And if you're the type of person who wants to go on to Twitter, you can also put your question on to Twitter using the hashtag epi-early-ed. You can also snark there or say nice things on Twitter. And so now I'd like to present our first speaker, Howard, who'll get us started. Thank you.
Thank you, Susan. And thank you all for coming today. And thank you, Chris, for inviting me to come. All right. I'm definitely not an early child education expert. But I'm going to talk about some work that Chris and I have been doing over the last four or five years, which is a reanalysis of something called the National Head Start Impact Study. It's a very important study. It's got a lot of information about Head Start and its alternatives. And in many ways, it's a really good case study in terms of how to study variation in impact. Let me just get back to the beginning of this thing. There.
And that's what I'm going to talk about. I mean, what I do know something about is how to study impact variation. I've been working with a number of colleagues over the last years, trying to develop methods and ways of thinking about variation in impacts of programs or interventions across individuals or subgroups of individuals and across sites. And so the work that we've been doing on reanalyzing the National Head Start Impact Study is an application of those kinds of thoughts and that kind of thinking to substance that's highly relevant for today's discussion. The Head Start Impact Study was a congressionally mandated evaluation of the national Head Start program. It is the first large-scale randomized study of Head Start and quite frankly is the only large-scale randomized study of Head Start. It was conducted in the program year 2002 to 2003. So you get a sense of when the program I'm going to be talking about was being fielded and what it was that's being evaluated. It was conducted, and this is very, very unusual, not unique, but almost unique in evaluation research, in a nationally representative sample of sites, sites being Head Start centers. I'm assuming everybody knows about Head Start centers, but it's a nationally representative sample of oversubscribed Head Start centers, which was pretty much most of the Head Start centers but not all of them, oversubscribed in the sense that they had more applicants than they had seats. And so it was possible, both logistically and on sort of political and ethical levels, to justify a randomized experiment. I'm assuming most people here know what a randomized experiment is: you're randomizing applicants into the treatment, which in this case would be an offer to attend a Head Start center, or control status, which in that case would be not getting the Head Start offer.
That creates an ability to compare a treatment group of kids who in a sense won that lottery, if you will, to a control group of kids who lost that lottery, who in all respects on average overall should be the same in both measured and unmeasured ways, which gives you the kind of rigor that you don't get in evaluation designs that aren't randomized. So that's why the randomized part of it's important. It's an unusually rigorous way to say whatever difference we see between a treatment and control group later on in terms of their outcomes is arguably caused by the offer of Head Start versus not getting the offer of Head Start. So that's a really important starting point for all of these things. The study produced a sample of roughly 4,400 kids that were randomized in about 350 Head Start centers. The analysis we're gonna be talking about is a subset of that. It's more like 3,500 kids in about 300 centers, but it's a lot of kids in a lot of centers that we're talking about, and a nationally representative sample. And this data is not quite publicly available. It's what's called a restricted-use file. So under certain guidelines, you can use this data, and it's been used by a number of researchers. So I'm not gonna talk just about what Chris and I have done and what we've found, but I'm gonna try to look across what others who have been using this data set seem to have concluded and a little bit about why. Mostly about our stuff, since we did it, but also what others are finding and what that means and why they think that what they've found is to be believed. Okay, so I don't know how to separate these pages, so we'll look at them at the same time. So the goal of my presentation, as I just said, is to briefly synthesize and summarize what's been learned about impact variation from the Head Start Impact Study. Okay, my conclusions. I'm gonna jump from the conclusions to the findings.
So I have conclusions about impact variation across individual kids. I have conclusions about impact variation across sites, and then a couple of conclusions about the role of early child education quality, because a lot of people have been trying to deal with the issue of what's the relationship between quality and outcomes and impacts and stuff. So I wanna talk a little bit about each of those things. All right, the first conclusion then is the first sub-bullet, if you will, that thing right there. Impacts tend to compensate for limited prior English. This is a finding that Chris and I came upon, and I wanna show you the results that we base that conclusion on. And what I mean by compensation is that kids who originally, before the experiment started, did less well than others on a pretest of, as it turns out, receptive vocabulary had bigger impacts than kids who did better than them on the pretest, but only among dual-language learners. And I wanna make this point: this is a finding that is sort of unique to our work. Some people disagree with it. Some people simply aren't speaking to that point. But let me show you the findings. The findings are right next to it. There are two sets of findings. The top panel is one set of findings, and the bottom panel repeats those findings for another cut through the data. And let me describe what those findings mean. The two columns are for different outcome measures. They're different tests of kids' ability to do cognitive things. So one is something called the PPVT, which is the Peabody Picture Vocabulary Test. That was a test for me. The effects there are for receptive vocabulary, okay? Can people understand what they're hearing? Okay, the other set of effects is for early numeracy. That's the Woodcock-Johnson Applied Problems Test. It's for early numeracy. Can they deal with early numerical kinds of things? Okay, we looked at other outcomes.
But on these two outcomes, we see this very, very pronounced pattern of effects. What is this pronounced pattern of effects? Okay, we break the sample into dual-language learners and everybody else. At baseline, a number of characteristics of the kids were enumerated; data were taken on them. And so some kids were designated as dual-language learners or not, okay? And so dual-language learners is one subset of the sample. And English-only is the rest of the kids in the sample. So it's a binary thing. It's one group or the other. And inside of the dual-language learners, we split that sub-sample into two sub-sub-samples, if you will. One is those dual-language learners that were what we call low-pretest performers. And we define that in a particular way, in this panel as scoring at or below the 33rd percentile of all the kids in the study. Okay? So these are kids who were scored on a pretest pretty much before Head Start started, okay? Those who were in the lower third of the pretest versus those who weren't in the lower third, everybody else. So that's the split inside of dual-language learners: low-pretest performers, other sample members. Likewise, the same split with the same criteria for English-only sample members. And then we estimated the average impacts for each of those four subgroups. Are you with me? Okay, and this is the pattern that we see. For the dual-language learners, the low-pretest performers have an average impact, measured in what's called the standardized mean difference effect size, of 0.36. Most people who do education research know what that metric means. And those of you who consume education research have read findings that are put out in that metric. But it's a particular metric that is meaningful to researchers and not as meaningful to other people. But an effect size of 0.36, that's a very, very large effect.
I mean, people differ, but I would say anything above 0.15 is something to be relatively pleased about, okay? And if you're talking about more than double that, that's a very large average effect, okay? That's on the PPVT, for the dual-language learners who are low-pretest performers. For the other sample members, who are not low-pretest performers, it's a 0.09. That's not nothing, but it's not anywhere near 0.36. And the fact that those two are bolded means that those two findings are statistically significantly different from each other, okay? So there's a big difference in the actual values of those estimates, and the difference is statistically significant. They're more different than could have happened by chance. Over here, for the applied math, you see a similar pattern, okay? A big positive effect for the low-pretest performers amongst dual-language learners, and for the others a negative but not statistically significant effect, so you can't actually say that's different from zero with any confidence. But the difference between those two is absolutely statistically significant and quite large, of course. All right, now, that's a compensatory pattern. Compensatory, the word that people are using all the time, means you're helping those who started off towards the bottom more than you're helping the kids who weren't at the bottom. You see it amongst the dual-language learners. And everybody sees that finding amongst the dual-language learners. You do not see it amongst the rest of the sample. You do not see that pattern. So among the English-only sample members, these are the comparable findings, and they're all small to modest at best, and there's no clear pattern at all between the low-pretest performers and the rest of the sample for the English-only.
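To make the four-way split just described concrete, here is a minimal sketch in Python. It assumes a hypothetical list of child records with made-up field names (`dll`, `pretest`, `treated`, `outcome`), a nearest-rank percentile cutoff, and standardization by the full control group's outcome standard deviation. None of those choices come from the talk, and the study's actual estimation is more involved than a difference in subgroup means.

```python
import math
from statistics import mean, stdev


def percentile_cutoff(values, pct):
    """Nearest-rank cutoff: the score at or below which about pct% of values fall."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def subgroup_effect_sizes(children, pct=33):
    """Standardized mean differences for the four (dll, low_pretest) cells.

    `children` is a hypothetical list of dicts with keys
    'dll' (bool), 'pretest', 'treated' (bool) and 'outcome'.
    Dividing by the control-group SD is an assumption for illustration.
    """
    cutoff = percentile_cutoff([c["pretest"] for c in children], pct)
    control_sd = stdev([c["outcome"] for c in children if not c["treated"]])
    results = {}
    for dll in (True, False):
        for low in (True, False):
            cell = [c for c in children
                    if c["dll"] == dll and (c["pretest"] <= cutoff) == low]
            treated = [c["outcome"] for c in cell if c["treated"]]
            control = [c["outcome"] for c in cell if not c["treated"]]
            if treated and control:  # skip cells missing either arm
                results[(dll, low)] = (mean(treated) - mean(control)) / control_sd
    return results
```

A compensatory pattern of the kind described would show up as a noticeably larger value for the `(True, True)` cell (dual-language learners with low pretest scores) than for `(True, False)`, with no comparable gap between the two English-only cells. Passing `pct=20` reruns the same split with a stricter definition of a low pretest.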
We think that that is basically evidence that Head Start is compensating for lack of prior access to the English language, because we're talking about receptive vocabulary here as one of the outcomes, and then math, where that test was given in English, so you gotta understand English to do the math test. There are other people, there's a group out of UC Irvine, Marianne Bitler and her colleagues, who are talking about a compensatory finding, but they're not talking about the English language. They're not specifying what it's compensating for; it's just compensating, because they look at this pattern a different way and they don't think they're finding this effect. I don't think, however, you can explain this effect away. Now, one other thing we did, since we wondered whether this pattern of findings is sensitive to where you make the cutoff for a low pretest. Okay, a third is low, but, you know, you could go lower. So we did the same analysis for a different pretest threshold, which is the lowest 20% of the pretest scores. Are you with me? So it's the same analysis, only calling low-pretest performers those who were in the lowest 20% of the pretest scores, and you see the exact same pattern, only a bit more extreme. So we think that's evidence of compensation for the English language. Now, one of the things we did, and we're not quite sure what to make of it, and we've been changing the way we analyze the data, and I don't have a slide to go with this, is we followed them up. There are later tests. These outcomes were at the end of the Head Start year. The kids were randomized before Head Start, there was a Head Start year from 2002 to 2003, and the post-test was given in the spring of 2003. These findings are based on the end of the Head Start year.
There were later waves of follow-up for the Head Start Impact Study where a lot of these findings faded away, and some of you who saw Greg Duncan talk here recently heard him talk about fade-out, and this is one of numerous examples of that. The question we have, on which we haven't finalized the analysis yet, is whether there is fade-out of this pattern here amongst the dual-language learners. For the dual-language learners, there's less fade-out than for the other kids. I mean, it's a bigger impact to start with, and there's a lot of fade-out, but there may be some residual effect or not. We haven't really decided what we think yet on that, but the policy implications of that are profound. If it fades out, what does that mean? If it doesn't fade out, what does that mean? But it's very clear to us that there's this English language compensation thing going on. That's one set of findings. The other thing I wanna talk about, and then I'd try to talk about other people, but I'm not gonna talk about other people given that I have just one minute, but that's okay. We can do that with questions and answers. The other thing I wanna talk about is cross-site impact variation, which is where we spent a lot of our time, both developing methodologies and applying them. The question here is how much do the impacts of Head Start, now on six different outcomes, four cognitive outcomes and two social-emotional outcomes, vary across sites? So what you're looking at is a slide where the first column is the cross-site average impact, again in this effect size metric. What you're looking at in the second column is the estimated cross-site standard deviation. So for example, this finding here, and then I'll stop because I know I have to stop. It says for receptive vocabulary on the PPVT, the overall average that we estimate, and this is the effect of attending Head Start, is a 0.17.
The estimated cross-site standard deviation is a 0.17 in the same metric. What that suggests, if the cross-site distribution of impacts of Head Start is anywhere near normally distributed, bell-shaped or anywhere near that, is that 95% of the sites are somewhere within 0.17 plus or minus two times 0.17. Which by my calculation, since I don't do arithmetic particularly well, is somewhere between a minus 0.17 and 0.51. A minus 0.17 and 0.51, okay? It's a big range. Most of it's positive, a little of it's negative, but it's a very, very big range. So then the question is why all that variation? And I'll stop in a second. The other aspect of this that's worth looking at, if we had more time or if somebody asks a question perhaps, is that you've got an early numeracy outcome with a decent-size average and a much smaller variation. Chris and others think that might be due to lack of variation in the way early math was taught in Head Start anywhere at that point in time, and I'll stop there, just sort of suggesting those things. Here we go.
Hi everybody. It's great to be here having this conversation and getting a chance to hear questions about the work. Thanks for the invitation. My birthday was recently, and so I feel like it was a big present to get to invite these wonderful colleagues here to have this conversation. So I volunteered to go after Howard Bloom, which is something I rarely do and try not to do because he's always a hard act to follow, because the content of the presentation that you saw about Head Start really sets the stage for what you can think of as the preschool 1.0 question. So if we increase access to preschool, who benefits, right? And so you saw some really interesting findings around dual-language learners who have low baseline test scores in particular benefiting, and about variation in impacts across centers as well.
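As a quick check on the arithmetic Howard quoted for receptive vocabulary (an average impact of 0.17 with a cross-site standard deviation of 0.17), the sketch below reproduces the plus-or-minus-two-standard-deviations range. It also adds one figure that was not quoted in the talk: the share of sites whose impact would fall below zero. Both calculations rest on the normality assumption he stated; the function names are hypothetical.

```python
from statistics import NormalDist


def approx_95_range(mean_impact, cross_site_sd, z=2.0):
    """Return (low, high) = mean +/- z * SD, assuming roughly normal site impacts."""
    half_width = z * cross_site_sd
    return mean_impact - half_width, mean_impact + half_width


def share_of_sites_below_zero(mean_impact, cross_site_sd):
    """Share of sites with a negative impact under the same normality assumption."""
    return NormalDist(mu=mean_impact, sigma=cross_site_sd).cdf(0.0)


low, high = approx_95_range(0.17, 0.17)
print(f"{low:.2f} to {high:.2f}")  # prints "-0.17 to 0.51"
print(f"{share_of_sites_below_zero(0.17, 0.17):.2f}")  # prints "0.16"
```

Under these assumptions roughly one site in six would have a negative impact, which is consistent with the remark that most of the range is positive and a little of it is negative.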
And I'm gonna talk today a little bit about the 2.0 question, which is moving a little bit beyond access to thinking about how you scale high quality in particular. So I'm gonna talk about what specific program elements work best for ensuring high quality and promoting initial and lasting gains for children. So to launch into that, let me define what we mean generally in early childhood when we talk about high quality. So we conceptualize it as being in two buckets. The first is structural features like class size, ratios, teacher education and training. These things are fairly easy to regulate and monitor from a policy perspective. We have done so, and what we've seen is that we have moved to a place where very few programs, public programs at least, do poorly on this domain of quality. They're sort of in a middling range. Harder to regulate are process features. So these are high-quality interactions within the classroom, particularly rich learning opportunities: are the children being challenged in a way that's developmentally appropriate and that's pushing them along a continuum of skill building that they are fully capable of doing at this age? This is harder to regulate. And what we know about the relationship between structural and process quality is that structural quality sets the stage for process quality to occur, but it alone isn't sufficient, right? So class size isn't enough. And nationally on process quality, from the data that we have, we do have some good news in terms of emotional support, which is a piece of process quality. We do a pretty good job in our public programs of making kids feel supported emotionally in the classroom. So these are data from five public systems around the country that are large scale: Boston Pre-K, Tulsa Pre-K, Tulsa Head Start, national Head Start, and then the 11-state Pre-K study. So that dotted line up there is about the good threshold on this measure, and pretty much all of these systems are clearing it.
So if you look at instructional support, however, we have a problem on our hands nationally. So that black line is the adequate line, and as you can see, folks are pretty much not clearing it or are well below it, with the exception of Boston, which I'll talk about in more detail, okay? So if we're thinking about what works in early childhood education, I think that's the problem we have to think about, because that's probably the one we're struggling with the most. And Hiro, who's on the panel today, and a group of experts reviewed the literature in 2013, as this proposal from Obama was coming out, to give some guidance to policymakers about what we know works in early education. And they identified the strongest hope model: across a set of about 12 to 15 RCTs that have occurred over the last decade, there's been a kind of pattern of emerging success, which is that you take a domain-specific curriculum that was developed by somebody who's an expert in a particular area of early childhood development. So Nell Duke here at the School of Ed has one of these curricula that works for literacy, for example. And if you know Nell, you know she knows that domain like the back of her hand. So it's not a surprise that that curriculum in particular works well. So if you pair that with regular in-classroom coaching by a supportive mentor, that seems to be a winning package. So we have this pattern in which that is the thing, if you're gonna tell people what to do, that's probably the thing you would tell them to do in the program if they're gonna have one. We also have some examples that are really important of combining multiple domain-specific curricula. So preschool teachers are asked to improve children's learning across a variety of things, not just, say, literacy and language, but also math and socio-emotional development. So we do have some important examples of folks putting different curricula together and having success with doing so.
And I'm gonna talk specifically about one of the places where we've seen success, and that's the Boston Pre-K program. And I, with a lot of folks in this room actually, have been investigating this model. Hiro, Howard, students, Shayna Rochester, who's in the back, Anna Shapiro, she's here, Sonia Zaidi, Becky Unterman, who's here from California, and Howard too. So Boston is an interesting model because it's not one that was tightly controlled by researchers or by these curriculum developers. It's a district that looked at the literature and said, what are we gonna do to improve our model, and sort of ran this improvement system themselves. So to take you through this: in 2005, when they began the program, they made really strong structural quality investments. So teachers were paid on the same scale as K-12 teachers, and they were subject to the same educational requirements as K-12 teachers, including a master's degree within five years. Those are fairly rare features within our systems, particularly in 2005. However, in 2006, when the quality of the program was measured by an outside group, there was a finding that the instructional quality was pretty low. It looked like other places that we have nationally now, and the headline on the front page of the Boston Globe is what you see on the slide up there: Boston preschools falling far short of goals, hobbled by mediocre instruction. Which is a pretty scary headline on the front page of your hometown paper if you're a new pre-K program. So they took those findings and moved forward with them. So they put into place proven language, literacy, and math curricula that they combined for teachers through teaching guides, and they developed a coaching system in which there was a coach coming in to watch the teacher's instruction on a weekly to bi-weekly basis across the district and to give them feedback that was supportive and not punitive.
And when the research teams that I've been a part of came in and we did work on this model, what we saw was that within two years of making the switch to the system, it had the highest instructional quality we've seen nationally in a large-scale system, and that it had impressive child impacts. So I'll show you some of these. Howard already explained to you what an effect size is, which was very nice. So these are the impacts on the domains that were directly targeted by this model. And so the impacts for vocabulary and math in particular are the largest we've seen in a public pre-K program that's large scale, and the early reading one as well, around alphabet knowledge and that kind of thing, was a large impact. We also saw spillover of the impacts onto other domains, particularly executive function skills. So this was not directly targeted by the model, but it's one kid, one brain, and these things are linked. And so we saw some evidence that there was spillover here. We also saw that the children who particularly benefited from the model were children who were low income and children who were Latino, but everybody benefited from the model. And we also found that two thirds of our control group weren't just home with their parents. They were actually in other preschool programs around the city. So this is a pretty strong counterfactual relative to some of the other preschool programs that we have, where maybe the options in the control group are not quite as robust. So as we look nationally, though, and think about what works, we don't see too many places making the decisions that Boston made. So nationally most programs aren't using domain-specific, evidence-based curricula. They're using whole-child curricula. And so I pulled out the most popular choice, which is Creative Curriculum. And this is its effectiveness rating in the What Works Clearinghouse.
So for these important skills, mathematics, oral language, phonological awareness, print knowledge, the effectiveness rating is zero. That's in contrast to Building Blocks, which is the math curriculum that's used in Boston and some other large-scale systems, where you see an improvement index of 36 percentile points and an effectiveness rating at the most positive end of that scale. For coaching, again, this isn't a practice that's particularly widespread. We don't have great data on this, but it's not something that is really commonly implemented within large-scale programs, although that is changing somewhat. And the history of why that is is not particularly definitive, but it's probably due to the fact that some of these curricula, the ones that we have a stronger evidence base for, are newer. It takes time for people to pick up on the latest thing. We also have some requirements in some systems that teachers and systems have to pick a curriculum that covers every child domain, which is gonna lead you to one of those whole-child curricula. And we also have some programs that require you to collect data on students who are in your program, and those tend to be tied often to whole-child curricula. So if you're a district and you have to buy something, maybe it's better to buy one thing than two things and try to integrate them yourself. And so we do have a lot of unanswered questions in thinking about best advice for where we should go forward and which model to use within the programs that we have. And some of those questions, I think I could have about five slides on what we don't know, but just to call out a couple of things: we don't understand entirely how folks are making these curriculum and PD decisions at the pre-K level.
We also have a new promising curriculum from Nell Duke and Doug Clements and others, in which domain-specific experts have come together to make one curriculum so that you're not tasked with cobbling together a bunch of different curricula, but it hasn't been tested yet. We also don't really know how best to sequence things from preschool to third grade in ways that recognize that preschool is part of a pipeline and just the beginning of the education system. But I think with all of those unknowns, and given where we are in the literature right now, one thing that I'm thinking a lot about is the potential to work within the Every Student Succeeds Act, which is our new federal education law, to potentially nudge localities to adopt evidence-based curricula and coaching. So within that we have for the first time a definition of what evidence-based means and some incentives that are intended to help nudge folks to using things with higher evidence bases. I think there's a lot of work still to be done to see what the policy will look like on the ground level, and some negotiations that are happening around the rules, but that is one thing where, as we go forward, we may see some movement towards the models that have more evidence. We also have a lot of folks working on active ingredients in preschool (for time, I won't go into that), and that work will help us in five years have better answers about this what-works question. And I think nationally we are seeing a shift that's hopeful around not just talking about whether we should have preschool or not but at the same time talking about what it should look like, which is a very important question. It's not enough to just have access; you have to have access and quality at the same time. So thank you for the questions.
Okay, so that was two total lead-ins into what I was gonna talk about, which will help.
I'm gonna talk about some recent policy efforts to create some of the quality changes that Chris was talking about. So as Chris kind of mentioned, we've moved from should we have access, should we provide preschool to more around the quality and that was kind of my starting point as well. So this first figure I'm showing you is the preschool enrollment of children who are three to five in the United States and you can see that it's been sort of the blue bar on the top is the national trends in school enrollment for young kids. So this is not kindergarten but any sort of preschool type experience and you can see it's been rising very rapidly since the 80s but something to note that purple line kind of went up until the mid 90s and then has been pretty flat. That's the private enrollment in preschool. So parents paying for their kids to go to childcare and that green line is kids enrolled in public preschool. So a really large public investment in providing preschool for kids and really going to some kind of non-parental care is very, very commonplace for kids in our country right now. So those first three bars on the left are children three to five and this is from 2012 and you can see that about 80% of kids are in non-parental care between the ages of three and five and 60% of them are in something like a childcare center. So these are three to five year old and there's another 12% who are being taken care of by a non-relative in some sort of home-based setting. And even if you look all the way in the right which is all children zero to five, there's lots of kids going to childcare on a regular basis. So even including infants, 60% of kids are in non-parental care and about 50% of them are in non-relative care. So basically going to preschool and public support for going to preschool is basically a thing now. It's happening and the support for it is high. 
Where we're moving now is more of a discussion around what that should look like and how we ensure that the quality is in place so that these preschool investments lead to the kind of positive benefits that people tout around early childhood. So we know from some of the studies that Chris discussed that high-quality early childhood experiences can be linked to a host of positive benefits, both short and very long term. But a lot of the programs that people are attending today are mediocre to not very good, particularly for kids in low-income communities. And as Howard pointed out, the variation is really tremendous. The variation that Howard and Chris have talked about so far has really been variation within the Head Start program or variation within the Pre-K program. But there's also tremendous variation, which I'll talk about, across the entire sector, which also includes licensed childcare centers that are generally of much lower quality than either of the large publicly funded programs. And especially because of some of the results of the Head Start Impact Study that have suggested fade out, and a recent study of a large-scale preschool program in Tennessee which suggested that by third grade the kids who went to preschool were not doing any better than the kids who didn't, there's been a lot of talk about, well, how do we take this idea of high-quality preschool to scale and how do we make it work, and the focus has really been on quality. So to give you a sense of the fragmentation, these are the rules for teacher education across state Pre-K and Head Start, which are the more highly regulated programs, relative to the programs with less regulation. So a family childcare home, this is a person who's taking care of children in their house. Only 18 states require that the person leading this have a high school diploma.
And even when you look at private childcare centers, 36 states require a high school diploma, and no states require an associate's degree or a bachelor's degree. So this is a very low level of education required compared to a Head Start program, where nearly 100% of the teachers have a degree and 73% have a college education, and in state Pre-K, 53% of the teachers have a requirement for a BA. And the regulations are pretty powerful. So in 1997, if you looked at Head Start, only a third of the teachers had a degree at that time. And through two reauthorizations of Head Start, there were pieces in the legislation that said, first, by 2003 we need 50% of Head Start teachers to have an associate's degree. Then by 2013, we need 50% to have a bachelor's degree. The regulation was in place, and you can see that today all the teachers in Head Start do have that education level, but it required a large investment. And despite that investment within one sector, there are still huge differences across the sectors in what kids experience. So this is just one example. This is teachers' years of education across sectors, and those purple bars on the left there are the teachers that are working with two-year-olds. And so you can see that the first one is formal care. So this is taking your child to some sort of center-based care, and there you have teachers who on average have one more year after high school, and in the home settings it's basically just a high school diploma. If you look at the formal sector, there's quite a bit more education when you're looking at the teachers of four-year-olds than two-year-olds. So the teachers of two-year-olds have about one year post-high school, whereas the teachers of four-year-olds have three years post-high school. And then finally, those blue bars on the end are the variation across the sectors within formal care. So these are the programs that four-year-olds attend.
There's private centers, Head Start and Pre-K, and you can see that the Pre-K teachers, who are oftentimes linked to the public schools and have the same kind of requirements as the K-12 system, have substantially more education. And this is a very similar picture for the question, do you have a degree in early childhood? And again, if you're looking at the experiences of two-year-olds in this country, even in the formal sector, only 20% of the teachers working with young kids have a degree; going to the four-year-olds, about 60% do. So that's a huge disparity in the educational level of a person working with toddlers versus four-year-olds, and again, you see that the private childcare centers have much lower levels of education relative to the Pre-K programs. Okay, so accountability has come up as a strategy to address some of these quality problems, and what do we mean by accountability? Basically, the idea is to create a set of quality standards that go across these sectors and say, here's what we mean when we talk about a high-quality program, and measure it, and provide both financial incentives and supports for programs to try to improve over time, and disseminate the information to parents and other stakeholders, so hopefully they can make a decision to select care that has a higher level of quality. The Race to the Top Early Learning Challenge expanded interest in these programs by requiring that in order to get the money, which was $1 billion that has been distributed for early childhood programs, you needed to design and implement a tiered quality rating system. Today 40 states have them, most of which started since 2011, so this is just a map of where these programs are located. You see 40 states have them and the states that don't are working on it right now, so this is becoming statewide.
And the idea is you measure quality, you provide these ratings, parents and providers respond, and over time we're gonna work towards improved outcomes, either because the centers will get better or because parents will opt for the higher-quality programs and the lower-quality ones will leave the market. Okay, so really quickly I just wanted to talk through four of what I think are the big central issues around whether these accountability systems are likely to have the desired impact in early childhood. One big, sort of philosophical question is what should we be incentivizing in these early childhood systems? So if you think about K-12 accountability, what we have been incentivizing is test score gains, and that is just not in the cards for an early childhood accountability system, both because we don't wanna be testing zero- to five-year-olds and because it's difficult and expensive and challenging to do that. But if not that, what should be the things that we are measuring? Should it be the structural kinds of quality measures that Chris mentioned? Should it be something about the quality of instruction? And the trade-off there is that the structural features are not very good predictors of kids' learning. So knowing things about how the classroom is structured won't necessarily tell you much about how much the kids are learning. The quality of instructional interactions is a much better predictor but also very expensive and time-consuming to collect. What states are actually collecting is a lot of varied measures. So health and safety, curriculum, developmental screenings, family partnerships, professional development, education levels, ratios, environmental ratings, and this is just kind of a smattering. States are basically collecting a ton of data, so that brings me to the second question: how do you combine all these things you're collecting into something meaningful that's gonna help you improve the system?
Ideally you would wanna make a system where, if you're gonna call something a three-star program versus a four-star program, it's because the four-star program facilitates something better, learning or some other outcome that we care about. But we actually don't know that well how to create a recipe out of these many ingredients, each of which we might think is individually important. We don't know exactly how to link them all up together to create what we want, and certainly not in five little bins where this amount is better than that amount. So just as an example, this is Michigan's star rating system, and you can see it's moving from a one star, where the programs don't meet many of the quality requirements, to a five star, where they meet all of them. This is a little hard to see, but within their program there's a bunch of different points assigned to different kinds of things, like family partnerships, the administration, the physical environment, a lot of different items. And they've recently done a study where they changed the point allocations to different things and it completely overturned which programs were ranked as high quality versus low quality. So there's a lot of struggling around how to define the quality ratings. And in a national study, people took national data and tried to look at the different ways states are combining quality measures to predict four outcomes: math, pre-reading, language and social skills. And the findings were very striking. The first finding was that each of the individual pieces was not a terribly good predictor of the outcomes they cared about. So the staff quality, the ratios, the family partnerships and the environment did not predict any of the outcomes that they cared about, and really only the interactions mattered. And then when they took a bunch of state models of how to combine these pieces into a kind of index, they found that it was pretty much a smattering and not predictive at all of kids' learning. So there's this big puzzle
around, if we're gonna collect this quality information, how do we make it predictive of things we care about. A third big one is basically, can these programs work? Do they create incentives that lead programs to improve? And we do have new research that suggests that they do, that basically random assignment into getting a three- versus a four-star rating incentivizes programs to try to improve and make quite a bit of change in quality over time. So that's encouraging. And the fourth one, and then I'll wrap up, has to do with whether it's really the case that parents can respond to these ratings, given all their other constraints, and will vote with their feet for the higher-quality programs. So to give you a sense, the information is becoming much more common. Preschools will note on their websites that they received a good rating, and there's newspaper articles, and there's a lot of emphasis on trying to inform parents about the ratings. But there's no empirical evidence yet on the extent to which parents are responding to this information, especially low-income parents, who have a lot to balance in selecting childcare and have many other features, like hours, transportation and services provided, that might trump the quality measures that are being considered here. But we do know that parents in general tend to think of their childcare as being quite good. So 74% of parents in a large study said that their childcare center was either perfect or excellent, and so parents are not being particularly discerning about quality. And in work that I'm doing in Louisiana, we saw that 80% of parents indicated that the center where their child was enrolled was their top choice, two thirds never visited another center, and 40% never even considered another center. And I think together that does suggest that there's potentially a really important informational asymmetry, so that this information could come in and be very useful. So to wrap up, scaled
up preschool initiatives really need a focus on quality, and accountability initiatives are one way to work towards it. I think there's a lot of potential there around reducing the fragmentation, particularly of the childcare sector relative to the public preschools and pre-K. But everything relies on knowing how to measure quality, and understanding exactly what to measure and what to rate is hard. And in addition, without really focusing on supporting programs to improve, like the I in the improvement systems, they're unlikely to create the results that people seek. Let's stop there.
Yes, great. So I actually think the sequence of these talks really seems pre-planned, and it kind of was. So I'll be talking about this from an eight-year project in Chile, what a long-term research project has shown us in that context around this really quite difficult struggle to improve the process quality that I think all three speakers talked about. So before that, this is a very collaborative project across Harvard University, NYU, an NGO in Chile and a local university there in Santiago. So the story in the United States is actually very similar to the one in low- and middle-income countries. There have been a lot of high expectations built up in the field around the long-term impacts of early childhood education, but as much as the United States struggles with quality, in low- and middle-income countries that struggle is perhaps even more magnified.
The expectations are reflected in the past 20 years of research from the evaluation sciences and also neuroscience, and in the new Sustainable Development Goal target 4.2, which is under education and learning, which states that by 2030 the goal is to ensure access to quality early childhood development, care and pre-primary education so that children are ready for primary education. So you do see the word quality in there for the first time. Early childhood development was represented in the 2000 to 2015 Millennium Development Goals only in terms of infant mortality and maternal mortality, so this is an advance, to think beyond survival to these issues of learning and development. On the other hand, they raise the challenge of what quality means. So I think Chris and Daphna both covered this issue of quality, so I'm just gonna skip over that slide. But the context of this study is Chile, which recently made it into the OECD, and so it transitioned from being a middle-income country to a high-income country, and yet it shows a pattern of inequalities in school readiness and learning that are really quite similar to those in the United States. And there's been a rapid expansion of early childhood access under the first Bachelet administration, through the Piñera administration and into the second Bachelet administration, so that now over 70% of four-year-olds are attending pre-primary education. And they have a grade structure that's similar to ours in the sense that five-year-olds attend kindergarten, it's actually called kindergarten, and four-year-olds attend pre-kindergarten.
So this is a project that started in 2006 and 2007 with an extensive stakeholder process around coming together on what would be the goal of a project, in Chris's terms a preschool 2.0 project, to improve the quality of pre-primary education in Chile. And so there was a wide stakeholder process, piloting, the actual setting of goals, that in fact language and early pre-literacy skills would be a major focus of an effort to improve quality, but with secondary emphasis on health and on socio-emotional development. And between 2008 and 2012, after a year of piloting and implementation, we conducted the first school-level RCT of educational improvement in the country of Chile, with about 64 preschools and about 2000 kids. What was the intervention? This was actually interesting because Chile at that time did not have evidence-based curricula that really met the kinds of standards that Chris was talking about, in terms of sequenced activities based on developmental evidence from that country. Instead what they asked for, and what was provided, was a sense of what are the good instructional strategies to promote vocabulary development, oral comprehension and the kind of traditional focus on some early literacy kinds of skills. But at that point there was no ability to suggest, for example, how frequently teachers would do things like read books to children. There were suggestions on how to do interactive book reading but not how often to do it, because that was not acceptable within the major systems of public preschool in Chile at that time. So this is a little bit like coaching plus good instructional strategies but perhaps not curricula. This was the first test of coaching, provided twice a month to teachers with feedback and observation in the classroom. So what were the results?
There were positive impacts, and now you're familiar with the effect size metric, so of between 0.4 and 0.8 on the CLASS, which is the most widely used observational measure of process quality in the United States. It is actually the monitoring instrument for Head Start. It is part of many of these QRIS systems, and what we found, for example, was that it had exactly the same psychometric properties as it does in the United States, and so it divides into these areas of emotional support, classroom organization and instructional support. And before you look at these bars, what we found was exactly the same pattern as in the United States, which is fairly good emotional support and classroom organization, which is kind of like the organization of the routine of the classroom, but much lower levels of instructional support, and in fact a little bit lower than the average in many of these studies in the United States, like in Head Start or in the 11-state pre-K study. But these are the actual effect sizes of this intervention on classroom quality, and so they look like they're really quite large by Howard's standards. But why did they not then produce subsequent impacts on children's language outcomes, which is the lower graph?
Well, if you look at the American evidence and our own study, where we linked the CLASS to child outcomes using the standard approach to doing this, which is a lagged model, and if folks want to get into the technicalities we can talk about that, controlling for earlier child skills, the relationship between the CLASS and child learning outcomes is small. That means that a one standard deviation increase in the CLASS is generally associated with about a 0.10 to 0.12-ish improvement in child cognitive skills by the end of preschool. So if you keep that in mind, then that tells us something about the fact that you can get actually fairly robust effects on the CLASS that are still not sufficient to drive statistically significant improvements or substantively meaningful effects. Now we do start seeing a 0.09, which is like the hint of an effect, on the early measure of pre-literacy, which is decoding, being able to identify letters and words. We did do a follow-up. We did find that there were quite high rates of absenteeism from preschool, and when we adjust for that and look at the impacts of this program on kids who are most likely to attend consistently, we saw some more positive indication that the program was producing somewhat more robust positive language and pre-literacy effects for those kids. So average levels of absenteeism in Chile were about 23% on any given day, and measuring on 15 randomly selected days across the year, any kid followed individually missed about that amount, about a quarter of days. So that's substantial absenteeism. So what did children actually experience? We were actually very interested in that, and luckily what we did to observe classroom quality was we actually videotaped, and so we have gone back to those videotapes again and again and again, and they're a wonderful source of dissertations. And one study by Susana Mendive, well, this is not a dissertation, this was just a side study with an army of
coders, Susana Mendive at the Catolica University and Chris conducted a minute-by-minute video coding of both the targeted and non-targeted teaching strategies from this program across the experimental and control groups, so we could actually look at experimental effects on the number of minutes of targeted instruction. And the denominator is 80 minutes, by the way, because of the way the CLASS works: you pick 20-minute segments, four of them, across a randomly selected preschool day. And what we see here is that the number of minutes, first of all, in the control group is distressingly small. Just think of this as within an 80-minute denominator: the average number of minutes of targeted strategies, which are good language instruction strategies, was about eight or nine. And before you get super distressed about Chile, you can get distressed about the United States, because this is actually not that different from the data on minutes of good language instruction in preschools in the United States. So we're not too far off. Now this produced, again, significant increases, but up to levels of about 12 or 13 minutes of good language instruction. The good news was that non-targeted things, these are things like simply repeating syllables, declined over time. So what happened in the middle of the experiment was an opportunity to scale the program into another region of Chile, and that was before the experimental results came out. But we had to decide how to scale, and we picked an approach because our health director, Mary Catherine Arbour, who leads this phase of work, this is after the experiment, had connections with the Institute for Healthcare Improvement, which has developed an approach to improving healthcare systems at scale that has been used worldwide and has reduced, for example, infant mortality nationwide in Ghana. And this came out of the corporate world, actually, originally in the 1960s and 70s, but it is about bringing stakeholder groups together to set quantifiable goals for quality
improvement within a given intervention, whether it's a healthcare system or, in this case, early childhood education, where we applied it for the first time. So if you've seen these kinds of cycles, plan-do-study-act cycles, they are about a group coming together, setting quantifiable goals, and then sharing information on the progress towards those goals. And it really has a lot of links to other kinds of things like design thinking and, I think, some of these rapid-cycle innovation models. And so this was piloted in 14 schools with networks of teachers, principals, parents, teachers' aides and school leadership, and the idea is that this group of 14 schools actually gets together once every two or three months, first to set goals within the theory of change of this model and this approach, Un Buen Comienzo. And they set goals for what they wanted to improve. So for example, they knew that the average number of minutes of language instructional strategies within the main model was about 13, they'd heard this, they were actually given this information, and they set the goal of, let's get to 30 minutes per day of those instructional strategies. And so, well, okay, these are technicalities that I'm not gonna go into, so I'm gonna instead show you, okay, so these are 14 schools, and we ended up comparing them to 49 schools that use the basic model but without this continuous quality improvement process. So on the setting of goals, I'm gonna give an example of how this set of schools actually set a particular vocabulary-based goal, and this was to introduce one new vocabulary word per day with rotating strategies for incorporation of the new word. The idea was to get beyond introducing new vocabulary with simply a definition, okay, here is a badger, a badger is a kind of animal, it looks like this, and then just stopping, or not even introducing a new vocabulary word. The idea was to actually link that to, what are other animals that look like a badger, have you ever
seen a badger? Let's draw one. Let's use multiple ways to link this to children's everyday experiences and to other kinds of words that are related in a kind of conceptual network. And so the idea was to try to build these more sophisticated vocabulary strategies, and they developed a measure. What is the measure to track improvement in this? The number of kids within the classroom every day who use this new vocabulary word with versus without an adult's help. So what we want to see is spontaneous use of new vocabulary by children as an indicator of quality. And this was measured in this rapid-cycle approach every day by teachers. They would fill these out and then share their Excel spreadsheets three months later and see where did this work and where did this not work. And ultimately that approach is what we have started to evaluate, because it turns out this continuous quality improvement strategy has actually not been evaluated causally. And so we are starting to use quasi-experimental methods, propensity score methods, to look at this. I'm not gonna explain what this is, but we're getting at the purpose of what propensity scores are trying to do, which is to bring a treatment and control comparison somewhat closer to an experiment. And the good news is we are starting to see effects on these language outcomes looking like they're moving in the right direction, so an effect size on vocabulary of about 0.31, comparing children who experienced this model with the continuous quality improvement to those in the model without continuous quality improvement. So we think that this was an approach to develop buy-in in the new region for this model, and teachers had, from focus groups, a tremendously positive experience, feeling supported and working with peers towards quality improvement. And we are starting to see, again, language and literacy outcomes moving in the right direction. There are limitations, as in any study, so I'm looking forward to the discussion, but
the idea is that with these forms of quality improvement, we need to be much more creative about thinking about how to improve systems that are already at scale, such as early childhood education increasingly is around the world. Thanks very much.
Well, I'd like to thank all the panelists for some great setup remarks and reports of different results. My name's Brian Jacob. I'm the co-director of EPI. Welcome, everybody. I have the fun task of getting to ask questions. I of course have some of my own, but I also have lots that have come from the audience. So to start out, I wanted to ask the panelists to talk about what they think the goals of early childhood education should be. I think an implicit assumption underlying a lot of this is that we should be maximizing standardized math and reading scores, receptive vocabulary and so forth. Or are there other things? And how does that interact with various measures of quality, and how do you think about that?
So sure, I will jump in, and then other folks please also. So in terms of what should we be maximizing in preschool, I think this is a really great question. I think one of the things that we don't want to do is just focus on these early academic skills, right, and I think most people agree with that. It's not just about knowing the alphabet. We need more focus on unconstrained skills, so building language, because what we know is that we're pretty good at teaching most kids to decode, they've got that pretty much down by the end of third grade, but most kids don't learn to comprehend in a way that puts them above proficiency levels on the tests that we have. And so I think focusing on those kinds of critical thinking skills and background knowledge and vocabulary is something that a lot of early childhood folks, I think, would get behind when you talk to them, the practitioners on the ground. But when we walk into a classroom we often don't see nearly as much in the way of
asking kids to solve problems in multiple ways and pushing those kinds of rich conversations with children. A lot of the time-use studies that we do have in early childhood classrooms are not that encouraging.
So I agree with that. I guess I think that one goal of the programs, especially targeted programs, is to basically support families that have a very complex set of circumstances, oftentimes single parents who are dealing with a lot of issues simultaneously, to give kids and families a safe place for kids to be, a place where they can be engaged, where they can have a lot of the experiences that my kids have on a daily basis at home. So exposure to new experiences, learning how to interact with other kids, learning how to have challenging experiences in a safe place, learning how to talk to lots of new people, things like that. And then I think that there's lots of skills that could fall into that, but making kids comfortable in social situations and getting them to a place where, when they are transitioning into a school setting, they already know how to be in groups and they already know how to interact and make friends and basically learn, is a key goal.
I think the only big theme I would add is to think, as we do about public education, of early childhood education as a potential lever to reduce inequality. Now that's as difficult an issue for early childhood education as it is for public education. So should we be surprised that we're running into the same issues that affect public education at the primary and secondary and higher ed levels? I don't think we should be surprised. But I think in a way the history of early childhood program evaluation started off with such a bang and such a positive sense from these early studies like the Perry Preschool or the Abecedarian program that I think for some folks it feels perhaps like a harsh awakening that these systems issues, accountability issues, what is quality and how to promote learning and development across multiple domains, I mean, they play out in a different way and in an even more fragmented system, where only in certain places is there a move towards universality. So I think Daphna's presentation showed that the fragmentation across types of care is vast. And another pattern that's coming from three or four studies using different methods in the Head Start Impact Study is that if you compare Head Start to staying at home, that's when the impacts of Head Start are the most robust on cognitive skills. And that's an important point: many kids are in informal care settings, many kids are at home, many kids are in centers, but these systems themselves are not coordinated and have vastly different levels of supports, if we're gonna think about these issues of inequality.
Okay, another question coming from several folks in the audience: they wanted to hear more about Howard's cross-site variation. So can you tell us what were some of the things that were predicting cross-site variation?
Well, we don't, okay, so here's the thing: it's very difficult to predict cross-site variation. We are actually doing some of it, but a fellow by the name of Chris Walters, who some of you know personally, has looked at cross-site impact variation using the Head Start Impact Study and argues that he's found positive effects on cognitive outcomes of full day versus half day, and positive effects on social-emotional outcomes of more versus less home visiting. And there's two studies that really point to the fact that the impacts of Head Start are much, much greater for kids who otherwise would have stayed at home than for kids who otherwise would have been in a center. He looks at it in one of his papers kind of in passing, but the study that looks at it most closely is one by Avi Feller, Lindsay Page and others, where they look at it very, very closely and they compare impacts of Head Start on the PPVT, and particularly the receptive vocabulary measure, so it's a cognitive thing, and a social-emotional thing. But they get really big impacts on the PPVT for kids who, if assigned to Head Start, would have gone to Head Start and who, if not assigned to Head Start, would have stayed at home. They get big impacts for them, probably 0.3, I forget the exact numbers, in effect size, and get virtually nothing for the kids who, if assigned to Head Start, would have gone to Head Start and, if not assigned to Head Start, would have gone to other center-based care. So that's a really important theme. And I'll just add one note on the impact variation findings as we try to think about them. One of the reasons they're hard to predict is that you're looking at Head Start centers across the United States in the Head Start Impact Study, and impact is a comparison of what was the outcome under Head Start versus what was the outcome under what the alternative was if you didn't get the offer of Head Start,
okay? Now impact variation, what impacts are the difference between those two outcomes, impact variation are how that difference changes across this country and there are two sources to that variation, one is how does Head Start itself, the program Head Start vary across the country, that'll affect the way in which its impacts vary across the country but what most people aren't thinking about and aren't looking about as they're trying to interpret the results of this study, how do the effects, the effectiveness of the counterfactual or the alternative outcomes vary across the country and I think one of the reasons it's been so hard to predict the impact of Head Start is because you have to take into account both Head Start and its alternative and there aren't good measures on that at all in the Head Start impact study. Okay, great. I don't know that kind of, you know, combine kind of two or three themes that came up in some of the questions. One is kind of a big issue within the early childhood area of fade out and then how that relates to the early schooling experience, K3. So I guess maybe one part of that is, I assume we are hoping but do we have any evidence on whether there's less fade out in high quality, meaning high initial gain places and then second, just more broadly, are there ways that we need to think about restructuring K3, for example, that could interact with some of the early childhood work that's now happening. So I'll quote Greg Duncan's slide for those of you who were able to attend that talk where it said fade out is a mess. So I think it is a mess and that's a very good message around what we know about this. 
But as far as the gains lasting longer if you're in a higher-quality program, I think a good recent example is the Tennessee program, which is lower quality compared to the Tulsa program: there are some remaining benefits at the end of third grade on math for the Tulsa program, and for the Tennessee program it actually looks like the effects may be negative. And so, based on the observational quality measures we have of those two contexts, we are seeing a different pattern. And we see from older studies that the programs that were higher quality on the basis of their inputs did maintain impacts on some measures of academic achievement along the way. We have to go on inputs because we didn't have these observational quality measures, and that's an important point: we don't know the instructional quality of the older studies the way we do now. So there is a pattern that's not clear. So again, "fade-out is a mess" is where we really are, but I'd say there is some suggestion that fade-out may be tempered by the size of the initial impact in particular. There has to be something to last. So you need a larger initial impact, and that comes from a higher-quality program.
Just to support that, in a meta-analysis that Greg and Katherine Magnuson and I and Holly Schindler have been involved in, the rate of fade-out, which was about 0.02 effect size per year, didn't interact with the initial immediate post-test effect. So I think that suggests that yes, the larger the initial boost, if the fade-out rate is not different, then perhaps it will simply take longer to get down to virtual equivalence or convergence. There are many, many issues, exactly like Howard said. You have to think about what the quote-unquote treatment condition is, and the fact that the control group in the United States now no longer receives quote-unquote nothing. All kids are learning, no matter what context they're in, and the vast array of settings that they're in, in these comparison or control groups, makes that question really quite difficult to think about. And then I think it is very important to think about what exactly is going on with instruction in these later years, in kindergarten, first grade, second grade, third grade, for kids who did not have that particular preschool experience and those who did. Teachers could be doing all kinds of things, and we're not exactly sure how they're targeting certain aspects of instruction to certain subgroups of children, for example. So that's the part where it really is just a big area for many of you to write wonderful dissertations.
I've been told we're out of time. One more question. So, politically and policy-wise, how do we make some of this happen? What are some of the most important policy changes or resources? What would you recommend as at the top of your list that could be done to further some work that you think is useful?
I think the quality rating system approach is one I personally feel we shouldn't give up on, because I think if we do end up with more sensitive measures of quality that are also feasible to scale within large-scale monitoring systems, that would help with the fact that there is this information problem in the United States. I think we also need to create the kinds of messages around what quality means and its importance, both for policymakers and the public. I feel like we have an effective message for whether to invest in early childhood: the brain science, these messages around the value of early investment, have really gotten through, I would say worldwide, to increase investments in early childhood education.
But if we're not going to, on the global side, replicate the problems with universal access to primary education, which produced not a lot of great learning impacts, we need to message quality in some way, and then that has to be linked to measurements of quality at scale that can be embedded within these monitoring systems and information-based policies.
So I think that's a really good point, and the other thing is I think that the very low quality of experiences very young children have is one high-leverage area. I think the places where toddlers in this country are getting taken care of are of extremely low quality, and I think that some of that has to do with the fragmentation, and some of that has to do with the very low levels of regulation for family childcare homes, where lots of toddlers are spending their time. So I think improving the quality, changing the regulations, but also bringing more access to more highly regulated centers for young kids is important, and also helping parents navigate the hardship of linking the complex systems. Parents are trying to do a lot, and they need systems that work together. There are a lot of challenges around finding a full day of care for your child, and oftentimes the place that's gonna work for your life is not the one that's gonna work for your child's development, and the fact that those things are so at odds I think is a real challenge. So trying to think about especially women and women's work and single moms, and how their lives can be supported in a way that also supports their kids, is a really important piece.
And I would just add that I am gonna be very curious to see ESSA hit the ground, and to see where within that, as the various rules get negotiated from state to state. There's a big emphasis on state and local decision making, and there'll probably be variation in the capacity that folks have to take on the flexibility that offers around adopting different models. But I think that is something that has arrived, and it will be really interesting to watch how that flexibility works or doesn't work in promoting quality.
Okay, well, I'd like to thank our panelists for a great discussion, and I hope to see you at the next event.