Part of that, we have a public lecture series where we bring in some big names in education policy, and we are very pleased today to have Robert Pianta, who is the Dean of the Curry School of Education at the University of Virginia, and also the Director of the National Center for Research on Early Childhood Education and of the Center for Advanced Study of Teaching and Learning. Robert Pianta got his start as a special education teacher many years ago, and most of his research, or at least his recent research, has focused on understanding early childhood education, and in particular, recently, understanding teacher-student interactions within classrooms, how these can be predictive of student learning gains, and then, in some recent work, how we can actually change some of these teacher-student interactions through a professional development intervention. He has branched out from the early childhood arena, and now he's doing some work in elementary and secondary schools, and today he's going to talk about a collection of his work. We're very pleased to have him here. The format, for those of you who don't know, will be a talk of about 50 minutes to an hour, then about 20 minutes for questions at the end, and then a reception out in the Great Hall where we can continue the conversation afterwards. So it's a great pleasure to welcome Robert Pianta. Thanks. Thanks, Brian. Okay, am I wired? Are we ready? Thank you very much. It's great to be here and be able to talk to you about the work that we've been doing for the last several years.
What I'd like to offer is the possibility that you will think of this work not only on its own merits, in relation to the problems we're trying to tackle in preparing a better workforce of teachers of young kids, because most of it is going to be centered on young kids, but also as illustrative of what I think can happen when we settle in for a long period of time to tackle complicated problems in education, like what teachers do with kids in classrooms and whether that matters, and chew on that for a fairly long period of time, longer than I wish on some days, and of the fact that some results can come from that over the long haul. So I think there's a message to graduate students that it's worth it to stay with these tough problems as you get going on them. Some of the problems we have tried to address through this work: first, a very descriptive problem, namely what actually are the experiences offered to kids in classrooms, what are the opportunities to learn, at a fairly large scale. So I'm going to talk about a very large number of classrooms that we observed across the country; in a sense this is a question about the epidemiology of experiences and learning opportunities in classrooms. Second, if we're taking the stance that interactions matter, we need to test that stance in relation to child outcomes and whether the measures we develop work in the ways we would hope. And third, can those assessments and observations be leveraged for the improvement of teaching and the effectiveness of teachers, in a sense to be involved in the measurement, evaluation, and improvement of teacher quality at some level of scale through standardized observations? The context for that last point, as we begin to watch the reauthorization of No Child Left Behind take place, is what will be the metrics for accountability on the input end as well as on the output end.
We will obviously continue to assess children's outcomes; the question is how we will define teacher quality and assess it as a set of inputs to educational processes. So I'm going to talk about results from two large-scale observational studies. We did these observations in classrooms at some level by accident, because we were involved in these studies and they ended up doing observations in classrooms. The first is the National Center for Early Development and Learning's Multi-State Pre-K Study; let me describe this quickly. This was part of the initial IES Early Childhood Research Center, about 10 years ago. The question of interest at that point was, as state-funded pre-K programs begin to scale up, what is the quality of what's being offered in those classrooms, and is there some association that can be gauged between the quality of what's being offered to kids and kids' outcomes? Is there some connection there? That study involved 11 different states, with programs selected randomly and then classrooms within programs selected randomly for observation. The NICHD Study of Early Child Care and Youth Development, which a lot of you probably know about, is a longitudinal study of about 1,300 kids initiated at birth. Those kids, if you know the study, were observed at home and in child care as they aged through those settings. When the children went to school, we continued observations in the classrooms they attended, and in a sense this is a very typical sample of kids born in 1991 in this country; what we're seeing is what they aged into as they entered school, with observations in their first, third, and fifth grade classrooms included. If you add up all those observations, you end up with roughly 4,000 classrooms across the country that were observed using fairly similar sets of standardized procedures, and I'll talk about those in a moment.
It is, as far as we can gauge, the largest set of systematic and standardized observations in early education settings in the United States. A very important point is that all the teachers we went in to observe were credentialed or certified by the state to teach what they were teaching when we showed up, that all the observations were done with the permission of the teacher, and that they were scheduled on a day the teacher indicated would be one of the more academic days, so we didn't go in and observe during a lot of assemblies. As I said, we used two approaches to observing in classrooms, both of which were standardized. Every observer met rigorous criteria for reliability, we did drift tests as people went out into the field, and all of the observations were conducted starting at the start of the school day and then running through cycles of observation throughout the day. Sometimes the observations were half-day, sometimes whole-day. At one level, we went in and simply counted, or actually time-sampled, the opportunities to learn that children were exposed to in these different grades. We had roughly 44 different indicators of opportunity and behavior on the scales, and every 30 seconds the observer marked however many of those opportunities the child was exposed to. These would be things like: What is the setting in which instruction is occurring? Is it whole group? Is it individual seat work? Is it small group? It would also capture the content of the instruction being delivered: literacy and language arts, math and science, history, social studies, and the like. And then we observed whether the child was engaged. So this takes a very discrete-behavior approach to understanding opportunities to learn in classrooms. A lot of these results are published in a paper that we published in Science in 2007.
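To make the time-sampling procedure concrete, here is a minimal sketch of how 30-second snapshot codes roll up into opportunity-to-learn proportions. The category names and values below are invented for illustration; the actual protocol used roughly 44 indicators per snapshot.

```python
from collections import Counter

# Hypothetical 30-second snapshot codes for one child over a 10-minute span
# (20 intervals). Each snapshot records the instructional setting, the
# content area, and whether the child was engaged.
snapshots = [
    {"setting": "whole_group", "content": "literacy", "engaged": True},
    {"setting": "whole_group", "content": "literacy", "engaged": True},
    {"setting": "seat_work",   "content": "literacy", "engaged": False},
    {"setting": "seat_work",   "content": "math",     "engaged": True},
    {"setting": "small_group", "content": "math",     "engaged": True},
] * 4  # repeated to mimic 20 intervals

def proportions(snaps, key):
    """Share of observed intervals spent in each category of `key`."""
    counts = Counter(s[key] for s in snaps)
    return {k: v / len(snaps) for k, v in counts.items()}

print(proportions(snapshots, "setting"))
# whole_group and seat_work dominate; small_group is a small share
print(sum(s["engaged"] for s in snapshots) / len(snapshots))  # engagement rate
```

Aggregating these interval shares across a day, and then across classrooms, is what yields descriptive figures like "85% of instructional time in whole group or seat work."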
The vast majority of what we saw in terms of teachers' interactions with kids and the instruction offered to kids occurred in the context of whole-group activities or individual seat work; 85% of children's instructional time is spent in one of those two contexts. One of the things we saw was the very, very low incidence of opportunities occurring in the context of small-group activities. We have this kind of myth that a lot of instruction in elementary school occurs in small groups; about 3% of the time, kids were offered that kind of instruction. We also see very few interactions. Again, remember these are early childhood and early elementary classrooms, so a lot of the time we think of these as places where the teacher is engaged in lots of interactions with kids and is moving around the room. On average, during a typical hour, the typical child is exposed to about four interactions with the teacher; there are four occasions during which an interaction might occur between a teacher and an individual child. Almost all of what we see, and this changes a little bit across the grades, is literacy: in those state-funded pre-K programs, or at 54 months, where we were doing child care observations for the NICHD study, if instruction occurred, it occurred broadly speaking in literacy or language arts. Observing in a first grade setting all day long, we saw roughly 10 minutes of math during a typical first grade day against 40 minutes of literacy instruction. That begins to balance out by about fifth grade, where we see about half and half, about a half hour of each. The rule, though, and I'm talking in terms of averages, is really nothing more than variation on top of variation on top of variation. What we see is the entire range of scores on almost all of the codes.
So, for example, on a day in which we were in a classroom with the instruction to the teacher that we wanted to see as much academics as could possibly be seen, we see plenty of classrooms lining up with zeros in terms of opportunities in any number of those content areas, even literacy. Going into a school in which we had multiple children in the same grade, say six first grade classrooms in a single school, those classrooms would be operating roughly the same curriculum, and we would go in and see stunningly different ways in which that curriculum was implemented from classroom to classroom to classroom, both in terms of time and the quality of implementation. So the rule really is this notion of variation, and that variation is consistent from pre-K all the way up to fifth grade. We don't see classrooms becoming more uniform as kids age up. The other piece, from a measurement standpoint, is that when we used these time-sampled metrics, in which we were figuring out the percentage of time kids were engaged in whole-group instruction or were offered math or literacy, we see very little prediction from those particular measures to gains in kids' outcomes as assessed over the course of the year using a standardized assessment such as the Woodcock-Johnson. So these kinds of metrics didn't seem to be purchasing us much in terms of prediction of outcomes. The only thing we counted that actually predicted achievement gains was the amount of math the kid was exposed to, and I think that's quite frankly because any math is better than no math, and most of what we saw was very, very little math. So you began to see some purchase on math. The other way that we went into these classrooms was to rate the qualities of interactions between teachers and kids.
And I'm going to focus mostly on this for the rest of the talk. We developed an assessment called the Classroom Assessment Scoring System, the CLASS, that we used across these pre-K to fifth grade classrooms. It was largely derived from a more developmentally oriented analysis of settings and their impacts on kids in relation to outcomes like achievement, social competence, and engagement. If you look at the developmental psychology literature on parenting, for example, you see lots of use of global rating scales of dimensions of interaction. And even in the early childhood world, prior observational measures that have been used at some level of scale, like the Early Childhood Environment Rating Scale, also do this kind of more molar interaction rating. So we developed a system where classrooms are rated on ten dimensions of interaction across three domains. In some sense this is a theoretical claim about the organization of classrooms: that these latent domains, emotional support, organization and management, and instructional support, really are the ways in which interaction in classrooms is organized. And in fact we find a fair amount of evidence for this: when we factor these ten dimensions at those different grades, we find pretty good evidence for this three-domain factor structure working across all those grades. The way this plays out in how we train people and code things is depicted better, I think, in this slide. You see the domains here, the larger domains that organize interactions in classrooms at the most molar level, and then at this level here are the scores that are derived on the CLASS. These depict seven-point rating scales of dimensions of teacher-child interaction.
Positive climate, negative climate, teacher sensitivity, teacher's regard for kids' perspectives, effective behavior management, productivity, which has to do with the management of time, and the degree to which the teacher provides instructional learning formats that are likely to engage kids. Is it organized? Is it planful? Then in instructional support, we're really talking about the teacher's interactions with kids as they stimulate concept development and higher-order thinking, the quality of feedback, whether there are feedback loops that actually occur as teachers engage with kids or whether teachers just say yes or no and move on to the next kid, and the extent to which teachers model and engage in language behaviors with kids that are likely to stimulate their conversation skills. Those dimensions are all then articulated and defined in terms of behavioral indicators, which are anchored to actual descriptions of behaviors at a one, a three, a five, and a seven on the scale points themselves. When someone does a rating, they're looking at behaviors, such as indicators like these, that are articulated at low and high levels. This is important because there's a high degree of specificity in the description of behaviors, which becomes important later when we talk about professional development based on this tool. So that's the overall organizational structure. When we go in to measure quality, these are just smoothed histograms across those 4,000 classrooms. And again, we don't find a whole lot of differences across grades in these dimensions. You find that for the most part classrooms are fairly positive social places to be, and kids are reasonably busy, although there's some variation.
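As a sketch of how the ten dimension ratings roll up into the three domain scores, the 7-point ratings within each domain can simply be averaged. The dimension and domain names follow the talk; the numeric ratings are invented, and the reverse-scoring of negative climate before averaging is an assumption of this sketch.

```python
# Illustrative CLASS-style scoring: ten 7-point dimension ratings rolled up
# into three domain scores by averaging. Ratings here are invented; negative
# climate is assumed to be reverse-scored before compositing.
ratings = {
    "positive_climate": 6, "negative_climate_reversed": 6,
    "teacher_sensitivity": 5, "regard_for_perspectives": 4,
    "behavior_management": 5, "productivity": 5, "learning_formats": 4,
    "concept_development": 2, "quality_of_feedback": 2, "language_modeling": 2,
}

domains = {
    "emotional_support": ["positive_climate", "negative_climate_reversed",
                          "teacher_sensitivity", "regard_for_perspectives"],
    "classroom_organization": ["behavior_management", "productivity",
                               "learning_formats"],
    "instructional_support": ["concept_development", "quality_of_feedback",
                              "language_modeling"],
}

domain_scores = {d: sum(ratings[k] for k in dims) / len(dims)
                 for d, dims in domains.items()}
print(domain_scores)
# instructional_support averages out at 2, echoing the low feedback ratings
```

The invented profile deliberately mirrors the talk's finding: emotionally positive, reasonably organized classrooms with instructional support sitting around a two.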
But when you look at what teachers are doing to provide feedback to kids, and whether the feedback is of the sort that would actually elicit a more complicated performance and offer the teacher the opportunity to give even more feedback tied to that performance, you see very, very low ratings, on average around a two. What this means is that the modal type of feedback on performance that a child gets in a classroom is really correct, incorrect, let me move to the next kid, rather than eliciting performance and then commenting on that performance. So this again comes back to that notion of basic description of the environment in classrooms. It doesn't tell us much at all about whether these dimensions actually predict outcomes, which would be important if we want to take this to the bank and start saying that we're measuring something that matters and should try to improve it. Descriptively again, rather than the distributional properties I just showed you, here what we did was subject the scales in the emotional and instructional domains to cluster analysis procedures. Moving through that, you see roughly 20% of classrooms having an average score of a one or a two on both of those domains; if we averaged the scales within those domains, they would come up that low, with about 17% of first grade classrooms looking like that. And you also see fairly low numbers for classrooms scoring high on these instructional dimensions overall. These are first grade classrooms. So essentially this is saying that almost 20% of first grade classrooms are the kind of classrooms that we would think would not have any impact on kids' learning and would be fairly negative places to be. Okay.
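The cluster finding can be approximated with a much simpler cutoff rule than a full cluster analysis: count classrooms that average a two or below on both domains. The simulated scores below are uniform on the 1-7 scale, so the resulting share is far lower than the roughly 20% seen in the real, non-uniform data; the sketch only illustrates the computation, not the study's distribution.

```python
import random

random.seed(1)

# Simulated (emotional, instructional) domain scores for 1,000 classrooms on
# the 1-7 scale; values are invented, not the study's data.
classrooms = [(random.uniform(1, 7), random.uniform(1, 7)) for _ in range(1000)]

def share_low(rooms, cutoff=2.0):
    """Fraction of classrooms scoring at or below `cutoff` on BOTH domains."""
    low = [r for r in rooms if r[0] <= cutoff and r[1] <= cutoff]
    return len(low) / len(rooms)

print(share_low(classrooms))
# with uniform scores this lands near (1/6)**2, about 3%; the real data's
# skew toward low instructional support is what pushes it toward 20%
```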
But then the question really is: does this variation in these observational elements predict children's learning gains? I can describe a couple of different studies that we've done to validate these, both as they are predicted by structural and selection factors and as they predict learning gains. As I said before, we see exceptional variability within and across grades. If we take the NICHD sample, where we're watching the same kids in first, third, and fifth grade, and divide classroom quality based on those distributions into top third, middle third, and bottom third, we see that there is very, very little stability across grades for a given child. The likelihood that a child is going to be in a high-quality classroom across those three years is fairly low, about 15%. It's also low for being in a low-quality classroom across all three. So there's a lot of churn; you're likely to bounce around in the distribution quite a bit. We see almost no association between those ratings and the kind of structural features that are often the drivers when we try to change teachers or think we might change kid outcomes. Teacher experience doesn't matter: the more experienced teachers are just as variable as the less experienced teachers in these samples. More training doesn't seem to matter: a master's degree or more classes doesn't seem to translate into more effectiveness in the classroom, at least as we're observing it here, and salary wasn't related. We do see some small associations. Again, these are just correlations and adjusted regression coefficients, and they're not large. For class size, for example, coming out of a spline regression model, we see that larger classes, in this case more than 18 kids, are more structured and somewhat more rigid in the observations we make, so they're less positive in some sense emotionally.
And in classes of fewer than 15 kids, not surprisingly, we find higher scores on social dimensions and higher instructional quality. Family income and education are only modestly related to these. We don't see really big effects, again only about 0.1 or 0.2, in terms of variation in family income predicting the kind of quality kids are getting as we observe it. That was much lower than we expected; we expected to find much bigger income differences. However, if you look at low-achieving kids, who score a standard deviation or more below the mean on the Woodcock-Johnson before they get to school, so these are kids we can see are likely to be low achievers once they get to school, we find that only about 10% of them are going to get access to stably high-quality instruction once they get into school. So when you use achievement rather than income, we do see that kind of selection. Okay, let's look now at the prediction to outcomes. Here I'm going to quickly summarize results from a number of studies where the design is essentially: a pretest is given to the child at the beginning of the year, the observations occur, and there's a posttest at the end. All the pretests and posttests are standardized tests, in this particular case the Woodcock-Johnson or the PPVT. So we're not necessarily looking at individual growth, because we don't have three points in time; we're looking mostly at adjusted averages. And we're controlling for family and demographic factors, kids' prior performance, and structural features of schooling as covariate blocks. Then we're looking at the degree to which, if you put these instructional and emotional quality elements of the classroom into a standard regression framework, they predict more positive achievement and social outcomes.
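The analytic design can be sketched as a residualized-gain regression: adjust spring scores for fall scores, then ask whether classroom quality predicts the adjusted gains. The real models used several covariate blocks; this single-covariate version, with invented simulated data, just shows the shape of the analysis.

```python
import random
import statistics

random.seed(0)

def ols_slope_intercept(x, y):
    """Closed-form simple linear regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return b, my - b * mx

# Simulated data: fall pretest, a 1-7 quality rating, and a spring score that
# depends modestly on quality. The 2-point bump per rating point is invented,
# just to make the pattern visible.
n = 500
fall = [random.gauss(100, 15) for _ in range(n)]
quality = [random.uniform(1, 7) for _ in range(n)]
spring = [f + 2 * q + random.gauss(0, 5) for f, q in zip(fall, quality)]

# Step 1: adjust spring scores for fall scores (the pretest covariate).
b, a = ols_slope_intercept(fall, spring)
resid = [s - (a + b * f) for f, s in zip(fall, spring)]

# Step 2: does classroom quality predict the adjusted gains?
slope_q, _ = ols_slope_intercept(quality, resid)
print(round(slope_q, 2))  # recovers something close to the simulated effect
```

In the actual studies the covariate blocks (family, demographics, prior performance, structural features) would all enter the regression alongside the pretest before the quality terms are evaluated.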
Generally, we find small effects. We find larger effects on more proximal outcomes such as the child's engagement. And as I said, we see some evidence that a little more literacy instruction early on predicts those outcomes as well, though math is typically the more consistent finding. So again, these are somewhat smaller main effects. And then we see stronger effects when we look for interactions with various features of kids coming into school, different groups of kids. We see stronger effects of both the emotional and instructional quality kids are exposed to for kids coming from low maternal education backgrounds, kids with adjustment problems in kindergarten, and poor kids. Let me describe a couple of more detailed studies that drill in a bit more on these. One of the questions we've been wrestling with in the pre-K world, and this is a big question in policy, is the degree to which different quality metrics translate into bigger outcomes for kids. If you track a lot of the debates in the state pre-K world as these programs are being scaled up: it's very expensive to have a teacher with a bachelor's degree or master's degree in the program, and many of the programs are scaling up with teachers who have an associate's degree. So the question is, should teachers have a master's degree or a bachelor's degree in order for us to ensure that these are high-quality classrooms in which kids are going to learn? And there's a variety of questions of this kind that sit within this: how should we focus attention? Most governors who are trying to convince their state legislators to invest in these programs are convincing them on the basis that they'll make sure these are high-quality programs because these structural features are in place.
The teachers will be qualified, there will be a certain ratio of kids in the classroom, the classroom will be running an effective curriculum, and the like. Those structural features often show up as variables of this kind, or can be composited into an index that different organizations tend to support. In this particular case, we test the combination of these structural features as they're composited in the index promoted by the National Institute for Early Education Research, the NIEER index, which is a nine-point index covering whether the teacher is qualified, what the group size is, and those kinds of things. These are the things that form the drivers for most state policy. We compare the NIEER index with observed interactions, using the Early Childhood Environment Rating Scale (ECERS), which is a standardized observation of these settings, and the CLASS, which just looks at interactions. The ECERS also looks at aspects of the physical environment. Again, we're looking at gains in kids' scores from the beginning of pre-K to the end. We find no association of these structural elements with any of these outcomes, either singly, like teacher's education, or when we combine them. So scoring nine points on the NIEER index shows no better return in kids' achievement gains than scoring three. What we do find is that instructional and emotional supports predict small effects, gains in achievement that actually persist into kindergarten. So we also find some evidence that the gains made in pre-K classrooms rated more highly on these particular dimensions last into kindergarten. Another way of looking at this: I don't have the NIEER index on this slide, but if it were over here as one of the inputs, there would be no checks.
But you see here the consistency of significant effects on gains in these different outcomes using the CLASS elements. Okay, let's shift to first grade now. So we see some evidence in pre-K of this notion that interactions matter, that one can assess those interactions in a standardized fashion, and that one can do so in a large number of classrooms. Here we're looking at gains in achievement from the NICHD early child care study, where we're adjusting Woodcock-Johnson scores in the spring on the basis of prior Woodcock-Johnson scores. We're in the literacy area, looking at 1,300 kids broken into two groups: one group where moms are classified as having higher levels of education, college and above, and one group where moms are classified as having lower levels of education, in this case below college and further down. And with just that very crude cut, this is what happens. We go into classrooms and rate them low, moderate, or high on instructional support; these groups are determined using tertiles on that distribution, so low is really pretty low and high is not all that high on the 7-point scale. For kids coming from these two backgrounds, you see what happens in terms of their gains in literacy scores over the course of the year: kids from the high-education backgrounds are making gains despite being in a classroom rated low on instructional support, while the other kids are making much smaller gains, whereas both groups of kids in classrooms rated high on instructional support show equivalent learning gains, or at least fall-to-spring differences, on average. Okay, let's look at a similar frame.
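The tertile split works like this in miniature: sort classrooms by their instructional-support rating, cut the distribution into thirds, and compare mean gains across the thirds. All numbers below are invented for illustration.

```python
import statistics

# Hypothetical (instructional-support rating, mean literacy gain) pairs for
# 12 classrooms; ratings are on the 7-point scale, values invented.
rooms = [(1.5, 2.0), (2.0, 1.5), (2.2, 3.0), (2.8, 4.0), (3.0, 3.5),
         (3.3, 4.5), (3.6, 5.0), (4.1, 6.0), (4.5, 5.5), (5.0, 7.0),
         (5.5, 6.5), (6.0, 8.0)]

def tertile_means(pairs):
    """Split classrooms into low/moderate/high thirds by rating, then
    average the gains within each tertile."""
    ordered = sorted(pairs)  # sorts by rating (first tuple element)
    k = len(ordered) // 3
    groups = ordered[:k], ordered[k:2 * k], ordered[2 * k:]
    return [statistics.fmean(g[1] for g in grp) for grp in groups]

low, mod, high = tertile_means(rooms)
print(low, mod, high)  # gains rise across the tertiles in this toy data
```

Note that with a tertile split, "low" and "high" are relative to the observed distribution, which is why, as in the talk, "high" need not be high in absolute terms on the 7-point scale.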
We're still looking at literacy gains here, adjusted in the way I described, but now we're looking at kids who came out of kindergarten either rated by their teachers, right at the end of the kindergarten year, as having no adjustment problems, or rated as having multiple problems, meaning kids who don't pay attention, have difficulty learning, a variety of different kinds of problems reflecting adjustment in the classroom. This is what happens when they sort themselves into classrooms rated low, moderate, and high on emotional support: again you find differences in learning gains for kids in low- and moderate-support classrooms, and the fact that they make equivalent gains in high-support classrooms. This is controlling for everything we have in the NICHD study of early child care data set prior to these kids going to school, which is all sorts of information about family process and family backgrounds. Obviously kids aren't randomized into these conditions, but we're trying to control for as much as we can of the elements that would predict the kind of classroom they go into and their learning at the start of school. So we see, again, some evidence that interactions matter in these classrooms. We're also working a lot, though, on a bunch of measurement issues, and I want to describe those before we move into the next phase of what we're doing. We're seeing some evidence to suggest that we can measure at a fairly large scale, observationally, using standardized metrics that predict, at least in modest ways, kids' learning gains. So part of the work we're doing is trying to address some of the measurement challenges. We're developing an extension that works up into sixth grade, and informant versions, to see whether they work.
We're also working with Stephen Raudenbush and Howard Bloom on this kind of ecometrics approach. We're trying to cross as many raters with as many days and as many cycles of rating as we can, to try to determine the various sources of variance and to begin to understand how rater, time of day, day, season, and alternative units of analysis factor into the scores we're getting. This has actually become very interesting for us substantively, because one of the things we're learning, for example, is that quality drops significantly over the course of a day. A kid's experience within a day goes down over the course of the day. This is not surprising to anyone who has spent any time in an elementary school; the end of the day is not the most fun time to be there. But we can see it happen. We also see, because many of these observations fall on different days across the year, and when you've got 4,000 classrooms you've got a lot of classrooms on a lot of days across the year, an incredible drop in the last month of the school year, to the point that even the best classrooms start tailing off considerably about a month out. So when you think about policies related to time, extending the school year, and things like that, these kinds of analyses become, I think, quite interesting. And we're also finding pretty consistently that these global features, as we assess them, are more stable, in that they capture between-teacher differences that we can locate stably and reliably as between-teacher effects, and are not as subject to the fluctuations of time of day. If we conduct generalizability study analyses, G studies, with the codes of discrete behaviors we were time-sampling before, we find incredible variation across the day that is far more problematic and adds a lot more noise to the system.
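A toy version of the variance decomposition behind a G study: split rating variance into a between-teacher part and a within-teacher (across occasions) part, and compute the reliability of a teacher's mean score. The estimator here is deliberately crude (it does not correct the between-teacher term for sampling error), and the scores are invented.

```python
import statistics

# Hypothetical ratings: 4 teachers, each observed on 4 occasions (rater/day
# combinations), on the 7-point scale. All values invented.
scores = {
    "t1": [5.0, 4.5, 5.5, 5.0],
    "t2": [2.0, 2.5, 1.5, 2.0],
    "t3": [4.0, 3.5, 4.5, 4.0],
    "t4": [3.0, 3.5, 2.5, 3.0],
}

def variance_split(data):
    """Crude one-facet decomposition: how much score variance lies between
    teachers vs. within a teacher across occasions."""
    grand = statistics.fmean(v for occ in data.values() for v in occ)
    means = {t: statistics.fmean(v) for t, v in data.items()}
    n_occ = len(next(iter(data.values())))
    between = statistics.fmean((m - grand) ** 2 for m in means.values())
    within = statistics.fmean(statistics.pvariance(v) for v in data.values())
    # Spearman-Brown-style reliability of a teacher's mean over n_occ occasions
    reliability = between / (between + within / n_occ)
    return between, within, reliability

b, w, rel = variance_split(scores)
print(round(b, 3), round(w, 3), round(rel, 3))
```

The talk's point maps onto this directly: global ratings show a large between-teacher component relative to occasion-to-occasion noise, so the reliability of a teacher's mean is high, whereas the time-sampled discrete codes put far more variance into the within-teacher term.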
So we're learning quite a bit from the measurement standpoint about things that might improve our capacity to do this work at a higher level of scale. We're also doing a bunch of work in relation to hypotheses about the importance of teachers' content knowledge. Okay, but given that, let me summarize a little of what I think the implications are and then go on to what we're doing currently. I think there's some evidence here that we can observe, at reasonable scale, interactions that are predictive of student outcomes, recognizing the limitations of those predictions and of the designs they're derived from. I think we have some purchase on the possibility of defining teacher quality in relation to teachers' performance in the classroom, as opposed to simply teachers' production of gains in test scores. That's an interesting policy question, and it may have some implications. We see that on average these interactions are of low quality, particularly on the instructional dimension, so there are a lot of implications for how we might move that up. Remember how frequently instructional support popped up on that pre-K slide as predictive of achievement gains. If it's that consistently related, and you want those classrooms we're investing a lot of money in right now to be closing the achievement gap by the time kids go to school, and the average quality in those classrooms is a 2, you've got a lot of room to move on a dimension that seems to matter considerably for kids' learning. The other thing that happens here is that we now have the possibility of saying we can observe, in a standardized way, what it is that teachers are doing that seems to matter for kids' achievement. We can capture that visually. We can see it. And if you can see it, then maybe you can train teachers to do it.
And so the idea is: can that become the target for professional development for teachers, and can we build a set of systems that would actually produce that? What I will describe to you in a moment is our effort to do that. In a sense, what we've tried to do is map backward, to say: if this is effective behavior for teachers in the classroom, then what kinds of supports would need to be in place to produce it? I'll describe that in a moment, and we've tested it experimentally. So in a sense what we're trying to do is approach these goals with a very systematic and scientifically validated approach to observation at the core, trying to build a science of teaching and teacher training that relies on that. So, just another way of looking at this: if you're in the policy world, or in the teacher education world too, you're interested in these things to some degree. I think we can start thinking about filling in the black box that we have struggled to fill in over the last several years, and we can begin to look at the extent to which what happens in the classroom might be a mediator of some of those inputs; I'll describe in a moment what we're doing in relation to professional development in that regard. We've developed over the course of the last five years an approach to promoting more effective interactions between teachers and kids that we call My Teaching Partner. And our focus in My Teaching Partner has been entirely on how you improve interactions between teachers and kids in the classroom. So we've tried not to get too stuck in the issue of whether the teachers need to have a curriculum. There's no question that there has to be some sort of curriculum in the classroom.
But what we've tried to do is say that teacher-child interactions are really the medium in which curricula are implemented, and the bane of every person who tries to evaluate a curriculum, because you end up with all this variation. So the point is: can we actually focus on that variation as the thing that we want to change and improve? We use the CLASS as the means for defining this and as a target of professional development. And when I talk about backward mapping, these are some of the skills that we think we could build professional development around, skills that might actually translate into higher scores on the CLASS for teachers: increasing teachers' observation skills by identifying interactive behaviors and cues, using the CLASS as a lens. So if I say it's important for teachers to be sensitive in their interactions, and I want to improve teachers' ability to do that, I need to train them to see that and define that in their own practice and in others' practice. So identifying that would be important. We also work hard on helping teachers identify how kids respond differentially to teacher behavior, so again creating this notion that the kids' behavior is interdependent with your behavior, that there's a loop here that you need to pay attention to, and that opportunity to learn is created in that looping that goes back and forth. And we also try to increase teachers' skills in identifying alternative responses to kids' cues. I'm going to describe results from a project in which 240 teachers across the state of Virginia were randomized into three different conditions. At the base level, a group of teachers only got a set of materials, which are essentially lesson plans in literacy and language development.
Then teachers got that plus access to a website, which I'll show in a moment, that had video clips using the CLASS to define high-quality practice; and then teachers got what we call a consultancy, which is a coaching model laid on top of the website access and the curriculum. In the website intervention, teachers would be able to go to the My Teaching Partner video library, and let's say they were interested, completely on their own, in working on how they could improve their behavior management: they might click on that behavior management tab and get access to about 20 different video clips with examples. For teacher sensitivity, say, there would be about a two-minute clip of a teacher whom we coded as a six or a seven on teacher sensitivity, and we pulled out of that clip an example at the indicator level. Remember I talked about those indicators in the CLASS system; there are domains, then scales, then indicators. We pulled out indicator-level clips and described in very specific language what it was the teacher was doing that was the reason we rated it a six or a seven on sensitivity. So it's very specific language about the teacher's behavior. That's the website intervention; teachers get to go to it if they're assigned to that condition. The consultation intervention involves the teacher videoing herself and sending us the video of her instruction. Remember, we're only focusing on language and literacy here, so she's sending us a lesson that she's videotaped herself delivering in language development or literacy. The consultant reviews and edits that video in a very standardized fashion that tries, again, to pull out from the teacher's own video three clips: one showing the teacher demonstrating effective teaching on one of the dimensions, one not demonstrating effective teaching, and one focusing very clearly on her instruction. So it's a very standardized way of editing these clips.
Those clips are pulled out, edited, and annotated much in the same way you just saw before, and the teacher is asked a question to respond to. That set of materials is posted on the teacher's private website. The teacher goes to the website anytime she can within this two-week cycle; she looks at it, she writes back, and then the teacher and consultant meet over the phone. All of this is done on the net. None of these consultants traveled to any of these teachers' classrooms all across the state of Virginia, and we had several classrooms that were hundreds of miles away. So this is all mediated through the internet. Again, one of the things we're trying to do is see whether you can develop something and actually push it to a point where it might be scalable, and we thought the net might be a way to do that. This would be part of what the teacher would see if she logged on to her private website. Here, the first prompt is called "Nice work." This teacher is working on teacher sensitivity. Her consultant writes: "When teachers anticipate and respond to students' academic, emotional, and social needs, they demonstrate sensitivity. What do you see yourself doing in this clip that reflects your understanding of the difficulty the students may have in writing their personal narratives?" So that's the lesson; that's the prompt. The teacher writes here, and the consultant has also inserted a hot link that will take the teacher to the video library and show her examples of sensitivity. This consultancy cycle repeats itself every two weeks over the course of the year. So let me describe to you some of the results from this intervention study. First we're going to look at effects on whether these forms of professional development changed teacher-child interaction, and then we'll look at effects on kids' outcomes. We've looked at this in a couple of different ways.
We've done effects of condition on outcomes, so web versus consultation. We've looked at treatment on the treated, and we've looked at some moderation with regard to whether these supports are more or less important on the basis of different classroom demands. We find that teachers who receive consultation show significantly greater increases over the course of the year in the quality of their instructional interactions; I'll show you these graphs in a moment. We also see that for early career teachers who only have access to the website, those resources matter: if you're randomized into the website condition and you're an early career teacher, you show gains in interactions with kids. And as we've begun to try to unpack this, we have ways of coding what the teacher's doing while she's on the web. Is she looking at herself? Is she looking at other teachers? And we see that the time you spend on the website, looking at yourself, filling out those consultancy prompts, really seems to be an engine, if you will, of the change that we're observing. And we find that consultation moderates the poverty effect, which I'll show you in a moment. So these are just graphs that show changes in sensitivity for teachers in the consultation and web-only conditions over the course of the year; those are just months across the bottom. We've got about a one-point gain in sensitivity. I didn't go through this before, but we've actually done threshold analyses of some of those validity coefficients to kid outcomes, and a one-point gain on CLASS dimensions, at least in the non-experimental study, and we'll see it here in a moment, does seem to be enough to push kid outcomes up significantly. Is this relative to the control condition? This is consultation versus just the web.
One of the challenges we had, a problem quite frankly, was that we had these teachers sending us in videotapes, but for the teachers in the materials group we just didn't have them send videos; we ran out of money, basically, and didn't have enough funds, so we didn't really have a true control condition here. Both of these groups of teachers got some resources. So this is just two different levels of support, with teachers assigned randomly into both of them. Okay, so let me just point out that this effect we see for sensitivity we also see for language modeling and for teachers' behavior management, so we're seeing these effects consistently across different dimensions. This is the effect on poverty. What we did, essentially under the hypothesis that teachers would need more support if they were teaching in more demanding circumstances, was compute estimates of these effects for teachers who would be teaching with 100% of the classroom poor or 50% of the classroom poor, because we had variation in poverty across those classrooms. So we're looking here at the degree to which the teachers in the consultation condition differ from the web condition in relation to poverty effects in the classroom, and here we see that the consultation condition is still showing this kind of gain at both poverty levels; those are not different from one another according to the different poverty levels. But here you see what happens in the web-only condition: all you're getting is access to the website, and you're teaching in a classroom in which all the kids are poor, and you see essentially a decline across the year; that's actually a significant slope. And you see the big gap, if you're teaching in a 100% poor classroom, between having had the consultation condition and not having had it. So you see some evidence here. Again, this is not randomly assigning kids, or teachers, to these kinds of circumstances, so this is really...
but adjusting for a variety of things going in, we're seeing some evidence that in high-demand classrooms, the more support you get, the more it seems to matter. Okay. When we look at effects of this kind of support on kid outcomes, here we are looking at all three groups: the consultation group, the web group, and the activities-only group. We find again that when teachers participate in the consultation, kids show greater gains on tests of early literacy. We also see that teachers who use those activities and lesson plans more are producing somewhat greater outcomes as well. This is just an SEM framework in which the pre-test in literacy predicts the post-test, and then we're looking at the degree to which these group contrasts are significantly different, and we find that the consultation condition is significantly improved over the materials condition. Small effects, very, very stable assessments over the course of the year. This is a treatment-on-the-treated analysis where we're looking at gains in these outcomes across the top over the course of the year. This is an HLM framework, so these are the kid characteristics here, and intervention usage, really the components, is at level two. Here we're seeing positive effects on a couple of different outcomes for the degree to which teachers were engaged in the consultation: more than 20 hours in contrast to zero hours, and more than 20 hours versus less than 20 hours. And here are the effects for greater use of the language and literacy activities. So there's some evidence here both that use of a curriculum matters and that getting coached and having some professional development support also matters for these kinds of outcomes that we're looking at.
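The contrast structure just described (post-test regressed on pre-test plus condition dummies) can be sketched with simulated data. The effect sizes, sample sizes, and residual noise below are invented for illustration, not the study's estimates, and this plain regression is only a stand-in for the SEM/HLM machinery the talk refers to.

```python
import random

random.seed(1)

# Simulate three conditions with hypothetical condition effects on the post-test.
def simulate(n, effect):
    rows = []
    for _ in range(n):
        pre = random.gauss(0, 1)
        post = 0.8 * pre + effect + random.gauss(0, 0.5)
        rows.append((pre, post))
    return rows

groups = {"materials": simulate(200, 0.00),
          "web": simulate(200, 0.05),
          "consultation": simulate(200, 0.20)}

# Design matrix: [intercept, pre-test, web dummy, consultation dummy]
X, y = [], []
for name, rows in groups.items():
    for pre, post in rows:
        X.append([1.0, pre, float(name == "web"), float(name == "consultation")])
        y.append(post)

def ols(X, y):
    """Solve the normal equations (X'X)b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for i in range(p):
        piv = A[i][i]
        A[i] = [v / piv for v in A[i]]
        b[i] /= piv
        for k in range(p):
            if k != i:
                f = A[k][i]
                A[k] = [vk - f * vi for vk, vi in zip(A[k], A[i])]
                b[k] -= f * b[i]
    return b

beta = ols(X, y)
print(f"consultation vs. materials contrast (pre-test adjusted): {beta[3]:.2f}")
```

The consultation dummy's coefficient is the pre-test-adjusted group contrast, which is the quantity the slide's comparison of conditions is about.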
And then this just comes out of that analysis, where we looked at whether there were interactions with teachers' experience, and we find here, for teachers grouped according to two, eight, and 14 years of experience, that participating in more hours of the consultation had a greater impact for the teachers with less experience; again, something we might expect given their possible need for coaching. Okay. So that gets us through an in-service kind of test of a set of supports: the consultation condition and the website condition. We finished that study wishing we had known a few things going in and had done certain things differently, and we wondered whether we could build a course, deliverable in a college context, that would produce changes in teachers' capacity to read cues, to know the CLASS, and to identify elements of effective interaction in their own and other teachers' behavior on video, in language and literacy contexts. So what we set about to do in this study was conduct a randomized controlled trial of the effects of two different interventions: one was this course that we developed, and one is the consultation. Teachers get these in a two-stage randomized trial where they're randomized into the course or no course, and then subsequently randomized into consultation or no consultation. This trial is actually being conducted right now. We've done an initial wave of course implementation and we are seeing some evidence; we haven't analyzed this completely and it's not a complete sample yet, but we're seeing changes in knowledge as we would expect. So the activities that we're running teachers through are changing them on some of the assessments of knowledge and cue detection that we would have hoped for. So, to conclude, and then we can have some questions.
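The two-stage design just described can be sketched as two independent randomizations over the same roster, yielding four cells (course/no course crossed with consultation/no consultation). The teacher IDs and the 50/50 splits are assumptions for the sketch, not the trial's actual allocation scheme.

```python
import random

random.seed(42)

# Hypothetical roster of 240 teachers (the talk's earlier trial size, reused
# here purely for illustration).
teachers = [f"t{i:03d}" for i in range(240)]

def treated_half(units):
    """Randomly assign half of the units to treatment."""
    pool = list(units)
    random.shuffle(pool)
    return set(pool[:len(pool) // 2])

course = treated_half(teachers)    # stage 1: course vs. no course
consult = treated_half(teachers)   # stage 2: consultation vs. none, independent of stage 1

# Tally the four resulting cells
cells = {}
for t in teachers:
    key = ("course" if t in course else "no course",
           "consultation" if t in consult else "no consultation")
    cells[key] = cells.get(key, 0) + 1

for key, n in sorted(cells.items()):
    print(key, n)   # four cells of roughly 60 teachers each
```

Because the two stages are independent, the design supports separate estimates of the course effect, the consultation effect, and their combination.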
So I think up to this point we're seeing some evidence that this approach to standardized observation of the interactions teachers engage in in classrooms may be feasible, reliable, and valid with respect to predicting children's learning gains. It may be a scalable language and/or lens for classroom settings. We see pretty good evidence for these three domains; I didn't go into this in much detail, but when we do the kind of work that would confirm this factor structure at each different grade level, we find good evidence of consistency and fit. So maybe it is the case that good teaching is good teaching is good teaching, whether you're teaching in a pre-K or in a fifth grade, and that these dimensions of interaction we're describing are applicable across those different grade levels, even though the behavioral indicators might change a little according to appropriate developmental definitions. We think there's some evidence as well that these observations could be used as a lever for research on teacher professional development and preparation that might actually lead to increases in the quality of what we observe out there and in kids' outcomes; remember what I said at the beginning about the frustration of what we're trying to do as we tackle this problem. Clearly, I think, if things continue to play out as they have up to this point, and we continue to subject these approaches to the kind of rigorous analysis we'd want as this scales up into larger systems, we could see these tools having implications for accountability systems, for definitions of teacher quality, and for research on teacher education. So we might even envision that these observations might become the target outcome for what a teacher education program would define as effective teaching.
So you might even envision a system in which, if you believe the evidence, you might not license a teacher until they were above a certain level, or you might use these observations in a teacher preparation program during the teacher's student teaching experience and say you have to get above a certain point or we won't pass you on to the next level of licensure. I mean, you could imagine some of these kinds of uses. Other people have imagined that you would use these kinds of metrics to assign teachers to certain professional development experiences and incent their participation in those. In the early childhood world, this idea of what we call quality rating and improvement systems is really beginning to take these kinds of metrics and put them into systems in which classrooms are regularly monitored and assigned a certain number of stars for the levels of quality they evince in those observations, and those stars then become publicized in certain ways so that people could purchase childcare with so many stars, or not so many stars, and the like. So I think there's a lot of research to be done there to know whether that's the right way to go, but I think the implication is that we could do this at least as much with observations of what teachers do with kids in classrooms as we do with standardized tests as the metric for whether a classroom is an effective place. We're seeing this currently used in a number of different teacher quality frameworks: the CLASS is going to be used as part of the Head Start monitoring system nationally, which gives me shivers sometimes in terms of how this is all going to work. We see, as I just mentioned, that Minnesota, Connecticut, Georgia, and some others are using this as part of their QRIS, quality rating and improvement systems.
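The star-assignment idea in those QRIS systems can be sketched as a simple threshold mapping from an observed quality score to a star tier. The 1-to-7 score range echoes CLASS-style rating scales, but the cut points and the five-star scheme below are invented; real QRIS systems set their own thresholds and combine multiple inputs, not a single observation score.

```python
# Map an observed quality score (hypothetical 1-7 scale) to a 1-5 star tier.
def star_rating(score, cuts=(2.5, 3.5, 4.5, 5.5)):
    """Return one star plus one more for each cut point the score clears."""
    return 1 + sum(score >= c for c in cuts)

# A few illustrative classrooms at different observed quality levels
for score in (2.0, 3.0, 4.0, 6.0):
    print(f"observed quality {score}: {star_rating(score)} star(s)")
```

The design question the talk raises is exactly where those cut points should sit and whether publicizing the resulting stars changes behavior in the intended direction.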
We and some other teacher preparation programs are starting to use these tools within the context of teacher prep, and again, if you can anchor a performance metric around some standardized, observationally based measure, you can begin to build and test experimentally whether certain preparation experiences are producing that outcome; that's essentially the notion there. And there have been some city school systems that have approached us, though we haven't done this work yet, that are considering these kinds of metrics in relation to tenure decisions and merit pay. This is what's happening, essentially, as we have seen this kind of thing filter up into the policy framework. So that's the conclusion of my talk, but I'm glad to answer any questions for the time that we have remaining. Yes? Micro-teaching approaches that teacher education used in the 1970s were left behind in favor of more constructivist or social constructivist ideas about learning. Can you explain how this approach is different from that in its attention to behavior, and also how we can avoid seeing tools like this used to very selectively guide teacher training, in the same way that we see teachers teaching to the tests that are used for No Child Left Behind? Okay, so let me answer the latter question before I get to the former one.
I would say that if there was strong and convincing evidence that what was being observed by this or any other observational tool, and there are other observational tools out there, was related to children's achievement, I would want teacher preparation programs to be training teachers to that test, because that would be a better indicator, the best and most defensible indicator I can think of, of what teacher behavior produces achievement. I would rather have that be the case than a case in which everybody does whatever they think is right. I think that's the problem we have, quite frankly, in teacher education, and probably why we don't see any evidence that teacher education matters a whole lot when you grind it around in large-scale studies. On the former question, some of the theoretical basis, how is this different? Well, first off, I think it's different from some of the constructivist approaches in the tendency to try to quantify certain dimensions of interactions, and in some sense it borrows from both. It borrows from the constructivist framework in that it pays attention to opportunity as it's created between the two of us as we interact with one another, a more molar kind of approach in which children's learning emerges out of that interaction; on the other hand, it says very clearly that teachers have a very direct and intentional role to play in learning. And then it borrows somewhat from the micro-teaching side in the sense of saying you've got to define something: if you want teachers to actually begin to do something, you've got to define it at a level of specificity such that you can see it and agree upon seeing it. So I think it trades between both of those spaces, but it's different from both. Other question? Yes: you showed that teacher sensitivity decreased over the course of the year with the web only; is that implying, then, that it's
actually worse for the teachers to have just access to the web than to have no access at all? I don't know. I mean, that was in the high-poverty classrooms, the 100% poverty classrooms, and that is a significant negative slope. I think what that's saying is that if you don't give teachers much of a resource, the demands of the kids are going to overwhelm the teacher's capacity to teach effectively in that classroom. That's my hypothesis, but it could just be that the web had a negative effect; we don't see that anywhere else, though. On average that was flat: what you saw before was those two groups combined, and you see a flat slope for the web group. Could I ask a quick question related to that? Sure. So the classrooms the teachers were seeing on the web: were those likely to be classrooms similar to their own? Yeah. So would they be classrooms with 100% high-poverty students, for example? Most of those were classrooms with mixtures of kids, but mostly drawn from classrooms where there were a reasonable number of poor kids, so these were teachers teaching in state-funded, at-risk kinds of pre-K programs. Yes?
Did you in some way determine the level of the feedback that the teachers were getting from the coaches? Oh yeah, good point. We had six different coaches interacting with these teachers, and all of those prompts that are written are stored; the great thing about doing this on the net is you have all of that information as data, already stored. So we had a weekly check on all of those prompts to see that they conformed to what we wanted those prompts to be, which was pulling out an explicit example of the teacher's behavior according to the dimension that the teacher and the consultant were working on. We worked very hard on that over the course of the year, and to the degree that it varied, and it did vary, we captured that as much as we could in implementation, and all of those effects are adjusted for the consultant. So we adjust out any consultant effects, and we do see some consultant effects; some consultants are better than other consultants. Yes? Interactions were treated independently of subject matter, and I wonder whether, in some sense, there's an implication of a kind of generic quality to those performances; I'd just like to hear your reactions to that idea. The second question, which is somewhat unrelated: with regard to the consultancy intervention, its positive effects were not altogether surprising, although they were pretty dramatic, but it seems like it might be very resource-intensive, so there's the question of scalability. Yeah, okay, good. On the content dimension: if you remember back to the context in which these measures were developed, and this was partly by accident and partly by intent, we needed to go into all these classrooms across the country, and we knew that the teachers were going to tell us they were teaching reading, or tell us they were teaching math, and we were going to see something else, and we also wanted to track kids across the day. So we designed them
by intent to be content-neutral, so as to be applicable across whatever content area the teacher was teaching. Theoretically speaking, what we're trying to do is capture aspects of co-regulation, if you will, between teacher and child that we believe, from various hypotheses, are important properties of behavioral interaction that promote learning engagement across content areas. The interesting questions, and I don't know the answers to these yet because these are studies we very much want to do... Honestly, we don't see any variation, on average, when we contrast the teaching of math with the teaching of literacy with the teaching of history on those kinds of dimensions that we observe. The interesting question, I think, would be: if you took somebody and trained them up very well in teaching a particular content area at a real level of depth, would you see some differences in what we're seeing there? Those are studies we'd be really interested in doing. I think you probably would see differences; I think you would see teachers more able to elaborate conceptually, for example, if they knew more about the content they were teaching. On the question about scalability: we calculated the costs of all of these interventions, including the cost of the consultation condition. I have one consultant, a full-time person, per 20 teachers, and when you cost that out, you're basically spending roughly $2,000 a teacher over the course of the year, and that's completely everything. If you look at Alan's work on what teacher professional development costs on the K-12 side, that's very much on the low end of what the average teacher gets for professional development, which usually ranges from about $1,700 up to about $7,500 per teacher in the studies he's reviewed. So we're on the low end of the scale with something that we're
demonstrating some effectiveness with, so it's an interesting point. Okay, let me go back here. Yes? An example of something shown to have positive effects on student outcomes is wait time: if a teacher uses wait time, then they can have positive effects, but when that gets translated into what you teach a teacher to do related to wait time, you run into trouble, because what makes it a productive practice is knowing when to use it and why. So my question is: how did you translate the behaviors that you found to be connected to student outcomes into the course and the consulting? There has to be an underlying reason for why and how that connection occurs, and, one more piece, sorry, in relation to how you teach that to a teacher, there has to be some sort of underlying theory for how to learn to teach it. Okay. So why wait time works sometimes and not other times, or why any discrete behavior works sometimes and not other times: where the action is, is in the contingencies. It's in the degree to which that wait time is an appropriate response in that circumstance and enables the kids to stay engaged. That's why we define everything in terms of the behavioral response to kids' cues: does the kid's response to the teacher's response indicate a greater or a consistent degree of engagement? That's why we don't look at very discrete behaviors; we're making ratings on dimensions of interaction that reflect those contingencies. So the notion of contingency is built into the very definitions of what we're rating and measuring and of what teachers are trained to do. Teachers are trained to look at how that teacher responded and what the effect of that behavior was in response to that cue. Sensitivity is defined fundamentally in that kind of way; quality of feedback is fundamentally about feedback. So this notion of contingencies, and that your behavior
in a sense has meaning, if you will, or value, or opportunity for kids, only insofar as it is an appropriate response to that particular situation in relation to this larger dimension of interaction we're paying attention to. And you can isolate very clear examples of those contingencies in videotape; you can show them to teachers, and you can see what the kid does when you don't respond, what the kid does when you do respond, and what the kid does if you miss the target. That's what you train teachers to do; that's the skill we're trying to develop in them. Over here, yes? Do these scales seem to you to be culturally relevant or neutral, so that the way teacher sensitivity is both conveyed by a teacher and received by a child would apply across cultural and ethnic groupings? So with all of these scales, you have to pay attention to whether there would be any differences. We took the stance ahead of time that if you define sensitivity in terms of timely responsiveness to a child's cues that conveys comfort, support, and respect, then you're watching the child's response, so a teacher can't get coded as sensitive if she displays something that's culturally insensitive. If the child were to react negatively to a behavior, no matter how positive that behavior might look to you and me and feel to the teacher, it's not sensitive if the child rejects that behavior somehow or it leads to a level of disengagement. So what we're looking for are broader, more molar properties of interaction that reflect this notion of co-regulation, which we felt at the beginning there was reason to believe you could describe in ways that were culturally neutral. Now, what we found is that when we look at ethnicity of the kids, income background of kids, language background of kids, and similar information that we have in relation to teachers, we simply don't see any evidence of interactions, statistically speaking, between
those features and kids' learning gains. So the scales operate the same for all of those different groups, and, again, I think the other thing is that there is as much heterogeneity within those groups as there is between them. Quite frankly, I think we focus far too much on the potential for differences, when we could probably get a lot of purchase in improving outcomes for kids if we didn't spend as much time focusing on those differences, while at the same time recognizing that teachers do need to be responsive in ways that reflect their attention to differences and where the kids may be coming from. We do actually find big interactions for kids who come in with, and this is sort of the one about the kindergarten adjustment problems, but we've done some prior work on coding kids' temperament. Kids that are very bold and boisterous are very likely to be rated as extraordinarily inattentive by their teacher, and observed to be very inattentive by independent observers, in classrooms where teachers have low levels of sensitivity; when teachers have high levels of sensitivity, those kids look just like the most shy kids, who pay an extraordinary amount of attention to the teacher in the classroom. So there are some characteristics of kids that do matter in relation to teachers' interactions, but it has more to do with other, I think more salient, features of what the child's bringing to the interaction. Brian? In an earlier conversation you mentioned some work you're doing looking at similar professional development in middle and high school; I'd like you to say a minute or two about that and what that is. This study that I just described about the coaching in pre-K, we're basically replicating in middle and high schools with early career teachers.
It's been really interesting to do that. We find that teachers, quite honestly, have been a little more compliant with the intervention, because I think they're a little bit more used to the tech dimensions of it; they're used to going to the web and doing the kinds of things that they have to do on the web. We're seeing early indications from that study that kids whose teachers are assigned to the consultation condition are at least reporting those teachers to be more engaging and more supportive, and the kids are describing higher levels of motivation to work for that teacher. That was part of what we hypothesized. We haven't yet done the achievement analyses, and in that study we're going to be using the state standards tests, whereas here we're using sort of omnibus achievement tests that aren't even tied to the standards the teachers are teaching, and still seeing some effects. So I think we may see bigger effects on achievement, but I don't know. Okay, one more. Go ahead.

[Audience member:] How do you capture engagement on the part of the kid?

Yeah, this is very hard to do observationally. It's easier to do on those video clips, because you can actually isolate a kid and draw attention to him or her that way. But we include in the package of ratings a rating of average student engagement in the classroom; that's one of the elements that we use. Then, when we're looking at a more proximal level at individual kids, to try to determine classroom effects on individual kids, we're doing the best we can to capture active forms of engagement: body posture showing the kid is leaning toward the teacher, the kind of participation and effort you'll see reflected in kids' movement and the orientation of their head toward things. One of the things I'm very interested in doing, and I was talking to somebody about this earlier, Jackie Eccles and I were talking about this, and Brian, Kevin is doing this work as well, is to try to capture at a
much more refined level some perhaps psychophysiological proxies of engagement, so heart rate, eye gaze, things like that, that we might be able to capture, to see if we can do some mining of data around that. So we're trying to develop better measures of engagement.

Thank you all very much. This was a lot of fun.