about CSR, the Center for Scientific Review, and also give you all a chance to have a conversation with him. So, Richard. Thank you very much, Eric. It's indeed a pleasure to come and present to you an update on the Center for Scientific Review. It isn't very often that the President speaks about peer review, so we had to feature that in a talk to the National Academy of Sciences; in response to some recent concern about peer review expressed in Congress, I think President Obama very nicely defended the notion of peer review as important for our ability to have the top science. That was very much appreciated. Let me tell you a little about the CSR mission, which is to see that NIH grant applications receive fair, independent, expert, and timely reviews, free from inappropriate influences, so NIH can fund the most promising research. In 2012, CSR reviewed almost 55 percent of NHGRI's grant applications, a total of 286, which is about the workload of one of our study sections, although they are distributed across somewhat more than one; we have 174 standing study sections. I'll go quickly over the path of applications through CSR, because some points are a little obscure, though most will be familiar to you as council. All NIH extramural grant applications come through CSR: we receive every NIH application, and they are referred from CSR to NIH institutes and centers and into scientific review groups, or SRGs. We review the majority, about 65 percent, of grant applications for scientific merit for NIH; the other 35 percent or so are reviewed within the institutes, and in the case of NHGRI, a slightly larger fraction. We have a somewhat unusual peer review process in that applicants, the PIs, send in applications either on their own initiative or in response to funding announcements. There is peer review either at CSR or at the institutes. Applications go through study sections, where they are scored, and then they are percentiled. As you know, percentiling normalizes the output of all of our study sections. That's an important step, and one that is increasingly being discussed at NIH. At the IC and in front of council, strategic goals are applied to the decisions about which awards to make and fund. Funding itself is a more difficult activity these days because of the low number of awards and the low proportion of dollars available for that number of awards. Finally, we expect research to be the outcome, progress to be represented by publications, and we hope these will ultimately benefit the public health. Within the center, 85,000 applications are received each year, of which the center itself reviews 58,000. This involves 16,000 reviewers, over 230 scientific review officers, and almost 1,500 review meetings a year, so this is really a factory of review. We like to think we do it well and efficiently. This graphic is an overwhelming fact of life both for CSR and for all the institutes these days. As we all know, especially since the doubling of the budget, the success rate has been dropping; it is now at historic lows. Of the last couple of points on this graph, the last one is wrong: it should be flat over the 2011-2012 period at 18%. However, an 18% overall success rate conceals an important issue: the odds of any given application resulting in an award have fallen to approximately 10%.
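Since percentiling, mentioned above, is the step that makes study sections comparable, here is a minimal sketch of the idea in Python. This is an illustration under simplified assumptions, not CSR's exact algorithm: the function name and the base lists are hypothetical, and the actual NIH base pools a study section's scores over the current and two previous review rounds.

```python
def percentile(score, base_scores):
    """Convert an overall impact score to a percentile against a base of
    scores. Lower impact scores are better, so the percentile is roughly
    the share of the base that scored as well as or better than this one."""
    n = len(base_scores)
    rank = sum(1 for s in base_scores if s <= score)  # ties count against you
    return round(100 * (rank - 0.5) / n)

# Two hypothetical study sections, one harsh and one lenient, can give
# the same percentile to very different raw impact scores:
harsh   = [20, 25, 30, 35, 40, 45, 50, 55, 60, 70]
lenient = [10, 12, 15, 18, 20, 22, 25, 30, 35, 40]
print(percentile(30, harsh))    # 25 -- 25th percentile in the harsh section
print(percentile(15, lenient))  # 25 -- same percentile in the lenient section
```

This is why percentiling normalizes study section output: the same relative standing maps to the same percentile, whatever the section's raw scoring habits.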
An award probability of 10% essentially means that, of our primary constituency, the scientific community, 90% are inherently unhappy with CSR at any given time, which caused me second thoughts about accepting this position. However, I think the institute directors and I all believe that this is a critical role, very important for the future of science in the United States, and that we must all step up to the plate when invited to serve. This applies not only to the institute directors but also to those who serve on review itself; they too suffer the consequences of this curve, not only in their personal activities and in their labs, but as acquaintances and friends of many PIs who are having a hard time. Some review issues that we faced early in 2013 look like this. One critical thing was grade inflation: we were seeing quite a bit of score inflation, especially since Enhancing Peer Review started and the new scoring system came in. As you know, the scoring system went from 1.0 to 5.0, or 100 to 500, to 1 through 9. This slide is an examination of how scores worked out to percentiles. It's hard to see with the pointer, but down at the bottom is October 2009, and you can see the distribution of scores corresponding to percentiles of 20, 25, and 30. As you may know, it's rare that applications with percentile scores above 20 can be considered for an award. So essentially a score of 7 corresponded to a percentile of 20, a score of 13 to a percentile of 25, and a 19 to a percentile of 30. That essentially means that only two digits of the 1 through 9 scale were available for differentiating applications within the award range. This got compressed over time, so that towards the end a 9 corresponded to a percentile of 20 and a 15 to a percentile of 25. This year, we implemented some changes in order to decompress the scores, and this slide indicates we've gone back to better decompression than in 2009; we're not finished yet, but now we have a full two digits available to us and in some cases a third. I'll show you a little of how that works. The scoring chart was developed in 2009, and we changed it slightly for the beginning of 2013. This is felt to be a tweak, but an important tweak, because one practice that had developed in a number of review committees was treating strengths and weaknesses as inverses. As you can see here, a score of 1 is described as exceptionally strong with essentially no weaknesses, whereas a 9 is very few strengths and numerous major weaknesses. We started to hear some reviewers say, well, this has no weaknesses, therefore I'm giving it a one. That was a severe distortion of the original concept of Enhancing Peer Review, in which the significance and strengths of the application were supposed to be the score-driving features, not a count of weaknesses. This wasn't universally true, but it was true enough that we felt we had to make a change, and so we created this new scoring chart. It actually looks more different from the old chart than strictly necessary, largely to impress upon our review committees that the chart had changed. There's an emphasis on overall impact, and an emphasis that you cannot get into the high category, that is, a score of one, two, or three, unless the application has major importance. You can land in the low category even with moderate to high importance if there are major weaknesses.
The other point on this chart is anchoring the scoring at 5, which is the item in dark gray at the bottom. It turns out that a number of reviewers were thinking of a 5, based on the old 100 to 500 system, as an extremely bad score and were reluctant to use it, so they were essentially compressing the range from one to five. Making this point helped them spread scores more. And here is the outcome of the first round of decompression. The pink line is the most recent. It looks like a relatively small change, but it crosses the 50% line of overall scores at a score of 59, a significant change from the old distribution, which crossed that line at 49. More important is what happens in the 10 to 30 range: there has been about a 30% shift in scores from the yellow line to the pink line. It's important to note the jump that occurs at 20; that jump is an accumulation of scores at the merit score of 20. And that kind of jump falls within the gray zone of many institutes and therefore provides program staff with relatively little information; we're working on that particular issue. This next slide is relatively rare; in fact, I don't know of any other presentation of the preliminary impact scores, the scores we get before the discussion. You can see that it's a skewed normal distribution, skewed toward the positive end of the scale, but it does indicate a normal distribution, which suggests that our use of the percentile score as a translation of these impact scores is not very helpful or accurate. The peak is between 30 and 40, that is, absolute preliminary scores of 3 and 4. This means that half of the applications are squeezed within this range, which is not very helpful to you or to program staff. After discussion, it gets a little better: the non-discussed applications, on the right, are 46 percent of the applications, and there is a re-spreading as the discussion spreads scores a bit. Here the peak is still between 30 and 40, but at this point a quarter of the applications are below it, rather than half. This provides a little more scoring range to work with and interpret for program staff. However, here's another look at how your scores from reviews are coming out. You can see there are huge jumps at 10, 20, 30, 40, and so on; those labels are on their side, so they're a little harder to read. So there really is massive agreement around round-number scores, and having that kind of agreement in review is not very helpful. Figuring out how to spread off of 10, 20, and 30 in particular would be helpful. Most review committees report that they can differentiate more finely than their scores actually reflect. So there remains a problem with the scoring system, and we are having a discussion of ranking, of going to a more frank ranking system, to try to address this issue. Another suggestion made by a number of reviewers is the opportunity to give a half point to create more differentiation. However, when we look back at the old 100 to 500 scale, you see a similar kind of peaking.
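To make those spikes concrete: the final overall impact score is, in essence, the mean of the eligible members' 1-to-9 scores multiplied by 10 and rounded, so a panel that converges on a single integer necessarily lands exactly on a multiple of 10. Below is a minimal sketch of that mechanic and of quantifying the resulting clustering; the vote lists and function names are hypothetical, and this is a simplified model rather than the official computation.

```python
from collections import Counter

def impact_score(votes):
    """Panel mean on the 1-9 scale, times 10, rounded: a simplified
    model of how the final overall impact score is produced."""
    return round(10 * sum(votes) / len(votes))

def spike_report(final_scores):
    """Show how much of the score mass sits exactly on multiples of 10,
    where full-panel consensus on a single integer necessarily lands."""
    counts = Counter(final_scores)
    for s in sorted(counts):
        flag = "  <- multiple of 10" if s % 10 == 0 else ""
        print(f"score {s:3d}: {counts[s]}{flag}")

print(impact_score([2] * 20))            # full consensus on 2 -> exactly 20
print(impact_score([2] * 18 + [3] * 2))  # two dissenting 3s -> only 21

spike_report([18, 20, 20, 20, 22, 28, 30, 30, 30, 30, 33, 40, 40, 41])
```

The point of the sketch is that with roughly 20 independent votes, exact multiples of 10 should be rare; heavy spikes there are a signature of consensus voting rather than independent judgment.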
So the score compression we had under the old system, and the inclination to agree on a single score that overlaps with other applications' scores, seem to be a chronic problem, and we need another way of dealing with it. Now, diversity and fairness in peer review. As you can imagine, fairness is the key goal of peer review and needs to be its hallmark. In 2011, as I was coming on board, the Ginther et al. article in Science came out, showing that although applications with strong priority scores were equally likely to be funded regardless of race, African-American applicants were 10 percentage points less likely to receive NIH research funding than whites. Ten percentage points sounds fairly innocuous or small in some sense, though it was highly significant; but if you compare the African-American scientists who applied to NIH with a matched control group of white scientists, the African-American scientists were getting 55 percent of the awards expected for the white scientists. So that 10 percentage point difference translated into a huge difference in likelihood of success. Suggested explanations in that paper were the possibility of bias in peer review and a cumulative disadvantage that African-American scientists could experience based on differences in education and other forms of bias over the course of their careers. NIH felt that this issue was extremely important, because NIH believes it should represent fairness for all groups. And the size of this difference got the attention of Francis Collins and Larry Tabak, the director and deputy director of NIH, to the extent that they immediately formed a set of internal committees, raised the issue to the level of the Advisory Committee to the Director of NIH, asked that immediate action be taken, and asked that both CSR and other groups at NIH get to the bottom of the cause of this disparity. We accept that whatever is going on happens before award decisions are made: the impact score for applications entirely determines the award rate, so if there is any problem in the NIH system, it must be occurring within the peer review system. It remains possible that there's a difference in the applications coming in. The advisory committee asked that a peer review subcommittee be set up; that's been done, and I'm a co-chair of that committee. It asked that we provide more information for applicants whose outcome is non-discussed; African Americans have many more applications not discussed than other scientists, and that's been done. It asked for text analysis of application summary statements and discussions, for an evaluation of anonymized applications, and for diversity awareness training of NIH staff. All of those things are being worked on, and we're hoping to do this as true experimental science, so that we can know the causes of these disparities. This is the initial membership of the working group on diversity's subcommittee on peer review: mainly social scientists, to look at this problem. We're also bringing on board other scientists with strong records on peer review, and perhaps more biologically oriented scientists.
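To make the arithmetic behind that 55 percent figure concrete, here is an illustrative back-calculation of my own, not numbers taken from the paper itself: let $r$ be the baseline award rate of the white comparison group, and let the African-American rate be $r - 0.10$.

$$\frac{r - 0.10}{r} = 0.55 \quad\Longrightarrow\quad 0.45\,r = 0.10 \quad\Longrightarrow\quad r \approx 0.22$$

So a baseline award rate around 22 percent, minus an absolute gap of 10 percentage points, leaves roughly a 12 percent rate, which is about 55 percent of the expected awards. A modest-sounding absolute gap becomes a large relative one precisely because the baseline rate is itself low.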
We have also increased the representation of minority groups on our study sections. Just since I've come on board, we've increased African Americans on CSR study sections by 42 percent and Hispanic scientists by 22 percent; I show you those in highlight. But if you look at the rest of the chart, you can see that from 2006 to 2011, the numbers had actually been declining across the board, so I can't claim more than that we've brought the numbers back to a proportion of about 10 percent right now. This is about double the representation of underrepresented minority scientists in the award pool of NIH. We've also created an Early Career Reviewer program to help early career scientists who are just beginning life as independent researchers understand more about the Center for Scientific Review and about review. The goals are to train qualified scientists without significant review experience, to help emerging researchers advance their careers, and to enrich the existing pool of NIH reviewers by including scientists from less research-intensive institutions. The requirements for being an early career reviewer are: not having reviewed for NIH before, beyond one mail review; a faculty appointment, with the university telling us the individual is expected to become an independent investigator; an active, independent research program; and not having had an R01 or equivalent. There is no more than one of these individuals per review panel, and they are given a very light review load as a tertiary reviewer, between two and four applications, so their primary job is to look and learn. They are under the wing of the SRO and the chair of the panel. So far nearly 700 have served on study sections, and of those, 32% are underrepresented minority scientists; but these include scientists from all universities who are considered independent. Feedback so far has been very positive: 98% found the ECR experience useful, 90% reported themselves to be in a better position to write their own grants, and 97% would recommend the ECR experience to a colleague. We have not been getting negative comments from reviewers, and the experience of the SROs and the chairs has been very positive. For your own information, this is how to apply to the ECR program: write to CSR Early Career Reviewer, all one word, at mail.nih.gov. You'll be given copies of my presentation if you don't want to note this. CSR is also looking at additional review platforms. As part of a general effort to ensure that we have appropriate review platforms for different circumstances, we're trying different kinds, such as telephone-assisted meetings, video-assisted discussions, internet-assisted meetings, and telepresence meetings; these are various forms of phone, video, or completely asynchronous electronic review. We're also looking at editorial-board-style meetings. We're examining the strengths and weaknesses of these various forms, what kinds of reviews they are best for, and people's impressions of them compared to face-to-face, our standard form of review. There's a lot of interest in editorial-board-style meetings, since these have been used for the director's special application program; it is felt that might be a style for the future, but it is more expensive than regular face-to-face review. Here's an example of a video-conference-based study section, similar to telepresence, where there are reviewers on both sides of the room: the ones in the back are on video, and the ones in the front are live.
The ones in the back appear nearly full size, so it's possible to see their expressions as they conduct their reviews and to see whether or not they're paying attention. One of the nice features is that the microphones are directional, so the voice appears to come from the person who is speaking; it's a very nice feature of this setup. Now I'm going to shift to an issue that has been bothering many scientists and has been a source of frequent complaints to CSR: the A2 application. This is the graphic that convinced the NIH directors and NIH that we should address the A2 issue. Here we see that awards to A0s started at about 60% in 1998 and dropped to the lowest proportion of awards by 2008. A2 awards had risen past the A0s, and from the intersection that seemed to be looming, it appeared likely they would pass the A1s in the proportion of awards as well. After the A2 was eliminated in 2008, some applications were grandfathered, but as the grandfathering ran out, those curves went down. The A0 rebounded strongly, as we had hoped, and passed the A1. This was the latest result, from 2011; we're waiting for 2012, and I'm very concerned that the A1 may intersect with the A0. What these lines suggested, and what we heard in the actual conversations that went on in review panels, was that a queue was being set up: in many cases, people were expected to wait until their A2 application before receiving an award. It was very easy for a committee to pick out a few applications they wanted to see get an award; these became more urgent at the A1 stage and then at the A2 level, and it became a strong temptation to line up PIs in order of their application stage. When the A2 was eliminated, the time to funding dropped quite a bit, from a little over 90 weeks to a little over 50 weeks on average. Given these two things, and the lack of difference between new investigators and established investigators, the Office of Extramural Research determined, through a lot of statistics, that there were no groups substantially more affected by the loss of the A2, and that the time to award went down dramatically, so it was felt that eliminating the A2 was the right decision. The order of scoring seemed to hold across this period; there were no dramatic changes occurring, the queue was just delaying when the award would occur. I want to talk a little about the future, and then I'll take some of your comments. One way in which we think we can improve the outputs of CSR scoring is to look at the distribution of applications across study sections. The current distribution is quite non-random. First, we base distribution on areas of science. Second, we base it on PI preference: PIs get to ask for the study section they'd like, we honor those requests 80% of the time, and 75% of scientists ask for a specific review committee. This means, and it is the observation of many people, that there may be a non-random distribution of the highest-quality applications. If that's true, and we normalize the output of all study sections, then in council and in other places you do not necessarily get to look at the best applications across all applications; you're just looking at the best applications from each study section. We are looking at how that works, and we are trying to establish statistics on whether or not that widely held impression is real.
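One way to put statistics behind that impression, sketched below under assumptions of my own (the rankings are hypothetical data, not CSR's actual protocol): re-rank a pooled set of applications drawn from several study sections and measure agreement with each section's original ranking, for example with a Spearman rank correlation.

```python
from scipy.stats import spearmanr

# Hypothetical data: the rank each application received in its home
# study section versus its rank from a pooled, cross-section re-review.
home_rank   = [1, 2, 3, 4, 5, 6, 7, 8]
pooled_rank = [2, 1, 5, 3, 4, 8, 6, 7]

rho, p = spearmanr(home_rank, pooled_rank)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
# rho near 1 would mean home-section ranks already predict the pooled
# ranking; consistently lower agreement for some sections would argue
# for looking across study sections in a more systematic way.
```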
We are going to be scoring, or re-ranking, applications from a broad set of study sections to see what the relationship of that pooled ranking is to the individual study sections' rankings. If there are consistent differences across study sections, that will lead us to ask how we can look more generally across study sections in a systematic way and provide you with that information. We're trying to develop better tools for applicants, for referral, and for review; this is part of a general effort to make CSR more user-friendly for applicants and to make the application intake system more helpful. We are also trying to increase diversity and reduce award disparities, obviously. In general, we'd like to provide better service to applicants and to the ICs, and we're trying to have more discussions with program staff to see what we can do to make our summary statements as helpful as possible. Finally, we're trying to develop a science of peer review. We've established an office, with money, to do experiments in peer review; I've alluded to some of those ideas earlier in this talk. But I think if we are to keep the U.S. lead, especially in this time of difficult funding, we've got to figure out ways to make it easier to find the best applications, make awards to those, and provide you, through CSR, with the best information for making those decisions. Thank you very much. Thank you, Richard. We have time for questions. Let me start with the first one. For this, I forgot the exact three letters, the training one, where you take youngsters and basically give us... The Early Career Reviewer program, ECR. What's the curriculum that you give them? Is it purely just by showing up, or do you have materials or mock study sections or other things one could imagine you could do? We don't do mock study sections. What we do is give each ECR a PowerPoint set plus training from the SRO, and then they're given guidance; they can't attend more than two full peer review sessions. So far, the reaction from the SROs and from the chairs is that this has been sufficient. In general, the ECRs take this very seriously; they work very hard on the reviews that they do, and they have not caused embarrassment by lack of knowledge. Yeah, so I just wanted to share some thoughts and get your impression about the set of slides you showed on the distribution of review scores. At some level, one is assuming that the quality of these proposals and the science being proposed is normally distributed; but it's not a random population of proposers, so that assumption may be fallacious at some level. Also, in my own personal experience on a study section, there is a tension between doing a review in absolute versus relative terms. On the particular study section on which I served for, say, four years, at the beginning of my tenure the overall quality of the proposals was just not very strong in absolute terms; that's not to say there weren't some good proposals. And then we were sort of forced, our SRO kept saying, your mean is much lower than all the other study sections', and this refers to what you were talking about just at the end. On the other hand, does it make sense, if for some reason a particular batch of proposals is in general not strong, to rate them on this sort of relative scale?
On the other hand, if a particular batch of proposals is all very, very strong, you're naturally going to get that compression, at either the top or the bottom. So I'm just wondering what recognition there is of this. When you look at statistics, statistics can tell you one thing, but if you don't look at the priors, you may be led to a slightly false direction or strategy. I'm just curious to hear your thoughts. Yes; we're not planning on any one given action. Right now, we observe that we have a problem, that we've long had this problem, that different efforts to address it have systematically failed, and that there's this broader issue of what's going on across study sections, which we've never addressed. So we're planning to have some meetings with individuals who are experts in decision theory. Obviously, if this were easy, both corporations and NIH would have solved it a long time ago; this is a deep, difficult problem. So far, we remain with the idea that strong, excellent scientists are the best judges of science. Beyond that, how to get the maximal information from them, either as a committee or as individuals, is something I think there are many open questions about, and one of the things we should be doing is exploring how we get to answers. I'm not proposing, and don't plan to propose, 'We know the answer; here's what we're going to do.' What we'd like to do is say, here is a theoretically better way of approaching this; let's try it with some study sections, and maybe within an IRG. And I think we're going to have continued discussions with the scientific community about any changes we're considering. I was interested in your actions related to the diversity issue in application success rates. There has been research showing that there is gender and racial bias in the review of grants. Given that knowledge, why isn't there something addressing that with the review panelists themselves? They bring their own societal stereotypes with them. It's nice to have diversity awareness training for NIH staff, but the staff aren't the ones producing the review scores and the funding decisions. It is the intent of the Advisory Committee to the Director to address that issue with the reviewers if we determine that there is bias in the peer review system. Right now, we're running some experiments that we hope will show not only whether there are differences in the quality of applications, but also whether there are differences attributable to bias, and the proportions each might account for. I think we need both pieces of information to develop the right intervention. So if it turns out that there is bias in the peer review system, the intent is to train reviewers. The ACD asked us to take any validated system for countering bias and apply it first to NIH staff and then to the reviewers. However, there is no such validated system, so the first thing we're going to do is see if there is any system that can make a difference with NIH staff; there are about 500 staff we could apply this to, so the numbers are reasonably large. I was very taken by the figure that you had with the spikes in the priority scores.
Can you give us a sense of what causes that underlying phenomenon, the degree to which the group dynamics of the study section congeal scores, and what that might be telling you about the underlying review process? There's no question that there is a strong temptation, once general agreement seems to exist on a review committee that an application is in the award domain. Remember, we initially group applications based on preliminary scores, so the committee knows when the first few reviews are in the possible award range. It looks like there's simply a strong group temptation to agree on a score and, in many cases, have everyone vote it, and it's mainly those who vote outside the range who cause differentiation among the scores. That's a tough position for individuals to take on many committees. So we think it's a serious problem. The scientists themselves can differentiate better than the committee scores reflect, but we haven't figured out a way to get that expression; this is one of the reasons for the discussion of ranking systems. So sorry, is that because, say, the high and the low are 2.3 and 2.5, and so everyone has to vote in that range? No; remember that the primary reviewers give scores of one, two, or three, so they only have those three digits to work with. A one is heavily discouraged by the system, so you're really talking about twos and threes, and reviewers know that to stay out of the gray zone a score has to be below two. Given that, committees easily get stuck on a consensus. So, right, it used to be that you could give a decimal; now you cannot, so everyone votes two. And how many scores go into a priority score? The overall priority score is usually based on about 20 scores generated around the assigned reviewers' scores. So even with 20 scores, you're still getting these spikes? Yes; it's actually hard to gerrymander three reviewers' scores into those spikes by accident, and you would have to work really hard to get the system to end up there, so we have a problem. I wanted to follow up a little on the idea that Jill was introducing earlier: what is the assumption, and what might be the implications, of trying to standardize things across study sections? In particular, you made a comment that it's rare that scores above 20 can be considered for an award. I'm thinking of the societal and ethical issues in research study section that was formed to bring the research ethics and the LC together. Traditionally, I think it's well recognized that those scores have always been high, in the sense of bad, remarkably so compared to other study sections; that's just the way that study section scores. So I guess my question is, how would you handle something like that? I understand what you said earlier, that you don't have a solution yet and are looking into it. But as a related question: when you've created new study sections, and presumably some groups got merged into them, have you looked at the experience of the disciplines or groups that were merged into a new study section? Have you traced at all how those applications have fared? Is that something you have done at the study section level, and how would you handle it going forward? We do some tracing. Most of the tracing is based on concerns or complaints by subsections of committees.
So whenever we hear a subgroup complain about the outcomes of mergers, et cetera, we do look at that kind of issue. But I'll remind you that today, because of the huge drop in awards, every group is saying, we've been singled out for loss of funding; and when we look, we very rarely see that difference statistically. Everyone is losing money these days, and we're not picking on any one group, but everyone seems to have the impression that they must be picked on because so many of their compatriots are losing funding. So, I'm glad you're taking a science-based approach and trying to use actual data; that's wonderful. One of the things with Internet Assisted Review is that at least some level of the scoring is locked in prior to the actual in-person meeting. I'm wondering if you've looked at that data, the pre and post, to see how well the scoring in isolation reflects what happens after the group dynamics. And really where I'm going is: can we get to the point where we don't have to have in-person meetings at all, even though I really enjoy them? We are asking the question: what does in-person contribute? Is there a socialization process that's important in peer review? I can't tell you what the answer is. There's a lot of myth in peer review, obviously, as well as possibilities for science, and we want to explore this systematically. Some of the methods allow us to determine, in the long run, whether or not electronic review is the equivalent of face-to-face review. Richard, do you actually have funding to carry out controlled experiments? Yes, in two ways. I have been allowed to keep a small amount of money we've saved through electronic review, and we've also been given extra money by NIH to carry out the experiments and studies on bias. And I think if we get results that help clarify what we need to do to improve the quality of peer review, the institutes and Francis will give us money for that. I think everyone agrees, and corporate America certainly believes, that you need resources and experiments like that in order to improve quality, and we hope we will be able to do that ourselves. I just want to comment that I do think the social aspects have a big impact. I was on a review pretty recently, and the scores before we all met together were significantly different from when we were together. There were good reasons and bad reasons for that. The good ones were, maybe someone could clarify a technical issue that someone else didn't understand. But it also comes down to who was better at persuading the rest of the group to go one way or the other, or, if we don't score this well enough, it's not going to get funded. And there were so many statements in there that could have gone the other way so easily. So I do think it's very important to have these decision-making experts to advise you. I agree that there are lots of different dynamics going on that we have to understand. We are doing some studies of the shift in scoring that occurs with discussion and of the scoring differences that occur on different committees; some committees basically stay with the original scores, and some committees shift quite a bit.
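A minimal sketch of one way such a shift study could be quantified, assuming only that a committee's preliminary and post-discussion scores are available as paired lists (the data and the metric here are my illustration, not CSR's actual analysis):

```python
from statistics import mean

def discussion_shift(prelim, final):
    """Mean absolute change between preliminary and post-discussion
    scores for one committee's applications: a simple shift metric."""
    return mean(abs(f - p) for p, f in zip(prelim, final))

# Hypothetical committees: one anchors on the preliminary scores,
# one shifts substantially during discussion.
anchored = discussion_shift([2, 3, 5, 4, 6], [2, 3, 5, 4, 6])
shifting = discussion_shift([2, 3, 5, 4, 6], [1, 4, 3, 5, 7])
print(anchored, shifting)  # 0.0 vs 1.2
```

Comparing this metric across committees, and against the eventual funding outcomes, is one way to ask whether the discussion phase adds signal or just noise.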
And it's up to you, the staff of the Institute and of Council, to make decisions about what's important based on the mission of the Institute; you can ignore the scores. It's difficult to ignore, let's say, a non-discussed application, but within the scored range you can. I think the general expectation is that relevance to the Institute's mission is very important. So have there been any experiments where you send some subset of applications to multiple study sections, to compare what the outcome would be for the exact same application going through the system at the same time? We are discussing the possibility of reviewing some subset of applications twice, to get the reliability and validity estimates that we need for many of our other studies. You might understand that that makes a lot of people very nervous, and I have not yet gotten the go-ahead to do those studies. But we think it is important to develop some power estimates for our system. My guess would be that there's going to be high concordance on the things that are not discussed, right? There's a bunch of things that ultimately no study section would fund. And there are going to be a couple of things that would score in the 10s no matter what study section you sent them to. And then there should be a whole 20 to 30% that's going to be all over the place, and depending on who you send it to, would or would not have gotten scored. I completely agree. I think most of us believe that the peer review system, as it's currently designed, works well when success rates are 30 to 40 percent, and not so well when success rates are where they are now. If that's true, then that's important to know, so we could communicate it to policymakers: when you draw down the funds, you're drawing the payline into a percentile range where, sure, you're funding good science, but there's a ton of great science that gets left on the table. And the only way to really ensure that we're funding all the best science, and to keep U.S. competitiveness, is to figure out how to drive the funding range back up to a 25 or 30% rate. I agree that current funding is catastrophic. Richard, I wonder if you have compared notes across the peer review mechanisms used by different agencies, and whether they're seeing the same kinds of issues you're seeing. I've served on a number of panels at DOE and NSF, and they do things quite differently, not better or worse; but if you look at their numbers, do you see the same trends? All the agencies that I'm aware of in the United States are having a harder time with funding and are seeing a loss in science. At the same time, we're watching related agencies in other countries. Last year, I was in South Korea presenting on peer review, and they were extremely interested; they were frankly feeling that we were giving them an opportunity to emerge as a first-class science country. The representative from their government came to report on their 2013 budget, and he was deeply apologetic about the funding he was going to announce: it was only an increase of 5%. They have concluded that American success is built on science and technology, and on federal funding of science and technology, and they're all delighted, in some sense, that we've given them the opportunity to compete. So, just one quick comment and then another question. Didi was talking about some issues around review and grants that are not discussed.
I just want to point out that, the way we order grants for discussion now, if two out of three reviewers score a grant well and there's one wacko review, or sometimes it works the other way, that grant, according to the ordering, will not be discussed. In the old days, these kinds of outlier-scored grants would also be discussed, so I just want to suggest that we might think about how to go back to that. But the real thing I wanted to mention is that I agree with you: it's really important that we get the best scientists to review grants. One of the pressures on those of us who have done that service is that the review panels often come very close to when we have to submit our own grants. And while there is this quasi-policy that you get some dispensation for that, that your grant submissions can be late, it does not apply to all the grants you would submit; if you're responding to an RFA or a PAR, you don't get any dispensation. Many of us who are experienced reviewers, and I like to think that those of us sitting around this table are reasonable scientists, therefore have this additional pressure: we may have 10 grants of other people's to review, and we've got an immediate deadline to fund our own labs. So, as part of this idea of trying to get good people to review, I think it would be great to revisit that policy and how it's actually implemented. Well, I can tell you simply that we are revisiting those policies. I'm glad to hear that. And any individual reviewer can ask that something be discussed. We've also asked our SROs, when there are strong discrepancies in the preliminary scores, to try to get some resolution even if there's not going to be a discussion, to try to get an understanding among the reviewers of which is the more correct score, and to have a discussion around that, to reduce the problem this causes for PIs who receive highly discrepant scores. Well, and this is related, in terms of transparency to the applicant as well as to council when it has to decide: is there a movement toward including the distribution of scores, not the exact distribution but aggregates, in the critique back to the applicant, such as the mean, mode, and standard deviation? Of the individual scores, applicants are provided with some of the criterion scores that contributed to that information, but I'm not quite sure what advantage that would have. I think it would point to outlier reviews, because you would have a large standard deviation or range, and you might have a mean and a mode that disagree very much. Well, we are asking SROs to try to get some resolution of highly disparate reviews so that it's less confusing. But I wanted to ask you whether there's an ongoing program of quality assurance and quality improvement for the SROs presiding over these study sections. Are they observed by people within CSR but outside of the study section? There are, I guess, multiple forms of observation. Obviously, reviewers are there, and there's a hierarchy over each SRO. We expect the IRG chief to attend many of the meetings run within their groups, and we expect division directors to attend them; I attend maybe six to eight study sections a round. Also, each study section is observed by program staff.
We hope there's better communication these days with the program staff, and a better sense on the part of program, through their hierarchy, that I'm very receptive to information about problems on review committees. Do the program staff often express concern that they've actually been told not to participate, that they have to stay quiet as silent observers? During the reviews themselves, that's correct; we want the SRO to be the federal lead there. Between sessions, I expect there to be good communication. I think we've encouraged communication between program and review staff, both at the higher levels and at the individual SRG level, and I think our SROs are feeling more able to work with program staff and to hear about problems. We're also trying to create systems so that, electronically, it will be easier for your program staff to follow what's going on in the reviews of their applicants. I just wanted to second Lucila's point, because I think it's a really good one: you're already collecting all this data on the scores. So even just getting a histogram back with some summary statistics of what the scores were for your grant, at least for me, I'd love to see that, right? It's sort of the same as if you were teaching a course and you wanted to compare. And this would be really useful data to have in terms of trying to figure out whether to normalize scores across study sections or not, right? You're saying that in some study sections you're not funding all the best grants overall, you're just funding the best grants in that study section, right? So both in terms of giving that data back to individual PIs, and keeping it for broader staff analysis, it would be very useful data to have, and you already have it; you're already collecting it. Yeah, one of the problems we have with any individual-level collection of data is that we really go out of our way to make sure things can't be traced back to individual reviewers. Even for the statistics we kept here, we've deleted all the original PI information; we collected only the numbers. And each time we do these collections, we do it for a specific purpose. So I can imagine that something like this could be arranged; we'd have to think about it a bit to make sure the underlying confidentiality is kept, but anything we can do relatively readily like that, I'm willing to consider. If you're the PI and you get a histogram of your scores and it is bimodal, that tells you something different than if you get a histogram where everyone gave you a five. In the latter case you go, okay, clearly I really need to rethink what I'm doing, it's the question itself; but if you get a bimodal distribution, you might say, well, clearly some people didn't understand it, maybe I can address this point better, or something like that. I think it would also help in the discussions that the program officers have with individual PIs, right? Because I know I have called my program officer and said, what was the study section thinking, and they pointed out, look, there's no variance here; then you could take that to mean what it means. I hear you, okay.
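A minimal sketch of the kind of feedback being suggested here, assuming only that an application's final panel scores are available as a list (the function and the data are hypothetical illustrations, not an NIH system):

```python
from collections import Counter
from statistics import mean, stdev

def score_feedback(scores):
    """Summarize one application's panel scores for the PI: mean,
    mode, spread, and a crude text histogram on the 1-9 scale."""
    mode = Counter(scores).most_common(1)[0][0]
    print(f"n={len(scores)}  mean={mean(scores):.2f}  "
          f"sd={stdev(scores):.2f}  mode={mode}")
    counts = Counter(scores)
    for s in range(1, 10):
        print(f"{s}: {'#' * counts.get(s, 0)}")

score_feedback([5] * 20)                  # no variance: rethink the question
score_feedback([2, 2, 2, 2, 8, 8, 8, 8])  # bimodal: part of the panel balked
```

The variance and the shape of the histogram already separate the two cases discussed above: a unanimous middling score versus a split panel call for very different revisions.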
So, Carlos, even if you strip identifying information, there are scenarios where an individual could be linked to a score; we can talk about it at lunch, but I think that would be the underlying concern. You mean an individual reviewer on the study section? I mean a review panel that has one particular expertise: you read a written review, because that person was assigned, and then you see an outlying score over here, and you say, that reviewer gave me that score. Whether it's accurate or not, it still opens the door. But you do get the scores from the three reviewers anyway. You get three reviewer scores, and they don't necessarily tell you what that person's final score was. But if you see the printed scores and there's an outlier, you might link it; correct or not, you're creating social havoc in the follow-up. We'll talk at lunch. Well, thank you very much. Thank you, Richard. It was terrific that you would come talk to us. So thank you, council, for a good discussion. We will break for lunch now. Should we try for 1:15? We will reconvene at 1:15, so 55 minutes. Go get your lunch, and thank you for a good morning.