So on behalf of Mark Siegler and the MacLean Center, and myself and the Center for Health and the Social Sciences, I want to thank you for coming to this next lecture in our series on Proving Value in the U.S. Healthcare System. It's really a pleasure for me to introduce Eric. He's the Senior Vice President for Policy and Research at the Commonwealth Fund, which is a national philanthropy engaged in independent research on health and social policy. Eric is a member of the Commonwealth Fund's executive management team. He provides guidance to the organization on research topics in health policy, health services, and health delivery. He helps conduct scientific review of proposals. And, as I just learned, he's a mentor to many of the staff there. Eric is trained in primary care, general internal medicine, and health services research, and has become one of our country's leading health services researchers. His work spans health policy, quality measurement, quality improvement, health system innovation, primary care, health information technology, health insurance, and access to care in vulnerable populations: almost anything you could imagine. Eric was on the faculty at Harvard Medical School and the Harvard School of Public Health for, I guess, about 15 years, where he taught health policy and also practiced primary care. After that he went to the RAND Corporation, where he held the distinguished chair in health care quality. Eric graduated from Columbia University, holds a master's from Berkeley, and earned his M.D. from UCSF. He did his residency at Brigham and Women's, where he had the dubious distinction of being my first resident when I was an intern; he kept me from killing people during that first week or two. So thank you for doing that. Eric's talk today is entitled "The Mismeasure of Health Care: Can Measurement, Improvement, and Cost Reduction Be Reunited?" So Eric, welcome and great to see you.

Thank you. Likewise. Thank you, David.
That's a really incredibly kind introduction. And there's something about having seen people in scrubs that persists for a lifetime. So it really is a pleasure to be here to talk with you today about this topic. This is a small group, so I hope people will actually interrupt and ask questions as I go through this talk. It's something I've been thinking about for a while, and I'm still developing the ideas, so your feedback and reactions would be incredibly helpful to me. As David mentioned, I have had a research career going back to the mid-1990s, when the notion of quality measurement was just getting off the ground, and I'll say more about that in a minute. But we've actually come quite a long way in the 20 years since those first efforts to measure quality. And I think others are beginning to express questions about whether we're at an inflection point, and it may be time to reevaluate how we've developed our quality measurement programs, public reporting programs, and payment reform programs or payment models, and really think hard about what the right pathway is going forward. So that's what I'm going to try to touch on today in a very broad way. If there are any acronyms or any concepts or organizations here that you don't recognize, please just put up your hand and ask the question. I'll try to make this as user-friendly as I can. So about 400 years ago, Galileo took what is probably the equivalent, in terms of magnification, of a pair of binoculars you could buy in a standard hardware store; you could take this and do this experiment. He developed a telescope, which was a wooden tube with lenses at both ends, as in this picture, and began to observe the planet Jupiter. And he noticed that there were these four little stars around Jupiter, and they were exhibiting odd motions compared to what would have been predicted at the time. At the time, the belief was that all of the heavenly bodies rotated around the Earth.
And Galileo saw, observing this night after night, that these little stars around Jupiter were actually moving in a particular way. And this is, if not the literal text, a translation of his observations night after night as to where these stars were. And over time he came up with the idea that in fact the only thing that would explain these movements was that these were moons (he didn't know they were moons) circling Jupiter and not the Earth. And most of you probably know that for that scientific insight he is remembered, and that he was nearly burned at the stake, because this was very controversial in his time. He was showing, through a very simple measurement instrument, an underlying truth that was not understood at the time. Then about 300 years later, Percival Lowell, who has a very famous observatory named for him, had developed a much more sophisticated telescope and was observing Mars, and he noticed through his telescope that there were patterns of change on the surface of Mars. And as he thought through what might be happening, he came to the conclusion that Martians had built an extensive canal system as part of a big agricultural project, and that the seasonal changes on the Martian surface were signs of alien life cultivating food or something else on the planet Mars. So it's kind of an interesting mash-up to think about: Galileo, with a very simple instrument, observing night after night and coming to a conclusion that has been verified; and Lowell announcing "final proof that the planet Mars is inhabited," which was actually quite newsworthy at the time. I can't help but notice that it was published out of a Chicago newspaper. And so this was, I think, a warning sign to anyone who wants to use measurement to detect changes about the potential mishaps that can occur around inference.
So with that metaphor in mind, I'd like to walk through today's outline. And actually, do we have till one o'clock? Okay, great. I'll try to keep it within that bound, and I'd love to have some Q&A. So I want to walk through the U.S. health care challenge (don't tempt me) and the evolution of performance measurement. I'm going to do this fairly quickly, but again, interrupt if it's too quick. Then I want to talk a bit about some of the limits of the current measurement efforts that are out there; this hopefully will introduce you to some of those. And then I'll finish with some thoughts about resetting the measurement agenda as we go forward. So this is a performance report that the Commonwealth Fund has been generating for 20 years now. When I came to the fund four years ago, we did a top-to-bottom review of the methodology and made some modifications to the methods. But essentially what this represents is a summary graphic of 72 quality indicators in various domains: access, process of care, some outcomes, and administrative burden measures. We organized and selected those with the help of expert panels, and there's a report with a summary of the findings. The results here are pretty consistent with what we've observed over time, which is that among the 11 high-income countries represented here, the U.S. is the lowest performer, much of that having to do with access and inequalities in care, some with health outcomes as well. The top-performing countries are the U.K., Australia, and the Netherlands. It's not shown here, but if you look at the statistics of this, there are really three groups of countries: a bottom group of Canada, France, and the U.S.; a top group of the U.K., Australia, and the Netherlands; and then the middle countries. And this is really just to make the point, which has been made by the IOM and others, that the quality of care in the U.S.
hasn't been optimal. And that continues to be the case, at least in comparison to other high-income countries. We also make the point in other materials in this report that healthcare spending as a percent of GDP has been growing faster in the U.S. than in other countries. I don't think that's news to anyone here; that's been a trend since at least 1980, and it has continued with fits and starts as economic conditions have changed in all of the countries. But we're an outlier on spending and an outlier on quality. And then another observation, just to throw into the mix because it may come up later, is this paper from Bob Kocher about labor productivity in healthcare compared to other sectors. The notion is that over this 20-year period, 1990 to 2010, employment growth has been robust in healthcare, but labor productivity growth has been negative, at least by this measure. What that means is that labor productivity in healthcare is lagging the other sectors. And that is potentially unsustainable in an economy, because more labor is not necessarily the best solution to problems of healthcare delivery.

Can I ask a question? Yeah, please. How do they measure labor productivity in this context? I'd have to refer you back to the paper for the exact details. Yeah, I just wonder, because wages are very complicated to think about in this context, and objective measures of productivity are also really hard to figure out with very complicated teams and other things. Yeah, no, absolutely right. And you know, there are different ways of approaching this sort of problem, which could give different results potentially. I think Kate Baicker, who's here at the Harris School, has made the point that this might not be a good jobs program, as a way to produce healthcare, and that there could be inefficiencies. Another question?
How does healthcare labor productivity compare to other countries? Yeah, I'm sorry I don't have that, because that's an obvious question in the context presented, but I don't think it's that different. Healthcare is also a really different good; it's a luxury good. The other countries tend to have spending constraints through global budgets and capitated budgets, whereas in the U.S. we tend to have a more open-ended approach, which means that our labor productivity could be pretty much the same as theirs. So there's an economics literature on this, and it goes back to William Baumol. This was originally called Baumol's cost disease, and it was supposed to apply to service sectors in general. So he looked at education, another sector that tends to be more labor intensive. He also looked at things like TV repairmen, who have basically disappeared; hardly anyone repairs things like that anymore. But that literature is not without its critics, both in terms of whether the underlying fact is true and in terms of the measurement issues that are part of it. And you think, okay, healthcare is fundamentally labor intensive, but then you realize you can substitute clever sanitary products for nursing support; you can reformulate drugs to be given orally rather than IV, or as long-acting preparations rather than short-acting ones. So I think the point of this slide, and I'll come back to it later in the talk, is the notion that maybe we haven't imagined all of the potential efficiencies that could be achieved in healthcare. So now I want to take a short walk through the history of performance measurement and capture a few notions that I find people sometimes aren't aware of. And the first is that the origins of the quality measurement movement in the U.S. really coincided with the HMO movement of the 1990s.
There was significant concern that going to risk-based, capitation-based payment was going to lead to stinting on care, to rationing of care, and that some mechanism was necessary to detect whether that was happening. And so the National Committee for Quality Assurance, which I worked at briefly for a year in the 1990s, was just getting off the ground at that point. There were three or seven measures when I joined; there are now dozens of measures, if not more. But the central idea was that this would be a public reporting mechanism as a check on HMOs and their growth. And there was a similar conversation earlier, when the DRG payment system came to hospitals back in the 70s and 80s, that led to the development of quality measures of hospital care, under the same concern that DRGs would lead to "quicker and sicker" discharges.

But just one of the interesting parts of this history, which I think is really important, is that the types of measures that were developed, particularly by NCQA, were things like preventive care. And if you think about the underlying economics of capitation, the underlying economics of capitation is actually to spend more money on healthy people, because they're the people who you want in your plan. What you don't want are the people who are really sick. And so it's like, yeah, measure me on mammograms and pap smears and how much patients like coming to my social events, but don't measure me on whether I've got high 30-day readmission rates or really good care for super complex patients. So this part of the history also kind of distorted the industry.

Yeah, you know, it's a very interesting point, and it really had to do with the availability of data at the time. This is one example of the street lamp problem. But I also think you're right that there was a sort of influence. There was a logic model for what HMOs would do: they would prevent illness.
And so we wanted to measure the salutary prevention strategy and know that it was actually happening. But the issue I first mentioned, bringing people in and then restricting their access to specialty services, having them die earlier, restricting access to hospitals and emergency rooms, really wasn't captured well here. It was captured through utilization measures that have actually just been retired recently at NCQA. But that raises the question of what the proper set of domains is to evaluate these sorts of organizations. There was a kind of interesting pivot, I think, after that point, led, I believe, mostly by Don Berwick and a committee that he had put together, which appears in the journal Medical Care: framing performance measurement and reporting not as a regulatory check on HMOs but as an improvement motivator. They posited that there were two pathways for this motivation. The first was the market transparency and consumer choice argument: that consumers would actually use information from performance measurement and reporting to select providers. And actually that was embedded in the NCQA approach, too: that people would have that information as they were choosing their health plans at open enrollment, and that demand-side market mechanism would motivate competition among provider groups. And then, Don being Don, they also thought there was a really critical role for organizational and professional improvement: that these measures would motivate managers, through reputation and brand, and providers, too. We want to make sure we're at the top of the quality mountain. And they would also drive intrinsic motivation: if people saw that they were underperforming, they would be motivated intrinsically to want to fix the problem. And so I think that notion of this as a motivating path was important.
Then a somewhat different thing happened around the same time they were positing that mechanism, which was the idea that financial incentives could be used to drive this more quickly and get the attention of providers. That's when pay for performance programs were introduced, in the early 2000s. In the hospital setting, this is a CMS evaluation over 2003 to 2005 looking at hospitals with pay for performance plus public reporting versus hospitals that were just doing public reporting. And for three conditions, myocardial infarction, heart failure, and pneumonia, and a composite, the hospitals in the pay for performance group, in addition to public reporting, were actually showing some difference in trend: a steeper trend, improving at a faster rate over this period of time. And this paper, which was published in 2007, I think kind of opened the floodgates for the policymakers, and there were others looking at physician group pay for performance. It actually got formalized to some extent in 2010, when the ACA was passed: we could use a variety of measures to guide pay for performance incentives, signal the market, and literally pay people bonuses and penalties depending on their performance. And the payments would be for hospitals; this is the hospital value-based purchasing notion. You'll see this is the 2017 version of the measure set; you can see the relative allocation of weights given to each of the domains in the payment formula. And an interesting thing creeps in here around efficiency and cost reduction as one of the key domains of performance measurement: we're not only going to reward for quality, we're also going to reward or penalize based on efficiency and cost reduction goals that are embedded.

Isn't the Lindenauer paper kind of an exception in this literature?
I mean, I'm thinking of the review Meredith Rosenthal was involved in a couple of years ago looking at pay for performance; doesn't she mostly conclude it really doesn't work? Yeah, we're heading in that direction. I just wanted to set the stage; maybe this is all familiar to everyone in the room. But the other thing that arises in this context, and it'll arise in another context so I'll get to that, is the notion of where the variability sits in these complex schemes. How much variation there is among organizations can differ a lot between these different categories of performance, and that actually makes a big difference in terms of what the composite outcome looks like for hospitals. We're going to come back to that. Another ACA innovation, I guess, was the Hospital Readmissions Reduction Program. With the HRRP, people had been measuring readmission rates at CMS, I think, since the 1990s. They knew there was a lot of variation among hospitals in readmission rates, and there were a lot of questions about what explained those variations. But program officials, with expert input, felt comfortable going forward with an actual penalty program and a special quality improvement program to reduce readmissions. And readmissions can be an issue. But there was no attempt in this scheme to differentiate appropriate from inappropriate readmissions, because that's actually a complicated thing to try to do. They did risk adjust the readmissions, though, and that's going to turn out to be an important point later. But you can see here, and this is actually claimed as one of the sentinel successes of the ACA hospital value-based payment initiative overall, that readmissions dropped a lot between 2010 and October 2012. And one thing to point out here is that the penalty started in 2012, so a lot of the change actually occurred before 2012, before there were any penalties.
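To make the risk-adjustment idea mentioned above concrete, here is a deliberately simplified sketch. It is not the actual CMS/HRRP methodology (which uses hierarchical logistic regression models); the function and the numbers are invented for illustration. The core idea is that a hospital's observed readmission rate is compared with the rate its patients' risk profile would predict.

```python
def excess_readmission_ratio(observed_readmits, expected_probs):
    """Ratio > 1.0 suggests more readmissions than the case mix predicts.

    observed_readmits: 0/1 flags, one per discharge (1 = readmitted).
    expected_probs: model-predicted readmission probability per discharge.
    """
    observed_rate = sum(observed_readmits) / len(observed_readmits)
    expected_rate = sum(expected_probs) / len(expected_probs)
    return observed_rate / expected_rate

# A hospital with sicker patients (higher expected probabilities) is not
# penalized for a higher raw rate:
ratio = excess_readmission_ratio(
    observed_readmits=[1, 0, 1, 0, 0, 1, 0, 0, 0, 0],   # 3 of 10 readmitted (30%)
    expected_probs=[0.4, 0.2, 0.5, 0.3, 0.2,
                    0.4, 0.3, 0.2, 0.3, 0.2],           # case mix also predicts 30%
)
# ratio is approximately 1.0: the raw rate matches what the case mix predicts
```

Under this kind of scheme, the penalty turns on the ratio, not the raw rate, which is why the adequacy of the risk model matters so much.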
You could say, well, is that a sentinel effect? Are they anticipating that they're going to get penalized, and so they make changes? Or is there something else going on? And then after that, once the penalties are in place, for both the targeted conditions, that's the upper line (the targeted conditions are heart failure, pneumonia, AMI, and some others), and the non-targeted conditions, you can see the slopes are actually fairly flat even though there are penalties in place. The logic of all of that was extended in 2015, when MACRA was passed as a payment reform for physicians. It did a great service by eliminating the SGR, the Sustainable Growth Rate formula, which had been the prevailing way of setting physician payment but was kind of unworkable as a policy, because year after year Congress would override it; physicians would get a little bit of an increase, but no one was ever really sure they were going to get the increase. MACRA did away with that, which was certainly a benefit. And it introduced something that looks relatively similar to the hospital value-based payment program: a set of categories of performance measures which would then feed into a payment update on a fee-for-service basis. So you get paid a higher rate in future years if you performed well in prior years. The categories were quality; resource use, again coming in here, efficient use of resources as an idea; advancing care information, which is a fancy term for making better use of electronic health records; and clinical practice improvement, a nod to the idea that clinicians should be rewarded for participating in improvement activities in their offices and professions. And you can see the relative allocations are planned to change over time. The 2019 allocation has 50% on quality and 10% on resource use.
By 2021, that shifts to 30% and 30%, quality and resource use on an even footing, with the balance going to the other two. So let me ask if there are any questions before we move to limits. These programs are familiar to folks? Okay, and I've gone very quickly through them; there's more detail, obviously. So let's talk about what some of the limits of current measurement approaches have been. Well, let me give a summary slide first and then walk through the points. This slide is an homage to Sergio Leone; it should say "the good, the bad, and the ugly," but it's mostly the bad and the ugly. So I've done a little violence to Sergio Leone, but I think it'll become apparent why. The first point is that it has been extremely hard to get consumers engaged in using quality information from these formal reporting systems, partly because people don't trust government, a recurring theme, as the source of this information. But there are other factors as well. There's been some speculation in recent literature that that might be changing, but I'm not persuaded yet. The second is that there's limited evidence, from the many studies that have now been done, that there are really improvements in population health as a result of these programs, and I'll walk through a few examples of that. I think professionals have been rightfully skeptical of the value of these results. They've seen technical issues come into play around risk adjustment, sampling, different populations cared for by different types of providers, and coding, with changes driven more by coding than by care delivery actually changing. And then there are patient preferences, which have always been a concern: to what extent do any of these measures, which are mostly based on adherence to guidelines, reflect the complexity of patients with multiple comorbidities, or patients who may prefer not to get care?
We were doing a study of flu vaccination, for example, and vaccination is a classic case. People might have different preferences depending on whether a vaccination is mandatory or non-mandatory, and for non-mandatory vaccinations like flu vaccination, patient preference could even vary from place to place. We found there wasn't much of an issue there, but that's just one example of the concern. The other thing is that there has been relatively limited utility in the daily work of clinicians. It's been a really hard sell to bring these measures to groups of clinicians and engage them. You can engage them, and changes are possible; I've seen this in my own primary care practice over time. Adaptations of these measures in specific settings have driven quality improvement programs, but outside of a very deliberate quality improvement project, just getting this information has offered most clinicians very little return on an investment that has included having to code and enter data in EHRs. So that's the issue of burden. There have been redundant and misaligned measures in the measurement sets, confusion around some of that, and the data collection and reporting requirements I've already mentioned.

Can I ask you a question? Yeah. I have my own answer; maybe I'll share it and then you can comment on it. So the question is, with all these negatives, why has this happened? So that's a great question, actually. Okay. Let me share my thought. I'm happy to share a thought, too. I was just going to say... This is what it was like on rounds every morning. I guess part of what annoys me about this is that there's a whole industry that pushes this. So that's one. And actually that's a well-known public policy issue: once you create a system, it creates its own set of demands and lobbyists and inertia around change. So I totally agree.
This industry has grown up to respond to these needs coming from CMS and from health plans. There's a whole vendor industry around performance measurement, as one example, and a not-for-profit industry that hopefully believes it's doing the right thing, but for which it's also a job. The other answer to your question, for me at least, and I'd welcome other thoughts on this, is that if you look at performance measurement in many contexts, it's a way for regulators to say that they're regulating without admitting to being regulators. As things have evolved in our society, this gives credibility to the public policymakers, the people who are setting the agenda, that they're actually dealing with a problem, without them having to be responsible for what they're intervening on or what the results are. So they can say, well, we've created this for consumers, we've created it for hospitals to improve. They take the heat, no question about that, but from a political standpoint, when they're questioned, they can say, well, we've gone along, we've created these programs. And that has value, and I don't mean in the big-P political sense, but in the agency political sense, the regulatory political sense. It's action that doesn't affect the stakeholders as directly, so their decisions are not having a direct effect. It's not, "We're cutting your budget; this is a global budget state, and we will manage your budget for the hospital, thank you very much." Instead, it's, "Okay, we'll have these performance measures, we'll look for outliers, and we'll try to move the mean," but we're not actually directly impacting that.

And there's also professional resistance, on everybody's part, to being measured, so there's no incentive to make these programs work; in some quarters there's probably an incentive to make them not work. Yeah, you're right.
I mean, there's certainly been pushback on these measurement programs, and a lot of it from the professional community at many levels. I think the zeitgeist since the 1980s, at least, around consumers and markets is an ongoing ideological or philosophical orientation that probably isn't going away any time soon, but it's part of what I think I'm challenging here. Yeah.

How active are you? Because what I've been told, and observed, is that if you're not at the table, the rules are going to be written without you. And at least if you have one physician or one surgeon right there at one of these meetings... And maybe you're the person that does that.

So I have played that role in many instances. It's an interesting thing to reflect on, because it varies tremendously. Of the groups that I've worked with, and actually I should disclose I've been co-chair of NCQA's Committee on Performance Measurement, the idea of bringing multiple stakeholders to the table and really working hard to make sure the measures will be informative has been a very productive approach. I've seen other organizations that have tipped the balance too much one way or the other, and sometimes it can depend even on just who's the voice in the room; that dynamic has a lot of influence as well. But the AMA, for example, had a Physician Consortium for Performance Improvement that was advancing measures to CMS and to NQF, the National Quality Forum, which is another measure endorsement organization. A lot of those measures were rejected because the physicians didn't invite other stakeholders into the review process, and there were questions among the health plan leaders or the regulators or the consumers about whether those measures really represented quality from the standpoint they cared about. So, I mean, it's a complicated answer to your question, but it makes a big difference for physicians to be involved.
And if people haven't seen it, there's a paper calling for a timeout on performance measurement; Eve Kerr, I think, is the first author. And the ACP committee, which I had chaired before Eve came to it, did a review of the validity of a sample of performance indicators from the National Quality Measures Clearinghouse. They determined that about a third of them met validity criteria according to this review panel. It's actually a nice publication to look at, because the types of measures and the validity problems are well outlined in the appendices. So there is a dialogue on this, and some data, actually. The reason they called for a timeout is that if two-thirds of these are not producing meaningful information, why are we continuing to spend $15 billion a year just to collect measures in the ambulatory setting, if you believe that estimate? Yeah.

Because of this focus on performance, doctors want to have the best performance at any moment. They are like a violinist: they don't want to play a wrong note. They are like an ice skater: they don't want to fall. So, unless a doctor is mentally ill, he always wants to do his best. If the performance is not good, it's because of everything else around it: if I operate on a person and I am forced to send that patient home in two days rather than keeping them, that person is more likely to come back because of a suture infection, or because, if that person is not doing well economically, that person at home does not follow my instructions. But performance has always been as if we were developing performance in ballerinas or ice skaters: we always wanted to do well. And I am always puzzled by this "improving performance." What do you mean? What do you do?

Yeah, so that's a great question, and it gets to there being multiple different perspectives on thinking about performance.
The physician's perspective that you just outlined, and that of performing athletes and high-performing musicians, involves a coaching cycle and a continuous improvement cycle. In this context, though, people are often thinking about what's happening to the health of the population. And sometimes there's mistaken attribution, and we're actually going to touch on this: the attribution is the problem. A hospital may be taking care of a sicker group of patients, on average, in a poorer community with fewer social and economic resources. Can that hospital perform at the same level as another hospital? Will the outcomes of those patients look the same? Probably not. And I'm going to come back to that, because I think that's an important point you've raised. It gets also, I think, to the skepticism of professionals about the results, because as a professional, in my own experience, I sometimes felt like I lived in two different worlds: I'd be talking about performance at a committee meeting in Washington, D.C., and then I'd be in clinic seeing patients. I want to try to address that. I think I've covered this already, but for those who want detailed reports on why consumers don't use performance information, there are these two reports, one from RAND by Tom Concannon and one reviewing New York State's recent experience. There's just an endless hunger to try to get this right through reformatting and different digital distribution approaches, and it just hasn't been that easy to engage consumers; consumers are typically disappointed when they see the measures that are available. The other problem is that we have measures that I call street lamp measures, because the data are available and it sounds like a good idea to do whatever the process or action is, and so it gets collected. Here is adult BMI assessment in health plans, reported by NCQA and broken out by different types of payers.
I think commercial PPO is the orange one on the bottom, but what I wanted to point out is that there's been a dramatic increase in the documentation of adult BMI. There are now electronic scales that automatically record this; if you step on the scale, it probably goes into the record. But health plans actually spend money making sure they've collected these data on their members, which raises another complex issue about the relationship between the data the plans hold and the data the providers hold. And what's been happening to obesity during this same timeframe? It's continuing to go up. So we now have a situation where the process measure (a weak process measure, when you think about it) is documenting a BMI. What is that? Does that really change things? And the answer is no: we don't have weight reductions, which is the health outcome we actually care about. The other challenge in this area involves what are called high-stakes measures: measures where at least some committee of people believes they're precise and reliable enough to be the basis for payment, to drive payment changes. A lot of those measures have been around for a while, and this GAO report looked at several of them. What you see here (you can't make out the individual items) is that they're all process measures in hospital value-based purchasing. They were measured while the hospitals were just doing reporting, and then the incentives started in 2013, more or less, in that shaded area. You can see that by the time the program actually starts, most of the hospitals' performance is topped out. Can you get from 95% to 100%? Maybe. But what then happens is that you've introduced a financial incentive in a situation where there's a very narrow band of performance at the very high end of the group, and the curve is going to be a killer.
There's going to be very little variation among those hospitals on those measures, and so differences of 92% versus 94% will start to drive the payment returned to hospitals. This is also a perennial problem: as soon as you ask people to measure something, they start to improve it. We don't know whether they're actually improving the care delivered; they're definitely improving the documentation, and then they top out. This study looked at process and outcome measures for the hospital quality reporting program, and interestingly, one group of hospitals had started before another group came on board, so the early adopters and the late adopters could be compared. You would think that over time hospitals would learn, and that the late adopters might take longer to learn and make all the process changes necessary to drive improvement. But what really happens is that the early adopters get up there, the late adopters come along and get up there too, and throughout this whole time hospital mortality is declining. You can see the annual variations with the seasons, and it flattens out by the time the hospital value-based purchasing program is in place. What this said to the investigators is that there's no learning curve for these process changes that we can detect. There's a learning curve in general, but it doesn't seem to differ between more experienced and less experienced hospitals, and that's a strange finding: there should be a difference if what's really happening is that processes are changing. Now I want to talk about the readmissions penalties and some of the evidence that has emerged, and I'm again going to go fairly briefly over this.
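The arithmetic of that compression is easy to see. The sketch below is purely illustrative (the scores and the linear payment rule are invented for this example; it is not the actual CMS value-based purchasing formula), but it shows how, once everyone is clustered in a narrow band near the top, small score differences absorb the entire payment spread:

```python
# Illustrative only: a made-up linear payment rule applied to hospitals
# whose process scores have "topped out" in a narrow band.
# This is NOT the actual CMS hospital value-based purchasing formula.

def payment_adjustment(score, floor=0.92, ceiling=0.98, max_bonus=0.02):
    """Map a process score to a payment bonus/penalty (fraction of base payment)."""
    # Rescale the narrow observed band [floor, ceiling] onto [-max_bonus, +max_bonus]
    midpoint = (floor + ceiling) / 2
    half_range = (ceiling - floor) / 2
    clamped = max(floor, min(ceiling, score))
    return max_bonus * (clamped - midpoint) / half_range

hospitals = {"A": 0.92, "B": 0.94, "C": 0.95, "D": 0.97}
for name, score in hospitals.items():
    adj = payment_adjustment(score)
    print(f"Hospital {name}: score {score:.0%}, payment adjustment {adj:+.2%}")
```

Under a rule like this, a two-point difference on a topped-out measure (92% versus 94%) moves more than a full point of payment, even though a difference that small may be statistical noise rather than a real difference in care.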
The first warning sign was a 2017 publication showing that readmissions reduction was weakly correlated with mortality reduction; they even found that risk-adjusted mortality among heart failure patients increased after the readmissions reduction program was implemented. So the concern is that sicker patients are being sent home, they destabilize, no one wants to readmit them, so they are put in observation status or something else is done, and you can imagine that a fragile heart failure patient might not tolerate that. That was out to one year, and the concern is whether this is, over time, degrading the care of this group of patients. The other observation, just recently published (this has been seen in other settings, and we had actually simulated this effect in one of our early studies of primary care), is that the penalties disproportionately fall on the hospitals with fewer resources that are taking care of a more socioeconomically disadvantaged population. That is not what we want to be doing: if you're taking care of a disadvantaged population, you need more resources to do that effectively, not less. So there's this worry that the rich get richer and the poor get poorer; the hospitals taking care of the richer patients are getting rewards and the others are getting penalized. They quantify it, and it's not a huge amount of money, but over time it could become quite a threat to the bottom line of safety-net hospitals.
The last is also a recent publication that called into question whether the signal success of the ACA on hospital readmissions reduction was actually a success at all. I showed you this already; another thing happened in 2010 and nobody really noticed it, or at least they didn't until this study was published in Health Affairs (another study had come out last year suggesting it too). That year, CMS changed the technical format for electronic reporting from hospitals: they went from 10 diagnosis codes to 25. What happens when you do that? Hospitals, knowing that risk equals money (and also that documenting the full complexity of hospitalized patients is probably a good thing in itself), started using more diagnosis codes. But the risk model wasn't adjusted for that, and so of the reductions happening for both non-targeted and targeted conditions during this period, the authors estimate at least half are due to that coding change. In fact, when you look at non-targeted conditions in targeted hospitals, and targeted conditions in non-participating hospitals, you can see that much of the effect is this technical change in the way the data are collected, which has very little or nothing to do with the incentive program. Of course, it also precedes the actual penalties. And then when you look at the penalty phase, not much changes; well, they've probably maxed out on the coding change, so that's not going to gain anything additional. So what is this program really doing? It looks like a documentation change may account for what has been called a success in the use of financial incentives. I'm going to skip over this, but people have asked whether measurement and incentives work better in ambulatory care settings, and this is a randomized controlled trial; I'll just tell you that they found better improvements in primary care practices where there were financial incentives tied to
achieving preventive services. So it can work. But this review by Mendelson of all the literature to that date on ambulatory pay-for-performance suggests that there's little or no evidence it actually results in change; the evidence is weak, a lot of it comes from England (they had the biggest incentive program), and the effects have been inconsistent. If I haven't put the nail in the coffin yet, this is what happened in England when the quality measurement program retired measures. They actually didn't retire the measures; they just took away the financial incentives. These are clinical process indicators where the incentives were removed: things like glycated hemoglobin down here, cholesterol testing, diabetic retinopathy screening. You can see there's a clear decrement after the financial incentive is removed, so it looks like less testing is happening after the incentive goes away. And they're still measuring in the background, which is very clever, that they're able to do this. Here is blood pressure documented in patients with serious mental illness, and alcohol consumption documented in patients with serious mental illness; the incentive is still maintained, and there's no change. These are intermediate outcome measures where they removed the incentives: cholesterol control, the first two lines up here, shows a bit of a decrement again. This one is the documentation that patients with epilepsy are seizure-free; it goes from 60% down to about 10% overnight. A documentation indicator where the clinicians just said, oh great, I'm not getting paid for this anymore, I'm not going to document it. Or maybe they're not even asking the question; we don't know, but that seems unlikely. Blood pressure control and cholesterol control, with the incentives maintained: no change. So there are a couple of ways to interpret this. One is that the clinicians get sloppy and don't do things they should be doing. The other way to interpret it is that they were
testing and documenting for patients who maybe weren't going to benefit that much. Clinical judgment might be playing a role here; the measure definitions might not be specifying a population well, and there's discretion as to whether a cholesterol test is the most intelligent thing to do in a patient with advanced multimorbidity on many medications. So I think the jury is still out, but what this tells me is that there's a tremendous sensitivity of these measures to the financial incentives, and we can't rule out the possibility that people are gaming them, just documenting things and clicking boxes. The other weakness is that whatever time it took to do this presumably was taken from a million other things. So that's my other biggest concern here: the driving to performance, the teaching to the test (or responding to the test), and putting other things off to the side. And so my question is whether value-based purchasing is, at this point, diverting health care from actually working on quality and affordability. I always liked Franklin Delano Roosevelt's prescription at the depth of the Depression: it is common sense to take a method and try it; if it fails, admit it and try another; but above all, try something. And here's where I think we're locked into a performance measurement scheme that keeps running ahead, but it's unclear whether it will get us where we want to go, and it's also unclear whether people will be willing to retire it, for the reasons we discussed earlier. There are a number of diagnoses one could apply to this problem of the lack of evidence. I won't go through these in detail, but there is a set of hypotheses one could entertain: weakness of incentives, for example (some people say we should just be putting stronger financial incentives in place; the signal is just too weak); and you can tell a lot of stories about the inability of professionals to adapt and change
for reasons that are really outside the control of professionals: the way payment works, the way systems are organized. And there's always this inherent uncertainty lurking in the background. So let me give you a few thoughts on a measurement reset. One theme that's popular right now is that we just need to shift the focus to measuring health outcomes, and then everything will work out okay; we're too buried in process measures at this point. The second is a paper we published on reimagining performance measurement in a different way. I also think there's probably a need to invest in research and development on novel uses of emerging data sources; we are now in a data environment that's changing rapidly and opening up new opportunities, but there's very little research going on into how to use it. And then there's repurposing measurement to support disruptive innovations that would lead to higher efficiencies. Michael Porter at Harvard is the one who's really been banging the drum for health outcomes achieved per dollar spent. There's a paper, and a follow-up paper, describing this beautiful conceptual model, which begs a lot of other questions: what is an episode of care, what is the outcome, which outcomes matter, how are they going to be measured? And I think the International Consortium for Health Outcomes Measurement, which is the Harvard Business School initiative to try to do this, has run into a lot of challenges trying to implement it; I probably won't go through the reasons it's hard. The second idea is this notion of reimagining quality measurement, which we did mostly as a thought experiment. Right now, guideline adherence drives what is measured and what is considered performance. We postulated that that's not actually how clinical medicine works: it's that, plus the patient's preferences, goals, needs, and desires, and the effectiveness of the clinical interventions; and the goals and
preferences really need to be balanced against one another. So is there a way one could get to that model? Then every patient actually becomes part of the denominator, because every patient has a set of conditions and a set of goals. At the simplest level, we thought you could get a comprehensive inventory of who this person is: their clinical health status, risks, and health care needs. That would be useful. If you had analytics that could match guideline-based, evidence-based interventions to the documented patient needs, that would be helpful; that's roughly what's going on in a clinician's mind. What's really missing is the structured record of the patient's health-related goals and preferences to inform how one would prioritize the interventions. But one could imagine that the health outcome would be how well you optimize the well-being of the patient around those goals and preferences, and that aggregate estimate could actually be computed at several levels, if you figure out how to do the math and also how to handle the temporal trends, because patients' goals and preferences change over time, and the evidence changes over time too, though more slowly. So that was the idea in that paper. We actually did a little thought experiment with two patients who have the exact same current clinical care opportunities, walked through different preferences, and asked what the chosen therapy would look like in each circumstance. With the power of data and cloud computational power, I'm not sure this is so far off; we're beginning to think this isn't impossible to do, and I was actually talking with David earlier about some work they're doing here along this line. The notion of passive data collection with consent, personal interactive data with a little assistance, could actually be measuring this stuff in real time; your Alexa could be trying to figure out what your goals, preferences, and needs are
if it wasn't too annoying, and then using these large-data computational methods to predict who is going to benefit or not. Now, this is obviously space-age stuff, but maybe closer than we think, at least in terms of the technical capacity to do this sort of work. I wanted to throw in this bit about social media data. This is a paper from 2016 on sentiment analysis of tweets, people tweeting at hospitals, and it's a technically complicated way of doing sentiment analysis on the sampled tweets. It involved huge volumes of data, all publicly available. They did this probably on a shoestring, with some computational work and an Amazon capability that allowed them to crowdsource the tweet sentiment analysis. But they did find a relationship between mortality and sentiment among hospitals that had at least 50 tweets. That is preliminary; people are mining Facebook, Twitter, and other social media data to try to refine these methods, and I think that's something we should keep an eye on in the future. The time is going to run out here, so I'm going to leave the question about disruptive innovations for another time, if that's okay, and go straight to the conclusions. I guess the main point is that the digital health innovations that are coming, relatively quickly, are going to make it possible to deliver care in ways we haven't yet grasped; I have a whole different talk on that, and maybe I'll come back another time and give it. I would love to do that. But the principles I think should guide this pivot, or reset, are these. We really should just retire pay-for-performance applications of measurement: the incentives are the wrong incentives, the effects are not the effects we want, and we're spending a lot of money, time, and energy that could be spent in more productive ways. I do think there's a key role, though, for retaining measurement and reporting at an aggregate level for
regions, large organizations, hospitals, large delivery organizations, and health insurers, because we do need some way of monitoring what's happening to the population as we change and reform the health care delivery system. But I don't think these should be annual routine reporting exercises; they should be targeted analyses that inform policymakers, regulators, and managers who can help interpret the data and figure out how to make it real in terms of improvement projects. I also think we're not focused enough on some of the key policy objectives. One of the things we discovered in our comparisons to other countries that do well relative to our performance is that policy objectives like improving population health (obesity, suicide, and all the rest) are under-resourced at this point. Access to care is still a problem, especially for people who are uninsured or underinsured, and we have a weakening and dying primary care system in this country, which needs a major overhaul. We also could reduce administrative burden; one of the challenges in our system, which I think resonates with everyone in this room, is the amount of administrative work that has to be done and that diverts people. And given the socioeconomic inequalities in our country, measuring and reporting on disparities in care is going to be crucial too. So, in conclusion: the telescope that Galileo had, wood and glass, has been supplanted. This is the Mars orbiter, which was mapping the planet's surface; with a much better, more complex technology, with more bandwidth and more frequencies to observe, it was able to give us a very detailed map of the surface of Mars. I think the equivalent is possible here, and it's a vision we should continue to strive for. It's potentially an expensive vision, but as I say, the changes in the digital environment, not just in health care but more broadly, could support it. There are innovative, cost-saving care models percolating in the environment; the Commonwealth Fund is actually supporting some of those, studying them, and sharing that information, if you go to our website. I do think there's a diversion of resources. I think we can still use quality measurement judiciously to serve specific improvement goals, and I think we're under-investing right now in R&D. Much of what came to us as HEDIS was actually an investment made in the 1990s by the Agency for Healthcare Research and Quality; you can argue with that investment, but we don't even have version 2.0 at this point. CMS is developing quality measures, and they're just trying to serve their program needs. So anyway, I'll stop there. It's been terrific to talk with you all, and I'd love to take questions if there are any.

Yes. One issue with measuring quality of care, or at least the effects of quality of care, is that the consumer isn't making choices based on that quality. But to me it seems that the reason patients aren't making choices is that they have no choices, right? Especially if you're someone on Medicare or Medicaid, you have one physician available to you, so it doesn't matter. Wouldn't it be much easier, or at least help with part of the problem we're looking to solve, to increase the quantity of physicians and work on increasing access to care? Then the consumer's sense of how well they are treated would be automatically accounted for, in that physicians who weren't treating patients well just wouldn't have patients.

Yeah, it's a great question. In the 1980s and 1990s, New York State started its cardiac surgery mortality reporting system. That was a public health department initiative, and it was not publicly reported. There was a team that would go to the hospitals with potential mortality problems and work through what they were doing differently; they actually would bring in people from other hospitals. It was essentially a directed peer review model, and it was very effective, because they discovered
practices going on in the high-mortality hospitals, like rushing patients to the OR rather than stabilizing them before surgery, that turned out to be very powerful in reducing mortality. There were also a few physician outliers identified, the sort who should have retired a few years earlier. It only became a public reporting system because Newsday demanded it under the Freedom of Information Act, and then they felt they had to provide a consumer element to it. But you're right: in our first studies of these mortality programs in the 1990s, 78% of people were in the hospital for three days before they had surgery and didn't have a choice. That's probably changed now, and cardiac surgery has certainly changed in that time. But I take your point about targeting the performance information to regulators, managers, public health people, and improvement teams that can actually effect change and deliver more for patients over the long run, especially when patients have no choice.

There's another part of that question that's very interesting, which is just the matter of having choice. One of the things we've seen is that as the information requirements of federal government models rise, there are huge returns to being a large system, and being a large system implies less choice.

That's right, they're definitely interrelated. There's a lot of consolidation in the industry, which does mean fewer choices at every level: insurance, fewer choices; doctors and hospitals, fewer choices; even ambulatory centers are consolidating. So you may not have a choice; physicians still have a choice within those systems.

We are capable of measuring, and I like your idea that as our data capabilities go up we'll get better. But I still think, and I'm not hostile to measurement, I spend a lot of my time doing these things, that we're still limited by what we can measure in terms of actual human behavior and what people do. You can measure how patients
feel about their experience with a physician, but you can't measure what the doctor did or didn't do that produced that kind of an outcome. This is not to say that outcomes don't help us with that to a certain degree, but who fits into that denominator, why they fit or don't fit, and whether they should fit is another issue entirely, based on what data we can and can't collect. So the things we can measure become the low-hanging fruit automatically, and as soon as you start to measure them, everyone becomes about the same, as you showed in your graphics. But I've not seen the same amount of movement toward figuring out what the next level of fruit is; instead we just keep looking for more low-hanging fruit that is measurable.

Yeah, one of the significant constraints CMS has defined for its programs is that everything has to come off claims data, and that is a real limitation. As long as you're hampered that way, there's not going to be a lot of innovation, because they've set that as a condition, and once that's done there's not a lot of fruit left. I mean, people actually are making heroic efforts. There's a great paper from Deborah Peikes and her group at Mathematica that came up with a way of measuring the comprehensiveness of primary care by observing diagnosis coding patterns over a period of time and looking at the relative allocation of those codes. I think that's innovative. They had three different measures of the range of practice, and they could actually identify these primary care docs. But that takes a fair amount of imagination and technical work: figuring out how to make it work, then how to test it and validate it and make sure it's capturing something real. The good news is it's just claims data, so they didn't have to bother any clinicians who weren't already bothered.
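[Editorial sketch: the general idea of a claims-based comprehensiveness measure, built from the spread of a clinician's claims across diagnosis categories, can be illustrated with a concentration index. This is a hypothetical example of the approach, not the actual Mathematica measures; the category names are invented.]

```python
# Hypothetical sketch: score "comprehensiveness" from how evenly a
# clinician's claims spread across diagnosis categories. Illustrative of
# the general idea only, not the actual Peikes et al. measures.
from collections import Counter

def comprehensiveness(diagnosis_categories):
    """Return 1 minus the Herfindahl index of category shares:
    0.0 = all claims in one category (narrow scope),
    approaching 1.0 = claims spread evenly across many categories (broad scope)."""
    counts = Counter(diagnosis_categories)
    total = sum(counts.values())
    herfindahl = sum((n / total) ** 2 for n in counts.values())
    return 1 - herfindahl

# A doc coding across many clinical areas vs. a narrowly focused one
broad = ["cardio", "endo", "msk", "psych", "resp", "derm", "gi", "cardio"]
narrow = ["cardio"] * 7 + ["endo"]
print(comprehensiveness(broad))   # closer to 1
print(comprehensiveness(narrow))  # closer to 0
```

The appeal of this style of measure is exactly what is noted above: it needs nothing beyond the diagnosis codes already sitting in claims data.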
That's a great question, thank you.

People also go to the lowest common denominator in terms of rating, which is an Amazon review, one through five, or a Yelp review. I would think there are some health care consumers who want to go through the reams of data about their options every October, and then there are the people who look at their pamphlet and say, oh, this health system or health insurance is a five and that one is four and a half, so I'll just go with the five. I think there are maybe two different types of people, and the vast majority are just going to say, yep, that doctor or that insurance program is a five, I'll take it. You're saying you need to get all of this data and synthesize it and try to accurately profile something like quality, when most people will just say, I'll take the five.

Yeah, so it reminds me of the Saturday Night Live skit where the census taker comes to Christopher Walken's apartment, knocks on the door, and says, how many people live here? And Walken says, what did most people say?
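[Editorial sketch: on the tweet sentiment analysis mentioned earlier, the simplest version of sentiment scoring is a lexicon approach, counting positive and negative words. The word lists below are invented for illustration; the actual study used far more sophisticated, crowdsourced methods.]

```python
# Toy lexicon-based sentiment scoring, far simpler than the crowdsourced
# analysis in the hospital-tweet study; the word lists are invented examples.
POSITIVE = {"caring", "excellent", "clean", "helpful", "saved"}
NEGATIVE = {"rude", "dirty", "wait", "ignored", "billing"}

def sentiment(text):
    """Score = (positive words - negative words) / total words; range [-1, 1]."""
    words = text.lower().split()
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

print(sentiment("Excellent caring nurses"))
print(sentiment("rude staff and a dirty room"))
```

Aggregating scores like these per hospital, over enough tweets, is what makes a correlation analysis against outcomes possible; production systems use trained models rather than hand-built word lists.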
Yeah, it's a challenge, and there actually are some methods in survey research to try to disentangle the fact that most ratings are biased upward. The other thing I think is interesting, and maybe more to your point, is that Mark Schlesinger's group at Yale has been working on narrative analysis, extracting information from narrative comments, which is actually a richer source of information if you can figure out how to extract it. I'm actually a little down on the rating notion, the idea of a score; I don't think it gives people much guidance about what needs to be improved. So I show summary slides like that, but I don't think that should be the goal. I think the goal is always: what do we want to improve, what's the targeted set of data that could help us make that improvement happen, do we have a model for what we would intervene on, and then taking it from there. So I'm sorry if I misstated my position there.

Health care here takes a lot of beating because it has become political, something to send people to Congress over and get people out over. But if you review the literature, actually, you will find it is one of the best systems, because here people are treated individually, based on the individual. A lot of other European systems are based on group and policy. For example, I was reading that in a Scandinavian country they recommend no mammography for women after the age of 70. So you save a lot of money on this: you don't have to treat the people who have cancer. Or you do not do kidney transplants and all this after the age of 60 or 65; imagine how much money you save on that type of thing, and you are not dealing with very ill people. I remember my family sent X-rays to me from London, a metastasis to the brain at age 55. We would operate here, but there they told them, go on and celebrate the Christmas, maybe you will be better. And the child mortality here is
supposed to be the worst, but if you take the first year out, which comes from other problems, then it becomes exactly like the European systems, like Canada. So I think it is so hard to find out how to correct for this, because there are so many statistical issues. So, as a doctor, what are you saying we should stop doing?

You are raising a really fundamental question about what the purpose of a health care delivery system is. Is the purpose, as has been decided in the Netherlands or in the UK, to provide a basic level of care for everyone and make sure that happens, along with social investments that make people less likely to become sick, addicted, or otherwise ill? Or is the purpose to focus all our efforts on people with advanced illness? I think we have made a different decision, and I think you are absolutely right about that: our policies reflect a different decision about whether we prioritize access to care as opposed to highly technical, costly care. Some of us believe we should shift that; we are the richest country in the world, and we should be able to work it out. I think that is one of the reasons the political debates are as heated as they are: there are different views within the country about the right way to do that and expand access. The question is more about the mechanism for getting there. This is what makes performance measurement so interesting: one can take so many different lenses and apply them, and they can teach you different lessons about what a system is doing, how it performs, what the needs are, and how those could be modified. I appreciate your comments and question, by the way. At any conference I have ever been at, if you ask, if you needed care, where would you go, most of the Europeans will say, well, I would go to my country. In fact, our Harkness Fellows spend a year in the US, and they say, I would rather be in my country. But if you ask, if you had an advanced cancer of a rare type, where would
you rather go, the answer is, I would definitely make the trip to the US for that. So right there you have that dynamic, the trade-off: how sick am I, how complicated is it, can I get the best technical expertise in the world, versus, I walk into the doctor's office, I get seen, I get taken care of, I walk out, and I don't get a bill.