So he is actually the inaugural chair of the Department of Medical Social Sciences at Northwestern University. I could say that this is a department that has been built around him and his work and that of his team. He is truly an internationally known researcher in measurement science and patient-reported outcomes. And the way that we first met each other was around examining patient-reported outcomes in oncology clinical trials. So clinical trials are large randomized studies where you're examining one arm versus another arm and seeing not only how our cancer patients do in terms of survival, or whether the tumor shrinks, yes or no, but how does the patient feel as she or he is going through this therapy? So Dr. Cella's early work actually was a tremendous contribution for this country in how one measures patient-reported outcomes associated with cancer treatments. He now works across multiple diseases and has established what's called the Patient-Reported Outcomes Measurement Information System for the country. And so this is a huge NIH-funded project which really is setting the stage and the state of the science for how we're going to be doing outcomes measurement in the future across many, many trials. So for example, if you're working on a study on depression, or a study where fatigue might be a problem, or overall quality of life, or sexual dysfunction, there's going to be state-of-the-science measurement for any of those domains. And this is the reason why we're so pleased to have him here, because he is the expert in this area. And it's a wonderful opportunity for our MPH students to learn this. Thank you.

Thank you, Dr. Wenzel. Thank you for inviting me. So I don't know how many of you know this, but I'm a Chicago Bears fan. And so I didn't have any problem getting on a plane at six o'clock yesterday to come here, that is, six o'clock Chicago time, four o'clock here.
In the middle of the Super Bowl, I heard on the plane that the Green Bay Packers won the Super Bowl. I don't know if she's a fan. I don't know if she's from Wisconsin. So I got you some cheese. What do you feel for you? The cheese is for you. Oh, thank you. It's such a gift. It's such a gift. I'm ready to share. After the talk, I'll share my cheese. You're supposed to put it on your head. See, Dr. Wobble knows that I won yesterday. He congratulated me. Green Bay Packers win the Super Bowl. All right. Bears don't even get to the Super Bowl.

Okay, so I'm going to talk with you about PROMIS. Before you saw this announcement, is there anyone here that had heard about PROMIS before you heard about this talk? A few. So I am going to kind of assume that you haven't heard of it. I'll tell you the story of PROMIS from the beginning. We're also trying to move through a lot. I'll monitor the time and make sure that we leave some time. Later, if I run out of time, I might zip ahead, because you mentioned an interest in diversity, and that's sort of the last part of the slides.

So, what is it now, about eight years ago, the NIH put out a request for applications to put together a cooperative group, that's a collection of funded institutions, to build what they called dynamic assessment of patient-reported chronic disease outcomes. That was their focus. What does all that mean? Dynamic assessment means a way to assess how patients are doing that doesn't require them to answer every single question on a validated questionnaire that's long and tedious, just because that's the way the questionnaire was validated. Instead, it takes advantage of modern measurement, which I'll go into in a moment, that enables you to be more efficient, ask a much smaller set of questions, and get an answer that's very precise about how someone's pain is, or their fatigue, or their depression level or their anxiety level.
So, the NIH recognized that there had in the 80s and 90s been a growing amount of research using these item response theory models, which are measurement models, and that they could be applied to measuring health. They've been used for decades in education and also in psychology, but they really had not been applied in any systematic way to health, and there was no reason they couldn't be. So, they put this RFA out. And since I was one of the people that was beginning to do this in health and was writing about it and saying, hey, we really should be doing this in health, I applied. And we actually were awarded the Statistical Coordinating Center for the first round of PROMIS. And so this all began in 2004. The release of the RFA was in late 2003, but the grants were awarded in late 2004.

They said in this RFA that clinical outcomes research would be enhanced greatly by the availability of a psychometrically validated dynamic system to measure patient-reported outcomes, or PROs, efficiently in study participants with a wide range of chronic diseases and demographic characteristics. They wanted to be able to measure pain in arthritis patients the same way you measure pain in cancer patients and the same way you measure pain in post-MI patients, et cetera. And at the time there really wasn't a way to do that across a number of domains. You could do it with a zero-to-10 pain scale, which is in common use, but you couldn't really do it in fatigue, because people use different questionnaires, and you couldn't really do it in depression, et cetera.

So there were a lot of challenges to getting this to really work that we faced in the early years of PROMIS. There are lots of measures out there of the same thing. Literally hundreds of them. Actually there are thousands of them if you go across all the domains that we've tackled in PROMIS. They vary widely in their quality. Some are quite good and some are really not very good at all.
But the end users aren't always very good at differentiating the good from the bad, and they just use what the psychologist down the hall says they should use, and they don't really ask many questions. And so they end up filling the literature with a lot of confusion. It's very difficult, if not impossible, to compare across diseases. Now the NIH, again, who started all this, wanted to be able to see if they were getting a good bang for their buck in the Heart, Lung, and Blood Institute versus the Cancer Institute versus the Arthritis and Musculoskeletal Institute versus the Neurology Institute, et cetera. So they wanted to be able to have a common metric across those diseases so they could compare better. It also helps for meta-analysis to be able to include patient-reported outcomes when you're doing a meta-analysis of a given therapy in healthcare. So they wanted to look across studies, across different conditions, and also to be able to deal with the fact that most existing measures were quite complex and quite long. So this was a tall order that we were asked to fill.

Now, you see there's a little registered-trademark symbol there. The NIH has actually registered the trademark for PROMIS, which is kind of interesting. So it is a registered trademark of our government, and it stands for Patient-Reported Outcomes Measurement Information System. And just to give you the basics, when somebody says, what is PROMIS? Well, it's actually three things. It's a set of measures, or questionnaires, or some people call them scales, and I'll show you some of that. It's a science of different methodologies that are applied toward the goal of standardizing and streamlining, or making efficient, asking people how they're doing. So it's really an organized way of saying, how are you? You know, the doctor's first question when they come into the exam room. And it's software. I'll show you also later that we have freely available web-based software for using PROMIS. So it's all these things.
There are measures, there's a science, a methodology that we put out, and there's software available.

So PROMIS 1, as we call it, really went from 2004 to 2009. It started very late in 2004. And that was a fairly small group. During that period there were seven funded groups; that is, we were the coordinating center and there were six independent research sites that we coordinated. We created a lot of measures. We built this Assessment Center, this web-based software that I told you about, and did the first pass of validity studies of the PROMIS measures in clinical populations. Now, PROMIS 2 then goes from 2010, if you will, to 2013. So it's a shorter period. There's a five-year PROMIS 1 and a four-year PROMIS 2.

I'm assuming they only formed one center out of all that?

So they formed one coordinating center, yeah. And six sites.

That also submitted independent proposals?

Yes. So there were six sites, apart from the coordinating center, in PROMIS 1. There are 12 sites now in PROMIS 2, and actually three centers. We're at Northwestern and have two of the centers: there's a statistical center and a technology center at Northwestern. So there are three sort of centers of excellence and 12 research sites. And in PROMIS 2 now we're working on new measures and additional validity testing in larger samples of patients. So that brings us to today.

So it's being used now. PROMIS, you'll see it in grant applications, and they're actually getting funded. The first couple of years with grants going in, reviewers were like, what's this? We don't know it. Now I think it has a pretty good reputation. People understand it for what it is and seem to support the idea of going ahead with this common standard metric approach. It's also being used in the revision of the Diagnostic and Statistical Manual, DSM-5. This is its use in psychiatry. How many of you know DSM? All right, so it's about half the group.
It's the way that psychiatrists diagnose people with mental disorders. The DSM-5 field trials are now using PROMIS to measure general functioning. There are also a number of population health monitoring examples. The National Health Interview Survey, which has been around since 1957, is now using some PROMIS measures in its current surveys. The RAND American Life Panel has used PROMIS in their studies, as has Healthy People 2020 in the healthy lifestyles component. We talked about healthy lifestyles earlier. They are also using some PROMIS global measures of health. I'll show you a little bit about the PROMIS global in a minute. I'm using it in a cancer cohort of a couple of thousand people.

Now I want to start talking to you about why we're excited about PROMIS and why we think it's the way to go for the future. There are four things I want to talk about with you. One of them is comparability. The idea being, if you're doing a cancer chemotherapy trial or an osteoarthritis trial or a heart failure trial, you're going to get the same metric. The second one is reliability and validity. You want to be able to hit the target in the same place each time, but also hit the right target. All of the arrows go into the same place: that's reliability. Each time you shoot, you get the same hit. But getting the bullseye in the right place, that's validity. So you want to be reliable, meaning consistent, stable, with the same answer. You also want to be measuring something that's meaningful. Third is flexibility. Meaning not just paper and pencil and computer, but also being able to do it with a telephone. Sorry, that's an old phone. I don't think anyone has that phone. Maybe my kids actually have one of those, because they play with it, because it's a toy from the past. It goes fast, doesn't it? And then finally, fourth, inclusiveness. I mentioned the diversity. These are the four values of PROMIS.
Comparability, reliability and validity, flexibility, and inclusiveness. I'll run through each of those. But first, I need to explain a couple of terms. Because if I use them, you'll say, what's he talking about? What's a domain? What's an item bank? So let me just explain these to you.

A domain is a specific feeling, function, or perception that you want to measure. So it is what you're measuring. It's a thing. We could call it a thing, but that just sounds kind of weird. So we call it a domain. Some people call it a concept, but others will think of that as an idea. So we came up with the term domain; you know, absorb it, let it sink in. It's basically what you're measuring. Physical function, erectile function, satisfaction with social participation, or even general health perceptions: each of these can be a domain that you're measuring. They cut across different diseases. The key point with PROMIS is that we're not disease-specific. We're domain-specific. So you can call it a domain-specific approach to measuring patient-reported outcomes or self-reported health. It's not all about patients, by the way. Take some of the population health monitoring: those are not patient cohorts. You can use PROMIS tools there because they're generic. They're really about life experience and symptoms. You can use them in general health models.

Now, what about an item bank? An item bank is a large collection of items that measure a single domain, a domain like anxiety or fatigue, something like that. But any item, and all items, can be used to provide a score on that domain. That's never the case with standard old-school classical test theory, where you have to ask every question to get the score. You get the total sum score, and that represents the person's anxiety. With item response theory modeling, every question in that item bank that captures a domain is a representation of that domain.
So you can literally ask one question or you can ask 50 questions. The benefit of asking more is that you shrink your confidence interval around your estimate. One question will give you an estimate. There will just be a big confidence interval around it. Two questions will shrink that confidence interval. Three will shrink it more. Now, envision how that becomes very useful to a population scientist who says, I want to ask 20,000 people about their depression and I want to track them over time. Well, now you can start doing that with one question, because you're getting the benefit of averaging over 20,000 people and tracking that over time. So it becomes feasible to use that and actually link it to this larger item bank.

So here's an example of a physical function item bank, going from left to right, from a bed-bound inpatient to Lance Armstrong over there on the right riding his racing bike. And the items are lined up along that bank also from left to right. They're lined up along that domain from poor physical functioning to excellent physical functioning. And you have questions, or what we call items. Another jargon term: an item is the thing that you ask somebody, an individual inquiry unit. Sometimes it's a question. Sometimes it's a statement that people endorse. So for example, are you able to get in and out of bed? That's an item. Are you able to stand without losing your balance for a minute? That's an item. So these are examples of physical function items in a physical function item bank. There happen to be 120 items in the physical function item bank. This is just a sample of six of them. And they're ordered along this continuum by how easy or difficult they are. So you can start to get a sense of this: if somebody is very impaired and says they can't get in and out of bed, a smart computer or a smart interviewer knows they don't need to ask if they can run five miles. After all, they can't even get out of bed.
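To make the last two points concrete, here is a minimal sketch of scoring from any subset of a bank and of choosing the next most informative question. It is only an illustration under stated assumptions: the talk does not name a specific model, so a simple yes/no two-parameter logistic (2PL) model is assumed here, whereas items with several response options would need a polytomous model. The item parameters in `BANK` and the function names `p_yes`, `score`, and `next_item` are made up for this example and are not PROMIS calibrations or software.

```python
import math

# Hypothetical item bank: (discrimination a, difficulty b) on a z-score scale.
# Items are ordered from easy (get out of bed) to hard (run five miles).
BANK = [(1.8, -2.0), (2.0, -1.0), (2.2, -0.5), (2.1, 0.0), (1.9, 0.8), (1.7, 2.0)]

GRID = [g / 10.0 for g in range(-40, 41)]  # candidate trait levels from -4 to 4


def p_yes(theta, a, b):
    """2PL probability of answering 'yes, I can' at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def score(responses):
    """Return (estimate, standard error) from any subset of the bank.

    responses maps item index -> 1 (yes) or 0 (no). The estimate is the
    posterior mean under a standard normal prior; the standard error is
    the posterior standard deviation.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for idx, x in responses.items():
            p = p_yes(theta, *BANK[idx])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def next_item(theta, asked):
    """Pick the unasked item with the most Fisher information at theta."""
    def info(idx):
        a, b = BANK[idx]
        p = p_yes(theta, a, b)
        return a * a * p * (1.0 - p)
    return max((i for i in range(len(BANK)) if i not in asked), key=info)


one_item = score({3: 1})                 # a single answer: wide interval
three_items = score({3: 1, 4: 0, 2: 1})  # three answers: narrower interval
print(one_item, three_items)
```

With these made-up parameters, the standard error after three answers is smaller than after one, and for a very impaired person `next_item` selects the easiest item rather than the hardest, which is the behavior the smart computer is meant to have.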
But these old questionnaires used to make people answer all those questions, and they'd annoy people, who would say, I just told you I can't get out of bed, and now you're asking me about running five miles? People are insulted and everyone's time is wasted. So this is why you can get away with this: because all these items are now calibrated on the same latent domain, if you will. Okay, so that's what we're trying to accomplish today.

So this is the network. I mentioned Northwestern, here in the heart, in the middle. It's so loving. And then these are the original six centers. I don't know if it's bright enough for you to see, but here's Washington, Stanford, Stony Brook, Pitt, UNC Chapel Hill and Duke. That was the first set, the seven of us in the first round, PROMIS 1. Now PROMIS 2, as I mentioned, has these three centers of excellence: the Statistical Center, the Network Center and the Technical Center. The Network Center is in North Carolina. And then the 12 funded research sites are down across the bottom of the slide. I won't go over each one.

So to get to these banks, we integrated a lot of different aspects of science. It's not just asking questions and scoring numbers. We integrated qualitative research, to get the right questions being asked and make sure they were going to come across as comprehensible to people; health information technology, to create a venue, or a window if you will, for people to do these assessments; psychometrics, which is statistics applied to measurement; survey research, because we had to create creative samples of people mixed with items to do factor analysis and at the same time calibrate these item banks; and then clinical research, making sure these things have meaning and value in the clinical research setting. So it really integrates all of that.
And we did that over the course of the first seven years of PROMIS so far, through 30 research protocols that are all aligned with one another and connect to this evolving sense of PROMIS standards for research. There have been nearly 40,000 people that have contributed data so far, more than 1,500 in qualitative research and 35,000 in quantitative research. That includes almost 10,000 kids, about 2,000 parents as proxies for their children, and then more than 25,000 adults answering on their own behalf. This also includes Spanish-speaking participants, adults and children, so far over 4,000.

So let's get now into comparability. I used to think I knew the story of the Tower of Babel, and I wonder how many of you did too. I used to think that the Tower of Babel was the tower that represented language, that people were speaking different languages and nobody could understand each other because they were all in this tower. Well, actually, the Tower of Babel, as Genesis reports it in the Bible, was a tower that was built after a few generations of descendants from Noah had gotten together and created a city and built a tower that was ascending toward heaven, and that got an angry God nervous about people rising up and getting too close. So God supposedly struck down the tower and created languages to deal with the fact that people speaking one language were actually getting too far. This is the sort of Old Testament rendition of an angry God. Language is the result of the Tower of Babel. But in any event, that's just a little aside. How many of you thought it was what I thought it was before I knew the real story? We have the same problem in measurement.
We have this sort of post-Babel measurement problem, because we have all these different questionnaires, and people call them scales, questionnaires, tools, instruments, and there are a hundred ways to measure depression, and the scores are all different, and we're all speaking different languages even when we're all dealing in the English language. So we have a huge problem of too much language being a distraction from actually accomplishing something in measurement. But this qualitative research that we've done, and then the use of these item response theory models, what I'm calling modern measurement, allows us to build a common PRO language and to do, in a sense, what you could refer to as a patient-reported outcome Rosetta Stone. That is, to take these different questionnaires and calibrate them together with PROMIS banks, so somebody can get a PROMIS depression score, for example, even if they used the CES-D depression scale or the Beck depression scale, because they're all calibrated to this common PROMIS metric. And that's a project that we're doing parallel to PROMIS with a separate grant.

So in the beginning, in those early years, we built a health outcome framework of common patient-reported outcome domains. We said what the domains are, and we defined those domains, and we actually went through the trouble of saying what we think those are and therefore how they should be measured. And we informed our own consensus with the literature, with analysis of existing data that we all had and got from other sources, and with qualitative research with people that are anxious or fatigued or in pain. So we did this literature review, and 79 focus groups across the network with people from different disease populations and also with people in the general population, and we came up with tables like this.

So for physical function, almost 2,000 items were identified out there in the literature. After we revised, reviewed, carefully scrutinized, and threw away ones that were poorly worded, were redundant, or just didn't get at the right point of what we would define as physical function, we ended up with more like 250. We did that also with fatigue and depression, anger and anxiety. As you can see, there were thousands of items, but we still had over 1,000 items at the end of the day across the domains of PROMIS, even after we reviewed and revised them. So that lower row became the beginning of the sort of soup, if you will, for building the item banks.

And we started then using qualitative research, these focus groups I mentioned and also individual interviews with people, rewriting items that were unclear. The first year and a half of PROMIS was really focused on qualitative research exclusively. And then we moved, toward the end of the second year, into testing in both general population samples and clinical samples where needed. For example, in pain, you can't calibrate a pain bank if you're testing it on people that don't have pain. And then came analyzing that data, interpreting the data, refining it and going back to the qualitative drawing board, really building these item banks in this iterative way.

This is the framework that we worked with. What we started with in the beginning was this right here. In the beginning we said, we're all going to agree that there's physical health, mental health and social health, that these are three components of health. Later, through arguing, discussing and data, we came up with these breakdowns: physical health dividing into symptoms and function; mental health dividing into affect, behavior and cognition; and social health into relationships and social function. So this is the basic PROMIS domain framework that we worked with. It's much more articulated than this as you go out to the right, but I'm not going to detail that any more than to show you the actual products.

So today, February 2011, in the physical health domain, or if you will, set of domains, we have the following on the adult side and the pediatric side: pain, fatigue, physical function, sleep and sexual function on the adult side; and pain, fatigue, physical function and an asthma impact measure on the pediatric side, because the site that was doing pediatric work was particularly interested in pediatric asthma. On the mental health side: anxiety, depression, anger, and then some more specific mental health things like the impact of illness and cognitive problems; and on the pediatric side, anxiety, depression and anger, parallel to the adult side. Finally, in social health, we have the ability to participate in social roles and activities, which includes work, family roles and leisure activities, and then satisfaction with those roles and activities. And then we have some newer banks, just new this past couple of months, in companionship, social support of various types, and social isolation. On the pediatric side so far we just have peer relationships as an item bank. So those are the domains we have so far. I went through about 30 of them. We have those existing now. They're in this Assessment Center that I'll show you in a few minutes. But what's the metric?
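As a small aid to the answer that follows, a T score is just a standardized score rescaled to a mean of 50 and a standard deviation of 10 in a chosen reference group. A minimal sketch of that arithmetic, with hypothetical function names and no claim about how PROMIS software computes it:

```python
def to_t_score(value, ref_mean, ref_sd):
    """Rescale a score so the reference group has mean 50 and SD 10."""
    return 50.0 + 10.0 * (value - ref_mean) / ref_sd


def sd_units_from_reference(t_score):
    """How many reference-group standard deviations a T score sits from 50."""
    return (t_score - 50.0) / 10.0


# A trait estimate one SD above the reference mean maps to a T score of 60.
print(to_t_score(1.0, ref_mean=0.0, ref_sd=1.0))
# A T score of 45 is half a standard deviation below the reference mean.
print(sd_units_from_reference(45.0))
```

Which direction counts as better depends on the domain: a higher score means more of whatever is being measured, so more fatigue is worse while more physical function is better.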
I keep talking about a common metric well we went with a T score a T score we didn't embed this it's been around for 50 years or more it has a mean of 50 and a standard deviation of 10 it's simply a way to take a distribution the average is 50 and the standard deviation is 10 so you know what you're talking about when you have 60 you're one standard deviation higher than whoever your reference group is our particular reference group is the US general population all of these promised T scores are referenced to the general US in this case adult population once you have these banks capturing these domains the T score you can then use the following tools to do your testing one is computerized adaptive testing remember that first slide I showed you on the RFA that said dynamic tools to measure health the RFA from the beginning said they wanted computerized adaptive testing to be done by this group and it is a natural outflow of item response theory the idea that if that person in bed says I can't get out of bed the computer won't ask any questions it will ask the room or down the hall or running a mile or two it just says we know this person is over here and we're going to ask questions that are over here and then we're going to stop asking once we sharpen that estimate that's called computerized adaptive testing the computer keeps estimating where the person is and then checking and then selecting questions that would be informative and only those questions that would be informative and stops asking once you reach a specified confidence interval averages around six questions actually the average would probably be lower but of course so far because it's still early we force a minimum of four in our cats so if you're going to do a cat we tell the computer ask at least four even if you're confident after two or three questions that you know where this person is just ask four that actually pushes up the mean probably not the medium but pushes up the mean closer to six in 
terms of items that are needed to be able to get an estimate of any one of these domains you can also use a fixed line form we have these available they're on the website by domain they tend to be 8-12 items in length that's 8-10 questions or you can do it by health profile we have what we call a PROMIS 29 which I'll show you in a minute PROMIS 43 or PROMIS 57 I'll show you what that is finally you can ask this global health remember I mentioned the healthy lifestyles CDC is working with NCI to measure in their healthy lifestyles 2020 survey using the PROMIS global 10 which is 10 questions around six or seven different domains anxiety, depression physical function, social health let me show you what cat how cat works and I'm going to just give you a sense of how quick this can actually be used and you can actually download the stuff and use it from the website in this particular case we have a cancer patient who has fatigue pain and depression in the moderate level of cinematology so it's a moderately symptomatic cancer patient and we're measuring with three cats a fatigue cat, a pain cat, and a depression cat so it starts by asking how often do you have to push yourself to get things done because of your fatigue this person said often this person tells the computer to go to this question I have trouble starting because I'm tired and this person says quite a bit again the estimate gets revised the confidence interval shrunk and drives this question I feel fatigued quite a bit how much have you bothered by your fatigue on average quite a bit at this point the computer is done with fatigue has the estimate and moves on to pain and asks how much do pain interfere with your day to day activities somewhat pain interfere with your ability to participate in social activities a little bit these are screenshots again you can go online and do this yourself ask about pain and pain with enjoyment social activities a little bit and how much do pain interfere with work around 
the home somewhat four questions it's done with pain I felt depressed sometimes felt hopeless, rarely felt worthless, rarely felt helpless sometimes that's enough after 12 questions four per domain remember I said we forced the computer to ask for so we did ask for we have confidence you see the confidence intervals around these estimates in the range of 60 to 70 remember mean 50 standard deviation is so this person has moderate symptoms and is exactly where you would expect this person to score in the range of one one and a half standard deviations worse more fatigue, more pain and more depressed then the US general population this is a cancer patient with these problems so this person is more symptomatic than the average person in the general population so let's say you provide a treatment you give them some medication you teach them about energy conservation for their fatigue and you treat the cancer that helps the pain they then get the same cats different questions how often do you have to push yourself to get things done because of your fatigue this person got this question it's always the same the first question by the way this person says sometimes I'm not going to drag you through the whole assessment but this because the person's response is sometimes a set of often triggers a whole different set of questions based upon the person's responses and another 12 or so questions you get you see that this person actually improved in terms of their fatigue, pain and depression this time takes about two minutes for the patient to do 12 questions takes about six minutes per question so that's about two minutes and it's very efficient I mentioned the profiles I said I'd show you this is the promised profile so we take two mental domains, anxiety, depression three, I'm sorry four physical domains fatigue, pain interference, sleep disturbance and physical function and one social domain social world satisfaction and we have four item short forms six item short forms 
and eight item short forms for each so if you take all the four item short forms put them together as 28 questions we add a 0-10 pain scale because doctors like that and providers like that we call it the promised 29 so it's 28 questions four times seven plus the pain to the 0-10 pain intensity scale it's a promised 29 it gives you a profile of scores so this common measurement has a lot of implications now we have the domains and these metrics this T score metric it means a lot we think for clinical research and for policy you can compare conditions on individual promised domains so for example, chemo trial diabetes trial, osteoarthritis trial heart failure trial even with two different classes of patients you can pick different item subsets you can do a cap with one or more of them and in every case you're getting the same fatigue metric and the same meaning of the score across these different groups sort of mission accomplished there in terms of the NIH request you can also start to compare fatigue scores for example of people that have different a lifetime history of different medical conditions this is actually the T score 45 of people in the general population we have no history of any of these conditions so if you have never been told you have any of these things there is actually 45 not 50 because remember the 50 is the general population that includes people with illnesses so this is kind of a healthy score if you will that makes sense it's half a standard age and less fatigue than the general population so that's the starting point but then when you look at these different conditions you can actually get scores and see how far off that is starting point people are whether they have depression, anxiety, migraine, COPD and so on same thing with depression the average for people with no condition is around 47 or 48 and then this gives you of course major depression is going to have the worst depression relative to people in general population with none of 
these conditions. Same thing with pain: here at the top of the list are arthritis and migraine. Makes sense; these are conditions we tend to think of as pain related. So you are getting that comparability, and you can also get profiles across these conditions. So, for example, an anxiety disorder profile looks like this. These are people that say that at some point in their lives they have been diagnosed with anxiety disorder. If you plot them as differences from the average T-score of the whole sample, you see that anxiety and fatigue and a little bit of depression are worse in people that have a lifetime history of anxiety disorder, and they are slightly worse (down below the line, to the left, is bad) on physical and social function, but not that much different. Arthritis, on the other hand, is very much worse in things like pain, pain behavior, and pain interference, but also fatigue, and physical function is also worse. This would be the arthritis profile, if you will. And then this is the COPD profile, like emphysema: pain and fatigue and physical function are also worse. So you get a sense of different conditions having different profiles.
So now, reliability. It's nice to have comparability, but are we doing it with reliable, valid tools? That's important to show. This is kind of a busy slide, and it's supposed to be animated, but it's not, so I apologize. Bear with me. Don't look at any of that yet; I'm going to animate it for you. The y-axis is error. We don't want error; now we're talking about reliability, or precision. The x-axis is the severity, the latent trait or domain being measured. In this case, it's physical function. Remember, the mean is 50 and the standard deviation is 10. The U.S.
general population on average ranges from 40 to 60, one standard deviation to either side of the mean. Remember, that's always going to be the case on this metric, so this won't change. If you ever look at a slide like this, and you may see some in the future, this is where the general population resides: roughly two-thirds of the general population is right in that range, 40 to 60, on physical function. What this tells you for these different tests is how precise they are as a function of where somebody is on the continuum. By the way, this would be a comparison group whose scores run from 25 to 45, so much worse physical function than the general population.
So we take a questionnaire like the Health Assessment Questionnaire. Let's start with that; that's these red dots here. The Health Assessment Questionnaire was built by Jim Fries at Stanford. He's actually a PROMIS investigator. He joined PROMIS because he believed that the Health Assessment Questionnaire was a great questionnaire down here and not very good over here at all, and that's exactly the case. You can see the Health Assessment Questionnaire, 20 questions, does a very good job. Where you really want to be is below this line; it's even okay to be below this one. It's doing a very good job by our guidance criteria. The SF-36 physical function scale is 10 items, and it does a very good job, but only in this kind of narrow range. Now the 10 items from PROMIS (remember, the CAT is the smart one) actually do a good job, better than the 10-item SF-36 scale, and cover much more of the domain or latent trait. This is the 10-item CAT here. So you see that the short form that we built does a pretty good job, better than the 10-item SF-36 scale, because we had the benefit of learning from it. But the real winner here is this: across the whole range of physical function, from 15 out to 65, you're getting really good measurement with a 10-item CAT, and you don't get that with these existing tools like the SF-36 over here. Nothing against them for their
purpose, but again, if you don't know your population and you want to be able to use the CAT to narrow people in, it does it very efficiently.
So this is the same type of slide, so I won't go through that same tutorial again, but just to say, this is kind of the watermark for the goal of reliable assessment, equivalent to a reliability of .9. We compared the 4-item SF-36 vitality scale, which never drops below that line, so it never really gets precise measurement for the individual person, to a 4-item CAT, which actually does for a fair amount of the fatigue range. And here's the 13-item FACIT Fatigue, which I did about 15 years ago, and this is the actual full fatigue item bank. So the CAT is beating out these kind of what we call legacy questionnaires.
Well, the precision is there, the reliability is there; that's part of why we're excited. But if you're not valid, then it doesn't really mean much. So here's some evidence of validity, just to run through a little bit. This is a legacy questionnaire, the Mood and Anxiety Symptom Questionnaire. It's used a lot in measuring anxiety. And this is the PROMIS anxiety bank. Now you'll notice the correlation of .81, so they're highly correlated. That's good; it means we're kind of measuring the same thing. But you'll notice the distribution here is much more normal than the distribution here. We're actually pulling people off the ceiling here. These are people who have very low anxiety, but they have some anxiety. So the legacy questionnaire doesn't separate the people who have no anxiety from the people who have a little bit of anxiety, whereas the PROMIS anxiety bank does. So it actually is probably doing a better job, especially in the mild anxiety range, but it correlates quite well with the existing questionnaire. Same thing with depression compared to the CES-D: a .84 correlation, that's pretty high, but you get rid of that ceiling effect that the CES-D has and actually get a more normal distribution. So we're happy about that; it is measuring the same thing.
Now what about responsiveness? Because that's really the
bottom line: does it actually measure change? After all, we want to be able to see whether what we're doing in healthcare is making a difference. Just a few examples. With cancer patients: 310 patients, these are outpatients getting chemotherapy. The assessments were at baseline and at two to three months, so two assessments, with standard legacy instruments. I'm just going to kind of touch on these. This is for physical function, fatigue, anxiety, and depression. Visually, you can kind of take home the message that the PROMIS scales are pretty much performing, in terms of the responsiveness to change in these cancer patients from baseline to two to three months, in the same way that the legacy instruments are. In many cases they're about the same number of items, 10 items and 10 items. The PROMIS fatigue is only 7 items compared to a 13-item fatigue scale, so it's shorter but doing the same job of responsiveness. Same thing with pain: you've got the PROMIS cancer pain here in the blue compared to the legacy measure, comparing people who get better, who stay the same, and who get worse. All is encouraging there.
Now what about depression? These are outpatients in treatment for major depressive disorder at Pitt, at Western Psychiatric Institute, and they have three assessments: baseline, one month, and three months. Here, and PROMIS is always going to be in blue, the anger CAT, which had a median of five to five and a half items, compared to a 12-item aggression questionnaire, actually picked up more benefit, more improvement over time in patients. Here again, a median of four items for anxiety, compared to 11 items on the MASQ, picked up more improvement with fewer than half the number of questions. Here, about the same, actually a little bit less improvement, but I'm not sure that's significant. But again, here it's a median of four questions for PROMIS depression as opposed to 20 questions on the CES-D, so one fifth the length and comparable performance.
Back pain
patients now, same kind of idea: 226 patients. These are people who are just scheduled for an epidural steroid injection for back pain with sciatica, assessed at baseline, one month, and three months. You see the PROMIS pain interference, with a median number of items of four, compared to the seven-item Brief Pain Inventory: pretty much exactly the same responsiveness over time, showing improvement. Here, a four-item pain behavior PROMIS scale outperformed the 24-item Roland-Morris in picking up change over time. So we're pretty comfortable, pretty happy with that.
Next, flexibility of all these banks. I'm going to move through this and get to the inclusiveness stuff in just a moment, in the interest of time, but there are a couple of key points here on flexibility. PROMIS was built on a computer assessment backbone, web-based computer assessment. So you can't assume that if you put these things down on a piece of paper and have people fill them out, you're going to be able to use the computer-generated data as validation of a paper-and-pencil form. It may be different. People might respond differently to a paper-and-pencil survey than they do on a computer. Even more so, they might respond differently on the phone, if they're listening to the questions and responding with a telephone. So we felt the need to demonstrate, if we could, that the computer-based data that I've shown you (all of that was computer collected) actually is comparable, or comparably valid, if you administer these by paper and pencil, by telephone, or by PDA-type device.
So we have Assessment Center, which is the computer-based way of doing this. It's a 508-compliant, web-based research management tool. It has a downloadable user manual. You can do study-specific URLs, online consent, and registration. It's pretty cool; you can do all this stuff online now. It's free, and you don't need to only use PROMIS instruments; you can actually put in your own instruments. You can go in, register, set up a study, and study your friends
and family. Just don't tell the IRB, or you'll get in trouble. But there actually is a consenting process available online. So what's available on Assessment Center? Any of the items I've shown you, those item banks. You can use all the short forms I talked about, and the profiles are all available on Assessment Center. The CAT report gives you information. You can go online and take this and get this report generated for yourself. You can even look at a plot; you'll get a plot for your scores with the confidence intervals around your scores. For this particular person, red is bad and blue is good. So this person has got a little bit of anxiety, a little depression, a little trouble with physical function. They're not doing bad. They're doing pretty well with fatigue. So this person is bouncing around with lots of energy and is depressed and anxious a little bit. I'm not sure you'd want to be in traffic with this person.
Multiple ways to administer measures: I talked about this. There's paper and pencil, which we needed to validate, there's the computer, there's the phone, and there's the PDA. We needed to test whether these different modes have differences in how people respond to the items. So we did this mode of administration study comparing them in 921 community adults and arthritis patients. We selected fatigue, depression, and physical function as the measures. And fortunately they were equal to one another, within less than one and a half points. It's not really a hundred-point scale; it's that scale with a mean of 50 and a standard deviation of 10. But if you think about it, with a mean of 50 and a standard deviation of 10, one and a half points is .15 standard deviation, less than two tenths of a standard deviation. We powered the study to be able to pick up anything from two tenths of a standard deviation or more. And here are the actual results. So the average is 50 again, I think we're probably that much in, and these are the kind of acceptable limits, if you will, of scores, and actually nothing even came close. These are the actual scores.
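The arithmetic behind that equivalence margin can be sketched in a few lines of Python. This is only an illustration, assuming the T-score metric described in the talk (mean 50, standard deviation 10) and the 0.2 standard deviation threshold the study was powered to detect. The function names and the example means are hypothetical and are not data from the study.

```python
# Illustrative sketch only: converts a T-score difference to standard
# deviation units and checks it against an equivalence margin.
# T_SD and MARGIN_SD follow the figures mentioned in the talk; the
# function names and the example means below are hypothetical.

T_SD = 10.0        # standard deviation of the T-score metric
MARGIN_SD = 0.2    # smallest difference the study was powered to detect


def t_points_to_sd(points: float) -> float:
    """Express a difference in T-score points as standard deviation units."""
    return points / T_SD


def within_margin(mode_mean: float, reference_mean: float,
                  margin_sd: float = MARGIN_SD) -> bool:
    """True if the two means differ by less than the margin (in SD units)."""
    return abs(t_points_to_sd(mode_mean - reference_mean)) < margin_sd


if __name__ == "__main__":
    # A 1.5-point difference is 0.15 SD, inside the 0.2 SD margin.
    print(t_points_to_sd(1.5))
    print(within_margin(51.5, 50.0))
```

On this metric, a difference between two administration modes would have to reach 2 T-score points before it crossed the 0.2 standard deviation threshold.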
This is for physical function, fatigue, and depression, paper and pencil compared to PC; physical function, fatigue, and depression, IVR compared to PC; physical function, fatigue, and depression, PDA compared to PC. All of them are essentially spot on, equal to PC administration. We breathed a sigh of relief when we saw this, because it meant we could feel comfortable saying to people that it's probably okay to do this. You might want to keep studying this in your own context, but our first look at mode of administration was really encouraging. We asked about preferences; people prefer the computers. Now I'm going to move on.
Is it permissible, for example, to simply have someone read the questions aloud?
Of course, anything is permissible, but it's a little risky, because what I advise is to actually have the person looking at it visually, so they're also processing it visually and not just auditorily.
So, okay. PROMIS has taken on a lot of interest that we hadn't anticipated, that we thought about but hadn't really planned for, in health information technology, in so-called meaningful use. That is, there's pressure now for health providers such as UC Irvine to show that their use of health information technology is meaningful, creating a meaningful improvement to care. One way you can do that is by incorporating patient-reported outcomes. So we've had lots of requests from providers to integrate PROMIS with their electronic health record, and lots of interest from payers to provide a level playing field for patient-reported outcomes, because that's, I think, what PROMIS represents, unlike most everything else out there. And there are some forward-thinking groups, like this Gretzky Group, named for Wayne Gretzky, who said the reason he scored so many goals was going where the puck was going and not where the puck was. So this group, the Gretzky Group, is trying to follow his advice and go where the puck is going. And there are these Beacon Communities, which are HHS-funded demonstration projects in health information technology advancement. So we have this
Assessment Center, and we have all these health providers with different electronic health records, and so it's not always easy for them, because they're not necessarily compatible. So what do we do about that? We've actually coded the PROMIS items and scores and language into SNOMED and LOINC codes so that they can then be connected to the electronic health record. It's also in REDCap. I don't know if you know what REDCap is, but that's the CTSA-based tool for clinical research.
So finally, inclusiveness. Let's take a little bit of this. What do we mean by that? There are lots of challenges. A lot of the language in previous questionnaires is inappropriate for lower-literacy respondents, and we were very careful about that not being the case, as much as possible, with PROMIS measures. Not everyone speaks English, so you need other languages. There are few accommodations for special populations with special needs, such as assistive devices for answering questionnaires, for example, to fill out PROMIS questionnaires. And there hasn't been very good measurement across the life course. Very often pediatric pain measures don't connect to adult pain measures; the same goes for fatigue, depression, etc. There's a disconnect somewhere around age 18; all of a sudden the world seems to change. So we've tried to tackle all these within PROMIS.
Just to breeze through a little bit of this: you probably know some of these statistics, but there are 30 million adults who are below what's considered to be basic literacy in the U.S., and over 90 million if you include the people that would be considered as having basic literacy skills. That's a lot of people: 90 million people who are basic or lower, according to the National Adult Literacy Survey. So we wrote simple items, at an elementary school reading level in all cases, and did cognitive interviews with all the questions. We made sure that people could actually say back in their own words what they thought the question was asking, so we could tell if they didn't get it. We also made sure
that we had a minimum number of people with a high school education or less looking at each of these. So every PROMIS item was pre-tested and field tested in individuals with low literacy.
And then with regard to translations: we have 35 million people that speak Spanish in the home in the U.S., and we're accounting for around 900 PROMIS items. Those are also available, but not yet available on Assessment Center, and they're not quite yet calibrated, but they're available for your use. We've had requests for 33 different languages to be translated, so we're getting there. We have English and Spanish for now, and we're doing a little bit here and there with German and Dutch. But every item has already been reviewed by translation experts for what we call translatability, so we know what questions in English just aren't going to work well. They will tell us to stop, or redirect and reword this, because this won't work in other languages. A common example is "I feel blue"; it makes no sense in French.
We have physical function items that apply to users of assistive technology. If the only questions we have are "Can you walk across the room?" and somebody is in a wheelchair, it's not a very fair question. There are some questions for people that use assistive technology that help them actually be able to show improvement in their physical function even if they're not ambulatory.
And then the life course issues: we're covering across development, from early childhood to aging, and across ranges of dependency and autonomy, as we try to create this sort of womb-to-tomb measurement, if you will. It's probably more fair to say that we're trying to work our way down. But that's the basics. I'm going to skip over these because I think I've covered enough. If you have questions about peds, let me know. That's just to remind you of the four things I tried to cover. I think I probably went too long.
Thank you for that. I think it was on that Assessment Center slide. I can't go to
the slide, because I don't think I'm online here, but I can put the URL up on the screen, because I think it's on the Assessment Center web page.
Yeah, it's easy to find. I promise we'll get to that. Yeah, Google PROMIS without the E. I think even if you Google it with the E, you'll get it. It's one of the first, maybe the first, thing that comes up. But let me just find it while we're... here we go.
In terms of accessibility, is the phone-administered survey taken using a texting mode or a voice?
Good question. It was through interactive voice response. In this particular case it was a human voice, recorded, as opposed to computer generated. Most computer-generated voices are kind of creepy, so we spent the extra money to actually use a real one.
Really nice work, and it makes so much sense. I'm wondering if you have any idea about what proportion of investigators who are doing this kind of work, who are doing clinical research with patient-reported outcomes, are using PROMIS.
Yeah, patient-reported outcomes. Well, you've got me thinking that that ought to be something we start to track, because my guess now is the number is well below 5% yet. You know, it's the kind of thing that will keep growing, and 2 to 5% may be a guess. If you think about instruments as having a life span, 7 years... it takes 20 to 30 years. I think of the SF-36; it was first published in 1991, but it had been around before that, so that's 20 years.
If NIH really was interested in this, they could make it part of an RFA that could give some preference to people who would use these standardized measures.
Yeah, they're a little... you know, it's a tough time at the NIH. I can tell you that they're not sure they have enough money to keep the grants going that they committed to, and they're all being warned about a 5% reduction. Maybe the students don't know this, but the NIH doesn't actually have a budget. The fiscal year started October 1st, and it's, what, February 7th? They don't have a budget yet. They're going
to get it soon, but they've been warned that it's likely to be at the 2008 level again, and with 2008 levels we're going to find a 5% reduction in what they're committed to.
Do you have stress in PROMIS?
No, not in PROMIS. We have a measure of perceived stress in Toolbox, which is a sister of PROMIS. In Toolbox we're actually using the Perceived Stress Scale. We made a decision with Toolbox not to reinvent the wheel if there was already something good and efficient out there, and the Perceived Stress Scale is 14 items. So we're currently testing it, and it has been pretty darn good. We didn't seem to need to build our own. We probably should.
You mean like IRT modeling or something that would be comparable to a PROMIS score, that we'd be able to map onto the PCL or the CAPS or some of the others? No, we should. There's not a plan right now. It's a really good idea, and there actually are some RFAs or PAs out there these days around stress.
So I noticed your three major domains are physical, social, and mental health. What about including environmental health? Because I'm in the PhD program, and I do environmental health.
No, and I love these questions. You'd have to help me understand, because what we have is based on the person's reports.
Availability of parks, proximity to liquor stores, does someone smoke in your household?
Those are all big things. We've been getting together to partner around this obesity thing, for example, and also with some things that NCI is doing in terms of GIS mapping, dealing with health disparities. Because it's a huge health disparity issue: if you're living in an area where there aren't parks nearby, there's not a decent grocery store, all of that, how can you be healthy? So we're partnering with the GIS approach, but we haven't... we normally use self-report, and some of that is really knowing where somebody is, GIS, in the environment. So to the extent that there would be a person's perception of feeling safe or feeling they have access,
we could certainly do that. We might put that under social, the social environment.
So I was glad to see the translation to 35 or so languages, and you said globally. It reminded me there have been a lot of papers on the World Health Organization QOL, which is 100 or 25 or so items, and that's almost two decades now. How does this compare? I know this is based on the U.S. population.
It's a good question. We have actually been criticized for not using the ICF or any other classification. We used the old World Health Organization framework of physical, mental, and social health. It was parsimonious and elegant in its simplicity for us and gave us a lot of room for the data to speak, as opposed to the ICF, which is really a framework for more than self-report. So we've done mapping to that, and there are areas where PROMIS is mapped to the ICF. We need to do more with the WHO. So there's interest in bringing these together, to the extent they have different purposes. To some degree, ours is really focused on self-report. It's a work in progress.
One other question, about issues of diversity. You said the PROMIS score has a mean of 50; I got that. The American population is extraordinarily diverse, everything from Mexican-American to European-American to African-American; I don't need to go on. Do that mean and standard deviation apply to all of those sub-populations? And if so, do you have norms for the sub-populations derived from it?
No, we just have the one set right now, partly because to put too many out at once, I think, would be confusing. I don't mean to be patronizing, but people do get confused easily when it's not their area. And also partly because we don't really have as much by group of people as we do right now by condition. So, for example, I know that every time I've looked at specific illness groups, the standard deviations are lower. It makes sense, because the standard deviation of the score is set on a broad spectrum of people, a very diverse, really heterogeneous general population. But then when you go
and you measure it in a specific group of patients, their standard deviations end up being lower. I suspect we might see the same thing with Mexican-Americans. I don't know.
I'd just like to say five... you don't have to go somewhere?
I just noticed that you have fewer items under the pediatric as opposed to adult. One that caught my attention is pain behavior, which you didn't have with the pediatric, and I don't know if that's a population size issue or it just doesn't apply.
Well, no, it applies. Two explanations, quickly. One of them is that all of the pediatric work that I showed you was done by one site, UNC, as an independent project. The way the first PROMIS was structured was that there was a network, with us as a coordinating center, and two-thirds of each site went to the network, but each site also had one-third to do its own study. So all of the pediatric activity was done at the University of North Carolina. So, in fairness to them, that they have any banks at all is pretty amazing. So that's one thing; that's why you don't see as much in peds. But with regard to pain behavior, that's the issue of ability, because it requires an ability to kind of reflect on yourself. The questions are like: when you're in pain, do you flinch or guard yourself, or get nervous when things are moving around you? The things that you do to guard yourself and protect yourself from the impact. So the insight question, I think, is a little more of an issue with kids, and we would tend to go to parents. So, as I mentioned, the parents play a big role in assessing kids. Pain behavior is important with kids, but we probably would have to go to parents.
Well, we don't have an orange for you, but we have an orange