We have with us a distinguished faculty member, Dr. Josh LaBaer. He is another pioneer in the field of protein microarrays, especially NAPPA technology (nucleic acid programmable protein arrays). Dr. LaBaer is also a key leader of biomarker discovery programs, and he leads one of the programs operated by the Early Detection Research Network (EDRN) in the US for biomarker discovery. So he brings a lot of expertise and experience, both in technology development and in how it can be applied to clinical problems, especially biomarker discovery. Imagine that we are working, let us say, in the Indian context. I am based in Mumbai, and we have samples coming from Maharashtra, from different hospitals: Tata Memorial, KEM, Hinduja, and various local hospitals here. Now our population is very restricted. We are talking only about people coming to these hospitals, and we are trying to look at what kind of proteins are changed in the context of a given disease. Let us imagine that a given protein looks pretty interesting, and it looks very uniform in a given disease context in this population base. But can we claim that protein as a biomarker? I think that may not be the right claim. So how do you claim that you have a good biomarker? Of course, a biomarker should be generic, it should be global, and it should really work in a variety of clinical settings. So validation is really required; that is the key to turning any potential candidate into the right biomarker for clinical and therapeutic interventions. Today Dr. Josh LaBaer is going to talk to you about some of the details of biomarkers and validation strategies.
Okay, so the first thing that most of us as scientists would do when we do a biomarker is we would observe a difference.
So you remember those two graphs I showed you. That is the first step: you take a bunch of samples, cancer samples and healthy samples, or early stage and late stage, or whatever your comparison is, you measure something, and you see that the value of X is much bigger here than here, and there's a difference. You say, wow, okay. And the first thing you have to do is say, "I don't have a biomarker yet," because you don't have a biomarker yet. But you do have an observed difference, and the type of statistics you might do are simple statistics. You might do a t-test, you might do a Wilcoxon rank test, something simple to confirm that those two values are different. But that's not a marker yet. So now how do you go about getting a marker? The next step is you need to say, okay, I think I have a biomarker, now I need to do a larger-scale comparison, I have to look at more people. And so we would call that a candidate biomarker, and we'll do a comparison between properly matched cases and controls. So how do you match the controls to the cases, though? Age, right. Gender, right. Those are the two big ones, I would say. Maybe, as you pointed out, the population, right? So you're not going to take a bunch of people with HIV in Africa and compare them to a bunch of Americans who have no HIV. That would not be a fair comparison; those are two very different populations.
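[Editor's illustration] The "simple statistics" step for an observed difference can be sketched as follows. This is a minimal sketch, not the speaker's method: it assumes Python with only the standard library, implements a two-sided Wilcoxon rank-sum test using a normal approximation, and uses a hypothetical function name (`rank_sum_test`) and made-up measurement values. In practice a statistics package would normally be used.

```python
from statistics import NormalDist


def rank_sum_test(cases, controls):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Tied values receive average ranks. No tie correction is applied to the
    variance, so the p-value is only approximate for small or heavily tied
    samples.
    """
    n1, n2 = len(cases), len(controls)
    # Pool the values, remembering the group: 0 = case, 1 = control.
    pooled = sorted((v, g) for g, values in enumerate((cases, controls)) for v in values)

    # Assign 1-based ranks, averaging over runs of tied values.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    rank_sum_cases = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_cases = rank_sum_cases - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u_cases - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_cases, z, p


# Hypothetical measurements of one analyte (arbitrary units), for illustration only.
cases = [5.1, 6.3, 5.8, 7.0, 6.1, 6.6, 5.9, 6.8]
controls = [4.2, 4.9, 5.0, 4.4, 5.3, 4.7, 4.1, 5.2]

u_stat, z_score, p_value = rank_sum_test(cases, controls)
print(f"U = {u_stat}, z = {z_score:.2f}, p = {p_value:.4f}")
```

A small p-value here only confirms an observed difference in this one sample set. As the talk stresses, that is not yet a biomarker.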
So you need to try to stay within the same communities, same age group, same gender group. Ideally, the best matching of cases and controls would be the same group of people that go to the same hospital, except that this group has the disease and this group doesn't, so they're coming from the same population. That's what we call a matched population. Sometimes there are specifics, like in the case of a cancer: in a lung cancer study you would want to make sure that the cases and the controls had similar smoking histories, right? Because you don't want to be finding a marker that predicts smoking, you want to find a marker that predicts cancer. So you have to consider your cases and controls carefully and match them. And then the first thing you do is determine how many cases and controls you need to study. And how do you do that? How do you figure out how many to study? Power analysis, yeah. You need to get a statistician to help you do what's called a power analysis. A power analysis is a statistical, mathematical study that takes into account how big a difference in the value you expect to see, how prevalent the disease is in the population, and how narrow the variation is in the measurement that you're making: does it vary a lot, does it vary a little? It takes a lot of these things into account. They do some mathematics and they'll say, you know what, for the difference you're trying to achieve, you need to study this many people, cases and controls. Typically, the way they will phrase it is: if you want to detect a difference with 80 percent certainty, this is how many you have to study. And so you have to say at what level you're willing to say, "I might miss it." So you'll say, I'll do this study if I can get it 85 percent of the time. So that's what a power analysis is. If you see a study where people are doing biomarkers and they didn't do a power analysis, they didn't
do it right. And I can tell you that 85 percent, 90 percent of what comes to my desk as an editor, they never did a power analysis, right? And so that's a real problem. All right, then you're going to eventually measure sensitivity and specificity, and we're going to come back to that. We're also going to talk a little bit about receiver operating characteristic curve analysis and false discovery rate compensation. All of these types of mathematics will come into play when you do this first candidate biomarker study. Okay, so you did your study and it looks promising. So you get a marker, and it has, let's just say, 85 percent sensitivity at 95 percent specificity. So are you done? Can you publish? No. What do you have to do next? So you did the power analysis, you compared the populations, you found a biomarker, it has 85 percent sensitivity. What do you have to do? Well, you certainly could look for other people who have done the same work, but the simple answer is you have to repeat the study, because you're going to get markers, right? Typically, many of us are going to be studying thousands of variables. On my array, on the NAPPA arrays, we have now maybe 15,000 proteins, right? The p-value that people often use is 0.05, right, five percent. So the chance of finding that value by chance alone is five percent. That's what it means when you set a p-value of 0.05. So take five percent of 15,000. How often am I going to find a biomarker by chance alone? Quite a bit, right? Just by chance alone, when you study a lot of variables, you're going to get a marker that works. All right, so the first thing you have to do when you get markers that look promising is test them again on another population, and that's what's here. So you repeat the study, you verify the marker, and it's important in this case to use a completely different set of patients and controls, and that's
important. Why? Why is it important when you do the second study to use different people? "Prevent redundancy." Okay, maybe expand that a little bit. So you've already shown that the marker works for that population. For whatever reason, that marker, let's assume you did the study carefully, separates cases and controls. The question you're asking in this study is: is that a general fact, or did that just happen by random chance for that one population? So by doing it in a different population, you are verifying that in fact it really is for the disease and not just by chance alone. So there's a famous story in proteomics. Some of you may remember this, but at the beginning of this century there was a paper published in the Lancet. It was a proteomics paper, and they developed a blood test for ovarian cancer, and it was based on mass spectrometry. They claimed that they had a hundred percent sensitivity and ninety-nine percent specificity. Astonishing numbers. Anybody who knows anything about biomarkers looked at that and said, bullshit, that's not right, there's no way that you could get a hundred percent sensitivity. Biology is not that predictive. Well, they got a lot of press, whole programs were started at the NIH around it, there was a huge amount of excitement. It was a big deal that proteomics had solved the detection of ovarian cancer. And it all failed. It was a huge, miserable failure, and it set back proteomics by a decade, because people stopped funding us, because they said that we make claims that we can't support. And one of the fundamental mistakes that they made in that study was in their validation step: they used the same control group. They did use different cases, but they used the same controls, and so they didn't follow the rule that this group has to be different from that group. And consequently, for whatever reason, that control group had a defined pattern that was definable as control, and that's what allowed their biomarker to work, but it was just
random chance. It had nothing to do with ovarian cancer, and so that was a huge error. So you have to be careful about that. If you get to this point and your marker still holds up, now I think you're ready to publish. At this point you can say, I've got a verified biomarker, this is worth telling the world about, and then you can send it out for review. I will tell you, as an editor for JPR, if I don't see this, I don't even review it. I send it right back to the author. If they don't do a validation study, they're out. I won't even look at it. All right, so then... I'm sorry? It could be, as long as they're different people. They can be from the same hospital, but they have to be different controls and they have to be different cases. There can be no overlap in the people. Same disease, but different people, yeah. So for example, let's say you have 200 people with ovarian cancer at your hospital, and you found 200 women that are good controls. You could split them into 100 cases and 100 controls and do your first study, and take the second hundred and the second hundred and do your verification study. That'd be a perfectly good design. Okay, so after you get your verification, you still have a long way to go to get a validated marker. Now you have to do what's called a validated biomarker study. These sorts of studies are typically a level past most academic labs; most of us can't do these studies. They have to be done very formally. These studies should be done under what's called either CLIA or Good Laboratory Practice certification. They should be large studies. They should be blinded studies. Blinded means that the scientists who are measuring the values do not know who has the disease and who doesn't, right? All that is hidden in the documents. They have to make their predictions based on what they set back here. Typically this should be a prospective study. What do I mean by prospective study? Right, so what does
that mean? That's right. So you're not looking at samples that you collected last year. You're collecting samples in the same manner that you would if you were treating patients. You collect the sample, you test it, and you see whether it predicted properly or not. And then you need to do these in more than one location. If you get this done, what that tells you is your marker is truly predictive. It really does predict the disease. That's great; that is already something to be very proud of. Now you have a marker that predicts disease. Are you done? You can tell you're not done because there's still space left on the slide, right? So there's still more to do. Just because the marker can predict the disease, you still don't know if using that marker will be of clinical benefit. And so the next step you have to do is what's called a utility study. You have to ask: if I use this marker on a population, will it tell me something that reduces mortality or morbidity in that population because I detected the disease early? Okay, and so here what you do is the same thing as here, a randomized, blinded, prospective study, but in this case you're doing it as an intended use. You're measuring it, you're predicting an outcome, you're telling the patient, and you're acting on the prediction. And you ask the question: did the people with whom I used the marker have a better outcome than the people with whom I did not? Did the marker save lives? Did the marker reduce disease? And this is where a lot of markers fail. So some of you may be familiar with this marker called CA 125, which is a very good marker for ovarian cancer. There is no doubt that CA 125 levels correlate with ovarian cancer. That marker is used all the time as a disease progression marker to monitor ovarian cancer. It's quite specific. The problem is, if you use CA 125 to detect cancer, you don't see any better outcomes, and the problem appears to be
that by the time the CA 125 levels are measurable, it's already too late. It doesn't come up early enough. And so it's a predictive marker: it succeeds here and it fails here. If this works here, then you get an approved marker, and now you're in good shape. I can tell you that this whole process is very long, very expensive, and has only been successfully done a handful of times. Okay, so what are the skill sets that you need to accomplish all these tasks? That's what's shown here, and this is just to emphasize that to get a good marker you need a multidisciplinary team. There's no way around that. Early on, for these first sorts of studies, you need people with molecular and cellular biology experience, throughout the study but especially at the beginning. You'll need genomics and informatics. As you go further into the study, you need good statistics. You need to develop strong, robust markers that you can run in the clinic. You need good analytical chemistry. Obviously you need good clinical understanding and an understanding of epidemiology. And then when to use these markers depends on looking at health policy. So at different stages of the game you're going to need different experts, but throughout the whole process you're going to need a lot of experts. Okay, so... yeah? Well, how you do it varies a little bit. The way you do the power analysis depends on the study and what goes into it. So for example, oftentimes when you're at this phase you might be doing protein arrays or next-gen sequencing or some kind of large-scale, omics-scale study where the number of variables is very large, and the type of power analysis you have to do with large numbers of variables is different than if you're doing a power analysis for just one marker that you have as a predictor. In this case you may have to do modeling statistics to get a good predictor; you might have a simple formula you could use over
here, but the idea is the same; it's just the execution that is different. Okay, so where does this go wrong? This can go wrong in a lot of places, and it does all the time. So the first mistake is you discover some kind of a difference, but without defining a clinical need. If you haven't defined the clinical need, your difference may be meaningless or useless. People often do inappropriate statistics on these candidate biomarkers. They'll look for p-values instead of doing proper biomarker statistics. And they do what's called an underpowered study. What's an underpowered study? "The sample size is too small." Yeah, exactly, the sample size is too small, and there are two consequences to that. The first consequence, the most common and historical one, is that if your sample size is too small, then you run the risk of missing a good marker because you didn't study enough people; you won't have enough of a chance to find the marker. In the modern era the problem is a little bit different. These days we don't study a few variables, we study tens of thousands of variables, and so in the modern era an underpowered study usually means that you're going to find differences that are meaningless. You're going to find, by random chance, that this gene is different between the cases and controls, and it's not really related to the cancer at all, because of what's called overfitting. Overfitting is statistically finding something that isn't really real, and it's a huge problem in our field. I can pretty much guarantee you that you'll see a paper published, and typically they're published in the best journals, Science, Cell, Nature, next week or a month from now, on a marker that has a hundred percent sensitivity and 99 percent specificity, and if you look carefully they probably overfit, because no markers are ever that good. Okay, so failure to account for overfitting: I just said it, you heard it here first.
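[Editor's illustration] The "thousands of variables" problem can be sketched numerically. This is a minimal sketch under stated assumptions, not the speaker's analysis: it assumes Python with only the standard library, models 15,000 proteins with no real difference by drawing their p-values uniformly at random, and compares a naive 0.05 cutoff with a Bonferroni cutoff and the Benjamini-Hochberg false discovery rate procedure. The function name `benjamini_hochberg` and the random seed are choices made for this illustration.

```python
import random


def benjamini_hochberg(p_values, fdr=0.05):
    """Return the sorted indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Remember the largest rank whose p-value falls under its threshold.
        if p_values[idx] <= rank * fdr / m:
            cutoff = rank
    return sorted(order[:cutoff])


rng = random.Random(0)   # fixed seed so the simulation is repeatable
n_proteins = 15_000      # array size quoted in the talk
alpha = 0.05

# Under the null hypothesis (no protein truly differs), p-values are uniform on [0, 1].
null_p = [rng.random() for _ in range(n_proteins)]

expected_false_hits = alpha * n_proteins                       # five percent of 15,000
naive_hits = sum(p <= alpha for p in null_p)                   # "significant" by chance alone
bonferroni_hits = sum(p <= alpha / n_proteins for p in null_p)
bh_hits = len(benjamini_hochberg(null_p, fdr=alpha))

print(f"expected by chance: {expected_false_hits:.0f}")
print(f"naive p <= 0.05:    {naive_hits}")
print(f"Bonferroni:         {bonferroni_hits}")
print(f"Benjamini-Hochberg: {bh_hits}")
```

Every hit in this simulation is a false positive by construction. Corrections of this kind reduce chance findings, but they do not replace testing the surviving markers on a separate set of cases and controls.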
Using inappropriate samples, poorly selected controls: people don't carefully match the controls. So for example, I've seen studies where people used a bunch of cases from one location, and then they ordered their controls from a company, and then they compared the two and said, "I found a marker, I can tell the difference." Well, they can tell the difference between samples that came from the company and samples that came from this hospital. They didn't necessarily find the disease. In fact, if you know Paul Tempst, he's a proteomics researcher at Sloan Kettering. Paul did a study where he was looking at prostate cancer. He was trying to replicate the kind of approach in that ovarian cancer study I mentioned earlier that totally failed, but he was trying to do it right. And he found a marker that was remarkably good at predicting prostate cancer. But credit to Paul Tempst, because he looked a little harder, and what he realized was that the prostate cancer blood samples were all drawn from men who were about to get biopsies. They were all in the hospital and they were going to get biopsies, and the samples that came from the controls all came from the outpatient clinic. And it turned out that the two locations used a different manufacturer of the blood tubes, so the blood tube type was a little bit different. When he did all the analysis, what it turned out was he had found a really good biomarker of blood tube type, and nothing to do with the disease at all. It had to do with the types of tubes that the samples came in. So you have to be very careful. And then people often fail to develop a good, robust, reproducible assay. If you're going to do the kind of late-stage validation here, you need to have a good assay for that. Many people forget to do this study here, or they don't do this study here. And so that sort of summarizes some of the major problems that you can encounter. OK, so lots of
challenges. Finding a good, clinically useful biomarker is very rare these days. In the US, on average, maybe one to two biomarkers a year will succeed in making it through the FDA, so this is very, very challenging. And that's combining all the work of academia and industry; that's all we get. I would argue that the biggest challenge is the biology itself. It's very hard to find a molecule that specifically can predict the outcome of a patient, so you have to look extremely hard to find it. But journals don't publish negative results, and so oftentimes people don't realize when markers are bad, and so they end up, you know, only publishing bad biomarkers. No one likes to do validation. In fact, at the NIH in the US it's very hard to get funded to do a validation study. So let's say you do a good biomarker, you have all the best intentions: you do the observed difference, you do the initial study, and then you do the verification study, and you say, okay, now I want to validate this marker. The response you'll get on your grant application almost always is, "Well, you've already studied this marker, why do you want to study it again?" And you say, "Because I want to validate it." They're like, "No, no, you already studied it, you're done." It's like, no, I'm not done. So that's exactly one of the problems that we face all the time. All right, so let me move on. Nonetheless, the public really expects to see these results, and that's partly because there are thousands of papers that report good biomarkers, and there's usually only one good one per year. And so everybody thinks that it's easy, but in fact it's really hard. So that's kind of a take-home message.
All right, so just to conclude, Dr.
Josh LaBaer has talked to you about the basic considerations: how you can be confident that a lead you have identified as a potential protein candidate can be termed a biomarker, and what type of tests you should do, both from the statistics point of view and with the right clinical assays in the clinics and labs, to ensure that the candidate you have identified is actually a potential biomarker. These basics are very important for you, whether you are a student or a researcher planning to be involved in biomarker-based programs. I think your strategy, thinking about the power calculation, the statistics, the sensitivity and specificity of the biomarkers, as well as your plan to validate the candidates, becomes very crucial. I hope these basics give you new insights into how to use this understanding and knowledge for actual clinical applications. Thank you very much.
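[Editor's illustration] Two of the calculations named in the recap, the power calculation and sensitivity and specificity, can be sketched in a few lines. This is a rough sketch under stated assumptions, not a substitute for a statistician: it assumes Python with only the standard library, a single marker compared between two equally sized groups, and the textbook normal-approximation formula for sample size. It ignores disease prevalence and the many-variable setting discussed in the talk. The function names and the example counts are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate number of cases (and of controls) for comparing two group means.

    effect_size is the expected difference divided by the standard deviation
    of the measurement. Uses the normal approximation
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2 for a two-sided test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = fraction of cases called positive; specificity = fraction of controls called negative."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical planning numbers: a difference of half a standard deviation.
n_80 = samples_per_group(effect_size=0.5, power=0.80)
n_85 = samples_per_group(effect_size=0.5, power=0.85)

# Hypothetical result: 85 of 100 cases test positive, 95 of 100 controls test negative.
sens, spec = sensitivity_specificity(tp=85, fn=15, tn=95, fp=5)

print(f"per group at 80% power: {n_80}")
print(f"per group at 85% power: {n_85}")
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Asking for higher power, or expecting a smaller difference, raises the number of cases and controls needed, which is why the calculation belongs before sample collection and not after.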