After studying nucleic acid programmable protein arrays (NAPPA) and biomarkers, today Professor Josh LaBaer will talk to you about how to discover biomarkers in the context of breast cancer. Let me mention that Dr. LaBaer is also a medical doctor specialized in breast cancer. So he brings the perspective not only of a researcher but also of a clinician, to give you a really good understanding of how to use these technology platforms for a very relevant biological problem, breast cancer. This lecture will be more like a case history, where Dr. LaBaer will walk you through one of his approaches and show how to do immune profiling using protein arrays. Let me welcome Dr. Josh LaBaer for his lecture.
We've talked a little bit about the production of NAPPA, we've talked about the concept of NAPPA, we've talked about the concept of biomarkers. So now I want to talk a little bit about developing biomarkers specifically in the case of breast cancer, how we went about that, and maybe illustrate some of the things that one needs to think about in doing that. So I think I already mentioned to you this idea that in certain diseases individuals produce proteins that may induce an immune response, particularly a B cell immune response that leads to antibodies, and that those antibodies can act as biomarkers for disease. And so we've talked today a little bit about using biomarkers for diagnosis, for prognosis. There is also a use that is not so much about biomarkers: the fact that these are aberrant proteins in the disease might shed some light on the mechanisms of the disease. So the fact that the body responds to these as abnormal proteins might be telling us something that's important.
And so you might look into this to understand the disease itself. Hopefully you could use this to predict a treatment response in some cases, or even to help develop novel treatment regimens, and that becomes especially true if you believe in the possibility of using these antigens to vaccinate the patient against the cancer. Can you induce a stronger immune response to kill the cancer? So you remember, I think we talked about this but I'm going to reiterate a few things in the next few slides, that the classic way to measure an antibody response is this assay called the ELISA, which stands for enzyme-linked immunosorbent assay, but nobody says that, they just say ELISA. And the idea is you put your protein in the wells of a dish, you attach the protein to the bottom of the well, and you then add to each well the serum from a patient. If the patient has a strong response to that protein, then you'll get a strong signal like this. If the serum has no response, you'll get no signal there. And so each well tells you which patients had a response or not. The challenge, of course, as I mentioned before, is that it requires a lot of protein to coat these wells. Some proteins are not easy to make, and you're testing proteins by this method one protein at a time. And if you want to think like a modern systems biologist and you want to think at scale, you'd rather be testing thousands of proteins at a time. And so that was this, and this time I have the picture there, so it's much better. That's the idea of these arrays. You take an array that has a couple thousand proteins on it, you add patient serum, and then certain spots on the array light up. And looking at the size, looking at the intensity of that response might give you a clue as to how strongly that patient responded to that antigen. But you also get essentially a listing of all the different proteins that the patient's blood recognizes.
For this method to work, you need to know that this protein array platform is reliable. You need to know that when you run it on different days using the same sample, you're going to get the same answer. It's one thing to do research and just hope that it worked pretty well and get some responses that you can then follow up in other studies. In this case, you're going to base your clinical decision on whether or not this is a predictable marker, and that only works if the platform is reliable. Okay, so I think I went through this a little bit before. This is the classic way of making arrays, which is to purify the proteins and spot them on the surface. Right? So you take purified proteins and print them. It's a method that definitely works. People like Heng Zhu have done this for many years. They purify proteins in high throughput. It has limitations. The amount of protein that you end up printing varies over several logs, so by several orders of magnitude. Much of the protein tends to be on the lower end of that spectrum, so very small amounts of protein are added. And then a few proteins maybe have, you know, 10,000 times more than that. And so you end up with an array that might have this look, where you see some areas of strong signal, but then lots of areas where it's relatively black because there's almost no protein there. And of course, if you're doing a biomarker study and your intent is to look at the signal of specific spots, and there's very limited protein on those spots, then you won't know if the limited signal is because there was no protein there or because it's a weak interaction. And so that's one of the challenges. So the approach that we came up with is this nucleic acid programmable protein array, where we print the gene for the protein on the chip, and then we synthesize the protein in situ, capture it to the spots, and then display the protein after it's been captured.
And of course, we store the arrays in this state here, when they're unexpressed. So they're just DNA arrays and they're very stable. And then once we make the protein here, as you guys are doing in your experiments, within hours we probe it with another protein and test the fresh protein with possible interactors. This is the repository of clones that we've been making. And I think I showed you this collection here. This is the actual freezer that we have. And I went through all these various characteristics here. So this is the large collection. Of course, I also showed you this. This is taking a NAPPA array and testing it for total protein levels by looking at GST, because all the proteins on the array have a GST tag on their C terminus. So if we stain for GST, it tells us how much protein is present. And if we stain the array for one protein, we get that one protein. And I think we've been through all of these advantages of this approach. OK, so let's talk now a little bit about rigor. How do we know that what we're looking at is good? Well, the first thing we want to measure is how well the platform expresses all proteins, right? So we're going to take the array. We're going to test it for DNA binding, which tells us how evenly we printed. And we're going to test it for protein expression, which shows us that we're displaying all the different proteins and we're displaying them at a reasonable level. And this is what that looks like. It's summarized quickly, but it shows you membrane proteins, transcription factors, kinases, and large, medium, and small proteins. The green line on the bottom is the level of detection. So if you're above that dotted green line, it means you can detect it. That's about five standard deviations above background. And then the top green line, of course, is maximum detection. And what you see is that almost all the proteins fall between these two lines. And it's only a single log.
So no protein is present at more than 10 times the level of another. And the vast majority of proteins are within twofold of the mean. So they're all very close together. OK, right, I showed you that. So it's key, if you're going to do a clinical study, to know that there are no biases against specific protein types. If there were biases, that would be a problem when you try to draw conclusions. So these are some of the things that we've done with it, such as studying protein phosphorylation, and I'll probably cover that in my last talk. I talked a little bit before about mapping protein domains. We talked about the protein interaction studies. And now I'm going to focus a little bit more on biomarker discovery. All right, so this is what we're looking for in biomarker discovery. So here is the DNA array. Here is the array after making protein. And here it is after adding serum. Wherever you see a bright spot, especially a red spot, that means that the patient is making very strong antibodies to that protein. The color, by the way, is a false color. So these are not really red. What we do is we get a numerical readout of signal intensity. And then the software adjusts the image, and the color that it displays tells you which level of intensity that equals. If we were to use just grayscale intensity, then we would have pretty much only a 10-fold range. By using different colors, we can cover a larger range. And so typically blue-green is weak. Green is stronger. Yellow is stronger than that. Orange is stronger than that. And red is the strongest. One thing that you notice when you look at this is that we have very uniform levels of protein. So these are all roughly the same intensity. There are a couple of weak spots here, but for the most part, they're all very strongly expressed. And of course, we already mentioned that we've expressed them freshly.
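The expression check described above (a detection line about five standard deviations above background, and most proteins within twofold of the mean) can be sketched in a few lines of Python. This is only an illustrative sketch, not the lab's actual analysis: the function names, the dictionary keys, and all the numbers are made up for the example, and it assumes the background is estimated from a handful of empty or negative-control spots.

```python
from statistics import mean, stdev

def detection_threshold(background, n_sd=5.0):
    """Signal level n_sd standard deviations above the mean background."""
    return mean(background) + n_sd * stdev(background)

def expression_summary(signals, background):
    """Summarize how evenly the proteins on one array are displayed."""
    threshold = detection_threshold(background)
    # Keep only the spots that are above the detection line.
    detected = [s for s in signals if s > threshold]
    avg = mean(detected)
    # Spots whose signal is within twofold (up or down) of the mean.
    within_twofold = [s for s in detected if avg / 2 <= s <= avg * 2]
    return {
        "threshold": threshold,
        "fraction_detected": len(detected) / len(signals),
        "fraction_within_twofold": len(within_twofold) / len(detected),
        # Ratio of strongest to weakest detected spot; under 10 is "a single log".
        "max_over_min": max(detected) / min(detected),
    }

# Hypothetical anti-GST signals, for illustration only.
background = [100, 110, 90, 105, 95]
signals = [4000, 5200, 6100, 3500, 7000, 4800, 120]
summary = expression_summary(signals, background)
print(summary)
```

In this toy example one spot (120) falls below the detection line, and the remaining spots span only a twofold range, which is the kind of picture the lecture describes for a well-expressed array.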
And then the key here is to compare cases to controls and look for responses that are only present in the cases. So that's the first part. So now we've talked about the advantages of making fresh proteins, getting different classes of proteins well expressed, and having the proteins folded by natural chaperone proteins in a natural lysate. Now what we need to know is, if I do this experiment today and I do it next week, will I get the same answer? And so the way we went about testing that was we created a control sample. To make a control sample, we took several sera from different individuals and mixed them together to make a large volume of a mixed sample. It has a lot of responses because we mixed it. And because we now have a large volume, we can use a little bit of that sample every day we do the experiment. So every day we do the experiment, we take a little bit more of that control sample and we get the data for that sample. And then we can compare that sample to yesterday and to the day before yesterday. And we can ask, do we get the same signal every day? And so I recommend, if you're going to do a clinical study, that you build a control sample and run that sample every day you do the study, so you can say that on that day everything was working well. If we see that the control sample deviates significantly from what it looked like on the previous days, we'll throw out the data from that day. It's just not worth it. I'd rather have clean data. And so this is what that looks like. Here you see day one compared to day two. I think you can see that these two arrays look very similar in their intensity. But more importantly, if you plot all the signal intensities from all these spots on a dot plot, what you'll see is they all line up on the 45 degree line here. The signal intensity for a given spot on this day and on that day is the same, and so on and so forth. They line up pretty well along that line.
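The day-to-day comparison of the pooled control sample can be sketched as a correlation calculation over spot intensities. This is a minimal sketch under stated assumptions: the function names and the intensity values are hypothetical, Pearson correlation is assumed as the measure of agreement, and the 0.90 cutoff is borrowed from the serum-screening QC criterion mentioned elsewhere in this lecture rather than from any specific rule for discarding a day.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of spot intensities."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def pairwise_correlations(runs):
    """Correlation of the control sample for every pair of days."""
    return {
        (d1, d2): pearson(runs[d1], runs[d2])
        for d1, d2 in combinations(sorted(runs), 2)
    }

def days_to_discard(runs, reference_day, min_r=0.90):
    """Days whose control run disagrees with the reference day (assumed cutoff)."""
    ref = runs[reference_day]
    return [
        day for day in sorted(runs)
        if day != reference_day and pearson(ref, runs[day]) < min_r
    ]

# Hypothetical control-sample intensities for the same five spots on three days.
runs = {
    "day1": [1000, 5000, 200, 8000, 3000],
    "day2": [1100, 4900, 250, 7900, 3100],
    "day3": [3000, 200, 8000, 1000, 5000],  # deliberately scrambled "bad day"
}
correlations = pairwise_correlations(runs)
bad_days = days_to_discard(runs, "day1")
print(correlations)
print(bad_days)
```

The dictionary returned by `pairwise_correlations` holds the same numbers one would color in a day-versus-day heat map; spots lying on the 45 degree line correspond to a correlation close to 1.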
So that's just comparing two days. Now here are a couple of other examples, slide A to slide B. Here's another one. We also often will print two spots for the same protein on the same array. And the advantage of that is that we can compare within the array. Do we get similar signals? And again, you see very, very good correlation from spot to spot within an array. So we have good agreement in signal intensity within the slide and we have good correlation from slide to slide. Let's look at it for the entire experiment. And that's what's shown here. As I told you, we took that control sample and we ran it every day we did our experiment. And here's that same array on different days using that same sample. And then you can plot every day versus every other day using a heat map like this. So the heat map is set up so that if it were bone white, that would be a 90% correlation. And if it were solid red, that would be 100% correlation. So as you can see, no matter which two days we compare, nothing gets even close to white. Almost everything is medium pink all the way up to red. And so what that tells us is that no matter what day we compare to any other day, it's better than 96% correlation. So everything agrees really well. So that gives you confidence that when you use this platform and you measure samples, you're going to get the same answer every time. All right, so then one of the things that we want to do is to be able to validate the results we get from our discovery study. So we talked about this a little bit earlier today. In the discovery study, the observed difference that we talked about, typically in the early stages of a biomarker study, you're going to begin by looking at a large number of possible variables. You're going to look at as much of the proteome as you can get your hands on. So that leads you to this issue of potential overfitting.
But it gives you the advantage of having searched as much of the experimental space as possible to find any marker that's going to correlate with, that's going to be predictive of, your disease. So when we begin these studies, these days in my lab we're going to study maybe 15,000 different proteins. That's a pretty good list. Now, attached to that list is a price tag. Because by definition, even using NAPPA, to get 15,000 proteins on a slide we have to do 15,000 DNA mini-preps. And that's a lot of DNA mini-preps. It's certainly easier to do mini-preps than it is to make protein. But it's still a lot of mini-preps. And even if the mini-preps were two or three American dollars per mini-prep, to do that many mini-preps is nearly $50,000. Just to do that number. Not to mention the labor and all the time involved in preparing it. So let's say we've done our initial study, where we compared 50 cases to 50 controls. So that's 100 array sets of 15,000 each. When we get to the next stage, we don't necessarily want to test all 15,000 proteins. We've now eliminated most of those proteins. We now know that of that 15,000, 14,850 of them are probably not good markers. We can toss those out. So now I don't want to have to use my big chip that does all those proteins. I want to focus on 150 candidate markers in my next study. And so that's where this next platform becomes very helpful. Now we can come back to the ELISA. And so the ELISA says, OK, I can make individual proteins one at a time, much less expensively than making 15,000-protein arrays, right? But I have to make them at a larger scale. And so one of the things that we developed in our laboratory was a way to do ELISA without having to do a whole protein purification from bacteria. And we call it rapid ELISA. And the idea is that we follow a similar chemistry to the one that we use for NAPPA. So we put the plasmid into the well.
At the bottom of the well, we have a capture agent, an antibody that recognizes GST. We express the protein in the well. We capture the protein through its GST tag. We wash everything else away. And now we're left with a protein displayed in the well. So if you remember from my second talk, where I showed you those 96-well plates, the sort of early NAPPA, that's what we're doing here, but in a more routine way. And we can do this to the point where I think it costs less than $1 per well to do the assay. So in the big picture, that's pretty good, especially because we can set up one of these ELISAs for just about any protein within a couple of weeks. One reason we can do that is, by definition, if the protein was detected on a NAPPA array, then we know that we have the plasmid for that protein, because we had to print it on the array. And the very same plasmid that we printed on the array, we can use for the ELISA. So the system is immediately compatible with moving from the array to the ELISA, and we can very quickly set the ELISA up. So we add the expression plasmid to the well, we make the protein, we capture it, we wash away everything we don't need, and then we come in with serum to the well and look to see if we get a response. And now I want to just show you that the ELISA is also reproducible. And again, you don't want to trust any clinical studies where you can't show reproducibility. So here what we're doing is basically looking at a variety of different antigens. These are two different antigens, comparing them on two different days using the same assay, using all of these different samples, probably 96 different samples here. And again, you see that from day to day, you get the same answer when you use the same platform. This is within a day, on two identical plates, and this is between two different days.
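The cost figures quoted in the lecture can be checked with simple arithmetic. The sketch below is a back-of-the-envelope illustration only: the function names are hypothetical, the per-prep and per-well prices are the rough figures mentioned in the talk (two to three dollars per mini-prep, under one dollar per rapid ELISA well), and the number of sera in the validation step is an assumption, since the lecture does not state it.

```python
N_PROTEINS = 15_000      # proteins on the discovery array (from the lecture)
N_CANDIDATES = 150       # candidate markers carried forward (from the lecture)
N_VALIDATION_SERA = 100  # ASSUMPTION: validation cohort size is not given

def miniprep_cost(n_clones, cost_per_prep):
    """DNA preparation cost for printing one set of arrays."""
    return n_clones * cost_per_prep

def rapid_elisa_cost(n_antigens, n_sera, cost_per_well=1.0, replicates=1):
    """Upper-bound cost of testing every serum against every antigen."""
    return n_antigens * n_sera * replicates * cost_per_well

low = miniprep_cost(N_PROTEINS, 2.0)    # 30,000 dollars
high = miniprep_cost(N_PROTEINS, 3.0)   # 45,000 dollars, i.e. "nearly $50,000"
eliminated = N_PROTEINS - N_CANDIDATES  # 14,850 proteins dropped
elisa = rapid_elisa_cost(N_CANDIDATES, N_VALIDATION_SERA)

print(f"Mini-preps: ${low:,.0f} to ${high:,.0f}")
print(f"Proteins eliminated after discovery: {eliminated:,}")
print(f"Rapid ELISA upper bound (assumed cohort): ${elisa:,.0f}")
```

The mini-prep figure covers DNA preparation only; as the lecture notes, labor and preparation time come on top of it.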
And then this is just showing you some examples, and showing you that the correlations are typically close to 1.0, which means that they align very nicely. We've also spent a little bit of time looking at our detection limit. That's what's shown here, using a purified amount of anti-p53 antibody. The assay is linear from 30 femtograms to 25 picograms per mL. So that gives you some sense of the range where we can operate very sensitively. All right, so then once you go through all of this, one of the things that you have to do when you develop a platform to start studying these things is to start thinking about, what are my quality control checks? How am I going to make sure that as I do my experiments, everything is working the way it should be working? And in particular, if it's not working that way, then you have to jettison that step and go back and fix it before you move on. And so a good biomarker study involves a lot of QC. And so we have criteria at every step in our flow path that we follow to decide whether to move forward. So I already mentioned to you before that we do clone tracking electronically. We do end-to-end DNA sequencing, so we make sure that the gene is correct by sequencing it from one end to the other. And I think I mentioned that it has to have fewer than two amino acid differences. So we will accept up to one amino acid substitution before we move on. When we make arrays, we do a number of things. First of all, we make sure that there is greater than 300 nanograms per microliter of DNA for 95% of the genes that we're going to print. So before we even print, we look at the plate that has the DNA in it and we make sure that we have an adequate amount of DNA for every protein. If we don't, we go back and we fix the ones that are broken or that are too low.
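The two pre-printing criteria just described (at most one amino acid substitution per clone, and more than 300 ng/µL of DNA for at least 95% of the genes) lend themselves to simple automated gates. The following is a hedged sketch, not the lab's actual QC software: the function names, well labels, and sequences are hypothetical, and the sequence check assumes the expected and observed protein sequences have the same length, so that only substitutions are counted.

```python
def sequence_ok(expected_protein, observed_protein, max_substitutions=1):
    """Accept a clone with at most max_substitutions amino acid changes.

    ASSUMPTION: a length mismatch (insertion or deletion) is rejected outright.
    """
    if len(expected_protein) != len(observed_protein):
        return False
    mismatches = sum(a != b for a, b in zip(expected_protein, observed_protein))
    return mismatches <= max_substitutions

def dna_plate_check(conc_ng_per_ul, min_conc=300.0, min_fraction=0.95):
    """Check that enough wells on a DNA plate exceed the concentration cutoff."""
    # Wells at or below the cutoff need to be re-prepped before printing.
    wells_to_fix = sorted(w for w, c in conc_ng_per_ul.items() if c <= min_conc)
    n_ok = len(conc_ng_per_ul) - len(wells_to_fix)
    fraction_ok = n_ok / len(conc_ng_per_ul)
    return {
        "fraction_ok": fraction_ok,
        "passes": fraction_ok >= min_fraction,
        "wells_to_fix": wells_to_fix,
    }

# Hypothetical 20-well plate with one low-yield prep.
plate = {f"A{i:02d}": 450.0 for i in range(1, 21)}
plate["A03"] = 120.0
result = dna_plate_check(plate)
print(result)
print(sequence_ok("MKTAY", "MKTVY"))  # one substitution
print(sequence_ok("MKTAY", "MRTVY"))  # two substitutions
```

Even when the plate passes overall, the `wells_to_fix` list identifies the individual preps to redo, matching the lecture's point that low or broken wells are fixed before printing.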
Then when we print, we express the proteins and stain with anti-GST antibody to make sure that there's protein present, and we ensure that 90% of our spots have more than two million relative fluorescence units. So this tells you that we have good protein levels. And then we look at two different arrays from a printing batch and we assure ourselves that they agree with each other with a correlation coefficient better than 95%. So we're checking to make sure that we're adding enough DNA, that we're making enough protein, and that the features on the arrays agree with each other from one array to another. So we do that for every experiment we do. Okay, and then when we do serum screening, we run the common control sample. I mentioned that to you already, and we show that from day to day, the correlation is better than 90%. If we're doing a rapid ELISA, then we make sure that we have at least that amount of protein when we make the proteins, and then we make sure that the assay variability has a CV of less than 15%. And then these other things are basic routines. One of the things I will caution you about is this subtle point down here that a lot of people forget to do, and it can really come back and bite you if you forget to do it, which is to randomize your samples when you do your assays. I can't tell you how many students I've had come to the lab who will do all of the cases today and all of the controls tomorrow. They think, all right, I did it right. And it's like, no, because they will get big differences. And they say, I found something really cool. And it turns out the difference is because they did all of the cases on one day and all the controls on the other day. And there are subtle variations on this. Maybe you do everything the same day, but you load the cases first and then you load the controls second.
And so the first ones that run through the machine are the cases and then the next ones that run through the machine are the controls. So you have to make a concerted effort to make sure that you mix up everything, that there's an even distribution of cases and controls in every step you do, and that there is no order bias or plate bias or day bias to the cases or controls, or else you will end up chasing your tail. You'll think you found something really good, and then you'll discover a year later, after investing all of your time in it, that it was an error because you just didn't load them in the right order. So don't miss that. I had a postdoc who chased his tail for four months thinking that he had found something really cool. And when we got down to it, it was because he ran the case plate first and then he ran the control plate second. And the control plate sat for 10 minutes while the case plate was getting read. And that was the difference. You can't do that.
So I'm sure you found this lecture very interesting. You have seen how autoantibody responses in patients can be measured using a protein microarray-based platform, especially NAPPA technology, which could be useful for early detection of breast cancer. The intensity of the signals shows how strongly a patient responds to a particular antigen. In a single experiment, you will also know how many antigens a particular patient recognizes. You now know the importance of testing reproducibility in microarray experiments within slides, across different batches of slides, and across day-to-day variations in the assay. All of these performance measures have to be recorded and compared to test the reproducibility of your data and your experiments.
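The randomization advice from the lecture (no day bias, plate bias, or load-order bias between cases and controls) can be sketched as a balanced, shuffled plate layout. This is one possible scheme under stated assumptions, not the procedure used in the lab: the function name and sample labels are hypothetical, and "plate" here stands for whatever batch unit applies (a plate, a day, or a run).

```python
import random

def randomize_layout(cases, controls, n_plates, seed=None):
    """Spread cases and controls evenly over plates, then shuffle load order."""
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    cases, controls = list(cases), list(controls)
    rng.shuffle(cases)
    rng.shuffle(controls)

    plates = [[] for _ in range(n_plates)]
    # Deal each group round-robin so every plate gets a near-equal share.
    for i, sample in enumerate(cases):
        plates[i % n_plates].append((sample, "case"))
    for i, sample in enumerate(controls):
        plates[i % n_plates].append((sample, "control"))

    # Shuffle within each plate so neither group is always loaded or read first.
    for plate in plates:
        rng.shuffle(plate)
    return plates

# Hypothetical study: 50 cases and 50 controls over 5 plates.
cases = [f"case_{i:02d}" for i in range(50)]
controls = [f"control_{i:02d}" for i in range(50)]
layout = randomize_layout(cases, controls, n_plates=5, seed=1)

for number, plate in enumerate(layout, start=1):
    n_cases = sum(1 for _, group in plate if group == "case")
    print(f"plate {number}: {n_cases} cases, {len(plate) - n_cases} controls")
```

Recording the seed alongside the data keeps the layout auditable, which fits the lecture's emphasis on documenting quality control steps.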
The protein microarray platform is definitely a very robust technology, but your assay has to be reliable, and you need to document the quality control checks and the data to give confidence to the reviewers and to the clinicians who want to take your lead forward for patient care or actual biological applications. You studied the challenges one would encounter while developing a biomarker and how to find solutions to these problems. Finally, today you learned about rapid ELISA, a routine technique developed to measure a patient's response to any antigen at very low cost, usually less than a dollar per well. This topic and lecture by Dr. Josh LaBaer will be continued in the next lecture. Thank you.