Dean Chang is the John A. Edwardson Dean of the College of Engineering. His research received a 2013 Alan T. Waterman Award. His online courses and textbooks reached over 250,000 students, a small number. And he co-founded several startup companies and a nonprofit consortium. So, Dean Chang. Let me see if I can turn this off also. Great to see all of you here. And welcome to this installment of the Purdue Engineering Distinguished Lecture Series. As we aspire to the pinnacle of excellence at scale, we're delighted and honored to invite a number of just incredibly distinguished colleagues from around the world, mostly from academia, some from industry. And today we are delighted and honored to welcome Dr. Arup Chakraborty from MIT, who is the Haslam Professor of Chemical Engineering there, and also the founding director of the Institute for Medical Engineering and Science at MIT. Today's distinguished lecture I think is very special in three ways. One is the very subject matter. How do we hit HIV where it hurts? And secondly is just the incredible scholarship of the lecturer today. Not only is Arup a member of all three national academies, National Academy of Engineering, National Academy of Sciences, National Academy of Medicine, and also the American Academy of Arts and Sciences, but he was also elected into the NAE and NAS for two completely different sets of contributions. And that is just unheard of, a reinvention of one's own research career, a true inspiration to all the Boilermaker engineers here. Now thirdly is also because of Arup's leadership, both at Berkeley, where he was a distinguished professor and chair of chemical engineering, and then since then at MIT, where he founded this Institute for Medical Engineering and Science. And as Purdue Engineering starts the deep partnership with IU School of Medicine, and we had this partnership announced just about two weeks ago across research, teaching, students, faculty, and translation. 
We have a lot to learn from our friends and colleagues over in Cambridge, Massachusetts, with leaders such as Arup. A warm welcome. Thank you so much Arup. Thank you very much. Well thank you very very much for the very kind words of introduction. Probably best after that type of introduction not to give a talk because things can only go downhill after this, but nonetheless I'm delighted and privileged to be here and so I thank you also for the invitation. So as was mentioned in passing, the preponderance of my work over the last 20 years has been focused in trying to understand how the immune system functions and then trying to harness that knowledge toward practical ends. And by way of introduction let me say that we have an adaptive immune system which is remarkable in that it enables us to mount pathogen-specific responses against a diverse and evolving world of microbes including your ability to mount a pathogen-specific response against pathogens that may not have evolved when you were born. To roughly see how this works, consider infection with a virus. When a virus infects a host cell, it hijacks its transcriptional machinery enabling the synthesis of new viral proteins and the assembly of new virus particles that can go on to infect other cells. Important players in adaptive immunity are B cells. Most B cells have a receptor on their surface that's distinct from the receptor on another B cell. So if the receptor on this particular B cell can bind sufficiently strongly to the proteins that make up the spike of this particular virus then biochemical reactions cause it to get activated. It starts to proliferate, that is make many copies of itself and then a Darwinian evolutionary process occurs in a very short time that I shall describe a bit later that ultimately leads to secretion of soluble and even more high affinity forms of this receptor and that is what is called an antibody. 
These antibodies can then bind specifically and strongly to the spikes on the surface of this particular virus and then dispose of it in a number of different ways that I shall not describe today. But antibodies principally act on free virus particles in your blood or extracellular spaces. Now some of these viral proteins are chopped up into short peptides about 10 amino acids long and displayed on the surface of these infected cells in complex with human proteins called MHC proteins. We each have 6 to 12 types of MHC proteins but there are thousands of variants in the human population and different MHC proteins tend to bind different types of peptides. Now T cells are also important players in adaptive immunity. Again, most T cells have a receptor on their surface that's distinct from that of another T cell. If the receptor on this particular T cell can bind sufficiently strongly to this viral peptide, it gets activated, it starts to proliferate and then it can coordinate an immune response in a number of different ways. For example, certain kinds of T cells called CTLs are such that if an activated CTL sees another cell displaying the same peptide that originally activated it, it can secrete products that will kill this infected cell and the virus particles that it harbors. Now after you have cleared an infection, most of the B cells and T cells that proliferated in response to this particular infection die but a few remain as so-called memory cells which can mount robust and rapid responses upon reinfection with the same pathogen. This pathogen-specific immunological memory is the basis for vaccination. With a vaccine, you try to induce memory B cells, T cells and antibodies that are specific for the bug against which you wish to protect the host. 
Now, a barrier to the quest for mechanistic understanding of these processes is that they involve cooperative dynamic events with many participating components that have to act collectively in order for any given phenomenon to emerge. Moreover, these cooperative processes span a spectrum of scales from many molecules in a single cell to many cells in tissues to the scale of the entire organism and indeed populations of organisms that are evolving to try and evade our immune system. So this hierarchy of cooperative processes with feedback between the scales can often make it difficult to intuit underlying mechanism from experimental observations. And further confounding intuition is that many of these processes are inherently stochastic in character. So over the last two decades, what we've been trying to do is help confront these challenges by bringing together approaches that have traditionally been stovepiped in different disciplines. In particular, what we have tried to do is bring together theory and computation that's rooted in the physical and engineering sciences with a diversity of different biological experiments and clinical work. Now, a conceptual framework that we use a lot in our work is statistical mechanics. Its central goal is to understand how fluctuating, interacting microscopic objects can conspire together so that coherent mesoscopic and macroscopic behavior emerges. So given some of the challenges in immunology that I just described, perhaps it is not so surprising that bringing these apparently disparate disciplines together has been fruitful. Now, I should point out that the experiments and the clinical work that can test predictions that emerge from the theoretical and computational work have often been carried out in collaboration with diverse immunologists around the world. 
And indeed, with the key assistance of these collaborators, indeed my immunology teachers over the years, we've had some luck in shedding light on a diversity of different questions that span the scale from molecules to human disease, and the latter will be my focus today. So except in certain parts of the west coast of this country, it is not a matter of debate that vaccination has saved more lives than any other medical procedure. But today, some pathogens have evolved that defy successful vaccination using the empirical paradigms that were pioneered by Pasteur and Jenner over two centuries ago. HIV is a prominent example. What is required in order to eliminate such scourges from the planet is rational rather than empirical design of vaccines based on a mechanistic understanding of the pertinent virology and immunology. Now, a major barrier to the quest for a vaccine against HIV has been the high mutability of the virus. So in these phylogenetic trees, the number of ends corresponds to the diversity of circulating strains. And you can see in this diagram, which is to scale, that the diversity of the flu virus in the entire world in a particular year is comparable to the diversity of HIV strains in a single infected individual and completely dwarfed by the diversity of strains circulating in just one region in Africa. So this high mutability presents a challenge for vaccination because HIV can mutate to evade vaccine-induced antibodies and T-cells that lie ready and waiting for certain strains, because we have a specific immune system. I'm briefly going to tell you today about some of our work on both the antibody and T-cell arms of this challenge, beginning with our work on T-cells. 
So because of the high mutability of the virus, there has long been interest in trying to focus a vaccine-induced T-cell response to target single residues of HIV proteins that appear relatively conserved when you look at sequences of the same HIV protein derived from virus samples extracted from diverse patients. The idea, of course, is that if a T-cell attacks such a residue, then HIV will have to evolve a mutation there to evade the T-cell pressure, but that should likely make the virus replicatively unfit because why else do you see that particular residue relatively conserved from one strain to another? But because of the high mutability and replication rate of HIV, this strategy can be blunted because often it can evolve other mutations, so-called compensatory mutations, that can partially restore the fitness cost incurred by making the first immune-evading mutation. So now you have a virus that's almost as fit as before and it has evaded this T-cell response. So the point is, if you want to define the mutational vulnerability of a virus like HIV, you have to determine these collective compensatory pathways that the virus uses in order to evade human immunity so that you do not target them with a vaccine-induced T-cell response. And you also want to determine those combinations of mutations that the virus cannot make simultaneously and remain viable so that you do target them with a vaccine-induced immune response. Or in short, you need to determine the fitness landscape of this virus. And in order to illustrate what I mean by a fitness landscape on a two-dimensional slide here, just for illustration, I have to consider a cartoon virus that's made up of one protein with two residues, otherwise I can't draw it here. So on this axis, every grid point is a mutant amino acid that may arise in residue one. And on this axis, every grid point is a mutant amino acid that might arise in residue two. 
So for this cartoon virus, any point in this two-dimensional plane is a mutant strain. And information about its ability to replicate and propagate infection is shown on this third axis. So you can see how this information, fitness as a function of sequence, will trace out hills and valleys, except, of course, in a much higher-dimensional space than what I have illustrated here. Now, knowledge of this fitness landscape can be very useful for vaccine design because with a vaccine-induced T-cell response, you want to push the virus off of one of these fitness hills and force it to make immune-evading mutations that correspond to one of these fitness valleys. And you also want to block these mountain passes that correspond to the mutational pathways that the virus takes to go from one hill to another when this one is under immune attack. And armed with this fitness landscape, you can then design the immunogen, or the active component of the vaccine, not to be whole proteins of the virus, as is typical, but one that contains only parts of the proteins of the virus, those that satisfy these three criteria. They minimize regions that are rife with these compensatory pathways. They maximize regions where multiple mutations are especially deleterious, so that those are targeted and the compensatory regions are simply not there to target. And they maximize regions that can be displayed as peptides by people with diverse MHC genes so as to get broad coverage of a population. It is this very practical goal that led us to try and convert the data of thousands of sequences of HIV viruses into knowledge of these fitness landscapes. So this work has been going on for a while, and suffice it to say that the people in the top row are all former lab members who have their own faculty positions now, and I shall mention those in the bottom row here as we proceed. So the way that we do this is the following. 
Each viral sequence can be represented as a vector because it's simply a list of amino acids at different residues. Let's call that vector Z, and let us first set an easier goal of trying to determine what I call the prevalence landscape of the virus, that is, the probability of seeing a particular sequence in circulation. The sequence data contains this information. The sequence data also contains information on the probability of seeing single mutations at every residue, double mutations at every pair of residues, triple mutations at every triplet of residues, and so on and so on, and any mathematical model that can recapitulate these mutational probabilities is a legitimate model for P of Z. And then, extending and abstracting methods that were developed in other contexts such as in neuroscience, we asked what is the least biased model for P of Z that by construction fits the one- and two-site mutational probabilities. And then, exploiting the connection between statistical mechanics and what electrical engineers developed as information theory, least biased is interpreted as the maximum entropy model. Let me just take you through the two, and last, pieces of algebra for this general talk. If Z_k is the kth sequence, P of Z_k is the probability of observing that sequence, and that is what we seek, and the entropy of that probability distribution is given thusly, and we want to maximize this entropy subject to constraints: that it is normalized, that its predicted one-point mutational probability is the observed one, and that its predicted two-point mutational probability is the observed one. We enforce these constraints through Lagrange multipliers to show you that P of Z then has this familiar form from statistical mechanics with a Hamiltonian given like so, and we fit the fields and couplings here to match the one- and two-point mutational probabilities. 
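The algebra on the slides is not captured in the transcript, so what follows is only a sketch of the standard pairwise maximum entropy construction that the description matches. The notation is assumed rather than taken from the talk: each residue is encoded as a binary variable z_i (0 for the reference amino acid, 1 for a mutant), h_i are the fields, J_ij are the couplings, and Q is the normalizing sum. The sign convention is chosen so that a lower E means a more prevalent sequence.

```latex
\begin{align}
S[P] &= -\sum_k P(\mathbf{z}_k)\,\ln P(\mathbf{z}_k)
  && \text{(entropy to maximize)} \\
\sum_k P(\mathbf{z}_k) &= 1
  && \text{(normalization)} \\
\sum_k P(\mathbf{z}_k)\, z_i^{(k)} &= \langle z_i \rangle_{\mathrm{obs}}
  && \text{(one-point probabilities, all } i\text{)} \\
\sum_k P(\mathbf{z}_k)\, z_i^{(k)} z_j^{(k)} &= \langle z_i z_j \rangle_{\mathrm{obs}}
  && \text{(two-point probabilities, all } i<j\text{)} \\
P(\mathbf{z}) &= \frac{e^{-E(\mathbf{z})}}{Q},
\qquad
E(\mathbf{z}) = \sum_i h_i z_i + \sum_{i<j} J_{ij}\, z_i z_j,
\qquad
Q = \sum_k e^{-E(\mathbf{z}_k)}
\end{align}
```

The last line follows from setting the derivative of the constrained entropy with respect to each P(z_k) to zero; the Lagrange multipliers for the one- and two-point constraints become the fields and couplings that are then fitted to the data.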
Now I should mention, interestingly, that this model (by the way, today people would like to call this machine learning, which it is) has couplings J_ij with both plus and minus signs, so it actually has a beautiful analogy with models that Hopfield developed for memory in the brain, and we have exploited these connections, but I shall not describe them today. Rather, let me say that the model we obtain here is a model of prevalence. Now you could say that the more prevalent strains are also more fit, otherwise why else are they more prevalent? But the relationship could be highly complicated, because the sequences that we used to obtain this prevalence landscape were not sequences of viruses growing in petri dishes. They were viruses that were growing in humans, and there was a battle between the immune system of this human and the virus, and the virus may have been forced to make a mutation to evade this T-cell response that made it intrinsically less fit, but it became the fittest virus in this person. Now we seek the intrinsic fitness landscape because we want to then push and pull on it with a vaccine-induced immune response to corner the virus. So, without the details, with my colleague Mehran Kardar and former student Karthik Shekhar, soon to be a faculty member, I think, at Berkeley, we used a variety of methods developed in completely different contexts in order to address this question. And the answer that we got from all of this is that the prevalence and fitness for HIV are not the same, but for HIV there is a simplicity, that is, the rank order of prevalence and the rank order of fitness are the same, and that's all we care about. We need to know which one's fitter than which one, which one's less fit than the other one, that's all. 
So I'm not going to take you through how we did it except to note that we were able to map this problem of evolution in the population of viral strains to problems that have been developed by Richard Feynman in totally different contexts and so forth. But rather what I want to focus on is what this analysis revealed about the reasons why this relationship between prevalence and fitness was so simple for HIV. And there are three principal reasons. One is, I told you that there is a great diversity of these MHC genes that present peptides. So if you go and look in the clinical data you will find that no particular region of the virus's proteome is targeted by a significant fraction of people, because each person is targeting a different region. The other is also the clinical observation that if I force the virus to make a mutation that incurs a fitness cost to evade my T-cell pressure, and then I infect another person who then targets a different region, because HIV is a chronic infection, over time in this second person the original fitness-cost-incurring mutation will revert. So there's a scrambling back to where it was. And finally and very importantly, unlike influenza, the population of HIV has never been subjected to a few effective classes of vaccine-induced or natural immune responses. So the population of HIV, unlike influenza, has not evolved in narrowly directed ways in order to avoid past herd immunity. And you can see this from looking at those phylogenetic trees of HIV and comparing them to flu. The flu phylogenetic tree looks like a flowing river with a few branches coming off it. And, as I showed you, the HIV tree is like a dendrimer. So the point is that while human immune responses force the virus to sample sequence space, at the end of the day, for these reasons, it's just like shaking it rather than changing its order, and so prevalence and fitness look roughly the same. 
Look, this kind of analysis, machine learning, and the arguments that I just made are approximate arguments, approximate methods directed to a very complex problem, and could be completely wrong. And the only way to test their veracity is to confront predictions that come out of this model with in vitro experiments and clinical work. And I shall show you snapshots of that now, beginning with in vitro experiments. So we can make predictions of the fitness of different viral strains. Our collaborators can then make these strains by site-directed mutagenesis. They can then measure the ability of the virus, the mutant strain, to infect human cells and grow out in petri dishes and compare that with some reference strain. So the metric of fitness for us is a quantity called E. It's the value of that Hamiltonian for the strain's sequence. And the theory predicts that the value of E for a mutant minus the value of E for some reference strain would correlate negatively with the fitness of that strain divided by the fitness of a reference strain, plotted in logarithmic coordinates. Now I'll first show you the data where the mutants were all in one large polyprotein of HIV called Gag. So we made predictions for 36 different strains. Our collaborators, Bruce Walker and Thumbi Ndung'u, made those measurements. And this is the prediction and the data are the dots. And for nine of these strains we said that they would be extremely unfit, and they didn't grow out at all. And they're not graphed here simply because of the difficulty of plotting the log of zero. But if you include those nine data points then you see that although we are certainly not perfect, with high statistical significance we get it right about 80% of the time. Here it is for another set of strains, where we compare with 103 different old and new data points for another large polyprotein of HIV called envelope. And again you see we are not perfect but we do reasonably well. 
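To make the quantity E concrete, here is a minimal sketch of evaluating a pairwise Hamiltonian for a binary-encoded sequence and taking the difference between a mutant and a reference strain. Everything specific here is an assumption for illustration: the binary encoding, the function names, and the toy fields and couplings, which are invented numbers and not fitted HIV parameters.

```python
from itertools import combinations


def energy(z, h, J):
    """E(z) = sum_i h_i z_i + sum_{i<j} J_ij z_i z_j for a binary sequence z.

    z: list of 0/1 (0 = reference amino acid, 1 = mutant) -- assumed encoding.
    h: list of fields, one per residue.
    J: dict mapping (i, j) with i < j to a coupling; missing pairs count as 0.
    """
    field_term = sum(h[i] * z[i] for i in range(len(z)))
    coupling_term = sum(
        J.get((i, j), 0.0) * z[i] * z[j]
        for i, j in combinations(range(len(z)), 2)
    )
    return field_term + coupling_term


def delta_energy(mutant, reference, h, J):
    """E(mutant) - E(reference); predicted to correlate negatively with
    log(fitness of mutant / fitness of reference)."""
    return energy(mutant, h, J) - energy(reference, h, J)


# Hypothetical toy parameters for a three-residue protein.
H_TOY = [1.0, 2.0, 0.5]
J_TOY = {(0, 1): -0.5, (0, 2): 1.0, (1, 2): 0.25}

if __name__ == "__main__":
    reference = [0, 0, 0]
    for mutant in ([1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]):
        print(mutant, delta_energy(mutant, reference, H_TOY, J_TOY))
```

In this toy, the negative coupling between residues 0 and 1 makes the double mutant less costly than the sum of the two single mutations, which is the compensatory effect described earlier; the positive couplings do the opposite.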
Comparisons with other HIV proteins are similarly encouraging. I'm not going to show you that. Let me show you one snapshot of comparison, or the tests rather, with clinical data. And this work was done in collaboration with Andrew McMichael's lab at Oxford University. So Andrew had historical blood samples from HIV-infected people who had never been under treatment, and he had temporally ordered blood samples. So he could sequence the viral swarm as a function of time in these individuals. At an early time point he could also determine what the targets of the T cells were, which regions were the targets of the T cells for each individual person. Unfortunately, in these individuals the virus was always able to make mutations to evade the T-cell response. So he asked: based on your fitness landscape, can you tell us, of all the places where mutations could have occurred to evade the T-cell response, where did they actually happen in these people, and can you also tell us the relative times at which they escaped in these patients? So for a hundred years people have done stochastic evolutionary dynamics, ever since Wright and Fisher, but to our knowledge always with toy models of these fitness landscapes. For answering these questions we did very similar things, except that we now did this with the fitness landscape of a real human pathogen with a known T-cell response. And the only input to this calculation is the initial viral sequence and where the T-cell attack is. So how do we do with predicting where the mutations are? Well, roughly speaking, 86% of the time we get it right. Rather than show you this data in detail, let me show you examples that highlight how HIV uses these compensatory pathways to evade human immunity, which also illustrate the importance of the sequence background in which the same mutation has to arise. And this is best done by comparing pairs of patients who target exactly the same region of the virus with their T-cell response. 
So these are two patients who target exactly the same peptide. Escape happens by precisely the same mutation. The circle here represents the protein in which this peptide is embedded, and the marked residues are those where mutations existed in the virus that infected this patient. Blue lines indicate that we predict from the fitness landscape that the pre-existing mutation has a compensatory effect on the final escape mutation. Red lines indicate that the pre-existing mutation has a negative synergistic impact on the final escape mutation, making it even more unfit than if it was the escape mutation alone. And the thickness of the lines is the size of the effect that we predict. So you can see in this case very vividly that this person was infected with a virus that had many more pre-existing mutations that were strongly negatively coupled with the ultimate escape mutation, making that escape virus much less fit, and therefore it took much longer to grow out, and in this patient the escape happened 40 times more slowly than in this person. Now how do we do at predicting the relative times? So we can only predict evolutionary generations, not actual time. So here's the comparison. This is our prediction of the escape time in evolutionary generations. This is the actual escape time as determined from the blood samples. And while there is a lot of scatter, especially in the middle, we don't do too badly, with high statistical significance. Please note that these are not contrived experiments. These were natural infections that happened in these humans. And significantly, we can do very well in separating regions where escape occurs very slowly. Those are the things you want to target in a vaccine. And we can also tell where things happen very quickly. These are things you don't want to target with a vaccine-induced immune response. 
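The kind of Wright-Fisher calculation referred to above can be illustrated with a deliberately small sketch. It is not the calculation from the talk, which used the full inferred landscape and the patient's initial viral sequence. The assumptions here are: only two genotypes (targeted wild type and escape mutant), no back mutation, a fixed population size, an escape mutant whose relative fitness is exp(-delta_e), and a hypothetical T-cell kill fraction applied to the wild type. All parameter values are invented.

```python
import math
import random


def escape_generation(delta_e, kill=0.5, pop_size=500, mu=1e-3,
                      max_generations=1000, seed=0):
    """Generations until the escape mutant exceeds half the population.

    delta_e: E(escape) - E(wild type); larger means a less fit escape mutant.
    kill:    assumed fraction of wild-type fitness removed by T-cell pressure.
    Returns max_generations if escape has not happened by then.
    """
    if not 0.0 <= kill < 1.0:
        raise ValueError("kill must be in [0, 1)")
    rng = random.Random(seed)
    f_wt = 1.0 - kill            # wild type is suppressed by the T cells
    f_esc = math.exp(-delta_e)   # escape mutant pays an intrinsic fitness cost
    n_esc = 0
    for generation in range(1, max_generations + 1):
        n_wt = pop_size - n_esc
        # Mutation: each wild-type virus becomes an escape mutant with prob. mu.
        new_mutants = sum(rng.random() < mu for _ in range(n_wt))
        n_esc += new_mutants
        n_wt -= new_mutants
        # Selection and drift: resample the next generation in proportion
        # to fitness-weighted genotype counts.
        w_esc = n_esc * f_esc
        p_esc = w_esc / (w_esc + n_wt * f_wt)
        n_esc = sum(rng.random() < p_esc for _ in range(pop_size))
        if n_esc > pop_size / 2:
            return generation
    return max_generations


if __name__ == "__main__":
    for delta_e in (0.1, 0.4, 0.6):
        times = [escape_generation(delta_e, seed=s) for s in range(5)]
        print(delta_e, times)
```

Because the output is in generations rather than days, it can only be compared with clinical escape times in relative terms, which is the same limitation noted in the talk.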
So based on these sorts of successes, and others that I have not shown you, we were encouraged to develop an immunogen based on the criteria I told you earlier. So these are only parts of proteins. And we've done that, and the challenge now with this is delivery of these long peptide immunogens in a way that gets you as big a response as happens when you give the whole shebang. This is also the problem in cancer vaccines, by the way. So there is a whole industry of people who are engineers trying to do this, like my colleague Darrell Irvine, who's trying to make nanoparticles that can deliver these subunit vaccines, as they are called, nicely. And we are collaborating with him, but the most immediate thing that has happened is that Dan Barouch at Harvard Medical School has this replicating viral vector called Ad26, which is approved for human trials, and he synthesized our immunogen into this viral vector. And then he first immunized mice with this to see if there is an immune response, that is, whether the thing is immunogenic, and without showing you the details of the data, about one and a half months ago this data came in that it is immunogenic. So now it can move to the next step, which is monkeys, and so a little over two weeks ago four monkeys were immunized with this, so we went from statistical physics to monkeys, that's, even if it fails after this, okay. Anyway, the first blood draw was earlier this week and it appears that three of the four monkeys had some response, which Dan is hopeful will grow, and then we will boost it and so on. So all of this may never work, but if it does, then what one would like to do is to use this as a therapeutic vaccine that can redirect the immune response of people who are already infected to hit parts that will corner the virus, and then be able to take them off of treatment, for example. 
But this is all well over my pay grade, so let me spend the remainder of the time that I have to tell you about our work on the antibody arm of the challenge. So this is a schematic representation of the viral spike of HIV. It has some residues, such as the ones shown in yellow, which are relatively conserved because they have to bind the human proteins in order to propagate infection. But they are surrounded by these highly variable residues shown in red and shielded by these sugars shown in blue. Now recently, immunologists like Michel Nussenzweig and Dennis Burton have been able to isolate antibodies from infected people that, at least in vitro, seem to be able to neutralize diverse HIV strains. These so-called broadly neutralizing antibodies, or bnAbs, contain such conserved regions as part of the region to which they bind. The trouble is that these bnAbs emerge in these individuals several years after infection, and that too in low numbers, so it doesn't help these people. Nonetheless, it provides proof of concept that the human immune system can evolve such antibodies, and that thus raises the tantalizing possibility of inducing such bnAbs efficiently in diverse people by properly designed vaccination protocols. So upon natural infection or vaccination, antibodies develop by a Darwinian evolutionary process called affinity maturation, which occurs in a very short time. I told you that when a B-cell's receptor binds sufficiently strongly to the spike, it gets activated. And then it can seed structures in your lymph nodes that are called germinal centers. Here's what happens in there. The B-cell starts to replicate, and as it starts to replicate, it also starts to introduce mutations into the antigen-binding region of the receptor at a very high rate. These B-cells with mutated receptors then interact with the virus particles that are displayed on the surface of other types of cells called FDCs. 
If the receptor on this particular mutated B-cell can bind sufficiently strongly to the virus, it can eat it up. And then inside the B-cell it is chopped up into short peptides and displayed on its surface with these MHC molecules. And then these B-cells that have internalized antigen compete with each other to bind to a kind of T-cell called a helper T-cell that's present in limited numbers. A productive interaction results in survival, otherwise the B-cell dies. So you can see the B-cell that binds more strongly to the virus is more likely to eat more antigen, therefore stochastically more likely to display more peptide, and therefore stochastically more likely to beat out its peers. Now a small fraction of these positively selected B-cells are let out and these secrete soluble forms of the receptor in the form of antibodies, but the vast majority are recycled for further rounds of mutation and selection. So you can see how, as time ensues after infection, the affinity and potency of the antibody for the antigen grows. So this problem has been described extensively through experiments ever since my late colleague Herman Eisen first described it in 1964, and it has also been subject to much computational analysis, but my own interest is focused on trying to see how we can trick this Darwinian evolutionary process to make antibodies that are cross-reactive to diverse strains rather than strain-specific ones. So you can see that the induction of broadly neutralizing antibodies will require vaccination with multiple variant strains that share the conserved residues but are different elsewhere. Because if you just use one strain, this Darwinian evolutionary process will make a very good strain-specific antibody. So once you have to give many antigens, new questions arise. What are these variant antigens? How should I give them: as a cocktail, sequentially, or in various different temporal patterns that I can think up? Which one? What should be the concentrations? 
How many should there be? How different should their variable parts be, and so on? So the answers to these questions are drawn from a very large space of possibilities, and it may be difficult to find the golden nuggets that we seek by randomly sampling the space in experiments with monkeys. Now mechanistic understanding of affinity maturation could guide the choice of answers to these questions, but until about four years ago, affinity maturation had only been studied in the context of single antigens. So how it occurs when there are multiple antigens is still poorly understood. Which is why, a few years ago, we started to build computational models of affinity maturation in the setting of multiple variant antigens, with the goal of guiding vaccination strategies in more promising directions as well as understanding this basic problem in immunology. Again, a number of people have played a role in this. Let me just highlight Shenshen Wang, who was the first postdoc who worked with me on this problem and is now a faculty member at UCLA, and this has involved collaboration with diverse life and physical scientists. So the model that we developed is a so-called agent-based model, which is fancy wording for saying that the computer is given a set of instructions. It follows these instructions. What are these instructions? These are the rules of affinity maturation that I just told you, except that there are experimental numbers for the rate at which mutations happen and so on and so on. It's exactly those rules that I told you. And the computer executes these instructions, except now it does it in the context of multiple variant antigens, and complexity emerges because of the coupling between the rules, which may not be so easy to guess intuitively. So I'm not going to take you through this, and it's a stochastic process and so forth. 
So there are stochastic probabilities of internalizing the antigen and winning the battle with your peers and so forth, much like taking exams or writing papers. One thing I will tell you is that when there are multiple variant antigens, there are some things we don't know anything about from experiments. For example, if there are multiple variant antigens, are they distributed so homogeneously that every time the B cell interacts with this, it sees all the types of antigens? Or is it very heterogeneous, so that every time the B cell interacts with this, it sees only one type of antigen? We will see that which of these scenarios is true makes a big difference. So the calculation goes like this. You activate, you replicate, you mutate, you select; a small fraction is let out, and the rest cycle. The simulation comes to an end in one of three circumstances. One is when all the B cells die; unlike in the physical sciences and engineering, in biology there is this hard boundary condition that can happen, which is called death. Another is when the B cells exceed a threshold number, which is simply a proxy for their having eaten up all the antigen, so there's no food and it stops. And the final one is after a maximum time, because even if you haven't eaten up all the antigen, the antigen decays with time. So now antibodies come out from this in silico affinity maturation, and we check the binding affinity of each antibody against a number of strains which are used for assaying for breadth in the experiments. The breadth of coverage is defined as the fraction of strains that bind with an affinity above a threshold. Because evolution is a stochastic process, we do this multiple times under identical conditions and collect statistics, and the results are presented as histograms that describe the probability that a typical vaccinated person will make antibodies of different breadth.
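The loop just described (replicate, mutate, select, export a small fraction, recycle the rest, stop on one of three conditions, then histogram the breadth over repeated runs) can be sketched in a few lines of Python. This is only a toy illustration, not the model from the talk: the representation of a B cell as a list of per-strain affinities between 0 and 1, the use of the affinity itself as the survival probability, the heterogeneous "one antigen per encounter" display, the use of the immunizing strains as the breadth panel, and every name and number below (`run_once`, `breadth_histogram`, `p_mutate`, `export_fraction`, and so on) are assumptions made for the sake of the example.

```python
import random
from collections import Counter


def breadth(affinities, threshold):
    """Fraction of panel strains bound with affinity above the threshold."""
    return sum(a > threshold for a in affinities) / len(affinities)


def run_once(rng, n_strains=3, n_founders=10, p_mutate=0.5,
             export_fraction=0.1, max_cells=500, max_cycles=50):
    """One toy round of affinity maturation; returns the exported antibodies."""
    # Assumption: a B cell is a list of affinities in [0, 1], one per strain.
    cells = [[rng.random() for _ in range(n_strains)] for _ in range(n_founders)]
    exported = []
    for cycle in range(max_cycles):          # stop 3: maximum time reached
        if not cells:                        # stop 1: all B cells have died
            break
        if len(cells) > max_cells:           # stop 2: proxy for antigen used up
            break
        survivors = []
        for cell in cells:
            for _ in range(2):               # replicate: two daughters per cell
                child = list(cell)
                if rng.random() < p_mutate:  # mutate: nudge one affinity
                    i = rng.randrange(n_strains)
                    child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0.0, 0.1)))
                seen = rng.randrange(n_strains)   # sees one antigen type only
                if rng.random() < child[seen]:    # select: survival is stochastic
                    if rng.random() < export_fraction:
                        exported.append(child)    # small fraction is let out
                    else:
                        survivors.append(child)   # the rest cycle again
        cells = survivors
    return exported


def breadth_histogram(n_runs=100, threshold=0.6, seed=0):
    """Repeat the stochastic run and return {breadth: probability}."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        for antibody in run_once(rng):
            counts[round(breadth(antibody, threshold), 2)] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in sorted(counts.items())} if total else {}


if __name__ == "__main__":
    for b, p in breadth_histogram().items():
        print(f"breadth {b:.2f}: probability {p:.3f}")
```

Changing `seen` so that survival depends on all strains at once, or changing which strains are presented in which cycle, would be one way to play with the homogeneous-versus-heterogeneous and cocktail-versus-sequential questions in this toy setting; the numbers it produces carry no biological meaning.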
So here's what we first see when you computationally immunize with three antigens that share conserved residues but are separated from each other in their variable domains by a large mutational distance of 11 non-overlapping mutations. If you give a cocktail, all simultaneously, this is what we find. This is the probability of generating antibodies of different coverage, or breadth, one being perfect breadth, and you see that under these conditions you don't get cross-reactive antibodies. We also find that you get very few antibodies; most of the B cells just die. The reason for this is seen very vividly if you first consider the case where every time a B cell comes to interact with these antigens, it sees only one type of variant antigen. So in this cartoon, these are the three antigens that share the red conserved residues, and the different colors indicate that they have different variable residues. Now suppose a B cell in one round of mutation and selection is positively selected upon interacting with this guy. Now, in the early stages of affinity maturation, the affinity of the receptor for the antigen is not very high; that is why you're doing this mutation and selection. So it's very unlikely that strong interactions have developed with these shared conserved residues. So the next time, if this B cell goes and interacts with this other guy, most likely it will not interact sufficiently strongly and it will die. The point I'm making is that these variant antigens are conflicting selection forces that can frustrate this Darwinian evolutionary process of affinity maturation. In fact, I use the word frustration here a bit technically, because very similar things happen in spin glasses and so on.
Of course, you will say that this frustration doesn't exist if the B cell can see every type of antigen every time, because then it can just keep going to the same one. But then there's no driving force for evolving breadth or cross-reactivity, and that's what you get. Now, if you give the same antigens in a sequence, first one, then another, then another, we see that the chance of getting cross-reactive antibodies is much higher. You also get many more. And the reason for this again is explained here in terms of a cartoon, where I ask you to focus only on this panel, which describes the footprint of the antibody on the antigen. What is it binding to? So when you give the first one, of course there is a broad footprint that covers both the conserved residues and the variable residues, because that's the driving force. But when you come with the second one, which has mutations in the variable residues compared to this one, we see that two types of lineages develop. One builds strong interactions with the variable residues that are shared between these two. The other builds strong interactions with the conserved residues, which are also shared. Now when you come with the third one, which has non-overlapping mutations in the variable residues compared to these two, these guys are in trouble, because they'll have to undo these interactions and build new ones, whereas this lineage can keep going and keep building strong interactions with the conserved residues and become cross-reactive. So this basic idea has been tested in model experiments in mice.
Here four different strains, or variants, of the protein that makes up the spike of HIV were designed that shared some residues but had mutations everywhere else, and mice were immunized with these, once with a cocktail and once sequentially. This is the binding affinity of the antibodies, and this is just how much antibody you have. You can see that with the cocktail, even for the four strains with which you immunize, there's a huge variance in the binding affinity; that is, these are not cross-reactive antibodies. Whereas if you do it sequentially, you see a much narrower distribution; that is, cross-reactive antibodies have evolved. They did a number of assays to also show that sequential immunization caused the antibodies to focus on the conserved residues and the cocktail did not, and a number of studies have since been published that show the importance of sequential immunization. Let me close by just noting two more things, in two more slides. All of this was done with three antigens separated by a large mutational distance. But what if I did a cocktail of many antigens separated by tiny mutational distances? Will it be better? So we looked at this. On this axis, the number of variant antigens is growing, but as we increase the number of antigens, we reduce the mutational distance between them. So here is the situation we studied before, few antigens separated by large mutational distances, and here we have many antigens separated by tiny mutational distances, and this is our prediction for the breadth. You see that there is an optimum at some intermediate number of antigens and mutational distance. Now remember, frustration is very high here; I already argued why that is the case. And on this side frustration is very tiny, because you have 20 antigens separated by a mutational distance of one; they are almost the same. So it appears that you need an intermediate level of frustration to do the best you can to get cross-reactivity. So what is the reason for this?
The reason for this is very simple to understand. In any round of mutation and selection, a B cell can either not mutate and survive, mutate and survive, or die. There is nothing else that can happen to it. So let's say that there is some survival probability, p_s, which depends on the frustration, and there is some mutation probability, mu, and let's do this arithmetic trivially. What is the chance that each of these paths will be occupied? It didn't mutate and survived; it mutated and survived; it died, that is, it did not survive. So if p_s, the survival probability, is very low, as in high frustration, everything dies. If p_s is very high, as in low frustration, then between these two surviving paths it will choose this one, the one without mutation, because mu is less than half. So that's why you need this Goldilocks situation that allows this path, mutate and survive, to be taken, so you can collect the many mutations that are required to confer this breadth. Now, this principle of optimal frustration remains to be tested in experiments. So let me close with this last slide. Affinity maturation is a Darwinian evolutionary process, which is a highly non-equilibrium process that has two very strong attractors. One is the valley of death and the other is strain-specific antibodies. There is a very narrow ridge line that takes you to this promised land, and what the whole field is trying to do is to make this narrow ridge line a broad stairway to heaven, if you like. The trouble is that we didn't, until very recently, understand how frustration depends on all of this. Now we've done a number of studies that have culminated in the design of an immunogen that is being tested now in mice at Scripps. But what all of those studies are telling us is that there is not an optimal frustration but rather an optimal temporal trajectory of frustration. You want the frustration to be low at first, so that you get diverse types of B cells to survive, and then, over time, you make it harder and harder for anything other than the cross-reactive ones to survive. So you have many possibilities that start, and then slowly
you winnow this down, and so on. But my time is up, so I shall not tell you about these last few things that I would have liked to, but rather just stop here, except to say that I think I've thanked all the people involved in the work I have described, and these are some of the people who provide the dollars that make it all possible. So thank you for your attention. I apologize for running just a little bit late. I'd be happy to answer questions if permitted. Thank you.
What have you extrapolated from the two cases where people were effectively cured? How have you injected that into your studies, and what have you pulled out of those unique situations?
Well, the short answer is we don't, for the following reason. The two people who have been cured, quote-unquote cured, were cured in the following way. The principal target of HIV is that it infects our T cells themselves, which is why it's such a disastrous thing, because whatever the immune system has to do, it has to do quickly. There is a protein on the surface of our T cells called CCR5, or equivalently CXCR4, and there is a small fraction of individuals who have mutations in that protein in their T cells, and thus HIV cannot infect them. This was found a long time ago by seeing that certain sex workers in Thailand would never get HIV in spite of constant exposure. What happened in these two individuals, the first one by chance and the second one by design, is that they also got cancer, and as a consequence of this cancer they had to have a bone marrow transplant. Because bone marrow is where T cells are made, they were transplanted with bone marrow from people whose T cells had this mutation. So this is not a practical way to cure somebody, nor is it something that any person would like to do. It's not an expense issue; it's another kind of problem. So what people are trying to do is to say: can I have small molecule
inhibitors that will go and inhibit this particular protein, so that HIV will not infect this person? Others have dreams of using CRISPR-Cas9 to go and make this mutation. But frankly, a major part of the burden of this is in places like southern Africa. In a region around Durban that I visit a lot because of my HIV work, in KwaZulu-Natal province, 66% or so of the women in their 20s are HIV positive, and it's a very, very poor location. I think what is really required, as has been required for eradication of any bug through history, is some kind of mass vaccination protocol. So that's why we are not so informed by those particular studies.
You talk about statistical physics applications to this problem, but whenever you have the antibodies and the antigens and all that, it's more deterministic. You don't consider chemical engineering effects such as equilibrium constants, kinetic limitations, mass transfer effects. Can you comment on that?
No, it's a good question. What I have not told you is... well, I think I understand your specific question. You can do stochastic dynamics of transport, as you well know. But in those processes, the affinity maturation in the second part of my talk, where things go from one place to another and so forth, that is by transport, and in fact there are transport limitations. For example, if you take a human being or any animal and you do affinity maturation, you give them some antigen, you will find that you will never get the affinity of that antibody to exceed a particular threshold. And protein engineers like Frances Arnold, who do directed evolution, which is affinity maturation done in a laboratory, she, and not just she, the world, can get you things that will bind like a rock, I mean very hard, much higher than the threshold in humans. You know why?
The reason is that there is a rate of transport of the mutated B cells from where they mutate to the cell that holds up the virus, so that's a particular time scale. If you now have developed B cells with mutations where the on-rate, that is, the rate at which they bind the virus, corresponds to a faster time scale than the transport rate, what is the evolutionary selection force for generating better antibodies? Nothing, because the limiting step has now become diffusion and transport. That is why animals cannot make antibodies with more affinity than some threshold, and neither can humans. In our computation I did not describe that bit. Whereas in vitro, in the laboratory, you can make whatever you like, because you can beat that transport limitation. So these effects are all there in various ways inside these things. For the T cell part they are not, and they don't need to be, because in 24 hours there is circulation of the T cells through your blood and lymphatic system, so they should likely find the various parts, and it becomes a mixed bag. But it's not totally true. Some tissues are more privileged, in the sense that diffusion limitations prevent things from going there and so forth. It's not diffusion there, it's transport through the tissue. They are all there, but sometimes they are important to treat and other times not.
So for the algorithms you were talking about...
I have this physiological... I'm getting old, so I don't hear very well.
For the machine learning model and the algorithm you were talking about, it is interesting. How do you think a similar algorithm can be used in cancer immunotherapy, and how would you use it to determine the driver mutations in the cancer?
Yeah, so right after we did this work, I suddenly started getting invitations to cancer meetings, because they were asking the same question that you are asking, or a related question: how can you identify neoantigens, as the cancer people call them, that is, mutated epitopes, that will not be able
to mutate away from whatever immunotherapy you are doing, and so on. I think you can use it for that. The only issue right now is that in order to do this you need a large set of sequences, because your learning algorithm is only as good as the extent of data you have. In fact, even in these cases, and I hid a lot of detail, including how we deconvoluted the effects of immunity from intrinsic fitness, we have to be very careful about the finite size of the data. You have to do things like regularization to deal with the finite size of the data. But at the moment there are not enough whole cancer genomes to have the statistical power to do that. A day will come when I think this will be possible, not necessarily just by our learning method; there may be other learning methods that could achieve the same goal. But right now I don't think there are enough whole genome sequences to do this.