All right, so my name is John Chodera. I'm one of the Open Force Field PIs. I'm an Associate Member, which is an associate professor equivalent, at Memorial Sloan Kettering Cancer Center. And I'm gonna tell you a little bit about what we envision being the next generation of parameterization philosophies, one that overcomes a lot of the limitations we've run into previously in building force fields. Lee-Ping has done a really great job of telling you about the current generation of technology that's at the cutting edge. Those of you who have been through force field parameterization before obviously know that things have changed a lot over the years. Now the full automation that ForceBalance provides, using regularized least-squares optimizers where you express different weights for the different classes of data and you really have the computer do all the optimization for you, starting with a good parameter set and moving toward better parameter sets, has been really useful. We think that's gonna power the next year of force fields, if not the next two years. But going beyond that, we think Bayesian inference might provide a useful way forward, and I'm gonna try to persuade you why this might be really interesting. Please feel free to interrupt me if you have questions; we can send a microphone around, because this might be a lot of new material for people. Just to ask as a show of hands, how many people are familiar with Bayesian inference? All right, that's a good number of you, great. How many of you are frequentists? It's okay, we'll still let you stay anyway. All right, okay. And how many of you would say you know something about stat mech? A lot of you, okay. So anyone who has already raised their hand about stat mech already knows everything there is to know about Bayesian inference. That's the good news. I'll tell you more about that a little bit later.

Okay, so what would we really want? If we're soul searching and asking ourselves what we want out of a force field parameterization approach, we really want everything to be automatic. We don't wanna have to hand tweak anything, ever. We're gonna have a force field strike team to fix the more glaring errors, but we'd like them to put themselves out of a job by automating everything we need to worry about. We also know that we're not just seeking a local optimum, which might be a very narrow little optimum, or a shallow optimum with a much deeper, better objective and better force fields nearby. We instead really seek good, generalizable parameter sets. Generalizable is a hard thing to codify, but I'm gonna try to argue that it actually has a well-defined meaning: it's like a free energy minimum, maybe a large basin of parameter sets that all fit pretty well. We also don't want to have to assign data weights by hand, to say how do I weight quantum chemistry against physical properties, or densities against enthalpies; we want the data, and the measurement errors in the data, to drive those choices. Also, we would like to get rid of the need for feats of chemical insight. Ideally, if we have enough data, the computer can do the chemical insight for us and really understand chemistry, but be guided by statistically sound methodology. We don't want a random proliferation of atom types that's completely unjustified by data, because then we're vulnerable to overfitting them.
Also, we don't want too few atom types. We want just enough that the data tells us that's what we need to fit the data well. We also want appropriate selection of functional forms. We're a bit constrained by what we have in the molecular mechanics engines we're gonna support in the first couple of years, but we'd like to break free of those limitations, a lot of which are very much historical, like the Lennard-Jones 12-6, and do much better by selecting functional forms that do fit the data in a parsimonious way. We'd also like to rapidly and systematically improve the force fields. A lot of this whole effort is based around the idea of sprints, where we go through epochs of optimizing parameters and validating force fields, and we'd like to make sure that this can systematically and sort of autonomously go on to produce better and better parameter sets. Finally, we'd like to have an assessment of the reliability of these predictions. We can quantify statistical error now when we perform a free energy calculation, for example, but we can't really quantify the systematic error, which really dominates the prediction error, and that means we can't make great decisions if we don't know how wrong we are. We have stories like, oh yeah, sulfonyls are terrible in GAFF, right? But that's not a very useful thing to put into actionable practice if we're trying to make predictions that hedge our bets about which molecules to make. And finally, it would be great if we could collect the data exactly where we need it, where the force fields can somehow tell us which data is most valuable for improving accuracy.

And I'm here to argue today that the Bayesian approach is the answer to all of these questions. Parameter space might have many local minima, and a Bayesian inference method that uses something like Markov chain Monte Carlo, or some other sampling scheme, can go over those local minima and find good low free energy basins, or high probability regions, even if there are many minima in the landscape. These parameter optimization problems also generally feature nonlinear, nearly degenerate solutions, where you might have a weird banana shape. In fact, when Junmei Wang was parameterizing GAFF, he found his genetic algorithm had huge regions of parameter space that basically fit the data equally well. So he just picked one parameter set, but that's not necessarily the best thing to do, because some predictions might be very sensitive to those choices and other predictions might be highly insensitive. We also want good ways to characterize the systematic error. These Bayesian methods give us not one parameter set but a whole family of parameter sets that fit the data equally well, which we can then use to estimate, at least as an underestimate, what that systematic error is, by after-the-fact recomputation of some of our properties. We also want to make data-driven decisions, and I'm going to tell you a little bit about how the Bayesian method can really make data-driven decisions, including decisions about how many types we need or which functional forms are right. Also, the data is automatically weighted by its measurement error in constructing these likelihood functions; I'll tell you a little bit about how that works. So we don't need to come up with data weights anymore. It's all automated.
And finally, we can balance this proliferation of parameters by using Bayesian decision-making for selection between different models, to figure out, for example, if a later generation of the force field were to allow polarizability at certain sites, which sites really require it and justify the increase in parameters in a statistically sound way. And finally, we want to identify which experiments are going to give us the new information that's most valuable in reducing uncertainty in the predictions we care about, at the least cost. There's a somewhat complicated but still very important field of Bayesian experimental design, where you can simulate the results of experiments you haven't done yet and compute the expected information gain. And all of that is isomorphic with what we normally do in our day jobs doing free energy calculations or other molecular simulations.

So just to give you one idea: when you do any sort of nonlinear optimization problem, you often end up with something that looks like the right thing. This is a banana shape in two dimensions. You say, okay, I'll pick one parameter set, but there's no reason to prefer that particular one over any of the degenerate sets of parameters. And indeed, some predictions might be highly sensitive to which parameters you chose, even if the data used in training was not very sensitive to that, which is what this means. The plot on the left is from Josh Fass in my group, who has looked at parameterizing GBSA models, for example, and has found that you can change the C radius and the H radius in this weird, highly coupled way, and anything along that path will give you essentially identical hydration free energies for methane. It just goes to show you that these problems really do crop up in parameterization, though I think they're still poorly understood in the context of getting force field parameters.

Okay, so what is the Bayesian way? For those of you that know Bayesian inference already, this is very rudimentary, but essentially it allows us to write down the probability of our model, which in this case is the force field parameters, given whatever data we put into parameterization. We've already heard a little bit about the kinds of data we'll use: quantum chemical data, a lot of liquid phase data, physical properties. We'll hear more, I think tomorrow, about which data sources we're exactly going to use. That data D informs our choice of which parameters, or what probability we should assign to different parameter sets consistent with the data. So we're after this posterior, and it's written as proportional to the likelihood, which I'll talk about in just a moment, times something called the prior on the force field parameters, which basically encodes our prior information from either previous rounds of inference or anything we know to be physically reasonable about those parameter sets. The likelihood function, if you have independent measurements, is simply the product over all the measurements of the probability of each particular measurement given the force field parameters.
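Restating that relationship in symbols (nothing here beyond what was just said; θ denotes the force field parameters, D the training data, and d_i the individual independent measurements):

```latex
P(\theta \mid D) \;\propto\; P(D \mid \theta)\, P(\theta),
\qquad
P(D \mid \theta) \;=\; \prod_{i} P(d_i \mid \theta) .
```

Posterior on the left, likelihood times prior on the right; for independent measurements the likelihood factorizes into a product, so its logarithm is a sum over data points.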
That is, for example, if I have a density, what is the probability that I observe some measured density, with a given uncertainty or error in that measurement, given the set of parameters for all the Lennard-Jones terms and everything else in the force field? As long as I can write that down, I can go through these rounds of inference where I condition the prior, which is my initial state of confidence about parameters, on that data, and get back a much narrowed idea of what the probability of my parameters is.

Now, computing the likelihood requires two different exercises for us. A lot of what we're trying to do is build a modular framework where we can plug in different components that correspond to different kinds of physical properties we'll be interested in; we'll hear more about that during the infrastructure talk tomorrow, in our physical property discussion. But essentially, say I want to use densities as a physical property, maybe from the ThermoML archive — there's lots of liquid density data for individual liquids and mixtures of liquids. I can first write down a forward model that says, how do I turn my force field parameters into a density? I generally have to run a simulation, though there are shortcuts that will make that much cheaper, which we'll hear about tomorrow. I would then compute the expectation, under that parameter set, of something like the ratio of the number of particles to the total volume, for a number density. It's a mathematical expectation for that parameter set. So now I can go from the parameters I'm currently considering to a density without any measurement error. And then I need an error model that connects what the true density is to what I would actually observe in an experiment. NIST is highly familiar with how you do density measurements: you put a liquid in a vibrating tube, you look at the frequency at which it resonates, and you know the volume of the tube because you've calibrated it. There's all of this other stuff that goes into the experimental error that's reported. Maybe I simply use a Gaussian error model, or maybe a log-normal, or some other model of what that likelihood is — that the true value is ρ* given that I've observed ρ, or vice versa. Given those two things, I can form the likelihood by writing the probability that I've observed some data, which is a measurement and its error, given some actual set of force field parameters.

All right, so every time you add more data, the cool thing is that it conditions the posterior to reduce the uncertainty. If I have a number of measurements — this is just an idealized case — with no measurements, this is my prior, the big broad Gaussian. As I make more and more measurements, the parameters become peaked around what we think their true value is, because we're trying to get information about what nature's true potential function is like. And you can actually quantify this in an information-theoretic manner by looking at something called the entropy, which is the same as the thermodynamic entropy. This is the information-theoretic uncertainty. As things become more and more peaked, we can compute what the reduction in uncertainty is. That's the information content of the new measurement in the context of the previous measurements. And it's something we can also compute for experiments we haven't done yet: the expected information content.
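As a minimal sketch of how one such density measurement could enter the likelihood under the forward-model-plus-error-model picture just described — the function names, the toy forward model, and the Gaussian error model here are illustrative assumptions, not the project's actual property-calculation infrastructure:

```python
import numpy as np

def forward_model_density(theta):
    """Toy forward model: map force field parameters theta to a predicted
    liquid density <rho>_theta. A real implementation would run (or cheaply
    reweight) a molecular simulation; here we just use a made-up smooth
    function of a (sigma, epsilon) pair so the example is self-contained."""
    sigma, epsilon = theta
    return 1000.0 * np.exp(-sigma) * (1.0 + 0.1 * epsilon)   # kg/m^3, illustrative only

def log_likelihood_one(theta, rho_obs, rho_err):
    """Gaussian error model: log probability of observing rho_obs with reported
    uncertainty rho_err, given the density predicted for parameters theta."""
    rho_pred = forward_model_density(theta)
    return (-0.5 * ((rho_obs - rho_pred) / rho_err) ** 2
            - np.log(rho_err * np.sqrt(2.0 * np.pi)))

def log_likelihood(theta, measurements):
    """Independent measurements multiply, so their log likelihoods add."""
    return sum(log_likelihood_one(theta, rho, err) for rho, err in measurements)

# Three hypothetical (density, reported uncertainty) pairs, e.g. pulled from ThermoML:
measurements = [(998.2, 0.5), (997.9, 0.8), (998.6, 0.4)]
print(log_likelihood((0.3, 0.5), measurements))
```

Appending more (value, uncertainty) pairs to `measurements` is exactly the conditioning described above: each new term sharpens the posterior, and the drop in the posterior entropy is the information content of that measurement.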
And this will allow us in the future to do design, saying we should really collect data for this particular set of liquids in order to gather the most information with the least expense. In the slides I give you a link to a cool interactive visualization you can play with.

So what about the prior? Often we want to use priors that don't encode a lot of information other than the physical constraints on the parameters — Lennard-Jones sigmas should be positive, for example. So, as Lee-Ping suggested, we can use a parameterization where one quantity can assume any value and we take its exponential, for example, to get a Lennard-Jones sigma. And what you'd like to do is make sure that any choice of transformation of variables doesn't change your answer a lot. It would be weird if, for example, using the angle theta or the cosine of that angle gave you totally different results. There are ways to correct for that, to make sure you're not biasing things by choosing a particular information-less set of prior parameters. There are also things like nuisance parameters. Sometimes you do an experiment and you don't know what the size of the error is, but you have lots of data. So we can put a nuisance parameter on the size of the error for that class of experiments and infer it along with everything else. That parameter isn't something we care about — it's not a force field parameter — so we can just integrate it away, and that's called marginalizing it out. And we can also use prior rounds of inference for our prior as well. If we've already done this whole exercise and sampled a bunch of different parameter sets, and then you measure something else that's really important and can write down a likelihood function, you can condition your predictions and make them more accurate using very simple and inexpensive ways of conditioning on that previous set. You can also use the previous rounds of force field samples to seed a new round of posterior sampling.

So everything you know about stat mech can be translated directly over to statistical inference. In fact, we use all of the same algorithms to do all of the sampling. It's just a much easier problem, because instead of dealing with 50,000 atoms, each of which has three dimensions, we're in this case only dealing with a few hundred parameters, which are also tightly coupled. So it's actually a much simpler problem than trying to sample the equilibrium space of a protein jiggling around. In statistical mechanics you have the potential energy, and in statistical inference you have this quantity as well: the minus logarithm of the unnormalized density. The potential energy in classical mechanics has an additive constant, which is totally irrelevant; same thing in statistical inference. There's a potential energy that you can think of as having a unit temperature that goes along with it, and you can just sample using the normal algorithms. Partition functions, which represent partitioning between media, for example, are the same as normalizing constants in statistical inference. There's basically a Rosetta stone where you translate all the language from one to the other. Binding affinities and partition coefficients are ratios of partition functions in statistical mechanics; in statistical inference, these are called Bayes factors, which are ratios of model evidences. And I'll tell you a little bit more about that in a little bit.
Physical property measurements like a density are just expectations in the statistical sense. And entropy is basically the same in both fields. The really cool thing is that the same algorithms are useful for tackling both problems. So everything that you've used in molecular simulations, we can also use in sampling parameter space — it's just that we have to use a different program to do that kind of sampling, but the same kinds of algorithms. Metropolis Monte Carlo is one of the simplest; it was invented in statistical physics land, but it's used all the time for sampling in statistical inference land. Hybrid Monte Carlo is another kind of thing where you basically do some MD and then accept or reject the MD using some acceptance criterion. That works really well when you have gradient information, but it gets harder when you have lots and lots of parameters. Fortunately, there are now some really cool Langevin integrators, which you're familiar with from MD, that work very well at sampling high dimensional problems when you have gradients available and are accurate and efficient — so basically you can use them and not have to Metropolize. And then there are other strategies like Gibbs sampling, where, as in replica exchange or expanded ensemble simulations, you do some continuous propagation of variables by running dynamics and then update some discrete labels, like which thermodynamic state — which temperature — each of the replicas is associated with. That allows you to mix discrete and continuous sampling at the same time. We can use the same kinds of strategies to sample from parameter spaces if we have discrete and continuous variables. If you want to learn more, there's this great book that Jun Liu at Harvard wrote, which is basically a Rosetta Stone: it takes all of the algorithms from both of these fields and puts them in the same mathematical framework. It's very simple to understand, and it's basically a good tool belt to keep with you to solve all of these problems. Sometimes the innovation comes from the statistical inference side, sometimes it was invented first on the stat mech side, but putting it all together gives a general way of tackling all of these very complicated problems. And you might recognize some of your favorite sampling methods on either side.

All right, so what kind of data are we actually talking about? We'll hear a lot more about this, and about how we're going to compute these things, especially in the near term, a little bit later. But there's a lot of different kinds of experimental data and quantum chemical data that inform different classes of interactions, and different classes of force field parameters as a result. We want to make this as modular as possible, so that we can plug in new kinds of measurement classes or quantum chemical classes, by having this likelihood be modular as well. Just as in ForceBalance, where your objective function is a sum of penalty terms, here the log likelihood becomes a sum of the log likelihoods of the individual data elements. Just to mention, there are also large amounts of this data available; we'll come back to this. Unfortunately, the folks from NIST couldn't join us because of the shutdown, but there are huge amounts of data that NIST has archived that we can pull from. And just like I said, conditioning on data reduces the uncertainty in this very simple case of Gaussian information.
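To make the stat mech Rosetta stone concrete before the methane example: below is a minimal random-walk Metropolis sketch in which minus the log posterior plays the role of a potential energy at unit temperature. The `log_posterior` argument could be something like the density log likelihood sketched earlier plus a log prior; the function names, step size, and interface are assumptions for illustration, not the project's actual sampling code.

```python
import numpy as np

def metropolis_sample(log_posterior, theta0, n_steps=10000, step_size=0.05, seed=1):
    """Random-walk Metropolis over force field parameters.
    'Energy' is U(theta) = -log_posterior(theta); moves are accepted with
    probability min(1, exp(-dU)), exactly as in Metropolis Monte Carlo."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta)
    samples = []
    for _ in range(n_steps):
        proposal = theta + step_size * rng.normal(size=theta.shape)
        logp_new = log_posterior(proposal)
        if np.log(rng.uniform()) < logp_new - logp:   # Metropolis criterion
            theta, logp = proposal, logp_new
        samples.append(theta.copy())
    return np.array(samples)

# Usage sketch, assuming log_likelihood and measurements from the earlier example
# and some log_prior encoding physical constraints:
#   samples = metropolis_sample(lambda t: log_likelihood(t, measurements) + log_prior(t),
#                               theta0=[0.3, 0.5])
```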
We can actually show this really works for physical properties as well. This is data that comes from Michael Shirts and Levi Naden, for example, where we're looking at a united-atom methane and looking at the posterior probability you get for the parameters. In this case, there are only two, the Lennard-Jones epsilon and sigma for this united-atom methane. And you can see that if you just use density information across a range of thermodynamic states, you get a swath of parameters that all have high likelihood. If you use only enthalpy information, as a function of temperature and pressure, then you get a different swath of parameters that looks very different but is still this nonlinear manifold. And putting them together, by combining the likelihoods for each of them, you get this posterior that is now much more highly localized. The act of using more and more experimental data just narrows and narrows this posterior to the point where we can start to really find a good region of parameter space. But it's still not one parameter set; it's a whole region of them. And our goal is to sample this parameter space pretty thoroughly, so that we have multiple force field parameter sets consistent with this data, which will allow us to do actual inference of what the systematic error is.

All right, so to do that, there's a way to write down the probability of any prediction we want to make — it might be a free energy of binding, it might be a density, it might be a population of some conformational state — in terms of the data that we used to parameterize the force field. This is just a convenient mathematical way to write it, but it's essentially just generating samples from our posterior probability density of parameters and then looking at the different values of the expectation, or observable, we get. And the cool thing about this is that if we're predicting the binding affinity of two different ligands that might be part of a congeneric series, and we're looking at binding to the same protein, for example, we can exploit the favorable cancellation of errors. It might be that the individual parameter uncertainties are quite high, but because it's the same swath of chemical space we're looking at, the parameters might make very similar predictions, and that might give us very beneficial cancellation of errors for some compounds but not for others, where we've introduced a weird moiety that we don't have a lot of data on. So it'll really tell us, in a problem-dependent fashion, what our expectation of the errors might be.

The old way of doing this is that we take the one true force field from on high, run a single simulation or a single calculation, and get a single prediction. We report the statistical error, and that woefully underpredicts the actual error of our method. The new way of doing this would be, in principle, to take many different parameter sets, run many different calculations, and then look at that distribution of posterior predictions. Now, that's very expensive — we don't want to tell you that everything has to be a hundred times as expensive. The way we're going to get around this is that we're gonna pick one good set of reference parameters that has good thermodynamic overlap with all of the different parameter sets that we sampled, and then we are working on ways to have a fast reweighting process that happens after the fact.
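Here is a hedged sketch of that idea — elaborated in the next paragraph — in which a single reference simulation is post-processed to estimate a prediction under each sampled parameter set, and the spread across parameter sets is reported as a lower-bound estimate of the systematic error. The one-state reweighting formula is standard importance reweighting; the function names and inputs are illustrative assumptions.

```python
import numpy as np

def reweighted_expectation(observable, u_ref, u_new):
    """Estimate <A> under a perturbed parameter set from frames sampled with the
    reference parameters. observable, u_ref, u_new are per-frame arrays: the
    observable of interest and the reduced potential energies under the reference
    and perturbed parameters (only the terms that changed need recomputing)."""
    log_w = -(u_new - u_ref)
    log_w -= log_w.max()          # for numerical stability
    w = np.exp(log_w)
    w /= w.sum()
    return float(np.sum(w * observable))

def posterior_predictive_spread(observable, u_ref, u_new_per_sample):
    """Reweight the same saved frames onto each posterior parameter sample and
    report mean and standard deviation of the prediction across samples; the
    spread is a (lower-bound) systematic error estimate."""
    preds = [reweighted_expectation(observable, u_ref, u_new)
             for u_new in u_new_per_sample]
    return float(np.mean(preds)), float(np.std(preds))
```

For relative predictions within a congeneric series, the spread of the difference between two ligands can be much smaller than the spread of either absolute value, which is the favorable cancellation of errors mentioned above.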
So you'd save some of the data during the calculation — exactly what depends on what kind of calculation you're doing, but I think free energy calculations of ligand binding are gonna be the killer app for this — and then we use a fast post-processing step that just uses the frames you've saved and recomputes the energies of the parts that have changed, in a way that allows us to get at least an underestimate of what that systematic error is. I think that's gonna be much bigger than the statistical error we've typically been reporting.

All right, any questions about that so far? Otherwise I'm gonna switch gears a little bit to talk about what happens when we have other, discrete choices. There are a lot of discrete choices we have to make when building force fields, right? For example, right now, if we're using Lennard-Jones, we have to make a choice about which mixing rule is actually gonna lead to the best generalizability with the least proliferation of parameters. We also have to make choices about which functional form: maybe Lennard-Jones isn't the right way to go, and a Buckingham exponential-6 is a better fit to the data. You can go and add as many parameters as you want, of course, and fit quantum mechanical data, or really any data, arbitrarily finely — but is it worth that extra set of parameters? Same with atom types. Do we need 10 atom types? Do we need 100? Do we need 10,000? Having this driven not by human intuition but instead by something that statistically balances the actual ability to fit the data against a penalty for overfitting is gonna be super important. We've been talking a lot, and we'll hear more tomorrow, about electrostatics models for small molecules, where we'd like to use Christopher Bayly's wonderful bond charge correction idea: we do a very cheap calculation, often a simple semiempirical quantum chemical calculation, we get some initial charges, and then we correct them to better fit physical data by applying these bond charge corrections across different types. There are something like 160 of these right now — I forget exactly how many. 300? Okay, an underestimate. So how many do we really need, and which ones are the most important? Off-atom charges: there might be a need for charges where there are special sigma holes, or maybe we need them everywhere for better reproduction of particular kinds of physical property data like binding affinities. And maybe we want polarizable sites sometime in the future, but it would be nice if the method could tell us where we need the polarizable sites — maybe they don't need to go everywhere.

So to ask a question in this context, where there are different categorical choices, or different models that may even have different numbers of parameters, there's a way to do this very straightforwardly in Bayesian inference by computing something called the Bayes factor. And remember I told you that was the same kind of thing as a partition function in statistical mechanics, right? Generally in stat mech we don't compute a partition function directly; we compute ratios of them, because that's much easier. And here it's exactly the same: it's much easier to ask the question, is one model preferred over another model?
And there's a very straightforward interpretation of the ratio you get from these Bayes factors, or model evidences: it's how you would hedge your bet — ten to one, for example, or a hundred to one — over two different outcomes in an iterated gambling scheme to break even with the house. So if you want, you can take this and go gambling in Las Vegas afterwards. You'll at least break even, but we're not guaranteeing you'll make any money; it's always a losing deal. In this case, the model evidence is basically computing the probability of the data, integrated over the different parameters: this is the posterior piece, this is the likelihood, the prior, and then your prior probability on that model space. If we can do this integral, then we can compute the model evidence. It's generally difficult to compute directly, so we use the same schemes we do for free energy calculations: you introduce this part, the likelihood, like an alchemical field, by gradually turning on more data in the context of some reference model. There are lots of different ways to do this efficiently; I'll tell you about one specific way we've been using for parameters in just a moment — in this case, reversible jump Monte Carlo.

So if you have the choice of two or more models, you can write a simple Markov chain Monte Carlo move that will sample between the two different models. For example, if you have a Lennard-Jones 12-6 and a Buckingham exponential-6 or a Morse potential, as long as you can write a Metropolis-Hastings detailed balance criterion that takes you from one model class to another model class, where you also map the parameters — including the Jacobian for that mapping — then you can write down a Metropolis-Hastings criterion and compute the acceptance probability for hopping between those models. And now when you run the simulation, it's gonna sample both the parameters and which model you're in, and the ratio of time you spend in one model versus another is the Bayes factor. It's just like doing a grand canonical simulation where you're asking how many particles you have, because you're creating and destroying degrees of freedom for the positions of those particles. So if anyone has ever done a grand canonical Monte Carlo simulation, they already know how to do this reversible jump Monte Carlo for parameters. It's a very standard technique. It's a little bit tricky, because sometimes you need to optimize the proposal — how do I map parameters in this model onto parameters in that model — to also get a high acceptance rate, but it's a straightforward engineering challenge.

All right, just to give you a simple example — let's see if I can get this graphic to play. What Josh Fass in my group has done, for example, is to show the simplest case to illustrate: what happens if I have a mixture of Gaussians, and I don't know how many Gaussians there are, and I don't know what the parameters of those components are? What happens is that if you mix the Gaussians you see on the left, you get this mixture distribution, and a statistical sample from it gives you this. In this case it's maybe fairly obvious to pick out by eye that there are three Gaussians, but we can't scale picking this out by hand to the number of calculations that we need to do.
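Before looking at how that example behaves, it may help to write down the quantities just described — these are the standard definitions, with D the data, M a model, and θ_M its parameters:

```latex
P(D \mid M) = \int P(D \mid \theta_M, M)\, P(\theta_M \mid M)\, d\theta_M ,
\qquad
K_{12} = \frac{P(D \mid M_1)}{P(D \mid M_2)} ,
```

and the Metropolis-Hastings acceptance probability for a reversible-jump proposal from (M, θ) to (M′, θ′), with q the proposal density and |J| the Jacobian of the parameter mapping:

```latex
\alpha = \min\!\left(1,\;
\frac{P(D \mid \theta', M')\, P(\theta' \mid M')\, P(M')\, q(\theta', M' \rightarrow \theta, M)}
     {P(D \mid \theta,  M )\, P(\theta  \mid M )\, P(M )\, q(\theta,  M  \rightarrow \theta', M')}\;
\lvert J \rvert \right).
```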
So if we feed this to a computer that can do reversible jump, with moves that create and destroy different numbers of Gaussian components and also sample their parameters, what happens is that you can see the means change over time, and you see different numbers of traces as it creates and destroys different Gaussians, and eventually it settles down and finds three Gaussians that fit the data quite well. So I'll see if I can actually play this. No, I guess I can't play it — it's an animated GIF. It looks very cool if you watch it, so I'll see if I can play it separately elsewhere. You can also see that the weights of those Gaussians change, and the standard deviations of those components change on the bottom as time goes on. So this temporal trace really traces out exactly what you're seeing. And you can see on the left it looks very different from on the right, just like in a molecular simulation: the left part is sort of the equilibration, we throw that data out, and we look at the stuff on the right after we've run it for a while, and that tells us how likely each model is to fit the data, and how much we believe each of the models in terms of these Bayes factors.

So I'm hoping this next one will play too, because it's super cool. The basic idea is that we're taking this and applying it to the question of how many different types of things we should have. You'll hear a little bit later that this SMIRNOFF approach to typing — direct chemical perception — types atoms, bonds, angles, and torsions directly. So we're hoping this method can actually be scaled to determine how many of each of those types we need to fit the data well without overfitting. One way to do this for an atom type, for example for a Lennard-Jones type, would be to choose a parameter from the current list in a hierarchical type tree, and we can either create a child or we can kill a child — maybe that's a bad thing to say out of context — that is, we can create a child parameter that's a more elaborated version of a parent parameter, or we can delete one of those leaf-node parameters and collapse the parameter space. But we have to be able to go back if we can go forward, so that we can write down this reversible-jump, Metropolis-Hastings criterion. And there are many ways we can elaborate a type: we can delete an atom, we can add a substituent, or we can add these decorators, for example — different kinds of things in this SMARTS typing, which you'll hear about shortly, that elaborate on the fine-grained description of chemical space. So, for example, we can make fairly elaborate ways of describing chemical environments where we've tagged one particular atom as the ':1' atom. Any of these can be this first atom, and then any of these can be a second atom, and we can elaborate on ways to describe what's in the middle too. This process can generate really interesting chemical variety that leads to a typing tree. So we're always typing everything; it's just a matter of which of these leaves you end up at — which of these children ends up typing your particular atomic environment. And if we run this long enough — I think we'll hear more about this a little bit later — this method has enough variety in it to recover the human-derived atom types that people have used in the past. And so, Caitlin, I think you're talking about this in more detail a little bit later?
Yes? Okay, great. So Caitlin will do a much better job of explaining this than I will, but I just wanna show you that this concept can be applied to actually learn parameters. What Josh in my group has done, as our first example, is to sample over GBSA atom types to fit small molecule hydration free energies. Now, this is not something that's really gonna drive our parameter engineering, because we're gonna use condensed phase properties rather than hydration free energies, and we're not gonna be building GBSA models, we're gonna be building Lennard-Jones parameters. But this is just the fastest, simplest way of testing this idea, because it's very rapid to compute hydration free energies with a GB model — basically instantaneous. So one of our moves might start off with a set of parameters descended from a universal parent type — these are just the different elements, each with its own GB radius. And then we can propose, for example, that we're gonna take element number eight — oxygen, right? — and create a new derived parameter type that has two substituents, to distinguish it from one that has one substituent: a carbonyl from an alcohol, for example, or an ether. And then we propose a new value for this parameter, maybe a Gaussian perturbation on the original value, to create a new type that could be more specific. Maybe this model fits a lot better, maybe it fits a lot worse. Either way, the prior will also penalize complexity, preventing us from creating too many atom types. For example, if having this other atom type doesn't make any difference to your likelihood for the data, then there's a cost to creating it that has to do with which parameter values you can actually choose. Let me back up on that. If you have to have exactly the same value, 0.121 nanometers, to get a good fit to the data, then when you create a new type it also has to be 0.121 for that oxygen to get a good fit, and that creation gets penalized because it can only occupy a small chunk of the prior.

Let's see if I can actually get this to play. Is there a way to do this? No, it looks like it's not gonna play the animated GIF. We'll make this available online at some point. But the cool thing is that this would be a real-time display of the creation and sampling of different child atom types, and the sampling of the Born radii, using exactly the same procedure I mentioned before for sampling the Gaussians. The log posterior trace increases — that's your likelihood of fitting the data — and then you get a histogram that says, here's how many GB types I think there should be. There's a peak in this histogram that tells you there should be maybe about 10 different atom types justified by the data, but it tails off on either side. And if you run it on more elaborate sets of data, you can get very elaborate models that have discovered really interesting kinds of chemistry. In this case, I think this was the end of a very long run, where Josh found it had discovered nine hydrogen types — that's usually where people elaborate these GB models — and 20 different carbon types. It found that nitriles, for example, and sulfonyls were distinguished as being important enough to warrant their own classes.
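As a toy illustration of the kind of move just described, here is a self-contained reversible-jump sketch in which a single shared "oxygen" parameter can be split into two environment-specific types. Everything in it is a simplifying assumption: the "data" are fabricated, the parameter is a generic per-environment value rather than a real GB radius, and the new type's value is drawn from the prior (rather than as a Gaussian perturbation of the parent), which makes the reversible-jump bookkeeping collapse to a simple likelihood ratio. It is not the actual SMIRKY/GBSA sampler.

```python
import numpy as np

rng = np.random.default_rng(3)

# Fabricated observations for oxygens in two environments (0 = one substituent,
# 1 = two substituents); a real application would use hydration free energies.
env = np.array([0, 0, 0, 1, 1, 1])
y   = np.array([1.02, 0.98, 1.01, 1.31, 1.28, 1.33])
noise = 0.02                       # assumed measurement error
LOW, HIGH = 0.0, 2.0               # flat prior range for each type's parameter

def log_like(params):
    """params has length 1 (one shared type) or 2 (one parameter per environment)."""
    pred = params[env] if len(params) == 2 else np.full_like(y, params[0])
    return -0.5 * np.sum(((y - pred) / noise) ** 2)

def rj_step(params):
    """Birth (split into two types, new value drawn from the prior) or death
    (merge back to one type). Because the new value is proposed from its prior,
    the prior and proposal densities cancel, the Jacobian is 1, and with equal
    model priors the acceptance reduces to the likelihood ratio."""
    if len(params) == 1 and rng.uniform() < 0.5:          # attempt birth
        new = np.array([params[0], rng.uniform(LOW, HIGH)])
        if np.log(rng.uniform()) < log_like(new) - log_like(params):
            return new
    elif len(params) == 2 and rng.uniform() < 0.5:        # attempt death
        new = params[:1]
        if np.log(rng.uniform()) < log_like(new) - log_like(params):
            return new
    return params

def walk_step(params, step=0.02):
    """Ordinary within-model Metropolis update of the continuous parameters."""
    new = params + step * rng.normal(size=params.shape)
    if np.any(new < LOW) or np.any(new > HIGH):
        return params
    return new if np.log(rng.uniform()) < log_like(new) - log_like(params) else params

params, n_split = np.array([1.0]), 0
for _ in range(20000):
    params = walk_step(rj_step(params))
    n_split += (len(params) == 2)
print("posterior probability of needing two oxygen types:", n_split / 20000)
```

With data like these, the split model wins easily; if the two environments had nearly identical values, the extra type would gain almost nothing in likelihood and the sampler would spend most of its time in the simpler model — the Occam penalty that the prior argument above describes.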
So the cool thing is that this can actually discover chemistry that is important enough to deserve its own parameters, in a way that doesn't cause a combinatorial explosion of parameters. All right, so what we're thinking is that this second-generation fitting approach will use this Bayesian inference concept, and we're gonna reuse all of the software components that we're engineering — that's why we're focusing a lot on making modular components we can put together. One of the things we'd like to use it for is automated atom and valence type determination. You're gonna hear more about Caitlin's work on the SMIRKY sampler, which basically shows that Monte Carlo moves can rediscover existing types that humans have come up with; hopefully it can do a better job when it's actually driven by data. We're now working on sampling over GBSA types, just as a simple example on which to refine the procedure for discovering chemistry. The next step will be working closely with Michael Shirts's group to come up with automated Lennard-Jones type determination, with the idea that sometime, hopefully in the second year of the project, we won't just be refitting existing types, we'll be discovering which types are necessary to actually get a big increase in accuracy. And beyond that, we'd like to really play with mixing rules and, ideally, functional forms. But we also have to work with the package maintainers to make sure their MD codes can support those different functional forms. Part of this is a chicken-and-egg problem, because we want to be able to show that these other functional forms are much superior in order to convince them to support them, but there's also some possibility that we can convince package maintainers like GROMACS and AMBER to support more general functional forms, or encode them as spline fits, for example.

Oh, here we go. Thanks — if you can use that. So when you talk about automated mixing rule or functional form determination, it's limited to the functional forms that you give it, am I right? It doesn't — okay. Yeah, for all of these you have to define a discrete — it could be large, but it has to be bounded — space of functional forms. You could still do things like: maybe I have a polynomial, and I have parameters for the polynomial, and I don't know which powers I want to include or not include. You could also choose a spline and have different values for that. We've been talking about how we're going to interface with the community that's working on neural network potentials — that's a very general function approximator, which in principle could either be fixed, or maybe even the topology of the neural network could be a hyperparameter that's sampled this way. So there are many interesting ways you can enumerate a discrete choice. In our case, we want to learn from the things that people have tried before, so we can try the Halgren functional forms and things like that, that people know work well, and just see which ones are best for the short term. But we can go far beyond that in the future. Okay. Come up with crazy things — it's like, let's build some solids to do some of it, all the things that people have tried, and then learn from that. Just as long as I have it, one other question. Yeah. How does this handle intrinsic redundancy of parameters in the functional form? Intrinsic redundancy of parameters in the functional form...
So you mean redundancy in the sense that you have two versions of the same thing and you really only need one, or that there's tight coupling between two different parameters? No — for example, in a torsion, if you assume nine torsion terms or... Yes, okay. So actually — we have Chaya Stern here, if you can raise your hand — she's gonna tell us a bit more about the fragmentation pipeline, but she's also been working a lot on the torsion fitting pipeline using Bayesian methods. There are multiple minima, there's slow mixing, there's all sorts of interesting behavior. There's really a low-dimensional manifold that all of these torsion coefficient combinations live on, so there are really interesting challenges there. It works fine to sample this nonlinear manifold. In the case of torsions, you could either just sample over all the coefficients, and it will still propagate the uncertainties, or you can actually turn different Fourier terms on and off, and then you have less degeneracy in that space too. So both work well in this framework. Sorry, I'm not sure I see it — maybe I'm not understanding — but let's say you have a double bond. So there's an intrinsic degeneracy if you treat it as four independent torsions. Yes, you can move them in different ways, yeah. And they fit the data equally well. You could take one out, but I mean, there are lots of solutions there. So what this would do — the Bayesian method, if it mixes well enough — is find all the solutions and sample them, because they fit the data equally well. So you would get out of this all of the different minima of this fit-error surface that fit the data well, and those would be our different parameter sets. And if that causes problems in the future, because one of those minima turns out to give a different prediction in a different context that we didn't include in our training set, that's part of the characterization of the error — of trying to estimate the systematic error by reevaluating with different parameter sets. As you condition on more and more data, the hope is that, if we have the right kinds of data, it will narrow down the number of different redundant minima. But if the parameters, the way they're constructed, are simply redundant, it'll sample over all of the redundant combinations of those parameters.

All right. So, Lee-Ping's regularized least squares has a very clever optimizer that tries to give you the best bang for your buck in as few iterations as possible. But going beyond this, we'd like to start sampling more broadly and really characterizing all of the different, very wide basins of parameter space that fit the data well. Sampling with MCMC will avoid getting stuck in local minima, and we're exploring different ways to do this in parallel. It becomes very important that you're asking for many different parameter sets in parallel, so that when your infrastructure is grinding on, running new simulations with different parameter sets, you're parallelizing that as much as possible to reduce the number of iterations and the amount of wall-clock time. So distributed computing is playing a very large role in this. And then we'll be working on ways — I think the killer app is really binding free energy calculations — so if there are other things you're computing where you'd like to be able to compute uncertainty and quantify the systematic error, then please talk to me.
But I think developing a tool to compute what that systematic error is in our predicted binding free energies is gonna be super important. There are a few challenges that we're still working on, though — this is why it's a research project on a longer-term horizon, and we're not gonna be using it for the first year at all. One of them is that if our model space is too small — that is, if we have too small a space of functional forms, so that none of them fit the data particularly well — then the estimate of our systematic error can be misleadingly small. Only if your model space is big enough to accommodate things that look like the real solutions, the real models, will you get a good estimate of what that systematic error is. And we're working on ways around that. One of them is just to ideally ensure that our potential function model space is large enough, but we're constrained by what's implemented in the force field packages; things like neural networks can actually help us generalize that substantially. Go ahead, yeah. But the other problem could be that if you're too general, then you have regions in your parameter space that are not evenly affecting the error, right? So you could have all your samples in a given region and not realize that there's another region that may be smaller and has a bigger effect on the unknown. So in principle, there's a sampling challenge there, right? If we sampled the space broadly and completely, that wouldn't be a problem at all; we'd accurately capture the uncertainty. But the real challenge, as we run into with our day jobs of molecular simulations and sampling real protein conformations, is that you can get stuck and you might not find the thing that you want to find, right? So that's exactly the challenge here: it's the sampling problem. But we do that as part of our day job. The bigger you make the space, though, the more of a challenge you have with the sampling. I'm sorry, say that one more time. The bigger you make your space, the more you have... Possibly. Sometimes it actually gets easier. Sometimes you have artificial bottlenecks that are relieved by being allowed to go around them. If you run a parallel tempering simulation, you know that if I now have a different parameter — temperature — that I can go up and down on, sometimes it's easier to get over barriers. So it's not always the case that that's true.

All right, and then there's another way to handle this, which is to introduce additional model errors — nuisance parameters that basically characterize how badly we're doing in each class of experimental data. That's something we can infer along with the rest. So it's okay that we have a limited functional form for reproducing this kind of thing: we're going to add a model error term and then infer that as part of the model. The prior that we used — it's called the Jeffreys prior — penalizes this model error being too big and letting you fit everything too easily; but if it's too small, then the likelihood becomes very small, because the data misfit becomes very large. And like I said, efficiency is the real concern here — these reversible jump moves are difficult to make efficient, which is why we're starting small on very fast things — but we have cool ideas about how we can really make this work at scale. And if there's more you want to learn about the theory and philosophy, there's a great book by Edwin Jaynes, who's also the originator of maximum entropy.
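One common way to write down that model-error idea (a hedged sketch, not necessarily the exact form we will use): each class of experimental data gets a nuisance scale σ_model that inflates the error model, with a scale-invariant Jeffreys prior:

```latex
P(d_i \mid \theta, \sigma_{\mathrm{model}})
  = \mathcal{N}\!\left(d_i \,;\; f_i(\theta),\; \sigma_i^2 + \sigma_{\mathrm{model}}^2\right),
\qquad
p(\sigma_{\mathrm{model}}) \;\propto\; \frac{1}{\sigma_{\mathrm{model}}},
```

where f_i(θ) is the forward-model prediction, σ_i is the reported measurement error, and the second slot of the normal density is its variance; σ_model is inferred along with everything else and then marginalized out.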
And I already mentioned this book by Jun Liu. All these slides will be online in just a few minutes. So with that, I'll thank you for your attention. The experiments that we're dealing with are in this GitHub repo — everything is online — and a lot of the toolkit stuff is being built into the main toolkit we'll hear more about tomorrow. There are Slack channels if you want to discuss any of our progress on this. And I'm happy to take any more questions you might have.

One of the criteria you mentioned for including new functional forms is whether the force field does well enough or not, and I think that's one of the most tantalizing questions: what's good enough? So what are your criteria? So the criterion that Bayesian inference uses is a little bit hard to understand because it's somewhat complicated. These are whiteboard walls, but we don't have a marker — oh, we do, okay. I apologize to the people who might not be able to see this at home, but I just want to give you one quick idea. Suppose we have a parameter theta where you really need the value to be in this small region in the middle to have a good fit to the data, right? I have a current value of this parameter here, and now I'm going to make a new type with another value of the parameter — so instead of only theta one, I'll have theta two as well. And in order to fit well, it also needs to be in that same region, because it's a Lennard-Jones parameter — maybe it's an ether oxygen versus the general oxygen parameter we started with, but they should have the same Lennard-Jones radius. So now the pair has to be in this tiny little square. What that means is that if my prior allows me to sample anywhere in this range from A to B for either of these, then instead of the prior ratio being this little region of width h over this interval of width L, it now has to be h squared over L squared, which is much, much smaller. So the prior is going to penalize you if you have to be in such a narrow range that the increase in the likelihood doesn't make up for that loss in prior. It's a very natural way to prevent this proliferation of parameters, but it really does depend on your prior in this case; it can't be so broad that it permits anything.

I was referring to something else, I think. Oh, sorry. That's okay. When you're talking about increasing your functional form space — yes — the criterion for adding new functions, presumably, is that you've done your optimization with your set of functional forms and you didn't get a good enough fit to the observables. Yes. I thought you said that was your criterion for adding, perhaps, new functional forms. And the question is, what is the criterion for deciding you need new functional forms — that your force field is not adequate to fit your experimental properties? I see, yeah, yeah, yeah — two kilocalorie, two kilocalorie binding energy, with densities... The human criterion is just whether or not we're capable of exploring additional functional forms. We'd like to do it if we have the resources to do it, and if we can actually deploy it in the different tools that people want to use them in. So if we get to the point where we can start to explore different functional forms, the next question is, what is the space of functional forms we can enumerate that will be efficient to sample over? So we're really, again, constrained by resources then.
Once we do that, the criterion the sampler uses is simply asking: if I were to change functional forms and find a region of parameters that increases my data likelihood so much that it overcomes these entropic penalties of having more parameters or more functional forms, then the sampler is gonna prefer that. Maybe the worst case is that it doesn't care — it can find good fits in any of these functional forms — and even though you've done all this work to add four more orders to your fit, it gets essentially the same likelihood. In that case, you'll prefer the simpler ones. Any other questions, or should we reconfigure? Are there any questions from Zoom? We have another question in the back too. I don't hear anyone, so let's go ahead, yeah.

So, how are you gonna represent your parameters? It's kind of intriguing, because you're gonna have some where you show what the quoted error is around a parameter, and you're gonna have some cases where you have an acceptable range for a parameter, where it works well across any of those spots in your banana. Obviously you're gonna work to restrict to the overlap of those, but how are you gonna represent — I'm trying to understand. Yeah, how we represent that. So someone, if they're going to develop a new parameter, can say, oh, I actually do have a choice of picking anything in this ellipse, and I don't necessarily have to restrict myself to the exact point that was picked. So I'll start with an important caution, which is that people find that for individual parameters, the fit to the data is often highly sensitive. You can't change a Lennard-Jones sigma without changing the epsilon as well, right? Changing one thing at a time, the fit is very sensitive to that. But the nonlinear phenomenon is that you can change them both together, in this nonlinear way, and get basically equally valid, equally good fits to the data. The way we intend to summarize that is what the sampler does: it runs a simulation and samples different parameter sets that are all equally good. So we'd actually get multiple complete parameter sets, and you'd need to figure out what that looks like, or you would just use the different parameter sets as equally valid ways of making predictions. And we're hoping to make that inexpensive by picking one good set right in the middle, and then saving some data during the simulation that we rapidly reweight using the other parameter sets at a very small fraction of the cost — it might be 1% or 2% of the overall simulation time for a free energy calculation. You can also get this reweighted idea of what the error is by reevaluating the parts that change under those different parameter sets after the fact, only saving one out of every thousand samples, for example.

All right. Next up — any more questions from Zoom? We've got a few minutes. Yeah, or we could have more coffee. Okay, we could get started — do you want to start a little bit earlier? Yeah, we can get to lunch early. All right. So next up is David.