All right. Can you all hear me? Great. Next, we're going to talk about fitting and benchmarking condensed phase properties. We heard a lot about quantum chemistry already this morning and how we're going to use that information for parameterizing valence terms. But for now, let me share this; I'll go back to sharing. We're going to talk about how we compute some of the condensed phase properties. We had a breakout session yesterday afternoon where we overviewed some of these concepts, but we also wanted to get a little bit into the science here, and Michael Shirts is going to come up to present the next few slides.

So this is a large effort where many people are involved in different aspects, with the idea that we're building a common infrastructure that we can all plug into and use to explore different ideas about what kinds of condensed phase data we want to use for assessing the quality of the force field, as well as for actually fitting parameters. One of the main motivating factors is that our first effort here is ignoring polarizability: we're building fixed-charge force fields, where we know that we have to build those effects in in some average way. And even if we did have polarizability, there are other things we might have to include from condensed phase data in order to get high accuracy, because we're still missing things like nuclear quantum effects and other things people think are important. Let me just give this to you in case there's a question during that.

All right. So the properties that we care most about are often very difficult to compute directly, like binding affinities for complex protein-ligand systems. Our goal is to find condensed phase data that informs the parameterization of force fields but is still very easy to compute, both for fitting and for benchmarking. There's an interest in getting good quality protein-ligand data, and we'll talk a bit more about that in the afternoon, perhaps, for benchmarking and assessment. And then maybe there are also possibilities for using things like host-guest affinities, which are much more rapid to compute than protein-ligand data, for the parameterization. But right now, we're focusing on properties of organic liquids, because that's been traditionally used with great success, and there's an abundance of this data: it's all free, and it's in a structured format that allows computers and humans to read it.

There are lots of different kinds of experimental data we can, in principle, access that's reasonably easy to get to. Some of it is a little bit harder, and a little less clear about how we would get open data sets, but we're still looking into those as well. The different kinds of data that we look at inform different classes of parameters, or different parts of the interaction potential, in different ways. We're really working on trying to prioritize how we tackle these, while still making it possible for different researchers to independently explore how to bring these different data sources into our parameterization scheme, and to make sure it's modular enough that even though they're developed independently, when we bring them together, everything can work together for joint parameterization.

We found in the past that benchmarking against condensed phase data can identify systematic issues. As was brought up the other day, and I can share the pointer, there are some systematic issues, for example, with the densities of some of the liquids being displaced.
And these are things that we can look into, to see whether there might be some systematic parameter issues or some other data sources we can use to improve things. This was just the unoptimized SMIRNOFF99Frosst parameter set. There's also the issue of that outlier being water, and we're probably not going to use SMIRNOFF water to begin with, but at least you can see that it's a clear outlier and that it deserves some attention.

We also identified other systematic issues with dielectric constants. These are inverse dielectric constants, which highlight the way in which this introduces an error into the interaction energies of different molecules, and you can see that certain force fields, like SMIRNOFF, have very poor reproduction at very low dielectrics, which may cause some problems in reproducing things like transfer free energies into low-dielectric media. This is something, again, we can improve by using some dielectric data to improve our force fields. We'd also computed things like hydration free energies. Their utility for condensed phase properties is still uncertain, because they contain something that we don't include, which is the polarization contribution from going from gas phase to water phase, but it's still useful to be able to benchmark them reproducibly and reliably.

There's some really exciting stuff going on in Mike Gilson's group with host-guest systems. You can make lots of derivatives of these that could inform things like amino-acid-like interactions with small molecules that very much resemble drugs, and we're getting to the point where we can compute those more and more reliably and more and more rapidly. So those could be used in benchmarking, and then in fitting of force fields as well, and they're possibly very useful in decoding different failures and then hopefully fixing them by identifying the right data sets to include.

So we thought that Lennard-Jones parameters haven't been revisited in quite some time and certainly merit an opportunity for improving the accuracy if co-optimized with everything else, and there's a little bit of a history about when these things were last modified; a lot of them haven't been touched in some time. There's a really cool, very recent effort from 2018 on the CHARMM side, where Alex MacKerell and coworkers reparameterized a lot of small molecules using a set of neat-liquid data. We still haven't seen how that changes prediction accuracy, because it was done independently of the other terms, but it'll be interesting to see how that evolves for them as well. And they only used densities and heats of vaporization of pure fluids, and there is so much data on mixtures, which reflects how different kinds of functional-group moieties interact, that it seems silly not to make use of it. So that's something we would very much like to use. And it's not just binary mixtures; there are ternary mixtures, there are lots of classes of data that we could explore.

So, a lot of you have seen this before. This was sort of our attempt to figure out what's the easiest thing to do that will give us the biggest win first. We've heard a lot about reparameterizing the valence terms, refitting the torsions, and coming up with the valence types.
And we think early on there are very big improvements to be had from using liquid data that informs the improvement of Lennard-Jones parameters, and then getting into things like host-guest thermodynamics. And then later on, we're thinking more about possibly reparameterizing everything. For our initial sprint here for small molecules, we're trying to keep compatibility with the Lennard-Jones parameters of biopolymers without having to change those, and there's some discussion, maybe that'll happen in the afternoon, about how best to do that, how best to preserve compatibility with protein parameters from Amber.

So what we're trying to do, as we've heard in the software discussion, is come up with a design that really allows this effort to grow and scale, and allows independent researchers to do their research somewhat independently and then plug the fruits of their efforts into this modular framework, so we can use everything together in a synergistic way. A lot of the focus has been on careful design, coming up with plug-in architectures through clear APIs, and clear standards for documenting how we compute things, because there are many ways to compute every kind of physical property. We'd like to pick the best way, something very efficient, something that doesn't have systematic biases in it if at all possible, document that, and make sure everybody agrees that this is the best practice for the field. And there are other ways we can disseminate that information besides just documentation: there are cool things like the Living Journal of Computational Molecular Science that Michael and David have co-founded, where we can report, and continue to evolve our reporting of, what we think the best way to do things is.

We'd like to have a life cycle, which I'll tell you a bit more about in the next slide, where we first use things for benchmarking, and once we've got them to be fast enough, we can potentially use them for parameterization. The way we're pulling all this together is that there are basically three different levels of API. If you want to do your own benchmarking, you'd be using the property calculation API, where you ask it for some properties given a set of experimental data, and it'll tell you how wrong or how right the force field is. We have an API that allows us to develop new properties, by defining some code that plugs independently into the property plug-in API. And we're working on ways in which different surrogate models can speed up our parameterization of these force fields. So we'll internally be dealing with all three of these, but we're trying to standardize them, one, two, and three, as time goes on, so that we can modularize our efforts.

This is how, for example, you'd use the public API; this is just an example of what it might look like from the outside. You'd start by defining a data set, and we're using the ThermoML archive, a NIST archive of lots of data published in thermophysical property journals, reflected in an IUPAC-standard XML format. It's a very standard format where the data is captured directly along with its uncertainty, its provenance, and the exact specification of what kind of system and thermodynamic conditions it covers. There are journal citation keys that go along with this, and they might specify a data set, so we could curate these ourselves, or we could come up with other sources, or URLs that you could pull from.
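(As a rough illustration of the flavor of this public API, here is a minimal Python sketch. All of the names here, the module, ThermoMLDataSet, PropertyEstimator, and their methods, are hypothetical stand-ins for whatever the property calculation API actually exposes, and the DOI is a placeholder.)

```python
# Hypothetical sketch of the property-calculation workflow described here;
# none of these names are the project's actual interface.
from property_calculator import ThermoMLDataSet, PropertyEstimator  # hypothetical

# Build a data set from the NIST ThermoML archive, e.g. by journal citation key.
data_set = ThermoMLDataSet.from_doi("10.xxxx/placeholder")

# Filter down to the properties and state points relevant to our use case.
data_set.filter_by_property("ExcessMolarEnthalpy")
data_set.filter_by_temperature(min_kelvin=280.0, max_kelvin=350.0)
data_set.filter_by_pressure(min_kpa=95.0, max_kpa=110.0)

# Ask an estimator, which hides the back end (cluster, cloud, ...), for the
# same properties computed with a given parameter set.
estimator = PropertyEstimator(force_field="smirnoff99Frosst.offxml")
computed = estimator.compute(data_set)

# Compare measured and computed values, normalized by reported uncertainty.
for measured, calc in zip(data_set.measurements, computed):
    deviation = (calc.value - measured.value) / measured.uncertainty
    print(measured.substance, measured.temperature, deviation)
```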
So you can create something like a data set object, which knows how to interpret all the data from that source by going to the ThermoML archive. You can filter it based on the things we actually want to compute, like the excess molar enthalpy, and you can then filter it subsequently by temperatures or pressures to get down to a data set that's going to be relevant for our uses. For other purposes, if you're dealing with chemical reactors at weird temperatures or pressures, or if you're dealing with surfactants, you might filter this data set entirely differently, which is why we also want to make this reproducible for you to do at home. Then we can take our parameter set and create a property estimator, which knows how to connect to, for example, Open Force Field resources in the cloud, and ask it for the properties of this data set given the parameters, and then we can look at how the measured values and the computed values might differ. So we intend this to be very simple to use, even though it's hiding a lot of stuff that can be going on on the back end, which is a continually improving, efficient way of computing all of these properties.

So when we come up with a new physical property we might want to include, we'd first define an object model that represents what the property measures and how to get it from a source database, like ThermoML, or BindingDB, for example, if we're using host-guest data. We then curate some data we think we might be interested in using for benchmarking and, later, fitting. We develop a forward simulation model, which says: if I run a molecular simulation, how do I compute a property like a density, or an enthalpy, or a static dielectric constant? Then we document those best practices, so that we know exactly what the code is doing and other researchers can check it. We develop an error likelihood model, which says: given the experimental errors, and if it's a property that has to be obtained through multiple different measurements, how do my deviations from it reflect how likely my parameters are to be reproducing the right experimental data? Then we integrate this into the benchmarking epoch for the next generation, plugging it into our infrastructure to see how well we're actually doing on all these different properties. And finally, after we've optimized it enough, we can include it in parameterization, and keep adding more data to keep increasing the accuracy of these force fields.

So I think now I want to turn it over to Michael, who has much more to say about our roadmap for this. And let's see. So, just a little bit: this is a roadmap for which physical properties to use when. Backing up a little, we know that all of the thermodynamics is determined by the Gibbs free energy, which is explicitly a function of temperature, pressure, and composition, and nothing else. So the theoretical basis is: if we have enough data that gives us the free energy as a function of temperature, pressure, and composition, we'll get all the thermodynamics right. Most of the thermodynamic observables we measure are actually derivative properties, or second-derivative properties, of that free energy, and those tell us whether we're getting the thermodynamics right. But there are some other things we can measure that are not just derivative properties of the free energy.
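(To make "derivative properties" concrete, these are the standard thermodynamic relations; nothing here is specific to this project. First derivatives of G(T, P, N) give volume and entropy, the Gibbs-Helmholtz relation ties enthalpy to the temperature dependence, and second derivatives give heat capacity and isothermal compressibility:)

```latex
V = \left(\frac{\partial G}{\partial P}\right)_{T,N}, \qquad
S = -\left(\frac{\partial G}{\partial T}\right)_{P,N}, \qquad
H = -T^2 \left(\frac{\partial (G/T)}{\partial T}\right)_{P,N},

C_P = -T \left(\frac{\partial^2 G}{\partial T^2}\right)_{P,N}, \qquad
\kappa_T = -\frac{1}{V} \left(\frac{\partial^2 G}{\partial P^2}\right)_{T,N}.
```

(The Gibbs-Helmholtz form for H is exactly why enthalpies of vaporization or of mixing carry the temperature-dependence information discussed below; the static dielectric falls outside this list because it probes the response to an external field rather than to temperature, pressure, or composition.)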
Like, for example, static dielectrics, which express the change of the free energy with respect to an external electric field, and so provide additional information about the electrostatic energy terms. So we want to be thinking about what information each of these properties gives us when we look at it. But we also need to balance: availability of experimental data, because some experimental data is available and some isn't; the information content of the experimental data; the ease of producing or obtaining new experimental data for cases where the chemical coverage of existing data is not good; and simulation expense. Some of these involve one five-nanosecond simulation of 2,000 molecules; some of them involve free energy calculations that take hundreds of nanoseconds. And in some cases we will want to generate new data for parts of chemical space that don't have enough information. This is pretty clear now: we can get decent coverage, but there are going to be gaps; there are going to be functionalities that we don't have enough data for already.

So, low-hanging fruit: molar volumes and densities, which are response functions, since the derivative of the free energy with respect to pressure is the molar volume, so that gives us that information. Static dielectrics give us the response of the free energy to a change in electric field. And the temperature dependence of the above: if we can get the temperature dependence of those properties, then we start constraining the temperature dependence as well. One thing that's problematic but might be useful at first is enthalpies of vaporization, and hydration free energies and enthalpies. Why do I say they're problematic? If we want to get liquid phase properties right, we want to avoid, to the extent possible, dealing with things that involve phase transfer, especially if we're dealing with non-polarizable models. It's not a horrible thing to do, because we know that most current force fields use this information and are not that bad; but it's something, and perhaps a science question we can look at, to understand how much error that introduces. Why can't we just ignore that data entirely? Because enthalpies provide temperature-dependence information on the free energy, and in order to get the thermodynamics right, we do need to have the temperature dependence there. So, in order to really constrain the parameters, we need to have some sort of temperature dependence.

Paul? I want to say something, just on the static dielectrics. I was struck, over and over, by the graph that was in the paper from John, and I think you're on it too maybe, on the inverse static dielectric constant. So we know that something that's as non-polar as possible, like hexane, has a static dielectric constant of about two. Yes. A fixed-charge model will always predict that it's one. Correct, yes. Is that actually a good property to go after? There are ad hoc, post hoc corrections one can make, and those were made, but to me that has the flavor of the post hoc corrections that are made for polarization energy. Obviously for a very polar liquid that's less of a problem. Right, yeah. The dielectric will be mostly determined by that. But I'm just wondering if you have any thoughts about that, or maybe it's better for the break after. Yeah, so fundamentally, if you want to get the thermodynamics right, getting the static dielectric right is not actually necessary.
So I think the idea is that we need a little more information than density; you can't just fit to the densities. And I agree that in the long run, once we get enough good coverage with enough other properties and can show that other things constrain it sufficiently, the dielectric might be something that we don't want to include in the fit. We do also know that there's clear gain to be had by including dielectric constants, like the alcohol data I showed. So you might not necessarily expect that it would be a great property to fit to, but it does clearly help if we do; maybe we'll have to see how far that goes. It gets some things right by getting the change of the free energy with respect to the electrostatics, so it's a good signal if the electrostatics are badly wrong. But maybe it's something we need to think about: some sort of priors on it, so that if the dielectric is less than a certain amount, then we neglect it; we just turn it off as the dielectric goes below 10 or something, turn off that dependence. Bayesian inference lets us put in priors that help us do that.

Just one comment about the enthalpies of vaporization. I agree with the point you made. To the extent that we do include it, I think you're at any rate on relatively safer ground if you don't have flexible molecules, because what you don't want is a molecule that is open in water and then forms an internal hydrogen bond. Right, yes. If it's rigid, then there's a polarization problem but not the conformational one. Right, and this is why I say they might be useful at first. In order to actually constrain the parameters, we might need to include even flawed temperature-dependent energetic data, which provides the temperature response of the free energy that is required to get the full Gibbs free energy.

Okay, so we've talked about volume dependence and temperature dependence, but there's also, very importantly, composition dependence. Some things that are nearly as low-hanging are densities and dielectrics from mixtures. Partial molar volumes, which essentially capture the difference between ideality and the actual volume of a solution, can provide a lot of information about the pairwise interactions. There's a lot of dielectric information from mixtures, and looking at the variation might be fruitful. Compressibility, thermal expansion, and speed of sound are second-derivative properties, and people who have been working on simpler systems, like refrigerants, alkanes, fossil fuels, have discovered that you can get a lot of information from these. There's a lot of experimental data there, and it can get you a lot if you can include it as well. For the speed of sound, there's a surprising amount of literature data, and it frequently ends up being a canary in the coal mine for force fields: the speed of sound goes wrong before anything else when parameters start to drift. Partial molar volumes, heats of mixing, residual heat capacities: these are all things for which there's a fair amount of data, and they're very easy to do simulations for. Instead of just one simulation, you need n plus two simulations, where n is the number of compositions you look at. And one thing to say about heats of mixing is that there you start getting temperature-dependent data where you don't have to worry about phase behavior.
You just look at the enthalpy as you go from two pure fluids to mixing those fluids, and then your reference state is no longer the vapor; your reference state is the pure fluid. So that's something where we start tying much more directly to exactly what we want to be measuring, which is things like drug binding affinities. So those are pretty low-hanging, and in many cases there's a lot of good data on these.

More computationally expensive to probe are intermolecular energies, partition coefficients, and infinite-dilution activity coefficients, which again are essentially getting at how the free energy changes as a function of composition, and relative solubilities. All of these require free energy calculations, but for small molecules we're only talking about an order of magnitude more computational power. NIST has a lot of data for these, less well curated than some of the others, but they're things we know how to do, and we can include them. Even more complex, but closer to what we want to predict: host-guest binding free energies, with chemically tuned hosts and guests to get at the chemical interactions we're particularly interested in; Mike Gilson and his group have done a lot of work on this. And ligand binding free energy calculations that we are confident can converge. Now, this is what we want to be doing, right? We want to get that right. But it's not going to be something we include in parameterization in the near term, because it hits all the bad things: it's very expensive to do, we're not sure we can converge it, and the experiments, unless they're carried out under very careful conditions, can be problematic to interpret as well. Thanks. Thanks.

So, do you mind describing just quickly how the parameters get fit to all of these? What's the basic scheme? I get how you fit torsions and angles, but how do the parameters get fit to all of this? So, maybe be a little more precise? You have your force field, and you want to optimize the parameters, and you want it to be optimal for all of the valence properties that have already been described. Then you're like, oh, I want the densities, right? Of mixtures. How does that happen in practice? Why don't you take this one. Yesterday, Lee-Ping talked about this; you have different options. So, it's ForceBalance. The initial phase is going to be using ForceBalance, where the objectives are targets that Lee-Ping has defined, which have different error models. So there's a penalty for deviations from densities, and there's a weight that controls how much you weight different classes of these targets, and I'm sure Lee-Ping can talk about it in much more sophisticated ways. In the Bayesian epoch of this, we're thinking of using a likelihood function. For both of those, you need some quantification of the experimental error, because you want to weight deviations that are really important more than others. That's what I thought; I just needed it put all together. Yes. So it's either contributing to this error penalty function, which is also regularized, in ForceBalance, or contributing to likelihood functions in the Bayesian epoch. Thank you.

Okay, so, others. I think there are some other things. Oh, yes. Yeah, I've got a quick question. Could you comment on the quality of the experimental data you are using? Is all the data curated? Right. So, good question.
So a lot of the data that's... Error bars. Yeah, so the data that's in ThermoML is curated in two ways. One way is they just suck up everything in the journals and put it in there. But, especially for the pure data, and I have to check on the mixture data, they run another set of in-house error analyses, where they actually fit the data to a thermodynamic model and see if it's all self-consistent, and also if it's consistent with thermodynamics. In other words, your heat capacities at different temperatures and your enthalpies of vaporization have to be related; if they're not related, something's wrong. So there's both self-consistency within the data and thermodynamic consistency. For the mixtures, I'm not as clear what their internal data models are. It's complicated in that these data models are not open, which has to do with how the Department of Commerce's charter works. But the point is that they're on board, and they're going to be helping us look at the data and make sure that we're not using bad data. Besides the error bars published in the papers, they have their own error bars as well, from their internal uncertainty models, and those are essentially always larger. We're probably going to be using those, and we're going to be working with them to make sure that when we bring the data in, we're bringing in data that they've looked at, and they can tell us a little more about what's going on there. Unfortunately, they can't be here because of the shutdown; I was hoping that they could say more about that. But anyway, yeah, that deserves a whole talk in and of itself.

So there are other things that are going to be hard to fit to because of a lack of open data sources, but that we could put in as benchmarks, things that maybe would never be part of the parameterization data but could be put in as benchmarks. Small-molecule crystal structures: you don't want to screw those up too badly. There were a number of other things floated yesterday as benchmarks that might not match all the criteria for things we actually want to parameterize to.

So we're going to draw on and develop high-quality data sets. We want high-quality, computer-readable data sets with experimental uncertainties, which are also very important in order to know how much weight to give things in the fitting, and with unambiguous definitions of the systems. So: the NIST ThermoML archive, which I've talked a little bit about, for all the liquid data; BindingDB subsets; FreeSolv for hydration free energies; partition coefficients, relative solubilities, activity coefficients; and can industry provide some data, if it can be made public? An important thing is that NIST can archive almost all of these, essentially doing what they do with the journal data, so it's out there, can be put into ThermoML, and can be read.

So, liquid thermodynamic data, looking at it a little bit. With the NIST ThermoML archive, we can do some sorting through this, and what we see is that there's a lot of data; usually, for any given set of mixtures, there's a lot of temperature, pressure, and composition data, since most papers report measurements from a lot of experiments. But at atmospheric pressure and biological temperature ranges, the chemical coverage is somewhat sparse. AlkEthOH is a limited-parameter molecule data set that just has alkanes, ethers, and alcohols, but you can see some of what's interesting in what there is here.
So it's interesting: there's a lot of density data, a lot of speed-of-sound data; some dielectric constant data; heat capacities and enthalpies of vaporization are moderately well represented. For binaries, there's as much binary data as there is pure data: densities, speeds of sound, dielectric constants, excess molar enthalpies, excess molar heat capacities, excess molar volumes, binary activity coefficients, there are decent amounts. So there's a lot of data there that can be drawn from that has a lot more information content, I think, than just pure properties.

How are you planning on handling ionizable species in this? Great question; this is something we've been talking about: how do we deal with changes upon ionization and mixing, things like that? So say you have even a pure liquid that involves a carboxylic acid, or a zwitterion, something like that. I could imagine that when you're calculating the thermodynamic quantities out of that, you might have some mixture of ionization states. I would imagine that what we need to do is some quality control, making sure that we're running simulations where the protonation state is relatively unambiguous. But maybe Michael has thoughts. Well, I think there is an issue, because your typical pure-liquid data will not be for ionized species; that would be an ionic liquid, which is another whole field, which may be worth looking at, actually. But like acids. Acid and water. In water, yes, but not pure. Yeah, it would be mixture data and things like that. And I guess the complication, I think, is that you want to do this all in an automated way, but that seems like it's going to take potentially some manual curation.

So we've also been looking into these ionizable species as part of the log D challenges for SAMPL, measuring pKas for a bunch of different compounds, and then also measuring log Ds and log Ps separately, having people predict various things. So I think there's going to be a very fruitful route in looking at octanol-water partitioning data for species where we can also get good pKa measurements experimentally. There are constant-pH simulation methods, which we and others have been developing, that will allow us to compute these properties for the full mixture of ionizable species. Otherwise, if you just have the experimental pKas, and you know that the species is either fully ionized or fully un-ionized, like one molecule in water, you can also run two different simulations and combine the data after the fact. So there are a bunch of different strategies, but I think the near-term strategy is to just avoid anything that's close to the pKas. I mean, we can put the pipeline together so that, roughly, you just pull it down from ThermoML, but I think for the near- and medium-term future there's going to have to be some human curation, just to make sure: do I really want that data? Not just, do I trust the uncertainties, but: is the composition well-defined? That exactly gets at it, and we're not going to be able to automatically determine whether the composition is well-defined without that human intervention, for a long time. I imagine this will be true of tautomeric species as well. Yeah.
I think that, especially, the goal is, if you're trying to parameterize by atom type, or, not atom type, sorry, by chemical environment, then you're going to want to do as much as possible at the beginning with the simplest species possible, where it's usually possible to figure out what the state should be. But yeah, that's going to be problematic when expanding the data set. So I think that's definitely something that has come up in our conversations before, but it is good to be reminded of it again, and to get it explicitly stated.

Yeah, so eventually we'll need more data. Even looking at this, this is a lot of data, but the coverage is not great. It's heavily focused on things like alcohols and alkanes; there are not that many nitrogen-containing species, and very few sulfur-containing species. And so we know we're going to need to gather more experimental data to cover chemical space. NIST and collaborators have been working on ways to potentially automate experiments to generate data that are relatively cheap, like densities as a function of composition, or heats of mixing. These can be done in a controlled way, and then we can pipe that data into ThermoML and start using it. But for the first year, maybe, we definitely won't be doing that; we'll be working on existing data sets and seeing what we can do from there. Questions on this? Good, questions here. Okay.

Computing physical properties is expensive. This topic will take a little less time, since it's more just presenting; it's a little more cut and dried than which data to choose, which involves a lot of thinking. So: computing physical properties is expensive, even if we're just doing a simulation to calculate a density. Yeah, for one molecule, easy. But what about doing a parameter scan on that? Evaluating the least-squares error penalty, or the data likelihood, for each trial parameter set requires computing either an expectation, which takes, I don't know, minimum 10 minutes, 30 minutes, an hour on a GPU, or a free energy, which takes about an order of magnitude more. If you're doing that over hundreds or thousands of trial parameter sets, that's really hard to do. Both are expensive and, importantly, have statistical error. So we need a hierarchy of schemes to evaluate the likelihood, even for multi-dimensional optimization.

What we want to come up with is a strategy where the more expensive levels, that is, actually running a simulation, are only used as needed, so that the inference cost is amortized over all of those measurements. You want to be doing most of your evaluations not on the actual simulation but on some well-characterized model of the simulation; and then, when converged, for the remaining evaluation, you always want to run a couple more simulations at that final point, so you have confidence. So, the way we're thinking of doing this, and we've started doing this together: our most expensive level, the ground truth, is doing a molecular simulation at a parameter set. Next is reweighting, using the MBAR formalism, to estimate properties at different, nearby parameter sets. And finally, there's some sort of fast surrogate model, a Gaussian process or a neural network, trained on the next-lowest level, or on the previous levels in order, and we do the optimization on that. And we want it to be a way of fitting with uncertainty characterization, so we have a sense of when we've gone beyond the bounds and need to perform more simulations there.
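(A minimal Python-flavored sketch of the three-level loop just described, under the speaker's stated assumptions. Helper functions like run_simulation, mbar_weights, reweighted_average, converged, and propose_next_parameters are hypothetical placeholders, not real library calls, and the trust-region test uses the Kish effective sample size that comes up a couple of slides later.)

```python
import numpy as np

def kish_neff(weights):
    """Kish effective sample size for normalized weights (sum(weights) == 1)."""
    return 1.0 / np.sum(weights ** 2)

def estimate_with_mbar(theta, cached_sims):
    """Middle level: reweight all cached simulation data to parameters theta."""
    weights = mbar_weights(theta, cached_sims)               # hypothetical helper
    value, sigma = reweighted_average(weights, cached_sims)  # hypothetical helper
    return value, sigma, kish_neff(weights)

cached_sims = [run_simulation(theta0)]   # top level: expensive ground truth
theta = theta0
while not converged(theta):
    value, sigma, n_eff = estimate_with_mbar(theta, cached_sims)
    if n_eff < 50 or sigma > sigma_max:
        # Left the trust region: expand it with a new direct simulation.
        cached_sims.append(run_simulation(theta))
        continue
    # Bottom level (not shown): fit a fast surrogate to these MBAR estimates
    # and let the optimizer or Bayesian sampler propose the next parameters.
    theta = propose_next_parameters(value, sigma)

# Final confidence check: rerun real simulations at the converged parameters.
final_properties = run_simulation(theta)
```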
So, yeah, this is the idea: most expensive is the simulation, then MBAR reweighting, then surrogate models. You originally start in some region of parameter space; it's equilibrated in simulation terms, but it might not be at the ideal parameters. The optimization wanders out of your trust region, and then you run another simulation, generate a new MBAR surface, and explore in that area until you leave the trust region of that expanded area. The advantage of MBAR over standard one-step perturbation is that you can use all of the data you've generated so far: all the simulation data at different parameter sets can be combined to compute expectations at new parameter sets. And you continue until you get to a point where you start to converge. At that point, you can run a few more simulations until you've got a much lower confidence interval on your properties and your surrogate model essentially matches your simulations to within the precision. Questions on this? I see three levels here; we could, in principle, use more levels. It's not clear we'll need that, but the goal is to design the algorithms such that they can handle an additional level if needed.

Okay, so now some preliminary data on how we put this together. Reweighting is cheap and can be made arbitrarily good; let me say what I mean by that. So, reweighting to one state. Probably most people have seen this, but I want to make everything as clear as I can. If you have some observable, you have its expectation at some state i; in this case, we'll generally be dealing with different parameter sets, so that's one particular parameter set, and I want averages with the parameters at i. What I can do is compute it from data generated from a simulation at state j, if I reweight each sample by this factor. And this factor is just the normalized Boltzmann distribution. (There's no way to go back? Anyway, I won't touch it, so it won't keep on advancing.) Right, so: e to the beta times the free energy difference from state one to state two, minus beta times the change in the potential energy of that configuration from state one to state two. That is just the Boltzmann weight; all I've done is take the partition function at the bottom and move it up to the top, so this is a normalized probability. So all I'm doing is reweighting that observation by the difference in probability: how probable that configuration would be in state i versus how probable it would be in state j. Samples that are very much like things you'd already see in i get high weights; things that are unlike what you would see get low weights. It's essentially a filtered, weighted average that says: I can tell, by the basic laws of thermodynamics, whether this is something that I would see with parameter set i or not. If I would see it, great, I'll include it in my average, with a weight corresponding to the probability. So that's what we're doing: when we run simulations, a lot of those configurations would also be seen at similar parameter sets. And with the same data, we can estimate what that free energy difference would be as well.
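(Written out, the single-state reweighting identity just described is the standard perturbation formula: with free energy difference Δf_ij = f_i − f_j,)

```latex
\langle A \rangle_i
  = \Big\langle A(x)\, e^{\beta \Delta f_{ij} \,-\, \beta \left[ U_i(x) - U_j(x) \right]} \Big\rangle_j,
\qquad
e^{-\beta \Delta f_{ij}} = \Big\langle e^{-\beta \left[ U_i(x) - U_j(x) \right]} \Big\rangle_j.
```

(And the multi-state, MBAR version described in the next paragraph, with N samples x_n pooled from K simulated states, N_k of them from state k, is)

```latex
\langle A \rangle_i = \sum_{n=1}^{N} W_{ni}\, A(x_n),
\qquad
W_{ni} = \frac{e^{\,\hat{f}_i - \beta U_i(x_n)}}
              {\sum_{k=1}^{K} N_k\, e^{\,\hat{f}_k - \beta U_k(x_n)}},
```

(where the dimensionless reduced free energies \hat{f}_k = \beta F_k are determined self-consistently so that the weights to every sampled state sum to one.)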
In the MBAR formalism, you can get uncertainties out of this, and they're well controlled; or you can use bootstrapping, which works even slightly better. Okay, but the problem is that you get poor convergence if the difference in potential energy from one state to the next varies too much; in other words, you have poor phase space overlap: you don't have any configurations in your samples from parameter set j that you would see in i. So the solution is to reweight to multiple states.

One way of writing this is: my properties at state i are equal to a weighted average where, what I have here is, I've run several simulations at several states, one, two, and three. If I'm interested in a particular state i, then the numerator here is the Boltzmann probability of that state, and the denominator is the sum of the probabilities from each of the states I've sampled. And then I average over the mixture distribution; in other words, I just average over all the samples I collected from all the simulations. So you've collected samples at a bunch of different parameter states, and this weighting allows you to say: this particular configuration, would I see it in parameter state i? Each of my individual samples will contribute a little to that average, according to how likely it would be in my particular parameter set. This is reweighting from, not a single distribution, but the mixture distribution of all the data I've collected so far. So you're essentially saying: here's all my data at all the parameter sets; this is a formula for figuring out how probable a given sample would be in the parameter set I care about now. Questions on this? This is important for understanding what's going on.

So my impression, the way you described it, is that it would be pretty strictly interpolative. In other words, you'd have a trust radius; you could extrapolate a short distance around the trust radius. But really, the way you're going to use this in practice is a fairly widely spaced gridding of parameter sets, and then you'll be trying to interpolate. Right, so I think there are a couple of different ways we could use it. Ideally, the way we'll use it is we start in a region, start wandering around, and when we exit the trust radius, we say: I'm going to carry out another simulation here, and now my trust area is this. And I keep going, expanding that trust radius. Whether it's optimization, or, let's think about it in terms of Bayesian inference, my probability is going to pile up at the edge, and that's saying: I really need to go further that way; let's keep expanding the trust radius by carrying out more simulations.

So let me give an example of this. What I have here are level plots of the root-mean-square deviation of the saturated liquid density from a vapor-liquid equilibrium simulation. This is data coming from a collaboration with Rich Messerly, a very talented NRC postdoc at NIST, who I think is listening on Zoom right now, since he couldn't come because of the shutdown. What you see in red lines are direct simulations: you carry out about 400 direct simulations and interpolate between them. And then, ignoring the bluish line, the green line is MBAR from a single reference point.
What you see is that if you're close in parameters, in sigma and epsilon, it gives you almost the same RMSD as the simulations. If we then include a set of MBAR reference simulations at a single epsilon but a series of sigmas, because it turns out that the properties are much more sensitive to sigma, then we can reconstruct what the properties are and compute the deviations, which is the dotted blue line. You can see the dotted blue lines lie almost exactly along the red, direct-simulation lines. Let me finish the thought. So in this case, a relatively small number of simulations enables us to get results that are almost as good as the direct simulations, because in a lot of those direct simulations we were simulating the same configurations that we had already simulated.

I think the multiple-state reweighting is really awesome, and I have a little question about how fast the memory and computational cost grow with the number of states, because in my experience using MBAR and fitting to, say, multiple temperatures and pressures, the size of the matrix grows as the square of the number of states. It does. I wasn't sure how the computational cost grew, because we have to converge these relative free energies of the states. Yes, so that's a good question. The scaling is not fantastic with large numbers of states. However, playing around with it, depending on the number of samples you have, and we're assuming in most of these cases we only have a few thousand samples, we can scale up to hundreds of states and have it run pretty fast. But there are also two parts; there are two types of states that you care about. There are the states you've collected data at, and that scales badly with the number of states, n squared. But then there are the states you want to get information at; if I want estimates at 3,000 states, that scales linearly. So in most cases, what we hope is that we're never including simulations at hundreds of different data points; we're including simulations at dozens of data points and then predicting at larger numbers. And you don't always have to include the states that are very far away from where you want to test. So, good question; it turns out that mathematically, yes, you have to watch that. And you can always leave out states from early in the optimization; if you know they're not involved, you can leave them out. Like a quasi-Newton method. And only use the last L simulations. Yes, yes, exactly. Right. This is an idea that's complex to get into, but much more straightforward once you've entirely grasped what's going on, so it's worth spending the time on it.

I have a question: you said that the probability is going to pile up and then you'll know where you need to simulate more. Could you explain that? Sorry, that was just saying that if you have some sort of algorithm, you have a trust region, and the algorithm is driving towards the edge of the trust region. At some point it's going to say: I want to leave the trust region, and you're going to run another simulation there so you can keep expanding the trust region. That's all I meant. Thanks. If it was Bayesian, you'd visualize it as the maximum of your probability being near the edge of your trust region. Okay. So this works for real.
Now, the issue is: if you're doing two-, three-, four-dimensional scans, this works well. The problem is if you're doing a highly multi-dimensional scan; if you're trying to optimize all of your Lennard-Jones parameters at once, then you start needing a lot of data there.

So, oh, sorry, trust regions; I should mention this as well. We can detect where reweighting fails. We're still working on all of the theory on this, but we have good heuristics to know when MBAR fails, and that comes down to the number of effective samples. Fundamentally, when you're calculating your expectation, what you're doing is: you've collected a lot of data, and you have weights that say how likely you would be to have observed each sample in your new parameter set. But there's a failure mode: the weights never all go to zero, because the weights have to sum to one. So what happens is, if you have no samples that look anything like samples you'd collect from a simulation at state i, it piles all of the weight onto the sample that looks most like simulations from state i, and usually that's a pathology. When that happens, you have very poor configuration space overlap. However, what you can do is take the weights and ask: about how many samples do I have? If all the weight is on one sample, you have about one sample, which is not very good for computing averages. If your weights are completely evenly spread out, you have n samples. Kish's estimator of the number of effective samples, this is for normalized weights, ends up working very well as a heuristic, where the weight is just the Boltzmann probability in state i divided by the sum of the Boltzmann probabilities of that configuration from all of the sampled states. And what we found, over a large number of predicted observables, is that if the number of effective samples is over 50, our error estimates are good; if it's under 50, the error estimates are not good. So you have sort of a two-part trust criterion: if your estimated uncertainty is too large, then you can't trust the estimate anymore; and if your number of effective samples is too small, you can't trust the error estimate. So it's the minimum of the two: if you're at a point that fails either of those, then you can't really trust the estimates there.

So here's an example. This is data from looking at the Mie potential, which is used a lot in alkane simulations; we have a large range of temperatures and pressures. It's just like Lennard-Jones, except that you have a free parameter: instead of a 12, the repulsive exponent can be some other number. So what we've done here is, we've got simulations in epsilon, sigma, and lambda. As you go away from our initial sample point, the level curves are the number of effective samples; as you get further away in epsilon and sigma, and as you get further away in lambda, which is expressed by the color, the number of effective samples falls off. If you're very close, at the same lambda, then you've got 400 effective samples: essentially, 40% of your samples look just the same as if you'd been running with that difference in epsilon.
As you get further out, in this case in epsilon, and you can go a lot further in epsilon than in sigma, the number of effective samples starts to drop. And at a certain point, these shapes, instead of being nice ovals, which express the nice, smooth dependence of the properties, start getting all wacky-shaped. And if you're going to a lambda with a much stiffer inner core, then you've only got five or six effective samples from the total simulation. So the number of effective samples seems to be a very good heuristic to use for your trust region. Questions on that? Crap. Okay, I'll go on.

Okay, so with these fast models, we can use the full machinery of Bayesian inference. If you have a relatively small number of parameters, it turns out you can do some tricky math: if you calculate a couple of quantities, you can treat them as basis functions, and you can calculate the energy at any parameter set having just tabulated five or six numbers from each simulation (a worked Lennard-Jones version of this trick is sketched below). I won't get into the details, because in general that won't work for us, but it's a nice trick for simple systems. And what you can do then is run a few simulations, and you know your probability everywhere; you can do Bayesian inference. In this case, and this is again data from the paper that Rich led, we're using the Mie potential, fit to phase equilibrium data. Given a set of data for phase equilibrium behavior, you use Markov chain Monte Carlo to sample the posterior probability of ethane models as a function of these three parameters: two continuous parameters and one integer parameter. So you've got the visitation frequency for each of these models, and we can turn those into Bayes factors: we can say, for a given data set, what lambda does it support? In this case, it turns out that there are two different exponents that are equally well weighted; the data doesn't allow us to distinguish between the models. Both of those models are equally good at describing the experimental data, and the others are not. And I think this is sort of a microcosm of the way we want to approach a lot of these parameterization problems and choice-of-model problems: sample the posterior in the model space, and use these sorts of statistical information to make quantitative decisions about which models work and which models don't.

For the people online, the question was: when your lambda parameter gets as high as 18, what happens with energy conservation in these simulations? Because it becomes one of these hard-wall potentials; you have to take really small time steps or the thing just slams against it. That's not a problem; the time steps are such that it's fine, and Rich was careful about this. Anyway, questions on the overall process? This gives you a sense of most of it; I haven't talked too much about surrogate models, because in this case we could get away without using surrogate models and just use MBAR. Again, this is not exactly the way it's going to be in the OpenMM property calculator infrastructure, but it's a good walkthrough of these tools actually being used to answer these questions. Okay, so: even cheaper is using a surrogate model, or a meta-model.
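(To make the basis-function remark above concrete, here is the minimal Lennard-Jones version of the trick; this is standard algebra, not anything specific to this infrastructure. For a single atom type,)

```latex
U_{\mathrm{LJ}}(x;\sigma,\varepsilon)
  = 4\varepsilon \left[ \sigma^{12} S_{12}(x) - \sigma^{6} S_{6}(x) \right],
\qquad
S_n(x) = \sum_{\text{pairs } (i,j)} r_{ij}^{-n},
```

(so tabulating just S12(x) and S6(x) for each stored configuration lets you recompute the energy, and hence the reweighting factors, at any sigma and epsilon without touching the trajectory again. The Mie potential used in the ethane example replaces the fixed 12 with a free exponent lambda,)

```latex
U_{\mathrm{Mie}}(r;\sigma,\varepsilon,\lambda)
  = \frac{\lambda}{\lambda - 6} \left(\frac{\lambda}{6}\right)^{6/(\lambda - 6)}
    \varepsilon \left[ \left(\frac{\sigma}{r}\right)^{\lambda}
                     - \left(\frac{\sigma}{r}\right)^{6} \right],
```

(and each candidate integer lambda just adds one more tabulated sum, S-lambda of x.)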
So if we're dealing with high-dimensional spaces, even MBAR, to really get good coverage of the space, isn't going to be enough. You need to evaluate the energies at thousands of different parameter sets, and that does start to get slow. We've generally found that running MBAR, that is, the process of taking a trajectory and re-evaluating the energies at several different parameter sets, is maybe a thousand times faster than the simulation itself. So once you have to evaluate a thousand points, it starts being the same cost as the simulations themselves; it starts taking up a larger and larger fraction of the expense. So what we want is to use some sort of model for the free energy. You'll notice that in a lot of these cases the likelihoods have very nice oval shapes, which indicates a nice linear response, a nice smooth response of the properties as the parameters change. Here's a look at something else done in our group using this approach, looking at the hydration free energy of a Lennard-Jones sphere in TIP3P water. What you see is that these properties are generally relatively smoothly varying. As long as you don't cross a phase boundary, they don't change that much as you change parameters; these are smooth functions. We had to do a lot of work to get those smooth functions, and they're not obviously smooth functions, but they're relatively smooth functions.

And so we want to replace the smooth function of properties as a function of parameters, computed from simulations, by some multi-dimensional fit that includes proper uncertainty ranges. So we've started working with Gaussian processes, and we're looking at other options for higher dimensions: polynomial chaos expansions, neural nets, other machine learning. There are a lot of options here that we really haven't explored. And the criteria: the thing is, we have smooth functions, and we want to be careful not to overfit. The density as a function of sigma does not suddenly take an upturn right at the edge just because we have a little blip in our data there; that's probably uncertainty. These properties are going to be smooth, and we almost always have good local data, because if we look locally, we can get the estimates using MBAR with high precision; the further away we go, the worse it is. So we have smooth data and good local information. How can we best feed that into statistical learning? We don't know yet; that's an area of investigation.

Here we don't have as much data. Here's an example where we take cyclohexane, which has four Lennard-Jones parameters, and do Monte Carlo sampling on the posterior, looking at, I can't remember if it's just density, or density and heat of vaporization; I think it's density and heat of vaporization. So we're sampling the four dimensions; we can't really look at the surfaces, but we can draw the marginals. You see that the hydrogen parameters are much less constrained by the data than the carbon parameters are, which you'd expect, because they're smaller and they're being dragged along by the carbon parameters. But all these covariances we can get out of this, and maybe we can start exploiting this information, like which parameters are covaried with each other. So that's the approach we want to use. We know it works well for lower dimensions.
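(For flavor, a minimal Gaussian-process surrogate sketch using scikit-learn; the training arrays are made-up stand-ins for MBAR-estimated densities at sampled sigma and epsilon points, and none of this reflects the actual property calculator code.)

```python
# Illustrative Gaussian-process surrogate over force field parameters.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Stand-in training data: (sigma [nm], epsilon [kJ/mol]) parameter points with
# MBAR-estimated densities [kg/m^3] and their statistical uncertainties.
X = np.array([[0.34, 0.95], [0.35, 1.00], [0.36, 1.05], [0.35, 0.90]])
y = np.array([858.0, 872.0, 885.0, 861.0])
y_err = np.array([1.5, 1.2, 1.4, 1.6])

# A smooth kernel encodes the expectation that properties vary smoothly with
# parameters; per-point statistical noise enters through alpha (variances).
kernel = ConstantKernel(1.0) * RBF(length_scale=[0.02, 0.10])
gp = GaussianProcessRegressor(kernel=kernel, alpha=y_err**2, normalize_y=True)
gp.fit(X, y)

# Predict with uncertainty at a trial parameter set; a large predicted standard
# deviation signals we've left the region the data supports, i.e. it's time to
# run another direct simulation there.
theta_trial = np.array([[0.355, 1.02]])
mean, std = gp.predict(theta_trial, return_std=True)
print(f"predicted density {mean[0]:.1f} +/- {std[0]:.1f} kg/m^3")
```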
How well it scales, though, we're not sure. How does it scale with the number of parameters? Gaussian processes may only scale to 10 to 15 dimensions, and past that we'll be looking at other options. We probably want to be looking at optimizing in parameter eigenspaces, learning how parameters are coupled. Many sets of parameters have low correlations with each other. If things are highly correlated, then rather than optimizing every single carbon independently, do we want to optimize, say, a carbon parameter and then a carbon deviation parameter depending on what its neighbors are, as told by SMIRKS? We can decompose this into a lower-dimensional manifold and optimize on that, rather than optimizing every parameter independently. There are natural coordinates within the parameter set, in the same way there are natural coordinates within coordinate sets, and we want to take advantage of that. That'll take some thinking. Properties are generally smooth functions of parameters; we want to be able to take advantage of that too. Maybe we need to collect relatively little data locally, because in most cases it's a linear response: if we just have the pairwise correlations, we can get everything. There are different ways we can accelerate things. And this isn't just Lennard-Jones: we can use this for bond charge corrections, and valence terms as well, even different functional forms, using ideas like reversible-jump Monte Carlo to build evidence for different forms. I don't have any slides on that now; we've started playing around with it, but there's some great work from what Josh was doing, jumping between different numbers of atom types, and we can use the same machinery here to jump between different functional forms.

Let me, yeah. Particular things: choices of combining rules, functional forms, the atom types we talked about before. We want to automate that. We're building the property calculator API to make these easily swappable per property and easily optimized. Conceptually it's straightforward; actually getting it to work and be robust will be hard.

Correct me if I'm wrong, but if we find that properties are generally smooth functions of parameters, does that mean that our distributions of parameter probabilities are well modeled by Gaussians? So, if it's linear, and we're using a Gaussian likelihood model for the data, that is, the error is normally distributed, then yes, it's normally distributed. If it's a linear response; if it's a nonlinear response, it'll stop being Gaussian. And we know there are cases where, yes, it behaves nice and linear, it's smooth, but it starts curving after a while. But I would not be surprised if we can take advantage of the fact that locally it's going to look Gaussian, and it'll just be gradually changing Gaussians as we move along. I mean, you have a better sense of how smooth these properties are as functions of the parameters than we do, from ForceBalance; you're already getting that information out. So we should talk about it and figure out how to take advantage of what you've learned in ForceBalance, and incorporate that to optimize how we move through these spaces.

Uh, yeah. So, benchmarks, as discussed yesterday. That's the end of the part on how we're going to try to speed these up. There's a lot of research work to do there, but hopefully, it seems to us, there are going to be a lot of routes for really speeding these things up, so we actually can use lots of condensed phase data and optimize many parameters at the same time.
A comment from the audience: one thing you might look at is that if you take the second derivative of an interaction, let's say a dimer interaction between two atoms in different molecules, as a function of distance, for example, those derivatives annihilate all terms in the force field except for that particular interaction. So you can tease out a nonbonded force constant, charges, et cetera from that, and it might give additional information on the parameters, either to start with or to compare against. Yeah, I guess one question is how much information those carry, because even if the potential is pairwise, the statistical mechanics stops being pairwise; that's a good thing to think about. It wouldn't be the pairwise statistical mechanics, but you can get it out: you're doing quantum calculations anyway, you're getting the second derivatives, and you have a dimer system where you happen to be doing those. And in a monomer system, if you do the same thing and transform to internals, for example, the second derivative of the energy with respect to a bond teases out just that bond interaction, and likewise for all the others. Right, and this could be too simplistic a way to think about it, but there's a lot of cancellation of errors in the quantum calculations. I don't know that we do want to match exactly to the pair quantities; we want to match the Gibbs free energy. Certainly there's a close correspondence, but I wonder if that would be overfitting. On the other hand, if we go to the bulk, it could be that there's not enough information in those bulk measurements to completely constrain the parameters, and in that case we would need to start looking at some of those pair interactions as well. The problem, of course, is that they're gas-phase calculations. Yeah, and I think that's the sort of thing that might fit well into benchmarks: hey, we're going to fit to this data, but it had better not screw up this other information too badly, because we know the physics constrains that, up to polarization changes, to be matched as well. So each generation of the force field will be evaluated in benchmarks, and the benchmarks will be publicly posted. And, this is why this fits in here, the benchmarks will be run with the property calculator framework, so you can upload force fields and run them using the same framework. Other things in the benchmarks: using a reserved test set held out from the training sets is, I think, a good idea, but then we also want to benchmark against things that are not in the training set at all: different properties, binding free energies, small-molecule crystal structures, liquid structure factors, phase-change data. Other things like viscosities and self-diffusion coefficients are things we don't necessarily want to parameterize to, but they will be nice checks that we're not getting the physics fundamentally wrong. And for the particular benchmarks and datasets to use, we definitely want to continue the conversations, particularly on the datasets and the benchmarks. We're zeroing in on what we want to do, but there are still a lot of details to be worked out as we move, especially into year two. I think the math of this is intriguing.
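Here is a minimal sketch of the dimer-scan suggestion from the audience: numerically taking the second derivative of an interaction energy along the separation coordinate, shown here against a toy Lennard-Jones model with made-up parameters, so that a QM scan's curvature could in principle be compared directly against the model's pair term.

```python
# Sketch: along a dimer separation coordinate, the second derivative of
# the energy depends only on the nonbonded terms acting along that
# coordinate, so a QM distance scan can be compared against a pair model.
# Parameters below are illustrative, not fitted values.
import numpy as np

def lj(r, sigma=0.35, epsilon=0.65):
    """12-6 Lennard-Jones pair energy (nm, kJ/mol, say)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6)

r = np.linspace(0.30, 0.60, 301)   # separation scan
e = lj(r)                          # stand-in for a QM dimer scan E(r)

# Central finite-difference second derivative d2E/dr2: in the force-field
# model this isolates the pair interaction along r, since all other terms
# are constant along this coordinate.
d2e = np.gradient(np.gradient(e, r), r)
```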
I mean, basically, let's say you're trying to optimize 100 parameters, but you can do these Gaussian process fits on 10 at a time; then you're getting essentially medium-dimensional marginals of the full Bayesian posterior, or at least some approximation of it. So I don't know whether you can have overlapping sets of marginals that would be useful. Yes, exactly. I think understanding the mutual covariance of all the data will help us drastically reduce the dimensionality of the fits we have to do. Right, and it seems like there are two flavors of coupling. One is the kind of thing you've shown, where you have sigma and epsilon for one atom type. Highly coupled, yes. Then you've got sigma for one atom type and sigma for another atom type, and those are only going to couple to the extent that they show up in the same data, in the same measurement, I guess. I need to think about that more. Yes: if my ethanol oxygens never see an acetamide nitrogen in any experimental data, then those types can be optimized separately. Probably, though I think there must be some correlation through there, because correlations feed through the data itself; they go through the other types. Right, but can you identify the weak couplings? They'll propagate through the carbons. You're absolutely right, there will be stronger and weaker couplings, which you might be able to learn and benefit from. So the great thing is, once we feel that our surrogate models are well fit to the data, all of these analyses can be done, and they're very fast. Well, once you get to 100 dimensions they're not very fast, but they're sure a heck of a lot faster than running molecular simulations. Thanks. I think that's the last slide I have on this.
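As a small footnote to the coupling discussion above, here is a sketch of how weakly coupled parameter pairs could be flagged once posterior samples (or surrogate-model draws) are in hand; the file name, parameter labels, and the 0.1 threshold are all hypothetical.

```python
# Sketch: with posterior samples in hand, checking which parameters are
# weakly coupled is just a correlation-matrix lookup. Inputs are placeholders.
import numpy as np

samples = np.load("posterior_samples.npy")        # (n_samples, n_params)
names = ["C_sigma", "C_eps", "H_sigma", "H_eps"]  # hypothetical labels

corr = np.corrcoef(samples, rowvar=False)

# Flag weakly coupled pairs: candidates for independent optimization.
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(corr[i, j]) < 0.1:
            print(f"{names[i]} and {names[j]} are nearly uncorrelated "
                  f"(r = {corr[i, j]:+.2f})")
```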