All right, well, I think that's about as good as it's going to get. So good morning or good afternoon to all of us who are here. My name is Lee-Ping Wang. I'm one of the Open Force Field Consortium academic members, an assistant professor at UC Davis, and I'm really glad to be talking in front of such an expert crowd of scientists from academia and industry today. What I'm mainly going to cover today is the near-term parameter optimization strategy, which uses software I developed called ForceBalance. This slide simply refers us back to the roadmap slide from Michael Shirts's roadmap presentation, because ForceBalance is an instance of the parameter optimizer that we are going to use. By having this talk as one of the early science talks, we'll get an opportunity to see how all of the components of the Open Force Field project are linked together. So first of all, what is ForceBalance at a very high level? It is a force field optimization tool. The software is written in Python, and it has a main executable, also called ForceBalance, that you run on the command line to carry out parameter optimizations. ForceBalance has a few design goals that were decided from the outset when it was initially developed about seven years ago. ForceBalance is designed for flexibility, which means that users should be able to optimize force fields using a wide variety of functional forms and reference data, meaning data from quantum mechanical calculations or experimental measurements, as well as a variety of molecular mechanics simulation software packages, which actually run the simulations and do the force field calculations. ForceBalance is also designed for reproducibility.
And we all know that force field development has something of a reproducibility problem if the process involves changing parameters manually, running again, and seeing how things have changed, because the next person who wants to reproduce the work might not make the exact same choices as the first person. So ForceBalance sets up your parameterization calculation as a single run of a command, which really allows you to systematically improve on previous work by adding more data or more physical detail to your previous runs. ForceBalance was originally licensed under a copyleft license. This has now changed: it's freely available for commercial use under the three-clause BSD license. And if you download the software distribution, it comes with 18 example parameter optimizations, some of which have been published in the literature, and all of the calculations come with all of the data needed to reproduce the results. So to begin with, I'll give you some previous example applications of ForceBalance. One of the things the software is known for is that we've developed a series of water models. The water model that is most familiar to all of us is TIP3P, right, the venerable model that Bill Jorgensen came up with in the early 80s. Since then, simulation methods have changed: we've gone from Monte Carlo to molecular dynamics, and from cutoffs to particle-mesh Ewald, and we've found that TIP3P doesn't really reproduce the thermodynamic properties of water all that accurately. So it becomes a natural testing ground for the ForceBalance method: fit the parameters of TIP3P to more closely reproduce the experimental results. And in the top three panels here, you can see the result of fitting all of the parameters of the TIP3P model using ForceBalance to a set of experimental thermodynamic properties.
I'm showing you three, but in the whole optimization there are six of them. You can see from the curves that the optimized TIP3P-FB model, shown in blue, more closely reproduces the experimental data, shown in black, than the original TIP3P model, shown in gray. But we also know that being able to fit the data is not the only hallmark of a good force field: you also need to be able to predict things, right? And so the standard validation of a force field is to predict things that we already know. Some other well-known properties of water are the radial distribution function, the self-diffusion coefficient, and the shear viscosity. If you look at these properties, which we did not fit using ForceBalance but calculated separately after the optimization, you can see that the validation properties are also in much improved agreement with experiment. And perhaps one remarkable feature is that after the parameter optimization, the parameters only changed by a few percent; I think the parameter that changed the most moved by maybe five percent, yet you see a very dramatic difference in the properties. So very small changes in the parameters give you dramatic changes in the properties. This might not be something you want to tweak by hand; if the computer can do it for you in a numerical optimization, well, why not take advantage of that? Here's another example application. AMBER-FB15 is a protein force field that we published in 2017. It builds on the AMBER99SB model, with all of the bond, angle, and dihedral parameters optimized to fit high-accuracy quantum mechanical potential energy surfaces.
And even though our fit was purely to quantum mechanical data, we ran some validation studies on the temperature dependence of protein structure and found improved agreement with experiment. Perhaps this could be rationalized by comparing our optimized molecular mechanics potential energy surfaces to the quantum mechanical surfaces, which are in the upper left here. If you compare the quantum mechanical surface with the original molecular mechanics surface, which is the top middle, you can see that the original MM model overestimates the energy away from the minima, and the difference in the energies is represented in the right panel here. This plays into the narrative that the previous generation of force fields can reproduce your equilibrium structures, but outside of equilibrium the performance is not as guaranteed. Whereas if you fit the parameters to a comprehensive potential energy surface, you get more accurate energies farther away from the minima, and that might be an explanation for better temperature-dependent behavior. In a subsequent publication, three different protein force fields were tested by looking at the free energy along a folding coordinate of a small peptide. At the experimental melting temperature, AMBER-FB15 correctly predicts a 50-50 ratio between the folded and unfolded states, whereas two of the previous-generation force fields overestimated the population of the native state. So those are two examples of force fields that ForceBalance was used to build. And here are some more examples from the literature, updated with a few publications from this year. The projects that ForceBalance has been used for are now extending past my research group. A group in the UK led by David Huggins implemented code in ForceBalance to fit radial distribution function data, and they used it to develop a model of water that uses the Buckingham potential.
That was exciting to see. More recently, we also developed a model for polarizable nanoporous graphene; this was done by Yudong Qiu, a postdoc working in my group. I'm also excited about a submitted work where ForceBalance was used to develop a coarse-grained model using a hydration free energy target; this is, again, a collaboration with Jonathan Essex, based in the UK. And in particular, ForceBalance has been used to develop a long series of water models, and perhaps that's because it includes the data set that was initially compiled to parameterize the first of these models, iAMOEBA, which I'm a co-author on with John Chodera, who's also in this room. Yeah, question. I was struck by how much TIP3P-FB looks like sort of a hybrid of SPC/E and SPC/Eb, which are both very laboriously fitted models, not hand-fitted, and which also make some different thermodynamic assumptions in the way they work. Do you see any similarities between TIP4P-FB and, say, TIP4P/2005 or TIP4P-Ew? The TIP4P-FB model was optimized starting from TIP4P-Ew, and it makes the same thermodynamic assumptions as TIP4P-Ew, actually. So we add a polarization correction onto the energy of every molecule when it's in the liquid, in keeping with their example. And I think the main difference between TIP4P-FB and TIP4P-Ew is that TIP4P-FB will give you a dielectric constant that's a little bit higher, more consistent with the experimental number, because we directly fitted that number. But I should also bring up that a lot of people don't necessarily think you should be fitting directly to the dielectric constant, so we haven't completely closed the book on that question either. OK, and here are the characteristics of the data sets that we used to fit the parameters using ForceBalance. This is just to show you that the data sets are rather large and diverse, in keeping with our desire to fit the force field parameters to different kinds of data.
So the water data set includes thermodynamic properties of liquids, but also experimental gas-phase properties, as well as properties calculated using ab initio methods. And the protein data set mainly involves a large number of ab initio calculations. All of this data is available for download with the ForceBalance distribution. I'm going back to this figure again; I want to briefly go over the optimization workflow, which will provide context for what's about to follow. To start a ForceBalance optimization, you start with a force field file that contains the initial values of your parameters, and you also specify which parameters you want to optimize, which means the force field file has to be labeled in some way; I'll get into more specifics soon. Then, going from the bottom, you follow the cycle up to the very top. You use the force field to calculate properties, and the properties are calculated by automatically calling molecular simulation engines. In our collaboration, we're going to use the Open Force Field toolkit to set up the parameterized system, followed by OpenMM to carry out the calculations. This is going to allow us to compute the objective function, which is a least-squares quantity calculated from the differences between your simulated values and the reference values that you prepared before the calculation. The way ForceBalance does this in a general way is that it allows you to calculate the properties using different values of the force field parameters simply by writing out a separate force field file with the numbers changed, reading it back in, setting up a new parameterized system, and re-evaluating the properties. Later on, I'm going to talk about how we're going to more tightly integrate force field updates with the Open Force Field toolkit. After you evaluate the properties, you're able to evaluate the objective function, which includes a regularization term.
And regularization is another word for a penalty function that keeps your parameters from going too far away from their physically motivated initial guesses. Along with the objective function, we also have the gradient of the objective function and an approximate Hessian matrix, which allows ForceBalance to take more efficient optimization steps. And this cycle repeats until the convergence criteria are met. OK, so, a few slides on the theory. I'm going to talk about force field parameter updates, followed by the objective function of the optimization. When you're optimizing force field parameters, one of the very first challenges you'll run into is that the numbers are all different sizes, and it depends on what molecular mechanics software you're using. A bond force constant is maybe 100,000 or 500,000 if you're using the kilojoule-per-mole-and-nanometer system of units, whereas a bond length is on the order of one tenth of a nanometer. If you just carry out the optimization directly in these numbers, it leads to a very badly conditioned Hessian matrix. Furthermore, parameters will often need to obey complex functional relationships and constraints. The easiest example of this is that you want the charges on a molecule to sum up to a fixed number: you don't want the optimization to change the total charge. And in other cases, you might want your parameters to obey certain constraints, like staying within a desired range. So how do we handle all of the different scales and the relationships between parameters that a user might want when doing a parameter optimization? This is handled in ForceBalance by having a separation between the parameters that we actually optimize and the parameters that are written to the force field. The parameters that are optimized are called mathematical parameters. These are the ones that the optimization algorithm actually sees, and they are fully unconstrained.
And they are all designed to be of the same order of magnitude, like order 1; at least we don't expect them to be changing by 7 or 8 orders of magnitude, but 3 or 4 orders of magnitude should be fine. And the physical parameters, the ones that are closer to being actually written to the force field, are related to the mathematical parameters by a shifting and a scaling. What this equation here says is that in order to write out the values of the physical parameters at a given optimization step, you start with your initial physical parameters and you add on your mathematical parameters times a rescaling factor. An example of the rescaling matrix is shown to the right here. This rescaling matrix is a diagonal matrix whose entries are hyperparameters, parameters that you set at the start of the optimization, and we call these prior widths. The reason we call them prior widths is that if you think about this kind of optimization as finding the peak of a probability distribution, the prior width is like your prior expectation of the probability distribution of that parameter: it corresponds to the size of expected changes of the parameter over the optimization, or over parameters of the same type. So if you look at the top-left prior width, this corresponds to a bond length, because we expect it to vary by perhaps no more than 0.01 nanometers, whereas a bond force constant might have a much larger prior width, corresponding to a variation of maybe 10^5 kilojoules per mole per nanometer squared. And because the prior widths also play the role of scaling factors, the mathematical parameters all vary on the order of one, and the matrix P takes care of all of these different scales of parameter values. Typically you need to specify one prior width for each parameter type that you have.
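To make the shift-and-scale relation concrete, here is a minimal NumPy sketch with made-up initial values and prior widths; the variable names are mine, not ForceBalance's internal ones.

```python
import numpy as np

# Hypothetical initial physical parameters: a bond length (nm) and a
# bond force constant (kJ/mol/nm^2). Values are illustrative only.
k_phys0 = np.array([0.096, 4.6e5])

# Prior widths: the expected scale of change for each parameter type.
prior_widths = np.array([0.01, 1.0e5])
P = np.diag(prior_widths)  # diagonal rescaling matrix

def to_physical(k_math):
    """Shift and scale: K_phys = K_phys0 + P @ K_math. The optimizer
    only ever sees k_math, whose components are all of order one."""
    return k_phys0 + P @ k_math

# An order-one step in mathematical space becomes a sensibly sized step
# in each physical parameter, despite their very different magnitudes.
k_phys = to_physical(np.array([0.5, -0.3]))  # -> [0.101, 4.3e5]
```

With the prior widths absorbing the units and scales, the Hessian in mathematical-parameter space stays well conditioned even when the physical parameters span many orders of magnitude.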
That means that even if you have hundreds of parameters in an optimization, you need to specify fewer than 10 prior widths, because we want the number of hyperparameters to be as low as possible. Okay, so even though we now have the shifting and the scaling, that is insufficient to capture all of the different mathematical relationships that you want your parameters to have. So how do we handle these mathematical relationships? We do it with so-called evaluated parameters. Evaluated parameters are just mathematical functions of your physical parameters, but they can also be functions of each other. So the independent variables that I optimize can be shifted and scaled into the physical parameters, and those can be further processed into evaluated parameters; they can even be processed more deeply, in multiple layers of functions if you want. So ultimately, the number that you actually write into the force field file that your molecular mechanics program uses can obey pretty much any mathematical relationship that you want, and that includes things like summing to a constant, being restricted to a range, or even obeying the law of cosines. So I think this affords us a very high degree of flexibility in how much we want to control the parameters that we are optimizing. Okay, so moving on to the theory of the objective function. The objective function in ForceBalance really is a hierarchical least-squares quantity. At the root it's a least-squares quantity, but there are multiple levels of sums and weights and normalizations, so that at the end of the day you end up with a single number that you are optimizing. At the top level you have your objective function, which I'll call L_total, and it's a function of your mathematical parameters K_math.
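As a toy illustration of the evaluated-parameter idea, here is a hypothetical Python sketch in which one charge is computed from the others so the total molecular charge stays fixed; the function names are mine, not ForceBalance's.

```python
TOTAL_CHARGE = 0.0  # desired net charge of the molecule (e.g. neutral water)

def evaluated_charge(q_h1, q_h2):
    """An "evaluated parameter": the oxygen charge is not optimized
    independently, but computed from the hydrogen charges so that the
    total charge is preserved no matter what the optimizer proposes."""
    return TOTAL_CHARGE - (q_h1 + q_h2)

# Whatever values the optimizer tries for the hydrogens, neutrality holds.
q_h = 0.417
q_o = evaluated_charge(q_h, q_h)
assert abs(q_h + q_h + q_o - TOTAL_CHARGE) < 1e-12
```

A range restriction could be handled the same way: define the written-out parameter as a bounded function of an unconstrained optimized variable, so the constraint is enforced by construction rather than by the optimizer.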
And you can see here that it is given by a sum over the targets, where every target comes with its weight, plus the regularization term, which also comes with its weight. In many cases it's simply sufficient to use weights of one here. If you go a little bit deeper, a given target will often involve a sum over properties. So for example, one target in ForceBalance is the thermodynamic properties of liquids, and there are six properties that you can calculate and fit to experiment; so this sum will run over six properties, and each property has a weight that is specific to it. And then going deeper into an individual property is where you see the least-squares objective function appear: you now have a weighted and normalized sum over the individual data points. So for example, I may be fitting the density of a liquid over a range of temperatures, and this will be a sum over maybe 20 or 30 different temperature points. There are so many weights on the individual data points that the user pretty much never sets them by hand; instead, there are reasonable algorithms that we use to set these weights procedurally, and I'll talk about those a little bit later. Once you have the objective function, you need an algorithm to efficiently minimize it. And because this is a least-squares objective function, there's a trick called the Gauss-Newton approximation that we are able to make, and that's shown in this equation here.
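The hierarchy of sums, weights, and normalizations can be sketched in a few lines of NumPy. This is a simplified stand-in for the real implementation; the function names and the tuple layout are my own.

```python
import numpy as np

def target_objective(calc, ref, denom, point_weights):
    """Bottom level: weighted, normalized least squares over the data
    points of one property (e.g. density over a range of temperatures).
    'denom' is a normalization that makes the residuals dimensionless."""
    w = np.asarray(point_weights, dtype=float)
    w = w / w.sum()
    resid = (np.asarray(calc) - np.asarray(ref)) / denom
    return float(np.sum(w * resid ** 2))

def total_objective(target_terms, k_math, reg_weight=1.0):
    """Top level: weighted sum over targets plus a quadratic (Gaussian
    prior) regularization term in mathematical-parameter space."""
    L = sum(w * val for w, val in target_terms)
    return L + reg_weight * float(np.dot(k_math, k_math))

# One property fit plus regularization, with weights of one:
prop = target_objective([1.0, 2.0], [1.0, 1.0], 1.0, [1.0, 1.0])  # 0.5
L = total_objective([(1.0, prop)], np.zeros(3))                   # 0.5
```

Note that because the regularization acts on the mathematical parameters, which are all order one by construction, a single regularization weight is meaningful across parameters of very different physical scales.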
So if you want to estimate the matrix of second derivatives, the Hessian, of an objective function with respect to your force field parameters, you can first take a general second derivative, and you will find that it consists of a product of the first derivatives of your properties plus another term involving the second derivatives of your properties. The Gauss-Newton approximation simply omits this second term, and that gives you an approximate Hessian that you can use in a Newton-Raphson type of minimization; that is what ForceBalance uses. So this is the equation for a parameter update, where the mathematical parameters at iteration i+1 are given by the current value of the parameters minus the inverse of the Hessian times the gradient, where we add a multiple of the identity matrix to the Hessian before inverting it. The larger a multiple you add, the smaller the step you take, and the closer it is to steepest descent. So the value of lambda that we choose is often such that, if you have a specified trust radius, the step stays within that trust radius; or you can do a line-search minimization over lambda in order to minimize the objective function along this curve of all possible regularized Newton-Raphson steps. ForceBalance also implements the BFGS Hessian-updating algorithm, but we don't use it because this Gauss-Newton method is already quite efficient. So I'm just going to go briefly into the theory for the two most important targets that we use. One target is the liquid thermodynamic properties. This allows the simultaneous fitting of up to six thermodynamic properties, including density and heat of vaporization; they're all listed up there, over a range of temperatures and pressures. The way this objective function is evaluated is that one constant-pressure simulation is run for every thermodynamic phase point, and when the simulation is run, we save the trajectory frames at intervals.
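The regularized Gauss-Newton update described above, including the trust-radius control of lambda, can be sketched as follows. This is a simplified stand-in, not ForceBalance's actual code.

```python
import numpy as np

def gauss_newton_step(J, residuals, lam=0.0):
    """One regularized Newton-Raphson step under the Gauss-Newton
    approximation: the Hessian is built from first derivatives only.

    J         : Jacobian of the residuals w.r.t. the mathematical parameters
    residuals : vector of normalized (calculated - reference) values
    lam       : multiple of the identity added to the approximate Hessian;
                larger lam means a smaller step, closer to steepest descent
    """
    g = J.T @ residuals  # gradient of the least-squares objective (up to a factor of 2)
    H = J.T @ J          # Gauss-Newton approximate Hessian
    return -np.linalg.solve(H + lam * np.eye(H.shape[0]), g)

def step_in_trust_radius(J, residuals, trust=0.5):
    """Raise lam until the proposed step fits inside the trust radius."""
    lam = 0.0
    step = gauss_newton_step(J, residuals, lam)
    while np.linalg.norm(step) > trust:
        lam = 10.0 * lam if lam > 0.0 else 1e-4
        step = gauss_newton_step(J, residuals, lam)
    return step
```

Instead of the crude geometric search over lambda shown here, one could equally do the line-search minimization over lambda that the talk mentions; both trace out the same family of regularized steps.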
So you can imagine you might save 1,000 trajectory frames at picosecond intervals for a one-nanosecond simulation. For these saved trajectory frames, we calculate the potential energy derivatives with respect to the force field parameters using a finite-difference approximation. And if you have your potential energy derivatives, then you can calculate your thermodynamic property derivatives without having to run a separate simulation. So this is an important cost-saving tool, which allows us to estimate the variation of the density with respect to all of our force field parameters without having to run a separate simulation for each parameter that we want to optimize. Okay, so ForceBalance implements this fluctuation formula, allowing these properties to be optimized more efficiently. And this is the other kind of target that I want to talk about: ab initio energies and forces. This really just allows the fitting of single-point energies and forces to pre-computed quantum mechanical data for a set of configurations. So in this talk I'm not going to make specific comments on which level of theory or basis set is the most appropriate; that will come later. It is something I've thought about very much, but ForceBalance is going to assume that you've pre-computed these values and you're just fitting the force field to reproduce them. And when we are fitting ab initio energies, there are a few special considerations. Dave Cerutti taught me a lot of this a few years ago when we talked on the phone. The weighting scheme that we employ here emphasizes configurations that have a low quantum mechanical energy, under the hypothesis that they're going to be more thermodynamically favorable in the simulation. And also, what you're seeing here is a scatter plot of QM energies on the x-axis and MM energies on the y-axis, and you can see that the scatter plots are not very symmetric.
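The fluctuation formula mentioned above can be written down explicitly: for a property A with no explicit parameter dependence, the derivative of the ensemble average reduces to a covariance with the potential energy derivative, d⟨A⟩/dθ = −β(⟨A·∂E/∂θ⟩ − ⟨A⟩⟨∂E/∂θ⟩). Here is a minimal NumPy version; the variable names are mine.

```python
import numpy as np

def property_derivative(A_frames, dE_dtheta_frames, kT):
    """Fluctuation formula: derivative of the ensemble average <A> with
    respect to a force field parameter theta, estimated from the saved
    frames of a single simulation.

        d<A>/dtheta = -beta * ( <A * dE/dtheta> - <A> * <dE/dtheta> )

    Assumes A has no explicit dependence on theta. In practice the
    per-frame energy derivatives come from re-evaluating the saved
    frames with slightly perturbed parameters (finite differences).
    """
    A = np.asarray(A_frames, dtype=float)
    dE = np.asarray(dE_dtheta_frames, dtype=float)
    beta = 1.0 / kT
    return -beta * (np.mean(A * dE) - A.mean() * dE.mean())
```

A quick sanity check: if the energy derivative is identical in every frame, the covariance vanishes and the predicted property derivative is zero.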
And that is because we want to avoid, at all costs, molecular mechanics energies underestimating the quantum mechanical energies, because if you have that situation, your simulation is going to give you the wrong equilibrium structure, which is in a sense the zeroth-order, or most serious, error you can run into when developing a force field. So in this scatter plot here, I'm showing you a bunch of energies calculated at different configurations. The blue dots correspond to the original values of the parameters, and the orange dots correspond to the quality of fit with the current parameters. You can see that not only are we closer to the diagonal line, but we have a very small number of points below the diagonal line, and I think this asymmetry is important when fitting quantum mechanical single-point properties. Yeah, okay. And just to briefly describe the broader capabilities: there is a wide range of targets and simulation software supported, such as binding energies and vibrational frequencies; more recently we implemented surface tension; and there are also multiple simulation software packages supported, such as AMBER, OpenMM, GROMACS, and TINKER. More recently, as of last month, we have an early version of SMIRNOFF force field support in ForceBalance, so you're actually able to optimize SMIRNOFF parameters by running a ForceBalance calculation now. At the moment, the OpenEye commercial toolkit is still required, until the Open Force Field toolkit implements RDKit support, as we talked about earlier. And most of the code that ForceBalance implements is specific only to the target or only to the engine, so if you want to fit a particular target using a particular engine and ForceBalance doesn't support it yet, only a minimal amount of code is needed to enable that particular combination in the target-engine matrix.
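One way the weighting ideas above could look in code: Boltzmann-style weights that emphasize low-QM-energy configurations, plus a residual that penalizes MM-below-QM errors more heavily than the reverse. This is a hypothetical sketch; the exact functional forms ForceBalance uses may differ in detail.

```python
import numpy as np

def config_weights(E_qm, kT=2.5):
    """Boltzmann-like weights emphasizing configurations with low QM
    energy (kT in the same units as the energies, e.g. kJ/mol), on the
    hypothesis that these are the thermodynamically favorable ones."""
    E = np.asarray(E_qm, dtype=float)
    w = np.exp(-(E - E.min()) / kT)
    return w / w.sum()

def asymmetric_residual(E_mm, E_qm, under_penalty=3.0):
    """Scale up residuals where the MM energy falls below the QM energy,
    since underestimating relative energies can shift the predicted
    equilibrium structure."""
    r = np.asarray(E_mm, dtype=float) - np.asarray(E_qm, dtype=float)
    return np.where(r < 0.0, under_penalty, 1.0) * r
```

Feeding these asymmetric, weighted residuals into a least-squares objective reproduces the qualitative behavior seen in the scatter plot: most fitted points end up on or above the diagonal rather than below it.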
Okay, and as a last note: because these calculations can be very computationally intensive, if you have a big cluster you might wonder, you know, you don't want to just run this on your laptop, you want to use the cluster resources, right? So ForceBalance has an interface to a library called Work Queue, which supports computationally intensive optimization jobs by evaluating individual targets, or individual simulations within a target, on remote computing resources. And full disclosure here: Work Queue is not BSD-licensed, because I didn't develop it, but because this component is optional, the copyleft license does not infect ForceBalance, and I've actually spoken at length with the author of the package, and he's happy for us to use it in this way. Okay, so that's the situation we currently have, but it's fully possible to use ForceBalance without using Work Queue. And now I want to just very briefly go over some more concrete details, if concrete details are helpful to wrap your head around this kind of abstract calculation. So first, this is the installation procedure if you are starting from a clean environment, and this would not have been possible if I had not been instructed on good packaging practices as part of this collaboration. You can see that ForceBalance can be installed through conda, and all of the dependencies of ForceBalance, including the Open Force Field toolkit, can be installed through conda as well. And in the last two lines here, I've asked you to clone the ForceBalance repository from where it lives on GitHub, because the code contains example calculations that you can then run. So ForceBalance is runnable out of the box without too much legwork, but if you want to, say, use ForceBalance with GROMACS or TINKER or AMBER, you need to install that associated software as well. These instructions include installing OpenMM along with the Open Force Field toolkit.
And to set up a calculation, I mentioned at the very beginning that you need to specify which parameters are to be optimized. Here I'm showing you five lines that come out of a SMIRNOFF force field parameter file. The line in quotes here is a SMIRKS string: a pattern match for atoms within molecules. The values of the parameters are shown in red here, and the parts that ForceBalance adds are in blue. So if you add an attribute called parameterize, then ForceBalance is going to recognize that rmin_half and epsilon are numbers that need to be optimized. So parameterize is the part that ForceBalance reads, and there is something else called parameter_eval, which allows a parameter to be evaluated as a function of another parameter. And this is a brief overview of the directory structure of a calculation. To run ForceBalance, you need to have two main folders: the force field folder, which contains your force field file, and your target folders, which contain your individual data sets and associated simulation files. And I thought that for my very first SMIRNOFF optimization I would optimize a model of ethanol. So here we have the liquid properties of ethanol (I guess I should have included water as well, right, because it's not pure ethanol), and this includes the density and heat of vaporization of ethanol at two different temperatures. I've also included a one-dimensional torsional energy profile for the ethanol molecule as I rotate the bond between the carbon and the oxygen. And then there is also an input file that ForceBalance reads for the settings of the optimization. Setting up the targets is always the most time-consuming part of setting up a ForceBalance calculation, but the good part is that once a target is set up, it pretty much doesn't need to be modified anymore, as evidenced by all of the water models we published using basically the same set of targets.
So, and this is a snippet of the output of ForceBalance as a text file. You can see that it's printing out the density and the heat of vaporization. The reference values from experiment are in the third column here, and the calculated values of density and heat of vaporization are in the fourth column. It also prints out information about the quality of fit for the one-dimensional torsional energy profile, as well as the objective function value. Okay, so this is before we've taken any optimization steps. And after we take the first optimization step, you get some updated property values, which I've shown in purple here. You don't need to stare very long at these numbers, because I'm going to show you graphically that the values of the computed observables are now much closer to their reference values. In the next slide, this is the graphical summary of that example optimization that I ran, where the black curves are the reference data. So we have the density at two different temperatures, experimentally measured, shown in black here, and the iterations of ForceBalance are shown in red, orange, green, and blue for the initial parameters and then the first, second, and third steps. You can see that the initial values of the parameters predicted a density that is close to, but still some distance from, the experimental numbers, and after the first optimization step we're already much closer to experiment. You can see the same story in the heat of vaporization: pretty much one optimization step is all it takes to bring the agreement with experiment to within the error of the simulation. We have about eight parameters in what we are optimizing here, including the Lennard-Jones parameters.
And here you can see the results for the one-dimensional torsion drive: the initial SMIRNOFF parameters give the red curve, compared to the black curve, and by the time you've taken two optimization steps, there's no more improvement in the quality of fit there. And this last plot here shows the objective function and the gradient norm, showing you that both of these values have decreased significantly by the time you've taken the first optimization step. So this quasi-Newton approach really is able to get us to a pretty decent-looking minimum after just one optimization step. And this here is just a very short summary of the files that ForceBalance produces. ForceBalance produces a folder that contains all of the intermediate calculations and results, so if you really want to dive into the results of the optimization, you go into this folder and look at all of the associated simulation files. There is also a folder called result that contains the force field with the optimized parameter values. And there's the output file, which contains all of the results that I showed you graphically, as well as a checkpoint file for restarting your optimization from where you left off; for example, you might add an extra target and see what happens to your parameters. So for what I'm about to show you, I would like you to appreciate that setting up these target folders is not very easy, and trying to make sense of these optimization results is also not very easy. It took me about two hours to make this slide by messing around in Excel; it was pretty frustrating.
So you might want to look at your optimization results more at a glance, and maybe have something a little more interactive for setting up and running your calculations. That was the motivation for a sub-project that we started about six months ago, supported by MolSSI, the Molecular Sciences Software Institute. My postdoc, Dr. Yudong Qiu, led this project with support from the MolSSI fellowship, and he developed a web interface for running ForceBalance calculations. I use this web interface running locally on my laptop, so you can think of it like a Jupyter notebook, although you can also imagine deploying it on a server and connecting to it remotely. Thinking about this web interface mainly as a graphical user interface: it gives you an interactive wizard for setting up targets, which makes the workflow a little simpler because you can validate step by step to see if there are any problems with your files. It also gives you real-time feedback and visualization of optimization results. But once you have a web interface, it becomes difficult to reproduce the sequence of clicks and choices that a user made in setting up their calculation. So what the web interface really does is still produce that ForceBalance input file and directory structure, and run the same optimization that the command-line program runs. Our goal over the six-month project period was to get all of the fundamental components in place, so that somebody not very skilled in web development, like me, can fill in additional features by following the examples of the existing ones. So, with the technology working, I'd like to do a little interactive demo. I'm currently in the folder containing the code that runs the ForceBalance web server on the back end, and I'm going to run run.py.
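The run-a-local-server-then-open-a-URL pattern can be sketched with Python's standard library. This toy server is purely hypothetical; it stands in for whatever framework the actual ForceBalance web back end uses, just to show the shape of the workflow.

```python
import http.server
import threading
import urllib.request

class ManagerHandler(http.server.BaseHTTPRequestHandler):
    """Toy stand-in for a ForceBalance manager back end (hypothetical)."""

    def do_GET(self):
        # Serve the "empty project" landing page described in the demo.
        body = b"No project exists. Please click Create Project."
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging in this sketch

# Bind to an ephemeral port, serve in the background, fetch the page.
server = http.server.HTTPServer(("127.0.0.1", 0), ManagerHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"
page = urllib.request.urlopen(url).read().decode()
server.shutdown()
```

The real back end additionally writes out the standard ForceBalance input file and directory structure, so the browser session stays reproducible from the command line.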
When this runs, it gives me a URL to go to, so I'm going to copy it, go back to my web browser, and paste the URL in. Now it brings me to the ForceBalance manager web page, which is currently blank, and it says: no project exists, please click Create Project. I'm going to click the Create Project button here. Now I can choose a name for my project — basically, a ForceBalance optimization. I'll keep the default name and press the Create button. This brings me to the setup of the force field, the target, and the optimizer. I can input a force field file simply by navigating to the file location and going to the force field. Right now this only supports a GROMACS force field file; I'm going to fill in the SMIRNOFF support later. Once the force field file is loaded in, you can see that it is parsed, and the values of the parameters to be optimized are displayed for you. Next I'm going to set up a single target. I click this Create New Target button, which launches the wizard for creating the target. I'm going to call the target "a dimer", and this is going to be fitting ab initio properties using GROMACS as the simulation engine. So I launch the target wizard now. The target wizard brings me through a series of steps to upload the files that are needed for building the target. If I go into the target folder here, navigate to the correct folder, and provide the .gro file, it can check that the .gro file is valid and visualize the molecules. Going to the next stage, we have to upload the quantum mechanical data, which ForceBalance requires in a format called qdata.txt. Let me try clicking on this button here — I'm having a little trouble clicking the button — okay, click Next here.
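The qdata.txt format mentioned above stores quantum mechanical reference data as keyword-prefixed lines grouped into one record per frame. A minimal reader might look like this; the exact keyword set (JOB, COORDS, ENERGY, FORCES) and the sample values are assumptions for illustration, so check the ForceBalance documentation for the authoritative format.

```python
def read_qdata(text):
    """Parse qdata.txt-style text into a list of per-frame records.

    Assumed layout: each frame starts with a JOB line, followed by
    keyword-prefixed data lines; blank lines separate frames.
    """
    records, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key, values = fields[0], fields[1:]
        if key == "JOB":
            current = {"JOB": int(values[0])}
            records.append(current)
        elif key == "ENERGY":
            current["ENERGY"] = float(values[0])
        elif key in ("COORDS", "FORCES"):
            current[key] = [float(v) for v in values]
    return records

# Hypothetical two-frame file for a small molecule scan.
sample = """JOB 0
COORDS 0.0 0.0 0.0 0.0 0.0 0.96
ENERGY -76.42

JOB 1
COORDS 0.0 0.0 0.0 0.0 0.0 1.00
ENERGY -76.41
"""
recs = read_qdata(sample)
```

Validating this structure up front is exactly the kind of step-by-step check the target wizard performs when a file is uploaded.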
Now it's time to upload the qdata.txt. I'll click this button, navigate to qdata.txt, upload it, and go to the next step. By now you've gotten the idea, so I'll go through this as quickly as possible: I upload the MDP file, which is the GROMACS run parameter file, as well as the GROMACS topology file. Now that these are all uploaded, we click the Validate button, which tests that the target has been set up correctly, and then I hit the Create button, which means the target is created. Next I can change the settings of the optimizer. This is similar to just editing the input file, but with the web interface you have the flexibility that it can show you which parameters are relevant for you to edit, without you having to go through all of the parameters that are available. I make some simple modifications and then launch the optimizer. Now, if you look at the status icon on the right, the optimizer is running, and if you click on Output, you can see the ForceBalance output being produced as it takes optimization steps. So now we're on iteration two, and iteration three will be produced as it runs. I think I've set up the optimization so that it converges after three iterations. If we click on Results, you can see a plot of the objective function there. The basic idea is that we have the basic functionality of the web interface in place, and it shouldn't take a lot of expertise in web design or development to put in the other targets that are needed and add some extra visualizations and so on. I think this will be a helpful tool for anybody interested in using ForceBalance in the future. In the time that is left, I want to give a short discussion of the future plans. This slide is very detailed; you don't need to read all of it.
A lot of it is there just to remind myself. But the way ForceBalance is going to be involved in the Open Force Field effort is that it's our near-term plan for creating the next generation of the SMIRNOFF force field, where the parameter types are all the same but we are going to optimize the parameter values themselves. The Bayesian optimization, which John Chodera is going to talk about next, will be phased in as it comes online. The next bullet point is that in order to optimize SMIRNOFF, we're going to need a lot of quantum mechanical and experimental data, and managing all of this data is going to be a big challenge. The quantum mechanical and experimental data generation and organization will come from, for example, the QCArchive sub-project, the TorsionDrive sub-project, and the curation of experimental data from ThermoML — things you're going to hear about in the rest of this meeting today. Parameter fitting will initially proceed in a piecewise fashion as parts of this infrastructure come online, but once we have all of the data in place, we can carry out a fully coupled optimization, fitting all of the parameters to all of the data. The rest are detailed descriptions of the remaining tasks for ForceBalance in the next 12 months. One part, which you'll hear more about in tomorrow's talk, is that we need to develop a procedure for generating data to fit the bond, angle, and improper dihedral parameters. The tasks specific to ForceBalance would be, for example, the calculation and storage of optimized geometries and vibrational modes in an online database. The online database we're using is called QCArchive, which was created by MolSSI, mainly Daniel Smith, who is here. And for many molecules, a vibrational analysis is perhaps not enough.
You need to scan along the degrees of freedom where you expect larger displacements, so we need to detect what these potentially soft degrees of freedom are and then scan along them; that's another part of the project. We also need to create targets by downloading data from QCArchive. These are all reasonable tasks that need to be done in order to fit this large set of bond, angle, and improper parameters. Once all of this is in place, we're going to build a set of molecules and execute the workflow to generate the data and optimize the valence parameters. I'll be doing this in collaboration with David Mobley's group and David's students. At this point I want to give a small pitch: my postdoc who developed the web interface is in need of funding to do research for the next three months, and I was thinking that this part of the project might be a good fit for him. So if you happen to have governing board duties, I'll post a short document to that channel for you to look at shortly. Moving on past the valence parameters — bonds, angles, and improper dihedrals — we come to the torsions. There's an entire talk devoted to torsions, where you'll find that much progress has been made and there's still much to do, but most of the tasks that need to be done are outside of ForceBalance right now; that's the upcoming talk on torsions and fragmentation. There are some other ForceBalance-related tasks that are a few months out. One such task is to fit Lennard-Jones parameters to experimental data on a few hundred liquids. You can imagine this is going to take a lot of computational power, and we might just barely manage it for a few iterations. We're going to have to manage all of that data and obtain it somehow — it's very hard to do a Google search for every single property you want.
You can't tell how reliable the data is, so using a curated database such as ThermoML is going to be very helpful. The automation of simulation setup is going to be very important, and there are some important infrastructure components that are needed as well. As I said before, ForceBalance currently modifies the force field by writing out a new file and reading it back in, but the Open Force Field toolkit provides an API for directly modifying the parameters. By taking advantage of this API, we expect significant performance improvements in the future. For example, if you want to evaluate the effect of changing a parameter on the energy, in principle you should not have to re-evaluate the entire energy, including the particle mesh Ewald part, if you're only changing a bond parameter. If you change a parameter through the API, the software might be smart enough to know that; but if you write out an entirely new force field file and set up a newly parameterized system, you just end up doing a brute-force recomputation of a lot of components. So this infrastructure project is really going to help with optimizing performance. Another important optimization is that we are going to integrate ForceBalance with a separate property calculation tool that a subgroup of the collaboration is working on — I think this is tentatively called the property calculator. The key groups working here are the Shirts and Chodera labs, as well as Simon Boothroyd, a postdoctoral fellow who is mainly working on this. This is going to allow the use of reweighting and surrogate functions to greatly accelerate thermodynamic property estimation — basically, you don't have to run a separate simulation every time you want to estimate the simulated properties at different values of the parameters.
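The reweighting idea can be sketched as a Zwanzig-style importance-weighted average: frames sampled with the original parameters are reweighted by the Boltzmann factor of the energy difference under the perturbed parameters. This is a generic illustration of the technique, not the property calculator's actual implementation.

```python
import numpy as np

def reweighted_average(A, U_old, U_new, beta=1.0):
    """Estimate <A> under perturbed parameters from existing samples.

    A            : property value for each stored frame
    U_old, U_new : potential energy of each frame under the original
                   and perturbed parameters
    beta         : 1 / kT in matching energy units
    """
    dU = np.asarray(U_new) - np.asarray(U_old)
    logw = -beta * dU
    # Subtract the max log-weight before exponentiating for stability.
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return float(np.dot(w, A))

# Sanity check: identical energies must reduce to a plain average.
A = [1.0, 2.0, 3.0]
U = [0.5, 0.7, 0.9]
avg = reweighted_average(A, U, U)
```

The estimate degrades as the perturbed ensemble overlaps less with the sampled one, which is why new simulations are still needed once the parameters move far enough.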
Going into the longer term, we're also interested in fitting the parameters of semi-empirical charge models to experimental data. That's going to involve some integration with efforts going on in Mike Gilson's group; in particular, Michael Schauperl and Paul Nerenberg have provided a lot of meaningful help here. This is something I see as a few more months out, because there's still some fundamental research being done on the semi-empirical charge model part. Here are some current challenges and limitations of ForceBalance to be aware of. For example, you've seen that it's important for the user to define a lot of things: the prior widths, the predetermined parameter types, and the target weights. Bayesian optimization is a possible long-term solution to a lot of these problems. And something that I'm currently grappling with, and have grappled with for a few years, is that it's difficult to define convergence criteria when the objective function contains statistical noise, which happens if you are simulating thermodynamic properties. Possibly with some more involved collaboration with the property calculator group, we might finally work past this very long-standing problem. I think that's all I had for my talk, and I'm interested in any questions you may have.

Just a quick question on bookkeeping: how are you going to label each version of this, so people know which parameters were used and what they looked like in the database — so they know what happened in each calculation?

I think John may have a better answer than me, but the very basics of it is that when we store data in QCArchive and ThermoML, it's going to contain important metadata, or provenance information, which will let you know where that data came from. I hope that partially answered your question. Are you asking about versioning the force fields and the inputs to them?
Yeah — we intend to version-control all of the stuff that goes into each generation of the force field, keep expanding that, and release versioned force fields as well.

Hi, I have a question about the objective function: how much user control is there over the objective function, and does that come into your cool website GUI?

I think that at the moment we don't have much fine-grained control over the objective function, although the intent is that for liquid property targets, for example, you should be able to control the weights for individual properties, as well as for individual phase points within each property. Currently this can be done through direct editing of the files in the target folder, but the web interface doesn't support it yet.

Thank you. Hi — have you put any thought into using solid-state properties as target properties in particular? A lot of the molecules we deal with in pharma, we never see the pure liquid, but we can get small-molecule crystal structures, melting points, thermodynamic solubilities, things like that.

Very good question. We have the very rudiments of that: you can fit the density of ice, for example — in the development of some of the water models I included the density of various ice phases at different pressures. Going one step beyond that, it should not be difficult to implement the fitting of lattice constants, but that's all of the thought I have put into that problem. If there are any particular targets that are very helpful for solids, I'd be interested in adding those.

It would be great to include crystal data — CSD data, for example. Thermodynamic solubility is something we're increasingly having to optimize for within a series, so if we could parameterize against that before we design molecules, that would be really helpful.
Right — David Mobley might give a more accurate answer here, because he created the FreeSolv database of solvation free energies. I think we're initially looking at that for validation — solvation free energies, and hydration specifically. But that's different from thermodynamic solubility, because solubility includes the crystal. If there's data you could share for public molecules, please come find us.

I've got a question about overfitting. Basically, to prevent overfitting, in your objective function you include some soft constraints toward your initial parameters — is that what you said?

That's correct. Yeah, that's what the regularizer is for.

Is it time for a coffee break? I think so. All right — thanks, everybody. Lee-Ping, again.